Reading the Signs

Chris Shughrue - July 1, 2020

As StreetCred players compete to map their cities, they submit photos of storefronts, menus, and other signs. By automatically extracting information from these signs, we can enhance our understanding of the data, from validating the names of places to double-checking their hours and addresses.

A picture is worth 1000 words (in theory)

Extracting meaningful text from pictures automatically is easier said than done. Pictures of signs aren’t always ideal for computer vision, and Indonesian menus and storefronts in particular have idiosyncrasies that make them fun to look at yet challenging to process automatically.

Below are some fun (and puzzling) textual finds from a sample of player-contributed images. Learning from these missteps will help us improve our models to extract more meaningful information.

Feast for the eyes

Nasi Goreng Menu

Indonesian specialties abound on this menu in Jakarta. Optical character recognition (OCR) models used by our text recognition process have been trained on a variety of languages, including English and Indonesian, so we can always find food for thought in player-submitted photos.
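As a rough illustration of that multilingual OCR step, here is a minimal sketch. It assumes the pytesseract wrapper and a hypothetical photo path; the post only says our Tesseract models cover these languages, not how we invoke them, so treat the details as placeholders rather than our production setup.

```python
# Minimal sketch: OCR over a menu photo using both English and Indonesian
# language packs. pytesseract and the file name are assumptions for illustration.
from PIL import Image
import pytesseract

menu = Image.open("menu_photo.jpg")  # hypothetical player-submitted photo

# "eng+ind" asks Tesseract to use its English and Indonesian models together,
# which helps on menus that mix the two (e.g. "Nasi Goreng Spesial").
print(pytesseract.image_to_string(menu, lang="eng+ind"))
```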

Ice Cream Menu

Word art is prevalent in signs throughout Indonesia, which presents its own challenges. The text recognition system extracts block-lettered words like “ICE CREAM.” However, distorted, embossed lettering prevents us from getting a full taste of this menu.

Working it out

Fitness Sign

Though this sub-section of a player-contributed photo has relatively low resolution, the algorithm managed to find every workout imaginable. “Yogalates” and “Zumba” might not have been the words we expected to pull out of our Indonesia data, but we’re glad to have options for burning off the fried rice.

Window of opportunity

Window Sign

This sign contains lots of useful information, including a name, phone number, website, and even a catchy slogan. But glare from the window and the stylized, widely spaced lettering make it hard to get the full picture.

(Not) getting the picture

Clot Promotion Sign

Ultimately, computer vision provides an incomplete view of place information, even with the best technology in place. This advertised “PROMOTIONAL CLOT” is a reminder that nothing adds as much value as the community of players we’re building, who bring the rich contexts of their cities to the map.

How we see it

Our text processing is built on an open-source foundation. We identify text in each scene using the Efficient and Accurate Scene Text (EAST) detector. Detected text regions are re-projected and processed with a series of StreetCred computer vision filters and transformations. Text values within the processed images are predicted by Tesseract, an open-source OCR engine. Finally, place attributes are reconstructed from the word-soup OCR output using a probabilistic approach.
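To make that pipeline concrete, the sketch below shows a detect-then-read loop under several assumptions: OpenCV’s dnn module with a pretrained EAST model file stands in for the detection step, simple axis-aligned crops stand in for our re-projection and filtering, and pytesseract stands in for the Tesseract call. File names and thresholds are placeholders, and the final probabilistic attribute reconstruction isn’t shown.

```python
# A rough sketch of a detect-then-read pipeline, under the assumptions above.
import cv2
import numpy as np
import pytesseract

EAST_MODEL = "frozen_east_text_detection.pb"  # pretrained EAST weights (assumed path)
SCORE_THRESH, NMS_THRESH = 0.5, 0.4
INPUT_W, INPUT_H = 320, 320                   # EAST input must be a multiple of 32

def detect_text_boxes(resized):
    """Run EAST on a resized image and return (x, y, w, h) boxes after NMS."""
    net = cv2.dnn.readNet(EAST_MODEL)
    blob = cv2.dnn.blobFromImage(resized, 1.0, (INPUT_W, INPUT_H),
                                 (123.68, 116.78, 103.94), swapRB=True, crop=False)
    net.setInput(blob)
    scores, geometry = net.forward(["feature_fusion/Conv_7/Sigmoid",
                                    "feature_fusion/concat_3"])
    boxes, confidences = [], []
    for y in range(scores.shape[2]):
        for x in range(scores.shape[3]):
            score = float(scores[0, 0, y, x])
            if score < SCORE_THRESH:
                continue
            # Each output cell covers a 4x4 pixel region; the geometry map holds
            # distances to the box edges plus a rotation angle (ignored here).
            ox, oy = x * 4.0, y * 4.0
            top, right, bottom, left = (geometry[0, i, y, x] for i in range(4))
            angle = geometry[0, 4, y, x]
            cos, sin = np.cos(angle), np.sin(angle)
            end_x = int(ox + cos * right + sin * bottom)
            end_y = int(oy - sin * right + cos * bottom)
            w, h = int(right + left), int(top + bottom)
            boxes.append([end_x - w, end_y - h, w, h])
            confidences.append(score)
    if not boxes:
        return []
    keep = cv2.dnn.NMSBoxes(boxes, confidences, SCORE_THRESH, NMS_THRESH)
    return [boxes[i] for i in np.array(keep).flatten()]

def read_sign(path):
    """Detect text regions, crop them, and OCR each crop with Tesseract."""
    image = cv2.imread(path)
    orig_h, orig_w = image.shape[:2]
    rx, ry = orig_w / INPUT_W, orig_h / INPUT_H
    resized = cv2.resize(image, (INPUT_W, INPUT_H))
    words = []
    for (x, y, w, h) in detect_text_boxes(resized):
        # Scale the box back to the original image and pad it slightly.
        x0, y0 = max(0, int(x * rx) - 5), max(0, int(y * ry) - 5)
        x1, y1 = min(orig_w, int((x + w) * rx) + 5), min(orig_h, int((y + h) * ry) + 5)
        crop = cv2.cvtColor(image[y0:y1, x0:x1], cv2.COLOR_BGR2RGB)
        # --psm 7 treats each crop as a single line; eng+ind covers both languages.
        words.append(pytesseract.image_to_string(crop, lang="eng+ind",
                                                 config="--psm 7").strip())
    return [w for w in words if w]

print(read_sign("storefront.jpg"))  # hypothetical player photo
```

In practice, the rotated box geometry from EAST and a perspective warp of each region matter a great deal for angled storefront photos; the axis-aligned crops above are exactly the simplification that breaks down on the distorted, glare-heavy signs shown earlier.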

Our journey into automated data extraction is just beginning, but we’re already excited about the possibilities for using computer vision to improve data quality and make the most of player contributions.