Improving Data Accuracy

Diana Shkolnikov - October 31, 2018

We believe that unless you have a wide, diverse audience creating the map, your data will fail to represent reality. We also value POI data created on location by users, not borrowed from the many existing sources on the internet. These goals guided our UI choices. We learned a lot over the course of MapNYC and iterated rapidly.

We launched the contest with minimal data requirements for each created place. All we required was the place’s name, category, and an original image taken at the location. Participants were able to add many other fields in the creation process, but those fields were optional and we pre-filled as much of the data as we could using device location, reverse geocoding for address lookup, and sensible default values for most attributes.

As data started to flow in at a (surprisingly) steady rate, we began to adjust the user experience gradually to guide our participants toward higher-quality data. While many users were extremely detail-oriented and made sure to update every field in the creation process, some others accidentally created erroneous or incomplete data. From our initial launch on September 24, we continuously iterated on the user experience to ensure participants were creating the most accurate and complete data possible.

“Pin Droppers” vs. “Address Choosers”

We launched MapNYC with an empty map. We did this because we wanted to test data collection from scratch, but it posed a problem for users. Normally we use nearby places to orient ourselves on a map, but we started with a blank slate. We knew this would be challenging to navigate for early contributors and wanted to ensure they had a positive experience. So we gathered location information from the user’s mobile device and used it to set the location of the created place. We also looked up the likely address of the place using the Pelias geocoding engine seeded with data from OpenAddresses.
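As a rough illustration of the address lookup, the sketch below builds a reverse-geocoding request against a Pelias instance and reads the candidate addresses out of the GeoJSON response. The host name and the helper function names are assumptions made for this example, not the code the MapNYC app ran.

```python
from urllib.parse import urlencode

# Hypothetical host: substitute the address of your own Pelias instance.
PELIAS_BASE_URL = "http://localhost:4000"


def reverse_geocode_url(lat: float, lon: float, size: int = 5) -> str:
    """Build a Pelias /v1/reverse request for the addresses nearest a point."""
    params = {
        "point.lat": lat,
        "point.lon": lon,
        "layers": "address",  # only return address records
        "size": size,         # how many candidates to offer the user
    }
    return f"{PELIAS_BASE_URL}/v1/reverse?{urlencode(params)}"


def parse_candidates(geojson: dict) -> list[tuple[str, float, float]]:
    """Extract (label, lat, lon) from a Pelias GeoJSON FeatureCollection."""
    candidates = []
    for feature in geojson.get("features", []):
        # GeoJSON stores coordinates as [longitude, latitude].
        lon, lat = feature["geometry"]["coordinates"]
        candidates.append((feature["properties"]["label"], lat, lon))
    return candidates
```

The device location seeds the query, and the parsed candidates are what a "nearby addresses" list could be populated from.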

From our early tests, we expected users to drag and drop a pin to specify their locations, a concept familiar from apps like Uber and Lyft. We provided a list of nearby addresses to choose from as well. We were surprised by user behavior here, and learned that the majority were not dragging the pin at all, but were selecting the correct address instead. When the device location was accurate, the resulting data was pretty amazing! Unfortunately, GPS and tall concrete buildings are not a good mix, and we started to see records whose pins were placed in the wrong spot.

In some cases, our users were diligent and manually selected the correct address. However, by design, that did not affect the pin placement: we didn’t want to override user knowledge of their actual location with an approximation from a geocoding algorithm. But we stored the geocoded approximation for good measure and began comparing the two locations where possible. We saw a significant number of records that could be made near-perfect just by using the geocoder-derived location instead of the location reported by the user’s mobile device. In those cases we made the corrections and it made a tremendous difference!
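One way to picture that comparison is sketched below: measure the great-circle distance between the device location and the geocoded location, and prefer the geocoded point only when the user never touched the pin and the two disagree by more than some threshold. The 50-meter threshold, the function names, and the exact decision rule are assumptions for illustration. The post does not describe the criteria we actually applied.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def choose_location(device, geocoded, pin_moved, threshold_m=50.0):
    """Pick the (lat, lon) to keep for a record.

    Hypothetical rule: trust the pin whenever the user moved it. Otherwise
    fall back to the geocoded address location if it is far from the device fix.
    """
    if pin_moved or geocoded is None:
        return device
    if haversine_m(*device, *geocoded) > threshold_m:
        return geocoded
    return device
```

For example, `choose_location((40.0, -74.0), (40.001, -74.0), pin_moved=False)` returns the geocoded point, because the two locations are roughly 111 meters apart.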

While the geocoded locations proved beneficial in some cases, we needed to fix the problem in the app instead of patching it up after the fact. We realized we had two types of users: those who interacted with the pin (“pin droppers”), and those who selected addresses (“address choosers”). Pin droppers produced records with much more accurate locations. So we needed to drive more users to move the pin, first and foremost. In the second half of the contest, we changed the user experience to require the user to actively position the pin on the map, even if the adjustment was very slight. This made users aware that moving the pin is possible and forced them to take stock of where they were actually located compared to the pin on the map. Once we released the app update, an increasingly accurate map began to emerge.
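The requirement itself can be expressed very simply. The sketch below tracks whether the user has dragged the pin and blocks submission until they have. The class and function names are hypothetical, and the real app logic may have differed.

```python
from dataclasses import dataclass


@dataclass
class PinState:
    """Pin position on the map, initially seeded from the device location."""
    lat: float
    lon: float
    moved_by_user: bool = False

    def drag_to(self, lat: float, lon: float) -> None:
        # Any drag counts, however slight: the point is that the user
        # looked at the pin and confirmed or corrected its position.
        self.lat, self.lon = lat, lon
        self.moved_by_user = True


def can_submit_location(pin: PinState) -> bool:
    """Allow submission only after the user has actively positioned the pin."""
    return pin.moved_by_user
```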

Location Selection: Before Location Selection: After

There is still a lot more we can do to improve the user experience and help our contributors create the most accurate location data possible. We will continue to iterate on this interface and look forward to future user feedback.

Would you look at the time?

We all know the devastation of showing up at a salon or coffee shop only to find it not yet open or, worse, closed for the day… Day. Ruined.

Operating hours are a critical component of a complete place dataset, and our goal is to achieve significant coverage of them. After launching the MapNYC app, we quickly learned that the original hours chooser made entering hours confusing and time-consuming enough that many users just blew past it, leaving the field blank. It also became clear from early user behavior that if we required business hours for each record without changing the user experience, we would lose many valuable contributions. We had made the tradeoff of launching the app with as few required fields as possible. So we set out to make entering business hours a real delight (almost). After implementing a few helpful tweaks and shortcuts, we felt more comfortable requiring hours as part of the data creation process, and shipped this to users in the latter half of MapNYC.

Hours Chooser: Before Hours Chooser: After

With bated breath we watched as users began to comply with the new requirement of entering operating hours for businesses. As we had hoped, the creation rate was not affected by this shift, and the newly created data brought us at least a bit closer to our goal of complete coverage of hours data.

Yes, No, ¯\_(ツ)_/¯

We collected a number of properties with YES/NO values, such as wheelchair accessibility and outdoor seating. As mentioned earlier, we wanted to minimize friction during the creation process and do some of the contributor’s work for them upfront. So we decided to default these binary attributes to NO and assumed the user would take the appropriate action of flipping the value to YES when necessary…

Again, we witnessed some diligent participants meticulously set each attribute to the correct value, while others completely blew past them without even noticing that the default value was incorrect. Once again, we fixed it by slightly increasing the amount of work and attention required during the creation workflow. We updated the interface to default YES/NO attributes to be unset and insisted that the record could not be created until the user intentionally selected a value.
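In data-model terms, the change amounts to turning a two-state field into a three-state one, where "unset" is distinct from NO. A minimal sketch follows, with hypothetical field and function names rather than our actual schema:

```python
from dataclasses import dataclass
from typing import Optional

# Attributes the creator must answer explicitly before a record is created.
YES_NO_FIELDS = ("wheelchair_accessible", "outdoor_seating")


@dataclass
class PlaceDraft:
    name: str
    category: str
    # None means "not answered yet". It is deliberately not the same as False (NO).
    wheelchair_accessible: Optional[bool] = None
    outdoor_seating: Optional[bool] = None


def unanswered_fields(draft: PlaceDraft) -> list[str]:
    """Names of YES/NO attributes the user has not set yet."""
    return [name for name in YES_NO_FIELDS if getattr(draft, name) is None]


def can_create(draft: PlaceDraft) -> bool:
    """A record may be created only once every YES/NO attribute is answered."""
    return not unanswered_fields(draft)
```

Checking `is None` rather than truthiness matters here: an explicit NO (`False`) is a valid answer and must not be treated as missing.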

Wheelchair Access: Before Wheelchair Access: After

Of course this alone doesn’t guarantee the selected value’s correctness, but we would leave that task to our validator role. The goal here was to ensure the data creator’s true intention was reflected in the record they created, with no missed attributes pre-filled with default values.

We knew the work of creating a new map from scratch would be demanding and wanted to ensure that our earliest adopters were not deterred by unnecessary friction in the user experience. In the short four-week duration of MapNYC, we learned from our users how to enforce more requirements without driving excited participants away with complexity. In turn, our users stepped up their creation game and continued to impress us with their attention to detail and dedication to mapping the city they love, and to climbing the leaderboard (get excited for tomorrow’s post on that!). These learnings will undoubtedly translate into an improved user experience in future iterations of our mobile app as well as the underlying decentralized protocol we’re ultimately working towards.