Opening MapNYC Data

We’ve been analyzing MapNYC all week. You know, with graphics like this:

 

[Graphic: Places Created, September 24 to October 22]

We started with a time-lapse map, continued by encouraging category and neighborhood data collection, explored how we iterated on the product to improve data accuracy, and finally talked about contest dynamics and how they’ll inform an eventual protocol.

Today? Well, it’s been a long week and we’re honestly a bit tired. So we’re going to take a break from our own analysis and open up the MapNYC dataset for other people to use and analyze.

The Fun Part: Data Licensing

The Linux Foundation released the Community Data License Agreement in late 2017. It’s a new open data license with two versions: “sharing” and “permissive”. We’re big fans of the work they’ve done, in particular how they worked with companies to ensure this license made sense to legal teams working with real-world commercial applications.

StreetCred will likely use both the permissive and the sharing versions of the CDLA in the future, possibly in a dual-license scenario. For now, we’re releasing MapNYC data under the CDLA - Sharing - Version 1.0 license. In a nutshell, this means that you can use this data if you provide attribution to StreetCred and if you contribute back any improvements you make. We’re also allowing this data to be used by OpenStreetMap and distributed under the ODbL, although we have no plans to license data under the ODbL ourselves or engage in any imports into OpenStreetMap.

We’re looking at dual licensing as a key part of a sustainable business model, and believe that open, accessible data will be the best data. We also understand well that companies need permissive licenses to build their products, and those companies should have an incentive to fund real-time POI data. We’ll write much more on this in the future.

MapNYC Data

If you read our earlier post Improving Data Accuracy, you’ll remember that we iterated rapidly on the MapNYC app, mostly in response to incoming data quality. We ended the contest collecting better data than we started with. It’s all in here :)

The MapNYC community did a great job creating content from scratch and on location. Creating a place required the participant to be physically present, and multiple other participants then visited the spot to validate or reject it. We were happy to see that a few cases of spam or copyrighted images were rejected quickly by the validation process. Our product is designed to discourage copy/pasted data from copyrighted sources like Google Maps or OpenStreetMap, and that’s how we can offer a clear and usable license here. Questions on this? Let us know.

No external POI sources were used in the MapNYC product, whether in map display or in search results. Geocoding data comes from our very own instance of Pelias, which was loaded only with OpenAddresses and the Who’s on First gazetteer.

Privacy Matters

We are not publishing user data, including IDs and create/update timestamps. We’re working on ways to share data freshness in the future without compromising user privacy, but for now we’re being cautious.
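
For anyone preparing a similar release, here is a minimal sketch of the idea in Python. It is an illustration only, not our actual export pipeline: the field names (`user_id`, `created_at`, `updated_at`) and the sample record are hypothetical, and the real MapNYC schema may differ.

```python
# Hypothetical field names for illustration; the real MapNYC schema may differ.
PRIVATE_FIELDS = {"user_id", "created_at", "updated_at"}


def strip_private_fields(place: dict) -> dict:
    """Return a copy of a place record without user-identifying fields."""
    return {key: value for key, value in place.items() if key not in PRIVATE_FIELDS}


# A made-up record, not taken from the MapNYC dataset.
record = {
    "name": "Example Cafe",
    "category": "cafe",
    "user_id": "u-123",
    "created_at": "2000-01-01T12:00:00Z",
    "updated_at": "2000-01-02T09:30:00Z",
}

public_record = strip_private_fields(record)
print(public_record)
```

The function builds a new dictionary rather than deleting keys in place, so the original record stays intact for internal use.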

Where’s the Data?

MapNYC data is available here on GitHub. You might also be interested in our categories project, which we used to create MapNYC data. Enjoy!

What’s Next?

Right now, we’re working on what’s next after MapNYC. We know we want to improve on existing datasets, and we want to work with partners toward this goal. We learned a lot here and built some encouraging momentum: reach out if you’d like to discuss what we’re building or want to work together on better POI data!

Randy Meech