Fishing for POIs

Chris Shughrue - September 18, 2019

The StreetCred community created, validated, and enriched more than 25,000 places during MapNYC and MapLA. On the face of it, this is a lot of places. But it raises the question: how many more have yet to be mapped? How much further do we need to go to map not just a lot of places, but all the places?

One of StreetCred’s strengths is that we enable our community to add data about interesting places anytime, anywhere. But decentralizing the data collection process introduces one fundamental challenge: it is difficult to know if and when we have exhaustively added all the places in an area.

POIs as Populations

Here we take a cue from wildlife ecologists who have solved exactly this problem—more or less. Imagine we want to know the size of a fish population in a particular pond. It’s generally not possible (or desirable) to exhaustively seek out every single fish. Looking under every rock and behind every blade of grass would take an impossible amount of time—in addition to destroying the pond in the process.

Instead, ecologists use information about how the difficulty of fishing changes over time to estimate the population, an approach known as the Leslie depletion method. At the beginning of a fishing season, the population is large and catching many fish is easy. As the population is winnowed through fishing, it takes more and more effort to catch the same number of fish. Building on this logic, the fish population can be quantified from the relationship between effort and catch over successive fishing trips.

Models like this have been applied to fisheries, deer populations, and now POIs.

By analogy, as an area in a city is increasingly well-mapped, the rate at which new places are added will decrease (eventually to zero when it is fully mapped). We implement this concept to estimate the number of POIs waiting to be mapped.
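To make the depletion logic concrete, here is a minimal sketch of a textbook Leslie regression in Python. It is not StreetCred's adaptation, whose details are not given here. It assumes that "catch" is the number of new places added in a period and that "effort" is some measure of mapping activity in that period (for example, mapper sessions). The function name and inputs are hypothetical. Catch per unit effort is regressed on the cumulative catch before each period; the point where the fitted line reaches zero is the estimated total population.

```python
def leslie_estimate(catches, efforts):
    """Estimate total population size with a basic Leslie depletion regression.

    catches: new places added in each period (hypothetical input)
    efforts: mapping effort spent in each period (hypothetical input)
    Returns (estimated_total, catchability).
    """
    if len(catches) != len(efforts) or len(catches) < 2:
        raise ValueError("need at least two periods of matching catch and effort")

    # Catch per unit effort (CPUE) for each period.
    cpue = [c / e for c, e in zip(catches, efforts)]

    # Cumulative catch *before* each period starts.
    cumulative = []
    total = 0.0
    for c in catches:
        cumulative.append(total)
        total += c

    # Ordinary least squares fit: cpue = intercept + slope * cumulative.
    n = len(cpue)
    mean_x = sum(cumulative) / n
    mean_y = sum(cpue) / n
    sxx = sum((x - mean_x) ** 2 for x in cumulative)
    if sxx == 0:
        raise ValueError("cumulative catch never changes; nothing to fit")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cumulative, cpue))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Depletion implies a falling CPUE, i.e. a negative slope.
    if slope >= 0:
        raise ValueError("no depletion signal: CPUE is not decreasing")

    catchability = -slope
    estimated_total = intercept / catchability
    return estimated_total, catchability
```

With equal effort in every period and new-place counts of 100, 90, 81, and 72.9, the fitted line reaches zero at a total of 1,000 places. Real mapping data will be noisier, so an estimate from only a few periods should be treated with caution.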

Mapping Completeness

To illustrate how this works, let’s look at a few hexagonal areas in LA and NYC. These 184 hexagons are the most heavily mapped areas from MapLA and MapNYC, making them the best case studies for calculating completeness (Figure 1). By breaking these predictions into discrete areas (rather than aggregating to the city level), we can focus on specific areas where more mapping needs to happen. These hexagons contain approximately 24,000 mapped POIs: roughly 21,000 in NYC and 3,000 in LA.

Figure 1. Completeness estimate of hexagonal areas in NYC (top) and LA (bottom) where completeness is represented by greenness.

Our adaptation of the Leslie model estimates that these hexagons contain a total of ~40,000 POIs in NYC and ~4,100 in LA. This corresponds to overall average completeness scores of 52% in NYC and 79% in LA.
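The completeness of a hexagon can be read as the number of mapped POIs divided by the estimated total. The sketch below, in Python, shows that ratio along with two ways of summarizing it across hexagons: the mean of per-hexagon scores, and a single pooled ratio. Which summary lies behind the city-level scores is not stated here, so both are shown. The function names are hypothetical and the counts in the usage example are invented for illustration only.

```python
def completeness(mapped, estimated_total):
    """Fraction of the estimated POIs that are already mapped, capped at 1."""
    if estimated_total <= 0:
        raise ValueError("estimated total must be positive")
    return min(mapped / estimated_total, 1.0)


def summarize(hexes):
    """Summarize completeness over (mapped, estimated_total) pairs.

    Returns (mean_of_hex_scores, pooled_score).
    """
    scores = [completeness(m, n) for m, n in hexes]
    mean_score = sum(scores) / len(scores)
    pooled = completeness(
        sum(m for m, _ in hexes),
        sum(n for _, n in hexes),
    )
    return mean_score, pooled


# Invented example: one nearly complete hexagon and one sparse one.
mean_score, pooled = summarize([(90, 100), (10, 50)])
print(f"mean of hexagons: {mean_score:.0%}, pooled: {pooled:.0%}")
```

The two summaries can differ noticeably when hexagons vary in size, which is worth keeping in mind when comparing city-level scores with the raw totals.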

Spreading Out vs Honing In

Not all hexes have been mapped equally. If we look more closely within each city, two important differences emerge: NYC has been more expansively mapped, but hexes in LA have been more exhaustively mapped.

Figure 2. Distribution of hexagon completeness by city. Nearly-complete hexagons are more abundant in LA than NYC, which has a more even spread of hexagon completeness.

Twice as many hexagons have been mapped in NYC as in LA. Place data in NYC spans the five boroughs, whereas the most well-mapped parts of LA are concentrated in dense pockets, such as downtown.

Many areas in NYC are not as thoroughly mapped as those in LA, though. Nearly-complete hexagons are more abundant in LA (Figure 2). This indicates that mapping efforts in NYC are spread out, while in LA, the community has focused on thoroughly mapping contained segments of the city.

These results suggest an apparent tradeoff between different mapping strategies. NYC is relatively accessible and has a high density of POIs throughout, allowing the community to add POIs in many areas across the region. On the other hand, LA is sparser, leading our mappers to focus on the few dense areas where they can be most effective.

Insights like these will help our community more easily identify areas most in need of mapping. Our modeling approach lays the groundwork to continuously assess where our best place data is situated within and among cities, where we think more effort would yield results, and ultimately how to map all the places.