OpenAddresses contains a large repository of addresses, 282,064,595 in total. For a while I’ve been checking the growing list of included sources. I want to look at just the US sources and ask: how many people are covered by the current set of data?
This also gave me a chance to try building a Swift command line tool with the Swift Package Manager. OpenAddressesCensus is a simple tool that reads a census CSV file and the OpenAddresses repository and works out which counties have exact coverage, coverage from a state source, or no coverage. If you’re interested in how it works, check out the repository. One big assumption the tool makes is that a state-level source contains all addresses in that state. It’s possible that a state source is incomplete and excludes some counties. I also manually marked the New York City counties as covered because they are so large and are covered by a city-level source.
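To make the three categories concrete, here is a minimal sketch of the classification, not the actual OpenAddressesCensus code. The type and function names (`Coverage`, `County`, `classify`, `coveredShare`) are made up for illustration, and the sample counties and populations are placeholders. It assumes county sources are matched by their 5-digit geoid and state sources by the 2-digit state prefix of that geoid.

```swift
// Hypothetical types for illustration; the real tool may be structured differently.
enum Coverage: Equatable {
    case county     // a source matches this county's geoid exactly
    case state      // no county source, but a statewide source exists
    case uncovered  // neither
}

struct County {
    let geoid: String   // 5 digits: 2-digit state code + 3-digit county code
    let name: String
    let population: Int
}

func classify(_ county: County,
              countyGeoids: Set<String>,
              stateCodes: Set<String>) -> Coverage {
    if countyGeoids.contains(county.geoid) { return .county }
    // Assumption from the post: a state-level source covers every county in that state.
    if stateCodes.contains(String(county.geoid.prefix(2))) { return .state }
    return .uncovered
}

// Share of the total population living in counties with any coverage.
func coveredShare(of counties: [County],
                  countyGeoids: Set<String>,
                  stateCodes: Set<String>) -> Double {
    let total = counties.reduce(0) { $0 + $1.population }
    guard total > 0 else { return 0 }
    let covered = counties
        .filter { classify($0, countyGeoids: countyGeoids, stateCodes: stateCodes) != .uncovered }
        .reduce(0) { $0 + $1.population }
    return Double(covered) / Double(total)
}

// Placeholder data, not real census figures.
let sampleCounties = [
    County(geoid: "01001", name: "County A", population: 600),
    County(geoid: "01003", name: "County B", population: 300),
    County(geoid: "02001", name: "County C", population: 100),
]
let share = coveredShare(of: sampleCounties,
                         countyGeoids: ["01001"],
                         stateCodes: ["01"])
print("Covered share:", share)
```

Because a state source is trusted to cover every county in its state, a number computed this way is only as good as that assumption.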
The best I can do right now with county-level data is find a lower bound on the population covered. For example, I noticed that a lot of counties in New York are not marked as covered because there is no geoid in the New York State source. That said, it appears the New York State source does not cover all counties. There are also quite a few city-level sources that OpenAddressesCensus doesn’t handle yet. Detroit and Austin, for example, are big cities that aren’t counted in my population numbers yet.
So with all that in mind, here are the results:
That puts the lower bound at 77.7% of the US population covered. That’s a lot better than I expected.
The 10 biggest missing counties are below. Every one of them either has its own issue or is mostly covered by a city in that county. You can see the full list of missing counties in the OpenAddressesCensus results.
- Wayne County, Michigan (partial coverage in the Detroit source)
- Suffolk County, New York (#579)
- Nassau County, New York (#1990)
- Travis County, Texas (partial coverage in the Austin source)
- Gwinnett County, Georgia (#2060)
- Pierce County, Washington (Related #1947)
- Montgomery County, Pennsylvania (#1982 & #1979)
- Oklahoma County, Oklahoma (#192)
- Cobb County, Georgia (#456)
- DeKalb County, Georgia (#460)
I want to make this tool more accurate. Right now it’s limited to county-level population data. I could add city-level population data and handle sources like the New York State source that don’t have a geoid but rather a geometry. In both cases the tool needs to support finding the union of the geometries covered and then finding the population within that union. That way it doesn’t double count overlapping sources from a city and an intersecting county.
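The double counting problem is easier to see in a simplified form. The sketch below sidesteps real polygon math: it assumes each source’s geometry has already been mapped to a set of small population units (census blocks, say) with known populations. The `SourceCoverage` type, the `populationCovered` function, and the block IDs are all hypothetical, and the numbers are placeholders.

```swift
// Hypothetical model: each source covers a set of small population units.
struct SourceCoverage {
    let name: String
    let blockIDs: Set<String>
}

// Union the covered units first, then sum each unit's population exactly once.
func populationCovered(by sources: [SourceCoverage],
                       blockPopulation: [String: Int]) -> Int {
    let coveredBlocks = sources.reduce(into: Set<String>()) { union, source in
        union.formUnion(source.blockIDs)
    }
    // Units with no known population contribute 0.
    return coveredBlocks.reduce(0) { $0 + (blockPopulation[$1] ?? 0) }
}

// Placeholder data: a city source overlapping a county source on block "b2".
let blockPopulation = ["b1": 100, "b2": 250, "b3": 50, "b4": 400]
let citySource = SourceCoverage(name: "city", blockIDs: ["b1", "b2"])
let countySource = SourceCoverage(name: "county", blockIDs: ["b2", "b3"])

let unionTotal = populationCovered(by: [citySource, countySource],
                                   blockPopulation: blockPopulation)
print("Population covered:", unionTotal)
```

Summing each source separately would count block `b2` twice; taking the union first counts it once. A real version would need an actual geometry union and a way to estimate the population inside arbitrary polygons.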