You may know that I am a fan of the CivicSpace US ZIP Code Database compiled by Schuyler Erle of Mapping Hacks fame. It contains nearly 10,000 more records than the ZIP Code Tabulation Areas file from the U.S. Census Bureau upon which it is based, so a lot of work has gone into it.
I have been using the database a lot recently to correlate with survey respondents, so I have saved it as an R data.frame. Since others may find it useful, too, I have packaged it into the ‘zipcode’ package now available on CRAN.
Once you load the package, the database is available in the ‘zipcode’ data.frame:
> library(zipcode)
> data(zipcode)
> nrow(zipcode)
[1] 43191
> head(zipcode)
    zip       city state latitude longitude timezone  dst
1 00210 Portsmouth    NH 43.00590  -71.0132       -5 TRUE
2 00211 Portsmouth    NH 43.00590  -71.0132       -5 TRUE
3 00212 Portsmouth    NH 43.00590  -71.0132       -5 TRUE
4 00213 Portsmouth    NH 43.00590  -71.0132       -5 TRUE
5 00214 Portsmouth    NH 43.00590  -71.0132       -5 TRUE
6 00215 Portsmouth    NH 43.00590  -71.0132       -5 TRUE
Note that the ‘zip’ column is a string, not an integer, in order to preserve leading zeroes — a sensitive topic for those of us in the Northeast… 🙂
The package also includes a clean.zipcodes() function to help clean up zip codes in your data. It strips off “ZIP+4” suffixes, attempts to restore missing leading zeroes, and replaces anything with non-digits (like non-U.S. postal codes) with NAs:
> library(zipcode)
> data(zipcode)
> somedata = data.frame(postal = c(2061, "02142", 2043, "20210", "2061-2203", "SW1P 3JX", "210", '02199-1880'))
> somedata
      postal
1       2061
2      02142
3       2043
4      20210
5  2061-2203
6   SW1P 3JX
7        210
8 02199-1880
> somedata$zip = clean.zipcodes(somedata$postal)
> somedata
      postal   zip
1       2061 02061
2      02142 02142
3       2043 02043
4      20210 20210
5  2061-2203 02061
6   SW1P 3JX  <NA>
7        210 00210
8 02199-1880 02199
> data(zipcode)
> somedata = merge(somedata, zipcode, by.x='zip', by.y='zip')
> somedata
    zip     postal       city state latitude longitude timezone  dst
1 00210        210 Portsmouth    NH 43.00590 -71.01320       -5 TRUE
2 02043       2043    Hingham    MA 42.22571 -70.88764       -5 TRUE
3 02061       2061    Norwell    MA 42.15243 -70.82050       -5 TRUE
4 02061  2061-2203    Norwell    MA 42.15243 -70.82050       -5 TRUE
5 02142      02142  Cambridge    MA 42.36230 -71.08412       -5 TRUE
6 02199 02199-1880     Boston    MA 42.34713 -71.08234       -5 TRUE
7 20210      20210 Washington    DC 38.89331 -77.01465       -5 TRUE
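For readers curious about what the cleaning step involves, here is a minimal base R sketch of the three rules described above. The function name clean_zip_sketch is hypothetical, this is not the package's actual clean.zipcodes() source, and treating values longer than five digits as NA is my own assumption.

```r
# Hypothetical helper that illustrates the rules described in the post.
# It is NOT the implementation of clean.zipcodes() from the 'zipcode' package.
clean_zip_sketch <- function(x) {
  z <- trimws(as.character(x))
  # Strip a "ZIP+4" suffix such as "-2203"
  z <- sub("-[0-9]{4}$", "", z)
  # Anything that is not one to five digits becomes NA
  # (assumption: longer digit strings are treated as invalid too)
  z[!grepl("^[0-9]{1,5}$", z)] <- NA_character_
  # Restore missing leading zeroes by left-padding to five characters
  ok <- !is.na(z)
  z[ok] <- sprintf("%05d", as.integer(z[ok]))
  z
}

cleaned <- clean_zip_sketch(c(2061, "02142", "2061-2203", "SW1P 3JX", "210", "02199-1880"))
print(cleaned)
# [1] "02061" "02142" "02061" NA      "00210" "02199"
```

The result stays a character vector, which is what keeps the leading zeroes intact when merging against the ‘zip’ column.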
Now we wouldn’t be R users if we didn’t try to do something with data, even if it’s just a lookup table of zip codes. So let’s take a look at how they’re distributed by first digit:
library(zipcode)
library(ggplot2)
data(zipcode)

# use the first digit of each ZIP code as its region
zipcode$region = substr(zipcode$zip, 1, 1)
g = ggplot(data=zipcode) + geom_point(aes(x=longitude, y=latitude, colour=region))

# simplify display and limit to the "lower 48"
# (current ggplot2 releases expect breaks = NULL, not NA, to suppress axis breaks)
g = g + theme_bw() + scale_x_continuous(limits = c(-125,-66), breaks = NULL)
g = g + scale_y_continuous(limits = c(25,50), breaks = NULL)

# don't need axis labels
g = g + labs(x=NULL, y=NULL)

# display the plot
g
If we make the points smaller, cities and interstates are clearly visible, at least once you leave the Northeast Megalopolis:
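One way to shrink the points is to pass a fixed size to geom_point() outside of aes(). The sketch below wraps the plot in a hypothetical helper, plot_zip_points(), which is not part of the package. The default size of 0.25 is a guess rather than the value used for the original figure, and the three sample rows are copied from the merged output earlier in the post so the example runs without the full table.

```r
library(ggplot2)

# Hypothetical helper (not part of the 'zipcode' package): draws one small
# point per ZIP code, coloured by the first digit of the code.
plot_zip_points <- function(zips, point_size = 0.25) {
  zips$region <- substr(zips$zip, 1, 1)
  ggplot(data = zips) +
    # size is set outside aes(), so it applies to every point
    geom_point(aes(x = longitude, y = latitude, colour = region),
               size = point_size, na.rm = TRUE) +
    theme_bw() +
    # limit to the "lower 48" and hide the axis breaks
    scale_x_continuous(limits = c(-125, -66), breaks = NULL) +
    scale_y_continuous(limits = c(25, 50), breaks = NULL) +
    labs(x = NULL, y = NULL)
}

# Three rows taken from the merged output shown earlier, standing in for the
# full 'zipcode' data.frame.
sample_zips <- data.frame(
  zip = c("02043", "02142", "20210"),
  latitude = c(42.22571, 42.36230, 38.89331),
  longitude = c(-70.88764, -71.08412, -77.01465),
  stringsAsFactors = FALSE
)
g_small <- plot_zip_points(sample_zips)
```

With the package loaded, plot_zip_points(zipcode) should draw the full map; raising or lowering point_size trades visibility of individual towns against overplotting in dense areas.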
January 5, 2011 at 7:19 PM
Very cool! Thanks for posting.
January 5, 2011 at 8:45 PM
Thanks, Vincent.
Enjoy!
Jeffrey
January 6, 2011 at 2:12 AM
This is really neat. Great concise description of your package and very cool, quick example.
Thank you!
January 6, 2011 at 1:36 PM
Thanks Jerome! I hope you find it useful.
January 6, 2011 at 1:16 PM
[…] all postal codes and coordinates of the USA are stored. To go with it, there is a nice little visualization. Surely something like this could be done for Germany as well. And lo and behold: it works! The […]
February 9, 2012 at 2:41 AM
Thanks, I found this useful!
September 7, 2012 at 3:07 PM
Great package!
October 11, 2012 at 8:12 PM
Great package, thanks!
Very useful
October 12, 2012 at 7:47 AM
I think I would like your package even more if I knew how to apply it. I’m pretty new to R, but I’m taking a university course where I have to make geographical maps like yours. So I want to map my zipcodes with the help of your package (my zips are from the US as well). Could you maybe post the code showing how I could get my zips on a map? I’ve been trying for over 2 days to get this working, but I’m not an IT person, so R is like hieroglyphics to me. Since this would be the first step of many, I would really appreciate some help.
Thanks a lot
April 3, 2013 at 12:27 PM
This package is very useful! Thanks for your contribution Jeffrey.
http://kevinldavenport.info
May 6, 2013 at 3:06 PM
I had to do a very quick plot of data on a map, given US zipcodes. I searched for a method in Excel, SAS and Matlab, and this one turned out to be simple and efficient. Thanks for posting.
October 9, 2013 at 3:28 PM
Wondering how you change the point size, and whether it’s possible to add country and state boundaries to this plot?
February 12, 2015 at 1:34 PM
Works perfectly. Thank you.