My first R package: zipcode

You may know that I am a fan of the CivicSpace US ZIP Code Database compiled by Schuyler Erle of Mapping Hacks fame. It contains nearly 10,000 more records than the ZIP Code Tabulation Areas file from the U.S. Census Bureau upon which it is based, so a lot of work has gone into it.

I have been using the database a lot recently to correlate with survey respondents, so I have saved it as an R data.frame. Since others may find it useful, too, I have packaged it into the ‘zipcode’ package now available on CRAN.

One you load the package, the database is available in the ‘zipcode’ data.frame:

> library(zipcode)
> data(zipcode)

> nrow(zipcode)
[1] 43191

> head(zipcode)
    zip       city state latitude longitude timezone  dst
1 00210 Portsmouth    NH 43.00590  -71.0132       -5 TRUE
2 00211 Portsmouth    NH 43.00590  -71.0132       -5 TRUE
3 00212 Portsmouth    NH 43.00590  -71.0132       -5 TRUE
4 00213 Portsmouth    NH 43.00590  -71.0132       -5 TRUE
5 00214 Portsmouth    NH 43.00590  -71.0132       -5 TRUE
6 00215 Portsmouth    NH 43.00590  -71.0132       -5 TRUE

Note that the ‘zip’ column is a string, not an integer, in order to preserve leading zeroes — a sensitive topic for those of us in the Northeast… :)

The package also includes a clean.zipcodes() function to help clean up zip codes in your data. It strips off “ZIP+4″ suffixes, attempts to restore missing leading zeroes, and replaces anything with non-digits (like non-U.S. postal codes) with NAs:

> library(zipcode)
> data(zipcode)

> somedata = data.frame(postal = c(2061, "02142", 2043, "20210", "2061-2203", "SW1P 3JX", "210", '02199-1880'))
> somedata
      postal
1       2061
2      02142
3       2043
4      20210
5  2061-2203
6   SW1P 3JX
7        210
8 02199-1880

> somedata$zip = clean.zipcodes(somedata$postal)
> somedata
      postal   zip
1       2061 02061
2      02142 02142
3       2043 02043
4      20210 20210
5  2061-2203 02061
6   SW1P 3JX  <NA>
7        210 00210
8 02199-1880 02199

> data(zipcode)
> somedata = merge(somedata, zipcode, by.x='zip', by.y='zip')
> somedata
    zip     postal       city state latitude longitude timezone  dst
1 00210        210 Portsmouth    NH 43.00590 -71.01320       -5 TRUE
2 02043       2043    Hingham    MA 42.22571 -70.88764       -5 TRUE
3 02061       2061    Norwell    MA 42.15243 -70.82050       -5 TRUE
4 02061  2061-2203    Norwell    MA 42.15243 -70.82050       -5 TRUE
5 02142      02142  Cambridge    MA 42.36230 -71.08412       -5 TRUE
6 02199 02199-1880     Boston    MA 42.34713 -71.08234       -5 TRUE
7 20210      20210 Washington    DC 38.89331 -77.01465       -5 TRUE

Now we wouldn’t be R users if we didn’t try to do something with data, even if it’s just a lookup table of zip codes. So let’s take a look at how they’re distributed by first digit:

library(zipcode)
library(ggplot2)

data(zipcode)
zipcode$region = substr(zipcode$zip, 1, 1)

g = ggplot(data=zipcode) + geom_point(aes(x=longitude, y=latitude, colour=region))

# simplify display and limit to the "lower 48"
g = g + theme_bw() + scale_x_continuous(limits = c(-125,-66), breaks = NA)
g = g + scale_y_continuous(limits = c(25,50), breaks = NA)

# don't need axis labels
g = g + labs(x=NULL, y=NULL)

If we make the points smaller, cities and interstates are clearly visible, at least once you leave the Northeast Megalopolis:

About these ads

10 Responses to “My first R package: zipcode”

  1. Vincent Says:

    Very cool! Thanks for posting.

  2. Jerome Says:

    This is really neat. Great concise description of your package and very cool, quick example.

    Thank you!

  3. Arne Hendrik Schulz » Blog Archiv » Postleitzahlen visualisiert Says:

    […] sämtliche Postleitzahlen und Koordinaten der USA hinterlegt sind. Passend dazu findet sich in eine nette kleine Visualisierung. Sowas müsste sich doch auch für Deutschland realisieren lassen. Und siehe da: Es geht! Die […]

  4. Trevor Says:

    Thanks, I found this useful!

  5. Ricky Says:

    Great package, thanks!
    Very useful

  6. Eva Says:

    I think I would like your package even more if I would know how to apply this. I’m pretty new to R but I’m taking a course in university where I have to make geographical maps like yours. So I want to map my zipcodes with the help of your package (my zips are from the US as well). could you maybe post the code on how I could get my zips on a map?I’ve been trying for over 2 days to get this working but I’m not an IT person so R is like hieroglyphics to me. Since this would be the first step out of many I would really appreciate some help.
    Thanks a lot

  7. Harmeek Hans Says:

    Wondering how do you change the point size and if its possible to add Country and state boundaries to this plot?


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 63 other followers

%d bloggers like this: