Nightlights: cool data, bad geocoding

A global source of population density has been on my low-priority wish list for some time, so I was very excited when I found Steve Mosher’s work with the Nighlights data set. “Nightlights” refers to the artificial lights seen from space at night. Astronomers call it “light pollution” which is pretty accurate since it’s decidedly not the light which we all use to see and avoid peril at night. Rather, it’s the light (and energy) wasted in that pursuit by being emitted or reflected straight up into the sky.

Steve has since spent some quality time with other R packages like Rgooglemap exploring this data set and has noticed some problems with the geocoding of the nightlights data.  I noticed the same thing, though much more naively, just trying to check out the data around my home:

library(RCurl)
library(R.utils)
library(rgdal)
library(raster)

url_radianceCalibrated = "ftp://ftp.ngdc.noaa.gov/DMSP/web_data/x_rad_cal/rad_cal.tar"
calibratedLights = "rad_cal.tar"
hiResTif = "world_avg.tif"

download.file(url_radianceCalibrated,calibratedLights,mode="wb")
untar(calibratedLights)
gunzip(paste(hiResTif, '.gz', sep='')

hiResLights = raster("world_avg.tif")

# Eastern Mass., Cape Cod & Islands:
e = extent(-71.5, -69.5, 41, 43)

r = crop(hiResLights,e)
plot(r)

which looks amazing… right up until you overlay the county boundaries from the standard ‘maps’ package:

library(maps)
map("county",xlim=c(-71.5,-69.5),ylim=c(41,43),add=T)

Alas. To my eye, there’s a clear shift to the northwest (see Provincetown at the tip of Cape Cod), and perhaps a bit of a clockwise rotation as well (see the big bulge of Cape Ann north of Boston).

Newer = better… ?

I have a lot to learn about this data, but in my poking around, I did find more recent observations available on a “Version 4 DMSP-OLS Nighttime Lights Time Series” page. But warning — these files are big. The tar file I download next is over 350MB:

url = "http://www.ngdc.noaa.gov/dmsp/data/web_data/v4composites/F152007.v4.tar"
dest = "F152007.v4.tar"
tif = "F152007.v4b_web.stable_lights.avg_vis.tif"

download.file(url, dest)
untar(dest)
gunzip( paste(tif, '.gz', sep='') )

f15 = raster(tif)
e = extent(-71.5, -69.5, 41, 43)
r = crop(f15, e)
plot(r)
map("county",xlim=c(-71.5,-69.5),ylim=c(41,43),add=T)

which looks a lot better, though still probably not perfect.

ggplot2 with raster

I am a huge fan of ggplot2, but since this was my first exposure to the raster package, I just copied and pasted Steve’s code to make the plots. But I couldn’t help myself to try to reproduce them in ggplot2.

Getting your data into a data.frame is the key to using ggplot2. Fortunately, the raster package includes a rasterToPoints() function which outputs a list which is easily cast to a data.frame:

p = rasterToPoints(r)
df = data.frame(p)
colnames(df) = c("lon", "lat", "dn")

which makes the actual plotting so easy, even qplot() will do it:

library(ggplot2)
qplot(lon, lat, color=dn, data=df) 
  + scale_colour_gradient(low="black", high='white', transform='sqrt') 
  + theme_bw() + borders("state", colour="yellow") 
  + xlim(c(-71.5, -69.5)) + ylim(c(41, 43))

The only technical glitch is in the overlay, as zooming in truncates the northernmost coastline points but the geom_polygon() layer created by borders() seems compelled to close the shape and connects the northern Mass. coast with Rhode Island.

12 Responses to “Nightlights: cool data, bad geocoding”

  1. Steven Mosher Says:

    Very cool.

    a couple of us have been trying to figure out the geocoding errors with Nightlights. It may come down to a rounding issue. there appears to be a 1 cell shift in certain areas,

    The other data you refer to is a product I hope to look at, but for now I am sticking with the radiance calibrated lights to finish this work

    When I’m done I’ll post all the code.

    Also, population density at 1km is available from 3 sources. I’ll be covering that in future posts.

    • Jeffrey Breen Says:

      Hi Steve:

      Thanks for visiting and commenting. I have enjoyed your posts on this topic and look forward to more.

      I am particularly interested in any sources of population density you can recommend since I would like to use it as to compare the noise impact of various approach and departure procedures to airports. NextGen’s satellite-based navigation significantly increases the freedom to design new approaches; a readily-available source of population density could make it easier to be considerate of those on the ground.

      Thanks again,
      Jeffrey

      • Steven Mosher Says:

        Ok do you have specific region you are thinking about?

        Global datasets are available ( google GPW) and there is GRUMP and a third ( I’ll remeber the name)

        Then also you’ll want to look perhaps At NCEP reanalysis for historical wind reconstructions. I’m not sure of the accuracy there

      • Steven Mosher Says:

        if you like you can also just go from Nights to population density using Imoff’s catagories.

        That is, the nightlights are cut into 7 categories, each corresponding to a different density of people per square ha.

  2. Robert Says:

    about the data shift:
    The county data in the maps package are somewhat generalized and imprecise. For better comparisons, here is much more precise coastline (border) of the USA (but the shift remains)

    library(raster_
    usa <- getData('GADM', country="USA", level=0)

    One minor thing to correct for imprecision because gdal reports the upper left corner and the resolution:

    xmax(hiResLights) = 180
    ymin(hiResLights) = -90

  3. Ben Bolker Says:

    I believe you could use
    scale_x_continuous(limits=c(41, 43))
    instead of xlim(c(41,43)) to get the axes truncated without the artifacts …

    • Jeffrey Breen Says:

      Hi Ben:

      Thanks for visiting and for the suggestion. Unfortuntely, that didn’t make a difference for me. The docs say that xlim(…) and ylim(…) are just shorthand for scale_x_continuous(limits=…) and scale_y_continuous(limits=…).

      This code after that from the help page for borders() seems to fix the artifacts:

      ma = map_data(“county”, “massachusetts”)
      ggplot() + geom_polygon(data=ma, aes(long,lat,group=group), color=”yellow”) + xlim(c(-71.5, -69.5)) + ylim(c(41, 43))

      But I think Robert (comment above) nails it by using higher quality coastline data.

      Thanks,
      Jeffrey

  4. Incremental improvements to Nightlights mapping thanks to R-Bloggers « Things I tend to forget Says:

    […] Mac users are bett…Marc Schwartz on vecLib: Why Mac users are bett…Jeffrey Breen on Nightlights: cool data, bad…Jeffrey Breen on Nightlights: cool data, bad…Jeffrey Breen on Nightlights: cool data, […]

  5. luiscarlosmr Says:

    I Have been trying to access again, however is not working now!


Leave a comment