Plotting Zip Codes on a Map

1,923 views
Skip to first unread message

Yves Bergquist

unread,
Aug 26, 2015, 1:03:48 PM8/26/15
to ggp...@googlegroups.com
Hi everyone, 

I have the attached list of Zip Codes (the data has only one column), which I’m trying to plot (preferably as color blocks) on a map of Texas. None of the resources I have found online have allowed me to do it. 

Anybody has a simple way of doing it ? Perhaps using ggmap?

Best, 

Yves
Panel Zip Codes.csv

hrbrmstr

unread,
Aug 26, 2015, 5:48:51 PM8/26/15
to ggplot2
There are a [choro]plethora of ggplot2 mapping resources, many of which do precisely what you're looking for. I suppose one more example won't hurt:

library(maptools)
library(sp)
library(rgdal)
library(ggplot2)
library(mapproject)
library(dplyr)
library(ggthemes)
library(viridis) # devtools::install_github("sjmgarnier/viridis")

# faster downlaod the tigirs::zcta
fil <- basename(url)
if (!file.exists(fil)) download.file(url, fil)

# only care about the shapefile
tx_shp <- grep("shp", unzip(fil), value=TRUE)

# read it in
tx <- readOGR(tx_shp, ogrListLayers(tx_shp)[1], stringsAsFactors=FALSE)

# enable ggploting
tx_map <- fortify(tx, region="NAME")

# get your data
zips <- read.csv("Panel Zip Codes.csv", stringsAsFactors=FALSE)

# summarize your data
zips_dat <- count(zips, ZIPCODE)

# shrink the bounding box to just your area
for_bb <- subset(tx, NAME %in% zips_dat$ZIPCODE)
txbb <- bbox(for_bb)

# make the plot
gg <- ggplot()
gg <- gg + geom_map(data=tx_map, map=tx_map,
                    aes(x=long, y=lat, map_id=id),
                    color="#7f7f7f", size=0.1, fill="white")
gg <- gg + geom_map(data=zips_dat, map=tx_map,
                    aes(fill=factor(n), map_id=ZIPCODE),
                    color="#7f7f7f", size=0.1)
gg <- gg + scale_fill_viridis(discrete=TRUE)
gg <- gg + coord_map(xlim=txbb[1,], ylim=txbb[2,])
gg <- gg + theme_map()
gg <- gg + theme(legend.position="right")
gg

Brian

unread,
Aug 26, 2015, 5:57:04 PM8/26/15
to hrbrmstr, ggplot2
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

very pretty.
thanks for the example.
Cheers
> <https://lh3.googleusercontent.com/-TNkwY4LViA4/Vd40MTqqzgI/AAAAAAAAGu
U/cBa1m4TjTQs/s1600/Rplot01.png>
>
>
>
> On Wednesday, August 26, 2015 at 1:03:48 PM UTC-4, ybergquist
> wrote:
>>
>> Hi everyone,
>>
>> I have the attached list of Zip Codes (the data has only one
>> column), which I’m trying to plot (preferably as color blocks) on
>> a map of Texas. None of the resources I have found online have
>> allowed me to do it.
>>
>> Anybody has a simple way of doing it ? Perhaps using ggmap?
>>
>> Best,
>>
>> Yves
>>
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBCAAGBQJV3jYDAAoJEB+kbV9VX51IN8EQAIph8uEXePQwqfdgkLtdiwJu
mkAd/+hxOlwGFPg7fj/Pvc/z1V+AG7+zUW5JkIt15ZtnXaHveE0W0ytBi/iHHF1I
MVY/cbZUqaTCLYcrpMPul6cGdPPjH+/IMsu7aN9SsC8eso2Xsd2SeLQbzZuDjJ1P
NgzNCrd1/Xnkt66USHUCp05rXD/ZfznUXy5gjWUQv41IMKu3S3l0VOEqMXBhgULn
s1vouQoD38v0k4pXVihq7nILj5k+/I5BYUibaGMwRMRvbjGQXw6gLyY+EiSX2I0M
2D2TREogY4SC4LidzUSDuSXpaBUpGHxkjFdLp+Jv8tW3h9BOaKpTGphi+9J+Z3Fh
lMRc+lBhC9MqV7x4Il4kxhOhsSNNE9sCkVrL4o2QzGXVaa55+pkL8CU33ZtTB6fK
cgB/VkvyoU2QEt1fsVKlvUyxULicx9I92oeH3Wp5+/UCh63mwJ8mBiR0iOA2uoYU
xlQv2NDibhYBF3DpU4enM7O8lmUVQPq7SUFrhAtvJQS5rESKqmVDjsS46S1NX/9d
kt/zRqKp79QSt8iA1HKBGfJLMgDE/nvH3msWOrNhpcDdl4f+y/EmL7RGPMHKzpJn
jJpS+VcECdzXkh20frvCljatov/8WlaW2o/yKGbIjnAlPJ2jHDTsopnQ3yRY1Egm
VsHUMhwo72xpUJx9qxPq
=RD3G
-----END PGP SIGNATURE-----

Yves Bergquist

unread,
Aug 26, 2015, 6:12:28 PM8/26/15
to Brian, hrbrmstr, ggplot2
This is really great, thank you

-- 
-- 
You received this message because you are subscribed to the ggplot2 mailing list.
Please provide a reproducible example: https://github.com/hadley/devtools/wiki/Reproducibility

To post: email ggp...@googlegroups.com
To unsubscribe: email ggplot2+u...@googlegroups.com
More options: http://groups.google.com/group/ggplot2

--- 
You received this message because you are subscribed to the Google Groups "ggplot2" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ggplot2+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Yves Bergquist




Ari L

unread,
Aug 27, 2015, 5:59:07 PM8/27/15
to ggplot2, zenl...@gmail.com, b...@rudis.net
Yves, you might want to check out the choroplethrZip package. In addition to the 3 vignettes on that page, there is also a lesson on how to use it in the free census mapping course I created.

boB Rudis

unread,
Aug 27, 2015, 6:42:30 PM8/27/15
to Ari L, ggplot2, zenl...@gmail.com
I second that. choroplethZip is absolutely the best way to start doing
this The "gateway drug" to choropleths in R ;-)

arila...@gmail.com

unread,
Aug 27, 2015, 7:22:02 PM8/27/15
to boB Rudis, ggplot2, zenl...@gmail.com
Lol. Bob, I want to put that in my marketing material.

Yves, waitasec, let me try to write code for you to solve your problem using choroplethrZip.

Yves Bergquist

unread,
Aug 27, 2015, 7:29:53 PM8/27/15
to arila...@gmail.com, boB Rudis, ggplot2, zenl...@gmail.com
Thanks everyone, I really appreciate your help.

arila...@gmail.com

unread,
Aug 27, 2015, 7:51:02 PM8/27/15
to Yves Bergquist, boB Rudis, ggplot2, zenl...@gmail.com
OK. Attached is my image. Note that I had to do a lot of processing of your raw data to map it. Here is my code plus notes.

# note that choroplethrZip is on github, not CRAN. The link I sent before explains how to install it.
library(choroplethrZip)

# all choroplethr functions require a data.frame with one column called "region" and one column called "value". Your csv just has regions, and calls the column "ZIPCODE". Here I change that.
zips = read.csv("~/Downloads/Panel Zip Codes.csv", col.names="region")
zips$value = 1:nrow(zips) # random data

# Your originally said that you wanted to map all of texas, so I used state_zoom. When I tried it I got an error.
zip_choropleth(zips, state_zoom="texas")
Error: anyDuplicated(self$user.df$region) == 0 is not TRUE

zip_choropleth throws an error because your data.frame has duplicate regions. Here's how I removed the duplicated regions.
zips2=zips[!duplicated(zips$region), ]
nrow(zips) #321
nrow(zips2) #154

# also note that zip_choropleth wants the zipcodes to be characters, not integers. Here's how I converted it.
zips2$region = as.character(zips2$region)

Also, because there are so few zipcodes in your dataset I decided to zoom in on just them, not all of texas. 

zip_choropleth(zips2, zip_zoom=zips2$region)

At this point I got an error about unmappable regions. zip_choropleth uses the map zip.map, which contains all ZCTAs (Zip Code Tabulated Areas) in the census bureau as of 2013.  Perhaps the zips in your dataset were created after 2013, or are just considered not important by the census bureau (i.e. some zips are just post offices, and have no people living there) - these are some of common problems that occur when you try to map zip codes.

# remove unmappable regions
zips2 = zips2[!zips2$region %in% c("76124", "75138"), ]
nrow(zips2) # 152

# now it works, see attached image
zip_choropleth(zips2, zip_zoom=zips2$region) 
for-yves.png

Brian

unread,
Sep 22, 2015, 5:26:22 AM9/22/15
to ggplot2
Dear list,

a considerable limitation of ggplot2 in meteorological analysis for me
is the lack of ability to plot concurrent time series of different
variables with panels stacked on top of each other. What I mean:

library(reshape2)
## simulate a model study
airquality <- transform(airquality, Ozone.model = Ozone +
rnorm(length(Ozone), 2),
Wind.model = Wind + rnorm(length(Wind)))
## Add time, see ?airquality
airquality$dtm <- as.Date(paste("1973", airquality$Month, airquality$Day,
sep = "-"), format="%Y-%m-%d", tz="EST")
d.f <- melt(airquality, id.vars = "dtm", measure.vars = c("Wind",
"Wind.model", "Ozone", "Ozone.model"))
d.f$type <- ifelse(grepl("model", d.f$variable), "Modeled", "Measured")
d.f$Variables <- sub("\\.model", "", d.f$variable)
## The problem
(box <- ggplot(d.f, aes(dtm, value, color=type)) +
geom_line() +
facet_grid(Variables~., scales="free_y"))

## A solution, please note that =ggsave= will no longer work
move_facet_strip_y <- function(object, sane.defaults = TRUE)
{
## From:
http://stackoverflow.com/questions/23497682/alter-facet-grid-output-in-ggplot2
if (sane.defaults)
{
object <- object +
theme(strip.text.y = element_text(angle=90)) +
labs(y = "")
}
g <- ggplotGrob(object)
## Find bordering elements
adjustment.index <- g$layout[, "r"] == unique(g$layout[g$layout$name
== "strip-right", "r"]) - 1
## Pull them over
g$layout[adjustment.index, "r"] <- g$layout[adjustment.index , "r"] + 1
## Push panels over
g$layout[g$layout$name == "strip-right", c("l", "r")] <- 2
## change colors to sane defaults
if (sane.defaults)
{
index <- which(unlist(lapply(sapply(g$grobs, "[[", "childrenOrder"),
function(x)
any(grepl(pattern="strip.text.y", x)))))
## loop over panels
for(i in index) {
place <- grep("strip.background.rect",
g$grobs[[i]]$childrenOrder)
g$grobs[[i]]$children[[place]][["gp"]][["col"]] <- "transparent"
g$grobs[[i]]$children[[place]][["gp"]][["fill"]] <-
"transparent"
}
}
class(g) <- c("crap", class(g))
g
}

grid.draw(move_facet_strip_y(box))


It is a brutish fix to allow plotting of time series stacked on top of each
other. It will work for more than just time series. Probably violates
a few principles
of the grammar of graphics and usually requires the violation of tidy data
principles. It helps me immensely.

I hope that helps somebody else too.

Cheers,
Brian


Dennis Murphy

unread,
Sep 22, 2015, 4:34:59 PM9/22/15
to Brian, ggplot2
Hi:

This situation is not common, but it's not exactly rare, either. In
situations like this, one typically leans on one or more of grid,
gtable and gridExtra to perform the necessary surgery. Your case
entails two adjustments:

(i) alignment of the plot regions so that the widths of each are the
same when stacked;
(ii) a single legend that should be positioned intermediate to the two plots.

The conventional way to go about this is to create a gtable object and
use that information to modify the graphics page. The approach below
plots ozone and wind separately, so starts with two melt() operations
that stack the observed and fitted responses. (You could also use
dplyr with tidyr::gather() as a replacement for melt().)

library(reshape2)
library(ggplot2)
library(grid)
library(gridExtra)

# Your initial data transformations - built-in data sets should always
# be left in their original state
airq <- transform(airquality, Ozone.model = Ozone +
rnorm(length(Ozone), 2),
Wind.model = Wind + rnorm(length(Wind)))
## Add time, see ?airquality
airq$dtm <- as.Date(paste("1973", airquality$Month, airquality$Day,
sep = "-"), format="%Y-%m-%d", tz="EST")

# Perform separate melt operations on ozone and wind
# and create a new variable type whose labels match the
# desired ones in the (common) legend
aq_ozone <- melt(airq, id.var = "dtm",
measure.var = c("Ozone", "Ozone.model"))
aq_ozone$type <- factor(aq_ozone$variable, labels = c("Measured", "Modeled"))

aq_wind <- melt(airq, id.var = "dtm",
measure.var = c("Wind", "Wind.model"))
aq_wind$type <- factor(aq_wind$variable, labels = c("Measured", "Modeled"))

######## Generate the two plots separately #########
# Get rid of the x-axis info in the top plot
A <- ggplot(aq_ozone, aes(x = dtm, y = value, color = type)) +
geom_line() + labs(x = NULL, y = "Ozone") +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank())

# Get a gtable of the grobs from A and extract the legend grob
gA <- ggplot_gtable(ggplot_build(A))

# Function to extract a legend from a gtable - adapted from
# http://stackoverflow.com/questions/12539348/ggplot-separate-legend-and-plot/12539820#12539820
g_legend <- function(gt){
leg <- which(sapply(gt$grobs, function(x) x$name) == "guide-box")
gt$grobs[[leg]]
}

# Grab the legend box as a grob
legend <- g_legend(gA)

# Redo A to get rid of the legend in the final plot
A <- A + theme(legend.position = "none")

# Remove the default top margin spacing in the bottom plot
B <- ggplot(aq_wind, aes(x = dtm, y = value, color = type)) +
geom_line() + labs(y = "Wind") +
theme(plot.margin = unit(c(0, 1, 0.5, 0.5), "lines"))

#############
# Generate gtables in order to align the plot regions
gA <- ggplot_gtable(ggplot_build(A))
gB <- ggplot_gtable(ggplot_build(B))



########## Align the widths of the two plot regions ########
# Determine the maximum width of A and B
maxWidth = grid::unit.pmax(gA$widths[2:3], gB$widths[2:3])
# Reassign widths to maxWidth in each of gA and gB as lists
gA$widths[2:3] <- as.list(maxWidth)
gB$widths[2:3] <- as.list(maxWidth)

#################
# Generate a layout matrix for positioning the grobs
# Four 1's and 2's means that we want 80% of the width in
# each row to be devoted to the plots
lay <- rbind(c(1, 1, 1, 1, 3), c(2, 2, 2, 2, 3))

## use the grid.arrange() function from gridExtra
## to compose the final graphic
grid.arrange(gA, gB, legend, layout_matrix = lay)

Normally you can add to a ggplot in grid.arrange(), but in this case
we modified the gtables of A and B to align the horizontal plot
widths, so we need to use the gtables as the arguments in
grid.arrange() in this example. This is why I edited A after its
legend grob was extracted and redid its gtable. (If there's a more
efficient method, I'm happy to hear about it.)

To condense the (admittedly verbose) code, one could write a function
that takes the input data and response variable of interest as
arguments, along with a flag to keep or remove the legend, such that
it returns the result of ggplot_gtable(ggplot_build(plot)) as its
output, using the intermediate steps as its body.

Dennis

Brian

unread,
Sep 25, 2015, 3:19:14 PM9/25/15
to Dennis Murphy, ggplot2
Hello Dennis,

thank you for the example. The typical approach is by far the most
useful when combining different elements and being able to tweak every
little element. I have also seen other examples (which obviously left
me wanting more), and found the following the most informative (yay for
Rpubs):
https://rpubs.com/MarkusLoew/13295
The rbind method at the end is especially promising. The cowplot
package is also pretty handy. I especially enjoy its aligning
functionality.

As you have admitted, there is a lot of handwork/surgery involved with
the typical approach. A different function would be required for say 3
rows (say if your boss wants you to "have a look") and legend placement
also has to be done manually. This is work, and requires a lot of
detailed knowledge of the inner workings of grid/gtable. You also get
away from the beauty and simplicity of ggplot, where things just work.
You return to the typical R fiddling - you break the promise of ggplot.

I can't praise enough the consistent, logical, sufficiently simple
interface. Proof of it's elegance: python adopted it.

For a user, the first thing they learn when they learn ggplot is how to
organize their data in order to plot. Therefore, most ggplot users have
already developed their data wrangling skills. Getting data ready in my
proposed approach is a simple extension, in my opinion. I may be wrong.

Of course, it is a brutish hack. Also, we now have a different, less
intuitive interface for determining the y axis labels and the x-facet no
longer coincides with the ggplot philosophy, etc.

However, my extension builds on skills I already have and use daily and
thereby allows me to be lazy, and therefore I like it. Maybe somebody
shares my view and I've been able to help them. One day I will put my
useful stuff (more brutish hacks) on github, but not today.

Cheers,
Brian
Reply all
Reply to author
Forward
0 new messages