Plotting Data on a Map

1,013 views
Skip to first unread message

Tom Hopper

unread,
Jun 17, 2010, 10:56:19 AM6/17/10
to ggp...@googlegroups.com
I am attempting, with little success, to plot data on a map using ggplot2, but I am making no headway.

What I would like is to create choropleth maps showing electricity production by country, similar to the ones in the ggplot2 book (pp. 79, fig 5.13) or the map at
http://www.steamtablesonline.com/electricity/electricity-installed-capacity.aspx.
Ultimately, I will be plotting with slightly different data, such as calculating and plotting production per capita, renewables production per capita, and five-year change in the fraction of renewables per capita.

I would like to plot a map with colors defined by the net electricity generating data available at
http://tonto.eia.doe.gov/cfapps/ipdbproject/iedindex3.cfm?tid=2&pid=2&aid=12&cid=&syid=2004&eyid=2008&unit=BKWH.
I have converted this data into a U.S.-format csv file ("." for decimal separators; "," for field separators), available at
http://www.box.net/tomhopper/1/22918943/452126656.

I started out attempting to follow the example in the ggplot2 book, which uses the maps package, but the world map in that package is badly out of date and won't work with many of the countries in the EIA data.

I can get the EIA data into R, and I've found a couple of sources of updated map data (thematic mapping, http://thematicmapping.org/downloads/world_borders.php, and gadm, http://gadm.org/world), and I was able to get the thematicmapping data into R (still downloading the gadm data).

To get the thematicmapping data into R, I used:
> library("rgdal")
> world.map <- readOGR(dsn="C:/", layer="TM_WORLD_BORDERS-0.3")

I can plot this with

> plot(world.map)

For those who do not have easy access to rgdal, I have uploaded a copy of the world.map data to
http://www.box.net/tomhopper/1/22918943/452150764

You should be able to plot it with
> load("worldmap.rdata")
> library(sp)
> plot(world.map)

Having loaded the new map data in R, I have not been able to figure out how to plot it in ggplot as a simple map showing country borders, or how to combine the map data and the EIA data to create the choropleth map. I am sure that part of my problem is that I do not understand the structure of the data coming in from thematicmapping or gadm, or what data structure ggplot2 expects.

Any guidance will be very much appreciated.

Thank you,

Tom Hopper

fernando

unread,
Jun 17, 2010, 8:10:03 PM6/17/10
to Tom Hopper, ggp...@googlegroups.com

Hi Tom,

 

I think fortify can help you in translating the sp object to a data.frame. Then you can use merge to join the two tables. The code bellow illustrates this, although I think there are some problems in matching country names. Another issue is that if you add coord_map(), something strange happens with Antarctica (maybe a problem in shapefile order?).

 

Hope this helps,

fernando

 

library(maptools)

gpclibPermit()

library(ggplot2)

 

## setwd()

load("worldmap.rdata")

world <- fortify(world.map, region = "NAME")

elect <- read.csv("Total_Electricity_Net_Generation_(Billion_Kilowatthours).csv",

                sep = ",", dec = ".")

names(elect) <- c("id", "y2004", "y2005", "y2006", "y2007", "y2008")

world <- merge(world, elect, by = "id", all.x = TRUE)

world <- world[order(world$order), ]

 

# borders

ggplot(data = world, aes(x = long, y = lat, group = group)) +

                geom_path()

 

# choropleth

ggplot(data = world, aes(x = long, y = lat, group = group)) +

                geom_polygon(aes(fill = y2004))

 

 

## sessionInfo()

# R version 2.11.1 (2010-05-31)

# x86_64-pc-mingw32

#

# locale:

# [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252  

# [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C                 

# [5] LC_TIME=Spanish_Spain.1252   

#

# attached base packages:

# [1] grid      stats     graphics  grDevices utils     datasets  methods 

# [8] base    

#

# other attached packages:

# [1] ggplot2_0.8.7   digest_0.4.2    reshape_0.8.3   plyr_0.1.9    

# [5] proto_0.3-8     maptools_0.7-34 lattice_0.18-8  sp_0.9-64     

# [9] foreign_0.8-40

--
You received this message because you are subscribed to the ggplot2 mailing list.
Please provide a reproducible example: http://gist.github.com/270442
 
To post: email ggp...@googlegroups.com
To unsubscribe: email ggplot2+u...@googlegroups.com
More options: http://groups.google.com/group/ggplot2

Tom Hopper

unread,
Jun 18, 2010, 5:39:04 AM6/18/10
to fernando, ggp...@googlegroups.com
Fernando,

That worked perfectly; thank you!

There were some mismatches in the country names, as you noted, but after cleaning up the electricity generation data everything looks great. I've updated the electricity generation data with the cleaned version and posted a graph to box.net.
the graph: http://www.box.net/tomhopper/1/22918943/452739712

Below
, for reference, is the complete R code.

Thank you, and best regards,

Tom

# Download electricity generation data from http://tonto.eia.doe.gov/cfapps/ipdbproject/iedindex3.cfm?tid=2&pid=2&aid=12&cid=&syid=2004&eyid=2008&unit=BKWH
# Download new map data from http://thematicmapping.org/downloads/world_borders.php

# Bring the thematicmapping data into R
library(maptools)
gpclibPermit()
library("rgdal")
world.map <- readOGR(dsn="C:/", layer="TM_WORLD_BORDERS-0.3")

# Save the map data as an R object for later use
save(world.map, "worldmap.rdata")

# As needed, reload the data
# load("worldmap.rdata")

# Prepare the world.map data for ggplot2
library(ggplot2)
world.ggmap <- fortify(world.map, region = "NAME")

# Load the electricity generation data and clean it up to match with world.ggmape
elect.gen.tot <- read.csv("Total_Electricity_Net_Generation_(Billion_Kilowatthours).csv",

     sep = ",", dec = ".")
names(elect) <- c("id", "y2004", "y2005", "y2006", "y2007", "y2008")
elect.gen.tot$id <- tolower(elect.gen.tot$id)

# merge the two data sets
world.ggmape <- merge(world.ggmap, elect.gen.tot, by = "id", all = TRUE)
world.ggmape <- world.ggmape[order(world.ggmape$order), ]

# NOTE: at this point, the electricity data country names may not match 100%
# with the thematicmapping country names (column "id").
# print out the country names using
# table(world.ggmape$id)
# and do a search for values of 1. Then find the corresponding country
# name with values > 1 and rename the country names in the electricity
# generation data to match (e.g. "Tanzania" becomes "United Republic of
# Tanzania").
# Once this data cleaning operation is complete, repeat the above steps
# starting with reading data into elect.gen.tot.

# Plot the data using ggplot2
world.plot <- ggplot(data = world, aes(x = long, y = lat, group = group))
world.plot + geom_polygon(aes(fill = y2007)) +
     labs(x = "Longitude", y = "Latitude", fill = "TWh") +
     opts(title = "Net Electricity Generation in TWh for 2007\nData from EIA International Energy Statistics, 2008")

Tom Hopper

unread,
Jun 19, 2010, 1:59:46 PM6/19/10
to ggp...@googlegroups.com
Well, it turns out that, in my haste to thank Fernando, I posted code with some typos. I have also had a chance to test it on my Mac, and found that fortify() was throwing an error ("Error in nchar(ID) : invalid multibyte string 1"). I have added a call, currently commented out, to Sys.setlocale() to fix this. I have tested the code below under R 2.11.1 on both Windows XP and Mac OS X 10.5.8, with the latest packages installed. In fact, the plot looks better in the Mac Quartz window.

Sorry for the previous errors.

# Download electricity generation data from http://tonto.eia.doe.gov/cfapps/ipdbproject/iedindex3.cfm?tid=2&pid=2&aid=12&cid=&syid=2004&eyid=2008&unit=BKWH

# Download new map data from http://thematicmapping.org/downloads/world_borders.php


# Bring the thematicmapping data into R

library("rgdal")

world.map <- readOGR(dsn="C:/", layer="TM_WORLD_BORDERS-0.3")


# Save the map data as an R object

save(world.map, "worldmap.rdata")


# As needed, reload the data

library(maptools)

gpclibPermit()

load("worldmap.rdata")


# Prepare the world.map data for ggplot2

library(ggplot2)

# On some setups, fortify throws "Error in nchar(ID)"

# in that case, use setlocale

# Sys.setlocale("LC_ALL", locale = "C")

world.ggmap <- fortify(world.map, region = "NAME")


# Load the electricity generation data and clean it up to match with world.ggmape

elect.gen.tot <- read.csv("Total_Electricity_Net_Generation_(Billion_Kilowatthours).csv", sep = ",", dec = ".")

names(elect.gen.tot) <- c("id", "y2004", "y2005", "y2006", "y2007", "y2008")


# make sure the id column is in the same case for both sets

elect.gen.tot$id <- tolower(elect.gen.tot$id)

world.ggmap$id <- tolower(world.ggmap$id)


# merge the two data sets

world.ggmape <- merge(world.ggmap, elect.gen.tot, by = "id", all = TRUE)

world.ggmape <- world.ggmape[order(world.ggmape$order), ] 


# NOTE: at this point, the electricity data country names do not match 100%

# with the thematicmapping country names (column "id").

# print out the country names using

# table(world.ggmape$id)

# and do a search for values of 1. Then find the corresponding country

# name with values > 1 and rename the country names in the electricity

# generation data to match (e.g. "Tanzania" becomes "united republic of

# tanzania").

# Once this data cleaning operation is complete, repeat the above steps

# starting with reading data into elect.gen.tot.


# Plot the data using ggplot2

world.plot <- ggplot(data = world.ggmape, aes(x = long, y = lat, group = group))

world.plot + geom_polygon(aes(fill = y2007)) + labs(x = "Longitude", y = "Latitude", fill = "TWh") + opts(title = "Net Electricity Generation in TWh for 2007\nData from EIA International Energy Statistics, 2008")


Hadley Wickham

unread,
Jun 20, 2010, 9:00:42 PM6/20/10
to Tom Hopper, ggp...@googlegroups.com
Hi Tom,

Thanks for contribution! Ploting map data is never easy and its always
nice to see a success story.

Hadley

> --
> You received this message because you are subscribed to the ggplot2 mailing list.
> Please provide a reproducible example: http://gist.github.com/270442
>
> To post: email ggp...@googlegroups.com
> To unsubscribe: email ggplot2+u...@googlegroups.com
> More options: http://groups.google.com/group/ggplot2
>

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

Tom Hopper

unread,
Jun 21, 2010, 5:27:14 PM6/21/10
to ggp...@googlegroups.com
Hadley,

Thank you!

I doing this, I've encountered an encountered and unexpected difference between the  graphs on my Mac and my Windows machines. It appears that there is a default setting different between the two programs.

Using the same commands on both Mac and Windows, I found that the polygon borders are plotted on the Mac, but not on Windows, so on the Mac I see the country borders, but on Windows I do not. On the Mac, it looks like the default for geom_polygon is something like size = 0.01, colour = "gray50". On Windows, it's more like colour = alpha("gray50", 0), though in fact the visibility of polygon borders seems to be independent of both the size and colour settings.

Regards,

Tom

Hadley Wickham

unread,
Jun 22, 2010, 5:52:03 PM6/22/10
to Tom Hopper, ggp...@googlegroups.com
> Using the same commands on both Mac and Windows, I found that the polygon
> borders are plotted on the Mac, but not on Windows, so on the Mac I see the
> country borders, but on Windows I do not. On the Mac, it looks like the
> default for geom_polygon is something like size = 0.01, colour = "gray50".
> On Windows, it's more like colour = alpha("gray50", 0), though in fact the
> visibility of polygon borders seems to be independent of both the size and
> colour settings.

It's likely to be some combination of the different graphics device
code in R and the different viewer you're using to look at the file.
Unfortunately it's all well outside the control of ggplot2.

Hadley

Felipe Carrillo

unread,
Jun 23, 2010, 4:52:29 PM6/23/10
to Tom Hopper, r-h...@stat.math.ethz.ch, ggp...@googlegroups.com
Hi:
I am practicing with the attached shapefile and was wondering
if I can get some help. Haven't used 'rgdal' and 'maptools' much
but it appears to be a great way bring map data into R.
Please take a look at the comments and let me know if I need to
explain better what I am trying to accomplish.

library(rgdal)
library(maptools)
library(ggplot2)
dsn="C:/Documents and Settings/fcarrillo/Desktop/Software/R Scripts
and Datasets/NCTC/Data Analisys II/Data 4 Data Analysis II March 2009-
March2010/Data"
wolves.map <- readOGR(dsn=dsn, layer="PNW_wolf_habitat_grid")
class(wolves.map)
dim(wolves.map)
head(wolves.map,1)
names(wolves.map)
gpclibPermit()
wolves.ds <- fortify(wolves.map)
head(wolves.ds);tail(wolves.ds)
# Shapefile of 5 states
wolves.plot <- ggplot(wolves.ds,aes(long,lat,group=group)) +
geom_polygon(colour='white',fill='lightblue')
wolves.plot
# How to show the state borders of Washington, Oregon, Idaho, Montana
and Wyoming on map?

# Subset from wolves to create a logistic regression model based on
WOLVES_99 and then apply to all the 5 states
# Remove the WOLVES_99 2(two value) and use "one" for presence and
"zero" for absence to end up with 153 records.
#wolfsub<-subset(wolves,subset=!(WOLVES_99==2))
#wolfsub <- subset(wolves.map,WOLVES_99!=2)
wolfsub <- wolves.map[!wolves.map$WOLVES_99 %in% 2,];wolfsub
dim(wolfsub)
# 42 = Forest, 51 = Shrub, > 81 = Agriculture
wolfsub$Forest<-ifelse(wolfsub$MAJOR_LC==42,1,0)
wolfsub$Shrub<-ifelse(wolfsub$MAJOR_LC==51,1,0)
wolfsub$Agriculture<-ifelse(wolfsub$MAJOR_LC>81,1,0)
names(wolfsub);dim(wolfsub)
# create the model
mod1<-glm(WOLVES_99~RD_DENSITY+Forest+Shrub
+Agriculture,family=binomial,data=wolfsub)
summary(mod1)
wolfsub$pred99<-fitted(mod1)
names(wolfsub)
#fitted(mod1)
wolfsub$pred99

# Add the wolfsub data to the map to see the map
wolfsub <- fortify(wolfsub);names(wolfsub)
area_mod <- wolves.plot + geom_polygon(data=wolfsub,aes(fill=????))
#Want to fill by WOLVES_99 but is gone #after fortify
area_mod
#Not sure how to proceed from here to fit the 'mod1' model to all
#the 5 states.

 


From: Tom Hopper <tomh...@gmail.com>
To: Felipe Carrillo <mazatla...@yahoo.com>
Sent: Tue, June 22, 2010 9:40:19 PM
Subject: Re: Plotting Data on a Map

Felipe,

I am just learning these tools, too, so it may be a good learning opportunity for both of us. Please send me the files and I will try to put it together and plot it.

Regards,

Tom

On Mon, Jun 21, 2010 at 11:57 PM, Felipe Carrillo <mazatla...@yahoo.com> wrote:
Hi Tom:
I am just starting to use rgdal and maptools but I have a long way to go. I went to a training
a couple of weeks ago and the instructor showed us a csv file and a shapefile with wolf data from
a national park in the midwest. I am trying to put all of the csv data and some predicted data
on a map using ggplot2 but I am stuck with it. I am just trying to do this example because I want to
see if I can apply this example to fish. Let me know if interested.
 
 
Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA

Felipe Carrillo

unread,
Jun 23, 2010, 4:57:35 PM6/23/10
to Felipe Carrillo, Tom Hopper, r-h...@stat.math.ethz.ch, ggp...@googlegroups.com
For some reason the shapefile can't get attached.
The shapefile is too large to put it in dput..Is there
another way to do this?

> Mon, Jun 21, 2010 at 11:57 PM, Felipe Carrillo <> ymailto="mailto:mazatla...@yahoo.com"
> href="mailto:mazatla...@yahoo.com">mazatla...@yahoo.com>

> Mon, Jun 21, 2010 at 3:00 AM, Hadley Wickham <> ymailto="mailto:had...@rice.edu"
> href="mailto:had...@rice.edu">had...@rice.edu>

> wrote:
>>>
>>>Hi
> Tom,
>>>>
>>>>Thanks for contribution! Ploting map
> data is never easy and its always
>>>>nice to see a success
> story.
>>>>
>>>>Hadley
>>>>
>>>>
>>>>On

> Saturday, June 19, 2010, Tom Hopper <> href="mailto:tomh...@gmail.com">tomh...@gmail.com>

> On Fri, Jun 18, 2010 at 11:39 AM, Tom Hopper <> ymailto="mailto:tomh...@gmail.com"
> href="mailto:tomh...@gmail.com">tomh...@gmail.com>

> wrote:
>>>>>
> Fernando,
>>>>>
>>>>> That worked perfectly;
> thank you!
>>>>>
>>>>> There were some
> mismatches in the country names, as you noted, but after cleaning up the
> electricity generation data everything looks great. I've updated the electricity

> generation data with the cleaned version and posted a graph to > target="_blank" href="http://box.net">box.net.


>>>>> the
> graph:
> http://www.box.net/tomhopper/1/22918943/452739712
>>>>>
>>>>>
> Below, for reference, is the complete R
> code.
>>>>>
>>>>> Thank you, and best
> regards,
>>>>>
>>>>>
> Tom
>>>>>
>>>>> # Download electricity

> generation data from > href="http://tonto.eia.doe.gov/cfapps/ipdbproject/iedindex3.cfm?tid=2&pid=2&aid=12&cid=&syid=2004&eyid=2008&unit=BKWH"
> target=_blank
> >http://tonto.eia.doe.gov/cfapps/ipdbproject/iedindex3.cfm?tid=2&pid=2&aid=12&cid=&syid=2004&eyid=2008&unit=BKWH
>>>>>
> # Download new map data from > href="http://thematicmapping.org/downloads/world_borders.php" target=_blank

> On Fri, Jun 18, 2010 at 2:10 AM, fernando <> ymailto="mailto:fgta...@gmail.com"
> href="mailto:fgta...@gmail.com">fgta...@gmail.com>

> wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
> Hi
> Tom,
>>>>>
>>>>>
>>>>>
>>>>>
> I think fortify can help you in translating the sp object to
> a
>>>>> data.frame. Then you can use merge to join the two
> tables. The code bellow
>>>>> illustrates this, although I
> think there are some problems in matching country
>>>>> names.
> Another issue is that if you add coord_map(), something strange
> happens
>>>>> with Antarctica (maybe a problem in shapefile
> order?).
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
> --
>>>>
>>>>> You received this message because
> you are subscribed to the ggplot2 mailing list.
>>>>> Please
> provide a reproducible example:
> http://gist.github.com/270442
>>>>>
>>>>> To

> post: email > href="mailto:ggp...@googlegroups.com">ggp...@googlegroups.com
>>>>>
> To unsubscribe: email ggplot2+> href="mailto:unsub...@googlegroups.com">unsub...@googlegroups.com


>>>>>
> More options:
> http://groups.google.com/group/ggplot2
>>>>>
>>>>
>>>>
>>>>--
>>>>Assistant
> Professor / Dobelman Family Junior Chair
>>>>Department of
> Statistics / Rice
> University
>>>>http://had.co.nz/
>>>>
>>>--
>
>>>
>>>You received this message because you are
> subscribed to the ggplot2 mailing list.
>>>Please provide a
> reproducible example: > >http://gist.github.com/270442
>>> 
>>>To post:

> email > href="mailto:ggp...@googlegroups.com">ggp...@googlegroups.com
>>>To
> unsubscribe: email ggplot2+> href="mailto:unsub...@googlegroups.com">unsub...@googlegroups.com
>>>More
> options: > >http://groups.google.com/group/ggplot2
>>>
>>
>


 
>    
    [[alternative HTML version
> deleted]]


Reply all
Reply to author
Forward
0 new messages