How to plot months of cumulative data grouped by year in ggplot2 with geom_bar?


Charles Snyder

Apr 10, 2012, 7:45:35 PM
to ggp...@googlegroups.com
Hi

I have a df that looks like this:

  month year case.dump.rvu        date
1   Jan 2004        460.30 2004-Jan-01
2   Feb 2004        435.01 2004-Feb-01
3   Mar 2004        402.78 2004-Mar-01
4   Apr 2004        371.72 2004-Apr-01
5   May 2004        483.64 2004-May-01
6   Jun 2004        469.04 2004-Jun-01


With this script, I get the first graph (G1):

a <- ggplot(aggdata, aes(x = month, y = case.dump.rvu,
                         fill = year)) + facet_wrap(~year) + opts(title = "My Data") +
    labs(x = NULL, y = "RVU")
b <- a + geom_bar(position="dodge")
print(b)


What I want to get is the second graph, where all the months are grouped by year (G2)

Thanks!

CS
G1.png
G2.png

Ista Zahn

Apr 11, 2012, 6:19:51 AM
to Charles Snyder, ggp...@googlegroups.com
Hi Charles,

If you don't want facets, don't use them! Try this (untested):

a <- ggplot(aggdata, aes(x = month, y = case.dump.rvu,
                         fill = year, group = year))
b <- a + geom_bar(position="dodge")
print(b)

HTH,
Ista

> --
> You received this message because you are subscribed to the ggplot2 mailing list.
> Please provide a reproducible example: http://gist.github.com/270442
>
> To post: email ggp...@googlegroups.com
> To unsubscribe: email ggplot2+u...@googlegroups.com
> More options: http://groups.google.com/group/ggplot2

Brandon Hurr

Apr 11, 2012, 6:27:42 AM
to Ista Zahn, Charles Snyder, ggp...@googlegroups.com
I was going to suggest something similar. 

Looks like the desired graph plots the year on the x-axis and maps color/fill to month. You'll have to set the levels of month explicitly as a factor so they aren't sorted alphabetically...

a <- ggplot(aggdata, aes(x = year, y = case.dump.rvu,
                      fill = month))

b <- a + geom_bar(position="dodge")

b

Also untested in the absence of a dataset...
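
To illustrate the point about level order, here is a minimal sketch in base R. The vector `m` is a made-up example, not the poster's data; `month.abb` is a constant that ships with R.

```r
# Hypothetical example vector (not the thread's data)
m <- c("Jan", "Feb", "Mar", "Apr")

# By default, factor() sorts the levels alphabetically
alpha_levels <- levels(factor(m))
print(alpha_levels)

# Supplying levels explicitly keeps calendar order;
# month.abb is R's built-in vector "Jan", "Feb", ..., "Dec"
cal_levels <- levels(factor(m, levels = month.abb))
print(cal_levels)
```

ggplot2 lays out a discrete axis or legend in level order, so the second form is the one that gives Jan, Feb, Mar, ... in a plot.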

Charles Snyder

Apr 11, 2012, 10:54:45 AM
to ggp...@googlegroups.com
Thanks

I attached my data in csv form :)

I had tried these two methods before - the results for the first were:

> library("ggplot2")

> a <- ggplot(aggdata, aes(x = month, y = case.dump.rvu,
+                          fill = year, group = year))

> b <- a + geom_bar(position="dodge")
> print(b)

Error in data.frame(list(count = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  :
  arguments imply differing number of rows: 100, 5636


And for the second:

> a <- ggplot(aggdata, aes(x = year, y = case.dump.rvu,
+                          fill = month))

> b <- a + geom_bar(position="dodge")
> b

stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
Error in data.frame(list(count = c(0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0,  :
  arguments imply differing number of rows: 396, 5636


Thanks again

CLS
aggdata.csv

Brandon Hurr

Apr 11, 2012, 11:50:46 AM
to Charles Snyder, ggp...@googlegroups.com
Ok, 

I added dummy data (0's) for the months that are missing from the last year, because ggplot2 makes the remaining bars wider when some groups are missing.

aggdata<-structure(list(month = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 
8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 
11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 
4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 
7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 
12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L), .Label = c("Jan", 
"Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", 
"Nov", "Dec"), class = "factor"), year = c(2004L, 2004L, 2004L, 
2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 
2005L, 2005L, 2005L, 2005L, 2005L, 2005L, 2005L, 2005L, 2005L, 
2005L, 2005L, 2005L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 
2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2007L, 2007L, 2007L, 
2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 
2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 
2008L, 2008L, 2008L, 2009L, 2009L, 2009L, 2009L, 2009L, 2009L, 
2009L, 2009L, 2009L, 2009L, 2009L, 2009L, 2010L, 2010L, 2010L, 
2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 2010L, 
2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 
2011L, 2011L, 2011L, 2012L, 2012L, 2012L, 2012L, 2012L, 2012L, 
2012L, 2012L, 2012L, 2012L, 2012L, 2012L), case.dump.rvu = c(460.3, 
435.01, 402.78, 371.72, 483.64, 469.04, 428.93, 449.86, 619.69, 
285.31, 574.25, 245.74, 650.58, 385.62, 331.16, 400.58, 461.22, 
461.97, 671.1, 449.54, 619.95, 437.13, 616.17, 259.86, 541.23, 
453.86, 346.81, 430.38, 438.44, 499.28, 345.85, 472.62, 364.67, 
382.33, 493.49, 268.54, 296.95, 454.39, 257.02, 525.51, 475.41, 
627.4, 258.02, 571.2, 423.1, 555.2, 524.05, 276.78, 363.61, 572.8, 
191.96, 532.24, 398.12, 704.61, 328.82, 383.13, 481.45, 329.92, 
452.8, 304.26, 511.83, 390.43, 235.05, 600.2, 343.64, 397.48, 
533.77, 192.98, 356.55, 263.49, 451.3, 240.6, 348.74, 356.1, 
247.56, 474.72, 338.93, 581.55, 538.93, 366.54, 457.74, 358.39, 
492.69, 243.22, 431.73, 275.08, 328.09, 485.6, 436.48, 534.77, 
545.05, 330.25, 505.22, 199.86, 264.07, 384.71, 432.51, 309.01, 
225.48, 25.86, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("month", 
"year", "case.dump.rvu"), row.names = c(NA, -108L), class = "data.frame")

#I then reordered the factor levels of month so they are in the right order for plotting

aggdata$month<-factor(aggdata$month, levels=c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))

#Then plot

ggplot(aggdata, aes(x = year, y = case.dump.rvu, fill = month))+geom_bar(stat="identity", position="dodge")

#GRAPH GET!


IsthisIt.png
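
For readers on a recent ggplot2, the same idea can probably be sketched without padding the data with zeros. This is only a sketch under assumptions: it relies on `geom_col()` (equivalent to `geom_bar(stat = "identity")`) and on `position_dodge(preserve = "single")`, both of which exist in current ggplot2 releases but postdate this thread. The data frame `toy` and its column `rvu` are hypothetical, using a handful of values from the data above. The errors reported earlier presumably came from `geom_bar()` counting rows by default instead of plotting the supplied y values.

```r
library(ggplot2)

# Hypothetical toy data: the last year has only two months
toy <- data.frame(
  month = factor(c("Jan", "Feb", "Mar", "Jan", "Feb"), levels = month.abb),
  year  = c(2011, 2011, 2011, 2012, 2012),
  rvu   = c(431.73, 275.08, 328.09, 432.51, 309.01)
)

# geom_col() plots the y values as given (no counting);
# preserve = "single" keeps every bar the same width even when
# a year has fewer months than the others
p <- ggplot(toy, aes(x = factor(year), y = rvu, fill = month)) +
  geom_col(position = position_dodge(preserve = "single")) +
  labs(x = NULL, y = "RVU", fill = "Month")
print(p)
```

Wrapping `year` in `factor()` makes the x-axis discrete, so each year gets its own cluster of bars.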

Dennis Murphy

Apr 11, 2012, 3:36:47 PM
to Charles Snyder, ggp...@googlegroups.com
Hi:

Thanks for providing a reproducible example.

I have to admit I'm not a big fan of using bar charts to represent
time series data, especially when you want to dodge year within month,
since it destroys the flow of the series. More generally, there is
nothing sacrosanct about using zero as an origin when plotting time
series data, but you're forced into it when using bar charts. Another
criticism is that the bars dominate the display and distract one away
from the real purpose, which is to observe how the series evolves in
time and to elicit useful patterns, such as trend or periodicity.

Here are several alternatives...the first one is obvious, but since
you seem to want to compare months as the series evolves, there exist
other ways to visualize it without the distraction of dodged bars.

library('ggplot2')
library('plyr')
library('scales')

aggdata <- read.csv('aggdata.csv', header = TRUE, stringsAsFactors = FALSE)

# Convert month to ordered factor, date to Date format, and define mon
# as a numeric count of month from an origin of Jan. 2004 (to be used
# in the spiral plot below). Uses the mutate() function in plyr.
aggdata <- mutate(aggdata, month = factor(month, levels = month.abb),
                  date = as.Date(date, format = '%Y-%b-%d'),
                  mon = X - 1)

# Time series plot:
ggplot(aggdata, aes(x = date, y = case.dump.rvu)) + geom_line() +
    scale_x_date(breaks = date_breaks('6 months'),
                 labels = date_format('%b\n%Y'))

# The time series plot indicates that the series has a mean level that
# gradually declines over time. You can see this with the following,
# but the drop at the end is, in part, an artifact of including the
# incomplete April 2012 data.

last_plot() + geom_smooth()

# An STL plot might be useful (seasonal, trend, residual) as well;
# see ?stl in the stats package.
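
A minimal sketch of the STL idea mentioned above, using only base R. The series here is synthetic (an assumption, generated with a made-up trend and seasonal pattern), not the data from this thread; with the real data one would build the `ts` object from `case.dump.rvu` instead.

```r
# Synthetic monthly series, Jan 2004 to Dec 2011 (NOT the thread's data)
set.seed(1)
n   <- 96
idx <- seq_len(n)
y   <- 450 - 0.5 * idx + 60 * sin(2 * pi * idx / 12) + rnorm(n, sd = 40)

# stl() needs a ts object with a known frequency (12 = monthly)
rvu_ts <- ts(y, start = c(2004, 1), frequency = 12)

# Decompose into seasonal, trend and remainder components
fit <- stl(rvu_ts, s.window = "periodic")
print(head(fit$time.series))
plot(fit)
```

`s.window = "periodic"` assumes the seasonal pattern is the same every year; a numeric window would let it drift over time.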

# Another 'obvious' plot is a spaghetti plot of response by month
# with year as the 'subject':

ggplot(aggdata, aes(x = month, y = case.dump.rvu, group = year)) +
    geom_line(aes(colour = factor(year))) + labs(colour = 'Year')


# If you must plot y vs. year by month, be explicit about it
# and facet by month - that way you can isolate the behavior of the
# series by year for a fixed month. I used a bar chart below, but
# geom_line() or geom_path() could also be used. Faceting breaks
# up the series and makes it obvious that the continuity in time is lost.

# Annual plot by month:
ggplot(aggdata, aes(x = factor(year), y = case.dump.rvu)) +
    geom_bar(stat = 'identity', fill = 'violet') +
    facet_wrap(~ month, nrow = 3) +
    theme(axis.text.x = element_text(angle = 90))

# A better way to compare year/month combinations IMO is to produce a
# calendar heatmap, where color is mapped to the value of the response.
# The downside is that it rounds off the response, but it does provide a way
# to make gross comparisons across month-year combinations:

# Calendar heatmap:
ggplot(aggdata, aes(x = month, y = factor(year), fill = case.dump.rvu)) +
    geom_tile() + scale_fill_gradient(low = 'navy', high = 'yellow') +
    guides(fill = 'colorbar')

# A variant on the above idea is to convert the above plot to polar
# coordinates, in which case it is called a spiral plot. There is no
# geom in ggplot2 which will do this directly, but the function below
# was submitted to the list a couple of months ago. It is not yet
# ready for general use, but it works well in this specific case.

# Spiral plot (Jean-Olivier Irisson, ggplot2 list, Feb. 2012)

ggspiral <- function(x, y, p) {

    # remove period from time coordinate
    xx <- x %% p
    dt <- x[2] - x[1]
    xx2 <- xx + dt

    # compute number of periods elapsed
    yy <- x / p
    yy2 <- yy + 1

    # prepare data
    d <- data.frame(xx, xx2, yy, yy2, y)
    yyMax <- max(yy2)

    require("ggplot2")
    ggplot(d) +
        # plot tiles of the appropriate colour
        geom_rect(aes(xmin=xx, xmax=xx2, ymin=yy, ymax=yy2, fill=y)) +
        # switch to polar coord, starting from -pi/2, going anticlockwise
        coord_polar(start=-pi/2, direction=-1) +
        # add extra blank space in the center of the spiral
        scale_y_continuous(expand=c(0,0), breaks=NULL,
                           limits=c(-yyMax/5, yyMax)) +
        # force the x coordinate (otherwise, NA appear for some reason)
        scale_x_continuous(expand=c(0,0), limits=c(0,p))

}

with(aggdata, ggspiral(mon, case.dump.rvu, 12))
last_plot() + scale_fill_gradient(low = 'navy', high = 'yellow') +
    guides(fill = 'colorbar')

The spiral plot winds counterclockwise from the inside out. January is
the segment above the 0/12 mark and December is the one below it.
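
To see why the plot winds outward, it may help to look at how the function maps the month counter onto the two coordinates. The sketch below repeats only the arithmetic from ggspiral() in base R; the variable names `angle_pos` and `radius` and the sample values of `mon` are illustrative assumptions.

```r
p   <- 12                    # period: 12 months per turn
mon <- c(0, 1, 11, 12, 13)   # months since Jan 2004 (mon = 0 is Jan 2004)

angle_pos <- mon %% p        # position within the year (xx in ggspiral)
radius    <- mon / p         # years elapsed, fractional (yy in ggspiral)

# mon = 12 (Jan 2005) returns to the same angle as mon = 0 (Jan 2004),
# but sits one full unit further out
spiral <- data.frame(mon, month = month.abb[angle_pos + 1], angle_pos, radius)
print(spiral)
```

Because `radius` grows by 1/12 with every month instead of jumping once a year, consecutive tiles step outward gradually, which is what produces a spiral as opposed to concentric rings.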

I would suggest the spiral plot or the calendar heatmap to look for
monthly periodicity or other patterns of response in the series, as
they are better organized and less cluttered than bar charts.

HTH,
Dennis

charles snyder

Apr 11, 2012, 6:20:42 PM
to Dennis Murphy, ggp...@googlegroups.com
Thanks very much to all !!!
--
Charles L. Snyder, MD
Professor of Surgery
Children's Mercy Hospital
Kansas City, MO
www.clsnyder.com