Aligning time series plots

Brian Diggs

Jul 8, 2010, 3:15:35 PM
to ggp...@googlegroups.com
I have a problem for which I imagine the answer is "you can't do that," but I'm hoping that I am wrong or that, at the least, this may provide a use case for some possible future feature.

I have data collected at regular intervals over time. However, the data are of different measures: two are continuous values (temperatures) (numeric), one is a count (number of sources) (numeric), and one is a categorical variable (status) (factor). An example data.frame with this data is:

ex <- structure(list(Time = structure(c(1278428400, 1278429300,
1278430200, 1278431100, 1278432000, 1278432900, 1278433800, 1278434700,
1278435600, 1278436500, 1278437400, 1278438300, 1278439200, 1278440100,
1278441000, 1278441900, 1278442800, 1278443700, 1278444600, 1278445500,
1278446400, 1278447300, 1278448200, 1278449100, 1278450000, 1278450900,
1278451800, 1278452700, 1278453600, 1278454500, 1278455400, 1278456300,
1278457200), class = c("POSIXt", "POSIXct")), `Temperature 1` = c(23.4994760481,
23.5691608609, 23.4065467209, 23.3366466476, 23.7551289027, 23.8713964903,
23.8017531186, 23.8713964903, 23.8017531186, 23.7319104094, 23.7086908787,
23.7086908787, 23.6390259428, 23.6390259428, 23.7319104094, 23.7086908787,
23.7086908787, 23.7783463702, 23.8713964903, 23.9874496028, 24.0572606946,
24.2428788024, 24.3126625639, 24.4750221758, 24.4054436873, 24.4518300234,
24.4286371978, 24.6375394346, 24.7768442676, 24.6375394346, 24.6839132299,
24.4982136668, 24.8695770905), `Temperature 2` = c(26.1917071192,
26.4004768163, 26.7251744961, 26.6092703221, 26.5627215105, 27.0035834406,
26.7251744961, 26.5627215105, 26.7019928016, 26.6324503199, 26.8412800193,
26.7251744961, 26.8876497084, 26.8180959444, 26.7715392541, 26.7715392541,
26.6788115483, 26.8180959444, 26.8644646035, 26.9803955621, 27.0965310426,
27.1197219834, 27.0501510521, 27.0267719079, 26.7947223403, 26.8644646035,
26.9572082609, 27.2124923147, 27.560865829, 27.514456368, 27.5376606738,
27.6536951709, 27.4446582586), sources = c(1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0), status = structure(c(2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("closed",
"open"), class = "factor")), .Names = c("Time", "Temperature 1",
"Temperature 2", "sources", "status"), row.names = c(NA, -33L
), class = "data.frame")


What I would like to do is make plots of this data with time on the horizontal axis. There would be multiple plots stacked one above the other, the top one showing the temperatures, and ones below that showing various influencing factors. The hard part is getting these with their time scales all aligned. Examples of things I've tried that don't work or work only to some degree:

ggplot(ex, aes(x=Time)) +
  geom_point(aes(y=`Temperature 1`)) +
  geom_point(aes(y=`Temperature 2`)) +
  geom_line(aes(y=sources)) +
  geom_line(aes(y=status))

The most straightforward approach, just graphing each part, doesn't work because status is a factor and cannot be put on the same scale as the continuous variables. [Error: Non-continuous variable supplied to scale_y_continuous.] I understand why this doesn't work.

ggplot(melt(ex, id.vars="Time"),
       aes(x=Time, y=value)) +
  geom_point() +
  facet_grid(variable~.,
             scales="free_y",
             space="free")

A typical approach to problems similar to this is to melt the data and facet on the different variables. I get a plot out of this, but it is not right: melt promotes the data to the type that can hold all the values, which, when combining numeric and factor, is character. The factor and the counts don't suffer much from this, but the temperature becomes a categorical variable and so is not plotted as a continuous variable. Again, I understand why this does not work.
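
The type promotion described here can be reproduced without melt. The following is a minimal base R sketch, not the melt implementation itself: the short vectors are made up for illustration, and c() stands in for the step where several columns are stacked into a single value column.

```r
# Made-up stand-ins for one temperature column and the status factor.
temperature <- c(23.5, 23.6, 23.4)
status <- factor(c("open", "open", "closed"))

# Stacking both into a single column forces a common type; the only type
# that can hold numbers and labels together is character.
value <- c(temperature, as.character(status))

class(value)       # "character"
is.numeric(value)  # FALSE, so a plotting scale would treat it as discrete
```

Once the temperatures are stored as character strings, every distinct reading becomes its own category, which is why the temperature panel no longer has a continuous axis.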

A third approach is to make each part of the graph I want separately and then try to stitch them together, following the example in 8.4.2 of the ggplot2 book.

temp.gg <-
  ggplot(melt(ex,
              id.vars="Time",
              measure.vars=c("Temperature 1","Temperature 2")),
         aes(x=Time, y=value)) +
  geom_point(aes(colour=variable))

source.gg <-
  ggplot(ex, aes(x=Time, y=sources)) + geom_step()

status.gg <-
  ggplot(ex, aes(x=Time, y=status)) + geom_point()


l <- grid.layout(nrow=3, ncol=1,
                 heights = unit(c(4,1,1), "in"),
                 widths = unit(8, "in"))

grid.newpage()
pushViewport(viewport(layout=l))
print(temp.gg,
      vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
print(source.gg,
      vp = viewport(layout.pos.row = 2, layout.pos.col = 1))
print(status.gg,
      vp = viewport(layout.pos.row = 3, layout.pos.col = 1))

The problem with this is that the time axes are not lined up with each other. And the problem is not just the legend in the topmost graph. If that were it, I'd follow an approach similar to the example in gridExtra::arrange to pull out the legend(s) and arrange them manually. But the size of the labels on the vertical scale impacts how much space the horizontal scale takes up. This approach would have promise if there were a way to extract out a measure of the actual size used by the horizontal axis, and then the various viewports could be arranged so that each axis took up the same physical space on the composite plot.
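
The idea in the last sentence (measure the space the axis labels need, then give every panel the same margin) can be sketched with grid alone. This is only an illustration under stated assumptions: the tick labels are made up, the panels are empty rectangles rather than ggplot2 plots, and the 2 mm padding is arbitrary.

```r
library(grid)

# Made-up y-axis tick labels for three stacked panels.
labels <- list(c("23", "24", "25"), c("0", "1", "2"), c("closed", "open"))

# Width in inches of the widest tick label of one panel.
label_width <- function(labs) {
  max(sapply(labs, function(s)
    convertWidth(stringWidth(s), "in", valueOnly = TRUE)))
}

# The widest label over all panels sets one shared left margin.
widths_in   <- sapply(labels, label_width)
left_margin <- unit(max(widths_in), "in") + unit(2, "mm")

grid.newpage()
pushViewport(viewport(layout = grid.layout(nrow = 3, ncol = 1,
                                           heights = unit(c(4, 1, 1), "null"))))
for (i in seq_along(labels)) {
  pushViewport(viewport(layout.pos.row = i, layout.pos.col = 1))
  # Every panel starts at the same offset, so the left edges line up.
  pushViewport(viewport(x = left_margin,
                        width = unit(1, "npc") - left_margin,
                        just = "left"))
  grid.rect()
  n <- length(labels[[i]])
  # Right-justified labels placed just left of the panel edge.
  grid.text(labels[[i]], x = unit(-1, "mm"),
            y = unit(seq_len(n) / (n + 1), "npc"), just = "right")
  upViewport(2)
}
```

Each panel's rectangle begins at the same horizontal position regardless of how wide its own labels are, which is the property the stacked ggplot2 plots lack.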

A fourth and final approach, which I have had the most luck with, but which still has drawbacks, is to encode the categorical data in some other aesthetic and fake a continuous y variable. This makes all the y axes continuous and thus faceting can be used.

ggplot() +
  geom_point(data=transform(melt(ex,
                                 id.vars="Time",
                                 measure.vars=c("Temperature 1","Temperature 2")),
                            panel="Temperature"),
             aes(x=Time, y=value, colour=variable)) +
  geom_step(data=transform(ex, panel="sources"),
            aes(x=Time, y=sources)) +
  geom_rect(data=transform(ex,
                           endTime=c(Time[-1],Time[length(Time)]),
                           panel="status"),
            aes(xmin=Time, xmax=endTime, ymin=0, ymax=0.5, fill=status)) +
  facet_grid(panel~., scales="free_y", space="free")

ggplot() +
  geom_point(data=transform(melt(ex,
                                 id.vars="Time",
                                 measure.vars=c("Temperature 1","Temperature 2")),
                            panel="Temperature"),
             aes(x=Time, y=value, colour=variable)) +
  geom_step(data=transform(ex, panel="sources"),
            aes(x=Time, y=sources)) +
  geom_point(data=transform(ex,
                            panel="status"),
             aes(x=Time, y=1, shape=status)) +
  facet_grid(panel~., scales="free_y", space="free")

In the first version, the status is encoded in the fill colors of rectangles; in the second, in the shape of points. The drawbacks here are extensibility and scaling. In my actual problem, I have at least two categorical variables; while I could encode one with fill color and one with shape, if I end up with a third, I wouldn't know how to represent that (I could switch temperature to line type and then use color, but as a general solution, there are only so many aesthetics); also, this means that different categorical indicators are being represented in completely different ways. The other problem is scaling. I used scales="free_y" and space="free" in the faceting, but really the measures in the temperature and sources panels are unrelated and so there is no reason one unit on the temperature graph should be the same size as one unit on the sources graph.

Ultimately, I would want something that combines some of the features of faceting (aligning and unifying scales in one direction) without some of the restrictions (same scale in other direction, single mapping for other aesthetics). Thought of another way, I want a way to arrange multiple plots that share one coordinate aesthetic (and it need only be one) so that they behave like facets (aligned and unified). The attached PDF gives a rough idea of what I am thinking (I created it from a version of the third attempt, modifying the resulting PDF in OpenOffice.org Draw, and re-exporting as a PDF).

I would appreciate any suggestions as to another approach that will do what I want, or general discussion about this layout idea.

--
Brian Diggs, Ph.D.
Senior Research Associate, Department of Surgery, Oregon Health & Science University


example.manual.pdf

Hadley Wickham

Jul 16, 2010, 6:36:23 AM
to Brian Diggs, ggp...@googlegroups.com
Hi Brian,

In the next month or so, I hope to make it possible to fix the size of
any plot element to make this type of alignment possible when you are
using grid to combine multiple plots.

To solve this problem "properly", I think R needs a mixed
numeric-factor variable type, but I'm not sure how to implement it.

Hadley


--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

Jonathan Christensen

Jul 16, 2010, 11:05:54 AM
to Brian Diggs, ggp...@googlegroups.com
Brian,

I've done something like this in the past. My method (which is a bit of a hack, but works) is to convert the factor into a numeric variable and set the breaks and labels so that it "looks" like a factor variable, using faceting. For this to work the numeric values of the converted factor must be outside of the range of all other variables. The fact that you wanted some variables as lines and some as points gave a little bit of an additional complication, but I handled that by creating a second "value" variable in the melted data.frame: one holds the values of the variables you want as lines, the other holds the values of the variables you want as points. All other rows are filled with NAs:

ex$status <- 100 + (ex$status=="open")
melted <- melt(ex,id.vars="Time")
melted$shortvar <- substring(melted$variable,1,11) # This is to put both temperature readings on the same facet.
melted$value2 <- ifelse(melted$shortvar=="Temperature",melted$value,NA)
melted$value <- ifelse(is.na(melted$value2),melted$value,NA)

ggplot(melted, aes(x=Time)) +
 geom_point(aes(y=value2)) +
 geom_line(aes(y=value,group=variable)) +
 facet_grid(shortvar ~ ., scales="free", space="free") +
 scale_y_continuous(expand=c(.1,0), breaks=seq(0,101,1), minor_breaks=seq(0,100,.5),
 labels=c(as.character(seq(0,99)),"closed","open"))
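
With more than one categorical variable, the same trick might be generalized by giving each factor its own numeric band. The sketch below uses only base R; encode_factor, the offsets 100 and 200, and the second factor door are hypothetical names and values chosen for illustration, not part of the code above.

```r
# Hypothetical helper: place a factor's levels at consecutive numbers
# starting at `offset`, and return the breaks and labels needed to make
# a continuous axis read like a factor.
encode_factor <- function(f, offset) {
  list(
    value  = offset + as.integer(f) - 1,
    breaks = offset + seq_along(levels(f)) - 1,
    labels = levels(f)
  )
}

status <- factor(c("open", "open", "closed"), levels = c("closed", "open"))
door   <- factor(c("ajar", "shut", "shut"),   levels = c("ajar", "shut"))  # made-up

enc_status <- encode_factor(status, offset = 100)
enc_door   <- encode_factor(door,   offset = 200)

enc_status$value  # 101 101 100, same as 100 + (status == "open")
enc_door$value    # 200 201 201

# Combined breaks and labels for a single continuous y scale.
all_breaks <- c(enc_status$breaks, enc_door$breaks)
all_labels <- c(enc_status$labels, enc_door$labels)
```

The combined all_breaks and all_labels vectors could then be passed to the breaks and labels arguments of scale_y_continuous, provided the offsets keep every band outside the range of the truly numeric variables.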


This should give you at least a starting point.

Jonathan


Brian Diggs

Jul 20, 2010, 3:05:05 PM
to ggp...@googlegroups.com, Harish
Thank you all for your replies and suggestions. Jonathan’s suggestion showed me I hadn’t quite gone as far as I could with mapping the factors to various parts of the numeric axis, and then using breaks and labels to display the labels appropriately. Also, Harish replied to me off list (reproduced below with permission) and pointed out ggExtra::align.plots, which does most of what I wanted, as well as giving a version of align.plots which can arrange in both rows and columns. What ggExtra::align.plots does not do is allow for custom specification of the layout, which would allow different graphs to take up different amounts of space. The following patch adds a gl argument that takes the result of a call to grid.layout and uses it as the layout. If gl is not specified, the previous behavior is kept.

Index: align.r
===================================================================
--- align.r (revision 76)
+++ align.r (working copy)
@@ -2,14 +2,15 @@
##' @param vertical
##' @return ...
##' @examples
-##' p1 <- qplot(1,1000, colour="") + ylab("double\nline")
+##' p1 <- qplot(1,1000, colour="c") + ylab("double\nline")
##' p2 <- qplot(1,1, colour="this is a legend")
##' p3 <- qplot(1,100, colour="legend")
##'
##' align.plots(p1, p2, p3)
+##' align.plots(p1, p2, p3, gl=grid.layout(nrow=3, height=c(3,2,1)))


-align.plots <- function(..., vertical=TRUE){
+align.plots <- function(..., vertical=TRUE, gl=NULL){

dots <- list(...)
dots <- lapply(dots, ggplotGrob)
@@ -18,7 +19,9 @@
legends <- lapply(dots, function(.g) if(!is.null(.g$children$legends))
editGrob(.g$children$legends, vp=NULL) else ggplot2:::.zeroGrob)

- gl <- grid.layout(nrow=length(dots))
+ if(is.null(gl) || !inherits(gl, "layout")) {
+ gl <- grid.layout(nrow=length(dots))
+ }
vp <- viewport(layout=gl)
pushViewport(vp)
widths.left <- mapply(`+`, e1=lapply(ytitles, grobWidth),

Hi,

I saw your post titled "Aligning time series plots" in the ggplot2 group and I think I have a solution that will work for you. I am responding directly to you because I am not in a position to sign up to the group to respond to everyone. Please feel free to forward it if you want.

I have modified a function in the ggExtra package called align.plots() to be more "powerful". The original function (found at http://ggextra.googlecode.com/svn/trunk/R/align.r) allows you to create one column of charts with the charts aligned perfectly. (This function by itself might solve your problem.) I modified it to be able to create any layout of charts and align them horizontally and vertically -- and even allows you to have some charts spanning multiple rows and columns.

Consider my function work in progress. The interface isn't the greatest but it works. Caution: it does not handle horizontal alignment whenever one of the charts has a main title.

The code below shows you how to create complex layouts of charts.

If you find any bugs in it or find ways to improve it, kindly let me know.

Regards,
Harish

==========================

library( ggplot2 )

ex <- structure(list(Time = structure(c(1278428400, 1278429300, 1278430200, 1278431100, 1278432000, 1278432900, 1278433800, 1278434700, 1278435600, 1278436500, 1278437400, 1278438300, 1278439200, 1278440100, 1278441000, 1278441900, 1278442800, 1278443700, 1278444600, 1278445500, 1278446400, 1278447300, 1278448200, 1278449100, 1278450000, 1278450900, 1278451800, 1278452700, 1278453600, 1278454500, 1278455400, 1278456300, 1278457200), class = c("POSIXt", "POSIXct")), `Temperature 1` = c(23.4994760481, 23.5691608609, 23.4065467209, 23.3366466476, 23.7551289027, 23.8713964903, 23.8017531186, 23.8713964903, 23.8017531186, 23.7319104094, 23.7086908787, 23.7086908787, 23.6390259428, 23.6390259428, 23.7319104094, 23.7086908787, 23.7086908787, 23.7783463702, 23.8713964903, 23.9874496028, 24.0572606946, 24.2428788024, 24.3126625639, 24.4750221758, 24.4054436873, 24.4518300234, 24.4286371978, 24.6375394346, 24.7768442676, 24.6375394346, 24.6839132299, 24.4982136668, 24.8695770905), `Temperature 2` = c(26.1917071192, 26.4004768163, 26.7251744961, 26.6092703221, 26.5627215105, 27.0035834406, 26.7251744961, 26.5627215105, 26.7019928016, 26.6324503199, 26.8412800193, 26.7251744961, 26.8876497084, 26.8180959444, 26.7715392541, 26.7715392541, 26.6788115483, 26.8180959444, 26.8644646035, 26.9803955621, 27.0965310426, 27.1197219834, 27.0501510521, 27.0267719079, 26.7947223403, 26.8644646035, 26.9572082609, 27.2124923147, 27.560865829, 27.514456368, 27.5376606738, 27.6536951709, 27.4446582586), sources = c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0), status = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("closed", "open"), class = "factor")), .Names = c("Time", "Temperature 1", "Temperature 2", "sources", "status"), row.names = c(NA, -33L ), class = "data.frame")


# Example:
# grid_layout <- grid.layout( nrow=2, ncol=1, widths=c(1,1), heights=c(.8, .2) )
# grid.newpage()
# pushViewport( viewport( layout=grid_layout ) )
# align.plots( grid_layout, list( chrt1, row1, col1 ), list( chrt2, row2, col2 ) )
align.plots <- function(gl, ...){

  # Adapted from http://ggextra.googlecode.com/svn/trunk/R/align.r

  # BUGBUG: Does not align horizontally when one has a title.
  # There seems to be a spacer used when a title is present. Include the
  # size of the spacer. Not sure how to do this yet.

  # Per-row and per-column maxima of the space taken by titles, labels and legends
  stats.row <- vector( "list", gl$nrow )
  stats.col <- vector( "list", gl$ncol )

  # Each element of ... is list( plot, rows, cols )
  lstAll <- list(...)

  dots <- lapply(lstAll, function(.g) ggplotGrob(.g[[1]]))
  ytitles <- lapply(dots, function(.g) editGrob(getGrob(.g,"axis.title.y.text",grep=TRUE), vp=NULL))
  ylabels <- lapply(dots, function(.g) editGrob(getGrob(.g,"axis.text.y.text",grep=TRUE), vp=NULL))
  xtitles <- lapply(dots, function(.g) editGrob(getGrob(.g,"axis.title.x.text",grep=TRUE), vp=NULL))
  xlabels <- lapply(dots, function(.g) editGrob(getGrob(.g,"axis.text.x.text",grep=TRUE), vp=NULL))
  plottitles <- lapply(dots, function(.g) editGrob(getGrob(.g,"plot.title.text",grep=TRUE), vp=NULL))

  legends <- lapply(dots, function(.g) if(!is.null(.g$children$legends))
    editGrob(.g$children$legends, vp=NULL) else ggplot2:::.zeroGrob)

  widths.left <- mapply(`+`, e1=lapply(ytitles, grobWidth),
                        e2= lapply(ylabels, grobWidth), SIMPLIFY=FALSE)
  widths.right <- lapply(legends, grobWidth)
  # heights.top <- lapply(plottitles, grobHeight)
  heights.top <- lapply( plottitles, function(x) unit(0,"cm") )
  heights.bottom <- mapply(`+`, e1=lapply(xtitles, grobHeight),
                           e2= lapply(xlabels, grobHeight), SIMPLIFY=FALSE)

  # First pass: record the largest margin needed on each edge of each row/column
  for ( i in seq_along( lstAll ) ) {
    lstCur <- lstAll[[i]]

    # Left
    valNew <- widths.left[[ i ]]
    valOld <- stats.col[[ min(lstCur[[3]]) ]]$widths.left.max
    if ( is.null( valOld ) ) valOld <- unit( 0, "cm" )
    stats.col[[ min(lstCur[[3]]) ]]$widths.left.max <- max( do.call( unit.c, list(valOld, valNew) ) )

    # Right
    valNew <- widths.right[[ i ]]
    valOld <- stats.col[[ max(lstCur[[3]]) ]]$widths.right.max
    if ( is.null( valOld ) ) valOld <- unit( 0, "cm" )
    stats.col[[ max(lstCur[[3]]) ]]$widths.right.max <- max( do.call( unit.c, list(valOld, valNew) ) )

    # Top
    valNew <- heights.top[[ i ]]
    valOld <- stats.row[[ min(lstCur[[2]]) ]]$heights.top.max
    if ( is.null( valOld ) ) valOld <- unit( 0, "cm" )
    stats.row[[ min(lstCur[[2]]) ]]$heights.top.max <- max( do.call( unit.c, list(valOld, valNew) ) )

    # Bottom
    valNew <- heights.bottom[[ i ]]
    valOld <- stats.row[[ max(lstCur[[2]]) ]]$heights.bottom.max
    if ( is.null( valOld ) ) valOld <- unit( 0, "cm" )
    stats.row[[ max(lstCur[[2]]) ]]$heights.bottom.max <- max( do.call( unit.c, list(valOld, valNew) ) )
  }

  # Second pass: draw each plot, padded so that the panels line up
  for(i in seq_along(dots)){
    lstCur <- lstAll[[i]]
    nWidthLeftMax <- stats.col[[ min( lstCur[[ 3 ]] ) ]]$widths.left.max
    nWidthRightMax <- stats.col[[ max( lstCur[[ 3 ]] ) ]]$widths.right.max
    nHeightTopMax <- stats.row[[ min( lstCur[[ 2 ]] ) ]]$heights.top.max
    nHeightBottomMax <- stats.row[[ max( lstCur[[ 2 ]] ) ]]$heights.bottom.max
    pushViewport( viewport( layout.pos.row=lstCur[[2]],
                            layout.pos.col=lstCur[[3]], just=c("left","top") ) )
    pushViewport(viewport(
      x=unit(0, "npc") + nWidthLeftMax - widths.left[[i]],
      y=unit(0, "npc") + nHeightBottomMax - heights.bottom[[i]],
      width=unit(1, "npc") - nWidthLeftMax + widths.left[[i]] -
        nWidthRightMax + widths.right[[i]],
      height=unit(1, "npc") - nHeightBottomMax + heights.bottom[[i]] -
        nHeightTopMax + heights.top[[i]],
      just=c("left","bottom")))
    grid.draw(dots[[i]])
    upViewport(2)
  }
}

temp.gg <-
  ggplot(melt(ex,
              id.vars="Time",
              measure.vars=c("Temperature 1","Temperature 2")),
         aes(x=Time, y=value)) +
  geom_point(aes(colour=variable))

source.gg <-
  ggplot(ex, aes(x=Time, y=sources)) + geom_step()

status.gg <-
  ggplot(ex, aes(x=Time, y=status)) + geom_point()


grid_layout <- grid.layout( nrow=3, ncol=2, widths=c(5,8), heights=c(8, 5, 3) )
grid.newpage()
pushViewport( viewport( layout=grid_layout ) )
align.plots( grid_layout,
             list( temp.gg, 1, 1:2 ),
             list( source.gg, 2, 1 ),
             list( status.gg, 3, 1 ),
             list( source.gg, 2:3, 2 ) )
