Question on ggplot2: how to create a geom to mimic what stripChart in the R package EnvStats does

Steve Millard

Oct 10, 2016, 8:42:23 PM
to ggplot2

Hello, 

I am new to this list, so please forgive me if a similar question has already been asked.  I am the creator of the R package EnvStats (https://cran.r-project.org/web/packages/EnvStats/EnvStats.pdf).  There is a function I use quite often called stripChart.  I am just starting to learn ggplot2, and have spent the past several days poring over Hadley's book, Winston’s book, StackOverflow, and other resources in an attempt to create a geom that approximates what stripChart does.  I am unable to figure out how to put summary statistics below the x-axis tick marks and also at the top of the plot (outside the plotting region).  Here is a simple example using the built-in dataset mtcars:

 

library(EnvStats)

stripChart(mpg ~ cyl, data = mtcars, col = 1:3,
     xlab = "Number of Cylinders", ylab = "Miles per Gallon", p.value = TRUE)

 

Here is an early draft of a geom to try to reproduce most of the functionality of stripChart:

 

geom_stripchart <-
function(..., x.nudge = 0.3,
     jitter.params = list(width = 0.3, height = 0),
     mean.params = list(size = 2, position = position_nudge(x = x.nudge)),
     errorbar.params = list(size = 1, width = 0.1, position = position_nudge(x = x.nudge)),
     n.text = TRUE, mean.sd.text = TRUE, p.value = FALSE) {

     # Arguments passed via ... are shared by all layers;
     # the *.params lists override them layer by layer
     params <- list(...)
     jitter.params   <- modifyList(params, jitter.params)
     mean.params     <- modifyList(params, mean.params)
     errorbar.params <- modifyList(params, errorbar.params)

     jitter <- do.call("geom_jitter", jitter.params)

     # Note: fun.y is the argument name in ggplot2 releases before 3.3.0;
     # later releases call it fun
     mean   <- do.call("stat_summary", modifyList(
          list(fun.y = "mean", geom = "point"),
          mean.params)
     )

     # Note: mean_cl_normal requires the Hmisc package to be installed
     errorbar <- do.call("stat_summary", modifyList(
          list(fun.data = "mean_cl_normal", geom = "errorbar"),
          errorbar.params)
     )

     stripchart.list <- list(
          jitter,
          theme(legend.position = "none"),
          mean,
          errorbar
     )

     if(n.text || mean.sd.text) {
          # Compute summary statistics (sample size, mean, SD) here?
          if(n.text) {
               # Add information to stripchart.list to
               # compute sample size per group and add text below x-axis
          }
          if(mean.sd.text) {
               # Add information to stripchart.list to
               # compute mean and SD and add text above top of plotting region
          }
     }
     if(p.value) {
          # Add information to stripchart.list to
          # compute p-value (and 95% CI for difference if only 2 groups)
          # and add text above top of plotting region
     }
     stripchart.list
}

 

 

library(ggplot2)

dev.new()
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, color = factor(cyl)))
p + geom_stripchart() +
     xlab("Number of Cylinders") +
     ylab("Miles per Gallon")

 

 

You can see that the plots are pretty much the same.  The problem I’m having is figuring out how to add the sample size below each group, and to add the means and standard deviations at the top, along with the result of the ANOVA test (ignoring the issue of unequal variances at this point).  I know it is straightforward to compute summary statistics and then plot them as points or text *within* the plotting area, but I don’t want to do that.
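For reference, the numbers in question (per-group sample size, mean and SD, plus an overall p-value) can be computed in base R before any plotting. This is only a sketch: it uses a classic one-way ANOVA via aov(), which is an assumption and may not match exactly what stripChart reports, and the variable names are purely illustrative.

```r
# Per-group summaries of mpg by number of cylinders (base R only)
n_by_cyl    <- tapply(mtcars$mpg, mtcars$cyl, length)
mean_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)
sd_by_cyl   <- tapply(mtcars$mpg, mtcars$cyl, sd)

# Classic one-way ANOVA (equal variances assumed); an assumption, not
# necessarily the test that stripChart() runs internally
fit     <- aov(mpg ~ factor(cyl), data = mtcars)
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]

# Text one might want to place outside the plotting region
n_text       <- paste0("n=", n_by_cyl)
mean_sd_text <- sprintf("Mean=%.1f\nSD=%.1f", mean_by_cyl, sd_by_cyl)
p_text       <- paste("ANOVA p-value", format.pval(p_value, digits = 2))
```

Computing these values is the easy part; the open question in this post is how to do it from inside the geom and draw the text outside the panel.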

 

Would appreciate any help or direction anyone can give me.  Thanks!


Steve Millard

Oct 11, 2016, 3:03:28 PM
to ggplot2
I should add the following:

I had already found examples showing how to place text outside the plot, e.g., using annotation_custom():

http://stackoverflow.com/questions/31079210/how-can-i-add-annotations-below-the-x-axis-in-ggplot2

The problem is that in those examples the user has defined the annotation in advance.  Within geom_stripchart(), I have to compute summary statistics and test results from the data defined in the call to ggplot(), and then pass those results to annotation_custom().  I don’t know how to get at the x and y variables that are defined in the call to ggplot().
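One way a layer can see the mapped data without being handed it explicitly is through a stat: stat_summary() calls the function given as fun.data on the y values of each group once the aesthetics from ggplot() have been resolved. The sketch below uses a hypothetical helper, n_label(), to build a per-group sample size label. It draws the text inside the panel, so it illustrates only the data access, not placement outside the plotting region.

```r
library(ggplot2)

# Hypothetical helper: receives the y values of one group and returns
# a one-row data frame with a text position (y) and a label
n_label <- function(y) {
  data.frame(y = min(y), label = paste0("n=", length(y)))
}

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, color = factor(cyl))) +
  geom_jitter(width = 0.3, height = 0) +
  # The x and y variables come from the aes() in ggplot(); nothing is
  # passed to this layer by hand
  stat_summary(fun.data = n_label, geom = "text", vjust = 1.5) +
  theme(legend.position = "none")
```

The same function could return a mean and SD label instead; what it cannot do on its own is move the text outside the plotting region.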


--Steve Millard

Joyce Robbins

Oct 11, 2016, 5:26:27 PM
to Steve Millard, ggplot2
A thought: use facets and put the info in the strip labels. By default they contain the variable levels, but they can be changed:
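A minimal sketch of the strip-label idea, assuming the grouping variable itself (cyl) is used for faceting and the summaries are computed up front rather than inside a geom. The names by_cyl and strip_labels are illustrative only.

```r
library(ggplot2)

# One label per level of cyl, named by the level ("4", "6", "8")
by_cyl <- split(mtcars$mpg, mtcars$cyl)
strip_labels <- vapply(by_cyl, function(v) {
  sprintf("n = %d, mean = %.1f, SD = %.1f", length(v), mean(v), sd(v))
}, character(1))

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, color = factor(cyl))) +
  geom_jitter(width = 0.3, height = 0) +
  # as_labeller() turns the named vector into a lookup table for the strips
  facet_wrap(~ cyl, scales = "free_x", labeller = as_labeller(strip_labels)) +
  theme(legend.position = "none")
```

This uses up the facets for the grouping variable, so it leaves no obvious room for faceting by a second variable.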



FF

Oct 11, 2016, 6:43:22 PM
to ggplot2

Hi Steve,

Did you try combining ggplot() with what it is built on: grid graphics?

You can place graphical elements wherever you want using viewports. Make the main ggplot into a grob, p <- ggplotGrob(ggplot(....)), and place it in the main viewport. Place everything else in viewports outside it.

See https://www.stat.auckland.ac.nz/~paul/grid/grid.html for documentation on how to use grid. A short study might give you all you need.
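A rough sketch of that layout, assuming three stacked viewports (text, plot, text) and summaries computed from the data up front. The helper draw_stripchart_page() and the text variables are hypothetical names, and the text is centred across the page rather than aligned with each group.

```r
library(ggplot2)
library(grid)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, color = factor(cyl))) +
  geom_jitter(width = 0.3, height = 0) +
  theme(legend.position = "none")
g <- ggplotGrob(p)  # the whole plot as a single grob (a gtable)

# Summary text, computed outside the plot
by_cyl      <- split(mtcars$mpg, mtcars$cyl)
top_text    <- paste(sprintf("cyl %s: mean=%.1f", names(by_cyl),
                             vapply(by_cyl, mean, numeric(1))),
                     collapse = "    ")
bottom_text <- paste(sprintf("n=%d", lengths(by_cyl)), collapse = "    ")

# Hypothetical helper: text row, plot row, text row
draw_stripchart_page <- function() {
  grid.newpage()
  lay <- grid.layout(nrow = 3, ncol = 1,
                     heights = unit(c(2, 1, 2), c("lines", "null", "lines")))
  pushViewport(viewport(layout = lay))
  pushViewport(viewport(layout.pos.row = 1)); grid.text(top_text);    popViewport()
  pushViewport(viewport(layout.pos.row = 2)); grid.draw(g);           popViewport()
  pushViewport(viewport(layout.pos.row = 3)); grid.text(bottom_text); popViewport()
  popViewport()
  invisible(NULL)
}

draw_stripchart_page()
```

Because the page is assembled at the top level, the data must be known there; nothing here is computed from within a geom.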

Steve Millard

Oct 21, 2016, 10:17:23 PM
to ggplot2
Joyce and FF,
Thanks so much for your great suggestions!!  Regarding using faceting, there are two problems:

1.      I want the user to be able to use facet_wrap() or facet_grid() in addition to the geom_stripchart() function.  For example:

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, color = factor(cyl)))
p + geom_stripchart() + facet_wrap(~am) +
     xlab("Number of Cylinders") +
     ylab("Miles per Gallon")


so I don’t want to mix text about the mean and SD with the values of the factor used for faceting, because that could confuse the reader.  Separating the mean and SD clearly from the level of the faceting variable might work, but I am not sure that I could call a labeller function within geom_stripchart() and still have an additional call to facet_wrap() work outside of geom_stripchart().

2.      Even if I could figure out a way to use facets and labeller functions to put the information on top of the plot, I would still have the problem of computing the summary statistics in the first place, since I don’t know how to get at the x and y variables defined by aes() in the call to ggplot().


Regarding using grid graphics, I am aware of examples that do this, but the ones I have seen are all calls from a top level where the user knows what the data are, rather than from inside a function.  I created two different posts to StackOverflow:  the first one is essentially my post here on the ggplot2 mailing list:


and the second one is a simplified version:


As you can see, I answered both of these posts myself.  I was able to contact Hadley Wickham, and in his words, "Unfortunately ggplot2 just isn't built for this sort of task."  Oh well, I still very much appreciate the fantastic ggplot2 package!  

