Referring to plot data in arguments to stat_function

1,284 views
Skip to first unread message

MaxyB

unread,
Aug 7, 2010, 5:36:28 PM8/7/10
to ggplot2
I'm wondering if it's possible for stat_function to pass arguments to
its function that come from the plot data, rather than from the global
namespace.

Suppose I have a dataframe like so:

response <- c(rnorm(100), rnorm(100, mean=2, sd=2))
mygroup <- factor(c(rep('group a', 100), rep('group b', 100)))
d <- data.frame(response, mygroup)
rm(response, mygroup)

Thus d$response is mixture of two gaussians, with group a having
mean=0, sd=1 and group b having mean=sd=2. I can easily plot a density/
rug-plot of the mixture overlaid with the density of a single maximum-
likelihood gaussian:

ggplot(d, aes(x=response)) + geom_density() + geom_rug() +
stat_function(fun=dnorm, args=list(mean=mean(d$response), sd=sd(d
$response)), alpha=0.25)

To plot the maximum-likelihood gaussian I'm using stat_function with
fun=dnorm and with mean and standard deviation derived from the sample
estimates of those quantities in d$response.

Now what I would like to do is break the density plot up into two
facets, one for each group. So the left panel should show an empirical
density plot of subset(d, group=="group a")$response, together with an
overlaid maximum-likelihood gaussian *for that group*, and the right
panel should similarly have an empirical density and max-likelihood
density *for group b*. Basically, I want to be able to do this:

ggplot(d, aes(x=response)) + geom_density() + geom_rug() +
facet_wrap(~mygroup) + stat_function(fun=dnorm,
args=list(mean=mean(response), sd=sd(response)), alpha=0.25)

where the meaning of "response" in stat_function() depends on which
facet (data subset) is being plotted.

Unfortunately this doesn't seem to be possible, as stat_function
complains that object "response" does not exist. So I was wondering if
anybody knows a simple way to do this.

Thanks
--
Max

Dennis Murphy

unread,
Aug 7, 2010, 7:30:31 PM8/7/10
to ggplot2
Hi:

I can see what Max is after and it seems one would need something analogous to a panel function in lattice. Using the following as the base ggplot object  [I'm using grp as the group name rather than mygroup, BTW] :

g <- ggplot(d, aes(x=response)) + geom_density() + geom_rug()   ,

I started by writing a simple little function:

estd <- function(x) {
    m <- mean(x)
    s <- sd(x)
    dnorm(x, m, s)
   }

g + stat_function(fun = estd) + facet_wrap(~ grp)

provides the same (overall) N(xbar, s) density superimposed on the kernel density in each group . After some fooling around, I tried

g + stat_function(fun = dnorm,
    args = list(m = mean(d$response), s = sd(d$response)), group = d$grp) +
   facet_wrap( ~ grp)

which yields a somewhat rescaled version of the first try. stat_function() is not recognizing group differences as far as I can tell. Adding the grouping variable as a group aesthetic in g has no effect either. So I guess the question can be rephrased into:

(a) Is there a ggplot2 analogue to lattice's panel function, particularly the analogy to  .., subscripts = , ...   ?
or
(b) How would one write a function to pass to stat_function() that can be executed groupwise?

I didn't see anything in the examples of stat_function() to indicate how one would deal with grouping. Does the input function need to be able to pass the grouping variable, or ???  Checking the list archives didn't scare up anything I could use.

It seems to be that if stat_function() inherited the group aesthetic, then my little function should have been run separately for each group, but it appears from the result that the function is using the entire data set. I'm clearly missing something, perhaps something obvious, but I'm not seeing it at the moment.

TIA,
Dennis


--
You received this message because you are subscribed to the ggplot2 mailing list.
Please provide a reproducible example: http://gist.github.com/270442

To post: email ggp...@googlegroups.com
To unsubscribe: email ggplot2+u...@googlegroups.com
More options: http://groups.google.com/group/ggplot2

Hadley Wickham

unread,
Aug 7, 2010, 10:59:37 PM8/7/10
to Dennis Murphy, ggplot2

The function already _is_ executed group wise - but your function
returns the same values for every group ;) The problem is that the
parameters are executed when the plot is created, not during the
evaluation of the statistical transformation.

In this case, as in most cases when you start to do something more
complicated, you're best off doing the transformation yourself outside
of ggplot2.

Hadley


--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

Dennis Murphy

unread,
Aug 7, 2010, 11:35:15 PM8/7/10
to Hadley Wickham, ggplot2
Hi Hadley:


I get it now. Thanks for the clarification.

In this case, as in most cases when you start to do something more
complicated, you're best off doing the transformation yourself outside
of ggplot2.

So in that spirit, let's try again...

 > head(d, 3)
    response     grp
1 -0.4140121 group a
2 -0.8506043 group a
3 -1.6704603 group a
> estd  
 # function to compute the N(xbar, s) density

function(x) {
   m <- mean(x)
   s <- sd(x)
   dnorm(x, m, s)
  }
dd <- ddply(d, .(grp), transform, ndens = estd(response))
ggplot(dd, aes(x = response)) + geom_density() + geom_rug() +
   geom_line(aes(y = ndens), colour = 'red') + facet_wrap( ~ grp)

It's not perfect, but it's in the neighborhood. Perhaps a different data set with a finer grid of x-values would improve the rendering of the normal density plots, especially in the tails.

Thanks for the insight, Hadley!

Dennis
Reply all
Reply to author
Forward
0 new messages