Getting ggplot_build data subsets from a plot layer with aes(color=some_discrete_value)


tylerecouture

Aug 29, 2015, 12:05:04 PM
to ggplot2
I can create a plot with a trend and get the trend data:

p <- ggplot() + geom_point(data=diamonds, aes(x=carat, y=price))
p <- p + stat_smooth(data=diamonds, aes(x=carat, y=price)) 

# get trend data
p_data <- ggplot_build(p)$data[[2]]

How can I do the same thing, but get data subsets when the plot is divided by some discrete value like this:

p <- ggplot() + geom_point(data=diamonds, aes(x=carat, y=price, color=clarity))
p <- p + stat_smooth(data=diamonds, aes(x=carat, y=price, color=clarity))

How can I get something like this (apologies for my poor R syntax):

p_data[0] = (stat_smooth data from only color=I1)
p_data[1] = (stat_smooth data from only color=SI1)
p_data[2] = (stat_smooth data from only color=SI2)
etc.

Thanks!

Dennis Murphy

Aug 30, 2015, 4:07:11 AM
to tylerecouture, ggplot2
Hi:

Running str() on the p_data object from your initial plot:

> str(p_data)
'data.frame': 80 obs. of 7 variables:
$ x : num 0.2 0.261 0.322 0.383 0.444 ...
$ y : num 273 497 725 961 1209 ...
$ ymin : num 218 461 703 941 1185 ...
$ ymax : num 328 534 748 981 1233 ...
$ se : num 28 18.5 11.5 10.1 12.4 ...
$ PANEL: int 1 1 1 1 1 1 1 1 1 1 ...
$ group: int 1 1 1 1 1 1 1 1 1 1 ...
> table(p_data$group)

1
80


Do the same thing you did with the first plot:

p <- ggplot(data=diamonds, aes(x=carat, y=price, color=clarity)) +
geom_point() + geom_smooth()

p_data <- ggplot_build(p)$data[[2]]

# Now,
> table(p_data$group)

1 2 3 4 5 6 7 8
80 80 80 80 80 80 80 80
> str(p_data)
'data.frame': 640 obs. of 8 variables:

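You can use subset() to extract rows by group, or if you're using
dplyr, filter(). Note that the built data has no clarity column; the
levels are encoded in group (and colour), and group 1 corresponds to
the first level of clarity, "I1"; e.g.,

subset(p_data, group == 1)

library(dplyr)
p_data %>% filter(group == 1)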

Each subset has 80 observations because stat_smooth() evaluates the
fit at 80 equally spaced x-values (its default n) over the range of
the explanatory variable (in this case, carat).
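
To get something close to the p_data[0], p_data[1], ... layout asked for in the question, one option is to split the built layer data on group. This is only a sketch: it uses layer_data(p, 2), which in recent ggplot2 versions returns the same data as ggplot_build(p)$data[[2]], and it assumes that group k corresponds to the k-th level of clarity. The name p_list is made up for illustration.

```r
library(ggplot2)

p <- ggplot(data = diamonds, aes(x = carat, y = price, color = clarity)) +
  geom_point() + geom_smooth()

# Computed data for the second layer (the smooth)
p_data <- layer_data(p, 2)

# One data frame per group, in group order
p_list <- split(p_data, p_data$group)

# Assumption: group k is the k-th level of clarity
names(p_list) <- levels(diamonds$clarity)[as.integer(names(p_list))]

# Smooth data for clarity I1 only
head(p_list[["I1"]])
```

R lists are indexed from 1, so p_list[[1]] is the first subset; indexing by name, as in p_list[["SI1"]], avoids relying on the position.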

Dennis
> --
> --
> You received this message because you are subscribed to the ggplot2 mailing
> list.
> Please provide a reproducible example:
> https://github.com/hadley/devtools/wiki/Reproducibility
>
> To post: email ggp...@googlegroups.com
> To unsubscribe: email ggplot2+u...@googlegroups.com
> More options: http://groups.google.com/group/ggplot2
>
> ---
> You received this message because you are subscribed to the Google Groups
> "ggplot2" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to ggplot2+u...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Hadley Wickham

Aug 30, 2015, 2:37:10 PM
to tylerecouture, ggplot2
I'd highly recommend that you just do this by hand, instead of relying
on ggplot2 for the computation.
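
A minimal sketch of what doing it by hand might look like, assuming a GAM smoother from mgcv fitted separately to each clarity level and evaluated at 80 equally spaced carat values. The smoother, the formula, and the name smooth_by_clarity are choices made for illustration, so the numbers need not match what geom_smooth() draws.

```r
library(mgcv)  # recommended package that ships with R

data(diamonds, package = "ggplot2")

# Fit one smoother per clarity level and predict on a grid of 80 carat values
smooth_by_clarity <- lapply(split(diamonds, diamonds$clarity), function(d) {
  fit <- gam(price ~ s(carat, bs = "cs"), data = d)
  grid <- data.frame(carat = seq(min(d$carat), max(d$carat), length.out = 80))
  pred <- predict(fit, newdata = grid, se.fit = TRUE)
  data.frame(carat = grid$carat,
             price = as.numeric(pred$fit),
             se = as.numeric(pred$se.fit))
})

# Trend data for clarity I1 only
head(smooth_by_clarity[["I1"]])
```

The result is a named list with one data frame per clarity level, so the model and the prediction grid stay under your control instead of depending on the plot's internals.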

Hadley



--
http://had.co.nz/