It takes a bit of code rewriting, but you can get your data into the
appropriate format for plyr to split and then call a function that
does your plotting for each split.
My example was temperatures of components within a group of identical
machines: the x axis of the box plot was the machine name and the y
axis the temperature.
However, I wound up moving away from that strategy since I couldn't
figure out how to number the plots sequentially while also providing
each with a descriptive file name.
If anyone else has thoughts on that issue I'd love to hear them too!
Also, I'm curious why d_ply doesn't take the .parallel argument, since
it seems tailor-made for this application.
Hope that helps...
Justin
> --
> You received this message because you are subscribed to the ggplot2 mailing
> list.
> Please provide a reproducible example: http://gist.github.com/270442
>
> To post: email ggp...@googlegroups.com
> To unsubscribe: email ggplot2+u...@googlegroups.com
> More options: http://groups.google.com/group/ggplot2
>
1) Split the data, make separate saved but unrendered plots, then use
grid + viewports to lay them all out in one big plot. (This is by far
what I do most often, which is not that much because my data tend to
be small.)
2) Preprocess the data so the manipulations are done manually outside
of ggplot2, then just pass in data ready to be rendered (e.g., for
boxplots, precalculate the min, lower/upper hinge, median, and max).
I hate this because it is utterly inflexible aside from aesthetics.
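Option 1 can be sketched roughly as follows. This is a minimal illustration rather than the code actually used: the toy data frame, the output file name combined.pdf, and the one-row layout are all assumptions made up for the example.

```r
library(ggplot2)
library(grid)

## toy data (hypothetical): three variables measured at four sites
set.seed(1)
dat <- data.frame(site = rep(letters[1:4], 15),
                  variable = rep(c("t1", "t2", "t3"), each = 20),
                  value = rnorm(60))

## build one plot per split and keep them unrendered in a list
plots <- lapply(split(dat, dat$variable), function(df) {
  ggplot(df, aes(x = site, y = value)) +
    geom_boxplot() +
    ggtitle(df$variable[1])
})

## lay the saved plots out side by side with a grid layout
out <- file.path(tempdir(), "combined.pdf")
pdf(out, width = 12, height = 4)
grid.newpage()
pushViewport(viewport(layout = grid.layout(nrow = 1, ncol = length(plots))))
for (i in seq_along(plots)) {
  ## each plot is printed into its own cell of the layout
  print(plots[[i]], vp = viewport(layout.pos.row = 1, layout.pos.col = i))
}
dev.off()
```

Only the print() calls render anything, so the expensive step happens once per plot, inside the single output device.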
Those are the best I've come up with so far, though; hopefully there
is a nicer way.
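Option 2 might look like the sketch below. It assumes fivenum() is an acceptable way to get the five numbers (it returns the minimum, lower hinge, median, upper hinge, and maximum) and uses made-up toy data. geom_boxplot(stat = "identity") then draws the boxes from the precomputed values instead of computing them itself.

```r
library(ggplot2)

## toy data (hypothetical)
set.seed(1)
dat <- data.frame(site = rep(letters[1:4], 5), value = rnorm(20))

## precompute the five-number summary for each site outside of ggplot2
groups <- split(dat$value, dat$site)
stats <- do.call(rbind, lapply(names(groups), function(s) {
  f <- fivenum(groups[[s]])  # min, lower hinge, median, upper hinge, max
  data.frame(site = s, ymin = f[1], lower = f[2], middle = f[3],
             upper = f[4], ymax = f[5])
}))

## pass in data that is ready to be rendered
p <- ggplot(stats, aes(x = site, ymin = ymin, lower = lower,
                       middle = middle, upper = upper, ymax = ymax)) +
  geom_boxplot(stat = "identity")
```

With this approach the whiskers run to the minimum and maximum and no outliers are drawn, which is part of the inflexibility mentioned above.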
Cheers,
Josh
On Wed, Nov 2, 2011 at 4:51 AM, Brandon Hurr <brando...@gmail.com> wrote:
--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
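library(doMC)      # multicore backend for plyr's .parallel option
registerDoMC()
library(plyr)      # ddply() and .()
library(reshape2)  # melt()
library(ggplot2)
dat <- data.frame(site = letters[1:4], t1 = rnorm(20), t2 = rnorm(20), t3 = rnorm(20))
dat.melt <- melt(dat, id.vars = 'site')
# turn the factor into a character vector
dat.melt$variable <- levels(dat.melt$variable)[dat.melt$variable]
# draw one box plot per split
my.func <- function(df) {
  print(ggplot(df, aes(x = site, y = value)) +
    geom_boxplot())
}
# %d numbers the output files sequentially
png('/tmp/plots%d.png')
ddply(dat.melt, .(variable), my.func, .parallel = TRUE)
dev.off()  # close the device so the last file is written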
Or you can modify my.func a bit to give better file names. However, I
still can't figure out how to do both (sequential numbers and
descriptive names).
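One possible way to get both, sketched under assumptions: build each file name from the split's position and its name, and save each plot explicitly instead of relying on the %d counter of a shared png() device. The helper plot_file(), the toy data, and the "01_t1.png" naming pattern are hypothetical and not part of the code above.

```r
library(ggplot2)

## toy data (hypothetical)
set.seed(1)
dat <- data.frame(site = rep(letters[1:4], 15),
                  variable = rep(c("t1", "t2", "t3"), each = 20),
                  value = rnorm(60))
pieces <- split(dat, dat$variable)

## hypothetical helper: zero-padded sequence number plus the split's name
plot_file <- function(i, name, dir = tempdir()) {
  file.path(dir, sprintf("%02d_%s.png", i, name))
}

files <- character(length(pieces))
for (i in seq_along(pieces)) {
  p <- ggplot(pieces[[i]], aes(x = site, y = value)) + geom_boxplot()
  files[i] <- plot_file(i, names(pieces)[i])
  ggsave(files[i], plot = p, width = 5, height = 5)
}
basename(files)
```

Because every name is fixed by the index before any plotting happens, the same idea should carry over to a parallel loop where each worker receives its index along with its piece of the data; that variant is untested here.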
Also, it's worth noting that with small data there is no increase in
speed, but once you're close to 1e6 rows it becomes a bit more
advantageous.
> dat<-data.frame(site=letters[1:4],t1=rnorm(1000),t2=rnorm(1000),t3=rnorm(1000))
> dat.melt<-melt(dat,id.vars='site')
>
> dat.melt$variable<-levels(dat.melt$variable)[dat.melt$variable]
>
> my.func<-function(df){
+ print(ggplot(df,aes(x=site,y=value))+
+ geom_boxplot())
+ }
>
> png('/tmp/plots%d.png')
>
> system.time(ddply(dat.melt,.(variable),my.func,.parallel=T))
   user  system elapsed
  3.788   0.344   1.609
> system.time(ddply(dat.melt,.(variable),my.func,.parallel=F))
   user  system elapsed
  2.792   0.020   2.816
> dat<-data.frame(site=letters[1:4],t1=rnorm(1000000),t2=rnorm(1000000),t3=rnorm(1000000))
> dat.melt<-melt(dat,id.vars='site')
>
> dat.melt$variable<-levels(dat.melt$variable)[dat.melt$variable]
>
> my.func<-function(df){
+ print(ggplot(df,aes(x=site,y=value))+
+ geom_boxplot())
+ }
>
> png('/tmp/plots%d.png')
>
> system.time(ddply(dat.melt,.(variable),my.func,.parallel=T))
   user  system elapsed
 50.275   1.964  28.228
> system.time(ddply(dat.melt,.(variable),my.func,.parallel=F))
   user  system elapsed
 46.947   0.248  47.305
Hi Brandon,
I’d like to add myself to the list of people interested in seeing how ggplot2 was used with multicore functions. Related to this issue, since you are saving the plots as images on your HD, and since access to the device is serial, would that not be the main performance bottleneck? I’d also be interested to hear how to quickly render ggplot2 graphs on web browsers.
Leo
I wrote a script and benchmarked regular versus byte-compiled ggplot2
(+dependencies) and single- versus triple-core use of ggplot2. I drew
from Brandon's example, so it is basically just the same plot on nine
different columns of data, which makes the speedup from parallelizing
large and easy to implement. If you are interested, I just uploaded a
page with all the scripts and the log files as well as the final
timing results:
https://joshuawiley.com/R/ggplot2_benchmark.aspx
For those who are just interested in how to easily parallelize looping
through data/columns making plots, here is just the code I used for
that:
###########
library(parallel)  # makeCluster(), clusterEvalQ(), clusterExport(), parLapply()
library(ggplot2)

## define a function that makes the plots I want,
## renders them as PDFs, and returns the plot object (so I get a list of plots at the end)
## (I could theoretically combine into one with grid viewports or the like)
myPlot <- function(ycol, dat) {
  p <- ggplot(data = dat, aes_string(x = "x", y = ycol, colour = "g3")) +
    geom_point() +
    stat_smooth(size = 2) +
    facet_grid(g1 ~ g2) +
    ## ggtitle() sets the title (older ggplot2 used opts(title = ...))
    ggtitle(paste("Plot of '", ycol, "' on 'x'", sep = ""))
  ggsave(paste("Benchmark_of_", ycol, "_", format(Sys.time(), "%H-%M-%S"),
               ".pdf", sep = ""),
         plot = p, width = 10, height = 10)
  return(p)
}

## initiate local cluster and push relevant packages and objects
## (mydf is the benchmark data frame; columns 2:10 hold the y variables)
cl <- makeCluster(getOption("cl.cores", 3))
clusterEvalQ(cl, {
  library(ggplot2)
})
clusterExport(cl, varlist = c("mydf", "myPlot"))

## actually do it
results <- parLapply(cl, X = colnames(mydf)[2:10], fun = myPlot, dat = mydf)

## shut the cluster down
stopCluster(cl)
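The code above relies on a data frame mydf that is defined on the benchmark page, not in this message. For anyone who wants to try the code without it, here is a hypothetical stand-in with a compatible shape: x in column 1, nine y columns in positions 2:10, and three grouping columns g1, g2, g3. The column names y1 to y9, the group levels, and the simulated values are assumptions, not the benchmark data.

```r
## hypothetical stand-in for mydf; create it before running the cluster code
set.seed(42)
n <- 200
mydf <- data.frame(x = rnorm(n))

## nine response columns, landing in positions 2:10
for (j in 1:9) {
  mydf[[paste0("y", j)]] <- mydf$x * j + rnorm(n)
}

## grouping columns used for facetting (g1, g2) and colour (g3)
mydf$g1 <- sample(c("a", "b"), n, replace = TRUE)
mydf$g2 <- sample(c("c", "d"), n, replace = TRUE)
mydf$g3 <- sample(c("e", "f"), n, replace = TRUE)

## the columns that would be handed to parLapply()
colnames(mydf)[2:10]
```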
Cheers,
Josh