Here's the chance: I was in touch with Hadley last week about a joint Google
Summer of Code project.
I'd like to provide ggplot2 functionality for my hyperSpec package (which deals
with spectra), and I think of this as an example of "how to provide ggplot2
for non-data.frame objects".
The straightforward approach of building a data.frame would eat up all the memory
and take forever to compute, even for medium-sized objects. Hadley already had
some more applications in mind that suffer from a similar inefficiency (e.g. maps).
Not knowing anything about the internals of ggplot2, my guess is that things
could be sped up quite a bit for my data if I could teach ggplot2 to accept
matrices instead of data.frames. And this may help lots of other people, too.
SO: if you happen to be (or have or know) a student, and are willing and able to
do something about this, this is your chance. Just contact me (or Hadley) and
let's get thinking.
Claudia
PS: In order not to drink too much coffee, I transform my data first (while it's
still in matrices or arrays) so that ggplot doesn't need to do much
transformation ;-)
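For illustration, a rough sketch of that up-front transformation (the matrix here
is made up; hyperSpec's real objects carry more metadata): melting a spectra-like
matrix into the long data.frame that ggplot2 expects -- exactly the step that
multiplies the memory use for large objects.

library(ggplot2)
library(reshape2)  # melt(); the older 'reshape' package offers the same idea

## hypothetical data: 100 spectra (rows) x 500 wavelength points (columns)
spc <- matrix(rnorm(100 * 500), nrow = 100)

## long form: one row per (spectrum, wavelength, intensity) triple
long <- melt(spc, varnames = c("spectrum", "wavelength"),
             value.name = "intensity")

ggplot(long, aes(wavelength, intensity, group = spectrum)) +
  geom_line(alpha = 0.2)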
--
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste
phone: +39 0 40 5 58-37 68
email: cbel...@units.it
If the plots are independent, you should be able to distribute them
across multiple cores fairly readily using plyr, foreach and
multicore.
Hadley
--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
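For illustration, a rough sketch of the approach Hadley describes above (the data
split, plot, and file names are made up). mclapply lived in the multicore package
at the time and is now part of base R's parallel package; plyr with a registered
foreach backend (e.g. doMC) and .parallel = TRUE works along the same lines.

library(ggplot2)
library(parallel)  # mclapply(): forked workers, Unix-alikes only

## hypothetical setup: one independent plot per subset of the data
groups <- split(mtcars, mtcars$cyl)

save_plot <- function(name) {
  p <- ggplot(groups[[name]], aes(wt, mpg)) + geom_point()
  ggsave(paste0("plot_cyl_", name, ".png"), p, width = 4, height = 3)
}

## the plots don't depend on each other, so render them on separate cores
invisible(mclapply(names(groups), save_plot, mc.cores = 2))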
The other alternative for speed would be to rewrite ggplot in a low-level language. I'm a huge fan of ggplot and would love to see its facilities available outside of R as well, where it could be called programmatically (without R evaluation).
One can only wish we could roll back history and build the language on top of one of the successful JIT VMs. R's lazy expression resolution and evaluation would make this quite challenging, but doable. It is probably the most challenging scripting language to compile.
I believe Hadley has talked before on this list about the possibility
of refactoring/rewriting appropriate bits of ggplot to make it faster. I
am sure it can be done without rewriting it in lower-level languages,
although only by someone who knows the code (or is willing to invest a
large amount of time in understanding it) and who is very good at R.
(Obviously it would be great if you could just drop it into a JIT
compiler and make it faster, and obviously it could be made even faster
if you wrote it in a lower-level language -- but it could probably be
rewritten to be "fast enough" in R.)
Ben Bolker
I honestly don't think that ggplot2 is slow because of R - it's slow
because at the time I wrote it, I didn't know how to write efficient
code in R. If I knew then what I know now, it would be much much
faster.
I am slowly working towards a faster version of ggplot2. However,
progress is slow, because it requires a lot of refactoring and
cleaning up of the basics - I can't make it fast until I can ensure it
continues to work, and works in a way that I actually understand. The
other problem is that this is the sort of work that I have to do in my
spare time - it's pure software development, not research.
I would love to have an assistant, but it really would require a
substantial time investment, and I don't currently have access to the
financial resources to fund someone long term. A summer is easy to
find money for - two years is not.
> One can only wish we could roll back history and build the language on top of one of the successful JIT VMs. R's lazy expression resolution and evaluation would make this quite challenging, but doable. It is probably the most challenging scripting language to compile.
There's a group in the CS department at Stanford working on an R
implementation built on top of Lua - the Lua JIT is extremely fast,
and has much better support for dynamic language features than the
JVM.
I have to underline what Hadley said. Porting loops from R code to some other language, or hoping for a JIT, doesn't give much speedup. In order to exploit the underlying structure of the computations, they have to be expressed in higher-level terms -- not in lower-level languages. For example, when you replace loops over vectors with proper vector operations, the R system already performs them with nicely optimized, built-in machine code.
Thus, most of the expensive operations could be done quickly if the built-in operations were used.
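A tiny illustration of this point (the numbers are arbitrary): both versions
compute the same result, but the vectorised form hands the whole operation to
optimised built-in code in a single call.

x <- runif(1e6)
y <- runif(1e6)

## interpreted loop: R-level code runs once per element
slow_sum <- function(a, b) {
  z <- numeric(length(a))
  for (i in seq_along(a)) z[i] <- a[i] + b[i]
  z
}

system.time(slow_sum(x, y))  # typically orders of magnitude slower
system.time(x + y)           # one call into built-in machine code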
The research group I am working in is currently looking into implementing distributed data frames in C++. The aim is to make programming parallel computations on data frames much easier (even on distributed-memory systems). I personally think that it is possible to build something like Google's MapReduce framework, or plyr, on top of such distributed data frames. In this regard I guess that no JIT will have the reasoning power to transform a bunch of loops into a high-level description exploitable by such a system.
> I am slowly working towards a faster version of ggplot2.
It would be interesting to see which kinds of expensive/complex operations are typically used.
> spare time - it's pure software development, not research.
It is the same here: making such frameworks usable inside R would most probably be pure marketing.
Randolf Rotta
--
PhD student / research assistant
Computer Science Department
Brandenburg University of Technology, Cottbus, Germany
(I'm not holding my breath for an R JIT ;-)) I've looked at LuaJIT and am quite impressed with its performance. Hopefully the Stanford R -> Lua VM work makes it out of the research stage.
As for speeding up R code in general, you can certainly go a long way with vector primitives and by avoiding an imperative style. However, I find that there are dramatic slowdowns as soon as a vector function such as apply() needs to call an R-level function. Evaluation of R code proper is much slower than in any modern scripting language I've used.
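A rough illustration of that slowdown (arbitrary data, trivial arithmetic): the
per-element call back into an R-level function dominates the run time.

x <- runif(1e6)

## sapply() calls the anonymous R function once per element
system.time(sapply(x, function(v) sqrt(v) + 1))

## the same computation as a single vectorised expression
system.time(sqrt(x) + 1)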
For data-parallel problems one can make use of snow or other distribution approaches; however, in some cases (like mine) I have to do stateful computations, where the result of one computation depends on that of the preceding one. Hence R's evaluation speed presents difficulties for me.
I use R as opposed to Clojure or any number of other functional languages out there just because it has such a large package base and core statistics functionality. Moving a small fraction of this to another language environment is a huge endeavor.
Jonathan Shore
--
Systematic Trading Group