[ANN] gg4clj 0.1.0 - ggplot2 in Clojure and Gorilla REPL

Jony Hudson

Dec 26, 2014, 10:36:42 AM
to clo...@googlegroups.com
Hi all,

 from the README:

gg4clj is a lightweight wrapper to make it easy to use R's ggplot2 library from Clojure. It provides a straightforward way to express R code in Clojure, including easy mapping between Clojure data and R's data.frame, and some plumbing to send this code to R and recover the rendered graphics. It also provides a Gorilla REPL renderer plugin to allow rendered plots to be displayed inline in Gorilla worksheets. It is not a Clojure rewrite of ggplot2 - it calls R, which must be installed on your system (see below), to render the plots. You'll need to be familiar with R and ggplot2, or else the commands will seem fairly cryptic.
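
To make the idea of "expressing R code in Clojure" concrete, here is a minimal sketch of how nested Clojure data could be turned into an R expression string. It is not gg4clj's actual implementation: the function names `to-r` and `arg->r` and the translation rules (keywords as R symbols, vectors as function calls, maps as named arguments) are assumptions made for illustration only.

```clojure
(require '[clojure.string :as str])

(declare to-r)

(defn- arg->r
  "Render one call argument: a map becomes named arguments,
   anything else is rendered as a plain value."
  [x]
  (if (map? x)
    (str/join ", " (for [[k v] x] (str (name k) " = " (to-r v))))
    (to-r x)))

(defn to-r
  "Hypothetical translator from Clojure data to R source text."
  [form]
  (cond
    (keyword? form) (name form)          ; :mpg -> mpg
    (string? form)  (pr-str form)        ; "a"  -> "a" (quoted)
    (vector? form)  (str (name (first form))
                         "("
                         (str/join ", " (map arg->r (rest form)))
                         ")")
    :else           (str form)))         ; numbers etc.

(to-r [:ggplot :df [:aes :x :y]])
;; => "ggplot(df, aes(x, y))"
```

A string like this could then be handed to an R process for rendering, which is the "plumbing" part the README mentions.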


Works better than I thought it would!


Jony

adriaan...@gmail.com

Dec 26, 2014, 2:43:59 PM
to clo...@googlegroups.com
Looks beautiful :) Good work.

I don't know if you're also aware of ggvis, the ggplot2 reincarnation from the same developer. It has some extra niceties like interactivity. It also renders its output in Vega, so it should also render nicely in Gorilla (I guess).
http://ggvis.rstudio.com/

Greetz

On Friday, 26 December 2014 at 16:36:42 UTC+1, Jony Hudson wrote:

Jony Hudson

Dec 26, 2014, 4:28:30 PM
to clo...@googlegroups.com
Thanks :-)

And thanks for the pointer to ggvis. I've been shying away from interactive plots in Gorilla, since I haven't really seen or thought of a way to do it that seems satisfactory - and I'm not sure ggvis is there yet. But I will definitely keep an eye on it ...


Jony

Daniel Slutsky

Dec 27, 2014, 3:53:22 AM
to clo...@googlegroups.com
Wonderful, gg4clj is really nice!


Regarding ggvis, it might be worth knowing that it can generate not only interactive HTML pages but also static JSON in Vega format (which is of course fun to edit from Clojure).

For example:
library(ggvis)
capture.output(data.frame(x=c(1,2)) %>% ggvis(x=~x) %>% show_spec);

Therefore, from Gorilla's point of view, ggvis may be considered a powerful DSL for generating Vega plots (which can then be edited for Gorilla's needs).
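
As a rough illustration of editing such a spec from Clojure, the sketch below assumes the Vega JSON has already been parsed into a Clojure map with string keys (the parsing step, and whichever JSON library does it, is not shown). The `spec` map is a hand-written stand-in, far smaller than real ggvis output, and `resize` and `add-values` are hypothetical helper names.

```clojure
;; Hand-written stand-in for a parsed Vega spec (not real ggvis output).
(def spec
  {"width"  600
   "height" 400
   "data"   [{"name" "table" "values" [{"x" 1} {"x" 2}]}]
   "marks"  []})

(defn resize
  "Return the spec with new top-level width and height."
  [spec width height]
  (assoc spec "width" width "height" height))

(defn add-values
  "Append rows to the data set called data-name, leaving others untouched."
  [spec data-name rows]
  (update spec "data"
          (fn [datasets]
            (mapv (fn [d]
                    (if (= (get d "name") data-name)
                      (update d "values" into rows)
                      d))
                  datasets))))

(-> spec
    (resize 300 200)
    (add-values "table" [{"x" 3}]))
```

Because the spec is plain data, the usual `assoc`, `update` and `get-in` tools apply before it is handed on to a Vega renderer.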

Christopher Small

Dec 27, 2014, 8:46:28 PM
to clo...@googlegroups.com
Hahaha; Well, you beat me to it... But awesome!

I'd still love to work on a native Clojure implementation, but I also acknowledge that it might be a while before I'm able to, given a recent shift in focus. In the meantime, this will be super useful when the base Gorilla REPL plotting functionality isn't enough.

I haven't used ggvis, but I've heard good things about it from others. Would certainly be cool to see something in that direction.

Cheers!

Chris

Mikera

Dec 27, 2014, 10:01:41 PM
to clo...@googlegroups.com
Very cool!

On the data representation front, would you be open to making it support the core.matrix Dataset protocols as well as regular Clojure maps? That would make it much easier to integrate with Incanter 2.0 etc., and potentially avoid some copying overhead.

It should be a simple change to the "data-frame" function; happy to send you a PR if that is a direction you want to go.

Jony Hudson

Dec 28, 2014, 10:42:18 AM
to clo...@googlegroups.com
@Chris Thanks, hope it's useful for you. I might have a play with ggvis and see how it works out.

@Mike Yeah, it would definitely be good to support core.matrix datasets. One thing that would be nice would be to avoid the overhead of loading all of core.matrix for those that don't use it. Do you think it would work to just have gg4clj depend on the 'protocols' ns in core.matrix? Would be very happy to take a PR if you've got time to look at it :-)


Jony

Mikera

Dec 28, 2014, 10:02:47 PM
to clo...@googlegroups.com
core.matrix isn't that big of a dependency itself - it only gets expensive if/when you load the implementations (NDArray, vectorz-clj, Clatrix etc.), which should be a choice of the ultimate user.

It is possible to just depend on the protocols, but I think that risks breakage since protocols are really just an implementation detail. Best to depend on the API in clojure.core.matrix (which is mostly just simple functions that delegate to the right protocols).

Jony Hudson

Dec 29, 2014, 6:27:09 AM
to clo...@googlegroups.com
Hi Mike,

 some numbers on my 2012 MacBook Air (i7):

Making a new namespace that requires gg4clj, in a newly started Gorilla REPL session (i.e. a newly started JVM):

(time (ns test
  (:require [gg4clj.core :as gg4clj])))

takes ~80ms.

If I add [clojure.core.matrix :as matrix] to the :require vector of the gg4clj.core namespace, then the form above takes ~8200ms to evaluate.

Interestingly, if I only require [clojure.core.matrix.protocols :as cmp] then I find it takes ~3500ms, which seems like a very long time given the minimal amount of code in, and referred to by, that namespace.

Is this what you'd expect, or am I doing something dopey?

Reason I'm fussing about the load time is that already the time it takes from wanting to make a plot to getting Gorilla running is uncomfortably long. And I've been thinking recently about how it might be made quicker. So, I'm not too keen on anything that makes it longer! (This would add about 50% to the Gorilla start-up-to-plot time as it stands, which on my machine is about 14s currently).

So, assuming I'm not doing something silly, could we maybe think of any easy ways to reduce the load time in a case like this, where the functions might never be used? That might be a useful thing to do anyway if the c.m dataset API is becoming a standard. [Probably not the right answer here, but in Gorilla REPL, the rendering protocol lives in its own project gorilla-renderable, which has all of about 5 lines of code. This gives a way for other code to add Gorilla renderers without having to depend on Gorilla REPL itself (which has many dependencies).]


Jony

Jony Hudson

Dec 29, 2014, 4:13:24 PM
to clo...@googlegroups.com
@Mike Thinking out loud here ... one option would be to put the core.matrix dependent stuff in gg4clj in a separate ns, like gg4clj.datasets or similar. This would then avoid the loading time for users just wanting to use gg4clj.core. I'm not sure this is as good a solution, ultimately, as trying to make core.matrix load in a more incremental fashion - but I do appreciate that it's not so easy to change core.matrix, and that there are good reasons not to. What do you think - does this sound like a reasonable way forward to you?
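
A related way to keep the load cost away from users who never touch datasets is to defer the `require` until the first call. This is a different technique from the separate-namespace idea above, and only a sketch: `lazy-fn` is a hypothetical helper, and `clojure.set` stands in for an expensive namespace such as clojure.core.matrix so that the snippet runs without extra dependencies.

```clojure
(defn lazy-fn
  "Return a function that loads ns-sym on first use and then
   delegates every call to the var named fn-sym in that namespace."
  [ns-sym fn-sym]
  (let [target (delay
                 (require ns-sym)               ; pay the load cost once, lazily
                 (ns-resolve ns-sym fn-sym))]   ; look up the var after loading
    (fn [& args]
      (apply @target args))))

;; clojure.set is only a stand-in for a slow-loading namespace.
(def union* (lazy-fn 'clojure.set 'union))

(union* #{1} #{2})
;; => #{1 2}
```

The trade-off is that the delay moves to the first call rather than disappearing, and the namespace dependency is no longer visible in the `ns` form.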


Jony

Mikera

Dec 30, 2014, 2:22:02 AM
to clo...@googlegroups.com
I'm trying to figure out how to get core.matrix to load much faster - I think it's actually some kind of Clojure issue with protocols but I'm not *exactly* sure what is causing

Jony Hudson

Dec 30, 2014, 6:01:13 AM
to clo...@googlegroups.com
That would be great, if possible! I did try looking yesterday with visualvm to see what was going on, but some 50,000 findClass calls in, visualvm ran out of memory and crashed. And then I got distracted ...


Jony