--
--
--
--
By defining a set of standards for creating graphics in Julia, we can rely on the particular backend to generate output that is optimal for the platform you target. If I want to do web graphics I can use a d3-based backend, but if I want to do 3d graphics I could swap out the backend for something that handles 3d graphics.

Of course, you could argue that 3d graphics and web graphics require different grammars altogether, and it doesn't make sense to try to use one grammar that we force to apply to all. That's why I like the idea of having just one (or maybe a few) plot functions that take the data you want to plot, and then a dictionary of configuration options. It lets us define a standard interface while allowing each backend to require a unique set of options.

That said, Grammar of Graphics is an excellent (albeit voluminous) work, and I like a lot of the principles it outlines. To your point, Harlan and John, we could implement ggplot2 in Julia and call it a day, and it would be instantly familiar to most R users. However, it may not cover all the use cases we want to support.
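A minimal sketch of what that interface might look like, assuming a hypothetical API (none of these names exist in Julia today): one generic `plot` entry point that dispatches on a backend type, with a bag of options that each backend interprets for itself.

```julia
# Hypothetical sketch: one generic plot entry point, per-backend methods.
abstract type Backend end
struct D3Backend <: Backend end
struct GLBackend <: Backend end

# Each backend interprets only the options it understands.
function plot(b::D3Backend, x, y; opts...)
    o = Dict(opts)
    # A d3 backend might emit a JSON-ish spec for the browser.
    return "d3 spec: $(length(x)) points, color=$(get(o, :color, "black"))"
end

function plot(b::GLBackend, x, y; opts...)
    o = Dict(opts)
    # A GL backend might upload vertex buffers instead.
    return "gl scene: $(length(x)) points, antialias=$(get(o, :antialias, true))"
end

x = 1:10; y = rand(10)
spec = plot(D3Backend(), x, y; color = "steelblue")
```

The point of the sketch is that the call site stays the same while unrecognized options are simply ignored or rejected by the backend that receives them.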
--
One "plea" I would like to make is to also keep in mind the separation between "interactive research" graphics and "presentation" graphics.

I think plot() has to be kept fast, efficient, and "smart" for interactively plotting "large" (~1e6?) datasets, with quick zooming and resizing and reasonable axes with reasonable tick mark divisions, and if I save it as a PDF it looks "ok" -- not publication quality, but conference talk quality. In matlab, I find plot() and subplot() indispensable: I work in digital signal processing, and I can throw up a subplot(4,4) with each plot() having ~1e6 data points and it stays snappy, and I can zoom in and resize quickly to another monitor. Same with imagesc() for matrix data.

I understand the desire for layers and abstraction, for being able to plug in new code bases for new displays and data visualization research, and the importance of fine-grained parameters for publication graphics as well.

What I fear most is something like the Java Eclipse IDE in the early days: just launching it could bring my state-of-the-art workstation to its knees just displaying some code. The early matlab Java IDE was pretty painful as well; I still run matlab with no IDE out of habit (even though, last I checked, there is no Java dependency anymore and there is even a debugger (gasp)).

My only hope is that since Julia has a FAST, state-of-the-art JIT/REPL layer, the graphics can "keep up" and not bog down. I don't know how to do this, but it would help if there were some way to add interactive "unit" tests to the graphics layer, with multiple ~1e6 datasets, that test for "speed to display" and, more importantly, that interactive zooming and resizing stay smooth and usable.

perrin
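The "speed to display" test idea could be sketched roughly as follows. This is only an illustration of the testing pattern; `render` is a hypothetical stand-in for whatever draw call a real backend would expose, and the time budget is an assumed number to be tuned per platform.

```julia
# Hypothetical sketch of a "speed to display" regression test.
# `render` stands in for a real backend's draw call.
render(x, y) = sum(abs, y)          # placeholder for actual drawing work

x = range(0, 1; length = 1_000_000)
y = sin.(2pi .* 50 .* x)

render(x, y)                        # warm up the JIT first
t = @elapsed render(x, y)           # then time a full redraw

# A harness could fail the test if a redraw regresses past a budget:
budget_seconds = 0.5                # assumed budget, not a real standard
@assert t < budget_seconds
```

Interactivity (zoom/resize smoothness) is harder to assert automatically, but timing a full redraw of a ~1e6-point dataset at least catches gross regressions.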
--
--
--
--
--
Word? How about putting this in a google doc for people to comment in-line?
--
--
So, if you're wanting to redesign graphs (or graphics) in Julia, the right way
to do it is to start with:
2. Design the graph data structure
because this is how you bind together the different efforts to a common purpose.
--
Hi everyone,I just signed up to julia-dev so I could chime in on this thread. I'm a long-time Python dev working at Continuum Analytics. Previously at Enthought, I wrote Chaco. At Continuum, we are working on a new graphing system for Python (called Bokeh) that tries to take the best ideas of Grammar of Graphics, Protovis/d3, Matplotlib/MATLAB, and Chaco, and build it into a single stack that natively targets web-based display.
As you may have seen on HackerNews, we recently received some DARPA funding to pursue work on Blaze, Numba, and Bokeh, as part of the XData "big data" project. (I also heard that some Julia devs are also participating in the XData work...?)
The reason I am piping up here is because I think there is a lot of good that could result from us joining forces on the *graphics* layer.
There have been many great points raised in the thread, but the principal ones I would like to second are:
- Need to separate plot specification API from display interface ("backend")
- Different use cases will drive the API in different ways; Grammar of Graphics (and ggplot) is not the be-all and end-all of plot specification approaches, especially for users coming from a science and engineering plotting background. The learning curve, even for those with statistical graphics needs, is non-trivial. And interactivity, brushing, and the like are not really demonstrated with this model.
Some additional comments:
d3 (and protovis) are powerful approaches to specifying novel graphics, but I have not seen them scale to very large datasets. The learning curve is also non-trivial, and the fundamental goals of the project must be taken into account. d3 is very much designed for embedding in a programmable DOM. (Protovis was, in my opinion, a more infoviz-centric toolkit.)
For rich-client embedding or native apps, there are a huge number of potential backend options. Both Chaco and Matplotlib took the approach of abstracting a drawing layer over the combinatoric mess of GUI toolkits and 2D canvas libraries: {Qt, Wx, Tk, Cocoa, Quartz} x {OpenGL, GTK, toolkit-dependent painter, Anti-Grain Geometry (AGG) software rasterizer, Quartz, PDF, SVG, Cairo, etc.} This is a non-trivial undertaking and is always a mess. It is important to figure out early on what is really needed, and what are the desired use cases. If you can cut down on the number of platforms or types of rendering you want to do, it will save a lot of time that would otherwise be spent on creating yet another abstraction layer.
I think that having an "abstract representation of a graph" that the front-end can generate and that various backends can render is a nice idea, but perhaps very difficult to do in a way that does not either lock in specific classes of visualizations expressible by the front-end, or force the rendering backend to be too slow for interactive and large-data work. I'm not saying it's impossible; I just think it's hard to create this intermediate representation of a visual without baking in assumptions about what the backend rendering system can do in an optimal way.

I don't know how much bandwidth and resources you guys are devoting to tackling this "new plotting system" project, but I would love to be able to work together.
Most of the challenges in implementing a backend are not language specific.
The big question is one of scope & use cases: if it turns out that Bokeh is tackling a much different use case than Julia, then it's possible that we may not even be able to find a useful way to collaborate on a backend.
A front-end API would be interesting to talk about as well, because I do think that there is plenty of space for innovation in this regard, and I think that Python and Julia can speak similar vocabulary and grammar for the narrower purpose of defining visualizations over a dataset.
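To make the "abstract representation of a graph" idea concrete, here is a deliberately tiny sketch under assumed names (nothing here is an existing package): the front end builds a tree of mark objects, and a backend is just a set of render methods dispatched over the mark types. This is exactly the design whose risks are discussed above; the sketch shows the shape, not a solution to the performance concerns.

```julia
# Hypothetical intermediate representation: a figure is a list of marks.
abstract type Mark end
struct Line <: Mark
    x::Vector{Float64}
    y::Vector{Float64}
end
struct Points <: Mark
    x::Vector{Float64}
    y::Vector{Float64}
end
struct Figure
    marks::Vector{Mark}
end

# A "backend" is just a set of render methods over the mark types.
render(io::IO, m::Line)   = println(io, "line with $(length(m.x)) vertices")
render(io::IO, m::Points) = println(io, "$(length(m.x)) points")
function render(io::IO, f::Figure)
    for m in f.marks
        render(io, m)
    end
end

fig = Figure([Line(collect(0.0:0.1:1.0), rand(11)), Points(rand(5), rand(5))])
buf = IOBuffer()
render(buf, fig)
out = String(take!(buf))
```

A real backend would draw to a canvas instead of printing, but the dispatch structure (new mark type = new method per backend) is the part that matters for the discussion.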
--
--
Do you, or anyone else, happen to know what fraction of the GUI toolkits (Qt,
Wx, Tk, Cocoa, ... or at least the subset we end up deciding to support) allow
one to specify layout purely in terms of callbacks?
If so, then we might be
able to write a "cross-platform" Julia layout manager. If not, then it might
be a hard problem "simply" to specify layout in a way that can work with many
different underlying toolkit layout managers. If we can't even do that, then
settling on a few choices seems like the best approach.
--Tim
--
There have been many great points raised in the thread, but the principal ones I would like to second are:
- Need to separate plot specification API from display interface ("backend")

Agreed. The hard part here is getting the abstraction layer right. Sounds like your background could be quite valuable here.
{Qt, Wx, Tk, Cocoa, Quartz} x {OpenGL, GTK, toolkit-dependent painter, Anti-Grain Geometry (AGG) software rasterizer, Quartz, PDF, SVG, Cairo, etc.}
--
Something like https://code.google.com/p/chromiumembedded/?
(Note that I would hate to be stuck installing Chrom(e/ium) because Julia graphics end up with a NaCl/Pepper dependency; I'm quite happy with Firefox, and it would be great if we kept interactive graphics cross-browser if we go that way. Now, LLVM IR can be converted to JavaScript with Emscripten...)
Something like https://code.google.com/p/chromiumembedded/?
Yes! Exactly like that. Very promising looking.
(Note that I would hate to be stuck installing Chrom(e/ium) because Julia graphics end up with a NaCl/Pepper dependency; I'm quite happy with Firefox, and it would be great if we kept interactive graphics cross-browser if we go that way. Now, LLVM IR can be converted to JavaScript with Emscripten...)

I agree. I'm very anti NaCl/Pepper. For me, the embedded Chrome idea is about being able to use the same JavaScript graphics in an otherwise native app, rather than about being Chrome-specific. It would hopefully give the best of both worlds: the benefits of a native app, but sharing much code and logic with a web version. Just an idea.
--
--
--
--
--
--
--
... if we use a
browser/webkit, how much of the plotting can be written in Julia, vs. how much
needs to be written in JavaScript?
Actually, now I'm wondering whether this is something that needs to be built
in at the lowest level, or whether it's more of a top-layer thing. Example
(one I was indeed already thinking about adding to Winston): I need to plot
timeseries. Take a timeseries 10^8 points long, naively plot it as a line,
export it as an EPS, and you get a ~1GB file. For some reason, journals tend to
balk at accepting such figure files.
However, acknowledging that the graph will only get 5in of space at 300dpi,
there are really only 1500 distinct x values shown. Break your 10^8 data
points up into 1500 time bins, compute the min/max for each one of these bins,
and plot it as a fill between the min-line and max-line. Presto, you have
something that displays quickly, generates nice compact figure files, and looks
the same as the full line plot.
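The min/max binning trick described above can be sketched in a few lines. This is an illustration of the technique, not code from Winston; `minmax_bins` is a made-up helper name, and in practice the bin count would be derived from the output device's pixel width rather than fixed.

```julia
# Sketch of min/max binning: reduce a huge series to per-bin extrema.
function minmax_bins(y::AbstractVector, nbins::Integer)
    n = length(y)
    lo = Vector{Float64}(undef, nbins)
    hi = Vector{Float64}(undef, nbins)
    for b in 1:nbins
        i1 = div((b - 1) * n, nbins) + 1   # first index of bin b
        i2 = div(b * n, nbins)             # last index of bin b
        seg = view(y, i1:i2)
        lo[b] = minimum(seg)
        hi[b] = maximum(seg)
    end
    return lo, hi
end

y = sin.(range(0, 200pi; length = 10^6))   # stand-in for a huge timeseries
lo, hi = minmax_bins(y, 1500)
# Plot `hi` and `lo` as a filled band; at 1500 horizontal pixels it is
# visually indistinguishable from the full 10^6-point line plot.
```

The exported figure then contains 3000 vertices instead of 10^8, which is what keeps the EPS/PDF small.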
However, the 300dpi must come from the actual device that is being plotted to; otherwise, I worry 300dpi will get "hardcoded" as a magic number in the "fast" plotting code as a heuristic, and 10 years from now, when my display is 20000dpi, I'll miss things.
--
I haven't played with Chaco; it was mentioned many times at SciPy 2012. In your view, what are the strengths and weaknesses? I presume it's got some weaknesses or we probably wouldn't be having this conversation :-)
Why is interactivity so hard in an extension of ggplot? I can easily imagine things like:

ggplot(data, aes(x = Year, y = RepublicanVote)) + geom_line() + slider_bar(control = Region)

which produces an NYT-style line graph that controls the region of the USA being shown via a slider bar.
In general, I think the ggplot API could be simpler, but extensions for interactivity and animation seem very easy. Am I missing something?
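For what it's worth, the composition style in that example is easy to mimic in Julia. Here is a hedged sketch (every name here is hypothetical, not a real package): layers and interactive controls are plain objects composed with `+` onto a spec, which a backend would later interpret.

```julia
# Hypothetical ggplot-style spec objects composed with `+`.
abstract type SpecElement end
struct GeomLine <: SpecElement end
struct SliderBar <: SpecElement
    control::Symbol
end
struct PlotSpec
    data
    aes::Dict{Symbol,Symbol}
    elements::Vector{SpecElement}
end

ggplot(data; aes...) = PlotSpec(data, Dict(aes), SpecElement[])
Base.:+(p::PlotSpec, e::SpecElement) =
    PlotSpec(p.data, p.aes, push!(copy(p.elements), e))

data = (Year = [2000, 2004, 2008], RepublicanVote = [47.9, 50.7, 45.7])
p = ggplot(data; x = :Year, y = :RepublicanVote) + GeomLine() + SliderBar(:Region)
```

Building the spec is the easy part; the open question in this thread is whether a backend can honor an interactive element like the slider efficiently, which is where the real work lies.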
On 12/19/2012 12:17 PM, Tim Holy wrote:
Actually, now I'm wondering whether this is something that needs to be built
in at the lowest level, or whether it's more of a top-layer thing. Example
(one I was indeed already thinking about adding to Winston): I need to plot
timeseries. Take a timeseries 10^8 points long, naively plot it as a line,
export it as an EPS, and you get a ~1GB file. For some reason, journals tend to
balk at accepting such figure files.
...
Yes, this is exactly what I'm talking about, and "I know it's a research field in itself," though I can't point you to specific papers or software. I think Peter Wang is the one who would know most, as his company (http://continuum.io) specifically targets Big Data.
Yes, I exactly meant that the characteristic of "last-year" type plotting architectures is that you don't have this feedback all the way back to the data source from the zoom & resize.

This could be triggered simply by a plot function that does method dispatch on
a Range type for the first input (because that guarantees that the x-coordinate
is on an evenly-spaced grid). The only extra step is that when you zoom in on
such a plot, you need to recompute the bins for whatever range of x-axis is
included within your zoom region. So such objects would need zoom & resize
callbacks.
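That dispatch-plus-recompute idea could be sketched as follows. All names are hypothetical; `minmax_bins` is a small per-bin-extrema helper defined inline so the sketch is self-contained, and `binned_view` is what a zoom callback would call with the current x-limits.

```julia
# Per-bin (min, max) pairs for a slice of data.
function minmax_bins(y::AbstractVector, nbins::Integer)
    n = length(y)
    map(1:nbins) do b
        seg = view(y, div((b - 1) * n, nbins) + 1 : div(b * n, nbins))
        (minimum(seg), maximum(seg))
    end
end

# Because x is an AbstractRange, the grid is evenly spaced, so finding the
# visible slice is cheap; re-bin only that slice for the current zoom window.
function binned_view(x::AbstractRange, y::AbstractVector, xlo, xhi; nbins = 1500)
    i1 = max(1, searchsortedfirst(x, xlo))
    i2 = min(length(x), searchsortedlast(x, xhi))
    return minmax_bins(view(y, i1:i2), nbins)
end

x = range(0, 100; length = 10^6)
y = sin.(x)
bins = binned_view(x, y, 10.0, 20.0)   # a zoom callback would invoke this
```

Dispatching on `AbstractRange` for the first argument means the fast path applies automatically whenever the x-grid is regular, while irregular x-data would fall through to a generic method.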