--
--
--
--
By defining a set of standards for creating graphics in Julia, we can rely on the particular backend to generate output that is optimal for the platform you target. If I want to do web graphics I can use a d3-based backend, but if I want to do 3d graphics I could swap out the backend for something that handles 3d graphics.

Of course, you could argue that 3d graphics and web graphics require different grammars altogether, and it doesn't make sense to try to use one grammar that we force to apply to all. That's why I like the idea of having just one (or maybe a few) plot functions that take the data you want to plot, and then a dictionary of configuration options. It lets us define a standard interface while allowing each backend to require a unique set of options.

That said, Grammar of Graphics is an excellent (albeit voluminous) work, and I like a lot of the principles it outlines. To your point, Harlan and John, we could implement ggplot2 in Julia and call it a day, and it would be instantly familiar to most R users. However, it may not cover all the use cases we want to support.
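A minimal sketch of what that interface might look like, assuming a hypothetical API (none of these names exist in Julia today): one generic `plot` entry point that dispatches on a backend type, with a bag of options that each backend interprets for itself.

```julia
# Hypothetical sketch: one generic plot entry point, per-backend methods.
abstract type Backend end
struct D3Backend <: Backend end
struct GLBackend <: Backend end

# Each backend interprets only the options it understands.
function plot(b::D3Backend, x, y; opts...)
    o = Dict(opts)
    # A d3 backend might emit a JSON-ish spec for the browser.
    return "d3 spec: $(length(x)) points, color=$(get(o, :color, "black"))"
end

function plot(b::GLBackend, x, y; opts...)
    o = Dict(opts)
    # A GL backend might upload vertex buffers instead.
    return "gl scene: $(length(x)) points, antialias=$(get(o, :antialias, true))"
end

x = 1:10; y = rand(10)
spec = plot(D3Backend(), x, y; color = "steelblue")
```

The point of the sketch is that the call site stays the same while unrecognized options are simply ignored or rejected by the backend that receives them.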
--
One "plea" I would like to make is to also keep in mind the separation between "interactive research" graphics and "presentation" graphics.

I think plot() has to be kept fast, efficient, and "smart" for interactively plotting "large" (~1e6?) datasets, with quick zooming and resizing and reasonable axes with reasonable tick mark divisions, and if I save it as a PDF it looks "ok" -- not publication quality, but conference talk quality. In matlab, I find plot() and subplot() indispensable: I work in digital signal processing, and I can throw up a subplot(4,4) with each plot() having ~1e6 data points and it stays snappy, and I can zoom in and resize quickly to another monitor. Same with imagesc() for matrix data.

I understand the desire for layers and abstraction, for being able to plug in new code bases for new displays and data visualization research, and the importance of fine-grained parameters for publication graphics as well.

What I fear most is something like the Java Eclipse IDE in the early days: just launching it could bring my state-of-the-art workstation to its knees just displaying some code. The early matlab Java IDE was pretty painful as well; I still run matlab with no IDE out of habit (even though, last I checked, there is no Java dependency anymore and there is even a debugger (gasp)).

My only hope is that since Julia has a FAST, state-of-the-art JIT/REPL layer, the graphics can "keep up" and not bog down. I don't know how to do this, but it would help if there were some way to add interactive "unit" tests to the graphics layer, with multiple ~1e6 datasets, that test for "speed to display" and, more importantly, that interactive zooming and resizing stay smooth and usable.

perrin
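The "speed to display" test idea could be sketched roughly as follows. This is only an illustration of the testing pattern; `render` is a hypothetical stand-in for whatever draw call a real backend would expose, and the time budget is an assumed number to be tuned per platform.

```julia
# Hypothetical sketch of a "speed to display" regression test.
# `render` stands in for a real backend's draw call.
render(x, y) = sum(abs, y)          # placeholder for actual drawing work

x = range(0, 1; length = 1_000_000)
y = sin.(2pi .* 50 .* x)

render(x, y)                        # warm up the JIT first
t = @elapsed render(x, y)           # then time a full redraw

# A harness could fail the test if a redraw regresses past a budget:
budget_seconds = 0.5                # assumed budget, not a real standard
@assert t < budget_seconds
```

Interactivity (zoom/resize smoothness) is harder to assert automatically, but timing a full redraw of a ~1e6-point dataset at least catches gross regressions.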
--
--
--
--
--
Word? How about putting this in a google doc for people to comment in-line?
--
--
So, if you're wanting to redesign graphs (or graphics) in Julia, the right way
to do it is to start with:
2. Design the graph data structure
because this is how you bind together the different efforts to a common purpose.
--
Hi everyone,I just signed up to julia-dev so I could chime in on this thread. I'm a long-time Python dev working at Continuum Analytics. Previously at Enthought, I wrote Chaco. At Continuum, we are working on a new graphing system for Python (called Bokeh) that tries to take the best ideas of Grammar of Graphics, Protovis/d3, Matplotlib/MATLAB, and Chaco, and build it into a single stack that natively targets web-based display.
As you may have seen on HackerNews, we recently received some DARPA funding to pursue work on Blaze, Numba, and Bokeh, as part of the XData "big data" project. (I also heard that some Julia devs are also participating in the XData work...?)
The reason I am piping up here is because I think there is a lot of good that could result from us joining forces on the *graphics* layer.
There have been many great points raised in the thread, but the principal ones I would like to second are:
- Need to separate plot specification API from display interface ("backend")
- Different use cases will drive the API in different ways; Grammar of Graphics (and ggplot) is not the be-all and end-all of plot specification approaches, especially for users coming from a science and engineering plotting background. The learning curve, even for those with statistical graphics needs, is non-trivial. And interactivity, brushing, and the like are not really demonstrated with this model.
Some additional comments:
d3 (and protovis) are powerful approaches to specifying novel graphics, but I have not seen them scale to very large datasets. The learning curve is also non-trivial, and the fundamental goals of the project must be taken into account. d3 is very much designed for embedding in a programmable DOM. (Protovis was, in my opinion, a more infoviz-centric toolkit.)
For rich-client embedding or native apps, there are a huge number of potential backend options. Both Chaco and Matplotlib took the approach of abstracting a drawing layer over the combinatoric mess of GUI toolkits and 2D canvas libraries: {Qt, Wx, Tk, Cocoa, Quartz} x {OpenGL, GTK, toolkit-dependent painter, Anti-Grain Geometry (AGG) software rasterizer, Quartz, PDF, SVG, Cairo, etc.} This is a non-trivial undertaking and is always a mess. It is important to figure out early on what is really needed, and what are the desired use cases. If you can cut down on the number of platforms or types of rendering you want to do, it will save a lot of time that would otherwise be spent on creating yet another abstraction layer.
I think that having an "abstract representation of a graph" that the front-end can generate and that various backends can render is a nice idea, but perhaps very difficult to do in a way that does not either lock in specific classes of visualizations expressible by the front-end, or force the rendering backend to be too slow for interactive and large-data work. I'm not saying it's impossible; I just think it's hard to create this intermediate representation of a visual without baking in assumptions about what the backend rendering system can do in an optimal way.

I don't know how much bandwidth and resources you guys are devoting to tackling this "new plotting system" project, but I would love to be able to work together.
Most of the challenges in implementing a backend are not language specific.
The big question is one of scope & use cases: if it turns out that Bokeh is tackling a much different use case than Julia, then it's possible that we may not even be able to find a useful way to collaborate on a backend.
A front-end API would be interesting to talk about as well, because I do think that there is plenty of space for innovation in this regard, and I think that Python and Julia can speak similar vocabulary and grammar for the narrower purpose of defining visualizations over a dataset.
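To make the "abstract representation of a graph" idea concrete, here is a deliberately tiny sketch under assumed names (nothing here is an existing package): the front end builds a tree of mark objects, and a backend is just a set of render methods dispatched over the mark types. This is exactly the design whose risks are discussed above; the sketch shows the shape, not a solution to the performance concerns.

```julia
# Hypothetical intermediate representation: a figure is a list of marks.
abstract type Mark end
struct Line <: Mark
    x::Vector{Float64}
    y::Vector{Float64}
end
struct Points <: Mark
    x::Vector{Float64}
    y::Vector{Float64}
end
struct Figure
    marks::Vector{Mark}
end

# A "backend" is just a set of render methods over the mark types.
render(io::IO, m::Line)   = println(io, "line with $(length(m.x)) vertices")
render(io::IO, m::Points) = println(io, "$(length(m.x)) points")
function render(io::IO, f::Figure)
    for m in f.marks
        render(io, m)
    end
end

fig = Figure([Line(collect(0.0:0.1:1.0), rand(11)), Points(rand(5), rand(5))])
buf = IOBuffer()
render(buf, fig)
out = String(take!(buf))
```

A real backend would draw to a canvas instead of printing, but the dispatch structure (new mark type = new method per backend) is the part that matters for the discussion.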
--
--
Do you, or anyone else, happen to know what fraction of the GUI toolkits (Qt,
Wx, Tk, Cocoa, ... or at least the subset we end up deciding to support) allow
one to specify layout purely in terms of callbacks?
If so, then we might be
able to write a "cross-platform" Julia layout manager. If not, then it might
be a hard problem "simply" to specify layout in a way that can work with many
different underlying toolkit layout managers. If we can't even do that, then
settling on a few choices seems like the best approach.
--Tim
--
There have been many great points raised in the thread, but the principal ones I would like to second are:
- Need to separate plot specification API from display interface ("backend")

Agreed. The hard part here is getting the abstraction layer right. Sounds like your background could be quite valuable here.
{Qt, Wx, Tk, Cocoa, Quartz} x {OpenGL, GTK, toolkit-dependent painter, Anti-Grain Geometry (AGG) software rasterizer, Quartz, PDF, SVG, Cairo, etc.}
--
Something like https://code.google.com/p/chromiumembedded/?
(Note that I would hate to be stuck installing Chrom(e/ium) because Julia graphics end up with a NaCl/Pepper dependency; I'm quite happy with Firefox, and it would be great if we kept interactive graphics cross-browser if we go that way. Now, LLVM IR can be converted to JavaScript with Emscripten...)
Something like https://code.google.com/p/chromiumembedded/?
Yes! Exactly like that. Very promising looking.
(Note that I would hate to be stuck installing Chrom(e/ium) because Julia graphics end up with a NaCl/Pepper dependency; I'm quite happy with Firefox, and it would be great if we kept interactive graphics cross-browser if we go that way. Now, LLVM IR can be converted to JavaScript with Emscripten...)

I agree. I'm very anti NaCl/Pepper. For me, the embedded Chrome idea is about being able to use the same JavaScript graphics in an otherwise native app, rather than about being Chrome-specific. It would hopefully give the best of both worlds: the benefits of a native app, but sharing much code and logic with a web version. Just an idea.
--
--
--
--
--
--
--
... if we use a
browser/webkit, how much of the plotting can be written in Julia, vs. how much
needs to be written in JavaScript?
Actually, now I'm wondering whether this is something that needs to be built
in at the lowest level, or whether it's more of a top-layer thing. Example
(one I was indeed already thinking about adding to Winston): I need to plot
timeseries. Take a timeseries 10^8 points long, naively plot it as a line,
export it as an EPS, and you get a ~1GB file. For some reason, journals tend to
balk at accepting such figure files.
However, acknowledging that the graph will only get 5in of space at 300dpi,
there are really only 1500 distinct x values shown. Break your 10^8 data
points up into 1500 time bins, compute the min/max for each one of these bins,
and plot it as a fill between the min-line and max-line. Presto, you have
something that displays quickly, generates nice compact figure files, and looks
the same as the full line plot.
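The min/max binning trick described above can be sketched in a few lines. This is an illustration of the technique, not code from Winston; `minmax_bins` is a made-up helper name, and in practice the bin count would be derived from the output device's pixel width rather than fixed.

```julia
# Sketch of min/max binning: reduce a huge series to per-bin extrema.
function minmax_bins(y::AbstractVector, nbins::Integer)
    n = length(y)
    lo = Vector{Float64}(undef, nbins)
    hi = Vector{Float64}(undef, nbins)
    for b in 1:nbins
        i1 = div((b - 1) * n, nbins) + 1   # first index of bin b
        i2 = div(b * n, nbins)             # last index of bin b
        seg = view(y, i1:i2)
        lo[b] = minimum(seg)
        hi[b] = maximum(seg)
    end
    return lo, hi
end

y = sin.(range(0, 200pi; length = 10^6))   # stand-in for a huge timeseries
lo, hi = minmax_bins(y, 1500)
# Plot `hi` and `lo` as a filled band; at 1500 horizontal pixels it is
# visually indistinguishable from the full 10^6-point line plot.
```

The exported figure then contains 3000 vertices instead of 10^8, which is what keeps the EPS/PDF small.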
However, the 300dpi must come from the actual device that is being plotted to; otherwise, I worry 300dpi will get "hardcoded" as a magic number in the "fast" plotting code as a heuristic, and 10 years from now, when my display is 20000dpi, I'll miss things.
--
I haven't played with Chaco; it was mentioned many times at SciPy 2012. In your view, what are the strengths and weaknesses? I presume it's got some weaknesses or we probably wouldn't be having this conversation :-)
Why is interactivity so hard in an extension of ggplot? I can easily imagine things like:

ggplot(data, aes(x = Year, y = RepublicanVote)) + geom_line() + slider_bar(control = Region)

which produces an NYT-style line graph that controls the region of the USA being shown via a slider bar.
In general, I think the ggplot API could be simpler, but extensions for interactivity and animation seem very easy. Am I missing something?
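For what it's worth, the composition style in that example is easy to mimic in Julia. Here is a hedged sketch (every name here is hypothetical, not a real package): layers and interactive controls are plain objects composed with `+` onto a spec, which a backend would later interpret.

```julia
# Hypothetical ggplot-style spec objects composed with `+`.
abstract type SpecElement end
struct GeomLine <: SpecElement end
struct SliderBar <: SpecElement
    control::Symbol
end
struct PlotSpec
    data
    aes::Dict{Symbol,Symbol}
    elements::Vector{SpecElement}
end

ggplot(data; aes...) = PlotSpec(data, Dict(aes), SpecElement[])
Base.:+(p::PlotSpec, e::SpecElement) =
    PlotSpec(p.data, p.aes, push!(copy(p.elements), e))

data = (Year = [2000, 2004, 2008], RepublicanVote = [47.9, 50.7, 45.7])
p = ggplot(data; x = :Year, y = :RepublicanVote) + GeomLine() + SliderBar(:Region)
```

Building the spec is the easy part; the open question in this thread is whether a backend can honor an interactive element like the slider efficiently, which is where the real work lies.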
On 12/19/2012 12:17 PM, Tim Holy wrote:
Actually, now I'm wondering whether this is something that needs to be built
in at the lowest level, or whether it's more of a top-layer thing. Example
(one I was indeed already thinking about adding to Winston): I need to plot
timeseries. Take a timeseries 10^8 points long, naively plot it as a line,
export it as an EPS, and you get a ~1GB file. For some reason, journals tend to
balk at accepting such figure files.
...
Yes, this is exactly what I'm talking about, and "I know it's a research field in itself," though I can't point you to specific papers or software. I think Peter Wang is the one who would know most, as his company (http://continuum.io) specifically targets Big Data.
Yes, I exactly meant that the characteristic of "last-year" type plotting architectures is that you don't have this feedback all the way back to the data source from the zoom & resize.

This could be triggered simply by a plot function that does method dispatch on
a Range type for the first input (because that guarantees that the x-coordinate
is on an evenly-spaced grid). The only extra step is that when you zoom in on
such a plot, you need to recompute the bins for whatever range of x-axis is
included within your zoom region. So such objects would need zoom & resize
callbacks.
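That dispatch-plus-recompute idea could be sketched as follows. All names are hypothetical; `minmax_bins` is a small per-bin-extrema helper defined inline so the sketch is self-contained, and `binned_view` is what a zoom callback would call with the current x-limits.

```julia
# Per-bin (min, max) pairs for a slice of data.
function minmax_bins(y::AbstractVector, nbins::Integer)
    n = length(y)
    map(1:nbins) do b
        seg = view(y, div((b - 1) * n, nbins) + 1 : div(b * n, nbins))
        (minimum(seg), maximum(seg))
    end
end

# Because x is an AbstractRange, the grid is evenly spaced, so finding the
# visible slice is cheap; re-bin only that slice for the current zoom window.
function binned_view(x::AbstractRange, y::AbstractVector, xlo, xhi; nbins = 1500)
    i1 = max(1, searchsortedfirst(x, xlo))
    i2 = min(length(x), searchsortedlast(x, xhi))
    return minmax_bins(view(y, i1:i2), nbins)
end

x = range(0, 100; length = 10^6)
y = sin.(x)
bins = binned_view(x, y, 10.0, 20.0)   # a zoom callback would invoke this
```

Dispatching on `AbstractRange` for the first argument means the fast path applies automatically whenever the x-grid is regular, while irregular x-data would fall through to a generic method.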