Intermediate graph representation for D3 Layouts


Rich Morin

Apr 7, 2012, 9:14:36 PM
to d3...@googlegroups.com
Kai and I spent the afternoon noodling about ways to represent
graphs (ie, sets of nodes and edges) for use with D3 layouts.

This note tries to summarize some of our thinking. Please help
us to spot flaws in our analysis and design before we start
committing them to code!

-r


Background
==========

Each input data set will have a "data format" (DF), typically
based on the way the data was collected, obtained, etc. Each
layout will expect to see its own "layout format" (LF).

Somehow, we need to convert the data from DF to LF. The direct
solution is to create a function for each combination of data
and layout formats. Unfortunately, this solution scales very
poorly, requiring DF * LF conversion functions. (Bleah!)

To avoid this, we can define an intermediate format (IF) that
handles all of the expected variations in DF and LF. This lets
us create one function (DF->IF) for each input and one function
(IF->LF) for each layout.

Because the number of DFs and LFs can be expected to grow, this
approach has real scaling advantages. For example, if we have
50 DFs and 15 LFs:

The direct approach requires 750 (50 * 15) filters.
The indirect approach requires 65 (50 + 15) filters.
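To make the scaling argument concrete, here is a minimal JavaScript sketch of such a hub. It is an illustration under assumptions, not D3 code. The names `registerInput`, `registerLayout`, and `convert`, the format names, and the choice of `{nodes, links}` with index-valued endpoints as the IF are all hypothetical.

```javascript
// Hypothetical registry (not part of D3): one DF->IF function per data
// format and one IF->LF function per layout format.
const toIF = {};   // data format name -> function(raw) returning the IF
const fromIF = {}; // layout format name -> function(graph) returning layout input

function registerInput(name, fn) { toIF[name] = fn; }
function registerLayout(name, fn) { fromIF[name] = fn; }

// Every (data format, layout format) pair is served by composing two
// functions, so 50 DFs and 15 LFs need 65 functions rather than 750.
function convert(raw, dataFormat, layoutFormat) {
  const input = toIF[dataFormat];
  const output = fromIF[layoutFormat];
  if (!input || !output) {
    throw new Error("no converter for " + dataFormat + " -> " + layoutFormat);
  }
  return output(input(raw));
}

// Assumed IF: { nodes: [{ id }], links: [{ source, target }] }, index-valued ends.
registerInput("edge-pairs", function (pairs) {
  const indexById = new Map();
  const nodes = [];
  function lookup(id) {
    if (!indexById.has(id)) {
      indexById.set(id, nodes.length);
      nodes.push({ id: id });
    }
    return indexById.get(id);
  }
  const links = pairs.map(function (p) {
    return { source: lookup(p[0]), target: lookup(p[1]) };
  });
  return { nodes: nodes, links: links };
});

// Force-style layouts take the IF more or less as-is.
registerLayout("force", function (graph) { return graph; });

// Chord-style layouts want a square adjacency matrix.
registerLayout("matrix", function (graph) {
  const n = graph.nodes.length;
  const matrix = [];
  for (let i = 0; i < n; i++) matrix.push(new Array(n).fill(0));
  graph.links.forEach(function (l) { matrix[l.source][l.target] += 1; });
  return matrix;
});

console.log(convert([["a", "b"], ["b", "c"]], "edge-pairs", "matrix"));
```

Adding a 51st data format then means writing one more `registerInput` call, and every registered layout can consume it.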


Use cases
=========

Let's look at some graph representation use cases, based on Mike
Bostock's examples.


flare-imports.json
------------------

flare-imports.json, from Mike's Hive Plot demo, uses a list of
hashes to define a directed graph. Each hash defines a software
module, giving its name, size, and a list of modules it imports.

For example, the module "flare.animate.Transitioner" is imported
by the module "flare.analytics.cluster.AgglomerativeCluster":

nodes = [
  { "name": "flare.analytics.cluster.AgglomerativeCluster",
    "size": 3938,
    "imports": [ "flare.animate.Transitioner", ... ]
  }, ...
];

This format is easy for humans to use (eg, read, edit), but it
could be pretty inefficient in time and space (depending on the
JavaScript implementation and the size of the name strings).
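One way to pay the string cost only once is to index the modules up front and translate each `imports` name into an integer. This is a sketch, assuming the field names shown above; `importsToGraph` is a hypothetical name, and imports that name modules missing from the list are skipped. Links run from the importing module (source) to the imported module (target).

```javascript
// Convert a flare-imports style list ({ name, size, imports }) into
// { nodes, links } where links refer to nodes by zero-based index.
function importsToGraph(modules) {
  const indexByName = new Map();
  const nodes = modules.map(function (m, i) {
    indexByName.set(m.name, i);
    return { name: m.name, size: m.size };
  });

  const links = [];
  modules.forEach(function (m, i) {
    (m.imports || []).forEach(function (importedName) {
      // Each name string is looked up once here; later code uses integers.
      const j = indexByName.get(importedName);
      if (j !== undefined) links.push({ source: i, target: j });
    });
  });
  return { nodes: nodes, links: links };
}

const flareSample = [
  { name: "flare.analytics.cluster.AgglomerativeCluster", size: 3938,
    imports: ["flare.animate.Transitioner"] },
  { name: "flare.animate.Transitioner", imports: [] } // size omitted here
];
console.log(importsToGraph(flareSample));
```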


miserables.json
---------------

Mike has used the Les Miserables data set a few times, eg:

http://mbostock.github.com/protovis/ex/arc.html
http://mbostock.github.com/protovis/ex/miserables.js

miserables.js has some useful comments:

This file contains the weighted network of coappearances of
characters in Victor Hugo's novel "Les Miserables". Nodes
represent characters as indicated by the labels, and edges
connect any pair of characters that appear in the same
chapter of the book. The values on the edges are the number
of such coappearances. The data on coappearances were taken
from D. E. Knuth, The Stanford GraphBase: A Platform for
Combinatorial Computing, Addison-Wesley, Reading, MA (1993).

The group labels were transcribed from "Finding and evaluating
community structure in networks" by M. E. J. Newman and M.
Girvan. [http://arxiv.org/pdf/cond-mat/0308217.pdf]

var miserables = {
  nodes: [ { nodeName: "Myriel",
             group: 1 }, ... ],

  links: [ { source: 1,
             target: 0,
             value: 1 }, ... ]
}


miserables.json, from Mike's "force" example, has basically the
same format. A top-level hash contains a pair of arrays (nodes
and links) which define the graph. The extract below tells us
that Napoleon (source 1) appears once (value 1) with Myriel
(target 0), and that Napoleon and Myriel are in the same
community group (1).

{
  "nodes": [ { "name": "Myriel", "group": 1 },
             { "name": "Napoleon", "group": 1 }, ... ],

  "links": [ { "source": 1, "target": 0, "value": 1 }, ... ]
}


Discussion
==========

A generalized version of the Flare format might look something like:

{
"...this_id...": [ [ "...that_id...", ... ], {...meta...} ], ...
}

The Les Miz format eliminates the strings, in favor of index values.
It might therefore have efficiency advantages. A generalized version
of this format might look something like:

{
"nodes": [ [ "...this_id...", {...meta...} ], ... ],

"links": [ [ this_ndx, that_ndx, {...meta...} ], ... ]
}
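The two generalized forms can be converted into each other, which also shows where they differ: the id-keyed form has no place for link metadata. This is a sketch using arrays laid out exactly as above; the function names are hypothetical.

```javascript
// Generalized Flare form:   { id: [ [neighborId, ...], meta ], ... }
// Generalized Les Miz form: { nodes: [ [id, meta], ... ], links: [ [i, j, meta], ... ] }

function keyedToIndexed(keyed) {
  const ids = Object.keys(keyed); // insertion order for non-numeric string keys
  const indexOf = new Map(ids.map(function (id, i) { return [id, i]; }));
  const nodes = ids.map(function (id) { return [id, keyed[id][1] || {}]; });
  const links = [];
  ids.forEach(function (id, i) {
    keyed[id][0].forEach(function (other) {
      // Neighbors that are not themselves keys are skipped.
      if (indexOf.has(other)) links.push([i, indexOf.get(other), {}]);
    });
  });
  return { nodes: nodes, links: links };
}

function indexedToKeyed(indexed) {
  const keyed = {};
  indexed.nodes.forEach(function (n) { keyed[n[0]] = [[], n[1]]; });
  indexed.links.forEach(function (l) {
    // The keyed form has no slot for link metadata, so l[2] is dropped.
    keyed[indexed.nodes[l[0]][0]][0].push(indexed.nodes[l[1]][0]);
  });
  return keyed;
}

const keyedSample = { Myriel: [[], { group: 1 }], Napoleon: [["Myriel"], { group: 1 }] };
console.log(JSON.stringify(keyedToIndexed(keyedSample)));
```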


Neither of these formats has any direct support for N-ary graphs.
So, for example, they can't represent statements such as "Rich
drove his Scion to the San Bruno BART station on Saturday".

However, various diagramming techniques use artificial nodes to
resolve this deficiency, eg:

Conceptual Graphs (John Sowa)
Object-Role Modeling (Terry Halpin)
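In that artificial-node style, the statement becomes an "event" node with one binary link per role, which keeps the graph within a plain nodes + links representation. This is a sketch; the role names (`agent`, `vehicle`, `destination`, `time`) and the `participants` helper are illustrative assumptions, not Sowa's or Halpin's notation.

```javascript
// Reify the 4-ary statement as an artificial "drive" node plus binary
// links, one per role. Role names are hypothetical.
const statement = {
  nodes: [
    { id: "drive#1", kind: "event" }, // artificial node standing for the statement
    { id: "Rich", kind: "person" },
    { id: "Scion", kind: "vehicle" },
    { id: "San Bruno BART station", kind: "place" },
    { id: "Saturday", kind: "time" }
  ],
  links: [
    { source: 0, target: 1, role: "agent" },
    { source: 0, target: 2, role: "vehicle" },
    { source: 0, target: 3, role: "destination" },
    { source: 0, target: 4, role: "time" }
  ]
};

// Recover the participants of the n-ary relation by following its role links.
function participants(graph, eventIndex) {
  const out = {};
  graph.links.forEach(function (l) {
    if (l.source === eventIndex) out[l.role] = graph.nodes[l.target].id;
  });
  return out;
}

console.log(participants(statement, 0));
```

Because every link is still an ordinary `{source, target}` pair, either of the formats above can carry it unchanged.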

Ok, now for feedback. Are there any obvious problems with either
of these formats? Are there any reasons to prefer one of them (or
some completely different format)?

Inquiring minds need to know. (ducks :-)

--
http://www.cfcl.com/rdm Rich Morin
http://www.cfcl.com/rdm/resume r...@cfcl.com
http://www.cfcl.com/rdm/weblog +1 650-873-7841

Software system design, development, and documentation


Mike Bostock

Apr 7, 2012, 9:39:46 PM
to d3...@googlegroups.com
As a general principle, D3 tries to make minimal requirements on the
input data format. The goal is that users should be able to represent
data however they like.

Likewise, all of the layouts (perhaps with the exception of the stack
layout, which allows you to override the `out` accessor) have
well-defined output formats. This is useful for documentation purposes
(and understanding), and allows greater code reuse. Flexible input,
strict output.

To allow layouts to understand arbitrary input data, most D3 layouts
provide accessor functions. For example, hierarchy layouts have a
`children` accessor that is used to retrieve the array of child nodes
for each internal node, and a `value` accessor that returns the
quantitative value for leaf nodes.

https://github.com/mbostock/d3/wiki/Hierarchy-Layout
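The accessor pattern can be illustrated with a toy layout that never assumes property names. This is a hypothetical sketch, not D3's hierarchy implementation: `toyHierarchy` wraps each datum in a new node object, while the getter/setter style of `.children()` and `.value()` mirrors how D3 exposes accessors. The tree data is made up.

```javascript
// A toy "hierarchy" that reads its input only through accessors.
function toyHierarchy() {
  let children = function (d) { return d.children; };
  let value = function (d) { return d.value; };

  function layout(root) {
    const out = [];
    function visit(d, parent, depth) {
      const node = { data: d, parent: parent, depth: depth };
      out.push(node);
      const kids = children(d);
      if (kids && kids.length) {
        // Internal nodes get the sum of their children's values.
        node.value = 0;
        kids.forEach(function (k) { node.value += visit(k, node, depth + 1).value; });
      } else {
        node.value = +value(d) || 0; // leaves use the value accessor
      }
      return node;
    }
    visit(root, null, 0);
    return out; // nodes in pre-order
  }

  // Getter/setter accessors: call with no argument to read, with a function to set.
  layout.children = function (f) { if (!arguments.length) return children; children = f; return layout; };
  layout.value = function (f) { if (!arguments.length) return value; value = f; return layout; };
  return layout;
}

// Input whose shape the toy layout was never told about: "kids" and "size".
const tree = { name: "root", kids: [{ name: "a", size: 3 }, { name: "b", kids: [{ name: "c", size: 4 }] }] };
const nodes = toyHierarchy()
  .children(function (d) { return d.kids; })
  .value(function (d) { return d.size; })(tree);
console.log(nodes.map(function (n) { return n.data.name + ":" + n.value; }));
```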

Likewise, you have components such as D3's shape generators that are
agnostic about the input format and can be completely customized using
accessors. Consider the arc generator, for example:

https://github.com/mbostock/d3/wiki/SVG-Shapes#wiki-arc

In effect, D3 uses these accessors to perform an implicit map from an
arbitrary user-defined representation to a standard representation. In
the case of layouts, the standard representation can then be decorated
with properties computed by the layout. Or in the case of shape
generators, the standard representation is used to compute attributes
(path data) and never exposed externally.

Even with the ability to override accessors, some assumptions must
still be made regarding the input format. The force layout, for
example, doesn't currently allow you to override the source and target
accessor for link objects—those are required to either be references
to nodes, or zero-based indexes that are converted to node references
upon initialization. That said, force layouts do use accessors for
link strength, link distance, charge strength, etc.

The input assumptions of the force layout are as follows:

* there's an array of nodes
* there's an array of links
* nodes are objects
* links are objects
* links have source and target properties
* source and target are either a node index or reference

That seems to be a smaller set of assumptions than the "generalized
versions" you described. It's still useful to establish conventions,
but I think it's nice if people can distinguish conventions from
requirements.

The force layout's output format is:

* node.index
* node.x
* node.y
* node.px
* node.py
* node.fixed
* node.weight

And, in some cases:

* link.source (when converting from index)
* link.target (when converting from index)
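The index-to-reference step can be sketched as follows. This is not D3's source: `initializeForceInput` and its error check are assumptions. Counting attached links into `weight` follows the wiki's description of `node.weight` as the number of associated links; the position fields (`x`, `y`, `px`, `py`) and `fixed` are left to the simulation.

```javascript
// Sketch of the force layout's input contract: nodes and links are arrays
// of objects; link.source/target are zero-based indexes or node references.
function initializeForceInput(nodes, links) {
  nodes.forEach(function (node, i) {
    node.index = i;
    node.weight = 0;
  });
  links.forEach(function (link) {
    // Convert index-valued endpoints into node references.
    if (typeof link.source === "number") link.source = nodes[link.source];
    if (typeof link.target === "number") link.target = nodes[link.target];
    if (!link.source || !link.target) throw new Error("link endpoint out of range");
    // Weight counts the links attached to each node.
    link.source.weight++;
    link.target.weight++;
  });
}

const nodes = [{ name: "Myriel", group: 1 }, { name: "Napoleon", group: 1 }];
const links = [{ source: 1, target: 0, value: 1 }];
initializeForceInput(nodes, links);
console.log(links[0].source.name, "->", links[0].target.name); // Napoleon -> Myriel
```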

Hierarchy layouts have output formats too, such as setting the
`parent` node reference and the `value` for internal nodes. But, it's
nice to try to keep these output formats small, as we'd like to avoid
colliding with other meta data users ascribe to nodes.

Mike

Kai Chang

Apr 7, 2012, 10:59:46 PM
to d3...@googlegroups.com
Here's what I've managed to synthesize from the discussion:

https://github.com/d3/d3-plugins/tree/master/d3/data.graph

d3.data.graph accepts data in either the matrix (chord layout) or list
of links (force layout) formats. It stores the data internally as a
list of nodes and links. It needs a bit more work before it can handle
state for both the chord and force layouts.

Another structure to target with pack/unpack would be this:

mbostock.github.com/d3/data/flare.json

The end goal would be users can import data in any one of these data
structures, and output the correct format for any d3.js graph layout
or visualization.
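As a sketch of what pack/unpack might look like for that nested structure, assuming each node has a `name`, internal nodes have `children`, and leaves have a `size` (the function names are hypothetical; the one leaf below reuses the AgglomerativeCluster size quoted earlier):

```javascript
// Flatten a nested { name, children, size } tree into nodes + links with
// zero-based indexes. Links run parent -> child; nodes are in pre-order.
function treeToGraph(root) {
  const nodes = [];
  const links = [];
  (function visit(d, parentIndex) {
    const i = nodes.length;
    nodes.push({ name: d.name, size: d.size });
    if (parentIndex !== null) links.push({ source: parentIndex, target: i });
    (d.children || []).forEach(function (c) { visit(c, i); });
  })(root, null);
  return { nodes: nodes, links: links };
}

// Rebuild the nested form. Assumes every node has at most one incoming
// link (a tree), and that link order gives the order of children.
function graphToTree(graph, rootIndex) {
  const copies = graph.nodes.map(function (n) { return { name: n.name }; });
  graph.links.forEach(function (l) {
    const parent = copies[l.source];
    (parent.children || (parent.children = [])).push(copies[l.target]);
  });
  graph.nodes.forEach(function (n, i) {
    // Sizes are restored on leaves only.
    if (!copies[i].children && n.size !== undefined) copies[i].size = n.size;
  });
  return copies[rootIndex];
}

const flareTree = { name: "flare", children: [
  { name: "analytics", children: [
    { name: "cluster", children: [{ name: "AgglomerativeCluster", size: 3938 }] }
  ] }
] };
const flat = treeToGraph(flareTree);
console.log(flat.links);
```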

Filtering and graph traversal would make sense to include as well.

Mike Bostock

Apr 7, 2012, 11:15:25 PM
to d3...@googlegroups.com
> https://github.com/d3/d3-plugins/tree/master/d3/data.graph

I see what you're getting at now. Yeah, it would be nice if the chord
layout and force layout were more interoperable in terms of the input
representation.

Nice work on the plugin. I wonder if it would be simpler as stateless
conversion methods, though. For example, consider:

d3.graph.nodes = function(matrix) {
  return […]; // array of {index: i} objects, perhaps?
};

d3.graph.links = function(matrix) {
  return […]; // array of {source: i, target: j}, perhaps?
};

d3.graph.matrix = function(nodes, links) {
  return [[…], …]; // two-dimensional array
};
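One possible fill-in of these stateless conversions, as a sketch. Assumptions: a plain `graph` object stands in for the proposed `d3.graph` namespace so the code runs without D3, `matrix[i][j]` holds the weight of the link from node i to node j, zero means no link, and links carry that weight in a `value` property.

```javascript
const graph = {}; // stand-in for the proposed d3.graph namespace

// One node object per matrix row.
graph.nodes = function (matrix) {
  return matrix.map(function (row, i) { return { index: i }; });
};

// One link per non-zero cell, keeping the cell as the link's value.
graph.links = function (matrix) {
  const links = [];
  matrix.forEach(function (row, i) {
    row.forEach(function (value, j) {
      if (value) links.push({ source: i, target: j, value: value });
    });
  });
  return links;
};

// Back to a square matrix; a link without a value counts as 1.
graph.matrix = function (nodes, links) {
  const n = nodes.length;
  const matrix = [];
  for (let i = 0; i < n; i++) matrix.push(new Array(n).fill(0));
  links.forEach(function (l) {
    // Accept either index-valued or reference-valued endpoints.
    const i = typeof l.source === "number" ? l.source : nodes.indexOf(l.source);
    const j = typeof l.target === "number" ? l.target : nodes.indexOf(l.target);
    matrix[i][j] += l.value === undefined ? 1 : l.value;
  });
  return matrix;
};

console.log(graph.links([[0, 2], [1, 0]]));
```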

Mike

Rich Morin

Apr 8, 2012, 12:06:29 AM
to d3...@googlegroups.com
Clearly, D3 should make minimal requirements on the input data format.
I also understand that accessor functions can perform mappings from
a wide range of input formats to data arrays, as used by layouts.

However, I'm pretty sure that I could come up with input data formats
that would not play nicely with the accessor function approach (eg,
requiring an expensive lookup for each data value).

Also, I don't see any consistent common interface (ie, "well-defined
output format") in the examples I discussed. Lacking such an interface,
several things become more difficult than they might otherwise be, eg:

* If I want to feed the Flare data to the force-directed layout
(or the Les Miz data to the Hive Plot), I'll need different
sets of accessor functions for each data/layout combination.

* I see no easy way to write generalized data filters (eg, to
select sub-graphs) that can work with all combinations of
input data and layouts.

* In the Chord layout that Kai was showing me, the layout data
is stored in a (non-sparse) array. Filtering out nodes and
edges is far from trivial in this format, whereas it could be
trivial in an intermediate format.
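With index-based nodes and links, selecting a sub-graph amounts to filtering nodes, dropping dangling links, and renumbering endpoints. This is a sketch; `subgraph` is a hypothetical helper and the example data is made up.

```javascript
// Keep the nodes for which keep(node, i) is true, drop links that touch a
// removed node, and renumber the surviving endpoints. Input is not modified.
function subgraph(graph, keep) {
  const newIndex = new Array(graph.nodes.length).fill(-1);
  const nodes = [];
  graph.nodes.forEach(function (node, i) {
    if (keep(node, i)) {
      newIndex[i] = nodes.length;
      nodes.push(node);
    }
  });
  const links = [];
  graph.links.forEach(function (l) {
    const s = newIndex[l.source];
    const t = newIndex[l.target];
    if (s >= 0 && t >= 0) links.push(Object.assign({}, l, { source: s, target: t }));
  });
  return { nodes: nodes, links: links };
}

// Example: keep only group 1 from a small made-up graph.
const demo = {
  nodes: [{ name: "a", group: 1 }, { name: "b", group: 2 }, { name: "c", group: 1 }],
  links: [{ source: 0, target: 1 }, { source: 2, target: 0, value: 3 }]
};
console.log(subgraph(demo, function (d) { return d.group === 1; }));
```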

In summary, an intermediate format seems to solve several problems.
It's quite possible, however, that there is a cleaner approach (eg,
a mixture of intermediate data formats and accessor functions).

Let's work toward finding such an approach, while satisfying all of
the use cases that we feel are important.

-r

Kai Chang

Apr 8, 2012, 1:17:03 AM
to d3...@googlegroups.com
I'll add a bit more functionality and create some examples to motivate
the stateful approach. Currently with the chord diagram, I use the
difference in data space to create interpolators for the transition.
There are a few ambiguities when adding/removing nodes though.

For instance, consider going from one 3x3 matrix to another 3x3
matrix. What if one group had been removed and another added? A data
model could resolve this by having the chart listen to remove and then
add events (in that order), so that charts could update properly:

var graph = d3.data.graph().matrix(my_matrix);
graph.on('add.node', function(entering_nodes) {
  chart.add(entering_nodes);
});
graph.on('remove.node', chart.remove); // without an unnecessary wrapping function
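An ambiguity like the 3x3 case goes away if each node carries an identity key: comparing keys (similar in spirit to the key function passed to D3's `selection.data`) shows which nodes left and which arrived. Below is a minimal sketch of such a model. `graphModel`, the `id` field, and the plain `add`/`remove` event names (rather than Kai's namespaced `add.node`) are assumptions, not the plugin's API.

```javascript
// Minimal event-firing model: setting a new node list fires "remove" for
// departed nodes before "add" for new ones, matched by the id key.
function graphModel() {
  let nodes = [];
  const listeners = { add: [], remove: [] };

  const model = {
    on: function (type, fn) { listeners[type].push(fn); return model; },
    nodes: function (next) {
      if (!arguments.length) return nodes;
      const before = new Set(nodes.map(function (d) { return d.id; }));
      const after = new Set(next.map(function (d) { return d.id; }));
      const exiting = nodes.filter(function (d) { return !after.has(d.id); });
      const entering = next.filter(function (d) { return !before.has(d.id); });
      nodes = next;
      if (exiting.length) listeners.remove.forEach(function (fn) { fn(exiting); });
      if (entering.length) listeners.add.forEach(function (fn) { fn(entering); });
      return model;
    }
  };
  return model;
}

// Two same-sized states in which one group was swapped for another.
const log = [];
const model = graphModel()
  .on("remove", function (d) { log.push("remove " + d.map(function (n) { return n.id; })); })
  .on("add", function (d) { log.push("add " + d.map(function (n) { return n.id; })); });
model.nodes([{ id: "a" }, { id: "b" }, { id: "c" }]);
model.nodes([{ id: "a" }, { id: "b" }, { id: "d" }]);
console.log(log); // [ 'add a,b,c', 'remove c', 'add d' ]
```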

There may be a better solution with data-binding. I'm still a bit
fuzzy on how binding data works with layouts. The above pattern is
inspired by Backbone's models, which fire events when data is
modified. So that's my idea for the d3.data namespace: event-firing
stateful models that follow the reusable charts spec. Here are two
other libraries I'm drawing inspiration from:

https://github.com/tinkerpop/gremlin/wiki/Basic-Graph-Traversals
http://substance.io/michael/data-js

Gremlin's graph traversals and filter syntax are particularly appealing
to me. I watched Marko use Gremlin interactively at a meetup last
year, and was stunned by how easily he explored a subset of dbpedia.

For now though, pure functions converting between the different formats
would be much less complex. I'm not sure about this though:

d3.graph.matrix = function(nodes, links) {
  return [[…], …]; // two-dimensional array
};

In the chord layout, chord.matrix expects a matrix. I think any matrix
method should always take/return a matrix.

Mike Bostock

Apr 8, 2012, 1:28:46 AM
to d3...@googlegroups.com
> In the chord layout, chord.matrix expects a matrix. I think any matrix
> method should always take/return a matrix.

Not sure I understand. I was intending my example d3.graph.matrix to
be a function which converted from a nodes + links (adjacency list)
representation to a matrix representation. By "two-dimensional array"
I mean a "matrix".

Mike

Kai Chang

Apr 9, 2012, 7:21:40 AM
to d3...@googlegroups.com
I've exposed the conversion functions so they can be used without the
stateful part of the plugin, and renamed it d3.graph:

https://github.com/d3/d3-plugins/tree/master/d3/graph

I've also added the concept of a traversal, which is analogous to d3's
selections. Traversals could be used for getting a subgraph to bind to
a visualization, or to bind data to the graph itself.

I need to do a bit more research before implementing traversals.
Here's a resource I found on the topic:

http://opendatastructures.org/versions/edition-0.1d/ods-java/node59.html
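A traversal of the kind described, here a breadth-first search that returns the sub-graph within a fixed number of hops, might look like this. It is a sketch only: `traverse` is a hypothetical name, links are treated as undirected, and nothing here reflects the plugin's eventual traversal API.

```javascript
// Breadth-first traversal from `start` over index-based links, returning the
// sub-graph within `maxDepth` hops. Links are treated as undirected.
function traverse(graph, start, maxDepth) {
  const adjacency = graph.nodes.map(function () { return []; });
  graph.links.forEach(function (l) {
    adjacency[l.source].push(l.target);
    adjacency[l.target].push(l.source);
  });

  const depth = new Map([[start, 0]]); // node index -> hops from start
  const queue = [start];
  while (queue.length) {
    const i = queue.shift();
    if (depth.get(i) === maxDepth) continue; // do not expand past the limit
    adjacency[i].forEach(function (j) {
      if (!depth.has(j)) {
        depth.set(j, depth.get(i) + 1);
        queue.push(j);
      }
    });
  }

  // Keep reached nodes in their original order, plus the links among them,
  // renumbered to the new positions.
  const kept = [];
  graph.nodes.forEach(function (_, i) { if (depth.has(i)) kept.push(i); });
  const newIndex = new Map(kept.map(function (i, k) { return [i, k]; }));
  return {
    nodes: kept.map(function (i) { return graph.nodes[i]; }),
    links: graph.links
      .filter(function (l) { return newIndex.has(l.source) && newIndex.has(l.target); })
      .map(function (l) {
        return Object.assign({}, l, { source: newIndex.get(l.source), target: newIndex.get(l.target) });
      })
  };
}

const chain = {
  nodes: [{ name: "a" }, { name: "b" }, { name: "c" }, { name: "d" }],
  links: [{ source: 0, target: 1 }, { source: 1, target: 2 }, { source: 2, target: 3 }]
};
console.log(traverse(chain, 1, 1).nodes.map(function (n) { return n.name; })); // [ 'a', 'b', 'c' ]
```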

Kai Chang

Apr 11, 2012, 1:35:36 PM
to Kai Chang, d3...@googlegroups.com
Found a set of slides by Marko on graph representations, including a json adjacency list, and traversals:

Fan Mongxie

Apr 11, 2012, 2:36:14 PM
to d3...@googlegroups.com
Hi Kai,

There is a precedent for the successful use of a "grammar of graphics", which you may know: the ggplot2 R package (http://had.co.nz/ggplot2/), developed by Hadley Wickham.

The ggplot2 package builds upon the Grammar of Graphics, a "reference" book and (beyond that) a framework, from Leland Wilkinson (http://www.cs.uic.edu/~wilkinson/).
This grammar tells us that a statistical graphic is a mapping from data to aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars). The plot may also contain statistical transformations of the data and is drawn on a specific coordinate system. Faceting (what would be called "small multiples" in Edward Tufte's language, or trellis graphics according to Cleveland) can be used to generate the same plot for different subsets of the dataset.

I'm mentioning ggplot2 because I have the impression that it could be a great source of inspiration if we (the d3 community) want to progress in the direction of re-usable charts.

There are many aspects of the ggplot2 package setup and usage which would not apply to d3, and I'm not saying that we should translate ggplot2 into a d3 format. But it looks like this idea of a "grammar of graphics" comes close to what Bob, you, and others are getting at ... I think.

Maybe what could be interesting is to seek for advice from Hadley or Leland regarding re-usable charts, as they have already walked this path.

Just my two cents ...
Fan



Kai Chang

Apr 11, 2012, 4:56:55 PM
to d3...@googlegroups.com
This discussion is on a subtly different topic than charts. We're talking about graph data structures. Currently various graph json structures are used in the d3 examples for the force directed layout, hive plot layout, chord layout, matrix plot, etc.

The idea here is to break out these data structures, and provide functions to transform graph data into the form convenient for a particular visualization. For example, the chord layout uses an adjacency matrix, while the force layout uses a list of nodes and links (usually called vertices and edges in the literature).

There are several design issues to think about, but both node and link values can definitely be any JavaScript data structure you like (probably an object). I'm even looking into hyperedges, which may link any number of nodes rather than a single source and target.

This plugin might include utilities for analyzing graphs, such as computing strongly connected components:


The result of the computation could then be mapped to an aesthetic attribute, such as color to indicate a group.
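One standard way to compute strongly connected components is Tarjan's algorithm. The sketch below labels each node with a component id, which can then be stored as a `group` property and fed to a color scale. The function name and example graph are illustrative assumptions.

```javascript
// Tarjan's algorithm over index-based directed links. Returns an array
// mapping each node index to a component id. Recursive for brevity; very
// deep graphs would need an explicit stack.
function stronglyConnectedComponents(nodeCount, links) {
  const adjacency = [];
  for (let i = 0; i < nodeCount; i++) adjacency.push([]);
  links.forEach(function (l) { adjacency[l.source].push(l.target); });

  const index = new Array(nodeCount).fill(-1); // discovery order, -1 = unvisited
  const low = new Array(nodeCount).fill(0);    // smallest index reachable via the stack
  const onStack = new Array(nodeCount).fill(false);
  const component = new Array(nodeCount).fill(-1);
  const stack = [];
  let counter = 0;
  let componentCount = 0;

  function strongConnect(v) {
    index[v] = low[v] = counter++;
    stack.push(v);
    onStack[v] = true;
    adjacency[v].forEach(function (w) {
      if (index[w] === -1) {
        strongConnect(w);
        low[v] = Math.min(low[v], low[w]);
      } else if (onStack[w]) {
        low[v] = Math.min(low[v], index[w]);
      }
    });
    if (low[v] === index[v]) {
      // v roots a component: pop it and every node above it on the stack.
      let w;
      do {
        w = stack.pop();
        onStack[w] = false;
        component[w] = componentCount;
      } while (w !== v);
      componentCount++;
    }
  }

  for (let v = 0; v < nodeCount; v++) if (index[v] === -1) strongConnect(v);
  return component;
}

// Two cycles (a <-> b, c <-> d) joined by a one-way link b -> c.
const demoNodes = [{ name: "a" }, { name: "b" }, { name: "c" }, { name: "d" }];
const demoLinks = [{ source: 0, target: 1 }, { source: 1, target: 0 },
  { source: 1, target: 2 }, { source: 2, target: 3 }, { source: 3, target: 2 }];
const groups = stronglyConnectedComponents(demoNodes.length, demoLinks);
demoNodes.forEach(function (n, i) { n.group = groups[i]; }); // e.g. for a color scale
console.log(groups);
```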

Rich Morin

Apr 11, 2012, 7:07:10 PM
to d3...@googlegroups.com
On Apr 11, 2012, at 11:36, Fan Mongxie wrote:
> There is a precedent in successful use of a "grammar of
> graphics", which you may know, in the so-called ggplot2
> R package (http://had.co.nz/ggplot2/), developed by
> Hadley Wickham.


This sounds interesting, so I did a bit of looking into it.
Here are some reactions and speculation, followed by some
resources that I found useful.

-r


Reactions:

D3 needs a flexible, extensible set of libraries to manipulate
data, create different kinds of charts, etc. Work is proceeding
on a variety of fronts:

* D3 and its examples are under active development.

* Efforts are under way to refactor working D3 examples into
driver scripts and (abstracted, generalized) libraries.

* The work on graphs started with some example data, but Kai
and I abstracted that a bit. Kai is now implementing Real
Code (TM) to do data conversion, add accessors, etc.

However, there is nothing to keep any of us from looking into
alternative approaches (eg, based on ggplot2) if we wish.


ggplot2 is basically a chart creation library for R, based on a
"Grammar of Graphics". So, it is likely to be nicely organized,
general, and theoretically-grounded. It also comes with a user
community, examples, reference implementation, documentation, etc.

So, it may make sense to use the Grammar as a testbed and a way
of organizing and naming (at least some) libraries. In any case,
it can't hurt to give the Grammar a fair evaluation.


I haven't made a detailed comparison, but I believe that D3, SVG,
and the Grammar share many concepts and capabilities. Looking
over the ggplot2 examples, I didn't notice anything that seems to
be out of reach for D3 and SVG. So, a plausible fit.

That said, there appear to be some real differences:

* ggplot2 can take advantage of R's statistical tools, etc.
D3 (unless used as a front-end to R) cannot.

* The D3/SVG platform has features (eg, data manipulation,
some geometric objects, interactivity) that ggplot2 lacks.


Speculation:

Here's an entirely speculative scenario:

* Have ggplot2 generate JavaScript-friendly serializations of
plotting requests (eg, command scripts and supporting data).

* Using D3, accept and implement the plotting requests.

* Create a CI framework, driven by ggplot2 examples and tests.

Are there any R and/or ggplot2 enthusiasts who would like to
give some of this a try?

Resources:

ggplot2: Elegant Graphics for Data Analysis (Use R!)
http://www.amazon.com/dp/B0041KLFRW

The Grammar of Graphics
http://www.amazon.com/dp/0387245448


http://had.co.nz/ggplot2
http://had.co.nz/ggplot2/resources/2007-past-present-future.pdf
http://had.co.nz/ggplot2/resources/2007-vanderbilt.pdf

http://cran.r-project.org/web/packages/ggplot2/index.html
http://cran.r-project.org/web/packages/ggplot2/ggplot2.pdf

Nate Agrin

Apr 11, 2012, 7:34:05 PM
to d3...@googlegroups.com
Have you checked out http://polychart.com/js#about ?

Rich Morin

Apr 11, 2012, 7:51:22 PM
to d3...@googlegroups.com
On Apr 11, 2012, at 16:34, Nate Agrin wrote:

> Have you checked out http://polychart.com/js#about ?

I hadn't seen that. It would be lovely to take advantage of this work,
but the licensing is a poison pill for any sort of inclusion in D3:

For personal use, Polychart is licensed under Creative Commons
Attribution-NonCommercial. This means that Polychart is free for
personal, academic, and non-profit use.

We also provide a licensing options for commercial use. Since
Polychart is still in active development, our current licensing
package will allow your company or organization to use all
versions of Polychart up until version 1.0 at a discounted price.

-- http://polychart.com/js#license

-r

Nate Agrin

Apr 11, 2012, 8:00:59 PM
to d3...@googlegroups.com
Yeah, the license is the biggest issue for us.

I've done a fair amount of investigating into ggplot2, read the book
and much of the R source. It's a really good framework for breaking
down graphics into reusable pieces. But, it doesn't explore any kind
of interactivity or transitions. It's also fairly static when it comes
to things like legend generation, so it depends what you're looking
for out of a graphing library.

-N

Fan Mongxie

Apr 12, 2012, 5:51:51 AM
to d3...@googlegroups.com
Hi Nate, Rich,

A couple of comments:

1/ From the last webinar given by Hadley on the future of ggplot2, hosted by Revolution (http://blog.revolutionanalytics.com/2012/01/hadley-wickham-goes-behind-the-scenes-on-ggplot2.html), it seems like he is getting more and more into html stuff, if not interactive and animated graphics.
Actually, Hadley recently released a new R package called 'httr' which, if I'm not mistaken, allows you to push R plots into your website.

2/ If my recollection is correct, Hadley has already contacted Mike in the past ... so I'm pretty sure the ggplot2 team is keeping an eye on how d3.js is evolving ... and this is not to mention polychart.com.

3/ I think d3.js goes beyond being a 'statistical graphics library'. It offers more flexibility, I would say, which is of course both an advantage and a burden. So again, my initial proposal was to "seek for advice" from Hadley and/or Leland, rather than to copy/paste :)

Cheers
F



Hadley Wickham

Apr 16, 2012, 9:38:33 AM
to Rich Morin, d3...@googlegroups.com
> I haven't made a detailed comparison, but I believe that D3, SVG,
> and the Grammar share many concepts and capabilities.  Looking
> over the ggplot2 examples, I didn't notice anything that seems to
> be out of reach for D3 and SVG.  So, a plausible fit.

The main things that d3/js lacks are the statistical functions that
power many of the plots - e.g. loess, quantile regression, density
estimation, boxplots, ... I don't think any of these are too hard to
write by themselves, but in aggregate it's a lot of work.
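To give a sense of the size of one of these pieces, here is a sketch of the summary statistics behind a boxplot. Assumptions: a non-empty numeric array, quantiles by linear interpolation between order statistics (the rule used by R's default `quantile` type 7), and whiskers at 1.5 times the interquartile range. Other conventions exist, and this is not ggplot2's code.

```javascript
// Quantile of an already-sorted array by linear interpolation between the
// order statistics around position (n - 1) * p.
function quantileSorted(sorted, p) {
  const h = (sorted.length - 1) * p;
  const lo = Math.floor(h);
  const hi = Math.ceil(h);
  return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
}

// Five-number summary with Tukey-style fences at 1.5 * IQR.
function boxplotStats(values) {
  const sorted = values.slice().sort(function (a, b) { return a - b; });
  const q1 = quantileSorted(sorted, 0.25);
  const median = quantileSorted(sorted, 0.5);
  const q3 = quantileSorted(sorted, 0.75);
  const iqr = q3 - q1;
  const lowerFence = q1 - 1.5 * iqr;
  const upperFence = q3 + 1.5 * iqr;
  // Whiskers extend to the most extreme data points inside the fences.
  const inside = sorted.filter(function (v) { return v >= lowerFence && v <= upperFence; });
  return {
    q1: q1, median: median, q3: q3,
    whiskerLow: inside[0],
    whiskerHigh: inside[inside.length - 1],
    outliers: sorted.filter(function (v) { return v < lowerFence || v > upperFence; })
  };
}

console.log(boxplotStats([1, 2, 3, 4, 5, 6, 7, 8, 100]));
```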

ggplot2 also tries harder to implement non-Cartesian coordinate
systems. This is rather attractive theoretically, but I'm not sure
how useful it is in practice.

> That said, there appear to be some real differences:
>
> *  ggplot2 can take advantage of R's statistical tools, etc.
>   D3 (unless used as a front-end to R) cannot.
>
> *  The D3/SVG platform has features (eg, data manipulation,
>   some geometric objects, interactivity) that ggplot2 lacks.
>
>
> Speculation:
>
> Here's an entirely speculative scenario:
>
> *  Have ggplot2 generate JavaScript-friendly serializations of
>   plotting requests (eg, command scripts and supporting data).
>
> *  Using D3, accept and implement the plotting requests.
>
> *  Create a CI framework, driven by ggplot2 examples and tests.
>
> Are there any R and/or ggplot2 enthusiasts who would like to
> give some of this a try?

This is very close to possible in the latest version of ggplot2,
because when plotting you also get an (invisible) data frame that has
(almost) all of the data you need to generate the plot. It would be
easy to serialise this to json and then have d3 render it. I've also
been thinking it might be possible to convert ggplot2 code
automatically into d3, thinking more along the lines of creating
something basic that could then be hand tweaked.
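On the D3 side, the main reshaping step might be turning a column-oriented serialization into the array of row objects that selections usually bind. This is a sketch, assuming (hypothetically) that the data frame arrives as one JSON array per column; the column names and values are placeholders, not real ggplot2 output.

```javascript
// Assumed serialization: one array per column, all the same length,
// e.g. { x: [...], y: [...], group: [...] }. Transpose into row objects.
function columnsToRows(frame) {
  const names = Object.keys(frame);
  if (!names.length) return [];
  const n = frame[names[0]].length;
  names.forEach(function (name) {
    if (frame[name].length !== n) throw new Error("column " + name + " has a different length");
  });
  const rows = [];
  for (let i = 0; i < n; i++) {
    const row = {};
    names.forEach(function (name) { row[name] = frame[name][i]; });
    rows.push(row);
  }
  return rows;
}

// Placeholder values, not real plot data.
const rows = columnsToRows({ x: [1, 2, 3], y: [10, 20, 15], group: ["a", "a", "b"] });
console.log(rows);
```

The resulting array could then be bound with `d3.select("svg").selectAll("circle").data(rows)` and the usual enter/append steps.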

I'm going to be in SF for a month this summer - working with
metamarkets to figure out what d3 + ggplot2 equals.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

Rich Morin

Apr 16, 2012, 1:04:44 PM
to Hadley Wickham, d3...@googlegroups.com
On Apr 16, 2012, at 06:38, Hadley Wickham wrote:
> This is very close to possible in the latest version of ggplot2,
> because when plotting you also get an (invisible) data frame that
> has (almost) all of the data you need to generate the plot. It
> would be easy to serialise this to json and then have d3 render it.

Assuming that this serialization were in place, what would it take
to make R/ggplot2 act as a back-end server for D3? For example, I
wonder whether it might be possible to use a web browser (with D3)
as an interactive front end for R.


Getting even more carried away, it might be interesting to play
with some of Bret Victor's ideas (see http://worrydream.com) on
rapid interaction, etc.

See http://gabrielflor.it/water for an example of what this might
look like. FYI, although the page says to hold down the control
key, on my Mac I found that the option key seems to be needed.


> I've also been thinking it might be possible to convert ggplot2
> code automatically into d3, thinking more along the lines of
> creating something basic that could then be hand tweaked.


This could be very useful (eg, in transforming examples and tests
for use in a combined system). BTW, do you have any thoughts on
how to do automated testing? Does ggplot2 have a test suite?

-r

Hadley Wickham

Apr 16, 2012, 1:45:19 PM
to Rich Morin, d3...@googlegroups.com
> Assuming that this serialization were in place, what would it take
> to make R/ggplot2 act as a back-end server for D3?  For example, I
> wonder whether it might be possible to use a web browser (with D3)
> as an interactive front end for R.

Almost nothing - R already has a built-in server (for documentation,
but it can be (mis-)used for other purposes), or there are more
standard options (like apache etc). I have a half-written port of
Ruby's Sinatra to R at https://github.com/hadley/sinartra

> Getting even more carried away, it might be interesting to play
> with some of Bret Victor's ideas (see http://worrydream.com) on
> rapid interaction, etc.

Yes, that would be really cool. See also Jeroen Ooms's web interface
for ggplot2: http://www.stat.ucla.edu/~jeroen/ggplot2/. It's not
quite rapid iteration, but it's a step in the right direction.

> See http://gabrielflor.it/water is an example of what this might
> look like.  FYI, although the page says to hold down the control
> key, on my Mac I found that the option key seems to be needed.

That is cool, but I wonder if it makes it too easy to get distracted
by surface features of the visualisation, instead of thinking deeply
about the problem you are trying to solve.

>> I've also been thinking it might be possible to convert ggplot2
>> code automatically into d3, thinking more along the lines of
>> creating something basic that could then be hand tweaked.
>
>
> This could be very useful (eg, in transforming examples and tests
> for use in a combined system).  BTW, do you have any thoughts on
> how to do automated testing?  Does ggplot2 have a test suite?

ggplot2 has two test suites - a standard unit testing suite
(https://github.com/hadley/ggplot2/tree/master/inst/tests) which tests
data structures, and a visual (regression) testing suite
(https://github.com/wch/ggplot2/wiki/Visual-test-system), that
compares renderings across commits.

Automated testing for graphics is hard, but as I write more tests, the
underlying code becomes more amenable to testing. Some purely visual
tests will always be necessary, but since they require human
intervention, I'd rather keep them to a minimum.

Rich Morin

Apr 16, 2012, 8:01:21 PM
to Hadley Wickham, d3...@googlegroups.com
I'm becoming more and more convinced that ggplot2 and D3 have a lot
to offer each other. For the folks who haven't purchased your book,

ggplot2: Elegant Graphics for Data Analysis (Use R!)

http://www.amazon.com/dp/0387981403
Hadley Wickham; Springer

here are a couple of relevant pull quotes (readable in context
via Amazon's "First Pages" feature and on Google Books,
http://books.google.com/books?id=F_hwtlzPXBcC):

Wilkinson (2005) created the grammar of graphics to describe
the deep features that underlie all statistical graphics. The
grammar of graphics is an answer to a question: what is a
statistical graphic? The layered grammar of graphics (Wickham,
2009) builds on Wilkinson's grammar, focussing on the primacy
of layers and adapting it for embedding within R.

In brief, the grammar tells us that a statistical graphic is a
mapping from data to aesthetic attributes (colour, shape, size)
of geometric objects (points, lines, bars). The plot may also
contain statistical transformations of the data and is drawn on
a specific coordinate system. Faceting can be used to generate
the same plot for different subsets of the dataset. It is the
combination of these independent components that make up a
graphic.

and

It does not describe interaction: the grammar of graphics
describes only static graphics and there is essentially no
benefit to displaying on a computer screen as opposed to on a
piece of paper. ggplot2 can only create static graphics, so
for dynamic and interactive graphics you will have to look
elsewhere. ...


Like the Grammar, D3 thinks in terms of mappings "from data to
aesthetic attributes (colour, shape, size) of geometric objects
(points, lines, bars)".

Also, the same declarative style is found in both D3 and ggplot2
calls. In summary, D3 users will find quite a bit of familiar
thinking in the Grammar and its ggplot2 implementation.

Syntactic issues aside, the major differences seem to be that:

* D3 doesn't understand statistics and has no conceptual
framework (or preconceptions) about how data graphics can
and should be presented.

* The Grammar doesn't cover dynamic and interactive graphics,
nor do I see any support for graph and network analysis.
However, I see no reason why the Grammar could not be
extended to support these and other capabilities.


Following up on the question of rapid interaction, I'd love to see
a "workbench", based on D3, ggplot2, and the Grammar. It would let
the user experiment with both high- and low-level controls on how
the data is being presented.

Although I agree that this could be a danger:

> ... I wonder if it makes it too easy to get distracted by surface
> features of the visualisation, instead of thinking deeply about
> the problem you are trying to solve.


I believe that users would quickly evolve (or learn) an approach
which allows them to experiment with various aspects of selection,
aggregation, presentation, etc. ggplot2 reduces the hassle of
creating statistical graphics in R; the user need only type in a
few parameters to see a different presentation. The workbench I
have in mind would simply carry that to a new level of speed and
convenience.


Finally, a note on representation. I think it should be possible
to define a JSON serialization for ggplot2 calls. For example:

qplot(carat, price, data = dsmall, geom = c("point", "smooth"))

might look something like:

{
  "data": "dsmall",
  "geom": ["point", "smooth"],
  "x": "carat",
  "y": "price"
}

Encoding the parameters in object form is not just a syntactic
change. It could let D3 and its users manipulate the graphing
parameters (and results) in a dynamic, interactive fashion.
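As an illustration of that last point, here is a sketch of a tiny interpreter for a spec in that shape. It looks up the named data set, turns the `x`/`y` column names into accessors, and scales values linearly into a box. Everything here is hypothetical: only the `point` geom is drawn, `smooth` (which needs a statistical fit) is reported as unsupported, and the rows standing in for `dsmall` are made up.

```javascript
// Linear map from the data extent onto [rangeMin, rangeMax].
function linearScale(values, rangeMin, rangeMax) {
  const min = Math.min.apply(null, values);
  const max = Math.max.apply(null, values);
  const span = max - min || 1; // avoid dividing by zero for constant data
  return function (v) { return rangeMin + (v - min) / span * (rangeMax - rangeMin); };
}

// Turn a plot spec into point positions inside a width x height box.
function renderSpec(spec, datasets, width, height) {
  const rows = datasets[spec.data];
  if (!rows) throw new Error("unknown data set: " + spec.data);
  const x = function (d) { return d[spec.x]; };
  const y = function (d) { return d[spec.y]; };
  const sx = linearScale(rows.map(x), 0, width);
  const sy = linearScale(rows.map(y), height, 0); // SVG y grows downward
  const geoms = [].concat(spec.geom); // accept a single string or an array
  return {
    points: geoms.indexOf("point") >= 0
      ? rows.map(function (d) { return { cx: sx(x(d)), cy: sy(y(d)) }; })
      : [],
    unsupported: geoms.filter(function (g) { return g !== "point"; })
  };
}

const spec = { data: "dsmall", geom: ["point", "smooth"], x: "carat", y: "price" };
// Placeholder rows standing in for the dsmall sample; values are made up.
const datasets = { dsmall: [{ carat: 0.5, price: 1000 }, { carat: 1.5, price: 3000 }] };
console.log(renderSpec(spec, datasets, 100, 50));
```

Because the spec is plain data, an interactive control could change `spec.x` or `spec.geom` and call `renderSpec` again.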
