Using arrays as keys for crossfilter or "how do you build a d3 sunburst / force layout / chord chart that works with dc.js"


Blair Nilsson

Sep 6, 2014, 3:47:42 AM
to dc-js-us...@googlegroups.com
Be warned, there is a lot of badly written JavaScript ahead, and you will need to know how to draw a d3 sunburst chart already. This post only explains how to integrate one with crossfilter and dc.js.

As a reference, our data looks something like....

YEAR,COURT,OFFENCE,AGEGROUP,SENTENCE,Value
2000,10,012,F,28,1  
2000,10,012,F,T,1  
2000,10,0132,D,7,1  
2000,10,0132,D,T,1  
2000,10,0132,G,28,1  
2000,10,0132,G,T,1  



Crossfilter doesn't do hierarchical data, so dc.js doesn't normally do so either, but we have never let this stop us in the past, so here we go.

Step one: turn one of your dimensions into an array. This will give you the hierarchical aspect to the data.
In my case, we are building a new courts data visualization. 

One of our columns in our tsv is an offence code. It breaks down like this

01 Homicide And Related Offences
010 Homicide And Related Offences not further defined
011 Murder
012 Attempted murder
013 Manslaughter and driving causing death
0131 Manslaughter
0132 Driving causing death
02 Acts Intended To Cause Injury
021 Assault
029 Other acts intended to cause injury
03 Sexual Assault And Related Offences
031 Sexual assault
0311 Aggravated sexual assault
0312 Non-aggravated sexual assault
032 Non-assaultive sexual offences

The first two digits give the main category; each digit after that gives a subcategory.

We can use this to turn it into an array (a little bit of underscore is used here)

function parseOffenceCode(code) {
  // First two digits are the main category; each later digit is a subcategory.
  var result = [];
  result.push(parseInt(code.substring(0, 2), 10));
  _.each(code.substring(2), function (c) {
    result.push(parseInt(c, 10));
  });
  return result;
}

So a 'Murder' would have a key of [1,1], whereas a 'Non-aggravated sexual assault' would have a key of [3,1,2].

All good! (Well as good as conversations about Murders etc can be)
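
For anyone not using underscore, here is a minimal sketch of the same parsing in plain JavaScript. The name parseOffenceCodePlain is made up for this illustration, and it assumes codes are strings of digits at least two characters long, like the ones in the list above.

```javascript
// Hypothetical plain-JavaScript variant of parseOffenceCode (no underscore).
function parseOffenceCodePlain(code) {
  // The first two digits form the main category.
  var key = [parseInt(code.substring(0, 2), 10)];
  // Every remaining digit adds one level of subcategory.
  code.substring(2).split('').forEach(function (c) {
    key.push(parseInt(c, 10));
  });
  return key;
}

console.log(parseOffenceCodePlain('011'));  // Murder
console.log(parseOffenceCodePlain('0312')); // Non-aggravated sexual assault
```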

so, we push this into crossfilter.

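// Assumes each row's OFFENCE was already replaced with parseOffenceCode(...) while loading.
var offencesDim = ndx.dimension(_.property('OFFENCE'));
var offencesGroup = offencesDim.group().reduceSum(getValue);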

Soooo

if we call offencesGroup.all()

we will get something that looks like.... 

[{"key":[1,0],"value":12},
{"key":[1,1],"value":736},
{"key":[1,2],"value":238},
{"key":[1,3,1],"value":806},
{"key":[1,3,2],"value":528},
{"key":[10,1],"value":1700},
...
]
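
One thing to notice in that output is that [10,1] comes after [1,3,2]. Comparing arrays with < and > coerces them to strings, so the keys appear to be ordered as "1,3,2" and "10,1" rather than numerically. A small sketch of that ordering, using plain Array.prototype.sort rather than crossfilter itself (coerciveCompare is a name invented here):

```javascript
// Keys in arbitrary order, as they might arrive from the data.
var keys = [[10, 1], [1, 3, 2], [1, 0], [1, 2], [1, 1], [1, 3, 1]];

// < and > turn each array into a string such as "1,3,2" before comparing.
function coerciveCompare(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

var sorted = keys.slice().sort(coerciveCompare);
console.log(JSON.stringify(sorted));
```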

huh... this looks like all the stuff you would want for a hierarchy.

so... we will need a function to turn the results into one.


function buildHierarchy(list) {
  var root = {"name": "root", "children": []};
  for (var i = 0; i < list.length; i++) {
    var parts = list[i].key;
    var value = +list[i].value;
    var currentNode = root;
    for (var j = 0; j < parts.length; j++) {
      var children = currentNode["children"];
      var nodeName = parts[j];
      var childNode;
      if (j + 1 < parts.length) {
        var foundChild = false;
        for (var k = 0; k < children.length; k++) {
          if (children[k]["name"] == nodeName) {
            childNode = children[k];
            foundChild = true;
            break;
          }
        }
        if (!foundChild) {
          childNode = {"name": nodeName, "children": []};
          children.push(childNode);
        }
        currentNode = childNode;
      } else {
        // Reached the end of the sequence; create a leaf node.
        childNode = {"name": nodeName, "value": value};
        children.push(childNode);
      }
    }
  }
  return root;
}

OK, it isn't pretty, and there are most likely 100 better ways of doing this, but the result is in d3's standard JSON format for hierarchical data. :)
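
As one of those other ways, here is a shorter sketch of the same idea. buildHierarchyCompact is a hypothetical name, not part of d3 or dc.js. One deliberate difference from the version above: it only reuses an existing node if that node already has a children array, so a leaf and a branch with the same name are kept apart.

```javascript
// Hypothetical compact variant: turns [{key: [..], value: n}, ...] into a tree.
function buildHierarchyCompact(list) {
  var root = {name: 'root', children: []};
  list.forEach(function (entry) {
    var node = root;
    entry.key.forEach(function (part, depth) {
      if (depth === entry.key.length - 1) {
        // Last element of the key: create a leaf carrying the value.
        node.children.push({name: part, value: +entry.value});
        return;
      }
      // Otherwise find (or create) the branch for this part and descend.
      var child = node.children.filter(function (c) {
        return c.name === part && c.children;
      })[0];
      if (!child) {
        child = {name: part, children: []};
        node.children.push(child);
      }
      node = child;
    });
  });
  return root;
}

var sample = [
  {key: [1, 0], value: 12},
  {key: [1, 1], value: 736},
  {key: [1, 3, 1], value: 806},
  {key: [1, 3, 2], value: 528},
  {key: [10, 1], value: 1700}
];
console.log(JSON.stringify(buildHierarchyCompact(sample), null, 2));
```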

You can now build yourself a sunburst with it (be warned, this is a pretty frustrating thing to do).

I'm not putting the code for doing this here, since there is a lot of it. 

We already have the equivalent of what is in flare.json if we call
buildHierarchy(offencesGroup.all())

Integrating this with dc.js

You will need a function to redraw your sunburst chart.

  function redrawSunburst() {
    // Rebuild the tree from the current (filtered) group values.
    var tree = buildHierarchy(offencesGroup.all());
    var nodes = partition.nodes(tree);
    path.data(nodes)
      ...
  }


and on the end of each chart, you will need to have 
    .on('filtered', redrawSunburst);

so, when you filter anything, your sunburst chart gets redrawn.

Likewise, if you put an onclick or something on your sunburst chart and you want it to filter the dc charts,
call .filterExact on the dimension you are drawing your sunburst chart with (in my case, offencesDim), then call dc.redrawAll() to get dc to redraw the charts.
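
To make the click side a bit more concrete, here is a sketch of two helpers. Both names (keyForNode, keyHasPrefix) are invented for this illustration. It assumes each node carries the name field produced by buildHierarchy plus a parent reference, which d3's partition layout adds. As far as I can tell, filterExact only matches records whose key is exactly the given array, so clicking an inner ring would need something like a prefix test passed to the dimension's filterFunction instead.

```javascript
// Walk up the parent links to rebuild the array key for a clicked node.
function keyForNode(node) {
  var key = [];
  while (node && node.parent) { // stop at the root, which has no parent
    key.unshift(node.name);
    node = node.parent;
  }
  return key;
}

// True when `key` starts with every element of `prefix`.
function keyHasPrefix(key, prefix) {
  if (prefix.length > key.length) return false;
  return prefix.every(function (part, i) { return key[i] === part; });
}

// Tiny hand-built tree standing in for what the partition layout returns.
var root = {name: 'root'};
var homicide = {name: 1, parent: root};
var manslaughterEtc = {name: 3, parent: homicide};

console.log(keyForNode(manslaughterEtc));
console.log(keyHasPrefix([1, 3, 2], keyForNode(manslaughterEtc)));
```

In a click handler this could be used roughly as offencesDim.filterFunction(function (k) { return keyHasPrefix(k, clickedKey); }), followed by the redraw call.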

Now you have 2 way filtering with your d3 sunburst chart, and dc.js.

I'm putting up some examples soon (just putting in the last few bits)

--- Blair

Blair Nilsson

Sep 6, 2014, 5:25:14 AM
to dc-js-us...@googlegroups.com
An example is here - http://111.69.97.105/court-visualization/

until I can get it to a better home (will work for a couple of days)

Matt Traynham

Sep 7, 2014, 11:43:49 PM
to dc-js-us...@googlegroups.com
Hey Blair,

Just as a performance optimization, it's actually better to use Objects instead of Arrays.  But... only because dc.js uses the filterFunction instead of Crossfilter's filter/filterRange functions.

When Crossfilter initially creates a dimensional key index, it does a pretty optimized two-way quick sort.  The sort is for the filter and filterRange functions, so it can binary bisect the dimension for fast filtering.  In dc.js' case, this is unnecessary because the filterFunction will loop over every record anyway.

Here's why it's better to use Objects:
The quick sort coerces values to a usable representation for comparison.  Arrays coerce to distinct strings that can be ordered; Objects all coerce to the same string, so they cannot be told apart.  With Object keys the quick sort should complete in O(n) time instead of O(n log n), because every value will equal the pivot point.  This specific implementation might actually have a quicker optimization for large amounts of equal keys, improving performance even more.


For people like me, that have literally hundreds of rows in a dataset, the initial load time of a dimension matters.  Array keys with long strings will dramatically reduce performance.

By using <= and >= (which are coercive), you can try out Array vs Object coercive comparison for yourself:

> var x = [1, 2];
> undefined
> var y = [2, 3];
> undefined
> x <= y
> true
> x >= y
> false
>
> var x = {a: 1, b: 2};
> undefined
> var y = {x: 2, b: 3};
> undefined
> x <= y
> true
> x >= y
> true

Matt Traynham

Sep 7, 2014, 11:59:53 PM
to dc-js-us...@googlegroups.com
A comparison of Array vs Object initial dimension load on 200,000 items:
Array: [image not preserved]

Object: [image not preserved]

Blair Nilsson

Sep 8, 2014, 12:48:08 AM
to dc-js-us...@googlegroups.com
I'll do some performance testing around that!

Array loads are FAR easier for people to do (which is a gain) - (see http://111.69.97.105:8888/web/examples/sunburst-cat.html)
filePathDimension = ndx.dimension(function(d) {return d.file.split('/');})

or

clientProjectDimension = ndx.dimension(function(d) {return [d.client,d.project];})
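
One small thing to watch with the split approach: a path with a leading slash gives an empty string as its first element, which would show up as an extra level in the hierarchy. A sketch of one way to handle it (splitPath is a made-up helper name, and dropping empty segments is an assumption about what is wanted):

```javascript
// Hypothetical helper: split a file path and drop empty segments.
function splitPath(file) {
  return file.split('/').filter(function (part) {
    return part !== '';
  });
}

// A plain split keeps the empty segment before the leading slash.
console.log('/src/charts/sunburst.js'.split('/'));
// The helper drops it.
console.log(splitPath('/src/charts/sunburst.js'));
```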

I'll have a look into it for when we filter 200,000 rows or so in crossfilter. If it makes a large enough change there, then it is most certainly worth building a helper for.

Thanks for the feedback!!! :)

Matt Traynham

Sep 8, 2014, 10:39:02 AM
to dc-js-us...@googlegroups.com
I just gotta say, that example is freaking cool!  I get what you're saying about arrays representing the hierarchy:

[root, level2, level3...]

In that aspect, you could just use d3.nest to represent a tree hierarchy as well:

{key: a, values: [{key: b, values: [{key: c, values: 3}]}]};

But if you are looking for the dirt simple performance test, just wrap the array into an object:

clientProjectDimension = ndx.dimension(function(d) {return {keys:[d.client,d.project]};})


Blair Nilsson

Sep 9, 2014, 2:13:05 AM
to dc-js-us...@googlegroups.com
Ok, I've had more of a look... 

Would the ability to use a path (or key) accessor to pull the array out work for you?

So... normal usage would look like....


filePathDimension = ndx.dimension(function(d) {return d.file.split('/');})

fast loading usage would be 
filePathDimension = ndx.dimension(function(d) {return {path: d.file.split('/')};})

with a PathAccessor like
.pathAccessor(function (d){return d.key.path;});

which will pull the path out of the key.

This should give you the speed you want, and the simplicity for the normal use. I'll test it to see if it works.
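
A rough sketch of what such an accessor could do, independent of dc.js. The pathAccessor option above is only a proposal at this point, and defaultPathAccessor below is a hypothetical name: it accepts a group entry whose key is either a bare array or an object wrapping one under path.

```javascript
// Hypothetical default accessor: supports both plain and wrapped keys.
function defaultPathAccessor(d) {
  return Array.isArray(d.key) ? d.key : d.key.path;
}

// Normal usage: the key is the array itself.
var plain = {key: ['src', 'charts'], value: 3};
// Fast-loading usage: the array is wrapped in an object.
var wrapped = {key: {path: ['src', 'charts']}, value: 3};

console.log(defaultPathAccessor(plain));
console.log(defaultPathAccessor(wrapped));
```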

in...@madebydna.com

Sep 10, 2014, 12:34:54 PM
to dc-js-us...@googlegroups.com
Hi Blair,

Great work on adding a sunburst chart to dc! I actually have a need for something like that and have been working on my own version. I wanted to give your code a try with my own data but your example links are broken right now. I could access them yesterday, so maybe this is temporary?

Anyway, could you please post the sunburst chart code somewhere? I'm excited to give it a spin ;-)

Thanks,

Andrea 

Blair Nilsson

Sep 10, 2014, 5:54:33 PM
to dc-js-us...@googlegroups.com
ok, it is back up and running.

It is my home machine, so when I reboot into windows to play games the demo goes offline :). I've booted it back up this morning.

This will give you links to the dc.js fork, and the mini version

The filepath demo is the only really useful one of the ones I have made. Use that to see the 


Andrea Singh

Sep 12, 2014, 8:05:24 PM
to Blair Nilsson, dc-js-us...@googlegroups.com
I've successfully used the new dc.sunburstChart code with my own data, which is great! I have a block in preparation that I will link to when I'm finished. However, to recreate my use case, I actually need a zoomable sunburst as in this example:


I've been trying to tweak the code to implement that behavior but I'm realizing that a zoomable sunburst is different in some fundamental ways, as it implies that there can only be a single active filter rather than multiple concurrent filters as in the pieChart, etc. Digging into the dc code a bit, I've seen something similar in the coordinateGridMixin where the replaceFilter() method is utilized on zoom and focus. 

A zoomable sunburst would need some extra attributes such as an x and y scale, which will need to be reset according to the x / y / dx / dy attributes of the currently zoomed node. Generally, it may be nice to add a current_node attribute to the chart so that the value attribute encoded in the node is accessible in listeners (useful for displaying percentages, for instance). 
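
As a sketch of what that reset could look like, following the pattern used by the well-known d3 zoomable sunburst example: the x domain becomes the clicked node's angular span and the y domain starts at its depth. The function names are hypothetical, and the node fields (x, dx, y, value) are assumed to be those set by d3's partition layout with its default size of [1, 1].

```javascript
// Hypothetical helper: scale domains that zoom in on one node.
function zoomDomains(node) {
  return {
    x: [node.x, node.x + node.dx], // angular extent of the node
    y: [node.y, 1]                 // from the node's ring outwards
  };
}

// Hypothetical helper: the node's share of the root total, for percentages.
function shareOfTotal(node, root) {
  return root.value ? node.value / root.value : 0;
}

// Hand-written values standing in for partition layout output.
var root = {x: 0, dx: 1, y: 0, dy: 0.25, value: 200};
var node = {x: 0.25, dx: 0.5, y: 0.25, dy: 0.25, value: 50};

console.log(zoomDomains(node));
console.log(shareOfTotal(node, root));
```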

I'm wondering if this kind of zoom behavior would be better implemented as an option (e.g. chart.zoomable(true) ...) or as a separate chart type? Any comments on that?
--
~ Andrea

Blair Nilsson

Sep 14, 2014, 8:12:18 AM
to dc-js-us...@googlegroups.com, blair....@gmail.com
Oops, we ended up having an email conversation. I'll give the TL;DR of it here, which is yes, I'll implement that once I have finished all my tests. :)
