Accessing nested data?


Majella

Jul 14, 2012, 11:38:00 AM
to d3...@googlegroups.com
I have a large .csv file with a lot of repeating values, and I need to implement a nested array to group the data by region.

Original code (worked - data just not grouped):
this.data = data;

With nesting (tested; nesting correctly implemented):
this.data =  d3.nest().key(function(d) { return d.region; }).entries(data);

I know the data is now structured differently; I'm just not sure how to access it.
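To see why `d.id` stops working after nesting, it helps to look at the shape that d3.nest().key(...).entries(...) returns: an array of `{key, values}` objects, one per distinct key, with the original rows inside `values`. The sketch below is a plain-JavaScript stand-in so it runs without d3; the helper `nestEntries` is hypothetical, not part of d3, and the rows are taken from the CSV sample.

```javascript
// Hypothetical stand-in for d3.nest().key(keyFn).entries(rows):
// groups rows by keyFn and returns [{key, values}] in first-seen key order.
function nestEntries(rows, keyFn) {
  const groups = new Map();
  for (const row of rows) {
    const key = String(keyFn(row)); // d3.nest keys end up as strings
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  return Array.from(groups, ([key, values]) => ({ key, values }));
}

// A few rows from the CSV sample (d3.csv yields every field as a string).
const rows = [
  { id: "1", region: "Africa", country: "Algeria", year: "2006" },
  { id: "6", region: "Africa", country: "Angola", year: "2006" },
  { id: "11", region: "Africa", country: "Benin", year: "2006" },
];

const nested = nestEntries(rows, d => d.region);
console.log(nested[0].key);           // Africa
console.log(nested[0].values.length); // 3
console.log(nested[0].id);            // undefined: the ids live one level down
console.log(nested[0].values[0].id);  // 1
```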

Original code to access the data (worked, but again the data isn't grouped):
this.circles = this.vis.selectAll("circle").data(this.nodes,function(d) {return d.id;});

With nesting (not working):
this.circles = this.vis.selectAll("circle").data(this.nodes,function(d) {return d.key;}); 

csv file sample:
id,region,country,us_country_code,iso2_code,main_trade-group,imports_to,exports_from,total_trade,group,trade_balance,year
1,Africa,Algeria,7210,DZ,OPEC,15455.9,1101.9,16557.8,high,-14354,2006
2,Africa,Algeria,7210,DZ,OPEC,17816.1,1652.4,19468.5,high,-16163.6,2007
3,Africa,Algeria,7210,DZ,OPEC,19354.8,1243.2,20598,high,-18111.6,2008
4,Africa,Algeria,7210,DZ,OPEC,10717.8,1107.8,11825.6,high,-9610,2009
5,Africa,Algeria,7210,DZ,OPEC,14518,1194.7,15712.6,high,-13323.3,2010
6,Africa,Angola,7620,AO,OPEC,11719.2,1388.8,13108,high,-10330.3,2006
7,Africa,Angola,7620,AO,OPEC,12507.6,1242,13749.6,high,-11265.5,2007
8,Africa,Angola,7620,AO,OPEC,18911.3,2019.2,20930.5,high,-16892.1,2008
9,Africa,Angola,7620,AO,OPEC,9338.9,1423.1,10761.9,high,-7915.8,2009
10,Africa,Angola,7620,AO,OPEC,11939.6,1293.6,13233.2,high,-10646,2010
11,Africa,Benin,7610,BJ,,0.6,115.5,116,low,114.9,2006

I'd really appreciate any help with this.

Thanks,

Majella


Zack Maril

Jul 14, 2012, 10:10:49 PM
to d3...@googlegroups.com
Try `console.log(this.data)` and open the JavaScript console in Chrome's developer tools. That should show you how the new object is structured. Good on you for using d3.nest().
-Zack

Majella

Jul 15, 2012, 8:41:12 AM
to d3...@googlegroups.com
Thank you. I did that, and "this.data" is structured as follows:

[Object,Object,Object,Object, Object,Object,Object,Object]

with each object broken down into following structure:

key:"Africa"
values:Array[139]
__proto__:Object

..and so on.

So now to access data:

this.circles = this.vis.selectAll("circle").data(this.nodes,function(d) {return this.data;});  (working)

The only problem I have now is that wherever I'd previously used d. as a prefix to access data, the value now shows as "undefined",
e.g.:
 .attr("id", function(d, i) {return "bubble_" + d.id;})

I know I need to change this. I tried changing d.id to this.data.value.id, then this.data[].values[].id, then this.data[0].values[0].id, but nothing seems to do the trick.

Can anyone suggest how I need to correctly word the code?

Thanks,

Majella

jerome cukier

Jul 15, 2012, 10:42:46 AM
to d3...@googlegroups.com
Hi, my question: why do you need to nest it to begin with?

When I do, it's to match the structure of my visualisation, with as many levels as I have groupings,
so I would have, say:

svg.selectAll("g").data(data).enter()
     .append("g")
     // [...] some operations on the level of the groups
      .selectAll("circle").data(function(d) {return d.values;}).enter()  // one circle per row in the group
          .append("circle");

Some context as to what you want to show would help in suggesting how to structure the code.

Chris Viau

Jul 15, 2012, 11:18:05 AM
to d3...@googlegroups.com
Jerome Cukier is right, and you probably don't have to nest in the first place, but here is how you would do it with your nesting: http://jsfiddle.net/yKTZ7/1/

d3.select('body').append('svg')
    .selectAll("g")
    .data(data)  // assumed: the nested array from d3.nest()...entries(), i.e. [{key, values}]
    .enter().append("g")
    .attr('transform', function(d, i){return 'translate('+(i*50)+', 0)';})
    .selectAll("text")
    .data(function(d){return d.values;})
    .enter().append("text")
    .text(function(d, i){return d.country;})
    .attr('dy', function(d, i){return (i+1)*16;});

Chris

Majella

Jul 15, 2012, 3:21:17 PM
to d3...@googlegroups.com
Sorry, the correct URL for the gist:

g...@gist.github.com:4c6836179dee643d82ca.git

jerome cukier

Jul 16, 2012, 5:41:44 AM
to d3...@googlegroups.com
OK I get it (hopefully), what you need is the rollup function.


With a file like this, if you do:
d3.nest().key(function(d) {return d.region;})
  .rollup(function(rows) {return {
      imports_to: d3.sum(rows, function(d) {return d.imports_to;}),
      exports_from: d3.sum(rows, function(d) {return d.exports_from;}),
      total_trade: d3.sum(rows, function(d) {return d.total_trade;}),
      trade_balance: d3.sum(rows, function(d) {return d.trade_balance;})
  };})
  .entries(csv); // or whatever variable you are using to read your file

You will get an array with one entry per region. rollup is the function that aggregates: it is run on all the rows corresponding to each key (or combination of keys) defined above.
And just because you have this variable doesn't mean you can't have, in parallel, other arrangements of this data, with one line per country, or one line per country/year, etc.
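What rollup does can be sketched in plain JavaScript: group the rows by key, then replace each group's array with whatever an aggregate function returns. The helper `nestRollup` below is hypothetical; it only mimics the `{key, values}` output of d3.nest().key(...).rollup(...).entries(...). In a hand-written sum the unary `+` matters, because CSV fields arrive as strings and `+` on strings would concatenate. The values are the Algeria and Angola 2006 rows from the sample.

```javascript
// Hypothetical stand-in for d3.nest().key(keyFn).rollup(rollupFn).entries(rows).
function nestRollup(rows, keyFn, rollupFn) {
  const groups = new Map();
  for (const row of rows) {
    const key = String(keyFn(row));
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  // rollupFn sees all rows sharing a key; its return value becomes `values`.
  return Array.from(groups, ([key, group]) => ({ key, values: rollupFn(group) }));
}

// Sum one column; unary + converts the CSV strings, and missing or
// non-numeric fields count as 0.
const sum = (rows, field) => rows.reduce((acc, r) => acc + (+r[field] || 0), 0);

const rows = [
  { region: "Africa", country: "Algeria", imports_to: "15455.9", exports_from: "1101.9" },
  { region: "Africa", country: "Angola", imports_to: "11719.2", exports_from: "1388.8" },
];

const byRegion = nestRollup(rows, d => d.region, group => ({
  imports_to: sum(group, "imports_to"),
  exports_from: sum(group, "exports_from"),
}));

console.log(byRegion[0].key);                           // Africa
console.log(byRegion[0].values.imports_to.toFixed(1));  // 27175.1
```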

best, 
j

Hope that helps

On Sunday, July 15, 2012 6:23:05 PM UTC+2, Majella wrote:
What I'm working on is an animated bubble chart. I'd like the opening graph to depict the world regions, filterable by year; clicking any of the bubbles will then depict the countries within that region, again filterable by year.

Jerome - in the .csv file each country row carries the name of the region it belongs to, so there are a lot of repeating values (see sample above). I don't want a bubble for every region entry (i.e. one per country row); I'd like one for each distinct region, hence the need for nesting.

Chris - thanks for the example, but that doesn't work within my code structure; see the gist (not currently displaying the chart): https://gist.github.com/4c6836179dee643d82ca

Below is a screenshot of the chart before I attempted to implement nesting: as you can see, there is a bubble for each region entry, not for each distinct region.


Majella

Jul 16, 2012, 5:48:49 PM
to d3...@googlegroups.com
Thank you, Jerome, that was a great help, but I still have the same problem as before: everywhere else in the code where d. is referenced shows as "undefined" (e.g. d.radius, d.id, etc.), so the graph still isn't loading.

Majella

Jul 17, 2012, 8:56:04 AM
to d3...@googlegroups.com
E.g. if I try console.log(this.data[0].values[0].id[0]) I get "1", which is the correct outcome, but if I try this:

this.circles = this.vis.selectAll("circle").data(this.nodes,function(d) {return this.data.values.id;});  to access the id, it shows as "undefined". Am I wording this incorrectly?

Ger Hobbelt

Jul 17, 2012, 1:36:18 PM
to d3...@googlegroups.com
Hard to tell from here, because I don't know the exact relation between this.nodes (which is fed to data(...)) and this.data or this.data[0] in your console.log, but the general reasoning for this kind of thing goes like this:


When you feed this.data to data() (so I'm assuming here that this.nodes ~ this.data), like:
  selectAll(...).data( this.data )
and you have set things up so that the key for each of the data[] elements that selectAll is going to look at is this.data[ i ].values[0].id[0] (for item i = 0, 1, and so on), then the key function becomes:
  selectAll(...).data( this.data, function(d) {  return d.values[0].id[0]; })
Equivalently, for the incoming data (where d is ref[i]):
  var ref = this.data;
  selectAll(...).data( ref, function(d, i) {  return ref[i].values[0].id[0]; })

Of course, what makes me wonder whether you really want /that/ and not something else entirely, are all those [0] array element accesses in there. Again, I need to guess, but it /looks/ more sensible when the code reads like this (and thus this.nodes ~ this.data[0].values ):

  selectAll(...).data( this.data[0].values, function(d) {  return d.id[0]; })

which will take as 'key' the .id[0] value for each object element in the values[] array.
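The key function's only job is to return, for each datum `d`, the identity used to pair it with an element that already has data bound, so that matching happens by key rather than by position. The plain-JavaScript sketch below (the helper `joinByKey` is hypothetical and skips all DOM work; the region keys other than "Africa" are made up) shows that pairing as enter/update/exit lists. It also shows a pitfall with `id[0]`: CSV fields arrive as strings, so `id[0]` is just the first character of the id. That is why `this.data[0].values[0].id[0]` printed "1", and it means ids "1" and "11" would get the same key, so `d.id` is the safer key.

```javascript
// Hypothetical sketch of how selection.data(array, keyFn) pairs data with
// already-bound elements: by key, not by position. DOM details are left out.
function joinByKey(oldData, newData, keyFn) {
  const oldKeys = new Set(oldData.map(d => String(keyFn(d))));
  const newKeys = new Set(newData.map(d => String(keyFn(d))));
  return {
    enter: newData.filter(d => !oldKeys.has(String(keyFn(d)))), // need new elements
    update: newData.filter(d => oldKeys.has(String(keyFn(d)))), // reuse existing elements
    exit: oldData.filter(d => !newKeys.has(String(keyFn(d)))),  // elements to remove
  };
}

// Made-up region keys, just to show the three cases.
const before = [{ key: "Africa" }, { key: "Region B" }];
const after = [{ key: "Africa" }, { key: "Region C" }];
const result = joinByKey(before, after, d => d.key);
console.log(result.enter.map(d => d.key));  // [ 'Region C' ]
console.log(result.update.map(d => d.key)); // [ 'Africa' ]
console.log(result.exit.map(d => d.key));   // [ 'Region B' ]

// CSV ids are strings, so id[0] is only the first character: "1" and "11" collide.
const firstChars = ["1", "11"].map(id => id[0]);
console.log(firstChars); // [ '1', '1' ]
```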


[..... hold on a sec', having a look at your gist.....]

okay. Well, since it's a private gist, I can't close or push, and I'm a bit loath to create a public one for the purpose, so here's the diff patch:

Ger Hobbelt@FIFI /d/h/prj/1original/gist-4c683617 (master)
$ diff -u -w -B BubbleChart.js BubbleChart-ger-hacked.js
--- BubbleChart.js Tue Jul 17 18:46:16 2012
+++ BubbleChart-ger-hacked.js Tue Jul 17 18:46:16 2012
@@ -21,9 +21,35 @@
this.redraw = __bind(this.redraw, this);

//NESTING THE DATA CORRECTLY
- this.data = d3.nest().key(function(d) { return d.region; }).entries(data);
+ this.data = d3.nest()
+ .key(function(d) {
+ return d.region;
+ })
+ .rollup(function(rows) {
+ return {
+ rows: rows, // to make sure the UNaggregated data makes it into the resulting array too!
+
+ // blunt hack START; just get me some friggin' visuals here!
+ id:rows[0].id,
+ region:rows[0].region,
+ country:rows[0].country,
+ radius: 20,
+ //value: rows[0].total_trade,
+ //importsto: rows[0].imports_to,
+ //exportsfrom: rows[0].exports_from,
+ //tbalance: rows[0].trade_balance,
+ group: rows[0].group,
+ year: rows[0].year,
+ // blunt hack END
+
+ imports_to: d3.sum(rows, function(d) {return d.imports_to;}),
+ exports_from: d3.sum(rows, function(d) {return d.exports_from;}),
+ total_trade: d3.sum(rows, function(d) {return d.total_trade;}),
+ trade_balance: d3.sum(rows, function(d) {return d.trade_balance;})
+ };
+ })
+ .entries(data);
console.debug(this.data);
- alert(JSON.stringify(this.data));

this.vis = null;
this.nodes = [];
@@ -49,9 +75,10 @@
BubbleChart.prototype.create_nodes = function() {
var _this = this;
this.data.forEach(function(d) {
- var node;
+ var node, key = d.key;
+ d = d.values; // quick hack to get at the d3.nest goodness.
node = {
- id:d.id,
+ id:key, // d.id,
region:d.region,
country:d.country,
radius: _this.radius_scale(parseInt(d.total_trade)),


which is what I had to do to get some initial visuals out of there. Yes, filtering and all the rest is FUBAR, but then again it needs some love before it'll work after the d3.nest anyway. .id isn't 'working' as it's not assigned by the time it's used in create_nodes().

Couple of things:

- the patch includes the rollup as suggested by Jerome Cukier (thanks, Jerome, for the tutorial BTW: I hadn't worked with d3.nest before, and reading your tutorial first, before the API docs and then the source code itself, helped a lot, because the API reads like an old-skool UNIX man page to me, i.e. "you get it when you've got it" ;-) )

- as nest+rollup() will produce an array of {key: x, values: y} objects, AND my take is you'd probably want the UNaggregated data in there too, i.e. a bit more than just a "SELECT SUM(trades) FROM data_table GROUP BY continent" sorta thing ;-), I added a sneaky 'rows' element to the values produced by rollup(): that way you get both the sum/avg/etc. 'group by' goodness of rollup() AND, as a subarray, the individual entries that led to that grouped sum.

- The rest of your code in the gist was still living in the pre-d3.nest() days, or so it seems; see the 'blunt hack' section, which makes d3 produce SVG output without 'r=NaN' and the like. I dug that one up because no JS errors showed up in Safari but the display remained pristine; Inspect Element turned up that oopsie.
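The patch's idea, a rollup that returns the aggregates plus the untouched `rows`, followed by a create_nodes step that reads `d.key` and `d.values`, can be sketched without d3. The radius rule here (radius proportional to the square root of total trade, with a hypothetical `maxRadius` of 40 pixels) is an assumption; the gist uses its own `radius_scale`. The Africa total is the Algeria and Angola 2006 rows from the sample; the "Other" entry is made up.

```javascript
// Entries in the shape produced by the patched nest + rollup:
// { key, values: { rows, total_trade, ... } }
const entries = [
  { key: "Africa", values: { rows: [{ id: "1" }, { id: "6" }], total_trade: 29665.8 } },
  { key: "Other", values: { rows: [{ id: "99" }], total_trade: 7416.45 } }, // made up
];

const maxRadius = 40; // hypothetical largest bubble radius, in pixels

function createNodes(entries) {
  const maxTrade = Math.max(...entries.map(e => e.values.total_trade));
  return entries.map(e => ({
    id: e.key,                      // one node per region key
    region: e.key,
    total_trade: e.values.total_trade,
    // bubble area proportional to trade, so radius grows with the square root
    radius: maxRadius * Math.sqrt(e.values.total_trade / maxTrade),
    rows: e.values.rows,            // keep the unaggregated rows for drill-down
  }));
}

const nodes = createNodes(entries);
console.log(nodes.map(n => n.id)); // [ 'Africa', 'Other' ]
console.log(nodes[0].radius);      // 40
console.log(nodes[0].rows.length); // 2
```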




One general comment:

Given the issue you are running into, and the featureset suggested by your gist, it feels almost like it's something I might have done, or do: getting done with most of the 'Schnick Schnack' (a.k.a. 'bling bling') and then having the main engine go clunker on you because something fundamental just had to change. Cut to the next scene, where you see someone repeatedly banging their head against a solid brick wall and it (for both values of 'it') is slowly turning red.

Advice: get back to the drawing board. Take your data, set up a basic gist using only d3 and NO bubblechart template code or other extras, and then load and plot the data from scratch (gists are great for that; as you've shown, they don't have to be public). I'd do that because I got pretty dizzy from all the __bind() stuff, etc., and since I've got the proverbial IQ that goes with my hair color (soaring RGB values), I would have to sit back and rethink after seeing what's there and observing how d3.nest()/rollup() does things. (It's my first time with them buggers too.)
(For example:
- I think I can see a way to do filtering and (nested) grouping systematically (one or two code paths to serve them all), but I would find it hard to do in the current setting. Not that the alternative makes it easy as pie, but then I'd be able to concentrate better.
- naming fundamentalism: either use the column names from the CSV if they're good to go (not this time, they aren't, e.g. "main_trade-group") or transform them to 'good names' immediately after loading. Right now there's code looking for .importsto and also code looking for .imports_to. And yes, this is your neighbourly anal-retentive bastard speaking.)

If the description doesn't fit, discard the advice at will.



Met vriendelijke groeten / Best regards,

Ger Hobbelt

--------------------------------------------------
web:    http://www.hobbelt.com/
        http://www.hebbut.net/
mail:   g...@hobbelt.com
mobile: +31-6-11 120 978
--------------------------------------------------

Majella

Jul 17, 2012, 1:46:30 PM
to d3...@googlegroups.com
Thank you Ger for taking the time to look at it and help:)  Still reading through it but I'll let you know how I get on:)

Ger Hobbelt

Jul 20, 2012, 7:08:40 PM
to d3...@googlegroups.com

TL;DR: cleaned, got rid of classical OOP symptoms hemorrhaging through the place, year filtering done in one spot = simplistic parameterized filtering. One inflexible design decision kept intentionally. (Read more if ...)





<general rant mode on>

I got very sick of all the __bind __bind __bind dung and got rid of it entirely. JavaScript is a language with closure support and prototypal inheritance, and an abundance of __bind() calls always points at developers who are desperately trying to cling to 'classical OOP' and the matching inheritance schemes. That stuff is for C++, Java, etc.; using JS that way both drives you bonkers (because 'this' will never be what you expect it to be when you stick to that way of thinking) and severely limits you, because you'll have FUBARred all the useful native behaviour along the way.
I know, I have been there myself and done all that.
However, a certain library and a certain Mike made me <insert sarcasm/> hate his guts because his code and my drive to understand what he did implicitly forced me to get back to square one and learn JavaScript like I've always learned languages: the hard way, i.e. by reading up on the reference manual and in the case of JavaScript munching over a series of articles about closures and 'this' peculiarities in JavaScript, and only after I got all that, get back and try to get something done.

I'm grateful that I was forced to retrace my steps like that and do it the old-fashioned way once more; I've learned a lot and as a result got unstuck on quite a few bits that had me completely fazed before. JavaScript was my second language that I learned like everybody else seems to learn their languages: copy-pasta-monkey-banging-ooh-shit-banana-is-getting-away, and I can tell you: it's all instant gratification, but when you're even a little stuck, you're dead in the water. Crocodile food.

Have a look at the gist Bubblechart.js code and breathe a sigh of relief as the whole __bind(yack yack yack) and this.this and this.that crowd, which was crapping their mark all over the bloody place, is ... GONE.
Makes for much more readable code; yes, to be 'readable', one needs to understand what a closure is, but that's requirement #1 for grokking JavaScript anyway. Me? Been stupid, done that, got the scars to prove it.

Now the code is much closer to the way d3.js itself is written; it's not a coding style, it's making good use of the JavaScript language (functions being first-class citizens, for example). See how I altered the Bubblechart: no more 'new'; you just call a function and get a chart object back, and closures etc. do the work under the hood to get at all the chart goodness when you call a public member function.
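The closure-based style described above, where calling a function returns a chart object whose methods share private state through a closure, roughly looks like the sketch below. `bubbleChart`, `setYear` and `visibleRows` are hypothetical names for illustration, not the code in the gist. The point is that a method keeps working when it is detached and passed around (e.g. as a click handler), because it never relies on `this`.

```javascript
// Factory function instead of a constructor: no `new`, no __bind().
// Private state lives in the closure rather than on `this`.
function bubbleChart(rows) {
  let year = "All"; // hypothetical filter state, reachable only via the methods

  function visibleRows() {
    return year === "All" ? rows : rows.filter(r => r.year === year);
  }

  const chart = {
    // Returns the chart object itself so calls can be chained.
    setYear(y) { year = y; return chart; },
    visibleRows,
  };
  return chart;
}

const chart = bubbleChart([{ year: "2006" }, { year: "2007" }]);
const setYear = chart.setYear; // detached, e.g. handed to a click handler
setYear("2007");               // still works: no `this` involved
console.log(chart.visibleRows().length); // 1
```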


Second rant: pretty please, stick to one name for a (member) variable. (Bonus: always clean up incoming data before processing. It saves a LOT of hassle. see how I go and convert incoming elements to numerics, etc., before barging on)

Cases in point:
- The CSV has a column 'main_trade-group' (ouch!), so the first order of the day is moving that bugger from el['main_trade-group'] to el.main_trade_group. See the new bubblechart code.
- Second, tooltips looked at objects with el.tbalance and el.importsto, while the bulk of the code worked with the 'original' node elements which have el.total_balance and el.imports_to. Now everybody looks at total_balance and imports_to. One name serves everywhere, saving another multiline object copy.
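Cleaning each row once, right after loading, might look like the sketch below: rename `main_trade-group` to `main_trade_group` and convert the numeric columns from strings to numbers. The list of numeric columns is read off the CSV header; `cleanRow` is a hypothetical helper, and the sample row is Benin 2006 from the data above.

```javascript
// Columns that hold numbers in the CSV (d3.csv delivers everything as strings).
const NUMERIC = ["id", "us_country_code", "imports_to", "exports_from",
                 "total_trade", "trade_balance", "year"];

function cleanRow(raw) {
  const row = { ...raw }; // copy, so the loaded row stays untouched
  // row.main_trade-group would parse as a subtraction, so rename the column once.
  row.main_trade_group = row["main_trade-group"];
  delete row["main_trade-group"];
  for (const col of NUMERIC) row[col] = +row[col];
  return row;
}

const raw = {
  id: "11", region: "Africa", country: "Benin", us_country_code: "7610",
  iso2_code: "BJ", "main_trade-group": "", imports_to: "0.6",
  exports_from: "115.5", total_trade: "116", group: "low",
  trade_balance: "114.9", year: "2006",
};

const row = cleanRow(raw);
console.log(row.main_trade_group === ""); // true
console.log(row.total_trade + 1);         // 117
console.log("main_trade-group" in row);   // false
```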

I'm betting good money that the first rant is meant for someone else, as my bet is you copied the basic bubblechart code from someplace and took off from there. Anyway, have a look and see whether you like it or not.
The second one is basic developer discipline 101. (And I loathe the word discipline, but here it applies. Please, do yourself a favor and be strict in what you code.)

</ general rant mode off> 



Ah. got that off my chest.

So, the new code has the 'filter-by-year' built in. It wasn't a matter of 'which single line needs to change' as the basic approach was too inflexible to allow for easy filtering. The current code works, but be warned: it is 'final' in that further enhancement of the d3.nest-originated structure, used this way, is still inflexible: exploding the nodes on click to show individual countries would add another layer of custom code to make it work and your head will start to buzz. I did it this way as it's most similar to the starting point in terms of code and flexibility, and there should be some 'considering how to do this, really' on the road ahead.

Advice: analyze what the current codebase does; notice the ugliness of the code when using two levels of d3.nest()ing here, and consider how you'd do the filtering and create-nodes-from-consolidated-data bits if you only ever have one nest level at any one time.

<spoiler alert>
One way of coding the 'data cube visualization' here would be to recreate a suitable d3.nest+rollup dataset, which would be specifically created for the active filter+expansion set. It doesn't mean adding more nest.key levels, but ditching the entire d3.nest result and doing it from scratch. That is a major difference from the original, as it means the 'raw data' must be preserved alongside - the original code didn't do this; the current code has this already set up, so it should be relatively easy to take the d3.nest-related stuff down and both reduce it to a single level and reinvoke it with the correct keys and filters on click events. The current code takes the second d3.nest level to filter, but that is not flexible if you want to approach the data from arbitrary dimensions.

If I would have to code it, I'd do my filtering (zero or more criteria) first, then do a d3.nest+rollup on the remainder. This process would rinse&repeat at each user click on a filter or zoom (expand/consolidate) command, e.g. when year filtering.

For very advanced systems, one can go and precalculate such 'cross-sections', but that's several bridges too far now, I guess.
</spoiler alert>


I brought back the 'All' button to help show the filter behaviour when a non-year filter value is passed into the filter code: when it's not part of the list of available years, simply all data is rolled up and displayed.
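The filter-first, then nest-and-roll-up approach from the spoiler above, including the fallback where a value that isn't one of the available years (such as "All") rolls up everything, could look like this in plain JavaScript. `regionTotals` is a hypothetical name, and the grouping is done by hand rather than with d3.nest so the sketch runs on its own; the rows are from the CSV sample, with `total_trade` already converted to a number.

```javascript
// Recompute the aggregate view from the raw rows on every filter change.
function regionTotals(rows, year) {
  const years = new Set(rows.map(r => r.year));
  // Unknown filter values (e.g. "All") mean: no year filtering at all.
  const selected = years.has(year) ? rows.filter(r => r.year === year) : rows;

  const totals = new Map();
  for (const r of selected) {
    totals.set(r.region, (totals.get(r.region) || 0) + r.total_trade);
  }
  // Same {key, values} shape as the d3.nest + rollup output.
  return Array.from(totals, ([key, total_trade]) => ({ key, values: { total_trade } }));
}

const rows = [
  { region: "Africa", country: "Algeria", year: "2006", total_trade: 16557.8 },
  { region: "Africa", country: "Algeria", year: "2007", total_trade: 19468.5 },
  { region: "Africa", country: "Angola", year: "2006", total_trade: 13108 },
];

console.log(regionTotals(rows, "2007")[0].values.total_trade);          // 19468.5
console.log(regionTotals(rows, "All")[0].values.total_trade.toFixed(1)); // 49134.3
```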


Also added: next to nodes, now also some 'links' are generated to help the force layout to keep same-group blobs together. It's one way of doing this, certainly not the only way, but in a force layout it works. Sort of: it's no _guarantee_ that same-group nodes are kept together, but it's a major influence in making it so. linkStrength is the one you want to tweak if the 'clustering together' isn't to your liking.
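One way to generate such links (not necessarily the one used in the gist) is to chain the nodes of each group together, so a group of n nodes gets n - 1 links. The index-based `{source, target}` objects follow the link format d3's force layout accepts; `sameGroupLinks` is a hypothetical helper and the node list is made up.

```javascript
// Link each node to the previous node of the same group: n nodes -> n - 1 links.
function sameGroupLinks(nodes, groupOf) {
  const lastIndexByGroup = new Map();
  const links = [];
  nodes.forEach((node, i) => {
    const g = groupOf(node);
    if (lastIndexByGroup.has(g)) {
      links.push({ source: lastIndexByGroup.get(g), target: i });
    }
    lastIndexByGroup.set(g, i);
  });
  return links;
}

const nodes = [
  { id: "a", group: "high" }, { id: "b", group: "low" },
  { id: "c", group: "high" }, { id: "d", group: "high" },
];
const links = sameGroupLinks(nodes, n => n.group);
console.log(links); // [ { source: 0, target: 2 }, { source: 2, target: 3 } ]
```

A star (every node linked to the first node of its group) is an equally reasonable choice; either way, linkStrength decides how hard the links pull.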
Note that node.sort doesn't really help a force layout; it can subtly influence it, but the controllability is nil. Node sorting is more something for a pack layout.


Removed: the 'move to center' logic. That duplicates the 'gravity' already built into the force layout itself; since the center is at x=width/2 and y=height/2, it's only a rewrite of what's already in there. Negative gravity might be useful somewhere, but not here, so I changed gravity from -0.001 to +0.2 or thereabouts. Like cooking: adjust to taste.

'Charge' is used to keep nodes apart; it's not exactly collision detection/avoidance, but can be used as something similar to that; I tweaked your charge formula: divisor 7 --> divisor 5. This is to compensate for the gravity and above all the new 'links' influences, which pull nodes together, if you don't 'negative charge' them.

Last bit: the create_chart()+start() vs. redraw() logic has been refactored; fancy wording for merging two chunks of code that were 90% identical. ;-)



When you fetch the repo, the commits show the consecutive stages of me going from the 'original' to this end result; looking at them with a good diff viewer might help you see how this came to be. (I use Scooter Software's Beyond Compare and have been a happy user for 10+ years; it is very neat, as it does a good job of downplaying whitespace-only differences in its display.)



Fixing the 'inflexible design' bit re filter chaining and expanding to show countries ~ treating data as a true data cube, is left as an exercise for the reader ;-) - this was done to aid learning: if I deliver this 'ready to go', all you're left with is a lacklustre 'Aha Erlebnis' at best.



Met vriendelijke groeten / Best regards,

Ger Hobbelt




On Wed, Jul 18, 2012 at 5:19 PM, Majella <majella...@inbox.com> wrote:
Sorry, I should have been more specific. At the moment only the first set of data in the array is returned, as per this code:

year: rows[0].year

So the filter (see the filter code in the gist) works perfectly for 2006, the year returned within this first set of data, but how do I access all the rows in the nested array?
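Since `rows[0].year` only looks at the first row of each group, one way to see every year in a group (or to keep only the rows for a given year) is to work on the whole `rows` array that the patched rollup keeps. A plain-JavaScript sketch, with `rows` standing for one region's unaggregated rows from the CSV sample:

```javascript
// One region's unaggregated rows, as kept in values.rows by the patched rollup.
const rows = [
  { country: "Algeria", year: "2006", total_trade: "16557.8" },
  { country: "Algeria", year: "2007", total_trade: "19468.5" },
  { country: "Angola", year: "2006", total_trade: "13108" },
];

// Every distinct year in the group, not just rows[0].year.
const years = Array.from(new Set(rows.map(r => r.year))).sort();

// Only the rows for one year, e.g. for a year filter.
const rows2007 = rows.filter(r => r.year === "2007");

console.log(years);           // [ '2006', '2007' ]
console.log(rows2007.length); // 1
```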



On Wednesday, 18 July 2012 12:42:52 UTC+1, Majella wrote:
Ger - thank you for the help; it worked perfectly. I'm stuck with the filtering now, though: the previous code just isn't working.

Created public gist:   https://gist.github.com/3135720 

Majella

Jul 21, 2012, 2:27:50 PM
to d3...@googlegroups.com
Don't know where to start!  Thank you for all your help - you're a star!:)