d3.nest() "internal" and "external" keys


Iain

Aug 2, 2011, 2:40:08 PM
to d3-js
Hello there,

I've been using d3.nest() to group elements in an array into a
hierarchy. However, now I would like the keys to come from 'outside'
the array, rather than 'inside' the array. Is this possible? An
example might help clarify my question...

Say I have the following array, which contains three responses to
question one on a questionnaire:

var data = [
{ name: "Adam", question1: "Yes" },
{ name: "Bill", question1: "Yes" },
{ name: "Chas", question1: "Yes" }
];

Question one only permitted "Yes" and "No" answers, although Adam,
Bill, and Chas all answered "Yes" so we don't know that "No" was a
possible answer from the array itself.

Now, I would like to draw a bar chart; one bar would represent "Yes"
answers and one bar represent "No" answers to question one. The
following...

var groups = d3.nest()
.key(function(d) { return d.question1; })
.rollup(function(d) { return d.length; })
.entries(data);

...would produce something like...

[{ key: "Yes", values: 3 }]

How can I first check that both expected keys ("Yes" and "No") are
present, and if one is missing, add it? In the above case, I would
like the result to be...

[
{ key: "Yes", values: 3 },
{ key: "No", values: 0 }
]

I should point out that in my (slightly more complex) visualization,
the original array (named "data", above) is a subset of a larger
array. I would like the bar chart to update each time a new subset is
produced from the larger array. This, I can do. However, in the
absence of data for a given key in one subset, the data for that key
in the previous subset remains. In short, I can't "fudge" a static bar
chart!

Thanks in advance for any help you may be able to offer. When I have
this cracked, I hope to be able to contribute a nice (geographic)
visualisation back to the community.

Iain

Iain

Aug 2, 2011, 4:30:25 PM
to d3-js
Aah... a break, a cuppa, and it all becomes clear. The solution isn't
with the data, it's with the bar chart.

First select the bars (which were drawn previously and stored in
'chart')...

var bars = chart.selectAll("rect")
.data(data, function(d) { return d.key; });

Then do something with the bars for which we have data...

bars.transition()
.attr("y", function(d) { return h-themeY(d.values); })
.attr("height", function(d) { return themeY(d.values); });

Then do something with the bars for which we don't have data...

bars.exit().transition()
.attr("y", function(d) { return h-themeY(0); })
.attr("height", function(d) { return themeY(0); });

(In this case, reduce them to zero on the y axis.)
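For anyone following along, here is a rough sketch of what the key function is doing in the join above, written in plain JavaScript with no D3 dependency. The helper joinByKey and the sample arrays are hypothetical, made up purely for illustration; D3's real join operates on DOM elements rather than plain arrays, so treat this as an approximation of the idea only.

```javascript
// Hypothetical helper (not part of D3): split previously bound data into
// "update" and "exit" sets by matching keys, roughly as a keyed join does.
function joinByKey(oldData, newData, key) {
  var newByKey = Object.create(null);
  newData.forEach(function(d) { newByKey[key(d)] = d; });

  var update = [], exit = [];
  oldData.forEach(function(d) {
    var k = key(d);
    if (k in newByKey) update.push(newByKey[k]); // key still present: update
    else exit.push(d);                           // key missing: exit
  });
  return { update: update, exit: exit };
}

// Assumed sample data: the previous subset had both answers, the new one only "Yes".
var previous = [{ key: "Yes", values: 2 }, { key: "No", values: 1 }];
var current = [{ key: "Yes", values: 3 }];

var joined = joinByKey(previous, current, function(d) { return d.key; });
// joined.update holds the new "Yes" datum; joined.exit holds the stale "No" datum.
```

The stale "No" bar lands in the exit set, which is why transitioning bars.exit() to zero height clears it.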

And naturally, all this is in the documentation! Duh! In my defence,
it sometimes takes a while to apply information to new (or seemingly
new) contexts. Re-reading Three Little Circles
(http://mbostock.github.com/d3/tutorial/circle.html) helped.

Mike Bostock

Aug 6, 2011, 1:17:23 PM
to d3...@googlegroups.com
Glad you were able to find a workable solution. I thought I might try
to answer your original question, all the same, since it's a fairly
common sort of data transformation. The heart of the problem is that
you want a cross operator (cross product) rather than a nest operator.
The cross operator gives you all pairwise combinations of questions
and answers (including zeroes), while the nest operator is similar to
grouping in SQL, and so only gives you the distinct, defined values
(not including zeroes).
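
To make the distinction concrete, here is a minimal sketch in plain JavaScript (no D3 needed) applied to the single-question data from the original post. The helpers countBy and crossWithKeys are hypothetical names invented for this illustration; they are not D3 functions.

```javascript
// Hypothetical helper: SQL-style "GROUP BY ... COUNT(*)". Only keys that
// actually occur in the rows appear in the result.
function countBy(rows, key) {
  var counts = Object.create(null); // no inherited properties to collide with
  rows.forEach(function(d) {
    var k = key(d);
    counts[k] = (counts[k] || 0) + 1;
  });
  return counts;
}

// Hypothetical helper: cross the counts with the full list of allowed keys,
// so that keys with no rows show up with a count of zero.
function crossWithKeys(counts, keys) {
  return keys.map(function(k) {
    return { key: k, values: counts[k] || 0 };
  });
}

var sample = [
  { name: "Adam", question1: "Yes" },
  { name: "Bill", question1: "Yes" },
  { name: "Chas", question1: "Yes" }
];

var grouped = countBy(sample, function(d) { return d.question1; });
var crossed = crossWithKeys(grouped, ["Yes", "No"]);
// grouped only knows about "Yes"; crossed has an entry for "No" as well.
```

The grouping step alone has no way of knowing "No" was possible; the list of allowed keys has to come from outside the data.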

I'd start by considering a slightly larger example data set where you
have more than one question. Here's a list of responses, where each
response has the name of the responder, a question identifier (a
number), and the answer given:

var responses = [
{name: "Adam", question: 1, answer: "Yes"},
{name: "Adam", question: 2, answer: "Yes"},
{name: "Bill", question: 1, answer: "Yes"},
{name: "Bill", question: 2, answer: "No"},
{name: "Chas", question: 1, answer: "Yes"},
{name: "Chas", question: 2, answer: "Yes"}
];

We also define the set of allowed answers. We'll assume here that all
the questions have the same set of responses:

var answers = ["Yes", "No"];

First we define a function that will group the responses by question:

function nest() {
return d3.nest()
.key(function(d) { return d.question; })
.entries(responses);
}

If I call nest(), I get back an array, with one element per question.
The `key` is the question number (1 or 2) and the `values` are the
individual responses for that question. Something like this:

[
{
key: 1,
values: [
{name: "Adam", question: 1, answer: "Yes"},
{name: "Bill", question: 1, answer: "Yes"},
{name: "Chas", question: 1, answer: "Yes"}
]
},
{
key: 2,
values: [
{name: "Adam", question: 2, answer: "Yes"},
{name: "Bill", question: 2, answer: "No"},
{name: "Chas", question: 2, answer: "Yes"}
]
}
]

Next we define a function that, given one of these elements (a
question), computes the count of responses for each of the allowed
answers. This is the cross operator:

function cross(question) {
var counts = d3.nest()
.key(function(d) { return d.answer; })
.rollup(function(d) { return d.length; })
.map(question.values);
return answers.map(function(d) {
return {
answer: d,
count: counts[d] || 0
};
});
}

We first use the nest operator to compute a map from answer to count,
but the count will be undefined if there was no response with a
particular answer. That's why there's an "|| 0" there to convert the
undefined to zero.

Putting it all together:

var ul = d3.select("body").selectAll("ul")
.data(nest)
.enter().append("ul");

ul.append("li")
.text(function(d) { return "Question #" + d.key; });

var li = ul.selectAll("li.answer")
.data(cross)
.enter().append("li")
.attr("class", "answer")
.text(function(d) { return d.answer + ": " + d.count; });

Mike

Iain

Aug 7, 2011, 6:12:57 AM
to d3-js
Wonderful! Thank you for the response -- it's great to see a data
transformation like this worked through step by step.

I have one question, though. In the "Putting it all together" example,
you don't pass a parameter to the cross function. In other words, you
have .data(cross) rather than .data(cross(question)), as I would
expect given the function definition. (I appreciate that in the
example there is no variable called "question", but there could be.)
What magic is going on here!?

Thanks again,

Iain
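
A sketch of what appears to be going on, as far as D3's documented behaviour goes: when .data() receives a function instead of an array, D3 calls that function once per group in the parent selection and hands it that group's bound datum (plus the group index). So cross is passed as a value rather than called, and D3 supplies the question argument itself. The code below imitates this with plain arrays; bindData and countYes are hypothetical stand-ins invented for illustration, not D3 functions.

```javascript
// Hypothetical stand-in for selection.data(): accepts an array or a function.
function bindData(parentData, values) {
  return parentData.map(function(parentDatum, i) {
    // Given a function, call it with the parent's datum to get the child data;
    // given an array, use the same array for every parent.
    return typeof values === "function" ? values(parentDatum, i) : values;
  });
}

// Assumed, simplified parent data: one element per question.
var questions = [
  { key: "1", values: ["Yes", "Yes", "Yes"] },
  { key: "2", values: ["Yes", "No", "Yes"] }
];

// Takes one question, like cross(question) in the thread.
function countYes(question) {
  return question.values.filter(function(a) { return a === "Yes"; }).length;
}

// Passed by reference, like .data(cross): the caller supplies the argument.
var perQuestion = bindData(questions, countYes);
```

Writing .data(cross(question)) would instead call cross once, immediately, and bind that single result to every ul.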