Charts with subset data (fake group?)

Frozenlock

Sep 14, 2017, 5:19:22 PM
to dc-js user group
Hello,

I'd like to be able to show multiple charts, each with a subset of the entire dataset.

For example, if we have data for a week and divide it by day:
Charts for Monday
Charts for Tuesday
Etc...

Each time period might have multiple charts, like this:


I played around with fake groups to filter the unwanted data, but I need to use the 'reduce' method for some charts and it's missing from these fake groups.

Is there a better approach?
Do I need to re-implement the 'reduce' method for the fake groups?

Thank you very much in advance!

Gordon Woodhull

Sep 14, 2017, 6:31:58 PM
to dc-js-us...@googlegroups.com
That sounds a bit like a pivot.

Since crossfilter doesn't really do subsets (except by filtering), what I'd suggest doing is reducing by two or more keys at once, then using fake groups to pull it back apart again. This is the same technique the series chart uses.

So you'd group by a 2D key [day of week, other dimension]. Reduce normally. Then pull it apart using fake groups that choose each day of the week, pull all the relevant entries, then throw away the first element of the key.

That way you're still getting all the efficiency and power of crossfilter, and it's all updated automatically. But the eventual fake groups look just like regular groups which you use normally.
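
A minimal sketch of such a fake group, assuming the real group's keys are two-element arrays [day, otherKey]. The helper name subsetByDay and the stub data are made up for illustration; they are not part of dc.js or crossfilter.

```javascript
// Hypothetical helper (not part of dc.js or crossfilter).
// Wraps a group whose keys are [day, otherKey] and exposes only the
// entries for one day, re-keyed by otherKey alone.
function subsetByDay(group, day) {
  return {
    all: function () {
      return group.all()
        .filter(function (kv) { return kv.key[0] === day; })
        .map(function (kv) { return { key: kv.key[1], value: kv.value }; });
    }
  };
}

// Stand-in for a real crossfilter group, only to show the shape of the data.
var stubGroup = {
  all: function () {
    return [
      { key: ['Monday', 1], value: 5 },
      { key: ['Monday', 2], value: 7 },
      { key: ['Tuesday', 1], value: 3 }
    ];
  }
};

var mondayGroup = subsetByDay(stubGroup, 'Monday');
console.log(mondayGroup.all());
// [ { key: 1, value: 5 }, { key: 2, value: 7 } ]
```

Since all() re-reads the underlying group on every call, the subset should follow whatever filters crossfilter applies. Any custom reduce would be set once on the real group before wrapping it, so the fake group itself should not need a reduce method.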
--
You received this message because you are subscribed to the Google Groups "dc-js user group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dc-js-user-gro...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dc-js-user-group/e8c3816f-bfb1-4b30-8b79-4c31e6fe3282%40googlegroups.com.

Frozenlock

Sep 16, 2017, 12:45:04 PM
to dc-js user group
Thank you very much for the suggestion.

I tried applying it to a bar chart.

Dimension -> [day, other dimension]
Group -> [day, other dimension into bins]
fake-group -> filter by day, promote bins

Everything appears as expected on the bar graph.
However, if I apply a filter on the graph, all the other graphs turn blank (no values), as shown here:

Any ideas if I'm missing a step?

(I'm not providing a JSFiddle because the setup is quite extensive; I'm happy with just your opinions and/or guesses.)

Gordon Woodhull

Sep 16, 2017, 1:41:59 PM
to dc.js user group
That's a good point: since your dimension is now multi-keyed, you'll need a custom filter handler (chart.filterHandler in dc.js) to specify how to apply filters over those multikeys.

This constrains the design further: the dimension which you want to filter must be the first key, and the dimension you are splitting / doing small multiples over must be the second key. (The opposite of what I wrote earlier.)

This is because the multikeys will be sorted in lexicographical order. E.g. if the days of the week are strings,

[12, Monday]
[12, Saturday]
[12, Sunday]
[12, Thursday]
...
[13, Friday]
[13, Monday]

For your purposes, it shouldn't matter what order the second key is in, but when we do a range filter over the first key we want it to effectively ignore the second key.

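The filter handler will look something like this (untested):

// dc.js registers custom filter logic through chart.filterHandler;
// the handler must return the filters it was given.
chart.filterHandler(function(dimension, filters) {
  if(!filters || filters.length === 0)
    dimension.filter(null);
  else
    // range over multikeys: [low, '\0'] up to (but excluding) [high, '\0']
    dimension.filterRange([[filters[0][0], '\0'], [filters[0][1], '\0']]);
  return filters;
});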

Basically, instead of filtering on the original range filters[0][0] -> filters[0][1], which is in your chart's domain, we filter over a range of multikeys where the first element of the start and end are those values, and the second element is something chosen to sort lower (lexicographically) than any of the second keys. I chose a string containing a null character to be conservative.

(But again, this is untested; I'm working off my memory of having solved similar problems before. You may need to debug this a little.)
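
To see why the '\0' sentinel makes the range ignore the second key, here is a small standalone sketch that mimics a half-open range test on the coerced keys. The function name inFirstKeyRange is hypothetical, and the example assumes the first keys all have the same number of digits.

```javascript
// Illustration only: mimics a half-open range test [lo, hi) on multikeys
// by comparing their string forms, which is what JavaScript's < and >=
// do when both operands are arrays.
function inFirstKeyRange(key, lo, hi) {
  var k = String(key);               // [12, 'Monday'] -> "12,Monday"
  return k >= String([lo, '\0']) && k < String([hi, '\0']);
}

var keys = [[12, 'Monday'], [13, 'Friday'], [14, 'Monday']];
var kept = keys.filter(function (k) { return inFirstKeyRange(k, 12, 14); });
console.log(kept); // [ [ 12, 'Monday' ], [ 13, 'Friday' ] ]
```

Every key starting with 12 sorts at or above "12,\0" and every key starting with 14 sorts above "14,\0", so the range keeps 12 and 13 and drops 14 regardless of the day.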

Efficiency note
Crossfilter will actually coerce the arrays to strings and then sort them this way. It's a lot more efficient to do the coercion yourself when creating the keys, i.e.

dimension = cf.dimension(function(r) { return r.x + ',' + r.y; })

instead of

dimension = cf.dimension(function(r) { return [r.x, r.y]; })

but it's so much more convenient to do the latter, that I've stopped arguing with this common practice (assumed by scatterPlot and seriesChart and my code above). Otherwise, you have to split it again yourself when reading keys, etc etc.
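
A rough sketch of what building and splitting string keys by hand could look like. The helper names makeKey and splitKey are hypothetical, and the sketch assumes the first part of the key never contains a comma.

```javascript
// Hypothetical helpers for string multikeys of the form "x,y".
// Assumes the first part (x) never contains a comma.
function makeKey(x, y) {
  return x + ',' + y;
}

// Splits at the first comma only, so y may itself contain commas.
function splitKey(key) {
  var i = key.indexOf(',');
  return [key.slice(0, i), key.slice(i + 1)];
}

var key = makeKey(12, 'Monday');
console.log(key);           // "12,Monday"
console.log(splitKey(key)); // [ '12', 'Monday' ]
```

Note that both parts come back as strings, so a numeric first key would need converting (for example with Number) before being used as a chart value.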


Frozenlock

Sep 17, 2017, 1:47:32 PM
to dc-js user group
Amazing explanation.
It works exactly like you said.

Thank you very much!

Frozenlock

Sep 17, 2017, 2:24:31 PM
to dc-js user group
Btw, I think the 2d key is messing with sorting (.bottom and .top), which I use to find my domain.
Do I need another little tweak, or perhaps another function?

Frozenlock

Sep 17, 2017, 4:23:09 PM
to dc-js user group
Ok, I think I understand what is happening.

Like you've mentioned, arrays are converted to a string and then ordered:

One possible solution to this problem is to use a custom .toString method with the necessary formatting to allow an easy comparison.

Gordon Woodhull

Sep 17, 2017, 7:10:25 PM
to dc-js-us...@googlegroups.com
Yeah I suspect it's one of two problems:

1. ['a','b'] will be coerced to the string "a,b" and sorted that way. So you might have to consider how the ASCII comma (character 44) collates with your data. If all characters in your data are above 44 in ASCII then that's fine, but if not then you might have to concatenate manually with a different separator.

2. If the first element is a number then you might have to left-pad those numbers. Because now it's doing string comparison rather than "natural value comparison".

I bet you are running into problem #2. 

Crossfilter was not built for multikeys but it's possible to make them work. 

I wouldn't try to overload toString but I'd at least specify the coercion for the first element, and possibly have the dimension key function return a string rather than an array. It's less convenient but way more efficient (think about how many coercions must happen to sort all your rows of data) and you get complete control over what the keys look like.
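
A small sketch of the left-padding idea, assuming the first element is a non-negative integer. The name paddedKey and the width of 4 are arbitrary choices for illustration; the width would have to cover the largest value in the data.

```javascript
// Hypothetical key function: left-pads the numeric part so that string
// order matches numeric order. The width of 4 is an arbitrary assumption.
function paddedKey(n, day) {
  return String(n).padStart(4, '0') + ',' + day;
}

// Array keys coerced to strings: 10 sorts before 9.
var unpadded = [[9, 'Monday'], [10, 'Monday']].map(String).sort();
// Padded string keys: 9 sorts before 10, as expected.
var padded = [paddedKey(9, 'Monday'), paddedKey(10, 'Monday')].sort();

console.log(unpadded); // [ '10,Monday', '9,Monday' ]
console.log(padded);   // [ '0009,Monday', '0010,Monday' ]
```

With keys like these, the first and last entries in sorted order correspond to the smallest and largest numeric values, which is what a domain calculation based on .bottom and .top relies on.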

Here's a long discussion about it with some more tips: