Workaround to handle joining 2 DataTables with duplicate keys?

72 views
Skip to first unread message

tawofolu

unread,
Dec 28, 2011, 11:11:32 AM12/28/11
to Google Visualization API
Hi,

I am working on a project where we will not be able to avoid having to
join 2 datasets with duplicate keys, just because of the nature of the
data that we are working with.

In a relational scenario I could easily deal with this with a full
join that results in a Cartesian product, but I don't see what my
possible options are with the Google Visualization API.

From the documentation:

"Only dt1, the left-hand table, can include duplicate keys; keys in
dt2 must be unique." http://code.google.com/apis/chart/interactive/docs/reference.html#google_visualization_data_join

So my question is, in situations where the keys in dt2 are not unique,
how could I handle the join?

Please help - thanks!

asgallant

unread,
Dec 28, 2011, 11:58:27 AM12/28/11
to google-visua...@googlegroups.com
This may be horrendously complicated to write or really inefficient in execution, but my first thought is to filter dt2 into two or more views, such that each view contains only unique keys.  Join each view (separately) with dt1, producing N joined DataTables, then manually merge the joined tables together (may be necessary to remove duplicate rows if you are not doing inner joins).

tawofolu

unread,
Dec 28, 2011, 12:41:34 PM12/28/11
to Google Visualization API
Thanks for the feedback! I did think of something along these lines. I
abandoned the idea because as you said, it looks like it could get
pretty inefficient especially when there are several sets of duplicate
keys. Our data could have dozens of duplicate keys, so N could be a
potentially large number, and having to do N additional joins for a
single join operation might be impractical in our case...not to
mention the added complexity of the filtering...

It just seems like there ought to be an easier way?

asgallant

unread,
Dec 28, 2011, 12:57:00 PM12/28/11
to google-visua...@googlegroups.com
Could you join at your original data source instead?

tawofolu

unread,
Dec 28, 2011, 1:02:06 PM12/28/11
to Google Visualization API
Unfortunately no, our datasource is actually a NoSQL database which
does not provide any way to do the joins (BigData - CouchDB).

asgallant

unread,
Dec 28, 2011, 2:01:24 PM12/28/11
to google-visua...@googlegroups.com
I wrote up a super-hacky, quick-and-dirty function that (in theory) will do the many-to-many join: http://jsfiddle.net/4wNXY/

This is COMPLETELY untested, and likely will require a great deal of work to get working properly, but the basic framework is there.

tawofolu

unread,
Dec 28, 2011, 2:28:57 PM12/28/11
to Google Visualization API
Will check it out, thanks!

tawofolu

unread,
Jan 5, 2012, 1:37:36 PM1/5/12
to Google Visualization API
Hello there, Thanks again for all your help! After taking a break for
the holidays I came back to the script, made some very tiny changes
for our project and so far it works fine. Still has performance
issues, hopefully later I could work on optimizing it to remove the
recursion issues that will exist for large datasets.

http://jsfiddle.net/yePMt/1/

asgallant

unread,
Jan 5, 2012, 3:04:30 PM1/5/12
to google-visua...@googlegroups.com
You're welcome; I'm glad to hear things are working for you.
Reply all
Reply to author
Forward
0 new messages