Convert SQL into formatted JSON for "tree.js" visualization

3,622 views
Skip to first unread message

Curt SerVaas

unread,
Sep 17, 2012, 12:30:17 AM9/17/12
to d3...@googlegroups.com

context: I'm a newbie to d3 and still very green to programming in general. I'm trying to make an extension that tracks your history and displays it hierarchically in d3. I have chrome extension that logs browser events and then I convert that into an SQL table that looks like this:


ID   URL  Parent 

-------------------------

001 URL null (root of tree)

002 URL 01

003 URL 01

004 URL 02

005 URL 03


I then regroup the rows so that all the children are directly beneath their parents and the table looks like this:


ID URL Parent

---------------------

001 URL null

002 URL 001

004 URL 002

003 URL 001

005 URL 003


Now what I need to figure out is how to convert this new SQL table into a JSON file formatted like the "flare.json" file used for all the tutorials. 


Thanks in advance! I'm still learning how to ask good question so let me know if there's anything I can clarify. 

Ger Hobbelt

unread,
Sep 17, 2012, 2:35:05 AM9/17/12
to d3...@googlegroups.com
When you're taking off from a SQL starting point, you might be faster when you get it to spit out CSV (or TSV (TAB separated fields)), as D3 does support loading those: see the d3.dsv documentation (sources are in the 'dsv' directory; don't recall exactly if Wiki knows it as d3.csv or d3.tsv / d3.dsv; have a look.)

Met vriendelijke groeten / Best regards,

Ger Hobbelt

--------------------------------------------------
web:    http://www.hobbelt.com/
        http://www.hebbut.net/
mail:   g...@hobbelt.com
mobile: +31-6-11 120 978
--------------------------------------------------

Phoebe Bright

unread,
Sep 17, 2012, 4:00:57 AM9/17/12
to d3...@googlegroups.com
I use the csv apprach.  When figuring this out for myself, wrote a few examples of importing cvs and using nest to create the hierarchy.  Here are the examples, https://gist.github.com/3176159



d3 junkie

unread,
Sep 17, 2012, 6:20:40 AM9/17/12
to d3...@googlegroups.com
Jerome Cukier has a great post on using D3 with a MySQL database.
There could be some good tips there

Mike Bostock

unread,
Sep 17, 2012, 12:43:54 PM9/17/12
to d3...@googlegroups.com

Curt SerVaas

unread,
Sep 17, 2012, 6:08:19 PM9/17/12
to d3...@googlegroups.com
So, the steps I'd have to take are:
1. Convert my SQL into CSV. 

The CSV file would look like this?

ID, URL, Parent
001, url, null
002, url, 01
etc. 

I was originally planning on using tree.js and formatting it the same way as flare.json. However, if I follow the example given here, then I can't use tree.js. Which is fine except in the example given here, there are only 2 "columns" in the CSV file and so the visualized tree doesn't display the names of the URLs, only the structure (whereas trees.js also displays the names of the nodes). Also, the above example doesn't allow you to click and expand nodes like tree.js does. Is there a JS file that allows you to expand and collapse nodes while reading CSV data?

Thanks again!

Ger Hobbelt

unread,
Sep 18, 2012, 1:36:48 AM9/18/12
to d3...@googlegroups.com
Your question is spanning multiple subproblems:

The problem/task of loading the data into the browser/D3 environment through CSV/TSV/JSON/[other] is separate from the task of showing hierarchical data in a tree view or graph data in, say, a force layout:
it does not matter whether you use 
  d3.csv(<url>, <function>)
or
  d3.json(<url>, <function>)
to grab the data from the server. The only difference between those data load approaches is that JSON is capable[*] of loading hierarchical data while any CSV format isn't, hence you'll potentially need other 'preprocessing' logic in the <function> to create the JavaScript object structure(s) required for the particular layout you've picked.

[*] this doesn't say you have to when using JSON. In the end it's all just a matter of choice: the choice to do the data preprocessing server-side or client-side. Or maybe even at both, sharing the burden. Not using JSON equally doesn't prevent you from transferring hierarchical data across the wire, it's just requiring some different logic at both sides to make it happen.


Divide & Conquer: identify and split problem into smaller parts

The best approach to coding this thing is to keep this in mind and consequently treat your project as a multi-step sequential process:

1- load data from server (done; using CSV)
2- preprocess loaded data to produce JS structures which match the input expectation of the chosen layout in step 3; may use d3.nest or other means to this end
3- use the 'munched data structures' from step 2 and feed them to the D3 layout of choice
4- take the layout output and render to DOM (SVG/Canvas/plain HTML/etc.etc.) using d3 selections and the d3 data join concept.

where, on occasion, #4 is a two-step when your rendering doesn't plug straight onto the layout outputs:

4a- preprocess ('munch') the layout output structures and make them suitable for the d3.selection+datajoin code in (4b); d3.nest and friends may be helpful again
4b- render to DOM (screen) using d3 selections and datajoin tactics.


CSV, JSON, whatever, it's all the same game

Your question 'how to convert CSV data to tree layout and have it render a collapsible tree' therefor is covering several of these steps, each of which can be handled individually.

First, when you have CSV and you find a JSON-based example that you want to re-use, then the added problem is a '#2': you need to come up with a way to convert your parsed CSV data structures (JS array of CSV record objects) to something that equals the (probably hierarchical) JSON object structure used in the original example.


Making it concrete

In the case of Mike Bostock's Stack Overflow answer 
plus your input


ID URL Parent

---------------------

001 URL null

002 URL 001

004 URL 002

003 URL 001

005 URL 003

the stackoverflow code would only have to be changed a little to let the URL field end up with each constructed node.
(Note that I picked the one where Parent name string-matches your CSV row IDs so that a simple == 'equals' will do the trick of identifying the Parent reference vs. the 'node name to be' (~ CSV row ID) in this example.)
The assumption here is that the URL field belongs with the node, rather than the link. When you have extra data that should be attached to the link, the principle is the same, though.

Original stackoverflow code:

If you want to display a graph, and all you have is a list of edges, then you'll want to iterate over the edges in order to produce an array of nodes and an array of links. Say you had a file called "graph.csv":

source,target
A1
,A2
A2
,A3
A2
,A4

You could load this file using d3.csv and then produce an array of nodes and links:

d3.csv("graph.csv", function(links) {
 
var nodesByName = {};

 
// Create nodes for each unique source and target.
  links
.forEach(function(link) {
    link
.source = nodeByName(link.source);
    link
.target = nodeByName(link.target);
 
});

 
// Extract the array of nodes from the map by name.
 
var nodes = d3.values(nodeByName);

 
function nodeByName(name) {
   
return nodesByName[name] || (nodesByName[name] = {name: name});
 
}
});

Now we take your CSV with the extra URL field and edit that code (color: red):

ID,URL,Parent
001,URL,null
002,URL,001
004,URL,002
003,URL,001
005,URL,003

You could load this file using d3.csv and then produce an array of nodes and links:

d3.csv("graph.csv", function(links) {
 
var nodesByName = {};

 
// Create nodes for each unique source and target. // Also process the link object: // add source+target members to match layout requirements. // CSV input is assumed to have each row only reference Parent nodes // which are known, i.e. have lower IDs ~ have been listed before in the CSV.
  links
.forEach(function(link) {
    link
= transformLink(link);
  });

 
// Extract the array of nodes from the map by name.
 
var nodes = d3.values(nodeByName);

 
function transformLink(link) { // construct the node object from the CSV row
    var source =
nodesByName[link.ID] || (nodesByName[link.ID] = {name: link.ID, url: link.URL}); // CSV input is assumed to have each row only reference Parent nodes
   // which are known, i.e. have lower IDs ~ have been listed before in the CSV.
   var target = nodesByName[link.Parent];
   // construct the layout-conforming link object by augmenting it with the required members
   link.source = source;
   link.target = target;
 }
});

and if you want a cleaned-up link array, the code may look like this instead (I don't mind when additional data hangs on; you may however go this route when learning (and thus preventing yourself from picking those data bits off the wrong objects) or when speed is of the essence and you have a large dataset to juggle):

d3.csv("graph.csv", function(links) {
 
var nodesByName = {};
 
// Create nodes for each unique source and target. // Also process the link object: // add source+target members to match layout requirements. // CSV input is assumed to have each row only reference Parent nodes // which are known, i.e. have lower IDs ~ have been listed before in the CSV.
  links
.map(function(link) {
    return
transformLink(link);
  }); // when you get here, the loaded 'links' array is replaced by the processed links []

 
// Extract the array of nodes from the map by name.
 
var nodes = d3.values(nodeByName);

 
function transformLink(link) { // construct the node object from the CSV row
    var source =
nodesByName[link.ID] || (nodesByName[link.ID] = {name: link.ID, url: link.URL}); // CSV input is assumed to have each row only reference Parent nodes
   // which are known, i.e. have lower IDs ~ have been listed before in the CSV.
   var target = nodesByName[link.Parent];
   // construct the layout-conforming link object, stripped from all surplus
   return { source: source, target: target };
 }
});
[Usual caveat/cop-out applies: code has been monkey-banged straight into the email, so take it to the vet for delousing before cuddling; one known problem is handling the NULL reference in the CSV data, e.g. in the first row.
Solving this is left as an exercise for the reader. ;-)  
Hint: treat the loaded CSV array as a nodes set, rather than a links set + change the processing code to construct a links array.]


Steady as she goes, skipper...

Once you recognize that the loaded data format doesn't prevent you from using examples which started out with another load format, as you can process your data to match the desired structure, you can pick the ones you like, if you don't want to base the rest of your code off any previous work (example, gist, ...)

Collapsible force layouts can be as wild as these ones: http://bl.ocks.org/3104394 (graph) / http://bl.ocks.org/3669455 (tree render)
As these incorporate quite a few other bits, these will not be suitable for easy learning when you're starting out; they show where this stuff can take you if you continue, though.

Mike has a couple of collapsible tree layouts, such as: http://mbostock.github.com/d3/talk/20111018/tree.html where you can easily see how 'collapsing' might be solved (a '#4a/#4b' problem ;-) ). This tree layout is a pure hierarchical one, so your data munching #2 step needs to look a little different.


Met vriendelijke groeten / Best regards,

Ger Hobbelt

--------------------------------------------------
web:    http://www.hobbelt.com/
        http://www.hebbut.net/
mail:   g...@hobbelt.com
mobile: +31-6-11 120 978
--------------------------------------------------



Reply all
Reply to author
Forward
0 new messages