Hello all,
I am building an interactive heatmap tool to allow collaborators to explore how their gene expression data look when analyzed using a novel method created in our lab. I am an experienced developer but new to vega (fresh off the tutorial plus a bit of tinkering with some examples).
I was able to get a working prototype pulled together very quickly by adapting the heatmap example here:
Kudos to the vega team for making it so easy to get started! I am now scratching my head over how to add the interactivity we envision.
The key difference between the abstract heatmap I'm building and the temperature-over-time map in the example above is that neither of my axes has a natural ordering. The Y axis (hour of day in the example) holds the gene expression samples, which have no natural order, and the other axis (date in the example) holds "nodes" representing the activity of various collections of genes, which have no strict order either. We would like users to be able to pick from a list of "sorting" options that re-order the samples and nodes. If possible, we would like to do all of the visualization in the web browser.
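One way to structure such a menu of sorting options, sketched in TypeScript under the assumption that the expression values are available as a plain samples-by-nodes array: each option is a function returning a permutation of row indices, and applying the same function to the transposed matrix orders the nodes. The option names and the mean/variance criteria are hypothetical placeholders, not anything provided by vega.

```typescript
// Assumed shape: rows are samples, columns are nodes.
type Matrix = number[][];

// A sorting option maps a matrix to a permutation of its row indices.
type RowOrdering = (m: Matrix) => number[];

const mean = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length;

const variance = (xs: number[]): number => {
  const mu = mean(xs);
  return mean(xs.map((x) => (x - mu) ** 2));
};

// Sorts row indices by a per-row score, highest first; ties keep input order
// because Array.prototype.sort is stable in modern JS engines.
const byScoreDesc = (score: (row: number[]) => number): RowOrdering => (m) =>
  m
    .map((row, i) => ({ i, s: score(row) }))
    .sort((a, b) => b.s - a.s)
    .map((d) => d.i);

// Hypothetical menu of sorting options a user could choose from.
const rowOrderings: Record<string, RowOrdering> = {
  input: (m) => m.map((_, i) => i),
  meanDesc: byScoreDesc(mean),
  varianceDesc: byScoreDesc(variance),
};

const transpose = (m: Matrix): Matrix => m[0].map((_, j) => m.map((row) => row[j]));

// Orders both axes with one option: samples via rows, nodes via columns.
function orderBoth(m: Matrix, option: string): { samples: number[]; nodes: number[] } {
  const f = rowOrderings[option];
  if (!f) throw new Error(`unknown sorting option: ${option}`);
  return { samples: f(m), nodes: f(transpose(m)) };
}

const expr: Matrix = [
  [1, 2, 3],
  [7, 8, 9],
  [4, 0, 8],
];
console.log(orderBoth(expr, "meanDesc")); // → { samples: [ 1, 2, 0 ], nodes: [ 2, 0, 1 ] }
```

Keeping each option as a plain function means a clustering-based ordering can later be added to the same menu without changing the rendering side.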
For a gene expression heatmap, the typical sorting order is determined by hierarchical clustering (our previous work for a manuscript used heatmap.2 from R's gplots package: http://cran.cnr.berkeley.edu/web/packages/gplots/index.html). I have not yet seen anything in vega that performs hierarchical clustering (please correct me if I'm wrong!), so it appears I would need to do it with an external JS library. The challenge then becomes how to either (a) get vega to call out to this library or (b) write some code that hands the clustered data off to vega for rendering.
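For reference, the core of hierarchical clustering is small enough to sketch directly. Below is a naive agglomerative clustering with complete linkage on Euclidean distance (complete linkage is the default method of R's `hclust`); it returns a dendrogram whose left-to-right leaf order can serve as the row order. This is an illustrative O(n^3) sketch, not a substitute for a tested library, and its leaf order will not necessarily match heatmap.2, since branch order within a dendrogram is not unique and heatmap.2 may apply its own reordering.

```typescript
type Matrix = number[][];

// A dendrogram node: a leaf (one input row) or a merge of two subtrees.
type Tree =
  | { kind: "leaf"; index: number }
  | { kind: "merge"; left: Tree; right: Tree; height: number };

type Cluster = { tree: Tree; members: number[] };

const euclidean = (a: number[], b: number[]): number =>
  Math.sqrt(a.reduce((acc, x, k) => acc + (x - b[k]) ** 2, 0));

// Collects leaf indices left to right; this is the display order.
function leaves(t: Tree): number[] {
  return t.kind === "leaf" ? [t.index] : [...leaves(t.left), ...leaves(t.right)];
}

// Complete linkage: cluster distance is the largest pairwise member distance.
function hclust(rows: Matrix): Tree {
  const d = rows.map((a) => rows.map((b) => euclidean(a, b)));
  const linkage = (p: number[], q: number[]): number => {
    let m = 0;
    for (const i of p) for (const j of q) m = Math.max(m, d[i][j]);
    return m;
  };

  let clusters: Cluster[] = rows.map((_, i): Cluster => ({
    tree: { kind: "leaf", index: i },
    members: [i],
  }));

  while (clusters.length > 1) {
    // Find the closest pair of clusters (first pair wins on ties).
    let best = { a: 0, b: 1, dist: Infinity };
    for (let a = 0; a < clusters.length; a++) {
      for (let b = a + 1; b < clusters.length; b++) {
        const dist = linkage(clusters[a].members, clusters[b].members);
        if (dist < best.dist) best = { a, b, dist };
      }
    }
    const merged: Cluster = {
      tree: { kind: "merge", left: clusters[best.a].tree, right: clusters[best.b].tree, height: best.dist },
      members: [...clusters[best.a].members, ...clusters[best.b].members],
    };
    // Drop the merged pair and append the new cluster.
    clusters = clusters.filter((_, k) => k !== best.a && k !== best.b);
    clusters.push(merged);
  }
  return clusters[0].tree;
}

const expr: Matrix = [
  [0, 0],
  [10, 10],
  [0, 1],
  [10, 11],
];
console.log(leaves(hclust(expr))); // → [ 0, 2, 1, 3 ]
```

Running the same function on the transposed matrix would order the nodes. Either way, the clustering produces only a permutation of indices, which is all the rendering side needs.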
Approach (a) seems to be possible using a custom-built transform, if I am following the description here correctly:
though that page also mentions a plugin architecture in the works, which sounds closer to what I'm looking for.
Approach (b) would mean first processing the data outside of vega to obtain the clustering results, and then using vega's streaming API to insert or update the dataset with those results.
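Under approach (b), most of the work outside vega is a reshaping step. A minimal sketch, assuming the clustering has already produced index orders for samples and nodes: flatten the matrix into one record per cell and attach each cell's display rank. The field names (`sampleRank`, `nodeRank`, etc.) are hypothetical; the spec would still need to order its axes by them, and the exact syntax for that, as well as the call that pushes records into a view, depends on the vega version in use.

```typescript
// Hypothetical tidy record for a vega dataset; field names are illustrative.
interface Cell {
  sample: string;
  node: string;
  value: number;
  sampleRank: number; // display position of the sample after re-ordering
  nodeRank: number; // display position of the node after re-ordering
}

// Inverts a permutation: rank[originalIndex] = display position.
function invert(order: number[]): number[] {
  const rank = new Array<number>(order.length);
  order.forEach((orig, pos) => {
    rank[orig] = pos;
  });
  return rank;
}

// Flattens a samples x nodes matrix into one record per cell, tagging each
// cell with ranks taken from orderings computed outside of vega.
function toRecords(
  values: number[][],
  sampleNames: string[],
  nodeNames: string[],
  sampleOrder: number[],
  nodeOrder: number[],
): Cell[] {
  const sRank = invert(sampleOrder);
  const nRank = invert(nodeOrder);
  const cells: Cell[] = [];
  values.forEach((row, i) => {
    row.forEach((value, j) => {
      cells.push({
        sample: sampleNames[i],
        node: nodeNames[j],
        value,
        sampleRank: sRank[i],
        nodeRank: nRank[j],
      });
    });
  });
  return cells;
}

const cells = toRecords(
  [
    [1, 2],
    [3, 4],
  ],
  ["s1", "s2"],
  ["n1", "n2"],
  [1, 0], // show s2 before s1
  [0, 1], // keep node order as given
);
console.log(cells);
```

With this layout, switching sorting options only changes the two rank fields, so the update handed to vega is a fresh set of records rather than a new spec.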
Can anyone comment on whether this would be a good task for vega (our alternative is D3) and, if so, whether approach (a) or (b), or perhaps another way I haven't yet identified, would be a good way to proceed?
Thank you in advance for any words of guidance you have to offer,
-Matt Huyck
P.S. Heatmaps seem like the most natural way to visualize these data, but we are also open to other methods if you have suggestions.