I'd like to enlist the group's help with a d3 visualization I'm
creating of a machine learning algorithm called k-means clustering.
I'm in the process of creating visualizations for several
machine learning algorithms that I plan to publish on my blog, but
this particular visualization has not come out like I had hoped. So
I'm hoping to get some good advice here on how to improve the
animation and display of data. I'm pretty new to d3, and I've not
found much material published yet on best practices for animated
transitions, so I'm sure my visualization has a lot of room for
improvement.
The k-means clustering implementation I'm currently using is based on
a solid JS machine learning library by Heather Arthur, but I plan to
eventually switch it out for another implementation I'm writing that
is modeled after the k-means algorithm in the Python scikit-learn
library.
You can see the visualization in action here:
http://jsfiddle.net/esbullington/Wyjgh/
A full-screen version is here:
http://jsfiddle.net/esbullington/Wyjgh/embedded/result/
My original commented CoffeeScript is here:
https://gist.github.com/1739860
(I originally tried to display the script with the wonderful
bl.ocks.org, but it chokes on my JS, which is admittedly not very
optimized).
A decent explanation of k-means clustering can be found on Wikipedia:
http://en.wikipedia.org/wiki/K-means_clustering
A brief description of my visualization before you begin (it's a bad
sign that I feel like I have to explain it first, I know): k-means is
all about grouping data, and you're looking at the algorithm as it
goes through multiple iterations, and as it grows closer to properly
clustering the data into groups. The k-means algorithm first spits out
a (very complex) nested JavaScript array, then d3 runs through and
displays the iterations (I did this instead of dynamically displaying
the algorithm for reasons too tedious to go into here).
My primary question is: how can I improve the visualization? This did
not come out like I had imagined it. The main problem is that the
clusters change only around the very edges, and it's very hard to
perceive that the cluster membership of certain nodes is changing as a
result of the newly calculated centroids (the big red circles). I need
to find a way to highlight that these nodes are changing from
iteration to iteration.
Any suggestions on how to better visualize this, given the particular
way I have chosen to do so?
Some particular questions on the implementation:
1. As mentioned above, I would like to visually emphasize those nodes
that change cluster membership from one iteration to the next. But
how do I do this? From what I understand, d3 does not cache data. Is
the only way for me to do this dynamically to put the setInterval()
timer in its own loop, and cache the needed data in the loop from one
iteration to the next? I suppose I could also create a new category
of data to track whether or not a node changes membership from the
previous iteration, but this will be my last resort.
2. Is the transition method the only method that allows the display
of nodes to be delayed? I'd like to delay the changing cluster
membership by about a half-second after the red centroids are
displayed, but I don't want to animate their transition. Any way I
can just delay that without a transition?
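For reference, a minimal sketch of the caching idea in question 1 and
the plain delay in question 2 might look like the following. It is
plain JavaScript with no d3 calls, and it assumes a simplified,
hypothetical data shape: each iteration is a flat array of cluster
indices, one per node (the real nested array is more complex). The
names changedMembers, changesPerIteration, and afterDelay are made up
for illustration.

```javascript
// ASSUMPTION: each iteration is a flat array where entry i is the
// cluster index of node i. This is a simplified, hypothetical shape.

// Return the indices of nodes whose cluster differs between two iterations.
function changedMembers(prev, curr) {
  const changed = [];
  for (let i = 0; i < curr.length; i++) {
    if (prev[i] !== curr[i]) changed.push(i);
  }
  return changed;
}

// Precompute, for every iteration after the first, which nodes moved.
// Since all iterations exist up front, nothing has to be cached in the timer.
function changesPerIteration(iterations) {
  return iterations
    .slice(1)
    .map((curr, k) => changedMembers(iterations[k], curr));
}

// Run a step after a plain delay, with no animated transition involved.
// `timer` defaults to setTimeout; it is a parameter only so it can be stubbed.
function afterDelay(step, delayMs, timer = setTimeout) {
  return timer(step, delayMs);
}
```

With something like this, each tick of the setInterval() loop could
draw the centroids immediately, then call afterDelay with a 500 ms
delay to recolor the nodes, giving the indices from
changesPerIteration a distinct class or a larger radius.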
Thanks in advance for reading all this and I look forward to your
input.
Eric