Hi David:
Thank you for your interest.
You are probably looking at the code where I was trying to find the best possible clustering example; for that I excluded some classes, as well as some of the series within the remaining classes.
The reason is that the SAX-VSM approach, while it works for clustering this dataset, suffers from a few issues. One is that the series are somewhat too short to generate "good" discriminating patterns that would help with clustering. Another is that the noise level is too high in some series, which also confuses the algorithm.
Overall, in my experience with this dataset, the algorithm is too sensitive, so its performance is unstable. The tf*idf step aggressively lowers the weights of patterns that appear in multiple classes. So when the noise is too high and, just by chance, a "good" discriminating word for one class also appears in the others, tf*idf "cancels out" that pattern. I was experimenting with thresholds on the frequencies, but did not finish the exploration.
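
To illustrate the cancelling effect, here is a small Python sketch. It is not the code from the repository: the function name, the toy SAX words and their counts are made up for illustration, and it assumes a log-scaled term frequency and idf = log(N / df), where N is the number of classes and df is the number of classes containing the word.

```python
import math

def tfidf_weights(class_bags):
    """Toy tf*idf over per-class bags of SAX words (hypothetical helper).

    class_bags maps a class label to a {word: count} dict.
    Assumed weighting: tf = log(1 + count), idf = log(N / df).
    """
    n_classes = len(class_bags)
    # Vocabulary: every word that occurs at least once in some class.
    vocab = {w for bag in class_bags.values() for w, f in bag.items() if f > 0}
    # Document frequency: number of classes in which the word occurs.
    df = {w: sum(1 for bag in class_bags.values() if bag.get(w, 0) > 0)
          for w in vocab}
    weights = {}
    for label, bag in class_bags.items():
        weights[label] = {}
        for w in vocab:
            f = bag.get(w, 0)
            tf = math.log(1 + f) if f > 0 else 0.0
            idf = math.log(n_classes / df[w])
            weights[label][w] = tf * idf
    return weights

# "abc" occurs only in class A, so it discriminates A from B.
clean = {"A": {"abc": 10, "bbb": 3}, "B": {"bbb": 4, "cca": 7}}
# Same data, but noise makes "abc" show up once in class B.
noisy = {"A": {"abc": 10, "bbb": 3}, "B": {"bbb": 4, "cca": 7, "abc": 1}}

clean_w = tfidf_weights(clean)
noisy_w = tfidf_weights(noisy)

print("clean weight of 'abc' in A:", clean_w["A"]["abc"])
print("noisy weight of 'abc' in A:", noisy_w["A"]["abc"])
```

With only two classes, a single chance occurrence of "abc" in class B makes df equal to N, so idf = log(1) = 0 and the word's weight in class A drops to zero, even though it occurs ten times there. With more classes the weight is reduced rather than zeroed, unless the word leaks into every class.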