Hi Vaske,
I don't think that code was unstable or that it wasn't performing as intended.
I think you are pretty safe to go with it. There might be some lost connections with JMotif, but this can be repaired. (Josh built his code upon SAX, so I assume his code is not broken at all; it just needs to be reconnected by fixing the syntax of its calls to the evolved JMotif API. I really had no time to do that at that point. Sorry.)
Now, personally, I have never worked with billions of series, since I can't figure out how to handle hundreds of those :). So I simply can't advise you on which is better, but I can point out a few things: SAX-based methods are not exact, and iSAX, in my opinion, approximates things even more, which is what makes it possible to tackle really large data. So you have to make the decision on your own: whether you can afford to lose some of the good results and are willing to deal with false positives.
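To make the "not exact" point concrete, here is a minimal Python sketch of plain SAX (z-normalization, PAA, then discretization). It is not JMotif code: the function names, the alphabet of 4 letters and the 4 PAA segments are assumptions chosen only for illustration. It shows two different series collapsing into the same SAX word, which is where false positives come from.

```python
import statistics
from bisect import bisect_right

# Breakpoints cutting the standard normal into 4 equiprobable regions
# (alphabet size 4 is an assumption made for this example).
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"


def znorm(series):
    """Rescale a series to zero mean and unit standard deviation."""
    mean = statistics.fmean(series)
    sd = statistics.pstdev(series)
    if sd == 0:
        return [0.0] * len(series)
    return [(x - mean) / sd for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-size chunk.

    Assumes len(series) is divisible by `segments` to keep the sketch short.
    """
    size = len(series) // segments
    return [statistics.fmean(series[i * size:(i + 1) * size])
            for i in range(segments)]


def sax_word(series, segments):
    """Map a real-valued series to a SAX word over ALPHABET."""
    return "".join(ALPHABET[bisect_right(BREAKPOINTS, v)]
                   for v in paa(znorm(series), segments))


# Two different series that end up with the same SAX word.
series_a = [1, 2, 3, 4, 5, 6, 7, 8]
series_b = [0, 3, 2, 5, 6, 5, 9, 8]
word_a = sax_word(series_a, 4)
word_b = sax_word(series_b, 4)
print(word_a, word_b)
```

Anything that compares only the words cannot tell these two series apart. A coarser representation (fewer segments, fewer letters) makes such collisions more frequent, which is the price paid for scaling to larger data.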
I use SAX to convert long real-valued time series into a dictionary (corpus) of SAX words (terms), then I use tf*idf to build a vector space which, in turn, is used for classification and knowledge discovery (I am after this knowledge, so I can't afford to lose something or get wrong answers). Within the last months I pushed a lot of changes making this possible. In the coming time, I plan to settle down the codebase, boost test coverage, remove excessive dependencies, organize the JMotif API and "mavenize" the build. I hope to release a 1.0 _stable_ version. We have some plans for bringing in Sequitur and n-gram language models, but there is no one working on the high-throughput end, i.e. iSAX. If you are able to get it running by connecting the code back to JMotif, I think you will be better off experimenting with us than with the old codebase.
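For what it's worth, the SAX words -> tf*idf -> vector space -> classification pipeline described above can be sketched roughly as follows. This is a minimal Python illustration, not the JMotif implementation: the function names are hypothetical, the bags of SAX words are made up, and the weighting (1 + log tf) * log(N / df) is just one common tf*idf variant that I am assuming here.

```python
import math
from collections import Counter


def tfidf_vectors(corpus):
    """Build one tf*idf weight vector per class.

    corpus: dict mapping a class label to the list of SAX words
    extracted from that class's training series.
    Assumed weighting: (1 + log(tf)) * log(N / df).
    """
    counts = {label: Counter(words) for label, words in corpus.items()}
    n_classes = len(counts)
    df = Counter()  # in how many classes each word occurs
    for bag in counts.values():
        df.update(bag.keys())
    return {
        label: {w: (1 + math.log(tf)) * math.log(n_classes / df[w])
                for w, tf in bag.items()}
        for label, bag in counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(words, vectors):
    """Return the class whose tf*idf vector is closest to the word counts."""
    query = Counter(words)
    return max(vectors, key=lambda label: cosine(query, vectors[label]))


# Made-up bags of SAX words for two classes.
corpus = {
    "up": ["abcd", "abcd", "bbcd", "aacd"],
    "down": ["dcba", "dcba", "dcbb", "aacd"],
}
vectors = tfidf_vectors(corpus)
label = classify(["abcd", "aacd"], vectors)
print(label)
```

Note that "aacd" occurs in both classes, so its idf (and hence its weight) is zero: words shared by every class carry no discriminative information, and only the class-specific words decide the answer.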