Okay, I will re-read the paper over the weekend trying to get into the
mechanics of iSAX, for now I just understand that it does allow some
degree of non-match in the comparison of two pieces representing time-
series. This comes from my previous work and understanding of Dynamic
Time Warping, which allows series to be a bit different in time but
requiring them to be similar in the features and their magnitude,
which is claimed to be the same with iSAX. So I think the trick is in
the distance function implementation which quantifies the distance
between two pieces. Let's see how it works and maybe Alice will give
us later another point of view. If what I think is true - then all we
need is to code this distance routine.
The indexing is the second task. I have implemented a trie for SAX in
the Java code already. Unfortunately I don't know how HBase works and
how to store Trie within it. What I was thinking can be of great value
is to see whether the Trie actually performs faster than
HashTable<String> or Databse (MySQL, Derby) indexing and querying, and
maybe HBase? What do you think?
1. when comparing 2 different timeseries we promote one timeseries to
the higher cardinality of the two to do the comparison; in the
situation (as seen in indexing) of mixed cardinalities inside a single
"word", we do promotions per symbol. The MINDIST function uses some
different mechanics but seems to use the breakpoint lookup table as
implemented already in the base jmotif lib.
2. Indexing, as best as I can tell, uses a sort of extensible hashing
(spatial hashing? Is locality sensitive hashing relevant here?) to
spatially group similar timeseries into near clusters. As the
hash-tree grows, nodes sub-divide and the tree "fans out". Afaik
lookups here should be really fast, and I'm curious to see how much
data we could index when using something like a DHT or HBase.
HBase is really interesting here since its a "multidimensional sorted
map" (big distributed hash table, in some regards) and could store the
hash based nodes very easily and in a fashion that would scale up into
10s of Terabytes.
I also think a smaller in-memory version of the index hash tree would
make an interesting 1NN classifier, similar to how I used SAX to build
a 1NN classifier for Map Reduce with a Ball Tree.
I'm currently implementing a test version of the iSAX words that
incorporate the more sophisticated cardinality rules and the new
distance function. Once I get that done, I want to run some tests on
building a 1NN classifier with the hash tree to see how well it
performs.
Josh
On Wed, Sep 1, 2010 at 11:19 AM, seninp <sen...@gmail.com> wrote:
--
Twitter: @jpatanooga
Hadoop World 2010, October 12, New York City -- register at http://bit.ly/a0cqwe