SAXification for Motif/Discord-Mining


vaske maskinsen

Apr 4, 2014, 11:51:06 AM
to jmotif-...@googlegroups.com
Hi Pavel,

some time ago I asked you about the computational complexity of PAA (in the private thread "SAX conversion (ts2string) for big time-series uses too much heap"), because I found that PAA makes SAX conversion slow for long series whenever the PAA size is not identical to the series size. Do you know of an optimized PAA implementation?
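
To make the PAA question concrete, here is a minimal sketch of a PAA whose cost grows roughly linearly with the series length, even when the PAA size does not divide the series length. It is not JMotif code: the class name `PaaSketch` and the method `paa` are hypothetical, and the fractional weighting of samples that straddle a segment boundary is an assumption about what a "correct" PAA should do in that case.

```java
// Hypothetical sketch, not part of JMotif.
final class PaaSketch {

    /**
     * Piecewise Aggregate Approximation of ts into paaSize segments.
     * Each segment spans n / paaSize samples (possibly a fraction); a sample
     * that straddles a boundary contributes to both neighbours in proportion
     * to its overlap. Each segment touches about n / paaSize + 2 samples,
     * so the total work is on the order of n + paaSize.
     */
    static double[] paa(double[] ts, int paaSize) {
        int n = ts.length;
        if (n == 0 || paaSize <= 0) {
            throw new IllegalArgumentException("empty series or non-positive PAA size");
        }
        double segLen = (double) n / paaSize;
        double[] out = new double[paaSize];
        for (int j = 0; j < paaSize; j++) {
            double start = j * segLen;        // segment start, in sample units
            double end = (j + 1) * segLen;    // segment end, in sample units
            int first = (int) Math.floor(start);
            int last = Math.min(n - 1, (int) Math.ceil(end) - 1);
            double sum = 0.0;
            for (int k = first; k <= last; k++) {
                // Sample k covers [k, k + 1); weight it by its overlap with the segment.
                double overlap = Math.min(end, k + 1) - Math.max(start, k);
                if (overlap > 0) {
                    sum += ts[k] * overlap;
                }
            }
            out[j] = sum / segLen;
        }
        return out;
    }
}
```

When the series length is a multiple of the PAA size, this reduces to plain per-segment averaging; for example, `PaaSketch.paa(new double[] {1, 2, 3, 4, 5, 6}, 3)` averages the pairs (1, 2), (3, 4) and (5, 6).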

Now I noticed that in the function char[] getSaxVals(double[] vals, int windowSize, double[] cuts), the series is always aggregated by PAA down to the alphabet size. This means we are always forced to a very low "temporal resolution" when the alphabet size is small, which may not always be desired. In my case, I often have many samples (ca. 10s to 1000s) in my window, and interesting events (discords) may sometimes be just 3 samples long (like 1, 100, 1). They will not be found if the PAA size is chosen too small (for example, equal to an alphabet size of 4).
Is this due to the SAX-trie approach? Or did you have another reason for choosing the PAA size equal to the alphabet size (any reference to HOT SAX or another paper...)? Is there any other approach that allows the PAA size to differ from the alphabet size?
Are PAA size and alphabet size somehow "connected"? Should I just use a window that is equal to my alphabet size?
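
To illustrate the resolution concern, the following sketch treats the PAA size and the alphabet size as two independent parameters and converts the same window at two different PAA sizes. It is not JMotif's `getSaxVals`: the class `SaxResolutionSketch`, its methods, the 120-sample test window and the single-sample spike are all made up for illustration, and the simple PAA here assumes the window length is a multiple of the PAA size. The cut points are the standard normal quartiles used for an alphabet of size 4.

```java
import java.util.Arrays;

// Hypothetical sketch, not part of JMotif.
final class SaxResolutionSketch {

    /** Z-normalizes x (population standard deviation); a flat window maps to zeros. */
    static double[] znorm(double[] x) {
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= x.length;
        double var = 0.0;
        for (double v : x) var += (v - mean) * (v - mean);
        double sd = Math.sqrt(var / x.length);
        double[] out = new double[x.length];
        if (sd == 0.0) return out;
        for (int i = 0; i < x.length; i++) out[i] = (x[i] - mean) / sd;
        return out;
    }

    /** Plain PAA; assumes x.length is a multiple of paaSize. */
    static double[] paa(double[] x, int paaSize) {
        if (paaSize <= 0 || x.length % paaSize != 0) {
            throw new IllegalArgumentException("window length must be a multiple of paaSize");
        }
        int seg = x.length / paaSize;
        double[] out = new double[paaSize];
        for (int i = 0; i < x.length; i++) out[i / seg] += x[i] / seg;
        return out;
    }

    /** SAX word of length paaSize over an alphabet of cuts.length + 1 letters. */
    static String toWord(double[] window, int paaSize, double[] cuts) {
        StringBuilder sb = new StringBuilder();
        for (double v : paa(znorm(window), paaSize)) {
            int idx = 0;
            while (idx < cuts.length && v >= cuts[idx]) idx++;
            sb.append((char) ('a' + idx));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        double[] window = new double[120];
        Arrays.fill(window, 1.0);
        window[60] = 100.0;                      // one-sample spike
        double[] cuts = {-0.6745, 0.0, 0.6745};  // alphabet size 4

        System.out.println(toWord(window, 4, cuts));    // PAA size = alphabet size
        System.out.println(toWord(window, 120, cuts));  // PAA size = window length
    }
}
```

With a PAA size of 4 the spike is averaged over 30 samples and the word comes out as `bbcb`, a mild bump that looks the same as a small sustained rise. With a PAA size of 120 the spike keeps its own symbol, `d`, at position 60, while every other position is `b`.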

I guess I have not completely understood the SAX-trie...

Thank you very much in advance for your answer,
Greets, Vaske.