Hi Pavel,
some time ago I asked you about the computational complexity of PAA (SAX conversion (ts2string) for big time-series uses too much heap), because I found that PAA makes SAX conversion of long series slow if the PAA size is not identical to the series size. (Do you know of an optimized PAA implementation?)
Now I noticed that in the function char[] getSaxVals(double[] vals, int windowSize, double[] cuts) the series is always aggregated by PAA down to alphabetSize segments. This means we are always forced into a very low "temporal resolution" when the alphabet size is small, which may not always be desired. In my case, I often have many samples (ca. 10s - 1000s) in my window, and interesting events (discords) may sometimes be just 3 samples long (like 1,100,1). They will not be found if the PAA size is chosen too small (e.g. equal to an alphabet size of 4).
Did you have a reason for choosing the PAA size equal to the alphabet size (any reference to HOT SAX or another paper...)?
Are PAA size and alphabet size somehow "connected"? Should I just use a window that is equal to my alphabet size?
Thank you very much in advance for your answer,
Greets, Vaske.
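To make the concern above concrete, here is a minimal Java sketch of PAA applied to a window that contains a short spike. It is not the jmotif implementation: the class name PaaSpikeDemo and the methods paa and max are hypothetical, and the sketch works on raw values, whereas SAX z-normalizes each window before PAA.

```java
import java.util.Arrays;

public class PaaSpikeDemo {

    /**
     * Piecewise Aggregate Approximation computed directly on the array.
     * Each segment spans series.length / segments samples; a sample that
     * straddles a segment border contributes proportionally to both sides.
     * (Hypothetical helper, not the jmotif API.)
     */
    public static double[] paa(double[] series, int segments) {
        int n = series.length;
        if (segments <= 0 || segments > n) {
            throw new IllegalArgumentException("need 0 < segments <= series length");
        }
        double segLen = (double) n / segments;
        double[] out = new double[segments];
        for (int s = 0; s < segments; s++) {
            double start = s * segLen;
            double end = (s + 1) * segLen;
            int first = (int) Math.floor(start);
            int last = Math.min(n, (int) Math.ceil(end));
            double sum = 0.0;
            for (int i = first; i < last; i++) {
                // Fraction of sample i that lies inside [start, end).
                double overlap = Math.min(end, i + 1) - Math.max(start, i);
                sum += series[i] * overlap;
            }
            out[s] = sum / segLen;
        }
        return out;
    }

    /** Largest value in the array. */
    public static double max(double[] values) {
        double m = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            m = Math.max(m, v);
        }
        return m;
    }

    public static void main(String[] args) {
        // Assumed test window: 100 samples at baseline 1.0 with one spike of 100.0.
        double[] window = new double[100];
        Arrays.fill(window, 1.0);
        window[50] = 100.0;

        System.out.println("PAA size 4:  max segment mean = " + max(paa(window, 4)));
        System.out.println("PAA size 50: max segment mean = " + max(paa(window, 50)));
    }
}
```

With this assumed window, the segment containing the spike averages to 4.96 for a PAA size of 4 and to 50.5 for a PAA size of 50, which illustrates how strongly a small PAA size flattens a short event.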
You received this message because you are subscribed to the Google Groups "jmotif-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jmotif-discus...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Hi:
There is an optimized PAA implementation in the code; it doesn't "explode" the time series into a matrix anymore. I can't look into the code right now, sorry, but you should be able to find it there.
I believe that short anomalies are called outliers, and you can find those simply by examining the distribution of the points, i.e. by statistical means. Discords are defined as subsequences, and the emphasis is on "structural anomaly". This is why we typically use significant compression: so that we do not react to noise (in this case, the noise is the short anomalies).
The public implementation uses PAA size = alphabet size. You may try to change that, and you may also have better luck with a hash table instead of the trie.
Thank you!
On Fri, Apr 4, 2014 at 5:43 PM, vaske maskinsen <evilt...@gmx.net> wrote:
--
Mahalo, Pavel.
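The suggestion above to find short anomalies "by statistical means" can be sketched as a simple z-score test on individual points. This is only one possible reading of that advice: the class PointOutliers, the method outlierIndices, and the threshold of 3 standard deviations are assumptions made for illustration and are not part of jmotif.

```java
import java.util.ArrayList;
import java.util.List;

public class PointOutliers {

    /**
     * Returns the indices of samples whose distance from the series mean
     * exceeds zThreshold standard deviations. (Hypothetical helper.)
     */
    public static List<Integer> outlierIndices(double[] series, double zThreshold) {
        List<Integer> result = new ArrayList<>();
        int n = series.length;
        if (n == 0) {
            return result;
        }

        double mean = 0.0;
        for (double v : series) {
            mean += v;
        }
        mean /= n;

        double variance = 0.0;
        for (double v : series) {
            variance += (v - mean) * (v - mean);
        }
        double sd = Math.sqrt(variance / n);
        if (sd == 0.0) {
            return result; // constant series: nothing stands out
        }

        for (int i = 0; i < n; i++) {
            if (Math.abs((series[i] - mean) / sd) > zThreshold) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed data: baseline 1.0 with a single spike of 100.0 at index 50.
        double[] series = new double[100];
        java.util.Arrays.fill(series, 1.0);
        series[50] = 100.0;
        System.out.println("Outlier indices: " + outlierIndices(series, 3.0));
    }
}
```

A global mean and standard deviation are the simplest choice; for series with drift or seasonality, a test on a moving window would probably be more appropriate.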
Hi Vaske:
The length of a discord or a motif is fixed in HOT SAX and SAX: it is always the length of the sliding window.
Soon we will release an implementation that finds anomalies/motifs of variable length.
On Mon, Apr 7, 2014 at 12:13 PM, vaske maskinsen <evilt...@gmx.net> wrote:
Thank you Pavel for the hints!
I'll try to implement it with a hash table.
I have another question about the discords/motifs (sorry :)):
If I have a "MotifRecord" instance, I can easily get its positions with motif.getPositions(). What I also need (e.g. to mark the motif visually in the time series) is its length in terms of the original time series (I have the start index; I need the end index). Given the parameters I used for the search (alphabet size, windowLength), how can I compute the length of the motif subsequence?
Thank you very much!
Greets,
Vaske
On Friday, April 4, 2014 at 18:21:12 UTC+2, seninp wrote:
--
Mahalo, Pavel.
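For the question about marking a motif in the original series: since the motif length always equals the sliding window length, the end index is simply the start position plus the window length. The sketch below assumes the positions are available as a plain int array; the actual return type of motif.getPositions() is not shown in this thread, and the class MotifIntervals and the method intervals are hypothetical.

```java
public class MotifIntervals {

    /**
     * Maps each motif start position to a half-open interval [from, until)
     * in the original series. The length is the sliding window size, and
     * the end is clipped to the series length. (Hypothetical helper.)
     */
    public static int[][] intervals(int[] positions, int windowSize, int seriesLength) {
        int[][] out = new int[positions.length][2];
        for (int i = 0; i < positions.length; i++) {
            out[i][0] = positions[i];
            out[i][1] = Math.min(positions[i] + windowSize, seriesLength);
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed values: two occurrences, window of 100 samples, series of 300 samples.
        int[] positions = {10, 250};
        for (int[] iv : intervals(positions, 100, 300)) {
            System.out.println("from " + iv[0] + " until " + iv[1] + " (exclusive)");
        }
    }
}
```

The alphabet size and PAA size do not enter this computation; they only affect how each window is encoded, not how many samples it spans.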