Using TensorFlow for anomaly detection


Sean

Aug 16, 2016, 12:21:27 AM
to Discuss
Hey guys,

Hope you all are doing well. I am working on a project to detect anomalies on time series data using Google TensorFlow.

My use case is just to see whether a given time series of data has any points that fall outside a certain number of standard deviations from the norm. In particular, I am using influxdb to store my time series data.

There aren't many online resources to turn to regarding using TensorFlow for anomaly detection. If anyone can point me in the right direction, that would be great. Any help is really appreciated.

Kevin

ding...@gmail.com

Aug 16, 2016, 1:42:56 AM
to Discuss, kqt...@ucdavis.edu
Why do you need any form of ML to do what you asked?
R or numpy can do what you want in a few lines of code.
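
To make the "few lines of code" concrete, here is a minimal sketch of the standard-deviation check described in the original question. It uses only Python's standard library (numpy would be just as short); the function name `zscore_outliers`, the sample `readings`, and the threshold of 2.5 are illustrative assumptions, not something from this thread.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return (index, value) pairs lying more than `threshold` population
    standard deviations away from the mean of `values`."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # constant series: no point deviates from the mean
    return [(i, v) for i, v in enumerate(values)
            if abs(v - mean) / std > threshold]

# Hypothetical sample data: ten ordinary readings and one obvious spike.
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.3, 10.1, 9.9, 10.0, 25.0]
outliers = zscore_outliers(readings, threshold=2.5)
print(outliers)
```

One caveat with short series: a single extreme point inflates the standard deviation itself, so with only about ten points a threshold of 3 can be nearly impossible to exceed. That is why this example uses 2.5.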

Ambarish Jash

Aug 16, 2016, 1:44:09 AM
to Sean, Discuss
Looks like you already know the metric you are looking for (number of standard deviations from the norm). You could measure that directly. No need for TensorFlow.

I could have misunderstood. Here is something you could try.

You could use an LSTM to train something like a language model on the series and then use the log probability of observing a sequence as an anomaly measure.

--
You received this message because you are subscribed to the Google Groups "Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to discuss+unsubscribe@tensorflow.org.
To post to this group, send email to dis...@tensorflow.org.
To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/discuss/a60868c1-4440-4a9e-b6de-c3e0b4f3a604%40tensorflow.org.



--
Ambarish Jash

Sean

Aug 16, 2016, 6:33:05 PM
to Discuss
Thanks for the quick response. I misstated the use case. The goal is not to find points that fall outside a certain number of standard deviations, but rather to detect outliers in a series of points or, if possible, to detect in real time whether the latest incoming point is an outlier (so I can be notified of this behavior).

For example, if I have one numerical point stored each day for the past 30 days, then I would want to see whether there are any big spikes (from one day to the next) in the data that are of concern. Or in real-time, if the next point that comes in is an outlier, then I would want to email myself that something "bad" has happened.
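
As a rough illustration of the "latest point" case, the sketch below keeps a rolling window of recent point-to-point changes and flags a new point whose change is far from the typical one. The class name `SpikeDetector`, the window size, the threshold of 3.5, and the choice of a median/MAD score (instead of mean and standard deviation) are all assumptions made for this example. Sending the email is left to the caller.

```python
from collections import deque
import statistics

class SpikeDetector:
    """Flags a new point whose change from the previous point is unusually
    large compared with recent point-to-point changes (hypothetical helper)."""

    def __init__(self, window=30, threshold=3.5, min_history=5):
        self.changes = deque(maxlen=window)  # recent differences between points
        self.last = None
        self.threshold = threshold
        self.min_history = min_history

    def update(self, value):
        """Feed one new point; return True if it looks like a spike."""
        is_spike = False
        if self.last is not None:
            change = value - self.last
            if len(self.changes) >= self.min_history:
                med = statistics.median(self.changes)
                mad = statistics.median(abs(c - med) for c in self.changes)
                # 1.4826 rescales the MAD to be comparable to a standard
                # deviation when the data are roughly normal.
                scale = 1.4826 * mad
                if scale > 0:  # with zero spread nothing is flagged
                    is_spike = abs(change - med) / scale > self.threshold
            self.changes.append(change)
        self.last = value
        return is_spike

# Hypothetical daily values with one large jump on the ninth day.
daily = [100, 102, 101, 104, 103, 103, 106, 105, 150, 107]
detector = SpikeDetector()
flags = [detector.update(v) for v in daily]
print(flags)
```

Because the score is based on changes, the drop back to normal after a spike is flagged as well. If only the first jump should trigger a notification, the caller would need to suppress the follow-up alert.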

I am not too keen on ML, so if you guys think that there isn't a way to accomplish this task using ML, or if using just the statistical approach (as mentioned in your responses), is the best way to go, please let me know.

All the best,
Kevin

alessia

Nov 15, 2016, 10:32:32 AM
to Discuss, kqt...@ucdavis.edu
Hi Kevin,

I am interested in the same thing you're discussing here. I would like to use TensorFlow for anomaly detection, but there is very little documentation on the subject.
Did you figure out something? Which strategy did you eventually use?

Thanks in advance,
Alessia

pkr...@gmail.com

Jan 27, 2017, 3:16:51 PM
to Discuss, kqt...@ucdavis.edu
Hello Kevin,
I am working on the same kind of project and have just started on it. Some help from you might give me a kick start.
I would like to get in touch with you about it.

Pavel Konovalov

Jan 28, 2017, 4:26:41 AM
to Discuss, kqt...@ucdavis.edu
This looks like a typical statistics task. AFAIK TensorFlow performs poorly on sequential computations (looping), since there is state that needs to be passed from one timestep to the next, and there is no workaround. Correct me if I'm wrong. I think R is the best choice for this task.

On Tuesday, August 16, 2016 at 7:21:27 AM UTC+3, Sean wrote:

Martin Wicke

Jan 29, 2017, 11:52:48 AM
to Discuss, Pavel Konovalov, kqt...@ucdavis.edu
I don't think there is an out of the box implementation of the exact problem Kevin is trying to solve. 

I don't think there should be any problems, performance or otherwise, in implementing it. Partial results are certainly no concern. Those can be handled in variables, or, the whole computation can be performed in a single run using a while loop (inside the graph). 

I am not aware of a reader for influxdb, so that would have to be written. 

Martin


dwighta...@gmail.com

Jun 28, 2017, 3:53:46 PM
to Discuss, thes...@gmail.com, kqt...@ucdavis.edu
InfluxDB exposes a SQL-like query language over a REST API, so a simple GET to /query?db=my_data_set_example&q=SELECT * from my_measurments returns the series that match the query string from the database 'my_data_set_example'.

There are several existing modules, but since we're talking about Python + TensorFlow here, one would typically use the influxdbclient module or the requests module.
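
A minimal sketch of that GET request, using only the standard library so it runs without extra packages. The host and port (`http://localhost:8086`), the helper names `build_query_url` and `extract_rows`, and the assumed response layout (`results` → `series` → `columns`/`values`, as in InfluxDB 1.x) are assumptions for illustration. Check them against the InfluxDB version you are running.

```python
from urllib.parse import urlencode

def build_query_url(base_url, database, query):
    """Build the URL for a GET against the /query endpoint, URL-encoding
    the database name and the query string."""
    params = urlencode({"db": database, "q": query})
    return base_url.rstrip("/") + "/query?" + params

def extract_rows(payload):
    """Flatten a decoded /query JSON response into a list of dicts
    mapping column name to value (assumed InfluxDB 1.x layout)."""
    rows = []
    for result in payload.get("results", []):
        for series in result.get("series", []):
            columns = series["columns"]
            for values in series["values"]:
                rows.append(dict(zip(columns, values)))
    return rows

# Assumed local server address; database and measurement names are the
# ones used in the message above.
url = build_query_url("http://localhost:8086",
                      "my_data_set_example",
                      "SELECT * FROM my_measurments")
print(url)

# Fetching would then look roughly like this (not executed here):
#   import json, urllib.request
#   with urllib.request.urlopen(url) as resp:
#       rows = extract_rows(json.load(resp))
```

The resulting rows can be passed as a plain list of values to whatever detection code is used, whether that is a statistical check or a TensorFlow model.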