Time series outlier detection and visualisation code


Kevin McIsaac

Jan 25, 2017, 9:43:42 PM
to PyData
For a project I'm working on, I need to visualise and detect outliers. Not finding anything in pandas, scipy or statsmodels, I spent a couple of days learning how this is done and then wrote some tools that are useful for my work.

Specifically I wrote three general functions to:
  • Detect outliers using IQR, MAD and z-score after detrending the data.
  • Replace detected outliers with NaN or interpolated values.
  • Plot a time series overlaid with a linear trend line, a 2 SD interval guide, and markers for the outliers found by one of the detection methods.
These work well and I've created a Notebook showing their use.
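
To make the first two functions concrete, here is a minimal sketch of how they might look. This is not the code from the notebook: the names `detect_outliers` and `replace_outliers`, the default threshold of 3, the IQR whisker of 1.5, and the choice of a least-squares linear detrend are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def detect_outliers(series, method="mad", threshold=3.0, whisker=1.5):
    """Return a boolean Series flagging outliers in the linearly detrended data.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    y = series.to_numpy(dtype=float)
    x = np.arange(len(y), dtype=float)
    ok = ~np.isnan(y)

    # Detrend: subtract a straight line fitted by least squares.
    slope, intercept = np.polyfit(x[ok], y[ok], 1)
    resid = y - (slope * x + intercept)

    if method == "zscore":
        score = np.abs(resid - np.nanmean(resid)) / np.nanstd(resid)
        flags = score > threshold
    elif method == "mad":
        med = np.nanmedian(resid)
        mad = np.nanmedian(np.abs(resid - med))
        # 1.4826 rescales the MAD to the SD for normally distributed data.
        # Note: a MAD of zero (many identical residuals) is not handled here.
        score = np.abs(resid - med) / (1.4826 * mad)
        flags = score > threshold
    elif method == "iqr":
        q1, q3 = np.nanpercentile(resid, [25, 75])
        iqr = q3 - q1
        flags = (resid < q1 - whisker * iqr) | (resid > q3 + whisker * iqr)
    else:
        raise ValueError(f"unknown method: {method!r}")

    return pd.Series(flags, index=series.index)


def replace_outliers(series, flags, interpolate=False):
    """Set flagged points to NaN, optionally filling them by linear interpolation."""
    cleaned = series.astype(float).mask(flags)
    return cleaned.interpolate() if interpolate else cleaned


# Synthetic example: linear trend plus a sine wave, with one injected spike.
idx = pd.date_range("2017-01-01", periods=50, freq="D")
values = 0.5 * np.arange(50) + np.sin(np.arange(50))
values[20] += 25.0
ts = pd.Series(values, index=idx)

flags = detect_outliers(ts, method="mad")
cleaned = replace_outliers(ts, flags, interpolate=True)
```

Detrending first matters because a strong trend inflates the spread of the raw values, which can hide genuine spikes from all three tests. The MAD and IQR variants are less affected by the outliers themselves than the z-score, since the mean and SD are pulled towards extreme points.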

My question is: are these general enough and useful enough for inclusion in pandas?

Miki Tebeka

Jan 25, 2017, 10:52:35 PM
to PyData
FWIW: There are some outlier detection algorithms in scikit-learn - http://scikit-learn.org/stable/modules/outlier_detection.html

Joris Van den Bossche

Feb 14, 2017, 6:30:18 PM
to PyData
Hi Kevin,

Sorry for the late reply, but I see that you found your way to GitHub in the meantime. For others, the related issues on GitHub are https://github.com/pandas-dev/pandas/issues/15111 and https://github.com/pandas-dev/pandas/issues/15401.

Your notebook looks really cool, but personally (and with my pandas-maintainer hat on), I think the functionality you list here is too specific to include in the core pandas project. As I also noted in the issue, how to define outliers can depend heavily on the domain and the kind of data.
That said, I certainly want to encourage you to create a small package for this functionality, so people can easily use it. In general, that is what we want to encourage: a richer ecosystem of packages around pandas, as we cannot include everything in pandas itself.

And also, pandas should certainly try to provide the basic generic functionality that makes it 'easy' to implement such extensions as you did. If there are missing things there, that can certainly be discussed.

Regards,
Joris

