skyline - mirage


earthgecko

Jan 8, 2015, 7:35:31 AM
to skyli...@googlegroups.com

mirage is an extension of skyline that attempts to handle metrics with seasonality (inspired by some of Abe's crucible work).  mirage attempts to extend skyline from a temporal data pool point of view and surface the relevant timeseries data for the expected seasonality of a metric in realtime from graphite.  mirage then analyses this timeseries data set to determine if the timeseries is anomalous at the metric's own "FULL_DURATION" or SECOND_ORDER_RESOLUTION_HOURS.

We have some fairly seasonal metrics and, as anyone who has used skyline at the "normal" FULL_DURATION of 86400 will know, alerting on these metrics gives noise, not signal.  Running analyzer with a FULL_DURATION greater than 24 hours is difficult because the timeseries data gets massive and the analyzer run_time keeps increasing.  Therefore, skyline analyzer is somewhat limited in its effectiveness on timeseries that do not "fit" with FULL_DURATION.

mirage has been developed so it can coexist with skyline analyzer, and various things can be enabled and disabled between analyzer and mirage.  The branch also incorporates a number of pull requests that were made to etsy/skyline: improved metadata in alerts, a new syslog alerter so that anomalies and their metadata can be pipelined into the normal event stream (e.g. elasticsearch, riemann, etc) and some additional graphite metrics related to anomaly breakdowns, etc.
It is not recommended as a drop in replacement for the master etsy/skyline, although the additions do make it a lot better :).

The mirage code itself is mostly just analyzer code with a few modifications.  It is not necessarily the best implementation, but it is enough "like" analyzer to be familiar.

This branch has not been pull requested on the etsy master yet as it is still a proof of concept at the moment, however it is a concept that is proving to be quite useful.  If anyone is interested in testing, any feedback would be appreciated.

We ran etsy/skyline and this mirage branch on 2 separate servers for a week to compare results.  mirage has a negate alerts option as well, which means analyzer can run as normal and mirage can "test" the metric that analyzer flagged as anomalous against the timeseries surfaced from graphite, then send negation alerts with embedded graphs for both the analyzer FULL_DURATION and the metric's specific SECOND_ORDER_RESOLUTION_HOURS - this makes for easy comparisons.

https://github.com/earthgecko/skyline/tree/mirage

Hopefully this goes some way to addressing problems like Or's, "the request time was 50ms, at some point it went up to 150ms and remained high for a while.  Is there a way to configure Skyline to set anomaly after multiple data points?"

mirage will probably only suit certain metric types and it is not envisaged to handle all metrics, just certain known metrics which have some seasonality.  It must be kept in mind that mirage may be analysing a timeseries from a different retention period (aggregation) than the one analyzer sees.

Below are some graphs from some mirage negation testing to show comparisons between analyzer (at 24hrs) and mirage (at 168hrs) - in these cases analyzer alerted that the metrics were anomalous for the 24hr timeseries, and mirage analysed the same metrics as a 168hr timeseries and found them not to be anomalous.

1. metric with occasional data

analyzer flagging as anomalous at 24hrs


mirage surfacing the timeseries at 168hrs, analysing the 168hrs timeseries and finding the metric to not be anomalous



2. metric with occasional spikes

analyzer flagging as anomalous at 24hrs
 



3. Spiking metric with constant data




Regards
Gary

Anton Lebedevich

Jan 9, 2015, 4:15:30 AM
to skyli...@googlegroups.com, earthgecko
On 01/08/2015 03:35 PM, earthgecko wrote:
>
> mirage is an extension of skyline that attempts to handle metrics with
> *seasonality *(inspired by some of Abe's crucible work). mirage
> attempts to extend skyline from a temporal data pool point of view and
> surface the relevant timeseries data for the expected seasonality of a
> metric in realtime from graphite. mirage then analyses this timeseries
> data set to determine if the timeseries is anomalous at the metric's own
> "FULL_DURATION" or SECOND_ORDER_RESOLUTION_HOURS.

Does mirage re-run the same skyline algorithms over wider time range to
capture at least one seasonal period?

Did you try any seasonal decomposition methods to remove seasonal
variation from data?
http://mabrek.github.io/blog/seasonal-decomposition/

Regards,
Anton Lebedevich.

earthgecko

Jan 9, 2015, 5:28:22 AM
to skyli...@googlegroups.com
Hi Anton

More than happy to decrease noise further with any valid techniques.

If you could define that seasonal decomposition in terms of scipy, etc, I will happily add it to a skyline node in the mirage algorithms and we can run a comparison to see if it filters more noise, more effectively :)

In answer to your questions, mirage has its own src/mirage/algorithms.py so it is possible to run whatever algorithms you want to define in there and they can be different to the algorithms used by analyzer.  That said, currently I am using the same algorithms, just on a wider time range.  So the workflow is as follows:

If a metric has a SECOND_ORDER_RESOLUTION_HOURS setting specified in the alert tuple e.g:
#          ("metric5.thing.*.rpm", "smtp", EXPIRATION_TIME, SECOND_ORDER_RESOLUTION_HOURS),
         ("metric5.thing.*.rpm", "smtp", 600, 168),

* When analyzer flags a datapoint as anomalous against the 24hrs (or whatever your FULL_DURATION is) redis timeseries data for the metric
* analyzer places a check file for mirage with the details of the metric and the anomalous datapoint
* mirage then queries graphite for the metric timeseries at SECOND_ORDER_RESOLUTION_HOURS and runs that timeseries data against whatever algorithms are defined in src/mirage/algorithms.py to determine if the timeseries itself is anomalous.
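
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the mirage branch: the helper names, the glob-style pattern matching, the placeholder graphite host and the caller-supplied `fetch` and `is_anomalous` functions are all hypothetical. The only external interface assumed is the graphite render API with its `target`, `from` and `format` parameters.

```python
import fnmatch
from urllib.parse import urlencode

# Hypothetical settings, mirroring the alert tuple format shown above:
# (metric pattern, alerter, EXPIRATION_TIME, SECOND_ORDER_RESOLUTION_HOURS)
ALERTS = (
    ("metric5.thing.*.rpm", "smtp", 600, 168),
    ("metric1.other", "smtp", 1800),  # no fourth element: analyzer only
)

GRAPHITE_HOST = "http://graphite.example.com"  # placeholder host


def second_order_resolution_hours(metric, alerts=ALERTS):
    """Return the hours from the first matching alert tuple, or None."""
    for alert in alerts:
        # Glob matching is an assumption made for this sketch.
        if fnmatch.fnmatch(metric, alert[0]) and len(alert) > 3:
            return alert[3]
    return None


def graphite_render_url(metric, hours, host=GRAPHITE_HOST):
    """Build a graphite render URL returning JSON for the last `hours` hours."""
    query = urlencode({"target": metric, "from": "-%dh" % hours, "format": "json"})
    return "%s/render?%s" % (host, query)


def mirage_check(metric, fetch, is_anomalous, alerts=ALERTS):
    """Re-test a metric that analyzer flagged, over the wider time range.

    `fetch(url)` returns a list of (timestamp, value) pairs and
    `is_anomalous(timeseries)` returns a bool; both are supplied by the caller.
    """
    hours = second_order_resolution_hours(metric, alerts)
    if hours is None:
        return None  # metric is not configured for mirage
    timeseries = fetch(graphite_render_url(metric, hours))
    return is_anomalous(timeseries)
```

Passing `fetch` and `is_anomalous` in as arguments keeps the sketch independent of any particular HTTP client or algorithm set, which matches the idea that the algorithms for the wider time range can differ from the ones analyzer uses.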

Unfortunately this mirage branch is not based on any clever stats or maths, but rather on simple logic e.g.

* skyline alerts me that a metric is anomalous at 24hrs
* I open graphite and the metric graph
* I change the graph resolution to 7 days and decide if the metric is anomalous or not :)

Surprisingly using just the normal analyzer algorithms and that simple logic has got rid of almost ALL the skyline alert noise and when mirage alerts it is more signal than noise.  I am sure it is not perfect by any means.

That said, I will try and take that decomposition in seasonal.R and port it over to python :)  That and the Seasonal Hybrid ESD (S-H-ESD) algorithm as well :)
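
To give a rough idea of what removing seasonal variation could look like in python, here is a deliberately naive sketch. It is not a port of seasonal.R and it is not S-H-ESD: it simply subtracts the per-phase median for an assumed, known period and then applies a three-sigma style test to the last residual. The function names, the `period` argument and the `threshold` default are all assumptions for illustration.

```python
from statistics import mean, median, pstdev


def remove_seasonal(values, period):
    """Subtract the per-phase median so each point is relative to its season."""
    seasonal = [median(values[phase::period]) for phase in range(period)]
    return [v - seasonal[i % period] for i, v in enumerate(values)]


def last_point_anomalous(values, period, threshold=3.0):
    """Three-sigma style test of the last point on de-seasonalised residuals."""
    residuals = remove_seasonal(values, period)
    history = residuals[:-1]
    sigma = pstdev(history)
    if sigma == 0:
        # Perfectly regular history: any deviation at all stands out.
        return residuals[-1] != mean(history)
    return abs(residuals[-1] - mean(history)) > threshold * sigma
```

For example, a series that alternates between 1 and 5 with `period=2` leaves residuals of zero, so a regular peak of 5 is not flagged, while a final value of 50 in place of that peak is.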

Perhaps it would be better to extend skyline a bit more and add the capability to analyse a timeseries with R as well - however R data frames could be a little painful :)

Regards
Gary

Abe Stanway

Jan 11, 2015, 12:46:46 PM
to earthgecko, skyli...@googlegroups.com
> that simple logic has got rid of almost ALL the skyline alert noise

Wow - that's pretty cool. Nice job!

I guess this means that figuring out a way to increase Skyline memory (aka, make it distributed) is the proper path going forward.

--
Abe Stanway