Alert on gauge metrics using Prometheus

张磊

Apr 6, 2021, 2:58:51 AM
to Prometheus Users
I want to alert on some gauge metrics, but sometimes the alerts do not behave as expected.

Kafka lag alert
I collect Kafka metrics with kafka_exporter, which gives me the Kafka lag metric. I want to send an alert when the lag is increasing, so I planned to use delta(kafka_lag[5m]) > 0 for 10m. But in some cases the rule fails: as the chart shows, the lag goes up and down over short intervals while still increasing overall. How can I alert on this?

kafka_lag.jpeg 
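
To illustrate the failure described above, here is a minimal Python sketch on made-up data (not the real kafka_lag series). It compares an endpoint difference over a short window, which is roughly what delta() measures, with a least-squares slope over a longer window, which is the kind of fit PromQL's deriv() uses. The window sizes, the 60-sample oscillation and the trend of +2 per sample are all assumptions chosen for the example. Because a "for 10m" clause needs the expression to stay true at every evaluation, a single non-positive short-window delta is enough to reset the pending alert.

```python
import math


def endpoint_delta(samples):
    """Last sample minus first sample in the window (roughly what delta() measures)."""
    return samples[-1] - samples[0]


def least_squares_slope(samples):
    """Slope per step of a least-squares line through equally spaced samples."""
    n = len(samples)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


# Synthetic lag (assumption): slow upward trend plus a repeating oscillation.
lag = [1000 + 2 * t + 100 * math.sin(2 * math.pi * t / 60) for t in range(240)]

SHORT_WINDOW = 5    # steps, standing in for the 5m range in the rule
LONG_WINDOW = 120   # steps, long enough to cover two full oscillations

short_deltas = [
    endpoint_delta(lag[i:i + SHORT_WINDOW + 1])
    for i in range(len(lag) - SHORT_WINDOW)
]
long_slopes = [
    least_squares_slope(lag[i:i + LONG_WINDOW])
    for i in range(len(lag) - LONG_WINDOW + 1)
]

non_positive = sum(1 for d in short_deltas if d <= 0)
print("short windows with delta <= 0:", non_positive, "of", len(short_deltas))
print("smallest long-window slope:", min(long_slopes))
```

On this synthetic series the short-window difference changes sign repeatedly, while the slope over the longer window stays positive. Whether a longer range fits a real workload depends on how long its own up-and-down cycle is.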

Ranjib Dey

Apr 6, 2021, 11:55:20 AM
to 张磊, Prometheus Users
:-) This is a very common problem. Your data is dynamic, and I’d be surprised if Prometheus has anything built in to deal with it (it would be wonderful if it does, though).

You need something more sophisticated than a fixed threshold. Normally these types of algorithms are classified under anomaly detection. They analyze your past data to derive the alert threshold for any given moment dynamically, based on historical values. Your data has a trend (slowly increasing) and seasonality (those two repeated peaks), and a suitable algorithm will decompose the series (time series decomposition) to extract those patterns and then compute the threshold at any given time from them. There are limits to these types of algorithms, so you have to choose and tune one based on your specific use case.
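
As a rough illustration of the decomposition idea, here is a minimal Python sketch of a classical additive decomposition on synthetic data. It estimates the trend with a centered moving average over one season, averages the detrended values per phase to get the seasonal profile, and flags a new sample when it falls more than k residual standard deviations from trend plus seasonality. The season length (PERIOD), the synthetic series and k = 3 are assumptions for the example, not values taken from the chart, and this is only one of many possible approaches.

```python
import math
import random
import statistics

PERIOD = 60  # assumed season length in samples; must be even for this sketch


def centered_moving_average(series, period):
    """Trend estimate: weighted mean over one full season centered on each point."""
    half = period // 2
    trend = [None] * len(series)  # None where the window does not fit
    for i in range(half, len(series) - half):
        window = series[i - half:i + half + 1]
        # Half weight on both end points so the weights sum to exactly `period`.
        total = sum(window) - 0.5 * (window[0] + window[-1])
        trend[i] = total / period
    return trend


def fit(series, period):
    """Split the series into trend, seasonal profile and residual spread."""
    trend = centered_moving_average(series, period)
    valid = [i for i, value in enumerate(trend) if value is not None]
    by_phase = [[] for _ in range(period)]
    for i in valid:
        by_phase[i % period].append(series[i] - trend[i])
    seasonal = [statistics.mean(values) for values in by_phase]
    residuals = [series[i] - trend[i] - seasonal[i % period] for i in valid]
    first, last = valid[0], valid[-1]
    return {
        "last_index": last,
        "last_trend": trend[last],
        "slope": (trend[last] - trend[first]) / (last - first),
        "seasonal": seasonal,
        "residual_std": statistics.pstdev(residuals),
    }


def expected_value(model, t):
    """Extrapolate the trend linearly to time t and add the seasonal component."""
    trend_at_t = model["last_trend"] + model["slope"] * (t - model["last_index"])
    return trend_at_t + model["seasonal"][t % PERIOD]


def is_anomalous(model, t, value, k=3.0):
    """True when value is more than k residual standard deviations from expected."""
    return abs(value - expected_value(model, t)) > k * model["residual_std"]


# Synthetic history (assumption): trend + seasonality + small seeded noise.
rng = random.Random(0)
history = [
    1000 + 2 * t + 100 * math.sin(2 * math.pi * t / PERIOD) + rng.gauss(0, 5)
    for t in range(300)
]

model = fit(history, PERIOD)
t_next = len(history)
baseline = expected_value(model, t_next)
print("estimated trend slope per sample:", model["slope"])
print("spike flagged:", is_anomalous(model, t_next, baseline + 200))
```

The estimated slope is also a direct answer to "is the lag still increasing", independent of the short-term peaks. The moving-average window has to match the real season length, which is one of the tuning limits mentioned above.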

I’m curious whether there’s an easy way to do this with Prometheus; let’s see what others say.
