Prometheus alert not working for an event based metric


Arnav Bose

Oct 19, 2020, 5:02:39 PM
to Prometheus Users
Hi,

I have a metric that sends data only when a specific event occurs. In other words, it does not produce regular continuous telemetry every 15 seconds; it appears for a single data point. I configured an alert that checks whether the metric exists (I did not include any 'for' condition in the rule, since there is no need to wait), expecting it to trigger immediately. What actually happens is that when the metric becomes available in Prometheus, the rule goes into the pending (yellow) state, but instead of firing it clears within 15 seconds. Is there some additional setting I need in order to alert on an event-based metric that lasts for only a single data point, i.e. about 15s?

Thanks,
Arnav 

Tim Schwenke

Oct 19, 2020, 6:54:49 PM
to Prometheus Users
What type of metric are we talking about? A gauge? Prometheus scrapes its targets continuously, so there should be a continuous stream of data points forming a series, not just a single data point when an event occurs. This sounds more like a log to me. It would be helpful if you could post your rule. It sounds like a gauge that goes up to 1 and then directly back to 0, flapping like that. For events I recommend switching to a counter instead.

Since you mentioned the alert going into a pending state (yellow) and then going back within 15 seconds, I assume that your evaluation interval is above 15s. If it were below that, the alert should have fired. You can try putting the rule into its own group with a low evaluation interval (5s, for example) just to try it out. But in the long term you should do something about the rule and/or the metric itself.
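For example, something like this (group name, alert name, and threshold are illustrative, not your actual rule; `interval` here overrides the global `evaluation_interval` for just this group):

```yaml
groups:
  - name: event-alerts        # illustrative group name
    interval: 5s              # evaluate this group more often than the 15s scrape
    rules:
      - alert: EventMetricPresent   # illustrative alert name
        # no `for:` clause, so it fires as soon as the short-lived series appears
        expr: K_Event_Count > 0
```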

arnav...@gmail.com

Oct 19, 2020, 7:15:18 PM
to Prometheus Users
K_Event_Count{EvId="24171643",EvMessage="fan alarm"}
This event message and event ID will be present for this metric for only one data point; they will not appear in the next one. In other words, they are only available for about 15s.

Brian Candler

Oct 20, 2020, 2:40:52 AM
to Prometheus Users
On Tuesday, 20 October 2020 00:15:18 UTC+1, arnav...@gmail.com wrote:
K_Event_Count{EvId="24171643",EvMessage="fan alarm"}
This event message and event ID will be present for this metric for only one data point; they will not appear in the next one. In other words, they are only available for about 15s.


That's not a true Prometheus metric, then.

If it's a counter, then it would have some value (say 1234) which persists indefinitely, visible on every scrape. Then when a fan alarm event happens, it would go to 1235, and remain at that value until the next event. If you don't have a mechanism to keep the counter around, you can use statsd_exporter to do it for you.

If it's a gauge, then it will have value 1 while the fan alarm condition is present, and 0 while the fan alarm condition is not present.
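With the counter shape, the alert would typically key off the increase rather than the raw value. A rough sketch (the alert name and window are illustrative):

```yaml
- alert: FanAlarm            # illustrative name
  # increase() is non-zero whenever the counter moved inside the window,
  # so the condition stays true long enough to be evaluated
  expr: increase(K_Event_Count[5m]) > 0
```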

Stuart Clark

Oct 20, 2020, 3:40:55 AM
to Arnav Bose, Prometheus Users
As Brian mentions, the metric should continue to exist rather than regularly disappearing. Normally you would use a counter which is incremented every time an event happens. Are you using one of the Prometheus client libraries? These should make it very easy to set up.
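If it helps to see what the client library does for you, here is a stdlib-only Python sketch of that bookkeeping (the metric, label, and function names are illustrative, not the client-library API): the counter lives in memory, so every scrape sees the accumulated value instead of a one-off datapoint.

```python
# In-memory counters, keyed by event message. A real client library keeps
# this state for you and exposes it on the /metrics endpoint.
event_counts = {}

def record_event(ev_message: str) -> None:
    """Increment the counter for this event type; the value persists."""
    event_counts[ev_message] = event_counts.get(ev_message, 0) + 1

def render_exposition() -> str:
    """Render the counters in Prometheus text exposition format."""
    lines = ["# TYPE k_event_total counter"]
    for msg, count in sorted(event_counts.items()):
        lines.append(f'k_event_total{{ev_message="{msg}"}} {count}')
    return "\n".join(lines)

# Two fan alarm events: the counter goes 0 -> 1 -> 2 and stays at 2,
# visible on every subsequent scrape.
record_event("fan alarm")
record_event("fan alarm")
print(render_exposition())
```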

arnav...@gmail.com

Oct 22, 2020, 7:50:08 PM
to Prometheus Users
I changed the query to count_over_time of the metric over 1m and that worked. Even though the particular labels are available for only 15s or less, count_over_time keeps the result non-empty long enough to trigger the alert. Thanks for the pointer. I was not checking the value earlier.
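Roughly, the rule now looks like this (the alert name is illustrative):

```yaml
- alert: FanAlarmEvent       # illustrative name
  # count_over_time counts samples in the window, so the single datapoint
  # keeps the expression non-empty for a full minute of evaluations
  expr: count_over_time(K_Event_Count{EvMessage="fan alarm"}[1m]) > 0
```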

Thanks,
Bose
