Prometheus Alert per day


Govind Madhu

Aug 13, 2021, 1:55:15 AM
to Prometheus Users
Hi,

  I am trying to define an alert rule which fires every day at a specific time if a request from a specific client doesn't happen at least once. My expression is

sum(increase(request_Count{key="clientId"}[24h])) < 1 and ON() hour() > 1 < 3

scrape interval : 30s (I cannot change this) 

I tried the expression in Grafana and I can see it coming through as expected; I can see metrics for the requests coming in. But when I define the same expression in an alert, I am getting alerts. I have double-checked the metrics and they do have data, so it shouldn't fire these alerts.

Can someone help me out here and point out if something is wrong?

Brian Candler

Aug 13, 2021, 2:50:40 AM
to Prometheus Users
> But defining the same expression in an alert, I am getting alerts

Do you mean *not* getting alerts?

You mentioned Grafana; this makes me wonder whether you are using a Grafana alert instead of a Prometheus alerting rule.  If so, that's a Grafana issue, not a Prometheus one.  But for now I'm going to assume you're talking about Prometheus alerting rules.  I also suggest you use Prometheus' built-in query browser (typically at x.x.x.x:9090), rather than Grafana, for testing.

Any expression which, when entered in the PromQL browser, shows any value at all (even zero) generates an alert; when the graph is empty, there's no alert.  Therefore, if you put

sum(increase(request_Count{key="clientId"}[24h])) < 1 and ON() hour() > 1 < 3

into the PromQL browser and select graph mode, do you see any lines?  If so, you will get alerts.  If you don't, then first check the Prometheus console's 'Alerts' tab to see if the alert is firing there (just to ensure it's nothing to do with Alertmanager not routing the alert properly), or at least is visible as an inactive rule (to ensure that Prometheus has read this rule in).  Other possible problems are that your rule is not being evaluated at a short enough interval, or that you have a "for:" value which requires it to keep triggering for longer than the one-hour window.  Since you didn't show your full alerting rule, I'm only speculating here.
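
For reference, a complete Prometheus alerting rule lives in a rules file referenced by rule_files in prometheus.yml, and looks roughly like this (the group name, alert name and annotation text below are placeholders; the expr is just your original one):

    groups:
      - name: client-requests              # placeholder group name
        interval: 1m                       # how often the rules in this group are evaluated
        rules:
          - alert: ClientRequestMissing    # placeholder alert name
            expr: 'sum(increase(request_Count{key="clientId"}[24h])) < 1 and ON() hour() > 1 < 3'
            for: 5m                        # keep this much shorter than the one-hour window
            annotations:
              summary: "No request seen from clientId in the last 24h"

If "for:" is longer than the period during which the expression returns a value, the alert will never reach the firing state.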

To simplify this problem, change your query to something you *know* has a value, e.g.

up == 1 and ON() hour() > 1 < 3

When I do this in the PromQL browser, set it to 'graph' mode, and set the duration to 1d or longer, I can see the expression generating a value between 2am and 3am.  Therefore, if put into an alerting rule, it should also generate an alert overnight.
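
To wire that test into a rule, a sketch would be to swap the expr in the rule above for the test expression (the alert name is again just a placeholder):

          - alert: TestOvernightAlert      # placeholder name, purely for testing
            expr: 'up == 1 and ON() hour() > 1 < 3'
            for: 30s

and then check whether it appears as firing on the console 'Alerts' tab between 2am and 3am.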

Govind Madhu

Aug 13, 2021, 3:52:31 AM
to Prometheus Users
Hi,

  As you suggested, I checked it from the console. It was showing 0. So I triggered a request, and later, when evaluating the same expression, it came back blank. But other similar alerts were still showing 0 when their expressions were evaluated, and those alerts were firing. But as I said, I can see these metrics still coming in even though the alert expression results in 0.

          "alert_name": "test_alert",
          "annotation_labelname": "Summary",
          "annotation_labelvalue": "Test alert triggered",
          "expr": "sum(increase(request_Count{key="clientId"}[24h])) < 1 and ON() hour() > 1 < 3",
          "for": "30s"

And could you please share the same query using "up"?

Govind Madhu

Aug 13, 2021, 3:54:32 AM
to Prometheus Users
To add, usually this request happens around 16:00 UTC. Not sure if that causes the issue.

Govind Madhu

Aug 13, 2021, 4:01:53 AM
to Prometheus Users
And additionally, when I evaluate the expression request_Count{key="clientId"}[24h], I can see the value coming as 2

Brian Candler

Aug 13, 2021, 1:35:54 PM
to Prometheus Users
A query expression which generates *any* value - even zero - will trigger an alert.  It is the presence of a value, not the actual value, which fires it.

If you consider the expression

    foo > 4

this is a filter.  If the value of foo is 3 then the expression has no value, and if the value of foo is 5, then it has the value 5.  It doesn't return a boolean true or false.

There is another expression, foo > bool 4, which returns 0 or 1.  But that would trigger an alert continuously, as long as there is any metric "foo".
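
As a made-up example, suppose there are two series, foo{instance="a"} with value 3 and foo{instance="b"} with value 5:

    foo > 4         # returns only foo{instance="b"} => 5; the other series is filtered out entirely
    foo > bool 4    # returns {instance="a"} => 0 and {instance="b"} => 1; both series are always present

That is why a bool comparison in an alert expression fires for every series all the time: there is always a value, even when it is 0.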

Brian Candler

Aug 13, 2021, 1:37:22 PM
to Prometheus Users
> And additionally, when I evaluate the expression request_Count{key="clientId"}[24h], I can see the value coming as 2

That expression gives a range vector as its result: a table of (time x value) pairs.  It doesn't really make sense in the context of a graph, nor in an alerting query.  However, the PromQL browser will let you see it in console mode - it returns a table of values across the given interval.
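
To illustrate the difference as a sketch:

    request_Count{key="clientId"}[24h]                    # range vector: the raw samples over the last 24h, visible in console mode only
    sum(increase(request_Count{key="clientId"}[24h]))     # instant vector: one value per evaluation, which is what a graph or alert needs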

Govind Madhu

Aug 13, 2021, 1:58:10 PM
to Prometheus Users
Additionally I tried to fetch the metric with an offset of 1d like below.

request_Count{key="clientId"}[24h] offset 1d


It was not showing anything in the console. The current UTC time is 17:53. I tried with an offset of 6h, and it was still not showing.

When I tried with an offset of 5h53m, I could see some value for the above expression. I was under the impression that with offset you could get the metric from previous days, but I am unable to get it.

Brian Candler

Aug 13, 2021, 3:17:51 PM
to Prometheus Users
That is a range vector, and I still don't know what you're trying to do.

An instant vector, like
    request_Count{key="clientId"} offset 1d
should work just fine.
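
If what you actually want is the previous day's activity rather than the raw samples, you can also apply the offset to the range selector and wrap it in a function, e.g. (an untested sketch):

    sum(increase(request_Count{key="clientId"}[24h] offset 1d))    # requests counted between 48h ago and 24h ago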

Govind Madhu

Aug 13, 2021, 11:10:42 PM
to Prometheus Users
I was checking whether I can access metrics over a period of time. But with an offset of 1d, I am not getting anything with this expression:

request_Count{key="clientId"} offset 1d

Do you know what could be the reason? 

Brian Candler

Aug 14, 2021, 8:11:58 AM
to Prometheus Users
Without seeing your metric, I have no idea.

Compare the graphs of:

request_Count{key="clientId"} 
and
request_Count{key="clientId"}  offset 1d

and they should be the same but slid across by 1 day.  Unless you've set your retention time so low that you're retaining less than 1 day's worth of data?
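
Retention is controlled by a flag on the Prometheus server itself (the default is 15 days); if it has been lowered below a day, the offset query will have nothing to read. For example, something like:

    prometheus --config.file=prometheus.yml --storage.tsdb.retention.time=15d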
