Filtering a metric by a time range of hours and minutes


Alen Cappelletti

Jun 24, 2022, 6:20:42 PM
to Prometheus Users
Hi,
I'm trying to write this simple query for Prometheus,
but I don't understand how to include the minutes as well as the hours in the valid time range.

The alert should fire only between 08:30 AM and 09:00 PM.

In the query below, the hours are in CET (+2 from Italy, where I am).

(count by (exported_instance, counter_instance) (database_status{job="aaaa", exported_instance="myserver", status!="ONLINE"})
and on() hour() >= 6 <= 19
# ......... minutes missing here .......
) or vector(0)

Thanks Alen

Brian Candler

Jun 25, 2022, 5:21:51 AM
to Prometheus Users
Firstly, given that you have put "or vector(0)", I think you may misunderstand how alerting works in Prometheus.

PromQL expressions return vectors - a set of 0 or more values. In an alerting expression, the alert is treated as firing if the vector is non-empty - i.e. it contains 1 or more values, regardless of what those values actually are.  Therefore, the expression vector(0) gives an alert which fires all of the time, which isn't very useful.
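To make the "non-empty vector means firing" rule concrete, here is a toy Python model (not Prometheus code): an instant vector is a dict from a label set to a sample value, and an alert counts as firing whenever that dict is non-empty. The helper names are hypothetical.

```python
# Toy model of PromQL alert evaluation; not Prometheus's implementation.
# An instant vector maps a label set (frozenset of (name, value) pairs) to a value.
Vector = dict[frozenset, float]

def vector(value: float) -> Vector:
    """Model of PromQL vector(s): a single sample with an empty label set."""
    return {frozenset(): value}

def is_firing(result: Vector) -> bool:
    """An alert fires if the expression returned at least one sample,
    whatever the sample values are."""
    return len(result) > 0

no_problem: Vector = {}          # e.g. a filter that matched nothing
print(is_firing(no_problem))     # False
print(is_firing(vector(0)))      # True: the value 0 does not matter
```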

Next, PromQL comparison operators are filters, not booleans.  Suppose you have the following metrics in your database:

node_disk_space{instance="a"} 100
node_disk_space{instance="b"} 200
node_disk_space{instance="c"} 300

The PromQL expression "node_disk_space > 150" returns a vector of 2 values:

node_disk_space{instance="b"} 200
node_disk_space{instance="c"} 300

That is, the expression "node_disk_space" returns a vector of all metrics with that metric name, and "node_disk_space > 150" filters it down to just those metrics whose value is over 150.  It does not return a "true" or "false" value (or values).
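The filtering behaviour can be sketched with the same toy model (plain Python, not Prometheus internals; the metric name is left out of the label sets for brevity). The comparison keeps or drops whole samples and passes their values through unchanged:

```python
# Toy model: the PromQL comparison `node_disk_space > 150` as a filter.
def lbl(**labels: str) -> frozenset:
    """Build a label set as a frozenset of (name, value) pairs."""
    return frozenset(labels.items())

def filter_gt(vec: dict, threshold: float) -> dict:
    """Keep only the samples whose value exceeds threshold.
    Nothing is turned into true/false; surviving values are unchanged."""
    return {labels: v for labels, v in vec.items() if v > threshold}

node_disk_space = {
    lbl(instance="a"): 100,
    lbl(instance="b"): 200,
    lbl(instance="c"): 300,
}

result = filter_gt(node_disk_space, 150)
for labels, value in result.items():
    print(dict(labels), value)
# {'instance': 'b'} 200
# {'instance': 'c'} 300
```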

Similarly, "and/or/unless" don't work like booleans either.  The expression "node_disk_space > 150 or vector(0)" will return the following:

node_disk_space{instance="b"} 200
node_disk_space{instance="c"} 300
{} 0

In this case you get a vector of 3 values.  The explanation of how "or" works is here:
It's another vector operator, which matches the label sets of the LHS and RHS.
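A rough Python model of `lhs or rhs` (again a toy, with metric names left out): the result holds every LHS sample, plus each RHS sample whose label set does not already appear on the LHS.

```python
# Toy model of PromQL `lhs or rhs` on instant vectors.
def lbl(**labels: str) -> frozenset:
    """Build a label set as a frozenset of (name, value) pairs."""
    return frozenset(labels.items())

def vector_or(lhs: dict, rhs: dict) -> dict:
    out = dict(lhs)                      # all LHS samples are kept as-is
    for labels, value in rhs.items():
        if labels not in lhs:            # RHS only fills in missing label sets
            out[labels] = value
    return out

filtered = {lbl(instance="b"): 200, lbl(instance="c"): 300}  # node_disk_space > 150
vector_0 = {lbl(): 0}                                         # vector(0)

result = vector_or(filtered, vector_0)
print(len(result))  # 3
```

Note that when the filter matches nothing, `vector_or({}, vector_0)` still returns one sample, `{} 0`, which is why appending `or vector(0)` to an alerting expression makes it fire permanently.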

Now, let me go back to your original problem about time periods.  I think you're approaching this the wrong way.

I believe the business rule amounts to this:  "I only want to receive alerts on this condition if the time falls between 8:30am and 9pm".  It's not that the problem doesn't happen outside business hours; it's that the problem isn't important enough to send a notification outside of business hours.

Therefore, the right way to handle this is with time periods within alertmanager, to control when the alerts are sent - not within the PromQL expression which determines whether there is a problem or not.

The way you do this is with time intervals in alertmanager routing trees. See:

Not only is this far easier to implement than attempting to do it in PromQL, it's also more flexible - for example you can have the same alert (from the same PromQL alerting rule) sent to different groups depending on the time of day.

Note that you can add labels to your alert in the alerting rule to categorise the alert, and you can match on those labels in your alert routing tree.  This gives you further flexibility to categorise your alerts in whatever way is useful to you.
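The separation between "is there a problem?" and "who gets told, and when?" can be sketched as a toy router in Python. This is not Alertmanager's configuration format or its exact routing algorithm; the receivers (`day-team`, `on-call`), the alert name, and the `team`/`severity` labels are all hypothetical. Routes are tried in order, and the first one whose label matchers and time window both match decides the receiver.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional

@dataclass
class Route:
    receiver: str                       # hypothetical receiver name
    matchers: dict                      # label name -> required value
    active: Optional[tuple] = None      # (start, end) local time, end exclusive

def route_alert(labels: dict, now: time, routes: list) -> Optional[str]:
    """Return the receiver of the first route whose matchers all match the
    alert's labels and whose window (if any) contains `now`.
    None means the alert is still firing but no notification is sent."""
    for r in routes:
        if any(labels.get(k) != v for k, v in r.matchers.items()):
            continue
        if r.active is not None:
            start, end = r.active
            if not (start <= now < end):
                continue
        return r.receiver
    return None

routes = [
    # 08:30-21:00: database alerts go to a (hypothetical) day team.
    Route("day-team", {"team": "db"}, (time(8, 30), time(21, 0))),
    # Any other time, only critical ones reach the (hypothetical) on-call receiver.
    Route("on-call", {"team": "db", "severity": "critical"}),
]

alert = {"alertname": "DatabaseNotOnline", "team": "db", "severity": "warning"}
print(route_alert(alert, time(10, 0), routes))   # day-team
print(route_alert(alert, time(23, 0), routes))   # None
```

The alerting rule itself stays unchanged around the clock; only the notification decision depends on the time of day and the labels.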

Alen Cappelletti

Jun 25, 2022, 7:10:40 PM
to Prometheus Users
Hi Brian, and thank you very much for your detailed answer... which I have read very carefully several times.

Maybe I forgot a detail in my question: I'm using Grafana!
Your points, including the one about muting notifications, are clear to me and absolutely correct. Unfortunately, in Grafana these are not tied to the alert rules themselves but to the contact points, where the message recipients are defined.

So, to keep it simple, in this particular case it would be easier to fix it directly in the PromQL query.
I would simply like to know how to add the 30 minutes, so that the window starts at 8:30 AM instead of 8:00 AM. I don't know whether PromQL has the right syntax for this.

Thanks again and have a nice day.
ALEN

Brian Candler

Jun 26, 2022, 3:49:59 AM
to Prometheus Users
I see; so this is just to workaround the limited functionality of Grafana alerting.

Then I guess you can just modify the rule you already have, to use (hour() + minute()/60).

e.g. I tested this briefly:
(node_filesystem_avail_bytes < 10000000) and on () (hour() + minute()/60) >= 6.5 < 19
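PromQL's `hour()` and `minute()` return UTC values, so the thresholds have to be expressed in UTC. The Python sketch below mirrors the arithmetic of `(hour() + minute()/60) >= 6.5 < 19`, assuming Italian summer time (CEST, UTC+2, as in June when this thread was written): 08:30 to 21:00 local becomes 6.5 to 19.0 in UTC fractional hours, start inclusive and end exclusive.

```python
from datetime import datetime, timedelta, timezone

CEST = timezone(timedelta(hours=2))  # assumption: Italian summer time, UTC+2

def fractional_hour_utc(t: datetime) -> float:
    """Mirror of PromQL `hour() + minute()/60`; both functions work in UTC."""
    u = t.astimezone(timezone.utc)
    return u.hour + u.minute / 60

def in_window(t: datetime, start: float = 6.5, end: float = 19.0) -> bool:
    """Mirror of the filter `... >= 6.5 < 19`: start inclusive, end exclusive."""
    h = fractional_hour_utc(t)
    return start <= h < end

for hh, mm in [(8, 29), (8, 30), (20, 59), (21, 0)]:
    t = datetime(2022, 6, 27, hh, mm, tzinfo=CEST)
    print(f"{hh:02d}:{mm:02d} CEST -> {fractional_hour_utc(t):.2f} UTC, in window: {in_window(t)}")
# 08:29 CEST -> 6.48 UTC, in window: False
# 08:30 CEST -> 6.50 UTC, in window: True
# 20:59 CEST -> 18.98 UTC, in window: True
# 21:00 CEST -> 19.00 UTC, in window: False
```

With a fixed offset like this, the thresholds would need shifting by one hour when Italy switches back to CET (UTC+1) in winter.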

But it's pretty ugly.  For a long-running problem, the alert will be "resolved" at 19:00 and then re-activate at 06:30 the next day.
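The "resolved at 19:00, re-activated at 06:30" effect can be reproduced with a small Python simulation. It is a simplification: it steps through one UTC day minute by minute rather than at Prometheus's real evaluation interval, ignores any `for:` clause, and models a problem that is present the whole time.

```python
# Toy model: a continuous problem filtered by (hour() + minute()/60) >= 6.5 < 19.
def window_matches(minute_of_day: int) -> bool:
    h = minute_of_day // 60 + (minute_of_day % 60) / 60
    return 6.5 <= h < 19

def transitions(problem_present) -> list:
    """Walk one UTC day minute by minute and record when the alert expression
    goes from empty to non-empty ("firing") and back ("resolved")."""
    events = []
    was_firing = window_matches(0) and problem_present(0)
    for m in range(1, 24 * 60):
        firing = window_matches(m) and problem_present(m)
        if firing != was_firing:
            label = "firing" if firing else "resolved"
            events.append((f"{m // 60:02d}:{m % 60:02d}", label))
        was_firing = firing
    return events

print(transitions(lambda m: True))
# [('06:30', 'firing'), ('19:00', 'resolved')]
```

Even though the underlying problem never goes away, the alert appears to clear every evening and start again the next morning.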

If you have a lot of this to do, then you could find out if Grafana can be plugged into an external system like OpsGenie or PagerDuty (I have no idea if it can; there is a separate discussion group for Grafana).  Or consider moving to Alertmanager.

Alen Cappelletti

Jun 26, 2022, 5:29:40 PM
to Prometheus Users

Hi Brian,
thank you very much for the code snippet.
It was just what I needed. I was trying to translate it in my mind from SQL to PromQL, but something was not right. Thanks again, you have been very helpful.
You're right when you said "But it's pretty ugly...", but the IT department informed me that outside that time period there may be maintenance procedures that would inevitably trigger it!
So it's OK. I looked in Grafana, and you can silence alerts there, but that is not a routine mechanism; as I told you, I have to handle it in the query itself.
But it doesn't bother me.

I want to ask you another question about Alertmanager; if you prefer, I can open another thread. Anyway, I have been working on a Docker stack app for about 8 months, and only now that I am nearing the end am I turning to alerts. Initially I used Alertmanager, but Grafana has very similar alert management, and I would say it is even more advanced in some respects. I have read a dozen articles and posts on the web, but it is still not clear to me when it is preferable to use Alertmanager over Grafana.

From what I understand, I see Alertmanager as a single hub for managing alerts coming from multiple Prometheus instances, including ones on other networks, but maybe that's just the opinion of someone who doesn't know it in depth.

Thanks again and have a nice day.
ALEN