Alertmanager


Jeová Pereira

Jan 5, 2023, 3:25:37 AM
to Prometheus Users
Hello, I would like to restrict an expression so that I only get an alert during business hours. For example, I want to query the number of 500 errors over the entire business period on December 27 (08:00 to 18:00).

Brian Candler

Jan 5, 2023, 4:07:46 AM
to Prometheus Users
I'm not sure exactly what you're asking for.

On the one hand, you talk about generating "alerts". Alerts generally react to conditions in real time, i.e. how things are "now".

If you don't want to send alerts outside business hours, normally you would control this using Alertmanager routes with time intervals defined: https://prometheus.io/docs/alerting/latest/configuration/#time_interval

That is, you allow the expression to indicate an error condition at any time of day, but configure alertmanager not to send it outside of business hours.

On the other hand, you seem to be saying that you're calculating some figure for SLA reporting (the number of 500 errors during a particular period during a business day).

If you know the start and end time of the period that you're interested in, then you can use the '@' modifier to specify the end of the period, with a range vector that covers the size of the period, e.g. [10h]
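As a rough sketch of the '@' approach, the snippet below builds such a query for the 10 hours ending at 18:00 on 27 December. Several things here are assumptions and not from the original question: the metric selector http_requests_total{status="500"}, the year 2022, the fixed UTC-3 offset, and the choice of sum(increase(...)) to count errors.

```python
from datetime import datetime, timedelta, timezone


def errors_in_window_query(selector: str, end: datetime, window: str = "10h") -> str:
    """Build a PromQL query counting increases over `window`, pinned to `end` with '@'."""
    if end.tzinfo is None:
        raise ValueError("end must be timezone-aware")
    end_ts = int(end.timestamp())  # the '@' modifier takes a Unix timestamp
    return f"sum(increase({selector}[{window}] @ {end_ts}))"


# Assumption: business hours are in a fixed UTC-3 zone, and the year is 2022.
local_tz = timezone(timedelta(hours=-3))
end_of_business = datetime(2022, 12, 27, 18, 0, tzinfo=local_tz)

# Hypothetical metric name and label; substitute your own selector.
query = errors_in_window_query('http_requests_total{status="500"}', end_of_business)
print(query)
```

The printed string can be pasted into the Prometheus expression browser. Because the timestamp is fixed, the query gives the same answer whenever it is evaluated, as long as the data is still within retention.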

If you want to do some sort of query relative to the current day (e.g. "the number of 500 errors in the 10 hour period ending at 18:00 yesterday") then it's difficult to do within prometheus.

You could have a recording rule which measures the error rate over the previous 10 hours, have an alerting expression on it, and mute that alert outside the period 17:55 to 18:05.  Then you'll get an alert once per day, around 18:00, if the SLA for that day was breached.

You could write some external code which does an API query at 18:00 every day, stores the result somewhere (e.g. to node_exporter textfile-collector, or to pushgateway), and then that value is scraped back into prometheus.  That makes the value available to other Prometheus expressions, including alerts.  Or you could store the result in a SQL database.
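A minimal sketch of the textfile-collector variant of that external code, under stated assumptions: the Prometheus URL, the textfile directory, the query, and the output metric name are all hypothetical placeholders. It uses the standard /api/v1/query instant-query endpoint and writes the file via a temporary file plus rename so node_exporter never reads a half-written file.

```python
import json
import os
import tempfile
import urllib.parse
import urllib.request

PROM_URL = "http://localhost:9090"                # assumption: local Prometheus
TEXTFILE_DIR = "/var/lib/node_exporter/textfile"  # assumption: --collector.textfile.directory
QUERY = 'sum(increase(http_requests_total{status="500"}[10h]))'  # hypothetical metric


def instant_query(base_url: str, query: str) -> float:
    """Run an instant query and return the first sample's value."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=30) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body}")
    result = body["data"]["result"]
    # Design choice in this sketch: an empty result is reported as 0 errors.
    return float(result[0]["value"][1]) if result else 0.0


def render_metric(name: str, value: float, help_text: str) -> str:
    """Render one gauge in the Prometheus text exposition format."""
    return f"# HELP {name} {help_text}\n# TYPE {name} gauge\n{name} {value}\n"


def write_atomically(directory: str, filename: str, text: str) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write(text)
    os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; the exporter must be able to read it
    os.replace(tmp_path, os.path.join(directory, filename))


def main() -> None:
    value = instant_query(PROM_URL, QUERY)
    text = render_metric("business_hours_http_500_errors", value,
                         "HTTP 500 errors in the last business day")
    write_atomically(TEXTFILE_DIR, "business_hours_errors.prom", text)
```

Calling main() from a scheduler (for example a cron job at 18:00 local time) would refresh the value once per day; node_exporter only picks up files ending in .prom, so the temporary file is ignored.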


I would argue that SLA reporting should live outside of Prometheus.  Questions like "in how many days last month was the SLA breached?" are difficult to express in PromQL.