AlertManager rules examples

Eulogio Apelin

Jan 13, 2023, 6:08:37 AM
to Prometheus Users
I'm looking for information, primarily examples, of various ways to configure alert rules.

Specifically, scenarios like:

In a single rule group:
A regular expression determines that a TLS cert expires in 60 days: send 1 alert
A regular expression determines that a TLS cert expires in 40 days: send 1 alert
A regular expression determines that a TLS cert expires in 30 days: send 1 alert
A regular expression determines that a TLS cert expires in 20 days: send 1 alert
A regular expression determines that a TLS cert expires in 10 days: send 1 alert
A regular expression determines that a TLS cert expires in 5 days: send 1 alert
A regular expression determines that a TLS cert expires in 0 days: send 1 alert

Another scenario is to:
Send an alert once a day to an email address.
If it's the 3rd day in a row, send the alert to another set of addresses and then stop alerting.

Can Alertmanager send alerts to Microsoft Teams like it does to Slack?

And any other general examples of Alertmanager rules.

Thanks!

Stuart Clark

Jan 13, 2023, 6:53:14 AM
to Eulogio Apelin, Prometheus Users
I think it is best not to think of alerts as moment-in-time events but
as being a time period where a certain condition is true. Separate to
the actual alert firing are then rules (in Alertmanager) of how to route
it (e.g. to Slack, email, etc.), what to send (email body template) and
how often to remind people that the alert is happening.
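
As a rough illustration of that split, an Alertmanager configuration might look something like the sketch below. The receiver names, addresses, SMTP host, Slack webhook URL, and the `severity` label are all placeholders I've assumed for the example, not anything from a real setup.

```yaml
# alertmanager.yml (illustrative sketch; all names and addresses are placeholders)
global:
  smtp_smarthost: 'smtp.example.com:587'   # assumed mail relay
  smtp_from: 'alertmanager@example.com'

route:
  receiver: default-email          # where alerts go if no child route matches
  group_by: ['alertname']          # alerts with the same name are batched together
  repeat_interval: 4h              # how often to remind while the alert keeps firing
  routes:
    - matchers:
        - severity="critical"      # assumes the alert rule sets a severity label
      receiver: oncall-slack

receivers:
  - name: default-email
    email_configs:
      - to: 'team@example.com'
  - name: oncall-slack
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'   # placeholder webhook
        channel: '#alerts'
        title: '{{ .CommonAnnotations.summary }}'                 # what to send
```

The condition itself (what counts as "firing") is not in this file at all; it lives in the Prometheus rule files.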

So for example with your TLS expiry example you might have an alert
which starts firing once a certificate is within 60 days of expiry. It
would continue to fire continuously until either the certificate is
renewed (i.e. it is over 60 days again) or stops existing (because
you've reconfigured Prometheus to no longer monitor that certificate).
Then within Alertmanager you can set rules to send you a message every
10 days while that alert is firing, meaning you'd get a message at
roughly 60, 50, 40, etc. days until expiry.
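
A minimal sketch of what that could look like, assuming the certificates are probed with blackbox_exporter (which exposes the expiry time as `probe_ssl_earliest_cert_expiry`). The group name, alert name, `for` duration, and labels are made up for the example.

```yaml
# Prometheus rule file (sketch; assumes blackbox_exporter metrics)
groups:
  - name: tls-expiry
    rules:
      - alert: TLSCertExpiringSoon
        # Seconds until expiry, compared against 60 days expressed in seconds.
        expr: probe_ssl_earliest_cert_expiry - time() < 60 * 24 * 3600
        for: 1h                      # condition must hold for an hour before firing
        labels:
          severity: warning
        annotations:
          summary: "TLS cert for {{ $labels.instance }} expires in {{ $value | humanizeDuration }}"
```

The reminder cadence would then be set on the Alertmanager side, for example with a route fragment like this (the receiver name is a placeholder and would need to be defined under `receivers`):

```yaml
# alertmanager.yml fragment (sketch)
route:
  receiver: team-email
  group_by: ['alertname', 'instance']
  repeat_interval: 10d               # re-notify about every 10 days while still firing
```

The timing is approximate rather than exactly on the 60/50/40 day marks. Also, as far as I recall, a `repeat_interval` longer than Alertmanager's `--data.retention` setting (120h by default) may cause reminders to be sent more often than intended, so that is worth checking before relying on it.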

More complex alerting routing decisions are generally out of scope for
Alertmanager and would be expected to be managed by a more complex
system (such as PagerDuty, OpsGenie, Grafana On-Call, etc.). This would
cover your example of wanting to escalate an alert after a period of
time, but would also cover things like having on-call rotas where
different people would be contacted by looking at a rota calendar.

--
Stuart Clark

Eulogio Apelin

Jan 17, 2023, 4:11:45 PM
to Prometheus Users
Thanks for the info. It helps.

It would be nice if there were examples on web pages or in YouTube videos. We also have Grafana, but it sounds like the engineers may pick Alertmanager over Grafana alerting: we currently use a mix of the two, and configuring both is not straightforward for us, mainly because we don't have a dedicated person working on alerts. Alerting tends to sit in the bottom 10-20% of our priority list, and other companies I've been with handled it the same way. Just my 2 cents on this.

The lazy in me just wants to click click click and be done.

Stuart Clark

Jan 18, 2023, 5:34:47 AM
to Eulogio Apelin, Prometheus Users
Grafana does have its own alerting solution, but that has nothing to do with Prometheus. You'd need to ask on the Grafana lists how to do it with that option.