Alertmanager configuration doubt.

Borja UberSinestésico

May 6, 2022, 3:43:15 AM
to Prometheus Users
Good morning,

I want to create a condition for an alert. Since my monitoring sees lots of ups and downs when something is wrong, I am receiving lots of mails: if something fails, it tries to reboot, comes back up, and then goes off again for a while.

My current alert looks like this:

  - alert: smarttools_unavailable
    expr: probe_http_status_code{job="smarttools_urls"}<= 199 OR probe_http_status_code{job="smarttools_urls"} >= 300
    for: 5m
    annotations:
      title: SmartTools unavailable
      summary: HTTP failure {{ $labels.platform }}
      description: "HTTP status = {{ $value }}"

Is there any way to indicate that the alarm must stay off for 60 minutes or so before it triggers again?

Thanks!

Brian Candler

May 6, 2022, 1:38:08 PM
to Prometheus Users
On Friday, 6 May 2022 at 08:43:15 UTC+1 bjtdr...@gmail.com wrote:
    expr: probe_http_status_code{job="smarttools_urls"}<= 199 OR probe_http_status_code{job="smarttools_urls"} >= 300

Aside: that's probably not what you meant.  What I think you meant was:

   expr: probe_http_status_code{job="smarttools_urls"} <= 199 >= 300

These are not boolean conditions; they are filters.  "foo" is a vector of 0 or more timeseries; "foo <= 199" is a vector containing only those elements whose value is <= 199 (i.e. the result has the same number of elements as the LHS vector, or fewer).  You can then further filter this with >= 300.
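
For example, suppose probe_http_status_code currently has three series with these (made-up) values:

    probe_http_status_code{platform="a"}   200
    probe_http_status_code{platform="b"}   301
    probe_http_status_code{platform="c"}   503

Then "probe_http_status_code >= 300" returns only the "b" and "c" series, still carrying the values 301 and 503 - it does not return a true/false result per series.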

That doesn't answer your underlying question; the answer will probably involve suppressing the alarm using some combination of "unless", "min_over_time" and "max_over_time".
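
For example, one (untested) sketch along those lines keeps the alert condition true for 60 minutes after the last bad probe, so that brief recoveries don't resolve the alert and then send a fresh notification when it flaps again. The 60-minute window and the subquery form are assumptions you would tune:

  - alert: smarttools_unavailable
    # True while any probe in this job has returned a non-2xx status at some
    # point in the last 60 minutes, so short recoveries do not resolve the
    # alert and re-trigger a mail.  The 60m window is a guess - adjust it.
    expr: |
      max_over_time(
        (
            probe_http_status_code{job="smarttools_urls"} <= 199
          or probe_http_status_code{job="smarttools_urls"} >= 300
        )[60m:]
      )
    for: 5m
    annotations:
      title: SmartTools unavailable
      summary: HTTP failure {{ $labels.platform }}
      # NB: $value is now the highest non-2xx code seen in the window.
      description: "HTTP status = {{ $value }}"

This keeps a single alert active across the flaps; it only resolves after the status has been good for a full 60 minutes, which sounds like what you're after.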