Alertmanager configuration doubt.

Borja UberSinestésico

May 6, 2022, 3:43:15 AM
to Prometheus Users
Good morning,

I want to create a condition for an alert. Since my monitoring sees lots of ups and downs when something is wrong, I am receiving lots of mails: if something fails, it tries to reboot, comes back up, and then goes off again for a while.

My current alert looks like this:

  - alert: smarttools_unavailable
    expr: probe_http_status_code{job="smarttools_urls"}<= 199 OR probe_http_status_code{job="smarttools_urls"} >= 300
    for: 5m
    annotations:
      title: SmartTools unavailable
      summary: HTTP failure {{ $labels.platform }}
      description: "HTTP status = {{ $value }}"

Is there any way to indicate that the alarm must stay off for 60 minutes or so before it triggers again?

Thanks!

Brian Candler

May 6, 2022, 1:38:08 PM
to Prometheus Users
On Friday, 6 May 2022 at 08:43:15 UTC+1 bjtdr...@gmail.com wrote:
    expr: probe_http_status_code{job="smarttools_urls"}<= 199 OR probe_http_status_code{job="smarttools_urls"} >= 300

Aside: that's probably not what you meant.  What I think you meant was:

   expr: probe_http_status_code{job="smarttools_urls"} <= 199 >= 300

These are not boolean conditions; they are filters.  "foo" is a vector of 0 or more timeseries; "foo <= 199" is a vector containing only those elements whose value is <= 199 (i.e. the result has the same number of elements as the LHS vector, or fewer).  You can then further filter this with >= 300.
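
For example, suppose probe_http_status_code currently has three series with these (made-up) values:

    probe_http_status_code{platform="a"}   200
    probe_http_status_code{platform="b"}   301
    probe_http_status_code{platform="c"}   503

Then "probe_http_status_code >= 300" returns only the "b" and "c" series, still carrying the values 301 and 503 - it does not return a true/false result per series.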

That doesn't answer your underlying question; the answer will probably involve suppressing the alarm using some combination of "unless", "min_over_time" and "max_over_time".
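
For example, one (untested) sketch along those lines keeps the alert condition true for 60 minutes after the last bad probe, so that brief recoveries don't resolve the alert and then send a fresh notification when it flaps again. The 60-minute window and the subquery form are assumptions you would tune:

  - alert: smarttools_unavailable
    # True while any probe in this job has returned a non-2xx status at some
    # point in the last 60 minutes, so short recoveries do not resolve the
    # alert and re-trigger a mail.  The 60m window is a guess - adjust it.
    expr: |
      max_over_time(
        (
            probe_http_status_code{job="smarttools_urls"} <= 199
          or probe_http_status_code{job="smarttools_urls"} >= 300
        )[60m:]
      )
    for: 5m
    annotations:
      title: SmartTools unavailable
      summary: HTTP failure {{ $labels.platform }}
      # NB: $value is now the highest non-2xx code seen in the window.
      description: "HTTP status = {{ $value }}"

This keeps a single alert active across the flaps; it only resolves after the status has been good for a full 60 minutes, which sounds like what you're after.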