exporter crashes, multiple up-metric-related alerts fire


Mario Cornaccini

May 24, 2023, 5:33:44 AM
to Prometheus Users
Hi, we have a custom-made exporter that checks for running Linux processes.

I have one scrape job with 27 targets.
There is also a general alert rule:
expr: up==0

So when the exporter crashed, we got 27 alerts.

Is there a way, or a best practice, to detect that the exporter has crashed, inhibit the 27 alerts, and get just one "exporter crashed" alert instead?

(Happy to give more info if needed.)
Cheers, fil

Ben Kochie

May 24, 2023, 5:51:39 AM
to Mario Cornaccini, Prometheus Users
This is what the `group_by` feature in the alertmanager routing is for.
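As a minimal sketch (not part of the original reply), an Alertmanager route that groups by `alertname` and `job` would batch the 27 `up == 0` alerts from one scrape job into a single notification: each alert inherits the `job` and `instance` labels of its `up` series, and within one job they differ only in `instance`. The receiver name and the timing values below are placeholders.

```yaml
# Hypothetical alertmanager.yml fragment; "team-notify" is a placeholder receiver.
route:
  receiver: team-notify
  # Alerts that share alertname and job go into one group and one notification,
  # so the 27 per-instance "up == 0" alerts of one scrape job arrive together.
  group_by: ['alertname', 'job']
  group_wait: 30s       # delay before the first notification for a new group
  group_interval: 5m    # delay before notifying about changes to an existing group
  repeat_interval: 4h   # re-send interval while the group keeps firing

receivers:
  - name: team-notify
    # notification integrations (email_configs, webhook_configs, ...) go here
```

Note that `group_by: ['...']` is a special value that groups by all labels, which effectively turns aggregation off and yields one notification per alert.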


--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/b6796581-dcff-4577-a725-97e9e0e23c69n%40googlegroups.com.

Mario Cornaccini

May 25, 2023, 3:24:39 AM
to Prometheus Users
Of course!
Thank you.

I totally overlooked that, because at some point we switched to `group_by: [...]`. We didn't like that when one alert of a group resolves, we then get a new notification saying that only the remaining alert is still firing.
(A matter of taste, I'd say.) I guess we can't have our cake and eat it too ;-)

Cheers, fil
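One way to keep per-alert notifications (`group_by: [...]`) and still be notified only once when the exporter itself is down is an Alertmanager inhibit rule. This is a sketch under assumptions: it presumes a second Prometheus alerting rule, given the hypothetical name `ExporterDown`, with an expression such as `max by (job) (up) == 0` that fires only when every target of a job is down, and it calls the existing per-target `up == 0` alert `TargetDown`, also a hypothetical name.

```yaml
# Hypothetical alertmanager.yml fragment. Assumes a separate Prometheus rule
# named ExporterDown (expr: max by (job) (up) == 0) and that the per-target
# up == 0 alert is named TargetDown; both names are placeholders.
inhibit_rules:
  - source_matchers:
      - alertname="ExporterDown"
    target_matchers:
      - alertname="TargetDown"
    # Suppress TargetDown only for the job whose ExporterDown alert is firing.
    equal: ['job']
```

While `ExporterDown` is firing for a job, Alertmanager suppresses the matching `TargetDown` alerts for that job, so only the single `ExporterDown` alert is notified.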




Brian Candler

May 25, 2023, 4:31:41 AM
to Prometheus Users

Conall O'Brien

May 31, 2023, 6:37:57 AM
to Mario Cornaccini, Prometheus Users
If you need further tuning after setting group_by, I would tweak the group_wait value in Alertmanager and/or the alert hysteresis set by the `for` clause of the alerting rule: https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/#defining-alerting-rules
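A minimal sketch of the rule-side hysteresis, assuming a rule file with hypothetical group, alert, and label names: the `for: 5m` clause keeps the alert pending until `up == 0` has held for five minutes, so a short exporter restart never fires at all.

```yaml
# Hypothetical Prometheus rule file; the group name, alert name and labels are placeholders.
groups:
  - name: availability
    rules:
      - alert: TargetDown
        expr: up == 0
        # Stay in "pending" until the expression has been true for 5 minutes;
        # only then does the alert fire and get sent to Alertmanager.
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.job }} target {{ $labels.instance }} is down"
```

On the Alertmanager side, `group_wait` is how long Alertmanager waits after the first alert of a new group before sending the first notification; raising it gives alerts from the same crash that start firing a few seconds apart time to land in one notification.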
 