finetune alertmanager


ishu...@gmail.com

Oct 4, 2021, 7:17:20 AM
to Prometheus Users
Hi All,

Any suggestion/advice is highly appreciated. 

I have Prometheus with Alertmanager set up and working fine. However, some alerts, although firing, are not ending up in PagerDuty. The logs suggest this is because the severity is not set. Is there a way to set a default severity globally in alert.rules.yml and then override it wherever required?

Also, when my Prometheus query returns only a number without any labels (for example sum(count(up{job=~"traefikv2",origin="k3s"}))), does it still trigger an alert?

Thanks
Eswar

Brian Candler

Oct 4, 2021, 9:12:05 AM
to Prometheus Users
On Monday, 4 October 2021 at 12:17:20 UTC+1 ishu...@gmail.com wrote:
> I have Prometheus with Alertmanager set up and working fine. However, some alerts, although firing, are not ending up in PagerDuty. The logs suggest this is because the severity is not set. Is there a way to set a default severity globally in alert.rules.yml and then override it wherever required?

I don't use PagerDuty, but as far as I can see, you set the severity statically in the pagerduty_config in Alertmanager; if you don't set it, it defaults to "error". So I can't see how it's possible to send an alert to PagerDuty without a severity. Perhaps you could show the actual logs? Maybe your Alertmanager configuration is using the label 'severity' as a routing key, which would be something else (not related to PagerDuty).
 

> Also, when my Prometheus query returns only a number without any labels (for example sum(count(up{job=~"traefikv2",origin="k3s"}))), does it still trigger an alert?


Yes it will.  The thing which triggers the alert is the presence of any timeseries with any value, i.e. a non-empty instant vector.  Even if there are no labels, the timeseries still exists.

I'm not sure why you want to sum() over count() though.  Unless you're doing "count by" then you'll only get a single count, and summing a single value just gives that value.  The expression
count(up{job="node"})
already returns a timeseries with no labels.
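
As a rough illustration of the point above, here is a minimal sketch of an alerting rule whose expression yields a single series with no labels. The group name, alert name, threshold, and `for` duration are hypothetical and not taken from the thread; only the selector comes from the original question.

```yaml
groups:
  - name: traefik-availability          # hypothetical group name
    rules:
      - alert: TraefikInstancesLow      # hypothetical alert name
        # count() without "by" returns one series with an empty label set.
        # The "< 3" comparison drops that series unless the count is below 3,
        # so the alert fires only while the result vector is non-empty.
        expr: count(up{job=~"traefikv2", origin="k3s"} == 1) < 3
        for: 5m                         # hypothetical hold time
```

Even with a label-less expression, the resulting alert is not entirely unlabelled: Prometheus always attaches an `alertname` label taken from the rule's `alert` field. Note also that if no series match the selector at all, count() returns nothing and this rule stays silent.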

ishu...@gmail.com

Oct 4, 2021, 4:58:14 PM
to Prometheus Users
Hi Brian,

Thanks for your reply.

1) Below are my receiver configuration and the error from the log. The severity is set in the Alertmanager config.yml:

  - name: 'pagerduty_prod_default'
    pagerduty_configs:
      - send_resolved: true
        routing_key: ${PAGERDUTY_PROD_DEFAULT_KEY}
        description: '{{ template "pagerduty.default.description" .}}'
        severity: '{{ .CommonLabels.severity }}'
        details:
          summary: |-
            {{ range .Alerts }}{{ .Annotations.summary }}
            {{ end }}
          severity: '{{ .CommonLabels.severity }}'
          status: '{{ .Status }}'

level=error ts=2021-10-01T12:52:34.264Z caller=dispatch.go:309 component=dispatcher msg="Notify for alerts failed" num_alerts=31 err="pagerduty_prod_default/pagerduty[0]: notify retry canceled due to unrecoverable error after 1 attempts: unexpected status code 400: Event object is invalid: 'payload.severity' is missing or blank"

The severity comes from the alert, but I would like to know whether there is a global way of setting a default severity and overriding it per specific alert. This would remove a lot of redundant lines in my config yml.


> Yes it will.  The thing which triggers the alert is the presence of any timeseries with any value, i.e. a non-empty instant vector.  Even if there are no labels, the timeseries still exists.

2)
> I'm not sure why you want to sum() over count() though.  Unless you're doing "count by" then you'll only get a single count, and summing a single value just gives that value.  The expression
> count(up{job="node"})
> already returns a timeseries with no labels.

When the timeseries doesn't have any labels, how is the grouping of these alerts handled?


Thanks
Eswar

Brian Candler

Oct 5, 2021, 4:35:48 AM
to Prometheus Users
>         severity: '{{ .CommonLabels.severity }}'

Ah right, so this is actually a templating problem.  You want to have a default value when *expanding the template*.

I don't do much with go templating, but it looks like the "or" function is what you want:

Googling for "go template examples" may help, which includes
and I also found an online tester:

Try these:

severity: '{{ or .CommonLabels.severity "error" }}'

severity: '{{if .CommonLabels.severity}}{{ .CommonLabels.severity }}{{ else }}error{{ end }}'
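
For context, here is a sketch of how the first variant might sit in the receiver posted earlier in the thread. The receiver name and routing key placeholder are copied from that post; the fallback value "error" is simply the one suggested above, not a required choice.

```yaml
receivers:
  - name: 'pagerduty_prod_default'
    pagerduty_configs:
      - send_resolved: true
        routing_key: ${PAGERDUTY_PROD_DEFAULT_KEY}
        # .CommonLabels holds only labels shared by every alert in the group,
        # so .CommonLabels.severity is empty if the label is missing on any
        # alert or differs between alerts. "or" returns its first non-empty
        # argument, giving "error" as the fallback.
        severity: '{{ or .CommonLabels.severity "error" }}'
```

Because `.CommonLabels` only carries labels common to the whole group, a group of alerts with mixed severities would also produce a blank value. Adding `severity` to the route's `group_by` list is one possible way to keep each notification to a single severity, assuming that fits the existing routing.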

> When the timeseries doesn't have any labels, how does the grouping of these alerts are handled. 

You can add extra labels in your alerting rule.
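
A minimal sketch of what that could look like, building on the count(up{job="node"}) expression from earlier in the thread. The group name, alert name, threshold, and label values are hypothetical.

```yaml
groups:
  - name: node-availability             # hypothetical group name
    rules:
      - alert: NodeTargetsLow           # hypothetical alert name
        expr: count(up{job="node"} == 1) < 2
        for: 5m
        labels:
          # Labels added here are attached to the alert even though the
          # expression result itself has none, so Alertmanager can route
          # and group on them.
          severity: warning
          service: node                 # hypothetical grouping label
        annotations:
          summary: 'Fewer than 2 node targets are up'
```

If the `severity` label is passed straight through to PagerDuty, its value needs to be one that the PagerDuty Events API v2 accepts: critical, error, warning, or info.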

ishu...@gmail.com

Oct 6, 2021, 4:04:06 AM
to Prometheus Users
Thanks, Brian.

That worked like a charm. 
