Hi Brian,
Thanks for your reply.
1) Below are my receiver config and the log message for the error. I have severity set in the Alertmanager config.yml:
- name: 'pagerduty_prod_default'
  pagerduty_configs:
  - send_resolved: true
    routing_key: ${PAGERDUTY_PROD_DEFAULT_KEY}
    description: '{{ template "pagerduty.default.description" .}}'
    severity: '{{ .CommonLabels.severity }}'
    details:
      summary: |-
        {{ range .Alerts }}{{ .Annotations.summary }}
        {{ end }}
      severity: '{{ .CommonLabels.severity }}'
      status: '{{ .Status }}'
level=error ts=2021-10-01T12:52:34.264Z caller=dispatch.go:309 component=dispatcher msg="Notify for alerts failed" num_alerts=31 err="pagerduty_prod_default/pagerduty[0]: notify retry canceled due to unrecoverable error after 1 attempts: unexpected status code 400: Event object is invalid: 'payload.severity' is missing or blank"
The severity comes from the alert, but I would like to know whether there is a global way of setting a default severity and overriding it only for specific alerts. This would remove a lot of redundant lines in my config.yml.
Yes it will. The thing which triggers the alert is the presence of any timeseries with any value, i.e. a non-empty instant vector. Even if there are no labels, the timeseries still exists.
2) I'm not sure why you want to sum() over count() though. Unless you're doing "count by" then you'll only get a single count, and summing a single value just gives that value. The expression
count(up{job="node"})
already returns a timeseries with no labels.
When the timeseries doesn't have any labels, how is the grouping of these alerts handled?
Thanks
Eswar