How to send an alert to a different receiver after 30 minutes?


Frank

Jul 10, 2018, 7:02:39 AM
to Prometheus Users
Hi!

I'd like to have certain critical alerts sent to chat/email only. If an alert hasn't been resolved within 30 minutes, send the same alert to PagerDuty.

I'm on the latest versions of Prometheus (v2.3.1) and Alertmanager (v0.15.0). Prometheus and Alertmanager are set up to send to our chat client, email and PagerDuty for warning and critical alerts.

I've tried setting Alertmanager to wait 30 minutes, but it seems to ignore the "group_wait: 30m" and sends the alert immediately to chat, email and PagerDuty with the following in Alertmanager's config:

route:
  receiver: operations
  group_by: ['alertname', 'service']
  group_wait: 60s
  group_interval: 5m
  repeat_interval: 1h

  routes:
  # Send these alerts to pagerduty immediately
  - match_re:
      project: production
      pagerduty: 'true'
    receiver: production_pagerduty

  # Send these alerts to chat, email. If not resolved in 30 minutes, send to pagerduty.
  - match_re:
      project: production
    receiver: production
    routes:
      - match_re:
          project: production
          severity: critical
        receiver: production_pagerduty
        group_wait: 30m

inhibit_rules:
- source_match_re:
    severity: critical
  target_match_re:
    severity: warning
  equal: ['alertname']

receivers:
# Alert chat, email and pagerduty
- name: production_pagerduty
  slack_configs:
  - channel: '#test'
    title: '{{ template "slack.myorg.title" . }}'
    text: '{{ template "slack.myorg.text" . }}'
    send_resolved: true
  email_configs:
  - to: 'o...@company.tld'
    send_resolved: true
  pagerduty_configs:
  - routing_key: xxxxxxxxxx

# Just chat and email for now.
- name: production
  slack_configs:
  - channel: '#test'
    title: '{{ template "slack.myorg.title" . }}'
    text: '{{ template "slack.myorg.text" . }}'
    send_resolved: true
  email_configs:
  - to: 'o...@company.tld'
    send_resolved: true


How can a critical alert that hasn't been resolved within 30 minutes be sent to another receiver (e.g. PagerDuty in our case)?


Thanks,
Frank

Brian Brazil

Jul 10, 2018, 7:39:33 AM
to Frank, Prometheus Users
This isn't something you can do with the Alertmanager, as escalation and other human responses are out of scope. You should be able to configure this as an escalation policy in PagerDuty though.
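A minimal sketch of what that could look like, reusing the receiver names from the config above and assuming the 30-minute delay is configured as an escalation policy on the PagerDuty service (so Alertmanager pages PagerDuty immediately and PagerDuty decides when to escalate):

# Sketch only: critical production alerts go straight to PagerDuty;
# the 30-minute escalation lives in PagerDuty's escalation policy, not here.
route:
  receiver: production              # chat + email by default
  group_by: ['alertname', 'service']
  routes:
  - match_re:
      project: production
      severity: critical
    receiver: production_pagerduty  # PagerDuty escalates if unacknowledged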
 

fr...@tablecheck.com

Jul 10, 2018, 8:47:21 PM
to Prometheus Users
Thanks for the quick and clear answer, Brian!

Frank

Jul 17, 2018, 10:47:38 PM
to Prometheus Users
Thank you to Brian Brazil and Ben Kochie (offline).

The solution that seems to be working is to use the same alert name, add `for: 30m` to one of the rules, and route them based on labels. Here's an example:


# Fires immediately as a warning.
- alert: same_alert_name
  expr: metric{label="value"} == 1
  labels:
    severity: Warning
    name: backend
# Fires immediately as critical (chat/email only).
- alert: same_alert_name
  expr: metric{label="value"} == 2
  labels:
    severity: Critical
    name: backend
# Same condition, but only fires after 30 minutes and adds the pagerduty label.
- alert: same_alert_name
  expr: metric{label="value"} == 2
  for: 30m
  labels:
    severity: Critical
    name: backend
    pagerduty: 'true'
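
For reference, in Prometheus 2.x these rules have to live inside a rule group in a rules file; a minimal sketch with a placeholder group name:

groups:
- name: backend          # placeholder group name
  rules:
  - alert: same_alert_name
    expr: metric{label="value"} == 1
    labels:
      severity: Warning
      name: backend
  # ... the other two rules from the example above go here, indented the same way ...

The file can then be validated with `promtool check rules <file>` before reloading Prometheus.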


And in alertmanager:

route:
  receiver: slack                  # default receiver (required on the root route)
  routes:
  # Chat for all backend alerts; continue so the more specific route below is still evaluated.
  - match_re:
      name: backend
    receiver: slack
    continue: true
  # Alerts that also carry the pagerduty label (critical and unresolved for 30m) page as well.
  - match_re:
      name: backend
      pagerduty: 'true'
    receiver: pagerduty

inhibit_rules:
- source_match_re:
    severity: Critical
  target_match_re:
    severity: Warning
  equal: ['alertname']

receivers:
- name: slack
  slack_configs:
  ...

- name: pagerduty
  pagerduty_configs:
  ...
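
As a sanity check, the whole Alertmanager config can be validated with amtool before reloading (the path here is just an example):

amtool check-config /etc/alertmanager/alertmanager.yml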



Thanks again Brian and Ben!