Alertmanager - Alerts are only sent to the default receiver


Sebastian Tiggelkamp

Sep 11, 2017, 9:05:58 AM
to Prometheus Users
Hi! 

I'm trying to route alerts with different severities to different receivers. My problem is that the alerts are always sent to the default receiver defined under "route"; the child routes seem to be ignored. Here is my config:

route:
  # The labels by which incoming alerts are grouped together. For example,
  # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
  # be batched into a single group.
  group_by: ['alertname', 'instance', 'severity', 'job']

  # When a new group of alerts is created by an incoming alert, wait at
  # least 'group_wait' to send the initial notification.
  # This ensures that multiple alerts for the same group that start
  # firing shortly after one another are batched together in the first
  # notification.
  group_wait: 30s

  # When the first notification was sent, wait 'group_interval' to send a batch
  # of new alerts that started firing for that group.
  group_interval: 5m

  # If an alert has successfully been sent, wait 'repeat_interval' before
  # resending it.
  repeat_interval: 3h

  # A default receiver
  receiver: info-receiver

  # All the above attributes are inherited by all child routes and can be
  # overwritten on each.

  # The child route trees.
  routes:
    # There are sub-routes for critical and warning alerts; any alerts
    # that do not match (i.e. severity != critical and != warning) fall back to the
    # parent node and are sent to 'info-receiver'.
  - match:
      severity: critical
      job: prod
      receiver: critical-prod-receiver
  - match:
      severity: critical
      job: uat
      receiver: critical-uat-receiver
  - match:
      severity: warning
      receiver: warning-receiver


# Inhibition rules allow muting a set of alerts while another alert is
# firing.
# We use this to mute any warning-level notifications if the same alert is
# already critical.
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  # Apply inhibition only if these labels match.
  equal: ['alertname', 'instance', 'severity', 'job']

receivers:
- name: 'critical-prod-receiver'
  slack_configs:
    - channel: '#alertmanager'
      title: '{{ range .Alerts }} PROD, {{ Labels.severity }} {{ .Annotations.summary }} {{ end }}'
      text: '{{ range .Alerts }}{{ .Annotations.description }} {{ end }}'
      send_resolved: true

- name: 'critical-uat-receiver'
  slack_configs:
      - channel: '#alertmanager'
        title: '{{ range .Alerts }} PROD, {{ Labels.severity }} {{ .Annotations.summary }} {{ end }}'
        text: '{{ range .Alerts }}{{ .Annotations.description }} {{ end }}'
        send_resolved: true

- name: 'warning-receiver'
  slack_configs:
        - channel: '#alertmanager'
          title: '{{ range .Alerts }} WARNING, {{ Labels.severity }} {{ .Annotations.summary }} {{ end }}'
          text: '{{ range .Alerts }} {{ .Annotations.description }} {{ end }}'
          send_resolved: true

- name: 'info-receiver'
  slack_configs:
      - channel: '#alertmanager'
        title: '{{ range .Alerts }} INFO, {{ Labels.severity }} {{ .Annotations.summary }} {{ end }}'
        text: '{{ range .Alerts }}{{ .Annotations.description }} {{ end }}'
        send_resolved: true

Sample output from /api/v1/alerts:

{
    "status": "success",
    "data": [
        {
            "labels": {
                "alertname": "instance_down",
                "instance": "cadvisor2:8080",
                "job": "prod",
                "monitor": "my-project",
                "severity": "critical"
            },
            "annotations": {
                "description": "cadvisor2:8080 on prod has been down for more than 5 minutes.",
                "summary": "Instance cadvisor2:8080 down"
            },
            "startsAt": "2017-09-11T12:42:22.811Z",
            "endsAt": "2017-09-11T12:48:52.817332539Z",
            "status": {
                "state": "active",
                "silencedBy": [],
                "inhibitedBy": []
            },
            "receivers": [
                "info-receiver"
            ]
        }
    ]
}

alert rule in Prometheus:

ALERT instance_down
  IF up == 0
  FOR 1m
  LABELS {
    severity = "critical",
    job = "{{ $labels.job }}"
  }

  ANNOTATIONS {
    summary = "Instance {{$labels.instance}} down",
    description = "{{$labels.instance}} on {{$labels.job}} has been down for more than 5 minutes."
  }
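
For anyone reading this on a newer Prometheus release, the same rule in the 2.x YAML rule-file format would look roughly like this (a sketch; the group name is made up, and the job label is left out because it is already present on the up series):

groups:
- name: instance-alerts   # hypothetical group name
  rules:
  - alert: instance_down
    expr: up == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} on {{ $labels.job }} has been down for more than 5 minutes."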

I also tried it with just the severity, without the job label, but that didn't work either. Hopefully someone sees what I'm missing :)

Thanks and regards,
Sebastian

Brian Brazil

Sep 11, 2017, 1:24:05 PM
to Sebastian Tiggelkamp, Prometheus Users
Your indentation is off here: this is looking for a label called "receiver" rather than specifying the receiver.
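
For reference, a fixed version of the child routes would put each receiver at the same level as its match block instead of nested inside it, roughly like this (a sketch based on the config above):

  routes:
  - match:
      severity: critical
      job: prod
    receiver: critical-prod-receiver
  - match:
      severity: critical
      job: uat
    receiver: critical-uat-receiver
  - match:
      severity: warning
    receiver: warning-receiver

With that change, anything that matches neither severity still falls back to the parent route's info-receiver. (As a side note, the Slack title templates reference Labels.severity; inside {{ range .Alerts }} that would normally need a leading dot, i.e. .Labels.severity.)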

Brian

Sebastian Tiggelkamp

Sep 12, 2017, 3:16:11 AM
to Prometheus Users
Argh, thanks that's it! :) 