Alertmanager sends a message containing two alerts with different statuses for the same alert


ron...@gmail.com

Aug 3, 2017, 11:04:36 AM
to Prometheus Users
Hello,

My Alertmanager sends alerts to a custom webhook.
I've noticed that when an alert fires, the message contains two alerts in the "alerts" array: one with status "resolved" and the other with status "firing".
When the alert is resolved, the message contains two alerts, both with status "resolved".

Alert is fired:

{
    "receiver": "mts_webhook",
    "status": "firing",
    "alerts": [
      {
        "status": "resolved",
        "labels": {
          "alertname": "ServiceStopped",
          "instance": "my_ip:my_port",
          "monitor": "monitor",
          "name": "component_name"
        },
        "annotations": {},
        "startsAt": "2017-08-03T14:00:22.803Z",
        "endsAt": "2017-08-03T14:00:37.815Z",
        "generatorURL": "..."
      },
      {
        "status": "firing",
        "labels": {
          "alertname": "ServiceStopped",
          "container_label_build_date": "20170510",
          "container_label_license": "GPLv2",
          "container_label_name": "CentOS Base Image",
          "container_label_vendor": "CentOS",
          "id": "/docker/bf2b68afe1255b211e436531a2265373aa8aab8f8699dc15b4972bb8f671e243",
          "image": "my_image:latest",
          "instance": "my_ip:my_port",
          "job": "cAdvisor",
          "monitor": "monitor",
          "name": "component_name"
        },
        "annotations": {},
        "startsAt": "2017-08-03T14:02:07.815Z",
        "endsAt": "0001-01-01T00:00:00Z",
        "generatorURL": "..."
      }
    ],
    "groupLabels": {
      "alertname": "ServiceStopped",
      "instance": "my_ip:my_port"
    },
    "commonLabels": {
      "alertname": "ServiceStopped",
      "instance": "my_ip:my_port",
      "monitor": "monitor",
      "name": "component_name"
    },
    "commonAnnotations": {},
    "version": "4",
    "groupKey": "{}:{alertname=\"ServiceStopped\", instance=\"my_ip:my_port\"}"
  }

Alert is resolved:

{
    "receiver": "mts_webhook",
    "status": "resolved",
    "alerts": [
      {
        "status": "resolved",
        "labels": {
          "alertname": "ServiceStopped",
          "instance": "my_ip:my_port",
          "monitor": "monitor",
          "name": "component_name"
        },
        "annotations": {},
        "startsAt": "2017-08-03T14:00:22.803Z",
        "endsAt": "2017-08-03T14:00:37.815Z",
        "generatorURL": "..."
      },
      {
        "status": "resolved",
        "labels": {
          "alertname": "ServiceStopped",
          "container_label_build_date": "20170510",
          "container_label_license": "GPLv2",
          "container_label_name": "CentOS Base Image",
          "container_label_vendor": "CentOS",
          "id": "/docker/bf2b68afe1255b211e436531a2265373aa8aab8f8699dc15b4972bb8f671e243",
          "image": "my_image:latest",
          "instance": "my_ip:my_port",
          "job": "cAdvisor",
          "monitor": "monitor",
          "name": "component_name"
        },
        "annotations": {},
        "startsAt": "2017-08-03T14:02:07.815Z",
        "endsAt": "2017-08-03T14:04:37.815Z",
        "generatorURL": "..."
      }
    ],
    "groupLabels": {
      "alertname": "ServiceStopped",
      "instance": "my_ip:my_port"
    },
    "commonLabels": {
      "alertname": "ServiceStopped",
      "instance": "my_ip:my_port",
      "monitor": "monitor",
      "name": "component_name"
    },
    "commonAnnotations": {},
    "version": "4",
    "groupKey": "{}:{alertname=\"ServiceStopped\", instance=\"my_ip:my_port\"}"
  }

My configuration:

global:  

templates: 
- '/etc/alertmanager/template/*.tmpl'

route:  
  group_by: ['alertname', 'instance']
  group_wait: 30s
  group_interval: 1m
  repeat_interval: 3h 

  receiver: mts_webhook

inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['alertname', 'instance']

receivers:    
- name: 'mts_webhook'
  webhook_configs:
  - url: '...'

Is this the normal behavior I should expect?
Thanks in advance.

thri...@gmail.com

Oct 20, 2017, 1:26:50 PM
to Prometheus Users
Is there any solution for this? I am also facing the same issue: the alert is fired with both statuses (firing and resolved).

jdbj...@gmail.com

Oct 21, 2017, 2:05:12 PM
to thri...@gmail.com, Prometheus Users
Thrioksn, I'm not sure about your case, but from the logs/config files in the first email, I can see two problems:

1) They are not the same alert. You can see this by checking that one has an endsAt set and the other does not (the second one does carry a value, but it's the zero value for time.Time in Go, which means no value was set), and they differ in their labels. They have the same name, but they are not the same alert.
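Go's zero time.Time always serializes to the same sentinel string in JSON, so a webhook consumer can use it to tell "still firing" apart from "actually ended". A minimal sketch in Python (field names taken from the payloads above; the helper name is my own):

```python
# Go's zero value for time.Time serializes to this sentinel in JSON.
GO_ZERO_TIME = "0001-01-01T00:00:00Z"

def has_ended(alert):
    """Return True if the alert carries a real endsAt timestamp,
    i.e. the field is set to something other than Go's zero time."""
    return alert.get("endsAt", GO_ZERO_TIME) != GO_ZERO_TIME

# The two alerts from the "firing" payload above, reduced to the
# fields that matter here:
still_firing = {"status": "firing", "endsAt": "0001-01-01T00:00:00Z"}
ended = {"status": "resolved", "endsAt": "2017-08-03T14:00:37.815Z"}

print(has_ended(still_firing))  # → False
print(has_ended(ended))         # → True
```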

2) Your Alertmanager configuration says it should group alerts by alertname and instance, and both alerts match on those: same name, same instance. This is probably a bug, and two solutions I think would make sense are:

- Alertmanager takes the alert status into account when grouping, so in this case those alerts wouldn't be grouped even though they match on alertname and instance

or

- remove the root 'status' from the webhook payload, as each alert in the group carries information about its own status
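In the meantime, a webhook receiver can simply ignore the root-level status and look at each alert's own status field. A sketch of that idea (the handler and its name are hypothetical, not part of Alertmanager):

```python
def summarize(payload):
    """Count alerts by their own per-alert status instead of trusting
    the group-level 'status', which only says whether at least one
    alert in the group is still firing."""
    counts = {"firing": 0, "resolved": 0}
    for alert in payload.get("alerts", []):
        status = alert.get("status")
        if status in counts:
            counts[status] += 1
    return counts

# Shape of the first payload above: group-level status is "firing"
# even though one member alert is already resolved.
payload = {
    "status": "firing",
    "alerts": [{"status": "resolved"}, {"status": "firing"}],
}
print(summarize(payload))  # → {'firing': 1, 'resolved': 1}
```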

I've created an issue to discuss it further: https://github.com/prometheus/alertmanager/issues/1051.

One thing that intrigues me is why Alertmanager is receiving two alerts with the same name that match on some labels but not all. How are you configuring your Prometheus alert rules?
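Label mismatches like the one above usually mean the alert expression matches more than one series with different label sets (here, one series carries cAdvisor's per-container labels and one does not). A hedged sketch in the Prometheus 2.x YAML rule format of aggregating the extra labels away so every series that can fire the alert shares one label set (the metric name, threshold, and label selector are assumptions, not taken from the original poster's rules):

```yaml
groups:
- name: service_alerts
  rules:
  - alert: ServiceStopped
    # Aggregating away cAdvisor's per-container labels (id, image,
    # container_label_*) keeps every matching series on the same
    # label set, so Alertmanager sees one alert instead of two.
    expr: |
      time()
        - max without (id, image, job, container_label_build_date,
                       container_label_license, container_label_name,
                       container_label_vendor)
              (container_last_seen{name="component_name"})
        > 60
    for: 1m
```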



ron...@gmail.com

Nov 29, 2017, 6:20:51 AM
to Prometheus Users
I found the reason - and a solution.
I've posted the details in the discussion:

On Friday, October 20, 2017 at 20:26:50 UTC+3, thri...@gmail.com wrote: