Prometheus AlertManager Alert Grouping

Zhang Zhao

Jun 17, 2020, 9:00:26 PM
to Prometheus Users
Hi, I have a question about alert grouping in AlertManager. I integrated Prometheus alerts with ServiceNow via a webhook, and the events are captured on the ServiceNow side as shown below. However, each of these events contains multiple alerts. Is there a way to split them up so that one alert from Prometheus corresponds to one event in ServiceNow? I tried setting group_by to alertname and status, but it didn't work as expected. It seems I have to add other conditions to the group_by setting. Thanks.
image.png


global:
  resolve_timeout: 5m
receivers:
- name: prometheus-snow
  webhook_configs:
  - url: "https://"
    http_config:
      basic_auth:
        username:
        password: 
route:
  group_by: ['alertname','status']
  group_interval: 10m
  group_wait: 5m
  repeat_interval: 5m
  receiver: prometheus-snow

{
   "receiver":"prometheus-snow",
   "status":"firing",
   "alerts":[
      {
         "status":"firing",
         "labels":{
            "alertname":"Critical_TEST",
            "cluster":"espr-aksepme-dev-westus-cluster-01",
            "endpoint":"https-metrics",
            "geo":"us",
            "instance":"172.25.33.132:10250",
            "job":"kubelet",
            "metrics_path":"/metrics",
            "namespace":"kube-system",
            "node":"aks-esprepmedv01-44274363-vmss000000",
            "prometheus":"espr-prometheus-nonprod/prometheus-prometheus-oper-prometheus",
            "region":"westus",
            "service":"prometheus-operator-kubelet",
            "severity":"critical"
         },
         "annotations":{
            "message":"This is for ServiceNow integration testing."
         },
         "startsAt":"2020-06-14T17:42:40.558Z",
         "endsAt":"0001-01-01T00:00:00Z",
         "generatorURL":"http://prometheus-prometheus-oper-prometheus.espr-prometheus-nonprod:9090/graph?g0.expr=up+%3D%3D+0&g0.tab=1",
         "fingerprint":"52169c17bfa388eb"
      },
      {
         "status":"firing",
         "labels":{
            "alertname":"Critical_TEST",
            "cluster":"espr-aksepme-dev-westus-cluster-01",
            "endpoint":"https-metrics",
            "geo":"us",
            "instance":"172.25.33.132:10250",
            "job":"kubelet",
            "metrics_path":"/metrics",
            "namespace":"kube-system",
            "node":"aks-esprepmedv01-44274363-vmss000000",
            "prometheus":"espr-prometheus-nonprod/prometheus-prometheus-oper-prometheus",
            "region":"westus",
            "service":"prometheus-prometheus-oper-kubelet",
            "severity":"critical"
         },
         "annotations":{
            "message":"This is for ServiceNow integration testing."
         },
         "startsAt":"2020-06-14T17:42:40.558Z",
         "endsAt":"0001-01-01T00:00:00Z",
         "generatorURL":"http://prometheus-prometheus-oper-prometheus.espr-prometheus-nonprod:9090/graph?g0.expr=up+%3D%3D+0&g0.tab=1",
         "fingerprint":"0f7d36efbba0e03c"
      },
      {
         "status":"firing",
         "labels":{
            "alertname":"Critical_TEST",
            "cluster":"espr-aksepme-dev-westus-cluster-01",
            "endpoint":"https-metrics",
            "geo":"us",
            "instance":"172.25.33.132:10250",
            "job":"kubelet",
            "metrics_path":"/metrics/cadvisor",
            "namespace":"kube-system",
            "node":"aks-esprepmedv01-44274363-vmss000000",
            "prometheus":"espr-prometheus-nonprod/prometheus-prometheus-oper-prometheus",
            "region":"westus",
            "service":"prometheus-operator-kubelet",
            "severity":"critical"
         },
         "annotations":{
            "message":"This is for ServiceNow integration testing."
         },
         "startsAt":"2020-06-14T17:42:40.558Z",
         "endsAt":"0001-01-01T00:00:00Z",
         "generatorURL":"http://prometheus-prometheus-oper-prometheus.espr-prometheus-nonprod:9090/graph?g0.expr=up+%3D%3D+0&g0.tab=1",
         "fingerprint":"4f6c2a8be6e9985d"
      },
      {
         "status":"firing",
         "labels":{
            "alertname":"Critical_TEST",
            "cluster":"espr-aksepme-dev-westus-cluster-01",
            "endpoint":"https-metrics",
            "geo":"us",
            "instance":"172.25.33.132:10250",
            "job":"kubelet",
            "metrics_path":"/metrics/cadvisor",
            "namespace":"kube-system",
            "node":"aks-esprepmedv01-44274363-vmss000000",
            "prometheus":"espr-prometheus-nonprod/prometheus-prometheus-oper-prometheus",
            "region":"westus",
            "service":"prometheus-prometheus-oper-kubelet",
            "severity":"critical"
         },

Christian Hoffmann

Jun 18, 2020, 7:06:57 AM
to Zhang Zhao, Prometheus Users
On 6/18/20 3:00 AM, Zhang Zhao wrote:
> Hi, I have a question about alert grouping in AlertManager. I integrated
> Prometheus alerts with ServiceNow via a webhook, and the events are
> captured on the ServiceNow side as shown below. However, each of these
> events contains multiple alerts. Is there a way to split them up so that
> one alert from Prometheus corresponds to one event in ServiceNow? I tried
> setting group_by to alertname and status, but it didn't work as expected.
> It seems I have to add other conditions to the group_by setting.
> Thanks.
> image.png

Sounds like you are looking for the magic
group_by: ['...']

option. :)

Kind regards,
Christian
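
For anyone finding this thread later, here is a minimal sketch of how that suggestion might look when applied to the route posted above. It assumes the receiver name and timing values from the original config are kept unchanged; only group_by differs. Per the Alertmanager documentation, the special value '...' used as the sole entry groups by all labels, which effectively disables aggregation so that each alert is passed through on its own. Note also that "status" appears in the webhook payload as a field next to the labels, not inside them, which may be why grouping on it had no visible effect.

```yaml
route:
  receiver: prometheus-snow
  # '...' as the only entry means "group by every label", so each
  # distinct label set (each alert) ends up in its own group and is
  # delivered as its own webhook notification.
  group_by: ['...']
  # Timing values copied from the original post, not recommendations.
  group_wait: 5m
  group_interval: 10m
  repeat_interval: 5m
```

With this in place, each webhook call should carry a single entry in its "alerts" array. The four alerts in the payload above differ in their metrics_path and service labels, so they would be expected to arrive as four separate notifications. The group_wait of 5m still applies per group, so the first notification for each alert would be delayed by that amount.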

Zhang Zhao

Jun 18, 2020, 12:23:39 PM
to Christian Hoffmann, Prometheus Users
Thanks, Christian. I found it in the documentation as well. :)