Prometheus AlertManager integration with MSTeams


Zhang Zhao

Jun 3, 2020, 5:42:51 AM
to Prometheus Users
I was trying to integrate MSTeams with Prometheus AlertManager so that alerts can be fed into MSTeams channels. I tried the solution on GitHub at the URL below.

Below are the template and a testing alert. The issue I ran into was that the template was not able to parse my testing alert correctly because of the 3 fields shown below. After I took the 3 fields out of the alert sample, the alert was sent to the MSTeams channel successfully. What shall I update to remove these 3 fields, or how shall I modify the current template? Thanks!

zz@ZhangsodMacBook Prometheus-MSTeams % curl -X POST -d @sample.json http://localhost:2000/testChannel
json: cannot unmarshal string into Go struct field Message.commonLabels of type template.KV

zz@ZhangsodMacBook Prometheus-MSTeams % curl -X POST -d @sample.json http://localhost:2000/testChannel
json: cannot unmarshal string into Go struct field Message.commonAnnotations of type template.KV

zz@ZhangsodMacBook Prometheus-MSTeams % curl -X POST -d @sample.json http://localhost:2000/testChannel
json: cannot unmarshal string into Go struct field Message.commonAnnotations of type template.KV


Below is my card.tmpl
{{ define "teams.card" }}
{
  "@type": "MessageCard",
  "themeColor": "{{- if eq .Status "resolved" -}}2DC72D
                 {{- else if eq .Status "firing" -}}
                    {{- if eq .CommonLabels.severity "critical" -}}8C1A1A
                    {{- else if eq .CommonLabels.severity "warning" -}}FFA500
                    {{- else -}}808080{{- end -}}
                 {{- else -}}808080{{- end -}}",
  "summary": "Prometheus Alerts",
  "title": "Prometheus Alert ({{ .Status }})",
  "sections": [ {{$externalUrl := .ExternalURL}}
  {{- range $index, $alert := .Alerts }}{{- if $index }},{{- end }}
    { 
      "facts": [
        {{- range $key, $value := $alert.Annotations }}
        {
          "name": "{{ reReplaceAll "_" "\\\\_" $key }}",
          "value": "{{ reReplaceAll "_" "\\\\_" $value }}"
        },
        {{- end -}}
        {{$c := counter}}{{ range $key, $value := $alert.Labels }}{{if call $c}},{{ end }}
        {
          "name": "{{ reReplaceAll "_" "\\\\_" $key }}",
          "value": "{{ reReplaceAll "_" "\\\\_" $value }}"
        }
        {{- end }}
      ],
      "markdown": true
    }
    {{- end }}
  ]
}
{{ end }}


However, there was a parsing issue with the template above. Below is an alert triggered for testing:
{
   "receiver":"prometheus-snow",
   "status":"firing",
   "alerts":[
      {
         "status":"firing",
         "labels":{
            "alertname":"KubeControllerManagerDown",
            "cluster":"espr-aksepme-dev-westus-cluster-01",
            "geo":"us",
            "prometheus":"espr-prometheus-nonprod/prometheus-prometheus-oper-prometheus",
            "region":"westus",
            "severity":"critical"
         },
         "annotations":{
            "message":"KubeControllerManager has disappeared from Prometheus target discovery.",
         },
         "startsAt":"2020-06-02T06:56:55.479Z",
         "endsAt":"0001-01-01T00:00:00Z",
         "fingerprint":"246a26f7e7ce2afc"
      }
   ],
   "groupLabels":"",
   "commonLabels":"",
   "commonAnnotations":"",
   "version":"4",
   "groupKey":"{}:{alertname=\"KubeControllerManagerDown\"}",
   "groupLabels_alertname":"KubeControllerManagerDown",
   "commonLabels_alertname":"KubeControllerManagerDown",
   "commonLabels_cluster":"espr-aksepme-dev-westus-cluster-01",
   "commonLabels_geo":"us",
   "commonLabels_prometheus":"espr-prometheus-nonprod/prometheus-prometheus-oper-prometheus",
   "commonLabels_region":"westus",
   "commonLabels_severity":"critical",
   "commonAnnotations_message":"KubeControllerManager has disappeared from Prometheus target discovery.",
}


Brian Candler

Jun 3, 2020, 6:00:35 AM
to Prometheus Users
json: cannot unmarshal string into Go struct field Message.commonLabels of type template.KV

means: "I tried to unpack a template.KV object, but you gave me a string instead".  First try:

"groupLabels": {},
"commonLabels": {},
"commonAnnotations": {},

and if that works, you can change it to

"groupLabels": {
  "alertname":"KubeControllerManagerDown",
  .. etc
},

Zhang Zhao

Jun 3, 2020, 6:13:44 AM
to Brian Candler, Prometheus Users
Hi Brian,
After I removed the 3 fields from the sample alert, it worked: the alert was sent to MSTeams. But just modifying the alert is not a solution, as all the alerts generated by AlertManager still contain these 3 fields. What shall I modify in the alertmanager.yaml config? Or is there a way to remove these 3 fields from all alerts?
--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/50c98d18-fd13-4580-91df-55b2ff235970%40googlegroups.com.

Zhang Zhao

Jun 3, 2020, 6:19:00 AM
to Brian Candler, Prometheus Users
Brian,
I just tested your suggestion as well. After I modified the fields as below, it worked. What AlertManager config shall I update to apply this to all alerts?

"groupLabels":{},
"commonLabels":{},
"commonAnnotations":{},


Zhang

Brian Candler

Jun 3, 2020, 7:59:36 AM
to Prometheus Users
It's nothing to do with alertmanager, because alertmanager does not support any sort of templating for webhooks.  You haven't shown your alertmanager config, but I presume it's something like this:

receivers:
- name: 'prometheus-msteams'
  webhook_configs:
  - send_resolved: true
    url: 'http://localhost:2000/testChannel'

So, alertmanager sends a fixed-format JSON message to this destination. If you have a problem with the data after this point, it must be down to prometheus-msteams and its use of the template you showed.
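For reference, the fixed format Alertmanager POSTs to a webhook receiver (payload version "4") looks roughly like the sketch below; the key point for this thread is that groupLabels, commonLabels and commonAnnotations are JSON objects, not strings. The field values here are illustrative, taken from the sample alert earlier in the thread:

```json
{
  "version": "4",
  "groupKey": "{}:{alertname=\"KubeControllerManagerDown\"}",
  "status": "firing",
  "receiver": "prometheus-msteams",
  "groupLabels": { "alertname": "KubeControllerManagerDown" },
  "commonLabels": { "alertname": "KubeControllerManagerDown", "severity": "critical" },
  "commonAnnotations": { "message": "KubeControllerManager has disappeared from Prometheus target discovery." },
  "externalURL": "http://alertmanager:9093",
  "alerts": [
    {
      "status": "firing",
      "labels": { "alertname": "KubeControllerManagerDown", "severity": "critical" },
      "annotations": { "message": "KubeControllerManager has disappeared from Prometheus target discovery." },
      "startsAt": "2020-06-02T06:56:55.479Z",
      "endsAt": "0001-01-01T00:00:00Z"
    }
  ]
}
```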

Zhang Zhao

Jun 3, 2020, 12:26:13 PM
to Brian Candler, Prometheus Users
Yes. Below is my Alertmanager config. So it's something to do with the template. Any advice on what I should change in the template below?

global:
  resolve_timeout: 5m
receivers:
- name: prometheus-msteams
  webhook_configs:
    send_resolved: true
route:
  receiver: prometheus-msteams 
  group_by: ['alertname']



Brian Candler

Jun 3, 2020, 1:05:28 PM
to Prometheus Users
There's nothing you can change with that webhook configuration.

I don't understand how you ever got some JSON containing

   "groupLabels":"",
   "commonLabels":"",
   "commonAnnotations":"",

Are you saying this was something posted from Alertmanager to your app listening on port 2000? Or from that app to MS Teams?  How did you capture it?

What versions of prometheus and alertmanager do you have installed?

Zhang Zhao

Jun 3, 2020, 1:53:07 PM
to Brian Candler, Prometheus Users
I am not sure why every single alert fired contains these 3 fields. I used the stable/prometheus-operator helm chart to deploy the Prometheus components, and I upgraded to the latest version, v2.18.1. The alerts are supposed to be sent to ServiceNow and MSTeams at the same time. Currently it works on the SNOW side. I captured the alerts on SNOW and used them, like the example below, for manual testing via curl -X POST -d @sample.json http://localhost:2000/testChannel, where I found the parsing problem caused by these 3 fields. It only worked after I manually removed the 3 fields shown below and reran the curl command.

"groupLabels":"",
"commonLabels":"",
"commonAnnotations":"",


{
   "receiver":"prometheus-snow",
   "status":"firing",
   "alerts":[
      {
         "status":"firing",
         "labels":{
            "alertname":"TargetDown",
            "cluster":"espr-aksepme-dev-westus-cluster-01",
            "geo":"us",
            "job":"kubelet",
            "namespace":"kube-system",
            "prometheus":"espr-prometheus-nonprod/prometheus-prometheus-oper-prometheus",
            "region":"westus",
            "service":"prometheus-operator-kubelet",
            "severity":"warning"
         },
         "annotations":{
            "message":"100% of the kubelet/prometheus-operator-kubelet targets in kube-system namespace are down."
         },
         "startsAt":"2020-06-02T06:51:40.558Z",
         "endsAt":"0001-01-01T00:00:00Z"
      },
      {
         "status":"firing",
         "labels":{
            "alertname":"TargetDown",
            "cluster":"espr-aksepme-dev-westus-cluster-01",
            "geo":"us",
            "job":"kubelet",
            "namespace":"kube-system",
            "prometheus":"espr-prometheus-nonprod/prometheus-prometheus-oper-prometheus",
            "region":"westus",
            "service":"prometheus-prometheus-oper-kubelet",
            "severity":"warning"
         },
         "annotations":{
            "message":"100% of the kubelet/prometheus-prometheus-oper-kubelet targets in kube-system namespace are down."
         },
         "startsAt":"2020-06-02T06:52:10.558Z",
         "endsAt":"0001-01-01T00:00:00Z"
      }
   ],
   "groupLabels":"",
   "commonLabels":"",
   "commonAnnotations":"",
   "groupKey":"{}:{alertname=\"TargetDown\"}",
   "groupLabels_alertname":"TargetDown",
   "commonLabels_alertname":"TargetDown",
   "commonLabels_cluster":"espr-aksepme-dev-westus-cluster-01",
   "commonLabels_geo":"us",
   "commonLabels_job":"kubelet",
   "commonLabels_namespace":"kube-system",
   "commonLabels_prometheus":"espr-prometheus-nonprod/prometheus-prometheus-oper-prometheus",
   "commonLabels_region":"westus",
   "commonLabels_severity":"warning"
}


Zhang Zhao

Jun 4, 2020, 3:26:35 PM
to Brian Candler, Prometheus Users
Hi Brian,
There were a bunch of alerts fired on AlertManager, but I didn't see any of them sent to the MSTeams pod for processing. Any advice? I don't see any log updates for prometheus-msteams-678cf9ddc8-b79xq while the alerts were firing on Prometheus. Could you please advise where it went wrong?
zz@ZhangsodMacBook Prometheus-MSTeams % kubectl get po
NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-prometheus-prometheus-oper-alertmanager-0   2/2     Running   0          44h
prometheus-grafana-9c786ff96-6wf6w                       2/2     Running   0          44h
prometheus-kube-state-metrics-58d48d86bc-tq8tk           1/1     Running   0          44h
prometheus-msteams-678cf9ddc8-b79xq                      1/1     Running   0          33h
prometheus-prometheus-node-exporter-bp6xb                1/1     Running   0          44h
prometheus-prometheus-oper-operator-8bd46d8cd-kxr8k      2/2     Running   0          44h
prometheus-prometheus-prometheus-oper-prometheus-0       3/3     Running   1          44h



This is the Alertmanager.yaml:
global:
  resolve_timeout: 5m
receivers:
- name: prometheus-msteams
  webhook_configs:
    send_resolved: true
route:
  receiver: prometheus-msteams 
  group_by: ['alertname']


This is the values.yaml for deploying prometheus-msteams via the helm chart.
replicaCount: 1
image:
  tag: v1.3.5

connectors:
# in alertmanager, this will be used as http://prometheus-msteams:2000/testChannel

Brian Candler

Jun 4, 2020, 5:46:03 PM
to Prometheus Users
Sorry I can't help you further: if alerts are arriving at alertmanager and being forwarded to ServiceNow successfully, then it sounds like a problem with the third-party package prometheus-msteams or its configuration.

Have you tried looking to see if it generates any logs when processing / forwarding an alert?  Otherwise you may need to look into the source code and get it to print some extra debugging.

Zhang Zhao

Jun 4, 2020, 6:17:27 PM
to Brian Candler, Prometheus Users
The problem is that I didn't see any log recorded when the alert was fired on AlertManager. It seems the alert was not sent to prometheus-msteams at all. When I tested the alert manually, I was able to see a log recorded on prometheus-msteams, something like below. But when the alert was fired on AlertManager, nothing happened in the prometheus-msteams log.


{"caller":"transport.go:81","err":"json: cannot unmarshal string into Go struct field Message.groupLabels of type template.KV","ts":"2020-06-04T18:59:54.515883159Z"}
{"caller":"transport.go:52","host":"localhost:2000","method":"POST","status":500,"took":"398.26772ms","ts":"2020-06-04T18:59:54.617527149Z","uri":"/testChannel"}



Brian Candler

Jun 5, 2020, 2:20:40 AM
to Prometheus Users
What about logs from alertmanager itself?  Add "--log.level=debug" to alertmanager's command line args to get maximum debugging.

You could also try using tcpdump to see if a message is sent from alertmanager to prometheus-msteams:

tcpdump -i lo -nn -s0 -X tcp port 2000

Also, you mentioned helm charts in passing, implying Kubernetes is being used. The destination "localhost" is only going to work if prometheus-msteams is running in the same pod as alertmanager. If that's not the case, then you'll have to give it the correct endpoint instead of "localhost".
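A sketch of what that receiver could look like with an in-cluster endpoint, assuming prometheus-msteams is exposed as a Kubernetes Service named `prometheus-msteams` on port 2000 in the same namespace (the Service name and connector path are guesses based on the helm values shown earlier in the thread):

```yaml
receivers:
- name: prometheus-msteams
  webhook_configs:
  - send_resolved: true
    # Service DNS name resolvable inside the cluster, instead of "localhost":
    url: 'http://prometheus-msteams:2000/testChannel'
```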

Zhang Zhao

Jun 5, 2020, 2:39:31 AM
to Brian Candler, Prometheus Users
Hi Brian,
The issue was resolved. It was something wrong in the Alertmanager configuration. There were 2 routes: one goes to ServiceNow and the other goes to MSTeams, and the alerts were only being fed to ServiceNow successfully. My troubleshooting direction was wrong yesterday: I presumed that the Alertmanager.yaml was working for MSTeams since I was able to see alerts on ServiceNow, so I was focusing on manually replaying the alert JSON captured from SNOW to test the MSTeams functionality. But ServiceNow actually changes the type of the data structure from kv to string, even though it says it's raw JSON. That's why I ran into the parsing issue I shared in my first email. I also updated the helm chart to the latest version. After all that, the alerts started working.

Thank you for your advice. Appreciate it.


Zhang  
