Want to receive only specific alerts in teams channel


Sampada Thorat

Feb 26, 2023, 9:53:26 AM
to Prometheus Users
Hello Everyone,

I want to receive alerts with the alertnames 'HostOutOfDiskSpace', 'HostHighCpuLoad', 'HostOutOfMemory' and 'KubeNodeNotReady' in the "elevate_alerts" channel, and all other alerts in the "default_receiver_test" channel. But with the configuration below, I'm getting all the alerts in "elevate_alerts" only.

This is my ConfigMap:

apiVersion: v1
data:
  connectors.yaml: |
    connectors:
      - test: https://sasoffice365.webhook.office.com/webhookb2/d2415be1-2360-49c3-af48-7baf41aa1371@b1c14d5c-3625-45b3-a430-9552373a0c2f/IncomingWebhook/c7c62c1315d24c1fb5d1c731d2467dc6/5c8c1e6c-e827-4114-a893-9a1788ad41b5
      - alertmanager: https://sasoffice365.webhook.office.com/webhookb2/a7cb86de-1543-4e6d-b927-387c1f1e35ad@b1c14d5c-3625-45b3-a430-9552373a0c2f/IncomingWebhook/687a7973ffe248d081f58d94a090fb4c/05be66ae-90eb-42f5-8e0c-9c10975012ca
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: prometheus-msteams
    meta.helm.sh/release-namespace: monitoring
  creationTimestamp: "2023-02-26T12:33:36Z"
  labels:
    app.kubernetes.io/managed-by: Helm
  name: prometheus-msteams-config
  namespace: monitoring
  resourceVersion: "18040490"
  uid: 795c96d5-8318-4885-804f-71bba707c885


This is my alertmanager.yaml:

global:
  resolve_timeout: 5m
receivers:
- name: elevate_alerts
  webhook_configs:
  - url: "http://prometheus-msteams.default.svc.cluster.local:2000/alertmanager"
    send_resolved: true
- name: default_receiver_test
  webhook_configs:
  - url: "http://prometheus-msteams.default.svc.cluster.local:2000/test"
    send_resolved: true
route:
  group_by:
  - alertname
  - severity
  group_interval: 5m
  group_wait: 30s
  repeat_interval: 3h
  receiver: default_receiver_test
  routes:
  - matchers:
      alertname:['HostOutOfDiskSpace','HostHighCpuLoad','HostOutOfMemory','KubeNodeNotReady']
    receiver: elevate_alerts

Please help



Brian Candler

Feb 27, 2023, 5:21:33 AM
to Prometheus Users
>   routes:
>   - matchers:
>       alertname:['HostOutOfDiskSpace','HostHighCpuLoad','HostOutOfMemory','KubeNodeNotReady']

That's invalid: Alertmanager should not even start. I tested your config, and I got the following error:

ts=2023-02-27T10:17:54.702Z caller=coordinator.go:118 level=error component=configuration msg="Loading configuration file failed" file=tmp.yaml err="yaml: unmarshal errors:\n  line 22: cannot unmarshal !!str `alertna...` into []string"

'matchers' is a list of strings, not a map.  This should work:

route:
  routes:
  - matchers:
    - alertname=~"HostOutOfDiskSpace|HostHighCpuLoad|HostHighCpuLoad|KubeNodeNotReady"
  receiver: elevate_alerts
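
If you have amtool to hand (it ships with the Alertmanager releases), you can also validate the whole file before deploying it; the filename below is just an example:

amtool check-config alertmanager.yaml

It either summarises what it found (global config, route, receivers) or fails with the same kind of unmarshal error shown above.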

See: https://prometheus.io/docs/alerting/latest/configuration/#route

Sampada Thorat

Feb 27, 2023, 8:22:04 AM
to Prometheus Users
Hello Brian, I tried your change, yet my Alertmanager isn't picking up the config change and still shows the older config. Can you have a look?

global:
  resolve_timeout: 5m
receivers:
  - name: pdmso_alerts
    webhook_configs:
      - url: "http://prometheus-msteams.monitoring.svc.cluster.local:2000/pdmsoalert"

        send_resolved: true
  - name: default_receiver_test
    webhook_configs:
      - url: "http://prometheus-msteams.monitoring.svc.cluster.local:2000/test"

        send_resolved: true
route:
  group_by:
    - alertname
    - severity
  group_interval: 5m
  group_wait: 30s
  repeat_interval: 3h
  receiver: default_receiver_test
  routes:
  - matchers:
      alertname=~"HostOutOfDiskSpace|HostHighCpuLoad|HostHighCpuLoad|KubeNodeNotReady"
  receiver: pdmso_alerts

Brian Candler

Feb 27, 2023, 10:04:06 AM
to Prometheus Users
On Monday, 27 February 2023 at 13:22:04 UTC Sampada Thorat wrote:
Hello Brian, I tried your change, yet my Alertmanager isn't picking up the config change and still shows the older config. Can you have a look?

You mentioned a ConfigMap, which suggests that you are deploying Prometheus on a Kubernetes cluster. It looks like your problem is primarily with Kubernetes, not Prometheus.

If you deployed Prometheus using one of the various third-party Helm charts, then you could ask on the issue tracker for that Helm chart. They might be able to tell you how it's supposed to work when you change the ConfigMap, e.g. whether you're supposed to destroy and recreate the pod manually to pick up the change.
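
For example (the workload and pod names below are guesses - substitute whatever your chart actually created), forcing the pod to be recreated so that it re-reads its mounted config usually looks something like this:

# find the alertmanager workload in your namespace
kubectl -n monitoring get statefulsets,deployments,pods
# restart it so the updated config gets mounted and loaded
kubectl -n monitoring rollout restart statefulset alertmanager
# or simply delete the pod and let its controller recreate it
kubectl -n monitoring delete pod alertmanager-0

Bear in mind that ConfigMap changes can also take a minute or so to propagate into the pod's mounted volume, even without a restart.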

Alternatively, it might be that your config has errors in it, and Alertmanager is sticking with the old config.

I tested the config you posted by writing it to tmp.yaml and then running a standalone instance of Alertmanager by hand:

/opt/alertmanager/alertmanager  --config.file tmp.yaml  --web.listen-address=:19093 --cluster.listen-address="0.0.0.0:19094"

It gave me the following error:

ts=2023-02-27T14:56:01.186Z caller=coordinator.go:118 level=error component=configuration msg="Loading configuration file failed" file=tmp.yaml err="yaml: unmarshal errors:\n  line 22: cannot unmarshal !!str `alertna...` into []string\n  line 23: field receiver already set in type config.plain"

(I would expect such errors to appear in pod logs too)
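
For example, something along these lines (the pod name is hypothetical, and add -c <container> if your pod runs a config-reloader sidecar):

kubectl -n monitoring logs alertmanager-0 | grep -i configuration

You should see either a "Completed loading of configuration file" line or the same "Loading configuration file failed" error as above.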

It's complaining that you have duplicate values for the same "receiver" key:

route:
  ...
  receiver: default_receiver_test
  ...
  receiver: pdmso_alerts


This is because you did not indent the second 'receiver:' correctly. It has to sit inside the bullet point under 'routes:':

route:
  receiver: default_receiver_test
  routes:
  - matchers:
      - alertname=~"HostOutOfDiskSpace|HostHighCpuLoad|HostHighCpuLoad|KubeNodeNotReady"
      ^ dash required here because 'matchers' is a list
    receiver: pdmso_alerts
    ^ should be here, to line up with "matchers" as it's part of the same route (list element under "routes")


Sampada Thorat

Feb 27, 2023, 12:32:57 PM
to Brian Candler, Prometheus Users
I did put the second value under the first value's bullet point as well.
Yet some minor mistake must still be there, because the change isn't being reflected in Alertmanager.

Here's my Config:

global:
  resolve_timeout: 5m
receivers:
  - name: pdmso_alerts
    webhook_configs:
      - url: "http://prometheus-msteams.monitoring.svc.cluster.local:2000/pdmsoalert"
        send_resolved: true
  - name: default_receiver_test
    webhook_configs:
      - url: "http://prometheus-msteams.monitoring.svc.cluster.local:2000/test"
        send_resolved: true
route:
  group_by:
    - namespace
  group_interval: 5m
  group_wait: 30s
  repeat_interval: 3h
  receiver: default_receiver_test
  routes:
    - matchers:
        - alertname=~"HostOutOfDiskSpace|HostHighCpuLoad|HostOutOfMemory|KubeNodeNotReady"
    receiver: pdmso_alerts
 



Thanks & Regards,



Brian Candler

Feb 27, 2023, 12:45:06 PM
to Prometheus Users
Sorry, but this is the last config I am going to test for you. I think it would be better if you ran Alertmanager locally yourself; then you could see the errors yourself and correct them. Or at least, please paste your config into an online YAML validator like yamllint.com - it can highlight structural errors like this one.
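
If you would rather check locally, the yamllint command-line tool does a similar job (it checks YAML structure only, not Alertmanager semantics); the filename is just an example:

pip install yamllint
yamllint alertmanager.yaml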

The config you just posted gives the following error from alertmanager:

ts=2023-02-27T17:37:25.399Z caller=coordinator.go:118 level=error component=configuration msg="Loading configuration file failed" file=tmp.yaml err="yaml: line 21: did not find expected '-' indicator"

Again, this is because you have not lined up "receiver" with "matchers".  Here it is again, this time replacing spaces with asterisks to try to make it 100% clear.

WRONG:

**routes:
****- matchers:
********- alertname=~"HostOutOfDiskSpace|HostHighCpuLoad|HostOutOfMemory|KubeNodeNotReady"
****receiver: pdmso_alerts


CORRECT:

**routes:
****- matchers:
********- alertname=~"HostOutOfDiskSpace|HostHighCpuLoad|HostOutOfMemory|KubeNodeNotReady"
******receiver: pdmso_alerts


The first three lines are the same, but notice the different indentation of the last line: it needs 6 spaces, not 4, so that the "r" of receiver lines up with the "m" of matchers (they are two keys in the same object).
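
For reference, here is the whole thing with that indentation applied, assembled from the receivers and webhook URLs you posted earlier in this thread (double-check the names and URLs against your own setup, and make sure each URL path matches a connector name defined in your prometheus-msteams ConfigMap):

global:
  resolve_timeout: 5m
receivers:
  - name: pdmso_alerts
    webhook_configs:
      - url: "http://prometheus-msteams.monitoring.svc.cluster.local:2000/pdmsoalert"
        send_resolved: true
  - name: default_receiver_test
    webhook_configs:
      - url: "http://prometheus-msteams.monitoring.svc.cluster.local:2000/test"
        send_resolved: true
route:
  group_by:
    - namespace
  group_interval: 5m
  group_wait: 30s
  repeat_interval: 3h
  receiver: default_receiver_test
  routes:
    - matchers:
        - alertname=~"HostOutOfDiskSpace|HostHighCpuLoad|HostOutOfMemory|KubeNodeNotReady"
      receiver: pdmso_alerts

With that, the four elevated alerts should match the child route and go to pdmso_alerts, and everything else should fall through to default_receiver_test.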