How to deal with over-alerting in Alertmanager? How to get only specific notifications via email?

kishor k

Jul 23, 2021, 3:52:09 AM
to Prometheus Users

I must admit that I am a beginner with the Prometheus Operator.
I have deployed the "kube-prometheus-stack-14.5.0" Helm chart on a Kubernetes cluster.
Alertmanager ("alertmanager:v0.21.0") is deployed as part of that chart:
https://github.com/prometheus-community/helm-charts/releases/tag/kube-prometheus-stack-14.5.0

It all works fine, except that Alertmanager sends far too many notifications, seemingly one for each activity on the cluster.
Feel free to correct me if I am wrong:
After debugging, I realized that a large number of rules are configured in the Prometheus rule files, so Alertmanager sends a notification for each rule that fires.

The rule files are mounted in the Prometheus pod:
/etc/prometheus/rules/prometheus-prometheus-kube-prometheus-prometheus-rulefiles-0 $ ls -la
monitoring-prometheus-kube-prometheus-kubelet.rules.yaml -> ..data
monitoring-prometheus-kube-prometheus-node.rules.yaml -> ..data
etc.......

Is it possible to override these rules?
Is it possible to delete these default rules?
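
For reference, a minimal sketch of how the chart's bundled rule groups can usually be switched off from values.yaml. The `defaultRules` block is part of kube-prometheus-stack, but the group keys shown here are assumptions that vary between chart versions, so they should be checked against the values.yaml shipped with the release in use. Note also that groups named `*.rules` (such as kubelet.rules and node.rules above) typically contain recording rules that dashboards rely on, not alerts.

```yaml
# values.yaml (sketch; group key names are assumptions, verify them for your chart version)
defaultRules:
  create: true          # keep the bundled rules in general
  rules:
    etcd: false         # drop one whole bundled group
    kubeScheduler: false
```

Setting `defaultRules.create: false` removes all bundled rules at once, which is the bluntest option and also removes the recording rules.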

E-mail is configured in Alertmanager, and I now get more than 1000 emails every day.
This is definitely over-alerting. I want to send notifications only for specific conditions on the cluster,
such as pods/applications in a pending state or crash loop, HostHighCpuLoad, HostOutOfDiskSpace, etc.
Put simply, I want to send notifications to 3 different people (developer, tester, and team lead).

Also:
Is it possible to write my own custom rule file?
If yes, where can I configure the rules in the values.yaml file?
Or how can I deploy my own rule files together with the Helm chart?
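
As an illustration of the custom-rule question, a minimal sketch of a rule group added through values.yaml. It assumes the chart version in use exposes `additionalPrometheusRulesMap` and that kube-state-metrics is running (it provides `kube_pod_status_phase` and `kube_pod_container_status_waiting_reason`). The group name, alert names, and thresholds are hypothetical.

```yaml
# values.yaml (sketch; names and thresholds are hypothetical)
additionalPrometheusRulesMap:
  custom-app-rules:
    groups:
    - name: custom-app.rules
      rules:
      # Fires when a pod has stayed in the Pending phase for 15 minutes
      - alert: PodStuckPending
        expr: kube_pod_status_phase{phase="Pending"} == 1
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: 'A pod has been in the Pending phase for more than 15 minutes'
      # Fires when a container has been waiting in CrashLoopBackOff for 10 minutes
      - alert: PodCrashLooping
        expr: kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"} == 1
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: 'A container has been in CrashLoopBackOff for more than 10 minutes'
```

The operator turns each entry into a PrometheusRule object, so the same groups could also be deployed as a standalone PrometheusRule manifest, provided its labels match the rule selector of the Prometheus resource.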

Alertmanager config:

alertmanager:
  enabled: true
  config:
    global:
      resolve_timeout: 5m
      smtp_smarthost: 'smtp.devops.logix.net:25'
      smtp_from: 'nor...@logix.com'
      smtp_require_tls: false
    route:
      group_by: ['job']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'null'
      routes:
      - match:
          namespace: fbk-r4-dev
        receiver: kkot
      - match:
          namespace: fbk-dev
        receiver: kkot
    receivers:
    # The root route uses receiver 'null', so it must also be defined in this list
    - name: 'null'
    - name: 'kkot'
      email_configs:
      - to: 'kisho...@logix.com'
        require_tls: false
    templates:
    - '/etc/alertmanager/config/*.tmpl'
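
To make the goal concrete, a hedged sketch of a routing tree that only e-mails a short list of alerts and sends everything else to a receiver with no notification config. It uses `match_re`, which Alertmanager v0.21.0 supports, and whose regexes are fully anchored. The receiver name `team-email` and the three addresses are placeholders. KubePodCrashLooping and KubePodNotReady are assumed to come from the chart's bundled rules, while HostHighCpuLoad and HostOutOfDiskSpace would have to exist as custom rules.

```yaml
# values.yaml (sketch; receiver name and addresses are placeholders)
alertmanager:
  config:
    route:
      receiver: 'null'                      # default: drop anything not matched below
      group_by: ['alertname', 'namespace']
      routes:
      - match_re:
          namespace: 'fbk-r4-dev|fbk-dev'
          alertname: 'KubePodCrashLooping|KubePodNotReady|HostHighCpuLoad|HostOutOfDiskSpace'
        receiver: team-email
    receivers:
    - name: 'null'                          # no *_configs, so nothing is sent
    - name: team-email
      email_configs:                        # one entry per recipient
      - to: 'developer@example.com'
      - to: 'tester@example.com'
      - to: 'teamlead@example.com'
```

Both conditions in a `match_re` block must hold for the route to match, so an alert outside the two namespaces, or with any other alert name, falls through to the `null` receiver.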

I have tried to describe the issue as thoroughly as I can.

I am looking for support, as I have been struggling with this for a couple of days.
Thanks in advance.

Attachments: alert email spam.PNG, Over alerting.PNG

Ian Billett

Jul 26, 2021, 2:31:29 PM
to kishor k, Prometheus Users
Hey Kishor,

If you are deploying the kube-prometheus-stack Helm chart, your best bet is to direct your questions to that project specifically; this mailing list is for the Prometheus server itself.

Also, if I may offer some advice: try to keep your questions on these mailing lists focused and concise! I like the amount of information you provided, but do keep in mind that these mailing lists are maintained by volunteers, who are less likely to read and respond to long messages like yours above.

Best,

Ian
