Control repeated alerts.


rajkiran....@tekinvaderz.com

Sep 25, 2017, 3:27:52 PM
to Prometheus Users
Hello Team,

I have configured alerts for host-level metrics and they fire as expected, but I receive a notification every 5 minutes. I want to control how often Alertmanager repeats alerts. I am sending alerts by email.

Can you please suggest a way to make Alertmanager send alerts only once every 4 hours, so that users are not irritated by the emails?

My alert is 

ALERT ContainerGroupMMissingMembers
  IF count(rate(container_last_seen{name=~".+"}[5m])) by (instance, name) < 1
  FOR 1m
  ANNOTATIONS {
      summary = "Container group is missing with '{{ $labels.name }}' on '{{ $labels.instance }}'",
      description = "{{ $labels.name }} is missing containers. Container count is {{ $value }}.",
  }

My prometheus config is

global:
  scrape_interval: 60s
  scrape_timeout: 10s
  evaluation_interval: 60s
  external_labels:
      datacenter: "xxxxxx"

rule_files:
  - "/path/rules/*.rules"

alerting:
  alertmanagers:
  - scheme: http
    static_configs:
    - targets:
      - "target:port"


My alertmanager config is 

global:
  resolve_timeout: 1m
  hipchat_auth_token: 'xxxxxxxxxxxxxxxxxxxxxxx'
  hipchat_url: 'url'

templates:
- '/opt/promstack/etc/alertmanager/template/*.tmpl'

route:
  receiver: team-hipchat
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: email
  routes:
    - receiver: email
    
receivers:

- name: hipchat
  hipchat_configs:
  - auth_token: '*********************'
    room_id: ***
    message: "Description: {{ range .Alerts }}{{ .Annotations.description }}{{ end }}\nsummary: {{ .CommonAnnotations.summary }}"
    notify: true

- name: email
  email_configs:
  - to: "email"
    from: "fromemail"
    smarthost: localhost:25
    require_tls: false


Please let me know.

Thank you,

Raj Kiran.

alin.si...@amanaadvisors.com

Sep 26, 2017, 9:23:36 AM
to Prometheus Users
Look into group_interval at https://prometheus.io/docs/alerting/configuration/

It seems you are getting alerts with different labels every 5 minutes, and that is what group_interval is supposed to limit. If Alertmanager saw only the same labels, you would get an email only once every 4 hours.

So either increase your group_interval or, even better, filter out the labels you don't care about by explicitly setting them to "" in your alert (next to where you set the annotations), so that you only get emailed when, e.g., a new job starts having this issue.
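
For the first option, a minimal sketch of the route section might look like the following. The 4h value for group_interval is only an assumption chosen to match the desired 4-hour cadence, and the single email receiver is taken from the config posted above; adjust both to your setup.

```yaml
route:
  receiver: email
  # How long to wait before sending the first notification for a new group.
  group_wait: 30s
  # How long to wait before notifying about alerts newly added to a group
  # that has already been notified. Was 5m; 4h is an assumed value.
  group_interval: 4h
  # How long to wait before re-sending a notification for alerts that are
  # still firing and have already been notified.
  repeat_interval: 4h
```

The trade-off is that a genuinely new alert joining an existing group can then wait up to 4 hours before anyone is emailed about it, which is why trimming the labels is the better fix where it is feasible.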

Cheers,
Alin.