Trying Prometheus: Alertmanager won't alert on the same condition twice

pixel fairy

Sep 13, 2017, 12:32:28 PM

to Prometheus Users

I'm trying out Prometheus in Vagrant. After setting up the Prometheus server and a target node called "web", everything looks fine in the Prometheus web UI, the Alertmanager UI, and node_exporter's /metrics. Then, on web, I stop node_exporter and wait for the alert email, which takes a while. Next I start node_exporter again, leave it up long enough for Prometheus to see it as up, and stop it again. Even with the alert state showing "firing", I never get the second email unless I trigger a different alert first. The same happens when I stop node_exporter on the monitor host.

Host: Ubuntu 16.04
alertmanager-0.8.0
node_exporter-0.14.0
prometheus-1.7.1

prometheus.yml

global:
  scrape_interval: 15s
  external_labels:
    monitor: 'codelab-monitor'

scrape_configs:

  - job_name: 'watchingself'
    scrape_interval: 5s
    static_configs:
      - targets:
        - 10.45.14.11:9090
        - 10.45.14.11:9100
          
  - job_name: 'web'
    scrape_interval: 5s
    static_configs:
      - targets:
        - 10.45.14.10:9100

rule_files:
  - /opt/prometheus/conf/prometheus/alert.rules

alert.rules

ALERT NodeDown
  IF up == 0 
  FOR 15s
  ANNOTATIONS {
    summary = "node {{ $labels.instance }} of job {{ $labels.job }} down",
    description = "down for 15 seconds",
    severity = "pager"
  }

alertmanager.yml

global:
  smtp_smarthost: 'localhost:25'
  smtp_from: root@prom
  smtp_require_tls: false

route:
  receiver: email-alert

receivers:
  - name: email-alert
    email_configs:
    - to: vagrant@prom

Brian Brazil

Sep 13, 2017, 1:10:10 PM

to pixel fairy, Prometheus Users

On 13 September 2017 at 17:32, pixel fairy <pixel...@gmail.com> wrote:

> ... even with the alert state in "firing", I never get the second email without triggering a different alert first.

Did you wait at least 5 minutes? This is the default group_interval, which is the primary control to throttle spammy notifications.

Brian
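
For context, the throttling mentioned above is controlled by three timers on the Alertmanager route. The alertmanager.yml in the original post sets none of them, so the defaults apply implicitly. The sketch below writes them out; the values shown are the defaults as I understand them from the Alertmanager documentation, so treat them as an assumption and check the docs for your version.

```yaml
# alertmanager.yml route with the timers written out explicitly.
# Assumption: 30s / 5m / 4h are the documented defaults; verify for your version.
route:
  receiver: email-alert
  group_wait: 30s       # delay before the first notification for a new alert group
  group_interval: 5m    # delay before notifying about changes to a group already notified
  repeat_interval: 4h   # delay before re-sending a notification for alerts still firing
```

With these defaults, an alert that resolves and fires again inside the same group may not produce a new email until group_interval has elapsed.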

--

Brian Brazil

www.robustperception.io

pixel fairy

Sep 13, 2017, 7:44:41 PM

to Prometheus Users

On Wednesday, September 13, 2017 at 10:10:10 AM UTC-7, Brian Brazil wrote:

> Did you wait at least 5 minutes? This is the default group_interval, which is the primary control to throttle spammy notifications.

Thanks! Setting that, along with group_wait and repeat_interval, to a few seconds worked great for testing.
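
A minimal sketch of what such a test setup could look like, reusing the email-alert receiver from the alertmanager.yml above. The exact durations are illustrative assumptions; the post only says "a few seconds".

```yaml
# alertmanager.yml route tuned for fast feedback in a test VM.
# The durations below are hypothetical examples, not the poster's actual values.
route:
  receiver: email-alert
  group_wait: 5s        # send the first notification almost immediately
  group_interval: 5s    # report changes to the group after a few seconds
  repeat_interval: 10s  # re-send for still-firing alerts after a short pause
```

Intervals this short are only sensible for experiments; on a real system they would likely flood the inbox.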
