Alertmanager not firing when alert is active


Danny de Waard

Apr 7, 2020, 4:51:58 AM
to Prometheus Users
I'm having some trouble setting up Alertmanager.

I have set up a rules file in Prometheus (see below) and a configuration file for Alertmanager (also below).
In Alertmanager I see the active alert for the Java swap usage:

instance="lsrv0008"
1 alert
  • 06:49:37, 2020-04-07 (UTC)
    alertname="swap_usage_java_high"
    application="java"
    exportertype="node_exporter"
    host="lsrv0008"
    job="PROD"
    monitor="codelab-monitor"
    quantity="kB"
    severity="warning"
But no mail is sent by Alertmanager… what am I missing?

Prometheus rules file
groups:
- name: targets
  rules:
  - alert: monitor_service_down
    expr: up == 0
    for: 40s
    labels:
      severity: critical
    annotations:
      summary: "Monitor service non-operational"
      description: "Service {{ $labels.instance }} is down."
  - alert: server_down
    expr: probe_success == 0
    for: 30s
    labels:
      severity: critical
    annotations:
      summary: "Server is down (no probes are up)"
      description: "Server {{ $labels.instance }} is down."
  - alert: loadbalancer_down
    expr: loadbalancer_stats < 1
    for: 30s
    labels:
      severity: critical
    annotations:
      summary: "A loadbalancer is down"
      description: "Loadbalancer for {{ $labels.instance }} is down."
- name: host
  rules:
  - alert: high_cpu_load1
    expr: node_load1 > 8.0
    for: 300s
    labels:
      severity: warning
    annotations:
      summary: "Server under high load (load 1m) for 5 minutes"
      description: "Host is under high load, the avg load 1m is at {{ $value}}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
  - alert: high_cpu_load5
    expr: node_load5 > 5.0
    for: 600s
    labels:
      severity: warning
    annotations:
      summary: "Server under high load (load 5m) for 10 minutes."
      description: "Host is under high load, the avg load 5m is at {{ $value}}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
  - alert: high_cpu_load15
    expr: node_load15 > 4.5
    for: 900s
    labels:
      severity: critical
    annotations:
      summary: "Server under high load (load 15m) for 15 minutes."
      description: "Host is under high load, the avg load 15m is at {{ $value}}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
  - alert: high_volume_workers_prod
    expr: sum(apache_workers{job="Apache PROD"}) by (instance) > 325
    for: 30s
    labels:
      severity: warning
    annotations:
      summary: "Number of workers above 325 for 30s"
      description: "The Apache workers are over 325 for 30s. Current value is {{ $value}}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
  - alert: medium_volume_workers_prod
    expr: sum(apache_workers{job="Apache PROD"}) by (instance) > 300
    for: 30s
    labels:
      severity: warning
    annotations:
      summary: "Number of workers above 300 for 30s"
      description: "The Apache workers are over 300 for 30s. Current value is {{ $value}}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
  - alert: swap_usage_java_high
    expr: swapusage_stats{application="java"} > 500000
    for: 300s
    labels:
      severity: warning
    annotations:
      summary: "Swap usage for Java is high for the last 5 minutes"
      description: "The swap usage for the java process is high. Current value is {{ $value}}. Reported by instance {{ $labels.instance }} of job {{ $labels.job }}."
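
As an aside, a rules file like the one above can be syntax-checked before reloading Prometheus; a minimal sketch, assuming the file is saved as alert.rules.yml (the filename is an assumption):

```shell
# Validate the alerting rules file with promtool, which ships with Prometheus
promtool check rules alert.rules.yml
```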



Alertmanager setup file
global:
  resolve_timeout: 5m
  http_config: {}
  smtp_from: alertmanager@example.com
  smtp_hello: localhost
  smtp_smarthost: localhost:25
  smtp_require_tls: true
  pagerduty_url: https://events.pagerduty.com/v2/enqueue
  hipchat_api_url: https://api.hipchat.com/
  opsgenie_api_url: https://api.opsgenie.com/
  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/
  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
route:
  receiver: default
  group_by:
  - instance
  routes:
  - receiver: mail
    match:
      severity: warning
  - receiver: all
    match:
      severity: critical
  group_wait: 1s
  group_interval: 1s
receivers:
- name: default
- name: mail
  email_configs:
  - send_resolved: true
    to: somemail@mail.nl
    from: alertmanager@example.com
    hello: localhost
    smarthost: localhost:25
    headers:
      From: alertmanager@example.com
      Subject: '{{ template "email.default.subject" . }}'
      To: somemail@mail.nl
    html: '{{ template "email.default.html" . }}'
    require_tls: false
- name: all
  email_configs:
  - send_resolved: true
    to: fm.nl.itn.dis.cdi.dld.superheroes@rabobank.nl
    from: alertmanager@example.com
    hello: localhost
    smarthost: localhost:25
    headers:
      From: alertmanager@example.com
      Subject: '{{ template "email.default.subject" . }}'
      To: mymail@mail.nl
    html: '{{ template "email.default.html" . }}'
    require_tls: false
  - send_resolved: true
    to: mynumber@mysms.nl
    from: alertmanager@example.com
    hello: localhost
    smarthost: localhost:25
    headers:
      From: alertmanager@example.com
      Subject: '{{ template "email.default.subject" . }}'
      To: mynumber@mysms.nl
    html: '{{ template "email.default.html" . }}'
    require_tls: false
- name: webhook
  webhook_configs:
  - send_resolved: true
    http_config: {}
    url: http://127.0.0.1:9000
templates: []
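
A configuration file like the above can be sanity-checked before (re)starting Alertmanager; a minimal sketch, assuming the file is saved as alertmanager.yml (the filename is an assumption):

```shell
# amtool ships with Alertmanager and validates the config syntax and routing tree
amtool check-config alertmanager.yml
```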


Matthias Rampke

Apr 7, 2020, 12:00:59 PM
to Danny de Waard, Prometheus Users
What do the alertmanager logs say? If you don't see anything, increase verbosity until you can see Alertmanager receiving the alert and trying to send the notification. At sufficient verbosity, you should be able to trace exactly what it is trying and/or failing to do.
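
Concretely, verbosity can be raised with Alertmanager's --log.level flag; a minimal sketch of a manual restart (the config path here is an assumption, adjust it to your setup):

```shell
# Restart Alertmanager with debug logging enabled
alertmanager --log.level=debug --config.file=/etc/alertmanager/alertmanager.yml
```

At debug level, each received alert and each notification attempt should appear in the log.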

/MR

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/7cbb3a17-bf66-4530-9d2c-344549c5cbb3%40googlegroups.com.

Danny de Waard

Apr 8, 2020, 1:11:12 AM
to Prometheus Users
Okay, I think I got some logs. I'm just not sure what they mean:

level=debug ts=2020-04-08T05:08:37.628Z caller=dispatch.go:104 component=dispatcher msg="Received alert" alert=swap_usage_java_high[d346adb][active]
level=debug ts=2020-04-08T05:08:37.628Z caller=dispatch.go:432 component=dispatcher aggrGroup="{}/{severity=\"warning\"}:{instance=\"lsrv0008\"}" msg=flushing alerts=[swap_usage_java_high[d346adb][active]]
level=debug ts=2020-04-08T05:08:38.630Z caller=dispatch.go:432 component=dispatcher aggrGroup="{}/{severity=\"warning\"}:{instance=\"lsrv0008\"}" msg=flushing alerts=[swap_usage_java_high[d346adb][active]]
level=debug ts=2020-04-08T05:08:39.630Z caller=dispatch.go:432 component=dispatcher aggrGroup="{}/{severity=\"warning\"}:{instance=\"lsrv0008\"}" msg=flushing alerts=[swap_usage_java_high[d346adb][active]]
level=debug ts=2020-04-08T05:08:40.630Z caller=dispatch.go:432 component=dispatcher aggrGroup="{}/{severity=\"warning\"}:{instance=\"lsrv0008\"}" msg=flushing alerts=[swap_usage_java_high[d346adb][active]]
and this last line keeps coming.

On Tuesday, April 7, 2020 at 18:00:59 UTC+2, Matthias Rampke wrote:

Danny de Waard

Apr 8, 2020, 5:37:00 AM
to Prometheus Users
Okay.

I did some digging on the internet, changed my yml file, and it works now.
As far as I can see ;)

global:
route:
  group_by: [instance,severity]
  receiver: 'default'
  routes:
   - match:
      severity: warning
     receiver: 'mail'
   - match:
      severity: critical
     receiver: 'all'
receivers:
  - name: 'default'
    email_configs:
     - to: 'mym...@rabobank.nl' ##fill in your email
       from: 'alertmanag...@sender.com'
       smarthost: 'localhost:25'
       require_tls: false
  - name: 'mail'
    email_configs:
     - to: 'grou...@mail.nl' ##fill in your email
       from: 'alertman...@sender.com'
       smarthost: 'localhost:25'
       require_tls: false
  - name: 'all'
    email_configs:
     - to: 'grou...@mail.nl' ##fill in your email
       from: 'alertman...@sender.com'
       smarthost: 'localhost:25'
       require_tls: false
  - name: 'webhook'
    webhook_configs:
      - url: 'http://127.0.0.1:9000'

Now there are some things left that I need to figure out, like sending to multiple email addresses (or receivers), and using the webhook correctly (for instance, if a node is down, call the webhook with parameters).
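
On the multiple-recipients point: a single receiver may carry several email_configs entries (each entry gets its own copy of the notification), and a route can fall through to additional receivers by setting continue: true. A sketch with placeholder addresses (the addresses and routing here are illustrative assumptions, not the working config above):

```yaml
receivers:
  - name: 'mail'
    email_configs:
      # Each entry below receives its own copy of the notification.
      - to: 'first@example.com'
        from: 'alertmanager@sender.com'
        smarthost: 'localhost:25'
        require_tls: false
      - to: 'second@example.com'
        from: 'alertmanager@sender.com'
        smarthost: 'localhost:25'
        require_tls: false
route:
  receiver: 'default'
  routes:
    # continue: true lets a matching alert also be tested against later
    # routes, so both 'mail' and 'webhook' would be notified here.
    - match:
        severity: warning
      receiver: 'mail'
      continue: true
    - match:
        severity: warning
      receiver: 'webhook'
```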

On Wednesday, April 8, 2020 at 07:11:12 UTC+2, Danny de Waard wrote: