AlertManager sends email alerts every 4 hours even if 'repeat_interval' is longer


BlueChips23

Jan 28, 2019, 10:02:03 AM
to Prometheus Users
Here's how I defined my Alert Manager settings.

I am monitoring for two alerts - alert_a and alert_b. Whenever either of these alerts fires, I want to notify both of my receivers, but with different repeat intervals. Here is how my receivers are defined:
1. webhook-receiver: Send alerts to my webhook with a repeat interval of 10 minutes
2. email-alert: Send email alerts with a repeat interval of 52 weeks (a very long repeat interval, since we only want to receive the email alert once).

The issue I am having is that, even though I configured the email alert to be sent once (every 52 weeks), AlertManager keeps sending email alerts every 4 hours. I haven't defined a 4-hour repeat interval anywhere in my settings (I suspect AlertManager is picking it up as a default value for some reason). Any ideas what changes I should make so that the email alerts are sent exactly ONCE and not repeated every 4 hours?

Here are the settings in Alertmanager.yml:


global:
  smtp_smarthost: 'smart_host_address'
  smtp_from: 'sen...@example.com'

templates:
- '/etc/alertmanager/email.tmpl'

# The primary route on which each incoming alert enters.
route:
  # Default receiver
  receiver: webhook-receiver
  group_by: ['namespace', 'alertname']
  group_wait: 15s
  group_interval: 15s

  # If an alert has successfully been sent, wait 'repeat_interval' to
  # resend them. Very high repeat_interval means we don't want repeat email alerts
  repeat_interval: 1h

  routes:
  - match:
      alertname: 'alert_a'
    group_by: ['deployment', 'namespace']
    receiver: email-alert
    repeat_interval: 52w
    continue: true

  - match:
      alertname: 'alert_a'
    group_by: ['deployment', 'namespace']
    receiver: webhook-receiver
    repeat_interval: 10m
    continue: true

  - match:
      alertname: 'alert_b'
    group_by: ['pod', 'namespace']
    receiver: email-alert
    repeat_interval: 52w
    continue: true

  - match:
      alertname: 'alert_b'
    group_by: ['pod', 'namespace']
    receiver: webhook-receiver
    repeat_interval: 10m
    continue: true

receivers:
- name: 'email-alert'
  email_configs:
  - to: <email_address_goes_here>
    html: '{{ template "email.html" . }}'
    send_resolved: false
    require_tls: false

- name: 'webhook-receiver'
  webhook_configs:
  - send_resolved: true
    url: '<webhook_URL_goes_here>'
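
For readers following along, here is a rough Python model of how a routing tree like the one in this post fans one alert out to two receivers. It is only a sketch of the documented routing behaviour (children are tried in order, `continue: true` keeps evaluating later siblings, and the parent route is used when no child matches), not Alertmanager's own code. The names `Route` and `matching_routes` are made up for illustration, `cont` stands in for the YAML key `continue`, and `group_by` is left out because it does not affect which routes match.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    receiver: str
    match: dict = field(default_factory=dict)   # label name -> required value
    cont: bool = False                          # stands in for YAML `continue`
    repeat_interval: str = ""
    routes: list = field(default_factory=list)  # child routes, in config order


def matching_routes(route, labels):
    """Return the routes an alert with these labels ends up on (simplified)."""
    if any(labels.get(k) != v for k, v in route.match.items()):
        return []
    matched = []
    for child in route.routes:
        hits = matching_routes(child, labels)
        matched.extend(hits)
        if hits and not child.cont:
            break  # without `continue: true`, the first matching child wins
    # If no child matched, the current route itself handles the alert.
    return matched or [route]


def pair(alertname):
    """The email + webhook pair of child routes used for each alert name."""
    return [
        Route("email-alert", {"alertname": alertname}, cont=True, repeat_interval="52w"),
        Route("webhook-receiver", {"alertname": alertname}, cont=True, repeat_interval="10m"),
    ]


root = Route("webhook-receiver", repeat_interval="1h",
             routes=pair("alert_a") + pair("alert_b"))

for name in ("alert_a", "some_other_alert"):
    hits = matching_routes(root, {"alertname": name})
    print(name, [(r.receiver, r.repeat_interval) for r in hits])
# alert_a [('email-alert', '52w'), ('webhook-receiver', '10m')]
# some_other_alert [('webhook-receiver', '1h')]
```

Under this model the config does what the post intends: alert_a and alert_b each reach both receivers with their own repeat_interval, which suggests the 4-hour repeats are not caused by the route matching itself.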


Simon Pasquier

Jan 28, 2019, 10:12:11 AM
to BlueChips23, Prometheus Users
Can you check the configuration exposed in the Status page of the UI?
> --
> You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-use...@googlegroups.com.
> To post to this group, send email to promethe...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/a9b6b9e3-425b-4445-8773-4ad79a908347%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Christian Hoffmann

Jan 28, 2019, 11:01:27 AM
to promethe...@googlegroups.com, BlueChips23
Hi,

try checking the alertmanager command line flag --data.retention. Its value needs to be larger than the largest repeat_interval.

Kind regards,
Christian


BlueChips23

Jan 28, 2019, 2:24:02 PM
to Prometheus Users
@christian - how do I check what my --data.retention value is? I don't specify it in the command line args, so I'm guessing it's picking up the default value (not sure what the default is).

@Simon - here's my settings from AlertManager UI:

global:
  resolve_timeout: 5m
  http_config: {}
  smtp_from: <email_address>
  smtp_hello: localhost
  smtp_smarthost: <smart_smtphost_address>
  smtp_require_tls: true
route:
  receiver: webhook-receiver
  group_by:
  - namespace
  - alertname
  routes:
  - receiver: email-receiver
    group_by:
    - deployment
    - namespace
    match:
      alertname: alert_a
    continue: true
    repeat_interval: 52w
  - receiver: webhook-receiver
    group_by:
    - deployment
    - namespace
    match:
      alertname: alert_a
    continue: true
    repeat_interval: 10m
  - receiver: email-receiver
    group_by:
    - pod
    - namespace
    match:
      alertname: alert_b
    continue: true
    repeat_interval: 52w
  - receiver: webhook-receiver
    group_by:
    - pod
    - namespace
    match:
      alertname: alert_b
    continue: true
    repeat_interval: 10m
  group_wait: 15s
  group_interval: 15s
  repeat_interval: 1h
receivers:
- name: email-receiver
  email_configs:
  - send_resolved: false
    to: <email_address>
    from: <sender_email_address>
    hello: localhost
    smarthost: <smart_smtphost_address>
    headers:
      From: <sender_email_address>
      Subject: '{{ template "email.default.subject" . }}'
      To: <email_address>
    html: '{{ template "email.alerting.html" . }}'
    require_tls: false
- name: webhook-receiver
  webhook_configs:
  - send_resolved: true
    http_config: {}
    url: <webhook_url_goes_here>
templates:
- /etc/alertmanager/email.tmpl

Christian Hoffmann

Jan 28, 2019, 2:37:38 PM
to BlueChips23, Prometheus Users
On 1/28/19 8:24 PM, BlueChips23 wrote:
> @christian - how do I check what my --data.retention value is? I don't
> specify it in the command line args, so I'm guessing it's picking up the
> default value (not sure what the default is).

Ok, then you should be fine -- the default will be sufficient. It's 120
hours:

$ alertmanager -h |& grep -A1 retention
-data.retention duration
How long to keep data for. (default 120h0m0s)


Kind regards,
Christian
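
As a worked check of the numbers in this thread, the sketch below converts the intervals mentioned so far to seconds and compares them with the 120h default shown in the help output. It assumes the rule stated earlier in the thread (retention should be larger than the largest repeat_interval) and a simplified duration syntax (units `w`, `d`, `h`, `m`, `s` only). The helper `to_seconds` is hypothetical and not part of Alertmanager.

```python
import re

# Seconds per unit for the simplified duration syntax assumed here.
_UNIT_SECONDS = {"w": 7 * 24 * 3600, "d": 24 * 3600, "h": 3600, "m": 60, "s": 1}


def to_seconds(duration: str) -> int:
    """Convert a duration such as '52w', '10m' or '120h0m0s' to seconds."""
    parts = re.findall(r"(\d+)([wdhms])", duration)
    # Reject anything that is not entirely made of <number><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != duration:
        raise ValueError(f"unsupported duration: {duration!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in parts)


retention = to_seconds("120h0m0s")            # default from `alertmanager -h`
intervals = ["10m", "1h", "52h", "100h", "52w"]  # values mentioned in this thread

exceeds = [i for i in intervals if to_seconds(i) > retention]
print(to_seconds("52w") // 3600)  # 8736 (hours in 52 weeks)
print(exceeds)                    # ['52w']
```

By this arithmetic 52w is 8736 hours, well above the 120h default, while 52h and 100h both stay below it. If the retention rule applies, that is at least consistent with the shorter intervals behaving as configured and 52w not doing so, although nothing in the thread confirms this as the cause.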

BlueChips23

Jan 28, 2019, 2:50:32 PM
to Prometheus Users
Do you think 52 weeks is too long for AlertManager? I reduced it to 52 hours and it seems the repeat emails have stopped. But I am curious what the maximum repeat_interval I can set is.

Simon Pasquier

Feb 11, 2019, 8:33:50 AM
to BlueChips23, Prometheus Users
I failed to reproduce locally (though I didn't use the email receiver)...
Which version of AlertManager are you using?
Are you running several AlertManager instances with clustering?

BlueChips23

Feb 11, 2019, 11:53:51 AM
to Prometheus Users
I'm using AlertManager v0.15.3 and it's the only instance of AM running in my cluster. I only use two receivers: one for the webhook (for which I set a repeat interval of 10 minutes), and a second one for email, which repeats the alerts every 4 hours (even though I set the repeat interval to 52 weeks).

My default -data.retention duration for AM is 120 hours. In one test when I increased the repeat_interval to 100 hours (less than data.retention duration), I was still able to receive email alerts after 100 hours. I haven't tried extending the repeat_interval to 52 weeks yet. 

I wish AlertManager had a config to turn off repeat intervals for some specific receivers. My team has a backend application which receives repeat alerts from the webhook every 10 minutes, but we just want to get the email once immediately when something goes wrong - and not have it repeat every X hours.