Simple alert rule logic not working

Unni Sathyarajan

Nov 1, 2017, 9:25:16 AM
to Prometheus Users
Hello Guys, 


The following is the alert rule I have created for my instrumented app:

- alert: LastSectionPullTimeExceeded
  expr: (time() - pull_sectionlists{instance="example.org:8000"}) > 2155
  for: 2m
  labels:
    severity: warning
  annotations:
    impact: 'Last pull time of content exceeded by 20mins'
    summary: Last pull time of contents exceeded by 20mins


pull_sectionlists returns an epoch timestamp, so to get the time elapsed since the last pull I subtract it from time(). This works fine for displaying the metric in Grafana.

But as an alert rule, the Prometheus expression does not work.

The value of (time() - pull_sectionlists{instance="example.org:8000"}) has gone way above 2155, but the alert is still not triggered and Prometheus does not show it in the FIRING state.
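For reference, 2155 seconds is just under 36 minutes, and with `for: 2m` the condition has to keep holding for a further two minutes before the alert moves from pending to firing. The sketch below is a simplified Python model of that behaviour, not Prometheus's actual code. The function name `alert_states` and the sample ages are made up for illustration, and the 15-second step assumes the evaluation_interval given in the configuration shared in this thread.

```python
def alert_states(ages, threshold=2155, for_seconds=120, interval=15):
    """Return the alert state at each evaluation, given the age in seconds."""
    states = []
    active_since = None  # evaluation time at which the condition first held
    for i, age in enumerate(ages):
        now = i * interval
        if age > threshold:
            if active_since is None:
                active_since = now
            held = now - active_since
            # The alert fires once the condition has held for the full `for` duration.
            states.append("firing" if held >= for_seconds else "pending")
        else:
            # Any evaluation below the threshold resets the pending timer.
            active_since = None
            states.append("inactive")
    return states


# Hypothetical data: the pull timestamp stops advancing, so the age grows
# by 15 seconds per evaluation.
ages = [2140 + 15 * i for i in range(12)]
states = alert_states(ages)
print(states)
```

In this model the alert stays inactive while the age is at or below 2155, is pending for eight evaluations, and fires on the evaluation 120 seconds after the condition first held.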

Any idea why the expression does not work?


Thanks 

Brian Brazil

Nov 1, 2017, 9:31:02 AM
to Unni Sathyarajan, Prometheus Users
What version of Prometheus are you running, and how often are you reloading the config? This might be a bug we're just fixing.
 


--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-users+unsubscribe@googlegroups.com.
To post to this group, send email to prometheus-users@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/5c488f47-dcfc-4fc6-b15d-bcf2825d14db%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



Unni Sathyarajan

Nov 1, 2017, 9:34:39 AM
to Brian Brazil, Prometheus Users
Hi Brian, 


I am running prometheus 2.0 

prometheus, version 2.0.0-rc.1 (branch: HEAD, revision: 5ab8834befbd92241a88976c790ace7543edcd59)
  build user:       root@1f56dd8b6f7b
  build date:       20171017-12:34:15
  go version:       go1.9.1


Prometheus.yml:  
global:
  scrape_interval:     15s 
  evaluation_interval: 15s 


Thanks

Brian Brazil

Nov 1, 2017, 9:36:04 AM
to Unni Sathyarajan, Prometheus Users
How often are you reloading the config file?

Brian

Unni Sathyarajan

Nov 1, 2017, 9:42:24 AM
to Brian Brazil, Prometheus Users

Sorry, where can I find this info? 

I execute the following command to reload my config file  (prometheus.yml) in Prometheus:


I also check that the rule updates appear on the http://prometheus.example.org/alerts page.

And this is how I start prometheus: 

./prometheus --storage.tsdb.retention=15d --web.enable-lifecycle --config.file=prometheus.yml
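With --web.enable-lifecycle set as in the start command above, Prometheus 2.0 accepts an HTTP POST to /-/reload. The following is a minimal Python sketch of such a reload call. It is not necessarily the command used in this post; the base URL `http://localhost:9090` and the function names are assumptions for illustration.

```python
import urllib.request


def build_reload_request(base_url="http://localhost:9090"):
    """Build a POST request for the /-/reload lifecycle endpoint."""
    url = base_url.rstrip("/") + "/-/reload"
    return urllib.request.Request(url, data=b"", method="POST")


def reload_config(base_url="http://localhost:9090"):
    """Ask Prometheus to reload its configuration; returns the HTTP status."""
    with urllib.request.urlopen(build_reload_request(base_url), timeout=10) as resp:
        return resp.status
```

Calling `reload_config()` only works against a running server that was started with the lifecycle flag.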





Brian Brazil

Nov 1, 2017, 9:47:02 AM
to Unni Sathyarajan, Prometheus Users
On 1 November 2017 at 13:41, Unni Sathyarajan <unnisa...@gmail.com> wrote:

Sorry, where can I find this info? 

changes(prometheus_config_last_reload_success_timestamp_seconds[5m]) should tell you.
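For context, changes() counts how many times a series' value changed within the given range, so applied to the last-successful-reload timestamp over [5m] it gives the number of successful reloads in the past five minutes. A rough Python sketch of that counting follows; the helper name and the sample values are made up for illustration.

```python
def changes(samples):
    """Count how many times consecutive sample values differ."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if cur != prev)


# Hypothetical samples: the reload timestamp stays constant for 5 minutes
# at a 15-second scrape interval, so no reload happened in the window.
no_reload = [1509540000.0] * 20
print(changes(no_reload))

# Hypothetical samples: the timestamp moves twice, so two reloads happened.
two_reloads = [1509540000.0, 1509540000.0, 1509540100.0, 1509540100.0, 1509540200.0]
print(changes(two_reloads))
```

A result of 0 means the configuration was not reloaded during the window.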

Brian



Unni Sathyarajan

Nov 1, 2017, 9:50:43 AM
to Brian Brazil, Prometheus Users
changes(prometheus_config_last_reload_success_timestamp_seconds[5m]) = 0 

Brian Brazil

Nov 1, 2017, 9:53:56 AM
to Unni Sathyarajan, Prometheus Users
What do you see if you graph (time()- pull_sectionlists{instance="example.org:8000"}) > 2155  ?

Brian

Unni Sathyarajan

Nov 1, 2017, 10:01:27 AM
to Brian Brazil, Prometheus Users
Inline image 1

Brian Brazil

Nov 1, 2017, 10:06:43 AM
to Unni Sathyarajan, Prometheus Users
Can you share your full configuration?

Unni Sathyarajan

Nov 1, 2017, 10:23:39 AM
to Brian Brazil, Prometheus Users
Hi Brian, 

Sure, please find the configuration details below:

FILE:prometheus.yml
global:
  scrape_interval:     15s 
  evaluation_interval: 15s 
  external_labels:
      monitor: 'codelab-monitor'
 
rule_files:
  - "./alert-rules/dockerhost.yml"
  - "./alert-rules/pub1.yml"
 
alerting:
  alertmanagers:
    - static_configs:
      - targets: ["localhost:9093"]

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

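  - job_name: 'pub1'
    static_configs:
      # The targets line is missing from the post as archived; this value is
      # inferred from the instance label used in the alert rule below.
      - targets: ['pub1.example.net:8000']
        labels:
         name: 'pub1.example.net'
         env: 'prod'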

  - job_name: 'dockerhost'
    static_configs:
      - targets: ['1.2.3.4:4194','1.2.3.4:9100','1.2.3.4:9121']
        labels:
         name: 'dockerhost'
         env: 'test'                                                                                                                                

 
FILE:./alert-rules/pub1.yml
- name: alert.rules
  rules:
  - alert: LastSectionPullTimeExceeded
    expr: round(time()- pull_sectionlists{name="pub1.example.net",instance="pub1.example.net:8000"}) > 2155
    for: 2m
    labels:
      severity: warning
    annotations:
      impact: 'Last pull time content  exceeded by 20mins'
      summary: Last pull time content  exceeded by 20mins

 
 
FILE: alertmanager.yml
global:
templates: 
- '/etc/alertmanager/template/*.tmpl'
route:
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h 
  receiver: first-level-notifiers 
  routes:
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['alertname', 'cluster', 'service']


receivers:
- name: 'first-level-notifiers'
  email_configs:
    send_resolved: true
    smarthost: 'ses-endpoint:25'
    auth_username: 'abc'
    auth_password: 'abc'
  - to: 'a...@example.net'
    send_resolved: true
    smarthost: 'ses-endpoint:25'
    auth_username: 'abc'
    auth_password: 'abc124'
  slack_configs:
  - api_url: https://abc.1234./
    send_resolved: true
    channel: '#prometheus-alerts'



I have set up alerts for other server metrics such as CPU, disk, and network I/O, and those work fine.


Brian Brazil

Nov 1, 2017, 10:37:58 AM
to Unni Sathyarajan, Prometheus Users
That all looks okay. Can you share the Alerts page on Prometheus?

Brian

Unni Sathyarajan

Nov 1, 2017, 10:45:11 AM
to Brian Brazil, Prometheus Users

Brian Brazil

Nov 1, 2017, 10:48:36 AM
to Unni Sathyarajan, Prometheus Users
This is smelling like a bug. Does this also happen with Prometheus 1.8.1?

Brian

Unni Sathyarajan

Nov 1, 2017, 10:49:46 AM
to Brian Brazil, Prometheus Users
I have not tried it yet. 

Unni Sathyarajan

Nov 1, 2017, 11:07:21 AM
to Brian Brazil, Prometheus Users
Is it possible to convert the alert rules from Prometheus 2.0 to Prometheus 1.8 compatible syntax using promtool?

Brian Brazil

Nov 1, 2017, 11:15:03 AM
to Unni Sathyarajan, Prometheus Users
On 1 November 2017 at 15:06, Unni Sathyarajan <unnisa...@gmail.com> wrote:
Is it possible to convert the alert rules from Prometheus 2.0 to Prometheus 1.8 compatible syntax using promtool?

No, it only goes the other direction. But for one rule it shouldn't be difficult to convert by hand.

The conversion is intended more to help those who have too many rules to convert by hand.
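As an illustration of what converting one rule by hand involves, here is a small Python sketch that renders a 2.0-style rule in the 1.x ALERT / IF / FOR / LABELS / ANNOTATIONS syntax. The helper `to_v1_rule` is hypothetical and is not part of promtool; it does not escape quotes inside label or annotation values.

```python
def to_v1_rule(rule):
    """Render a Prometheus 2.0 alerting rule (as a dict) in the 1.x rule syntax."""

    def label_set(pairs):
        # 1.x label sets are written as: key = "value", key = "value"
        return ", ".join(f'{key} = "{value}"' for key, value in pairs.items())

    lines = [f"ALERT {rule['alert']}", f"  IF {rule['expr']}"]
    if "for" in rule:
        lines.append(f"  FOR {rule['for']}")
    if rule.get("labels"):
        lines.append(f"  LABELS {{ {label_set(rule['labels'])} }}")
    if rule.get("annotations"):
        lines.append(f"  ANNOTATIONS {{ {label_set(rule['annotations'])} }}")
    return "\n".join(lines)


# The rule from this thread, written out as a Python dict.
rule = {
    "alert": "LastSectionPullTimeExceeded",
    "expr": 'round(time() - pull_sectionlists{name="pub1.example.net",'
            'instance="pub1.example.net:8000"}) > 2155',
    "for": "2m",
    "labels": {"severity": "warning"},
    "annotations": {"summary": "Last pull time content exceeded by 20mins"},
}
v1_text = to_v1_rule(rule)
print(v1_text)
```

The printed text would go in a 1.x rule file, which is plain text rather than YAML.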

Brian



Unni Sathyarajan

Nov 1, 2017, 11:25:10 AM
to Brian Brazil, Prometheus Users
I have tried with Prometheus 1.8.0 and see the same issue. The alert is not getting triggered. :-(

Brian Brazil

Nov 1, 2017, 11:28:17 AM
to Unni Sathyarajan, Prometheus Users
On 1 November 2017 at 15:24, Unni Sathyarajan <unnisa...@gmail.com> wrote:
I have tried with Prometheus 1.8.0 and see the same issue. The alert is not getting triggered. :-(

This is probably an issue on your end then. Can you share the results of pull_sectionlists[10m] in the expression browser table view?



Unni Sathyarajan

Nov 1, 2017, 11:31:59 AM
to Brian Brazil, Prometheus Users
Inline image 1

Brian Brazil

Nov 1, 2017, 11:35:00 AM
to Unni Sathyarajan, Prometheus Users
Thanks, no oddities in the raw data.

How about: (time()- pull_sectionlists) > 2155

Brian

Unni Sathyarajan

Nov 1, 2017, 11:40:47 AM
to Brian Brazil, Prometheus Users
Inline image 1

Brian Brazil

Nov 1, 2017, 11:47:28 AM
to Unni Sathyarajan, Prometheus Users
Hmm, how about: ALERTS

That's a special time series used to record alerts.
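Each ALERTS sample carries the alert's labels plus alertname and alertstate (pending or firing), with a value of 1 while the alert is active. The Python sketch below groups alert names by state from a query result. The sample response is hand-written in the shape returned by the /api/v1/query endpoint and is not output from this thread; the function name is made up for illustration.

```python
import json

# Hand-written sample in the shape returned by /api/v1/query?query=ALERTS.
SAMPLE_RESPONSE = """
{"status": "success",
 "data": {"resultType": "vector",
          "result": [
            {"metric": {"__name__": "ALERTS",
                        "alertname": "LastSectionPullTimeExceeded",
                        "alertstate": "pending",
                        "severity": "warning"},
             "value": [1509551248, "1"]}
          ]}}
"""


def alerts_by_state(response_text):
    """Group alert names by their alertstate label."""
    payload = json.loads(response_text)
    grouped = {}
    for sample in payload["data"]["result"]:
        labels = sample["metric"]
        grouped.setdefault(labels["alertstate"], []).append(labels["alertname"])
    return grouped


print(alerts_by_state(SAMPLE_RESPONSE))
```

An empty result list means no alert is currently pending or firing.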

Brian

Unni Sathyarajan

Nov 1, 2017, 11:53:22 AM
to Brian Brazil, Prometheus Users
Inline image 1

Becoming an image editing expert now 😂

Brian Brazil

Nov 1, 2017, 11:55:50 AM
to Unni Sathyarajan, Prometheus Users
I meant what does the expression "ALERTS" show in the expression browser?

If you have any recording rules, are their time series getting updated?

Brian

Unni Sathyarajan

Nov 1, 2017, 12:08:19 PM
to Brian Brazil, Prometheus Users
I get a "No Datapoints Found" error!

Unni Sathyarajan

Nov 1, 2017, 12:10:02 PM
to Brian Brazil, Prometheus Users
Brian, 

Could you please share a link where I can read about the "special time series for alerts"?

Brian Brazil

Nov 1, 2017, 12:20:57 PM
to Unni Sathyarajan, Prometheus Users
On 1 November 2017 at 16:09, Unni Sathyarajan <unnisa...@gmail.com> wrote:
Brian, 

Could you please share a link where I can read about the "special time series for alerts"?



My best guess is that recording rules aren't being evaluated. Does a restart of Prometheus fix this?



Unni Sathyarajan

Nov 2, 2017, 12:50:57 AM
to Brian Brazil, Prometheus Users
Yes Brian, I have tried restarting it. It's still the same.

Brian Brazil

Nov 3, 2017, 2:18:47 AM
to Unni Sathyarajan, Prometheus Users
Is the prometheus_rule_evaluation_duration_seconds_count time series increasing?

Brian
