Unable to send alert when disk space is running out


Shiuh Rong Yong

May 21, 2019, 4:45:12 AM
to Prometheus Users
Hi everyone,

I have configured rules to monitor disk space, but recently one of my servers reached 100% disk usage and I did not receive anything. My config is below; I hope you can give me some ideas.


alert.rules.yml:

groups:
- name: alert.rules
  rules:
  - alert: EndpointDown
    expr: probe_success == 0
    for: 120s
    labels:
      severity: "critical"
    annotations:
      summary: "Endpoint {{ $labels.instance }} down"
 
  - alert: DiskSpace20%Free
    expr: node_exporter:node_filesystem_free:fs_used_percents >= 80
    labels:
      severity: moderate
    annotations:
      summary: "Instance {{ $labels.instance }} is low on disk space"
      description: "{{ $labels.instance }} has only {{ $value }}% free."


alertmanager.yml:

global:
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'monit...@gmail.com'
#  smtp_require_tls: true
  smtp_auth_username: 'monit...@gmail.com'
  smtp_auth_password: 'password'
  slack_api_url: "slack hook url"
  resolve_timeout: 3m
route:
  group_by: ['instance', 'severity']
  group_wait: 30s
  group_interval: 3m
  repeat_interval: 3m
  receiver: email

receivers:
  - name: 'email'
    email_configs:
    - to: 'us...@email.com'
      send_resolved: true
    slack_configs:
    - channel: "#doropu-infra"
      text: "summary: {{ .CommonAnnotations.summary }}\ndescription: {{ .CommonAnnotations.description }}"
      send_resolved: true



Simon Pasquier

May 21, 2019, 8:23:37 AM
to Shiuh Rong Yong, Prometheus Users
Anything in the Prometheus or AlertManager logs?
Are Prometheus & AlertManager running on the same server?
node_exporter:node_filesystem_free:fs_used_percents looks like a
metric generated from a recording rule. Is this assumption correct?

Shiuh Rong Yong

May 21, 2019, 11:23:02 PM
to Prometheus Users
1. Yes, both Prometheus and Alertmanager are running on the same server. I have checked /var/log/messages; there isn't much there apart from some Grafana errors.
2. Under prometheus, I can see the metrics like this:

node_filesystem_free_bytes{device="lxcfs",fstype="fuse.lxcfs",instance="server-ip address:9100",job="node",mountpoint="/var/lib/lxcfs"} 0
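
As an aside, raw node_exporter metrics like the one above can be turned into a used-percent expression directly. A sketch, assuming both node_filesystem_free_bytes and node_filesystem_size_bytes are exported:

```
100 * (1 - node_filesystem_free_bytes / node_filesystem_size_bytes)
```

Evaluating this in the Prometheus expression browser is a quick way to see the value a disk-usage alert would be comparing against.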

Simon Pasquier

May 22, 2019, 5:39:08 AM
to Shiuh Rong Yong, Prometheus Users
Where does the "node_exporter:node_filesystem_free:fs_used_percents" metric come from?

Shiuh Rong Yong

May 22, 2019, 11:12:08 PM
to Prometheus Users
How can I check that? I copied someone's config online.

Simon Pasquier

May 23, 2019, 5:41:38 AM
to Shiuh Rong Yong, Prometheus Users
If "node_exporter:node_filesystem_free:fs_used_percents" is missing,
this explains why the alert never fired.

benny...@gmail.com

May 23, 2019, 11:39:27 AM
to Prometheus Users
"node_exporter:node_filesystem_free:fs_used_percents" is a recording rule: a new timeserie created after some computation by Prometheus. You might have copied the alert but not the recording rule.


Check your Prometheus config file to see whether it loads any recording rule files. If it does, grep for "node_exporter:node_filesystem_free:fs_used_percents" in your rules folder to see whether you actually have it.
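
A recording rule with that naming convention might look like the following sketch. The exact expression is not shown anywhere in this thread; the metric names node_filesystem_avail_bytes and node_filesystem_size_bytes are assumed from a recent node_exporter:

```yaml
groups:
- name: node_exporter.rules
  rules:
  # Hypothetical recording rule producing the metric the alert expects.
  - record: node_exporter:node_filesystem_free:fs_used_percents
    expr: 100 * (1 - node_filesystem_avail_bytes / node_filesystem_size_bytes)
```

If no rule like this is loaded, the alert expression queries a metric that never exists, so the alert can never fire.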

Shiuh Rong Yong

May 24, 2019, 12:37:50 AM
to Prometheus Users

Did you mean I need to define "node_exporter:node_filesystem_free:fs_used_percents" somewhere in my rules.yml?
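
For context: recording rules live in the same kind of rule file as alerts and are loaded through rule_files in prometheus.yml. A sketch (the file names here are hypothetical):

```yaml
# prometheus.yml (fragment)
rule_files:
  - "alert.rules.yml"      # the existing alert rules
  - "recording.rules.yml"  # would hold the recording rule the alert depends on
```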




Shiuh Rong Yong

May 24, 2019, 12:51:51 AM
to Prometheus Users
Hi all,

It is working now, thanks!
