Unable to send alert when disk space is running out


Shiuh Rong Yong

May 21, 2019, 4:45:12 AM
to Prometheus Users
Hi everyone,

I have configured rules to monitor disk space, but recently one of my servers reached 100% disk usage and I did not receive anything. My config is below; I hope you can give me some ideas.


alert.rules.yml:

groups:
- name: alert.rules
  rules:
  - alert: EndpointDown
    expr: probe_success == 0
    for: 120s
    labels:
      severity: "critical"
    annotations:
      summary: "Endpoint {{ $labels.instance }} down"
 
  - alert: DiskSpace20%Free
    expr: node_exporter:node_filesystem_free:fs_used_percents >= 80
    labels:
      severity: moderate
    annotations:
      summary: "Instance {{ $labels.instance }} is low on disk space"
      description: "{{ $labels.instance }} has only {{ $value }}% free."


alertmanager.yml:

global:
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'monit...@gmail.com'
#  smtp_require_tls: true
  smtp_auth_username: 'monit...@gmail.com'
  smtp_auth_password: 'password'
  slack_api_url: "slack hook url"
  resolve_timeout: 3m
route:
  group_by: ['instance', 'severity']
  group_wait: 30s
  group_interval: 3m
  repeat_interval: 3m
  receiver: email

receivers:
  - name: 'email'
    email_configs:
    - to: 'us...@email.com'
      send_resolved: true
    slack_configs:
    - channel: "#doropu-infra"
      text: "summary: {{ .CommonAnnotations.summary }}\ndescription: {{ .CommonAnnotations.description }}"
      send_resolved: true



Simon Pasquier

May 21, 2019, 8:23:37 AM
to Shiuh Rong Yong, Prometheus Users
Anything in the Prometheus or AlertManager logs?
Are Prometheus & AlertManager running on the same server?
node_exporter:node_filesystem_free:fs_used_percents looks like a
metric generated from a recording rule. Is this assumption correct?

Shiuh Rong Yong

May 21, 2019, 11:23:02 PM
to Prometheus Users
1. Yes, both Prometheus and Alertmanager are running on the same server. I have checked /var/log/messages; there isn't much there apart from some Grafana errors.
2. Under prometheus, I can see the metrics like this:

node_filesystem_free_bytes{device="lxcfs",fstype="fuse.lxcfs",instance="server-ip address:9100",job="node",mountpoint="/var/lib/lxcfs"} 0
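
As an aside, raw node_exporter metrics like the one above can be turned into a used-percent expression directly. A sketch, assuming both node_filesystem_free_bytes and node_filesystem_size_bytes are exported:

```
100 * (1 - node_filesystem_free_bytes / node_filesystem_size_bytes)
```

Evaluating this in the Prometheus expression browser is a quick way to see the value a disk-usage alert would be comparing against.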

Simon Pasquier

May 22, 2019, 5:39:08 AM
to Shiuh Rong Yong, Prometheus Users
Where does the "node_exporter:node_filesystem_free:fs_used_percents" metric come from?

Shiuh Rong Yong

May 22, 2019, 11:12:08 PM
to Prometheus Users
How can I check that? I copied someone's config online.

Simon Pasquier

May 23, 2019, 5:41:38 AM
to Shiuh Rong Yong, Prometheus Users
If "node_exporter:node_filesystem_free:fs_used_percents" is missing,
this explains why the alert never fired.

benny...@gmail.com

May 23, 2019, 11:39:27 AM
to Prometheus Users
"node_exporter:node_filesystem_free:fs_used_percents" is a recording rule: a new timeserie created after some computation by Prometheus. You might have copied the alert but not the recording rule.


Check your Prometheus config file to see whether it loads any recording rule files. If it does, grep for "node_exporter:node_filesystem_free:fs_used_percents" in your rules folder to see whether you actually have it.
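
A recording rule with that naming convention might look like the following sketch. The exact expression is not shown anywhere in this thread; the metric names node_filesystem_avail_bytes and node_filesystem_size_bytes are assumed from a recent node_exporter:

```yaml
groups:
- name: node_exporter.rules
  rules:
  # Hypothetical recording rule producing the metric the alert expects.
  - record: node_exporter:node_filesystem_free:fs_used_percents
    expr: 100 * (1 - node_filesystem_avail_bytes / node_filesystem_size_bytes)
```

If no rule like this is loaded, the alert expression queries a metric that never exists, so the alert can never fire.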

Shiuh Rong Yong

May 24, 2019, 12:37:50 AM
to Prometheus Users

Did you mean I need to define "node_exporter:node_filesystem_free:fs_used_percents" somewhere in my rules.yml?
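
For context: recording rules live in the same kind of rule file as alerts and are loaded through rule_files in prometheus.yml. A sketch (the file names here are hypothetical):

```yaml
# prometheus.yml (fragment)
rule_files:
  - "alert.rules.yml"      # the existing alert rules
  - "recording.rules.yml"  # would hold the recording rule the alert depends on
```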




Shiuh Rong Yong

May 24, 2019, 12:51:51 AM
to Prometheus Users
Hi all,

It is working now, thanks!
