Prometheus predict_linear query needs to exclude a metric for a certain host


Ishvar B

May 15, 2020, 5:30:22 AM
to Prometheus Users
Hi,

I need help with the following issue.

I have the query below, which works as defined. However, for certain hosts (say example*), I want to exclude the /var/cache/fscache mountpoint from the alert rule. I am a little confused, because I can either exclude the metric for all hosts or collect it for all hosts.

Query (this covers all metrics for all the hosts):

predict_linear(node_filesystem_free_bytes{fstype!~"tmpfs"}[1h], 4 * 3600) < 0

Updated query (this excludes the example* hosts entirely, and the /var/cache/fscache mountpoint on every host):

predict_linear(node_filesystem_free_bytes{fstype!~"tmpfs",job!~"example.*",mountpoint!~"/var/cache/fscache"}[1h], 4 * 3600) < 0

What I want is to alert on all the metrics for all the hosts, except that for the example* hosts it should not alert on the /var/cache/fscache mountpoint.

Any help is highly appreciated.

Thanks
Eswar

Julius Volz

May 15, 2020, 6:01:29 AM
to Ishvar B, Prometheus Users
There are two options:

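- You can set the node_exporter flag --collector.filesystem.ignored-mount-points (default "^/(dev|proc|sys|var/lib/docker/.+)($|/)") on the specific hosts so that they don't export metrics about that mountpoint at all, if you never need those metrics. Setting the flag replaces the default, so keep the default patterns in your regex as well.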

- If that's not an option, you could have static recording rules that create time series with the instance+mountpoint label combinations that you want to exclude, like:

groups:
- name: excludes
  rules:
  - record: excluded_instance_mountpoints
    expr: 1
    labels:
      instance: <my-excluded-instance-name1>
      mountpoint: <my-excluded-mountpoint-name1>
  - record: excluded_instance_mountpoints
    expr: 1
    labels:
      instance: <my-excluded-instance-name2>
      mountpoint: <my-excluded-mountpoint-name2>

...and then you could use those recorded time series to remove alerts for those exceptions:

      predict_linear(node_filesystem_free_bytes{fstype!~"tmpfs"}[1h], 4 * 3600) < 0
  unless on(instance, mountpoint)
      excluded_instance_mountpoints

(you could change "instance" to "job" in the example if indeed your exceptions are grouped by "job" in your case)
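When the exceptions can be described with a label matcher (like the job=~"example.*" pattern in the question), the right-hand side of "unless" does not have to be a recorded series. A plain selector on node_filesystem_free_bytes works the same way, because "unless" only compares label values. Below is a minimal sketch of that variant as an alerting rule, assuming the excluded hosts are identified by the job label as in the question. The group name, alert name and "for" duration are hypothetical placeholders.

```yaml
groups:
- name: filesystem-alerts                 # hypothetical group name
  rules:
  - alert: FilesystemPredictedFull        # hypothetical alert name
    # Left side: every non-tmpfs filesystem predicted to run out of space within 4h.
    # Right side: only the /var/cache/fscache series of the example.* jobs.
    # "unless on(instance, mountpoint)" drops every left-side series whose
    # instance+mountpoint pair also appears on the right side.
    # Comparison operators bind tighter than "unless", so "< 0" applies first.
    expr: |
      predict_linear(node_filesystem_free_bytes{fstype!="tmpfs"}[1h], 4 * 3600) < 0
        unless on(instance, mountpoint)
      node_filesystem_free_bytes{job=~"example.*",mountpoint="/var/cache/fscache"}
    for: 10m                              # hypothetical; tune as needed
```

This avoids maintaining one static recording rule per excluded host, at the cost of tying the exclusion to the job naming scheme.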



--
Julius Volz
PromLabs - promlabs.com

Brian Candler

May 15, 2020, 6:16:41 AM
to Prometheus Users
predict_linear(node_filesystem_free_bytes{fstype!~"tmpfs",job!~"example.*",mountpoint!~"/var/cache/fscache"}[1h], 4 * 3600) < 0

Label filter conditions are combined using "AND".  So this expression will only return metrics where:

* the metric name is "node_filesystem_free_bytes" AND
* the fstype does not match "tmpfs" AND
* the job does not match "example.*" AND
* the mountpoint does not match "/var/cache/fscache" AND
* the predict_linear computed value is less than 0

Therefore, you have excluded /var/cache/fscache from all jobs (not just the example job), and you have excluded job "example.*" entirely.

What you describe needs two rules:

predict_linear(node_filesystem_free_bytes{fstype!~"tmpfs",job=~"example.*",mountpoint!~"/var/cache/fscache"}[1h], 4 * 3600) < 0
predict_linear(node_filesystem_free_bytes{fstype!~"tmpfs",job!~"example.*"}[1h], 4 * 3600) < 0

Aside 1: some of those don't need regexps. You could write fstype!="tmpfs",mountpoint!="/var/cache/fscache".
Aside 2: normally hosts are identified by the "instance" label rather than "job".
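The two expressions above can also live in a single alerting rule by joining them with the "or" set operator, which returns the union of both result sets; since one operand matches job=~"example.*" and the other job!~"example.*", the two sets never overlap. A minimal sketch, using plain equality matchers where no regexp is needed (per Aside 1). The group name, alert name and "for" duration are hypothetical placeholders.

```yaml
groups:
- name: filesystem-alerts                 # hypothetical group name
  rules:
  - alert: FilesystemPredictedFull        # hypothetical alert name
    # First operand: example.* jobs, every filesystem except /var/cache/fscache.
    # Second operand: all other jobs, every non-tmpfs filesystem.
    # "or" has the lowest precedence, so each "< 0" is applied before the union.
    expr: |
      predict_linear(node_filesystem_free_bytes{fstype!="tmpfs",job=~"example.*",mountpoint!="/var/cache/fscache"}[1h], 4 * 3600) < 0
      or
      predict_linear(node_filesystem_free_bytes{fstype!="tmpfs",job!~"example.*"}[1h], 4 * 3600) < 0
    for: 10m                              # hypothetical; tune as needed
```

Keeping them as two separate rules works just as well; the single-rule form simply keeps the alert definition in one place.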

Ishvar B

May 15, 2020, 6:50:37 AM
to Prometheus Users
Hi Julius,

Thanks for your reply. 

I don't completely understand. I have read about recording rules, but in this context what should go in place of expr: 1? Should that be
node_filesystem_free_bytes{fstype!~"tmpfs",job!~"example.*",mountpoint!~"/var/cache/fscache"}?

Thanks
Eswar


Julius Volz

May 15, 2020, 8:23:33 AM
to Ishvar B, Prometheus Users
No, the 1 should just stay a 1, as sample value for the recorded series doesn't matter for this use case. For the "unless" set operator purpose, the only thing that matters is that there is a time series on the right-hand side that has the same "instance" and "mountpoint" label values as a series on the left-hand side (which then gets removed from the result by the "unless").
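Applied to the case in the question, a filled-in exclusion rule might look like the sketch below. The instance value is a hypothetical placeholder (instance labels typically take the form host:port, so use whatever your targets actually report); the expr stays 1, and only the labels identify which instance+mountpoint pair the "unless" should drop.

```yaml
groups:
- name: excludes
  rules:
  # The value 1 is arbitrary: "unless on(instance, mountpoint)" only looks at
  # the instance and mountpoint labels of this series, never at its value.
  - record: excluded_instance_mountpoints
    expr: 1
    labels:
      instance: "example-host-1:9100"     # hypothetical instance name
      mountpoint: /var/cache/fscache
```

Add one such rule per excluded host.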


Julius Volz

May 15, 2020, 8:23:43 AM
to Ishvar B, Prometheus Users
*as the sample value