Prometheus predict_linear query needs to exclude a metric for a certain host

Ishvar B

May 15, 2020, 5:30:22 AM

to Prometheus Users

Hi,

I need help with the issue below.

The query below works as written. But for certain hosts (say example*), I want to exclude the /var/cache/fscache mountpoint from the alert rule. I am a little confused, because my attempts either exclude the metric for all hosts or include it for all hosts.

query: predict_linear(node_filesystem_free_bytes{fstype!~"tmpfs"}[1h], 4 * 3600) < 0 - This covers all metrics for all hosts.

updated query: predict_linear(node_filesystem_free_bytes{fstype!~"tmpfs",job!~"example.*",mountpoint!~"/var/cache/fscache"}[1h], 4 * 3600) < 0 - This excludes both the example* hosts and the mountpoint /var/cache/fscache. I want the rule to alert on all filesystems for all hosts, except that for example* hosts it should not alert for the mountpoint /var/cache/fscache.

Any help is highly appreciated.

Thanks

Eswar

Julius Volz

May 15, 2020, 6:01:29 AM

to Ishvar B, Prometheus Users

There are two options:

- You can set the node_exporter flag --collector.filesystem.ignored-mount-points (default "^/(dev|proc|sys|var/lib/docker/.+)($|/)") on the specific hosts so that they do not even export metrics for that mountpoint, if you never need those metrics at all.

- If that's not an option, you could have static recording rules that create time series with the instance+mountpoint label combinations that you want to exclude, like:

groups:
- name: excludes
  rules:
  - record: excluded_instance_mountpoints
    expr: 1
    labels:
      instance: <my-excluded-instance-name1>
      mountpoint: <my-excluded-mountpoint-name1>

  - record: excluded_instance_mountpoints
    expr: 1
    labels:
      instance: <my-excluded-instance-name2>
      mountpoint: <my-excluded-mountpoint-name2>

...and then you could use those recorded time series to remove alerts for those exceptions:

predict_linear(node_filesystem_free_bytes{fstype!~"tmpfs"}[1h], 4 * 3600) < 0
unless on(instance, mountpoint)
excluded_instance_mountpoints

(you could change "instance" to "job" in the example if indeed your exceptions are grouped by "job" in your case)
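
If the list of exceptions grows, writing one recording rule per pair by hand gets tedious. Below is a minimal Python sketch, not part of the original suggestion, that renders a rule file in the shape shown above from a list of (instance, mountpoint) pairs. The function name `render_exclusion_rules` and the instance names are made up for illustration; label values are quoted with `json.dumps`, relying on JSON strings being valid YAML double-quoted scalars.

```python
import json


def render_exclusion_rules(pairs, record="excluded_instance_mountpoints"):
    """Render a Prometheus rule file with one static series per (instance, mountpoint)."""
    lines = ["groups:", "- name: excludes", "  rules:"]
    for instance, mountpoint in pairs:
        lines += [
            f"  - record: {record}",
            "    expr: 1",  # the sample value is irrelevant, only the labels matter
            "    labels:",
            f"      instance: {json.dumps(instance)}",
            f"      mountpoint: {json.dumps(mountpoint)}",
        ]
    return "\n".join(lines) + "\n"


# Hypothetical exclusions; replace with your own instance label values.
exclusions = [
    ("example1:9100", "/var/cache/fscache"),
    ("example2:9100", "/var/cache/fscache"),
]
print(render_exclusion_rules(exclusions), end="")
```

The output can be written to a file that Prometheus loads via rule_files; checking it with promtool before reloading is a sensible precaution.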

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/253f05a4-5428-4b00-af77-0ca93f2c93c8%40googlegroups.com.

--

Julius Volz

PromLabs - promlabs.com

Brian Candler

May 15, 2020, 6:16:41 AM

to Prometheus Users

predict_linear(node_filesystem_free_bytes{fstype!~"tmpfs",job!~"example.*",mountpoint!~"/var/cache/fscache"}[1h], 4 * 3600) < 0

Label filter conditions are combined using "AND". So this expression will only return metrics where:

* the metric name is "node_filesystem_free_bytes" AND
* the fstype does not match "tmpfs" AND
* the job does not match "example.*" AND
* the mountpoint does not match "/var/cache/fscache" AND
* the predict_linear computed value is less than 0

Therefore, you have excluded /var/cache/fscache from all jobs (not just the example job), and you have excluded job "example.*" entirely.

What you describe needs two rules:

predict_linear(node_filesystem_free_bytes{fstype!~"tmpfs",job=~"example.*",mountpoint!~"/var/cache/fscache"}[1h], 4 * 3600) < 0

predict_linear(node_filesystem_free_bytes{fstype!~"tmpfs",job!~"example.*"}[1h], 4 * 3600) < 0

Aside 1: some of those don't need regexps. You could write fstype!="tmpfs",mountpoint!="/var/cache/fscache"

Aside 2: normally hosts are identified by the "instance" label rather than "job".
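
To make the AND behaviour concrete, here is a small Python sketch that mimics label matching on a handful of made-up series. It is only a model of the selector logic, not how Prometheus is implemented; the series, job names and helper names (`matches`, `select`) are assumptions for illustration. Regex matchers are treated as fully anchored, as they are in PromQL.

```python
import re


def matches(labels, matchers):
    """Return True only if every matcher accepts the series (logical AND)."""
    for name, op, pattern in matchers:
        value = labels.get(name, "")  # a missing label behaves like an empty string
        if op in ("=~", "!~"):
            hit = re.fullmatch(pattern, value) is not None
        else:
            hit = value == pattern
        if op in ("!=", "!~"):
            hit = not hit
        if not hit:
            return False
    return True


# Hypothetical series, reduced to the labels that matter here.
series = [
    {"job": "example1", "mountpoint": "/", "fstype": "ext4"},
    {"job": "example1", "mountpoint": "/var/cache/fscache", "fstype": "ext4"},
    {"job": "web", "mountpoint": "/", "fstype": "ext4"},
    {"job": "web", "mountpoint": "/var/cache/fscache", "fstype": "ext4"},
]


def select(matchers):
    return [(s["job"], s["mountpoint"]) for s in series if matches(s, matchers)]


# The single updated query from the question.
single = [("fstype", "!~", "tmpfs"), ("job", "!~", "example.*"),
          ("mountpoint", "!~", "/var/cache/fscache")]
# The two rules suggested above.
rule_a = [("fstype", "!~", "tmpfs"), ("job", "=~", "example.*"),
          ("mountpoint", "!~", "/var/cache/fscache")]
rule_b = [("fstype", "!~", "tmpfs"), ("job", "!~", "example.*")]

single_result = select(single)
two_rule_result = select(rule_a) + select(rule_b)
print(single_result)
print(two_rule_result)
```

With these sample series, the single selector keeps only the "/" filesystem of the non-example job, while the two rules together keep everything except /var/cache/fscache on the example job.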

Ishvar B

May 15, 2020, 6:50:37 AM

to Prometheus Users

Hi Julius,

Thanks for your reply.

I did not completely understand. I have read about recording rules, but in this context, what should go in place of expr: 1? Should it be

node_filesystem_free_bytes{fstype!~"tmpfs",job!~"example.*",mountpoint!~"/var/cache/fscache"}?

Thanks

Eswar

Julius Volz

May 15, 2020, 8:23:33 AM

to Ishvar B, Prometheus Users

No, the 1 should just stay a 1, as sample value for the recorded series doesn't matter for this use case. For the "unless" set operator purpose, the only thing that matters is that there is a time series on the right-hand side that has the same "instance" and "mountpoint" label values as a series on the left-hand side (which then gets removed from the result by the "unless").
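
As a rough illustration of that matching behaviour, the following Python sketch models "unless on(instance, mountpoint)" over plain dictionaries. It is a simplified model, not Prometheus code; the instance names, sample values and the helper `unless_on` are invented for the example.

```python
def unless_on(left, right, on):
    """Keep left-hand series whose `on` label values match no right-hand series."""
    right_keys = {tuple(labels.get(k, "") for k in on) for labels, _value in right}
    return [
        (labels, value)
        for labels, value in left
        if tuple(labels.get(k, "") for k in on) not in right_keys
    ]


# Hypothetical alert candidates: (labels, predicted free bytes).
left = [
    ({"instance": "example1:9100", "mountpoint": "/var/cache/fscache", "fstype": "ext4"}, -5.0e8),
    ({"instance": "example1:9100", "mountpoint": "/", "fstype": "ext4"}, -1.0e9),
    ({"instance": "web1:9100", "mountpoint": "/var/cache/fscache", "fstype": "ext4"}, -2.0e8),
]


def excluded(value):
    # One static series, as a recording rule with "expr: <value>" would produce.
    return [({"instance": "example1:9100", "mountpoint": "/var/cache/fscache"}, value)]


kept_with_1 = unless_on(left, excluded(1), on=("instance", "mountpoint"))
kept_with_42 = unless_on(left, excluded(42), on=("instance", "mountpoint"))

for labels, value in kept_with_1:
    print(labels["instance"], labels["mountpoint"], value)
```

Both calls keep the same two series: only the label values listed in on(...) are compared, so other labels such as fstype and the recorded sample value play no part in the match.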

Julius Volz

May 15, 2020, 8:23:43 AM

to Ishvar B, Prometheus Users

*as the sample value
