alert label with variable for pushgateway metrics


Aravind Poojari

Aug 10, 2020, 1:06:49 AM
to Prometheus Users

Hello Team,

I'm new to Prometheus, and we started adopting it for our infrastructure about two months ago.
We are using the push method (Pushgateway) for metric collection, with multiple exporters such as node-exporter, cadvisor, and jmx-exporter.
The metrics are pushed to Pushgateway as follows:

Example:
job="node-exporter-dev";    instance="instance-name"
job="node-exporter-prod";    instance="instance-name"
job="cadvisor-dev";    instance="instance-name"
job="cadvisor-prod";    instance="instance-name"
etc.

We are facing an issue while writing alert rules for the above jobs and instances.
We are unable to use a template, so we have to write the alert rules for each job and each of its instances. This is hard to maintain because the number of instances grows every day. The following is an example of the alert rule configuration we are using:

##Alert rule config
---
    - alert: HighCPU-Critical
      expr: instance:node_cpu_utilization:ratio{job="node-exporter-test", instance="instance-two",  mode="idle"} > 0.90
      for: 1m
      labels:
        severity: critical
      annotations:
        title: CPU use percent is extremely high on {{ $labels.instance }} for the past 10 minutes.

    - alert: HighCPU-Critical
      expr: instance:node_cpu_utilization:ratio{job="node-exporter-test", instance="instance-two",  mode="idle"} > 0.90
      for: 1m
      labels:
        severity: critical
      annotations:
        title: CPU use percent is extremely high on {{ $labels.instance }} for the past 10 minutes.
------
We have to repeat the same for every job and each of its instances. Please let us know if we can use variables for all job and instance values, maybe something like this (I'm not sure):

instance=instance1, instance2, instance3 ...
job=node, cadvisor, jmx

so that we can later use them in a single template.

I'm not really sure whether this is possible, but I'm hoping for a solution to my problem.

Christian Hoffmann

Aug 10, 2020, 1:55:07 AM
to Aravind Poojari, Prometheus Users
Hi,
What's your reason for repeating the rules for each server?

Prometheus does not have any concept of objects or servers (in contrast
to other monitoring systems). This means that you can design your
queries rather freely.

To be more specific: If you want to monitor all configured targets, just
drop the instance= label. If you want to continue explicitly listing
your targets there, you can still simplify by using regular expressions
(instance=~"instance1|instance2|instance3"). The same is true for
recording rules.
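
As a rough illustration of both suggestions (an editorial sketch, not part of the original reply): the first rule below drops the instance matcher and selects the job by regular expression, so a single rule covers every instance; the second keeps an explicit instance list. The group name, the second alert name, and the job pattern `node-exporter-.+` are assumptions. The recording rule name, the `mode="idle"` matcher, and the threshold are copied unchanged from the question.

```yaml
groups:
  - name: cpu-alerts                      # hypothetical group name
    rules:
      # Variant 1: no instance matcher. The expression is evaluated for
      # every matching series, and one alert fires per distinct label set.
      - alert: HighCPU-Critical
        expr: instance:node_cpu_utilization:ratio{job=~"node-exporter-.+", mode="idle"} > 0.90
        for: 1m
        labels:
          severity: critical
        annotations:
          title: CPU use is extremely high on {{ $labels.instance }} (job {{ $labels.job }}).

      # Variant 2: explicit list of instances via a regex matcher.
      - alert: HighCPU-Critical-Listed    # hypothetical alert name
        expr: instance:node_cpu_utilization:ratio{instance=~"instance1|instance2|instance3", mode="idle"} > 0.90
        for: 1m
        labels:
          severity: critical
        annotations:
          title: CPU use is extremely high on {{ $labels.instance }}.
```

Regex matchers in PromQL are fully anchored, so `node-exporter-.+` matches `node-exporter-dev` and `node-exporter-prod` but not a job that merely contains that text in the middle of its name.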

Hope this helps.

Christian

Aravind Poojari

Aug 10, 2020, 2:04:14 AM
to Prometheus Users
Thanks, Christian.
It seems your answer would solve our current issue.

The reason for repeating the rules is that we are using metrics from Pushgateway, and if we write a template as follows, it does not return the job or instance values:

       expr: avg without (cpu) (sum without (mode) (1 - rate(node_cpu_seconds_total{mode="idle"}))) > 0.90

I will test the format you mentioned and write back.
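
Editorial note: as written, the expression in this message would not parse, because `rate()` requires a range vector. A minimal sketch with a range selector added follows; the `5m` window and the group name are assumptions, not values from the thread, and the rest of the expression is kept as posted.

```yaml
groups:
  - name: cpu-alerts                      # hypothetical group name
    rules:
      - alert: HighCPU-Critical
        # [5m] is an assumed window; rate() needs a range vector to parse.
        # "without (...)" removes only the listed labels, so job and
        # instance stay on the result and are available as $labels.
        expr: avg without (cpu) (sum without (mode) (1 - rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 0.90
        for: 1m
        labels:
          severity: critical
        annotations:
          title: CPU use is extremely high on {{ $labels.instance }} (job {{ $labels.job }}).
```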

Brian Candler

Aug 10, 2020, 2:39:24 AM
to Prometheus Users
On Monday, 10 August 2020 06:06:49 UTC+1, Aravind Poojari wrote:

I'm new to Prometheus, and we started adopting it for our infrastructure about two months ago.
We are using the push method (Pushgateway) for metric collection, with multiple exporters such as node-exporter, cadvisor, and jmx-exporter.

Are you really using pushgateway?  Do you mean PushProx?

Aravind Poojari

Aug 10, 2020, 3:16:23 AM
to Prometheus Users
Yes We are suing prometheus pushgateway.

Aravind Poojari

Aug 10, 2020, 3:17:56 AM
to Prometheus Users
Typo correction: We are using pushgateway.
Let us know if pushprox has any benefit over pushgateway.

Brian Candler

Aug 10, 2020, 11:28:22 AM
to Prometheus Users
pushgateway is designed for a completely different use case: a short-lived process which needs to export its result, such as a batch job run from cron.  The metric remains for collection after the process has terminated.
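
For reference, Prometheus scrapes a Pushgateway like any other target. The sketch below is an editorial illustration, not something posted in the thread: the host name is hypothetical, and 9091 is the Pushgateway's default port. Setting `honor_labels: true` keeps the `job` and `instance` labels that were pushed; without it, Prometheus renames conflicting pushed labels to `exported_job` and `exported_instance`, which is one possible (unconfirmed) reason for pushed job or instance values not appearing where expected.

```yaml
scrape_configs:
  - job_name: pushgateway
    # Keep the job/instance labels attached to the pushed metrics instead
    # of overwriting them with this scrape job's own labels.
    honor_labels: true
    static_configs:
      - targets:
          - pushgateway.example.org:9091   # hypothetical host, default port
```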

It's not even obvious to me how you've managed to combine node_exporter with pushgateway.

It's best if you can get prometheus to poll ("scrape") node_exporter directly.  If it cannot do this, e.g. because node_exporter is sitting behind a NAT firewall and you are not allowed to configure port-forwarding, then pushprox will allow you to perform a "meet in the middle".
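
A minimal sketch of the direct-scrape setup described above, assuming the hosts are reachable from the Prometheus server. The host names are placeholders (`instance-two` is borrowed from the question, `instance-one` is invented), the `job_name` reuses a job name from the question, and 9100 is node_exporter's default port.

```yaml
scrape_configs:
  - job_name: node-exporter-dev
    static_configs:
      - targets:
          - instance-one:9100   # hypothetical host
          - instance-two:9100   # host name taken from the question
```

With this layout Prometheus sets the `job` and `instance` labels itself (by default `instance` is the target address, e.g. `instance-two:9100`), so a single alert rule without an instance matcher covers every target listed here.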