Hello Team,
I'm new to Prometheus; we started adopting it for our infrastructure about two months ago.
We use the push method (Pushgateway) for metric collection, with multiple exporters such as node-exporter, cAdvisor, jmx-exporter, etc.
The metrics are pushed to the Pushgateway with job and instance labels as follows.
Example:
job="node-exporter-dev"; instance="instance-name"
job="node-exporter-prod"; instance="instance-name"
job="cadvisor-dev"; instance="instance-name"
job="cadvisor-prod"; instance="instance-name"
etc.
We are facing an issue while writing alert rules for the above jobs and instances.
We are unable to use a template, so we have to write a separate alert rule for every job and each of its instances. This is hard to maintain because the number of instances grows every day. Below is an example of the alert rule configuration we currently use:
##Alert rule config
---
- alert: HighCPU-Critical
  expr: instance:node_cpu_utilization:ratio{job="node-exporter-test", instance="instance-two", mode="idle"} > 0.90
  for: 1m
  labels:
    severity: critical
  annotations:
    title: CPU use percent is extremely high on {{ $labels.instance }} for the past 10 minutes.
- alert: HighCPU-Critical
  expr: instance:node_cpu_utilization:ratio{job="node-exporter-test", instance="instance-two", mode="idle"} > 0.90
  for: 1m
  labels:
    severity: critical
  annotations:
    title: CPU use percent is extremely high on {{ $labels.instance }} for the past 10 minutes.
------
We have to repeat the same rule for every job and each of its instances. Please let us know whether we can use variables for the job and instance values, perhaps something like this (I'm not sure):
instance=instance1, instance2, instance3 ...
job=node, cadvisor, jmx
so that we can later use them in a single template.
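To illustrate what we are after, here is a rough sketch of a single rule that might replace the per-instance copies. It assumes that a regex label matcher (`=~`) can select all of our node-exporter jobs and that leaving out the `instance` matcher makes the rule fire separately for each instance. It also assumes the pushed `job` and `instance` labels are kept when Prometheus scrapes the Pushgateway (for example with `honor_labels: true` on that scrape job). The group name and the regex are placeholders, not something we have tested.

```yaml
# Sketch only, not a tested configuration.
groups:
  - name: cpu-alerts   # hypothetical group name
    rules:
      - alert: HighCPU-Critical
        # job=~"node-exporter-.*" is an assumed pattern based on the job names above.
        # No instance matcher: the expression is evaluated per series, so every
        # job/instance pair that crosses the threshold raises its own alert.
        expr: instance:node_cpu_utilization:ratio{job=~"node-exporter-.*", mode="idle"} > 0.90
        for: 1m
        labels:
          severity: critical
        annotations:
          # $labels.job and $labels.instance are filled in from the firing series.
          title: CPU use percent is extremely high on {{ $labels.instance }} ({{ $labels.job }}) for the past 10 minutes.
```

If something like this works, new instances would be covered automatically as soon as they push metrics under a matching job name.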
I'm not sure whether this is possible, but I'm hoping there is a solution to my problem.