Is there any reason why reusable templates (the {{define "myTemplate"}} sort from .lib files) cannot be referenced from alert rules? Would they significantly affect memory usage or processing time, or is this simply not considered useful enough?
Since Prometheus doesn't support rule templates (where you could e.g. just plug in the job name and alerting threshold instead of copy-pasting the entire rule), I'm trying to at least keep down the verbosity (and improve readability) by having the templating in the description be reasonably concise.
For example, I have an alerting rule that fires when fewer than a certain fraction of my job's instances are up. Its description issues a query to list all affected instances (since the rule expression has already aggregated them away):
job:up:sum = sum by (job)(up{environment="production"})
job:up:count = count by (job)(up{environment="production"})
job:up:ratio =
  job:up:sum
  / on (job)
  job:up:count
ALERT TasksMissing
  IF job:up:ratio < 0.9
  FOR 1m
  LABELS { severity="warning" }
  ANNOTATIONS {
    summary = "Tasks missing from {{ $labels.job }}",
    description = "Tasks missing from {{ $labels.job }}: {{ range printf `up{job=\"%s\"}==0` $labels.job | query }} {{ .Labels.instance }} {{ end }}",
}
This is mostly OK as long as I only have one alert. But as soon as I set up a second/third/etc. similar alert (e.g. with a higher severity for a lower fraction of instances running, or for a different job) it becomes very verbose. I would prefer to have a .lib file with a definition
{{ define "enumerateInstancesForJob" }}
{{ $job := .arg0 }}
{{ $query := (or .arg1 `up{job="%s"}`) }}
{{ range $job | printf $query | query }} {{ .Labels.instance }}, {{ end }}
{{ end }}
and reduce the alert definition to
ALERT TasksMissing
  IF job:up:ratio < 0.9
  FOR 1m
  LABELS { severity="warning" }
  ANNOTATIONS {
    summary = "Tasks missing from {{ $labels.job }}",
    description = "Tasks missing from {{ $labels.job }}: {{ template `enumerateInstancesForJob` (args $labels.job `up{job=\"%s\"}==0`) }}",
}
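To show the behaviour I am after, here is a minimal sketch using plain Go text/template outside of Prometheus. It is not how Prometheus wires up alert templates internally. The sample type, the fakeQuery stub and its hard-coded instances are hypothetical stand-ins for the real query function, and args is re-implemented here only to mimic the helper of the same name (positional values become arg0, arg1, ...). I also assume $labels can be emulated by assigning it from the data passed to the template.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// sample is a hypothetical stand-in for one element of a query result.
type sample struct {
	Labels map[string]string
}

// fakeQuery is a stub for the `query` template function (an assumption for
// this sketch): it returns two hard-coded instances for one known
// expression and nothing for anything else.
func fakeQuery(expr string) []sample {
	if expr != `up{job="node"}==0` {
		return nil
	}
	return []sample{
		{Labels: map[string]string{"instance": "host1:9100"}},
		{Labels: map[string]string{"instance": "host2:9100"}},
	}
}

// args mimics the `args` helper: positional values become arg0, arg1, ...
func args(vals ...interface{}) map[string]interface{} {
	m := make(map[string]interface{}, len(vals))
	for i, v := range vals {
		m[fmt.Sprintf("arg%d", i)] = v
	}
	return m
}

// lib plays the role of the .lib file holding the reusable definition.
const lib = `{{ define "enumerateInstancesForJob" }}{{ $job := .arg0 }}{{ $query := (or .arg1 "up{job=\"%s\"}") }}{{ range $job | printf $query | query }}{{ .Labels.instance }}, {{ end }}{{ end }}`

// description plays the role of the alert annotation; the leading
// assignment emulates the $labels variable.
const description = `{{ $labels := .Labels }}Tasks missing from {{ $labels.job }}: {{ template "enumerateInstancesForJob" (args $labels.job "up{job=\"%s\"}==0") }}`

// renderDescription parses the library and the annotation into one
// template set, then executes the annotation for the given job.
func renderDescription(job string) (string, error) {
	funcs := template.FuncMap{"query": fakeQuery, "args": args}
	root := template.New("root").Funcs(funcs)
	if _, err := root.New("lib").Parse(lib); err != nil {
		return "", err
	}
	if _, err := root.New("description").Parse(description); err != nil {
		return "", err
	}
	data := struct{ Labels map[string]string }{
		Labels: map[string]string{"job": job},
	}
	var out strings.Builder
	if err := root.ExecuteTemplate(&out, "description", data); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	s, err := renderDescription("node")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(s)
}
```

The only step that matters here is parsing the library definitions into the same template set as the annotation before executing it. That is the part that seems to be missing for alert rules.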
Thanks,
Alin.