I'm using Prometheus 2.6.0, built with Go 1.11.
After changing an alert rule to have more labels and restarting Prometheus, I have ALERTS{} metrics that match the older alert rule that are still firing days later.
old alert rule:
alert: procMissing
expr: namedprocess_namegroup_num_procs{environment!="alpha",groupname!~"(cron|master|rsyslogd|snmpd|sshd)"} < 1
for: 2m
labels:
  env: '{{$labels.environment}}'
  region: '{{$labels.region}}'
  severity: critical
annotations:
  summary: '{{$labels.groupname}} not running on {{$labels.node}}.{{$labels.ip}}'
  description: 'num_procs for groupname {{$labels.groupname}} < 1 for 2 minutes'
old alert metric:
ALERTS{alertname="procMissing",alertstate="firing",env="beta",groupname="BusinessServer",ip="[redacted]",job="process-exporter",node="[redacted]",region="us-east-1",severity="critical"}
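To isolate the stale series, a query like the following can help. This is a sketch, assuming that only the new version of the rule adds the rulefile label, so its absence marks output from the old rule:

```promql
# Firing procMissing series that lack the rulefile label, i.e. series
# produced by the old version of the rule (assumption: only the new
# rule sets rulefile="alerts2.yml"; the empty matcher also matches
# series where the label is absent).
ALERTS{alertname="procMissing", alertstate="firing", rulefile=""}
```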
new alert rule:
alert: procMissing
expr: namedprocess_namegroup_num_procs{environment!="alpha",groupname!~"(cron|master|rsyslogd|snmpd|sshd)",groupname!~"Business.*"} < 1
for: 2m
labels:
  env: '{{$labels.environment}}'
  region: '{{$labels.region}}'
  rulefile: alerts2.yml
  severity: critical
annotations:
  summary: '{{$labels.groupname}} not running on {{$labels.node}}.{{$labels.ip}}'
  description: 'num_procs for groupname {{$labels.groupname}} < 1 for 2 minutes'
After updating alerts2.yml and restarting Prometheus, the old ALERTS{} series (the one whose groupname matches "Business.*" and which lacks the rulefile label) still appears, with sample timestamps later than the change and restart.
Shouldn't firing alerts match the current alert rules? I'd understand it if the pre-change ALERTS{} series only had timestamps predating the change, but these have timestamps after it.
Why are there current ALERTS{alertstate="firing"} series that don't appear in the /alerts page? Perhaps I misunderstand the nature of ALERTS{} metrics.
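One way to narrow this down is to compare what's in storage against what the /alerts page (or the /api/v1/alerts endpoint) reports. A sketch, again assuming the rulefile label distinguishes the two rule versions:

```promql
# Count currently firing ALERTS series per alert name and rule version.
# Any series counted here but not shown on /alerts would be coming from
# the TSDB rather than from an actively evaluated rule.
count by (alertname, rulefile) (ALERTS{alertstate="firing"})
```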