ALERTS{} that do not match current alert rules


Moses Moore

Feb 4, 2019, 11:43:14 AM
to Prometheus Users
I'm using Prometheus 2.6.0 and go 1.11

After changing an alert rule to have more labels and restarting Prometheus, I have ALERTS{} metrics that match the older alert rule that are still firing days later.

old alert rule:
  alert: procMissing
  expr: namedprocess_namegroup_num_procs{environment!="alpha",groupname!~"(cron|master|rsyslogd|snmpd|sshd)"} < 1
  for: 2m
  labels: { env: '{{$labels.environment}}', region: '{{$labels.region}}', severity: critical }
  annotations:
    summary: "{{$labels.groupname}} not running on {{$labels.node}}.{{$labels.ip}}"
    description: "num_procs for groupname {{$labels.groupname}} < 1 for 2 minutes"

old alert metric:
  ALERTS{alertname="procMissing",alertstate="firing",env="beta",groupname="BusinessServer",ip="[redacted]",job="process-exporter",node="[redacted]",region="us-east-1",severity="critical"}

new alert rule:
  alert: procMissing
  expr: namedprocess_namegroup_num_procs{environment!="alpha",groupname!~"(cron|master|rsyslogd|snmpd|sshd)",groupname!~"Business.*"} < 1
  for: 2m
  labels: { env: '{{$labels.environment}}', region: '{{$labels.region}}', rulefile: alerts2.yml, severity: critical }
  annotations:
    summary: "{{$labels.groupname}} not running on {{$labels.node}}.{{$labels.ip}}"
    description: "num_procs for groupname {{$labels.groupname}} < 1 for 2 minutes"

After updating alerts2.yml and restarting prometheus, the old alert metric that matches "Business.*" and without the rulefile label still appears with recent timestamps dated after I made the change and restarted.

On one hand, http://prometheus:9090/alerts says the "procMissing" alert is neither firing nor pending, and http://prometheus:9090/rules says the "rulefile" label is in the "procMissing" alert rule description.
On the other hand, http://prometheus:9090/graph?g0.expr=ALERTS%7Balertname%3D%22procMissing%22%2Calertstate%3D%22firing%22%2Crulefile%3D%22%22%7D  gives me fifty current metrics that are missing the 'rulefile' label.

Shouldn't firing alerts match the alertrules?  I'd understand if the pre-change ALERTS{} had timestamps that predated the change, but these have timestamps after the change.
Why are there current ALERTS{alertstate="firing"} metrics that aren't in the /alerts http report?  Maybe I misunderstand the nature of ALERTS{} metrics.
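
For readers following along, the percent-encoded graph URL above is easier to read once decoded. The sketch below rebuilds the g0.expr value from the plain selector and mimics how an empty-string matcher such as rulefile="" selects series that lack the label entirely. The helper names (encode_selector, matches_empty) are made up for illustration and are not part of Prometheus.

```python
from urllib.parse import quote

# Selector decoded from the graph URL quoted in the post.
SELECTOR = 'ALERTS{alertname="procMissing",alertstate="firing",rulefile=""}'


def encode_selector(selector: str) -> str:
    """Percent-encode a PromQL selector for use as a g0.expr URL parameter."""
    return quote(selector, safe="")


def matches_empty(labels: dict, name: str) -> bool:
    """Mimic a name="" matcher: true when the label is absent or empty."""
    return labels.get(name, "") == ""


if __name__ == "__main__":
    print(encode_selector(SELECTOR))
    # A series from the old rule has no rulefile label, so it matches.
    print(matches_empty({"alertname": "procMissing", "env": "beta"}, "rulefile"))
    # A series from the new rule carries the label, so it does not.
    print(matches_empty({"alertname": "procMissing", "rulefile": "alerts2.yml"}, "rulefile"))
```

This is why the query returns only series produced under the old label set.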

Moses Moore

Feb 5, 2019, 3:03:34 PM
to Prometheus Users
Follow-up on this:

curl -s http://localhost:9090/api/v1/alerts |jq '.data.alerts[]|select(.alertname == "procMissing" and .state == "firing")'
returns zero items.

curl -s 'http://localhost:9090/api/v1/query?query=ALERTS{alertname="procMissing",alertstate="firing"}' |jq -r -c '.data.result[]|.metric' 
returns sixty-two items, all with a timestamp less than a minute old, and all still carrying the labels from the old version of the "procMissing" alert rule.

I expected that if there are no events at the '/alerts' endpoint, there would be no ALERTS{alertstate='firing'} metrics with immediate timestamps.
Am I misunderstanding something, or is this a bug?

I found a way to delete unwanted metrics -- by restarting with --web.enable-admin-api and using .../admin/tsdb/delete_series http endpoint -- but this seems drastic, and if this is a bug then I'd rather get it fixed than sweep it under the carpet.
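
One way to cross-check the two endpoints is to compare label sets rather than counts. The sketch below assumes the documented HTTP API response shapes: /api/v1/alerts returns alert objects with a "labels" map and a "state" field, and /api/v1/query returns result entries with a "metric" map. In that shape, alertname sits inside each alert's "labels" object rather than at the top level, so it may be worth confirming that a jq filter on .alertname is not returning zero items for that reason alone. An instant query also stamps each result with the evaluation time, so a recent timestamp does not by itself show that the sample was written recently. The function names and the sample payloads are hypothetical.

```python
def firing_from_alerts_api(payload: dict, alertname: str) -> set:
    """Label sets of firing alerts in a /api/v1/alerts-style payload."""
    found = set()
    for alert in payload["data"]["alerts"]:
        labels = alert.get("labels", {})
        if alert.get("state") == "firing" and labels.get("alertname") == alertname:
            found.add(frozenset(labels.items()))
    return found


def firing_from_query(payload: dict) -> set:
    """Label sets in a /api/v1/query-style result for ALERTS{alertstate="firing"}."""
    found = set()
    for series in payload["data"]["result"]:
        # Drop the labels that exist only on the ALERTS series itself.
        labels = {k: v for k, v in series["metric"].items()
                  if k not in ("__name__", "alertstate")}
        found.add(frozenset(labels.items()))
    return found


def orphaned(alerts_payload: dict, query_payload: dict, alertname: str) -> set:
    """ALERTS series that have no matching active alert."""
    return firing_from_query(query_payload) - firing_from_alerts_api(alerts_payload, alertname)


# Hypothetical sample payloads, trimmed to the fields used above.
ALERTS_PAYLOAD = {"data": {"alerts": [
    {"state": "firing",
     "labels": {"alertname": "procMissing", "groupname": "nginx",
                "rulefile": "alerts2.yml", "severity": "critical"}},
]}}
QUERY_PAYLOAD = {"data": {"result": [
    {"metric": {"__name__": "ALERTS", "alertname": "procMissing", "alertstate": "firing",
                "groupname": "nginx", "rulefile": "alerts2.yml", "severity": "critical"}},
    {"metric": {"__name__": "ALERTS", "alertname": "procMissing", "alertstate": "firing",
                "groupname": "BusinessServer", "severity": "critical"}},
]}}

if __name__ == "__main__":
    for labels in orphaned(ALERTS_PAYLOAD, QUERY_PAYLOAD, "procMissing"):
        print(dict(labels))
```

Any label set printed here exists as an ALERTS series but not as an active alert, which is the mismatch described above.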

Simon Pasquier

Feb 8, 2019, 11:49:33 AM
to Moses Moore, Prometheus Users
I would understand if old alerts weren't properly marked as stale in
ALERTS but seeing them for days is really strange as they should be
gone after 5 minutes.
Can you share your rule files before and after the reload? Are you
adding/removing alerts?
Anything special in the logs?
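
The five-minute figure corresponds to the default query lookback window. As a rough illustration only, the sketch below models an instant selector as returning a series when its latest sample is no older than that window. It is a simplified model that ignores staleness markers and exact boundary handling; the constant and function name are assumptions for illustration, not Prometheus internals.

```python
LOOKBACK_SECONDS = 5 * 60  # assumed default lookback window of five minutes


def visible(sample_ts: float, query_ts: float,
            lookback: float = LOOKBACK_SECONDS) -> bool:
    """True if a sample written at sample_ts is still selected at query_ts."""
    age = query_ts - sample_ts
    return 0 <= age <= lookback


if __name__ == "__main__":
    last_write = 1_000_000.0
    print(visible(last_write, last_write + 60))         # one minute later
    print(visible(last_write, last_write + 3 * 86400))  # three days later
```

Under this model, a series that still shows up days later implies that something is still writing samples for it.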

Moses Moore

Feb 8, 2019, 1:19:07 PM
to Prometheus Users

- can you share your rule files ?

'fraid not, this is a production machine.  If I diff the before/after of the alerts2.yml file, the only change is the "rulefile: alerts2.yml" bit in the "labels:" block of that one "procMissing" alert rule.

- anything special in the logs?

nope, and we're using log.level=debug .  I need to look again because:

It's a moot point now; `ALERTS{alertname="procMissing",alertstate="firing",rulefile=""}` has zero results now instead of 62 results yesterday.  I made the alert rule change a week ago, and prometheus hasn't been restarted in the last five days.  Maybe it takes 7d for ALERTS{alertstate="firing"} to die of old age if they aren't regenerated?


Thanks for asking.  If I can reproduce it in a cleanroom, I'll mention it again on the list.





Fedor Kanin

Mar 11, 2024, 11:49:12 AM
to Prometheus Users
Hello, did you find a solution? I am facing the same issue, the new ALERTS labels don't match, although they are completely static and there should be no issue.

tags: missing labels, missing prometheus alert labels, missing static labels, prometheus labels don't match