Prometheus Active Alert getting resolved instead of sending to Slack


arnav...@gmail.com

Oct 29, 2020, 2:01:30 PM
to Prometheus Users
Hi,

I have a working Prometheus/Alertmanager setup with Slack and Teams, and alerts are working fine and reaching both Slack and Teams.

Today I added a new alert with the following query. The alert does not have any wait time (`for`) configured, and all alert rules are evaluated every 15 seconds. The alert became active in the Prometheus UI, but it never reached Slack or Teams; it resolved before doing so. What could be the reason? Note that during the same period some existing alerts worked fine.

The query

count_over_time(K_Event_Count{EvId="2417164311",EvMessage="fan alarm"}[2m]) > 0  

During this time the only logs I was able to find in Alertmanager were these, but I am not sure the error logs are related:

{"caller":"api.go:781","component":"api","err":"bad_data: json: cannot unmarshal bool into Go struct field Alert.labels of type model.LabelValue","level":"error","msg":"API error","ts":"2020-10-29T15:12:35.567Z","version":"v1"}
{"caller":"api.go:781","component":"api","err":"bad_data: json: cannot unmarshal bool into Go struct field Alert.labels of type model.LabelValue","level":"error","msg":"API error","ts":"2020-10-29T15:13:29.493Z","version":"v1"}
{"caller":"api.go:781","component":"api","err":"bad_data: json: cannot unmarshal bool into Go struct field Alert.labels of type model.LabelValue","level":"error","msg":"API error","ts":"2020-10-29T15:13:31.031Z","version":"v1"}
{"caller":"api.go:781","component":"api","err":"bad_data: json: cannot unmarshal bool into Go struct field Alert.labels of type model.LabelValue","level":"error","msg":"API error","ts":"2020-10-29T15:13:32.568Z","version":"v1"}
{"caller":"api.go:781","component":"api","err":"bad_data: json: cannot unmarshal bool into Go struct field Alert.labels of type model.LabelValue","level":"error","msg":"API error","ts":"2020-10-29T15:13:34.297Z","version":"v1"}
{"caller":"api.go:781","component":"api","err":"bad_data: json: cannot unmarshal bool into Go struct field Alert.labels of type model.LabelValue","level":"error","msg":"API error","ts":"2020-10-29T15:14:29.419Z","version":"v1"}
{"caller":"api.go:781","component":"api","err":"bad_data: json: cannot unmarshal bool into Go struct field Alert.labels of type model.LabelValue","level":"error","msg":"API error","ts":"2020-10-29T15:14:31.007Z","version":"v1"}
{"caller":"api.go:781","component":"api","err":"bad_data: json: cannot unmarshal bool into Go struct field Alert.labels of type model.LabelValue","level":"error","msg":"API error","ts":"2020-10-29T15:14:33.991Z","version":"v1"}
{"caller":"api.go:781","component":"api","err":"bad_data: json: cannot unmarshal bool into Go struct field Alert.labels of type model.LabelValue","level":"error","msg":"API error","ts":"2020-10-29T15:14:36.078Z","version":"v1"}
{"caller":"api.go:781","component":"api","err":"bad_data: json: cannot unmarshal bool into Go struct field Alert.labels of type model.LabelValue","level":"error","msg":"API error","ts":"2020-10-29T15:15:29.448Z","version":"v1"}
{"caller":"api.go:781","component":"api","err":"bad_data: json: cannot unmarshal bool into Go struct field Alert.labels of type model.LabelValue","level":"error","msg":"API error","ts":"2020-10-29T15:15:30.979Z","version":"v1"}

Could you please tell me what could have gone wrong? 

Thanks,
Arnav

Brian Candler

Oct 29, 2020, 4:22:23 PM
to Prometheus Users
You have a label like

    foo: true

YAML parses the unquoted `true` as a boolean, but label values must be strings, so it needs to be quoted, i.e.

    foo: "true"

Brian Candler

Oct 29, 2020, 4:24:10 PM
to Prometheus Users
And regarding the self-resolved alert: could you give your full alerting rule, not just the "expr" part?

arnav...@gmail.com

Oct 29, 2020, 4:32:02 PM
to Prometheus Users
Here is the rule:

alert: RPHY_RFS_FAN 
labels: 
 Dashboard: ""   
 alertsrc: prometheus 
 category: RFS 
 host_impacted: '{{$labels.rfsMac}}' 
 kafka: "true" 
 severity: major 
 slack: "true" 
annotations: 
 summary: fan alarm  

Brian Candler

Oct 30, 2020, 4:09:54 AM
to Prometheus Users
Looks reasonable; it should stay firing for at least 2 minutes AFAICS, given the 2m range in the expression.

Next, set --log.level=debug on Alertmanager and see what it outputs on stderr when the alert triggers (if running under systemd, check journalctl -eu alertmanager).