Prometheus randomly stops logging a counter, which stays at zero.


Constantin Clauzel

Feb 25, 2021, 5:32:52 AM
to Prometheus Users
Hey,

Since this morning I've been experiencing some very weird behavior with one of my counters.
It randomly stays at zero for an hour, then appears again, then drops back to zero.

What is strange is that all other metrics are showing up, meaning Prometheus can reach the endpoint, and when I check the endpoint it has the missing counter in it.
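
The scrape itself can also be sanity-checked with queries along these lines ("appbackend" is just a placeholder for the real job name):

up{job="appbackend"}
scrape_samples_scraped{job="appbackend"}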

Is there any possible reason that could explain why a counter suddenly only returns zeros, and then starts working again for no apparent reason?

Please find attached how the graphs look.

The queries that return all zeros:
sum(increase(appbackend_alerts{metric="TEMP", type="TOO_HIGH"}[5m]))
sum(increase(appbackend_alerts{metric="HUMI", type="TOO_HIGH"}[5m]))

The Prometheus endpoint returns all of these appbackend_alerts lines:

appbackend_alerts{boxID="0",controllerID="xxxxxxxxx",metric="HUMI",type="TOO_HIGH"} 1
appbackend_alerts{boxID="0",controllerID="xxxxxxxxx",metric="HUMI",type="TOO_LOW"} 1
appbackend_alerts{boxID="0",controllerID="xxxxxxxxx",metric="TEMP",type="TOO_LOW"} 1
appbackend_alerts{boxID="0",controllerID="xxxxxxxxx",metric="HUMI",type="TOO_LOW"} 1
appbackend_alerts{boxID="0",controllerID="xxxxxxxxx",metric="HUMI",type="TOO_HIGH"} 1
appbackend_alerts{boxID="0",controllerID="xxxxxxxxx",metric="HUMI",type="TOO_LOW"} 1
appbackend_alerts{boxID="0",controllerID="xxxxxxxxx",metric="HUMI",type="TOO_LOW"} 1
appbackend_alerts{boxID="0",controllerID="xxxxxxxxx",metric="HUMI",type="TOO_HIGH"} 1
[ ... and many more ]
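
As a rough sanity check that Prometheus has actually ingested all of these series, a simple count should more or less match the number of lines on the endpoint:

count(appbackend_alerts)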

Thanks,
Constantin
Screenshot from 2021-02-25 11-28-26.png

Constantin Clauzel

Feb 25, 2021, 5:37:36 AM
to Prometheus Users
The attached file in the previous message was broken, so here it is again:
Screenshot from 2021-02-25 11-36-42.png

Constantin Clauzel

Feb 25, 2021, 5:44:42 AM
to Prometheus Users
And it just started appearing again.

It took approximately the same time to show up again as last time, around 1h10.
Screenshot from 2021-02-25 11-43-30.png

Constantin Clauzel

Feb 25, 2021, 5:48:55 AM
to Prometheus Users
Is it possible that Prometheus is ignoring the counter as long as it's equal to 1?
I can't be sure, but it seems it only started logging my counters once some of them were higher than 1.
This would be consistent, as my system increments the value every 30 min, so if the condition is met twice it can take up to an hour for one of them to reach 2.
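
Given that, a wider window that covers more than one increment might behave differently; something like this seems worth a try:

sum(increase(appbackend_alerts{metric="TEMP", type="TOO_HIGH"}[1h]))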

Constantin Clauzel

Feb 25, 2021, 6:05:45 AM
to Prometheus Users
Last test: it looks like it's the `sum(increase(appbackend_alerts{metric="TEMP", type="TOO_HIGH"}[5m]))` query that doesn't return anything as long as the counters are at 1.
I added the query sum(appbackend_alerts{}), which displays the attached graph (the blue lines show that there are indeed counter values coming in).
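
A per-label breakdown of the raw values might also make the comparison clearer, for example:

sum by (metric, type) (appbackend_alerts)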
Screenshot from 2021-02-25 12-03-56.png

Constantin Clauzel

Feb 25, 2021, 6:24:47 AM
to Prometheus Users