one off in a counter

Ittay Dror

unread,
Feb 26, 2024, 9:03:11 AM
to Prometheus Users
I have a counter with a type label. The type can be one of 4 values. I'm using prom-client (15.0.0) in a Node (Express) app with TypeScript. The code only calls 'inc' on the counter. The counter is initialized with only the name, help and labelNames configuration.
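
Roughly, the setup looks like this (a minimal sketch; the metric name, label values and routes are placeholders, not the real ones):

import express from 'express';
import { Counter, register } from 'prom-client';

const app = express();

// Counter configured with only name, help and labelNames, as described above.
// 'my_total' and the 'type' values are illustrative placeholders.
const myCounter = new Counter({
  name: 'my_total',
  help: 'Events partitioned by type',
  labelNames: ['type'] as const,
});

app.post('/event/:type', (req, res) => {
  // The only operation ever performed on the counter is inc().
  myCounter.labels(req.params.type).inc();
  res.sendStatus(204);
});

// Expose the registry so Prometheus can scrape /metrics from the pod.
app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', register.contentType);
  res.send(await register.metrics());
});

app.listen(3000);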

If I fetch /metrics directly from the pod, I see correct values. Say type "A" has 1, type "B" has 2, "C" 2 and "D" 1. The sum is 6.

But if I issue a PromQL query, the results are off by one. That is, the values are 0, 1, 1, 0 respectively. The sum is 2.

This happens across several counters.

Using GCE for the (managed) Prometheus server.

There are no restarts of the pod, and using 'increase' doesn't help.

What is the reason, and what is the approach to solve this?

Fabian Stäber

unread,
Feb 26, 2024, 9:13:21 AM
to Ittay Dror, Prometheus Users
Hi Ittay,

Please post the PromQL query you are using.

Fabian

Ittay Dror

unread,
Feb 26, 2024, 10:08:47 AM
to Prometheus Users
It is literally just 'my_total'.

Fabian Stäber

unread,
Feb 26, 2024, 3:51:22 PM
to Ittay Dror, Prometheus Users
Thanks Ittay.

Hmmm, I can only imagine two reasons:
  1. Prometheus hasn't scraped yet, so you are seeing older values than when you look directly at /metrics.
  2. Prometheus is scraping a different target than the one you are looking at. Maybe something with service discovery is going wrong.
Could you use the Prometheus UI on port 9090 to check that the target is as expected, and when the last scrape of your target happened?

Fabian

Ittay Dror

unread,
Feb 28, 2024, 6:37:30 AM
to Prometheus Users
1. That is the only data; there were no metrics before. Also, when looking at the data as a chart I can see the time at which it was ingested.
2. The data is always off by one, and we see the metrics change when there's activity in the intended cluster.

So the targets are as expected and were scraped recently. Note this happened in two environments (test & main) of the same software.

I also find this extremely bizarre. Note in particular that we see metric values of "0" being scraped, but only after the counter is actually "1". So when the system has just started and is idle there is no value; then, on the first increment, the value is "0", but only on the Prometheus server side. The client reports the value correctly.

Maybe there's a configuration that rebases values, or assigns a type that causes things to be treated differently, or maybe the first scrape is expected to get 0 as a value and doesn't, or something like that?

Ittay Dror

unread,
Feb 28, 2024, 6:53:03 AM
to Prometheus Users
Also, we don't use collectDefaultMetrics. Maybe that is related? E.g., maybe the 'off by one' is normally swallowed by the default metrics, which start accumulating when the server starts, and in our case it is noticed because the first metrics we have are at the application level?
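
For reference, enabling them would just be something like this at startup (a sketch; we don't have this call anywhere today):

import { collectDefaultMetrics } from 'prom-client';

// Registers the standard Node.js process metrics (CPU, memory, event loop, GC)
// on the default registry, so the registry already has series from the moment
// the server starts, before any application counter is incremented.
collectDefaultMetrics();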

Fabian Stäber

unread,
Feb 28, 2024, 7:51:34 AM
to Ittay Dror, Prometheus Users
Hi Ittay,

I don't think this is related to collectDefaultMetrics.

Anyway, I think the best way forward is to try to reproduce it. Write a small stand-alone application that exposes similar metrics, and see if Prometheus has the same issue.
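
Something along these lines should be enough (just a sketch; the names and the increment schedule are made up):

import express from 'express';
import { Counter, register } from 'prom-client';

// One counter with a 'type' label, like in your app.
const repro = new Counter({
  name: 'repro_total',
  help: 'Reproduction counter',
  labelNames: ['type'] as const,
});

// Increment a rotating label value periodically so successive scrapes
// see the per-type values grow without any real traffic.
const types = ['A', 'B', 'C', 'D'];
let i = 0;
setInterval(() => repro.labels(types[i++ % types.length]).inc(), 30_000);

const app = express();
app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', register.contentType);
  res.send(await register.metrics());
});
app.listen(3000);

Then compare what /metrics shows against what your Prometheus setup reports for repro_total.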

If you can reproduce the issue please share the example, then we can figure out what's wrong.

If the issue can't be reproduced, you know there's a difference between your production app and your example. Then you need to narrow down that difference.

Fabian

Ittay Dror

unread,
Mar 2, 2024, 9:41:03 PM
to Prometheus Users
For future googlers, this is the reason: https://cloud.google.com/stackdriver/docs/managed-prometheus/troubleshooting#counter-sums

Monarch, which is Google's backend implementation, ignores the first reported value. So if the first value it scrapes is 5, then all future values will be off by 5. As far as I could see, this happens per label combination.
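
One workaround that seems to follow from this (my reading, not verbatim from the docs) is to make sure the first value scraped for every series is 0, by pre-initializing each expected label combination at startup, e.g.:

// Pre-create every expected 'type' series at 0 so the first point Monarch
// ingests (and ignores) is 0 rather than a real count. Assumes the four
// known type values; 'myCounter' refers to the illustrative counter above.
const types = ['A', 'B', 'C', 'D'];
for (const t of types) {
  myCounter.labels(t).inc(0);
}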