Question about Prometheus rate()


daniel.hoel...@bitmovin.com

Jul 3, 2017, 10:17:09 AM
to Prometheus Users
Hi,
I am seeing some weird behavior in my Prometheus metrics collection.

I have two counter metrics that I graph with Grafana:

Total requests per second: sum(rate(ingress_requests_total[30s]))
Total successful requests per second: sum(rate(ingress_successful_requests_total[30s]))

Pretty standard stuff: the code path this gets collected from ensures that ingress_requests_total is always incremented, while ingress_successful_requests_total is only incremented when everything went through.

So now I graphed this with Grafana using the above queries, and I get values where total requests is less than successful requests, which to my knowledge can't happen.

See the screenshot here:


Am I doing something wrong or am I missing something? All these values were recorded from one instance so I can't even attribute this to lost scrapes or something.



Thanks for your help,
greetings Daniel

Brian Brazil

Jul 3, 2017, 10:28:17 AM
to daniel.hoel...@bitmovin.com, Prometheus Users
This is possible if you increment ingress_successful_requests_total before you increment ingress_requests_total, and the scrape happens between the two increments.

Brian
 



--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-users+unsubscribe@googlegroups.com.
To post to this group, send email to prometheus-users@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/0c68b02c-87f6-4fe2-a484-694dfaefcb06%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--

Ben Kochie

Jul 3, 2017, 10:38:53 AM
to daniel.hoel...@bitmovin.com, Prometheus Users
You didn't mention what your scrape interval is.  We recommend a scrape_interval such that you would get 4 samples in the smallest range vector request you're looking for.  This allows for better sample interpolation, coverage of lost scrapes, and counter resets.

For a 30s range, you need a scrape interval of 7s or lower in order to get good results.  I would probably recommend a scrape interval of 5s.
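
The arithmetic behind that recommendation can be sketched as follows. This is only an illustration: the helper names are made up for this example, it assumes perfectly regular scrapes with no jitter or lost scrapes, and it relies on the fact that rate() needs at least two samples inside the range to return a value.

```python
import math


def max_scrape_interval(range_seconds: float, min_samples: int = 4) -> int:
    """Largest whole-second scrape interval that fits min_samples in the range."""
    return math.floor(range_seconds / min_samples)


def min_samples_in_window(range_seconds: float, scrape_interval: float) -> int:
    """Fewest samples a window of this length can contain with regular scrapes."""
    return math.floor(range_seconds / scrape_interval)


# A 30s range with the suggested 4 samples gives 30 / 4 = 7.5, so 7s or lower.
print(max_scrape_interval(30))        # 7

# With a 30s scrape interval, a 30s window may hold a single sample,
# which is fewer than the two samples rate() needs.
print(min_samples_in_window(30, 30))  # 1

# With a 5s scrape interval, a 30s window always holds at least 6 samples.
print(min_samples_in_window(30, 5))   # 6
```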

--

daniel.hoel...@bitmovin.com

Jul 3, 2017, 10:43:12 AM
to Prometheus Users, daniel.hoel...@bitmovin.com
Thanks for the reply,
my scrape interval is 30s (How often should I scrape btw? Are there any guidelines / best practices to follow here?)

As for a scrape in between incrementing the counters: 
This cannot happen as ingress_requests_total is literally the first line of code in my HTTP handler. It gets incremented at the beginning of a request and ingress_successful_requests_total at the end of a request cycle. There is no async etc going on in between, it's a straightforward procedural piece of code.
There is no other path through the code: if ingress_requests_total is not incremented, there is no way to reach ingress_successful_requests_total.

greetings Daniel


--

Brian Brazil

Jul 3, 2017, 10:53:41 AM
to daniel.hoel...@bitmovin.com, Prometheus Users
The race can also happen the other way. To minimise it (and make your metrics easier to understand from the code), I'd try to increment these two metrics one after the other. It's also more usual to do total and failure rather than total and success.
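
One way to read this is that a request can be in flight when a scrape happens. The sketch below is a plain-Python simulation, not Prometheus client code; the Request type, the timings, and the two "styles" are hypothetical and chosen only to illustrate the effect. Style A increments the total at the start of a request and success at the end, as described earlier in the thread. Style B is one possible reading of the suggestion here: increment total and failure together at the end.

```python
from dataclasses import dataclass


@dataclass
class Request:
    start: float  # seconds; when the handler begins
    end: float    # seconds; when the handler finishes
    ok: bool      # whether the request succeeded


def counter_at(increments, t):
    """Value a scrape at time t would see: increments at or before t."""
    return sum(1 for x in increments if x <= t)


def window_increase(increments, t0, t1):
    """Increase of a counter between two scrapes at t0 and t1."""
    return counter_at(increments, t1) - counter_at(increments, t0)


# Hypothetical traffic: two requests are in flight during the scrape at t=30.
requests = [
    Request(start=28.0, end=31.0, ok=True),
    Request(start=29.0, end=32.0, ok=True),
    Request(start=40.0, end=40.5, ok=True),
    Request(start=50.0, end=50.2, ok=False),
]
scrapes = (30.0, 60.0)

# Style A: total incremented at the start, success at the end.
total_a = [r.start for r in requests]
success_a = [r.end for r in requests if r.ok]

# Style B: total and failure incremented one after the other at the end.
total_b = [r.end for r in requests]
failure_b = [r.end for r in requests if not r.ok]

inc_total_a = window_increase(total_a, *scrapes)
inc_success_a = window_increase(success_a, *scrapes)
inc_total_b = window_increase(total_b, *scrapes)
inc_failure_b = window_increase(failure_b, *scrapes)

print("A: total", inc_total_a, "success", inc_success_a)  # A: total 2 success 3
print("B: total", inc_total_b, "failure", inc_failure_b)  # B: total 4 failure 1
```

In style A, the two in-flight requests were counted in the total before the first scrape but in success after it, so within the window success grows by more than total. In style B, both counters move at the same point in the request, so failure cannot outgrow total in this simulation.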

Brian
 



--

Ben Kochie

Jul 3, 2017, 10:54:42 AM
to daniel.hoel...@bitmovin.com, Prometheus Users
As I said earlier, if you want to see rates over 30-second windows, you will need a shorter scrape interval.  As for best practices, it depends heavily on what you're doing.  A 5s scrape interval is totally fine, depending on how fast your Prometheus server is (CPU? memory?), how many targets you have, and how many metrics each target has.  All of these things affect how many samples per second a Prometheus server can handle, and how many samples you have to ingest per second.  Without more details, it's hard to say for your specific case.

What language is your code written in?  For example in Go, we do async updates which can return scrape results in the middle of a request.
