HELP: What's the fastest/easiest way to calculate a percentage from two counter metrics?

messi...@gmail.com

Jul 28, 2016, 2:44:30 AM
to Prometheus Developers
Hi all,

I'm new to Prometheus and am using it together with Grafana. I need to calculate the cache hit percentage for all searches over an arbitrary time range.

Metrics:
- metrics_cache_hit_total
- metrics_search_total

As I understand it, all I need to do is calculate the increase of each counter over the given time range and then divide one by the other. The query I tried in Grafana is:
sum(increase(metrics_cache_hit_total[1m])) / sum(increase(metrics_search_total{action="SEARCH", status!~"402|100"}[1m])) * 100
with Step set to 1m and Resolution set to 1/1, but the result is not correct.

Could anybody tell me what's wrong with my query? Or is there another way to get this done?

Thanks a lot.

BR, Shizhz

Brian Brazil

Jul 28, 2016, 3:16:08 AM
to messi...@gmail.com, Prometheus Developers
The problem here is likely that metrics_search_total doesn't cover exactly what you need. I'd suggest adding an additional metric called metrics_cache_lookups_total or similar and using that in the denominator.

messi...@gmail.com

Jul 28, 2016, 3:51:50 AM
to Prometheus Developers, messi...@gmail.com
Thanks for your quick response Brian.

But I don't think it's caused by the coverage of metrics_search_total: the numerator and the denominator each give the correct result when I evaluate them separately.

The problem seems to be the way I count the increase over the given time range. Grafana uses the HTTP endpoint `query_range` (https://prometheus.io/docs/querying/api/#range-queries). To get the total increase, I sum the per-minute increases over the time range, but that `sum` happens on the Grafana side after it gets the response from Prometheus. So I can't combine the two expressions into one query and divide them, because the whole expression is evaluated by Prometheus. The result I get is a list of per-minute cache hit percentages for the given time range.

So do you have any ideas of how to fix this?

BR, Shizhz

zhz shi

Jul 28, 2016, 4:51:28 AM
to Brian Brazil, Prometheus Developers
To reduce the length of the response I set the step to 1h with a time range of 1 day, and got the following results:

sum(increase(metrics_cache_hit_total[1h])):
{"status":"success","data":{"resultType":"matrix","result":[{"metric":{},"values":[[1469462400,"0"],[1469466000,"0"],[1469469600,"0"],[1469473200,"0"],[1469476800,"0"],[1469480400,"0"],[1469484000,"0"],[1469487600,"0"],[1469491200,"0"],[1469494800,"0"],[1469498400,"13.01808066759388"],[1469502000,"1.0018592838542864"],[1469505600,"3.004172461752434"],[1469509200,"0"],[1469512800,"0"],[1469516400,"3.004172461752434"],[1469520000,"0"],[1469523600,"0"],[1469527200,"0"],[1469530800,"0"],[1469534400,"0"],[1469538000,"0"],[1469541600,"0"],[1469545200,"0"],[1469548800,"0"]]}]}}

sum(increase(metrics_search_total {action="SEARCH", status!~"402|100"}[1h])):
{"status":"success","data":{"resultType":"matrix","result":[{"metric":{},"values":[[1469462400,"0"],[1469466000,"0"],[1469469600,"0"],[1469473200,"0"],[1469476800,"0"],[1469480400,"0"],[1469484000,"0"],[1469487600,"0"],[1469491200,"0"],[1469494800,"0"],[1469498400,"28.03894297635605"],[1469502000,"24.056435176604047"],[1469505600,"8.011126564673157"],[1469509200,"0"],[1469512800,"0"],[1469516400,"5.006954102920723"],[1469520000,"0"],[1469523600,"0"],[1469527200,"2.0027816411682893"],[1469530800,"0"],[1469534400,"0"],[1469538000,"0"],[1469541600,"0"],[1469545200,"0"],[1469548800,"0"]]}]}}

sum(increase(metrics_cache_hit_total[1h])) / sum(increase(metrics_search_total {action="SEARCH", status!~"402|100"}[1h])):
{"status":"success","data":{"resultType":"matrix","result":[{"metric":{},"values":[[1469462400,"NaN"],[1469466000,"NaN"],[1469469600,"NaN"],[1469473200,"NaN"],[1469476800,"NaN"],[1469480400,"NaN"],[1469484000,"NaN"],[1469487600,"NaN"],[1469491200,"NaN"],[1469494800,"NaN"],[1469498400,"0.46428571428571425"],[1469502000,"0.04164620719983645"],[1469505600,"0.375"],[1469509200,"NaN"],[1469512800,"NaN"],[1469516400,"0.6000000000000001"],[1469520000,"NaN"],[1469523600,"NaN"],[1469527200,"0"],[1469530800,"NaN"],[1469534400,"NaN"],[1469538000,"NaN"],[1469541600,"NaN"],[1469545200,"NaN"],[1469548800,"NaN"]]}]}}

Let's strip the other info and all entries where both values are 0, and compare the three lists:
sum(increase(metrics_cache_hit_total[1h])):
List 1: [[1469498400,"13.01808066759388"],[1469502000,"1.0018592838542864"],[1469505600,"3.004172461752434"],[1469516400,"3.004172461752434"],[1469527200,"0"]]

sum(increase(metrics_search_total {action="SEARCH", status!~"402|100"}[1h])):
List 2: [[1469498400,"28.03894297635605"],[1469502000,"24.056435176604047"],[1469505600,"8.011126564673157"],[1469516400,"5.006954102920723"],[1469527200,"2.0027816411682893"]]

sum(increase(metrics_cache_hit_total[1h])) / sum(increase(metrics_search_total {action="SEARCH", status!~"402|100"}[1h])):
List 3: [[1469498400,"0.46428571428571425"],[1469502000,"0.04164620719983645"],[1469505600,"0.375"],[1469516400,"0.6000000000000001"],[1469527200,"0"]]

Each value in the 3rd list is the quotient of the values in the previous two lists at the same timestamp.

So if List 1 has values [a, b, c, d] and List 2 has values [A, B, C, D], what I want is (a + b + c + d) / (A + B + C + D), but List 3 gives me [a/A, b/B, c/C, d/D].
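
To make the difference concrete, here is a minimal Python sketch that uses the values from List 1 and List 2. The variable names are illustrative only; nothing here is part of Prometheus or Grafana.

```python
# Values copied from List 1 (cache hits) and List 2 (searches),
# keyed by the Unix timestamp of each 1h step.
hits = {
    1469498400: 13.01808066759388,
    1469502000: 1.0018592838542864,
    1469505600: 3.004172461752434,
    1469516400: 3.004172461752434,
    1469527200: 0.0,
}
searches = {
    1469498400: 28.03894297635605,
    1469502000: 24.056435176604047,
    1469505600: 8.011126564673157,
    1469516400: 5.006954102920723,
    1469527200: 2.0027816411682893,
}

# What the range query returns: one quotient per step (List 3).
per_step = {ts: hits[ts] / searches[ts] for ts in hits}

# What is wanted: a single quotient of the two totals.
overall = sum(hits.values()) / sum(searches.values())

for ts, ratio in per_step.items():
    print(ts, ratio)
print("overall hit ratio:", overall)
```

The two results differ because each step carries a different number of searches, so averaging the per-step quotients would weight an hour with 2 searches the same as an hour with 28.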




On Thu, Jul 28, 2016 at 3:55 PM, Brian Brazil <brian....@robustperception.io> wrote:
I doubt that's the problem, all the math is done on the Prometheus side.

What is not correct about the result you're seeing?

Brian
 

--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-devel...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
BR, Zhenzhong

Brian Brazil

Jul 28, 2016, 4:56:13 AM
to zhz shi, Prometheus Developers
What you're looking for then is sum(increase(metrics_cache_hit_total[1d])) / sum(increase(metrics_search_total {action="SEARCH", status!~"402|100"}[1d])), as you're looking for a single number rather than a graph.
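
A minimal Python sketch of how the range in that expression could follow a dynamic window: it builds the same expression with a range equal to the window length and wraps it in an instant query against the `/api/v1/query` endpoint. The base URL, the function names, and the choice of expressing the range in seconds are assumptions for illustration, not something taken from this thread.

```python
from urllib.parse import urlencode

HITS = 'metrics_cache_hit_total'
SEARCHES = 'metrics_search_total{action="SEARCH", status!~"402|100"}'


def hit_ratio_query(window_seconds: int) -> str:
    """Build one PromQL expression whose range covers the whole window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    rng = f"[{window_seconds}s]"
    return f"sum(increase({HITS}{rng})) / sum(increase({SEARCHES}{rng}))"


def instant_query_url(base_url: str, start: int, end: int) -> str:
    """URL for an instant query evaluated at `end`, looking back to `start`."""
    params = {"query": hit_ratio_query(end - start), "time": end}
    return f"{base_url}/api/v1/query?{urlencode(params)}"


# Hypothetical server address; the timestamps are the first and last
# steps of the 1-day example in this thread.
print(instant_query_url("http://localhost:9090", 1469462400, 1469548800))
```

Because this is an instant query, Prometheus returns one value for the whole window instead of one value per step. Newer Grafana releases also provide a `$__range` dashboard variable that can play the same role inside a panel query; whether it is available depends on the Grafana version in use.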

Brian



zhz shi

Jul 28, 2016, 5:28:04 AM
to Brian Brazil, Prometheus Developers
1 day here is just an example; the time range is dynamic and there are many options on the Grafana dashboard page. I guess this is a Grafana-related problem, so I'm going to ask the Grafana community for help.

Thanks very much for your help :-)
--
BR, Zhenzhong

Brian Brazil

Jul 29, 2016, 7:00:30 PM
to messi...@gmail.com, Prometheus Developers
I doubt that's the problem, all the math is done on the Prometheus side.

What is not correct about the result you're seeing?

Brian
 