Prometheus query explain plan


mse...@gmail.com

unread,
Jul 9, 2018, 10:53:36 AM7/9/18
to Prometheus Users
Hello,
Is there a way to see the Prometheus query explain plan, like in SQL servers?

What's the best method to optimize queries in Prometheus?

Simon Pasquier

unread,
Jul 10, 2018, 3:49:17 AM7/10/18
to mse...@gmail.com, Prometheus Users
There's no query explain in Prometheus, but if you set the stats parameter in the query request, you'll get some statistics:

$ curl 'localhost:9090/api/v1/query?query=up&stats=true' | jq .
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [...],
    "stats": {
      "timings": {
        "evalTotalTime": 9.0508e-05,
        "resultSortTime": 0,
        "queryPreparationTime": 5.3835e-05,
        "innerEvalTime": 2.7971e-05,
        "execQueueTime": 2.412e-06,
        "execTotalTime": 0.000100271
      }
    }
  }
}
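
If you want to compare those timings across runs, the stats payload is easy to post-process. Here is a small Python sketch; the JSON below is an illustrative response in the same shape as the output above, not live data:

```python
import json

# Illustrative payload shaped like a /api/v1/query response with
# stats=true (values copied from the example above, result elided).
response = json.loads("""
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [],
    "stats": {
      "timings": {
        "evalTotalTime": 9.0508e-05,
        "resultSortTime": 0,
        "queryPreparationTime": 5.3835e-05,
        "innerEvalTime": 2.7971e-05,
        "execQueueTime": 2.412e-06,
        "execTotalTime": 0.000100271
      }
    }
  }
}
""")

timings = response["data"]["stats"]["timings"]
# Print the timing buckets sorted by cost, largest first.
for name, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {seconds * 1000:.3f} ms")
```

The same loop works on the output of a real `curl ...&stats=true | jq .` call, piped into the script.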


--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-users+unsubscribe@googlegroups.com.
To post to this group, send email to prometheus-users@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/8741d265-fecd-499d-b891-5ae344b51987%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

eldad marciano

unread,
Jul 10, 2018, 6:23:58 AM7/10/18
to Simon Pasquier, Prometheus Users
Ok thanks a lot


eldad marciano

unread,
Jul 11, 2018, 12:27:18 PM7/11/18
to Simon Pasquier, Prometheus Users
Simon, can you say a word or two about innerEvalTime? Do you have any idea why it takes so long?
"stats":{"timings":{"evalTotalTime":6.500474902,"resultSortTime":0.000620076,"queryPreparationTime":0.021838381,"innerEvalTime":6.477876328,"resultAppendTime":0.000097736,"execQueueTime":0.000004624,"execTotalTime":6.500490179}}}}

Here is the query I used:
sum (rate (container_memory_rss{pod_name=~"^.*"}[2m])) by (pod_name)



Simon Pasquier

unread,
Jul 12, 2018, 3:50:49 AM7/12/18
to eldad marciano, Prometheus Users
On Wed, Jul 11, 2018 at 6:27 PM, eldad marciano <mse...@gmail.com> wrote:
Simon, can you say a word or two about innerEvalTime? Do you have any idea why it takes so long?
"stats":{"timings":{"evalTotalTime":6.500474902,"resultSortTime":0.000620076,"queryPreparationTime":0.021838381,"innerEvalTime":6.477876328,"resultAppendTime":0.000097736,"execQueueTime":0.000004624,"execTotalTime":6.500490179}}}}

Here is the query I used:
sum (rate (container_memory_rss{pod_name=~"^.*"}[2m])) by (pod_name)

It depends on how many distinct values of pod_name you have. It's probably a lot if it takes that much time.
I'd also remove the pod_name=~"^.*" selector.
Finally, if you're not on Prometheus v2.3.x, I'd give it a try since the query engine got some improvements there.
 





eldad marciano

unread,
Jul 12, 2018, 6:50:29 AM7/12/18
to Simon Pasquier, Prometheus Users
I have around 10K distinct values, on top of Prometheus 2.2.1 (Go 1.10).
I tried getting rid of the pod_name filter and a few other tweaks, but it doesn't affect the performance.

Is there a way to tell whether the data is read from memory or from disk?


eldad marciano

unread,
Jul 12, 2018, 8:02:12 AM7/12/18
to Simon Pasquier, Prometheus Users
Simon,
I ran a small benchmark of Prometheus 2.2.1 vs 2.3.1, using the same query with the same start and end timestamps, and it really does perform better.
Anyway, is there a way to log how many disk reads vs. RAM reads the query used?

v2.3.1
3.359047-":{"timings":{"evalTotalTime":3.213843488,"resultSortTime":0.000617039,"queryPreparationTime":0.014059529,"innerEvalTime":3.199121233,"execQueueTime":0.000002213,"execTotalTime":3.213858898}}}}
3.413554-":{"timings":{"evalTotalTime":3.25778293,"resultSortTime":0.000587423,"queryPreparationTime":0.015315705,"innerEvalTime":3.241856526,"execQueueTime":0.000002381,"execTotalTime":3.257797143}}}}

v2.2.1
8.065635-":{"timings":{"evalTotalTime":6.76108759,"resultSortTime":0.000591011,"queryPreparationTime":0.027974497,"innerEvalTime":6.732404572,"resultAppendTime":0.000086805,"execQueueTime":0.000004065,"execTotalTime":6.761101748}}}}
8.224094-":{"timings":{"evalTotalTime":6.9350228210000004,"resultSortTime":0.000583043,"queryPreparationTime":0.025800544,"innerEvalTime":6.908517224,"resultAppendTime":0.000092759,"execQueueTime":0.000002647,"execTotalTime":6.935035496}}}}

Simon Pasquier

unread,
Jul 12, 2018, 8:15:21 AM7/12/18
to eldad marciano, Prometheus Users
On Thu, Jul 12, 2018 at 2:01 PM, eldad marciano <mse...@gmail.com> wrote:
Simon,
I ran a small benchmark of Prometheus 2.2.1 vs 2.3.1, using the same query with the same start and end timestamps, and it really does perform better.

Thanks for reporting!

 
Anyway, is there a way to log how many disk reads vs. RAM reads the query used?

If your query covers the last 2 hours, it will only hit the write-ahead-log which is in RAM (that's my understanding at least, I might be wrong).
Otherwise it will hit block files that are mmap'ed to memory.
 

Alin Sînpălean

unread,
Jul 12, 2018, 8:36:45 AM7/12/18
to Prometheus Users
For one, the query plan is exactly defined by the AST (i.e. the result of parsing the expression). In the case of your query, it will first look up all series that match the selector; then compute the rate over the last 2 minutes for each (which may require unpacking quite a bit more than the last 2 minutes worth of samples, because of the way the TSDB works); and finally sum those results by pod_name. There is no fancy query plan optimization going on, as you would have with a DBMS.

Second, 10k distinct values isn't all that much, but you have to consider other dimensions too. I.e. do you have 10k series matching your selector, or do you have 10k different values for the pod_name label but many more matching series (because of other labels)?

And finally, if my guess is correct, container_memory_rss looks like a gauge (resident memory in use), so rate() is not the right function to use there. delta() might be more appropriate, but do you really care about the speed of change of memory utilization?

Cheers,
Alin.

eldad marciano

unread,
Jul 12, 2018, 11:05:25 AM7/12/18
to Alin Sînpălean, Prometheus Users
Thanks Alin,
yes, we do care about the speed (granularity) and how often it changes.
delta() does perform better, that's right, but the results are quite different and show negative numbers.

Also, we run 10K unique pods with unique pod names.


Brian Brazil

unread,
Jul 12, 2018, 11:14:39 AM7/12/18
to Alin Sînpălean, Prometheus Users
On 12 July 2018 at 13:36, Alin Sînpălean <alin.si...@gmail.com> wrote:
For one, the query plan is exactly defined by the AST (i.e. the result of parsing the expression). In the case of your query, it will first look up all series that match the selector; then compute the rate over the last 2 minutes for each (which may require unpacking quite a bit more than the last 2 minutes worth of samples, because of the way the TSDB works); and finally sum those results by pod_name. There is no fancy query plan optimization going on, as you would have with a DBMS.

Second, 10k distinct values isn't all that much, but you have to consider other dimensions too. I.e. do you have 10k series matching your selector, or do you have 10k different values for the pod_name label but many more matching series (because of other labels)?

And finally, if my guess is correct, container_memory_rss looks like a gauge (resident memory in use), so rate() is not the right function to use there. delta() might be more appropriate, but do you really care about the speed of change of memory utilization?

Yes, container_memory_rss is a gauge from cAdvisor. deriv() is generally what you want for how much a gauge is changing.

Brian
 
Alin Sînpălean

unread,
Jul 12, 2018, 2:16:05 PM7/12/18
to mse...@gmail.com, Prometheus Users
With rate(), a decrease in memory usage from 5.2 GB to 5.1 GB will look like an increase of 5.1 GB. You could do clamp_min(deriv(...), 0) if all you want is to drop negative values.
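
To make the rate()/delta()/deriv() difference concrete, here is a rough Python emulation of the semantics on made-up gauge samples. This is a simplification: PromQL's extrapolation to the window boundaries is ignored, and the values are purely illustrative.

```python
def simple_rate(samples, window_seconds):
    """rate() assumes a counter: a decrease looks like a counter reset,
    so the post-drop value is counted as a fresh increase."""
    increase = 0.0
    for prev, cur in zip(samples, samples[1:]):
        increase += cur if cur < prev else cur - prev
    return increase / window_seconds

def simple_deriv(times, values):
    """deriv() fits a least-squares line and returns its per-second slope."""
    n = len(times)
    mt, mv = sum(times) / n, sum(values) / n
    num = sum((t - mt) * (v - mv) for t, v in zip(times, values))
    den = sum((t - mt) ** 2 for t in times)
    return num / den

# RSS in GB sampled every 30s, dropping from 5.2 to 5.1 inside the window.
times = [0, 30, 60, 90, 120]
rss = [5.2, 5.2, 5.2, 5.1, 5.1]

print("rate :", simple_rate(rss, 120))                 # sees a 5.1 GB "increase"
print("delta:", rss[-1] - rss[0])                      # ~-0.1 GB over the window
print("deriv:", simple_deriv(times, rss))              # negative per-second slope
print("clamped:", max(simple_deriv(times, rss), 0.0))  # clamp_min(deriv(...), 0)
```

The drop from 5.2 to 5.1 makes rate() count the whole 5.1 as new increase, while delta()/deriv() correctly report a small decrease, which clamp_min then floors at zero.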

Cheers,
Alin.

eldad marciano

unread,
Jul 16, 2018, 6:33:39 AM7/16/18
to Alin Sînpălean, Prometheus Users
OK Alin, will try it.

May I ask a side question: I see I have 2669 goroutines. Does that mean active goroutines, or a log of goroutines?
Uptime: 2018-07-16 10:11:24.207324902 +0000 UTC
Working Directory: /prometheus
Number of goroutines: 2669
GOMAXPROCS: 40

Simon Pasquier

unread,
Jul 16, 2018, 7:50:46 AM7/16/18
to eldad marciano, Alin Sînpălean, Prometheus Users
You have 2669 active goroutines.


eldad marciano

unread,
Jul 16, 2018, 10:45:53 AM7/16/18
to Simon Pasquier, Alin Sînpălean, Prometheus Users
Is that tunable? Is there some pool to control it?


eldad marciano

unread,
Jul 16, 2018, 10:56:29 AM7/16/18
to Simon Pasquier, Alin Sînpălean, Prometheus Users
clamp_min(deriv(...), 0)
also has a long execution time...

Is there a way to filter out some of the result set elements, to return only pod_name, for instance, instead of the whole set?
{container_name="POD",endpoint="https-metrics",id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod0004f9bb_8779_11e8_83a7_fa163e4e9472.slice/docker-5292577c9320e819ef5247f33595b634e006e9d2f0f01b66f4ad0446229eb23a.scope",image="registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.10.10",instance="192.168.0.224:10250",job="kubelet",name="k8s_POD_sync-kx747_openshift-node_0004f9bb-8779-11e8-83a7-fa163e4e9472_0",namespace="openshift-node",pod_name="sync-kx747",service="kubelet"}



Simon Pasquier

unread,
Jul 16, 2018, 11:12:43 AM7/16/18
to eldad marciano, Alin Sînpălean, Prometheus Users
On Mon, Jul 16, 2018 at 4:45 PM, eldad marciano <mse...@gmail.com> wrote:
Is that tunable? Is there some pool to control it?

Not that I know of. Prometheus spawns as many goroutines as it needs; limiting this would mean dropping scrapes or web requests.
 


emar...@redhat.com

unread,
Jul 16, 2018, 5:51:38 PM7/16/18
to Prometheus Users
Guys,
I have good news: we found that the resolution (step size) dramatically affects query performance.
The default graph resolution in Prometheus is 3s, while in Grafana it is 15s; this small change significantly affects query performance.

The larger the step size, the faster the results. Most of the impact is on innerEvalTime, see the following:
step 15:
2018-07-16 17:44:50,240 - INFO - duration: 1.862589-":{"timings":{"evalTotalTime":1.778064077,"resultSortTime":0.007469064,"queryPreparationTime":0.349853693,"innerEvalTime":1.420712212,"execQueueTime":0.000001542,"execTotalTime":1.778074411}}}}
2018-07-16 17:44:52,327 - INFO - duration: 1.882672-":{"timings":{"evalTotalTime":1.805533145,"resultSortTime":0.007118621,"queryPreparationTime":0.335963486,"innerEvalTime":1.462436464,"execQueueTime":0.000001463,"execTotalTime":1.805540575}}}} 


step 5:
2018-07-16 17:47:00,519 - INFO - duration: 4.037879-":{"timings":{"evalTotalTime":3.8893069000000002,"resultSortTime":0.008419253,"queryPreparationTime":0.361821925,"innerEvalTime":3.519047432,"execQueueTime":0.000001234,"execTotalTime":3.889314616}}}}
2018-07-16 17:47:05,371 - INFO - duration: 4.309454-":{"timings":{"evalTotalTime":4.141177569,"resultSortTime":0.008191162,"queryPreparationTime":0.34855919,"innerEvalTime":3.784409001,"execQueueTime":0.000001366,"execTotalTime":4.141185931}}}}

emar...@redhat.com

unread,
Jul 16, 2018, 5:53:09 PM7/16/18
to Prometheus Users
Also, the impact between a step size of 5 and a step size of 3 is very large.

Alin Sînpălean

unread,
Jul 17, 2018, 3:30:36 AM7/17/18
to Prometheus Users
With apologies for stating the obvious, doing fewer calculations (i.e. one for every 15 seconds) is faster than doing more (one for every 3 seconds). If I had to guess, about 5 times faster (minus some of the overhead, that may improve less than linearly).
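
As a rough sketch of that scaling: a range query evaluates the expression once per step, so the work grows with the number of evaluation timestamps. This is a simplified model; real timings also include fixed per-query overhead, which is why the measured gains above are less than linear.

```python
# Number of evaluation timestamps a range query performs over
# [start, end] at a given step, in seconds.
def eval_steps(start, end, step):
    return (end - start) // step + 1

one_hour = 3600
for step in (3, 5, 15):
    print(f"step {step:2d}s -> {eval_steps(0, one_hour, step)} evaluations")
```

Going from a 3s to a 15s step over one hour cuts the evaluation count by roughly 5x, which matches the ballpark speedups reported in this thread.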

Cheers,
Alin.

eldad marciano

unread,
Jul 17, 2018, 4:56:45 AM7/17/18
to Alin Sînpălean, Prometheus Users

Changing it produces different results, so if it were only about the amount of calculation, we wouldn't expect the results to be constant.


Eldad Marciano

unread,
Jul 17, 2018, 8:51:26 AM7/17/18
to eldad marciano, Alin Sînpălean, Prometheus Users
Is there another way to filter elements in a time series besides sum? For instance, take the following input, where we want to filter on pod_name:
{container_name="POD",endpoint="https-metrics",id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod0004f9bb_8779_11e8_83a7_fa163e4e9472.slice/docker-5292577c9320e819ef5247f33595b634e006e9d2f0f01b66f4ad0446229eb23a.scope",image="registry.reg-aws.openshift.com:443/openshift3/ose-pod:v3.10.10",instance="192.168.0.224:10250",job="kubelet",name="k8s_POD_sync-kx747_openshift-node_0004f9bb-8779-11e8-83a7-fa163e4e9472_0",namespace="openshift-node",pod_name="sync-kx747",service="kubelet"}

We can do it with `sum by (pod_name) ...`.
Is there any other alternative, just to filter on the pod_name field?

Simon Pasquier

unread,
Jul 17, 2018, 9:32:23 AM7/17/18
to Eldad Marciano, eldad marciano, Alin Sînpălean, Prometheus Users
Aggregation operators are listed here:

I'm not sure you'll gain something as it means additional computation on the Prometheus side.


Eldad Marciano

unread,
Jul 18, 2018, 6:40:05 AM7/18/18
to Simon Pasquier, eldad marciano, Alin Sînpălean, Prometheus Users
OK, is it possible to do rate(...sum()[range]) instead of sum(rate(...[range]))? Unfortunately it didn't work. I'm trying to figure out whether we can reduce the result set first and then rate it, to gain some performance.
