Hello all,
TL;DR: measuring `http_request_duration_seconds` on the query path is a bad proxy for query latency, as it does not account for data distribution or the number of samples/series touched by a query (both of which significantly affect query performance).
---
I'm exploring more granular performance metrics for Prometheus queries downstream in Thanos (inspired by this discussion from Ian Billet) and wanted to reach out to the Prometheus developer community for ideas on how people are measuring and tracking query performance systematically.
The aim is to create a new metric that captures these additional dimensions of a query, so that query-performance SLIs can be understood and quantified in terms of the number of samples/series a query touches.
The current solution I have arrived at is a crude n-dimensional histogram, where query duration is observed/bucketed with labels representing some scale (simplified to t-shirt sizes) of samples touched and series queried. This would allow me to query for query-duration quantiles over given ranges of sample/series counts (e.g. 90% of queries touching up to 1,000,000 samples and up to 10 series complete in less than 2s).
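To make the idea concrete, here is a minimal sketch of the bucketing scheme. The threshold values, label names, and helper functions below are all hypothetical illustrations, not a proposed implementation; in practice the observations would go into a labelled Prometheus histogram (e.g. a HistogramVec with `samples_scale`/`series_scale` labels) rather than the toy dict used here.

```python
import bisect
from collections import defaultdict

# Hypothetical t-shirt-size boundaries; real values would be tuned
# per deployment based on observed query workloads.
SAMPLE_BOUNDS = [1_000, 100_000, 1_000_000]  # -> s, m, l, xl
SERIES_BOUNDS = [10, 100, 1_000]

SIZES = ["s", "m", "l", "xl"]

def scale_label(count, bounds):
    """Map a raw samples/series count onto a t-shirt-size label."""
    return SIZES[bisect.bisect_right(bounds, count)]

# Toy stand-in for a labelled histogram: one list of durations per
# (samples_scale, series_scale) label pair.
observations = defaultdict(list)

def observe_query(duration_s, samples_touched, series_touched):
    """Record a query duration under its size labels."""
    key = (scale_label(samples_touched, SAMPLE_BOUNDS),
           scale_label(series_touched, SERIES_BOUNDS))
    observations[key].append(duration_s)
```

With this shape, the quantile question in the example above becomes: take the durations recorded under `("l", "s")` (up to 1,000,000 samples, up to 10 series) and check their 90th percentile against the 2s target.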
I would love to hear about other approaches members of the community have taken for capturing this level of performance granularity in a metric (as well as to stir the pot w.r.t. the Thanos proposal).
Thanks,
Moad.