Any best practices on using a limited number of le's for a given histogram?


rs vas

Sep 2, 2020, 3:38:44 AM
to Prometheus Users
We are seeing an issue with a histogram we generated with 69 le's: it creates a unique time series per le. The metric has a few labels, for example: method, operation, identifier, ip.

This single metric alone generates about 17k unique time series from a single host (at 69 buckets per combination, that implies roughly 250 distinct label combinations), and across all hosts the total comes to about 300k time series for this xxxxx_seconds_bucket series.

I have been reading some best practices on defining buckets, and we will have to consider SLOs and so on, but my questions are:
  • Is there a good number we should not cross when defining buckets, for example no more than 10 le's?
  • Is there a good maximum number of labels for a single metric?
  • Is there a good limit on the cardinality of a single metric at a given point in time? If I query count(xxxxx_seconds_bucket), what is the number we should not cross?
I have a feeling this is going to kill Prometheus at some point if we can't limit the number of le's. Any input is appreciated.

Example of the defined buckets (having 69 le's looks like a real concern here):
xxxxx_seconds_bucket{...="none",le="0.001",} 71849.0
xxxxx_seconds_bucket{...="none",le="0.001048576",} 72078.0
xxxxx_seconds_bucket{...="none",le="0.001398101",} 73083.0
xxxxx_seconds_bucket{...="none",le="0.001747626",} 73600.0
xxxxx_seconds_bucket{...="none",le="0.002097151",} 73943.0
xxxxx_seconds_bucket{...="none",le="0.002446676",} 74160.0
xxxxx_seconds_bucket{...="none",le="0.002796201",} 74399.0
xxxxx_seconds_bucket{...="none",le="0.003145726",} 74936.0
xxxxx_seconds_bucket{...="none",le="0.003495251",} 75109.0
xxxxx_seconds_bucket{...="none",le="0.003844776",} 75227.0
xxxxx_seconds_bucket{...="none",le="0.004194304",} 75336.0
xxxxx_seconds_bucket{...="none",le="0.005592405",} 75674.0
xxxxx_seconds_bucket{...="none",le="0.006990506",} 75885.0
xxxxx_seconds_bucket{...="none",le="0.008388607",} 75958.0
xxxxx_seconds_bucket{...="none",le="0.009786708",} 75981.0
xxxxx_seconds_bucket{...="none",le="0.011184809",} 75995.0
xxxxx_seconds_bucket{...="none",le="0.01258291",} 76004.0
xxxxx_seconds_bucket{...="none",le="0.013981011",} 76005.0
xxxxx_seconds_bucket{...="none",le="0.015379112",} 76008.0
xxxxx_seconds_bucket{...="none",le="0.016777216",} 76013.0
xxxxx_seconds_bucket{...="none",le="0.022369621",} 76033.0
xxxxx_seconds_bucket{...="none",le="0.027962026",} 76039.0
xxxxx_seconds_bucket{...="none",le="0.033554431",} 76039.0
xxxxx_seconds_bucket{...="none",le="0.039146836",} 76039.0
xxxxx_seconds_bucket{...="none",le="0.044739241",} 76039.0
xxxxx_seconds_bucket{...="none",le="0.050331646",} 76039.0
xxxxx_seconds_bucket{...="none",le="0.055924051",} 76039.0
xxxxx_seconds_bucket{...="none",le="0.061516456",} 76042.0
xxxxx_seconds_bucket{...="none",le="0.067108864",} 76043.0
xxxxx_seconds_bucket{...="none",le="0.089478485",} 76044.0
xxxxx_seconds_bucket{...="none",le="0.111848106",} 76044.0
xxxxx_seconds_bucket{...="none",le="0.134217727",} 76044.0
xxxxx_seconds_bucket{...="none",le="0.156587348",} 76044.0
xxxxx_seconds_bucket{...="none",le="0.178956969",} 76044.0
xxxxx_seconds_bucket{...="none",le="0.20132659",} 76045.0
xxxxx_seconds_bucket{...="none",le="0.223696211",} 76045.0
xxxxx_seconds_bucket{...="none",le="0.246065832",} 76046.0
xxxxx_seconds_bucket{...="none",le="0.268435456",} 76046.0
xxxxx_seconds_bucket{...="none",le="0.357913941",} 76046.0
xxxxx_seconds_bucket{...,le="0.447392426",} 76057.0
xxxxx_seconds_bucket{...,le="0.536870911",} 76061.0
xxxxx_seconds_bucket{...,le="0.626349396",} 76061.0
xxxxx_seconds_bucket{...,le="0.715827881",} 76064.0
xxxxx_seconds_bucket{...,le="0.805306366",} 76085.0
xxxxx_seconds_bucket{...,le="0.894784851",} 76085.0
xxxxx_seconds_bucket{...,le="0.984263336",} 76086.0
xxxxx_seconds_bucket{...,le="1.073741824",} 76086.0
xxxxx_seconds_bucket{...,le="1.431655765",} 76086.0
xxxxx_seconds_bucket{...,le="1.789569706",} 76088.0
xxxxx_seconds_bucket{...,le="2.147483647",} 76116.0
xxxxx_seconds_bucket{...,le="2.505397588",} 76116.0
xxxxx_seconds_bucket{...,le="2.863311529",} 76116.0
xxxxx_seconds_bucket{...,le="3.22122547",} 76116.0
xxxxx_seconds_bucket{...,le="3.579139411",} 76116.0
xxxxx_seconds_bucket{...,le="3.937053352",} 76116.0
xxxxx_seconds_bucket{...,le="4.294967296",} 76116.0
xxxxx_seconds_bucket{...,le="5.726623061",} 76116.0
xxxxx_seconds_bucket{...,le="7.158278826",} 76116.0
xxxxx_seconds_bucket{...,le="8.589934591",} 76116.0
xxxxx_seconds_bucket{...,le="10.021590356",} 76116.0
xxxxx_seconds_bucket{...,le="11.453246121",} 76116.0
xxxxx_seconds_bucket{...,le="12.884901886",} 76116.0
xxxxx_seconds_bucket{...,le="14.316557651",} 76116.0
xxxxx_seconds_bucket{...,le="15.748213416",} 76116.0
xxxxx_seconds_bucket{...,le="17.179869184",} 76116.0
xxxxx_seconds_bucket{...,le="22.906492245",} 76116.0
xxxxx_seconds_bucket{...,le="28.633115306",} 76116.0
xxxxx_seconds_bucket{...,le="30.0",} 76116.0
xxxxx_seconds_bucket{...,le="+Inf",} 76116.0

Matthias Rampke

Sep 3, 2020, 2:50:01 AM
to rs vas, Prometheus Users
For a (conservative) guide see this article. You have the right intuition – ~10 buckets is a good number.
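For illustration, here is a minimal sketch of what a ~10-bucket histogram could look like with the Python client. This is only a sketch: the bucket boundaries, metric name, and labels are assumptions, not taken from the original post.

from prometheus_client import Histogram

# Sketch only: ~10 buckets chosen around an assumed latency target,
# instead of 69 generated ones. Names and boundaries are illustrative.
REQUEST_SECONDS = Histogram(
    "xxxxx_seconds",
    "Request duration in seconds",
    labelnames=["method", "operation"],  # keep label cardinality low
    buckets=[0.001, 0.0025, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 1.0, 5.0],
)

REQUEST_SECONDS.labels(method="GET", operation="read").observe(0.042)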

You can go higher if you use high-powered machines (mostly: lots of RAM) for Prometheus, but you will run into increasing problems as either the cardinality of a single label or the cardinality of the metric as a whole grows large. Some of this you can work around by aggressively pre-aggregating in recording rules; otherwise it will be impossible to graph any reasonable timeframe.
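As a sketch of such pre-aggregation, a recording rule could sum away the per-host dimension before dashboards query the data. The expression below is only an example under assumptions (the instance and ip label names are guesses), not a drop-in rule:

# Expression for a recording rule (e.g. recorded as
# job:xxxxx_seconds_bucket:rate5m); aggregates away the per-host
# labels so graphs read far fewer series.
sum without (instance, ip) (rate(xxxxx_seconds_bucket[5m]))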

/MR


rs vas

Sep 4, 2020, 2:31:45 AM
to Matthias Rampke, Prometheus Users
Thank you /MR for your input, very helpful!

Are there any recommendations/standards on how many series a single metric can generate, i.e. some number it should not go beyond? For example:
metric_name{label1, label2, label3}
label1 can have 10 different values
label2 can have 50 different values
label3 can have 100 different values
The total number of unique series generated from a single node for this is 50,000 (10 × 50 × 100).
If we have 1,000 nodes, the total number of these unique series would be 50 million (50,000,000).

This does not look like a great design, with so much cardinality in the label values. Basically, the number of values each label can take has a major impact on overall Prometheus performance and availability.

Is there any recommendation that a single metric's label combinations should not go beyond some number? For example: a metric on a single node should not generate 50,000 unique time series in a single scrape, or something like that.

rs vas

Ben Kochie

Sep 4, 2020, 2:54:52 AM
to rs vas, Matthias Rampke, Prometheus Users
My typical recommendation is that individual targets should try to keep their total number of series under 10k, counting all metric/label combinations per target.
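As a way to check where each target stands against such a budget, Prometheus records the number of samples it ingested per scrape, which can be queried, for example:

# Top 10 targets by samples ingested in their most recent scrape:
topk(10, scrape_samples_scraped)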

Brian Candler

Sep 4, 2020, 3:13:21 AM
to Prometheus Users
Another guideline I have seen is no more than 2 million time series per Prometheus server. Beyond that, you should think about sharding (i.e. multiple Prometheus servers, each scraping a subset of targets).
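As a quick check against that guideline, Prometheus exposes its own in-memory series count, which can be graphed or alerted on:

# Current number of series in the head (in-memory) block of this server:
prometheus_tsdb_head_series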

Bjoern Rabenstein

Sep 8, 2020, 4:56:49 PM
to rs vas, Prometheus Users
On 02.09.20 00:38, rs vas wrote:
>
> • Is there a good number we should not cross when defining buckets, for
> example no more than 10 le's?

It all really depends on your total cardinality. It's fine to create a
histogram with loads of buckets if that's only exposed on three
targets and has no further labels at all.

In your case, where you have many hosts _and_ partitioning by a bunch
of other labels with some significant cardinality, too, you really
have to be careful with the number of buckets.

A common pattern for something like HTTP request metrics is to have a
counter with many labels (like method, path, status code, ...) and
then a histogram for the request duration with no further labels (or
at least only a few with low cardinality). In that way, you cannot
calculate latency per status code and such, but it might be a good
compromise.
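
A minimal sketch of that pattern with the Python client might look as follows; the metric and label names are illustrative only, not prescribed:

from prometheus_client import Counter, Histogram

# Rich labels on the cheap counter: one series per label combination.
HTTP_REQUESTS = Counter(
    "http_requests_total",
    "Total HTTP requests",
    ["method", "path", "code"],
)

# No labels on the expensive histogram: one series per bucket, once.
HTTP_DURATION = Histogram(
    "http_request_duration_seconds",
    "HTTP request duration in seconds",
)

HTTP_REQUESTS.labels(method="GET", path="/api", code="200").inc()
HTTP_DURATION.observe(0.042)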

In other news, I'm working on ways to allow high-res histograms in
the future, see
https://grafana.com/blog/2020/08/24/kubecon-cloudnativecon-eu-recap-better-histograms-for-prometheus/
for a bunch of links to talks etc.

--
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

Rong Hu

Sep 15, 2020, 3:41:25 PM
to Prometheus Users
We would love to learn more about the roadmap for histogram improvements and rough timeline / estimates for earliest GA. We are trying to standardize our metrics on Prometheus internally and have lots of DDSketch histograms to migrate. In the short term we plan to roughly translate existing DDSketch buckets to default histogram buckets. It would greatly incentivize migration internally if the feature gap is filled.
Thank you for doing this valuable work! 

Rong Hu
Airbnb

Aliaksandr Valialkin

Sep 15, 2020, 8:09:54 PM
to Rong Hu, Prometheus Users
FYI, the following article is quite interesting re histograms - https://linuxczar.net/blog/2020/08/13/histogram-error/



--
Best Regards,

Aliaksandr Valialkin, CTO VictoriaMetrics

Bjoern Rabenstein

Sep 23, 2020, 12:20:34 PM
to Rong Hu, Prometheus Users
On 15.09.20 12:41, 'Rong Hu' via Prometheus Users wrote:
> We would love to learn more about the roadmap for histogram improvements and
> rough timeline / estimates for earliest GA. We are trying to standardize our
> metrics on Prometheus internally and have lots of DDSketch histograms to
> migrate. In the short term we plan to roughly translate existing DDSketch
> buckets to default histogram buckets. It would greatly incentivize migration
> internally if the feature gap is filled.

I'm afraid that's currently bottlenecked on me. I'm doing the research
and gathering all the considerations right now to write up a document,
which can then be discussed and finalized by the community, so that we
can all work together to implement it in Prometheus.

Since I kept getting distracted by "real life" things like ringing
pagers or "small" work commitments in my day job, it's hard to give a
reliable ETA even for publishing that doc, beyond "as soon as
possible". I'm very sorry for that. In any case, expect a very careful
discussion of the new concepts to introduce, and then an experimental
implementation that will take its time before it finds its way into
released versions of Prometheus. This is hard to get right in a way
that won't require another change very soon, so we have to be careful
in what we promise as production-ready and what's still experimental.

I shared my current state in my recent KubeCon talk:
https://www.youtube.com/watch?v=HG7uzON-IDM

It's entirely possible (and desirable) that similar approaches in
other metrics systems, like the DDSketch histograms, can be converted
into the new Prometheus histograms. But it's too early to guarantee
that.

Bjoern Rabenstein

Sep 23, 2020, 12:39:56 PM
to Aliaksandr Valialkin, Rong Hu, Prometheus Users
On 16.09.20 03:09, Aliaksandr Valialkin wrote:
> FYI, the following article is quite interesting re histograms -
> https://linuxczar.net/blog/2020/08/13/histogram-error/

Thanks for the pointer. The interpolation problem discussed there is
plainly a bug, if the blog post got that all right. I filed
https://github.com/prometheus/prometheus/issues/7970
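
For context, the interpolation in question is what histogram_quantile() applies within a bucket, e.g. in a query like the following (using the placeholder metric name from earlier in this thread):

histogram_quantile(0.99, sum by (le) (rate(xxxxx_seconds_bucket[5m])))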