The Future of Classic Histograms; Moving to (Custom) Native Histograms?


Bartłomiej Płotka

unread,
Jun 10, 2024, 5:49:42 AM
to Prometheus Developers, Bjoern Rabenstein
Hi!

I don't know if we ever discussed this in a bigger forum, so let's start: I would like to gauge the team's and the community's feelings about the idea of deprecating and, in some (very) distant future, replacing classic histograms with native histograms (with different bucketing options). This is not a formal proposal, but rather initial thoughts on this vision and intention.

Why?
  1. Classic histograms are one of the main sources of cardinality, on top of other inefficiencies. This is because each bucket is a separate, unique counter series that has to be indexed and represented everywhere (e.g. in the exposition format, query responses, remote write, etc.). They are not "sparse" (if you had zero observations in a bucket, you still pay for it) and they are always floats. This increases cost and, in return, lowers the accuracy of histograms, because users have to cap their resolution to a minimum.
  2. They are generally not usable without series scrape transactionality. That transactionality is non-trivial to support across the Prometheus server, especially when integrating with remote storage.
  3. They only support manual, "custom" (explicit) bucketing.
  4. Aggregating over time/labels for histograms with different buckets is impossible.
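To make the 4th problem concrete, here is a small sketch (with made-up bucket boundaries and counts) of why merging two classic histograms with different explicit buckets loses resolution: sums are only exact at boundaries present in both histograms.

```python
# Illustrative sketch (hypothetical data): two classic histograms with
# different explicit bucket boundaries. Counts are cumulative per "le"
# boundary, as in Prometheus classic histograms.
h1 = {0.1: 5, 0.5: 12, 1.0: 20, float("inf"): 22}  # le=0.1, 0.5, 1, +Inf
h2 = {0.25: 3, 1.0: 9, float("inf"): 10}           # le=0.25, 1, +Inf

# The only boundaries where the sum is exact are those present in BOTH
# histograms; anything else would require guessing how observations are
# distributed inside a bucket.
common = sorted(set(h1) & set(h2))
merged = {le: h1[le] + h2[le] for le in common}
print(common)  # [1.0, inf] -- most of the resolution is lost
print(merged)  # {1.0: 29, inf: 32}
```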
How?

Native histograms (the design is here; it is also covered on the Grafana blog) were designed to solve all 4 problems above and are implemented almost everywhere by now. In terms of bucketing, native histograms were implemented with sparse, exponential bucketing. Generally, this is what I see as the most efficient and recommended bucketing going forward for any new histogram in the Prometheus ecosystem.
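For intuition, a minimal sketch of the exponential bucketing scheme native histograms use: for a resolution parameter (the "schema" s), bucket boundaries lie on powers of a base of 2 raised to 2^-s, so a higher schema means finer buckets.

```python
# Sketch of the standard exponential bucketing used by native histograms:
# for schema s, the bucket growth factor is base = 2 ** (2 ** -s), and
# bucket `index` covers the interval (base**(index-1), base**index].
def base(schema: int) -> float:
    return 2.0 ** (2.0 ** -schema)

def upper_bound(schema: int, index: int) -> float:
    return base(schema) ** index

print(base(0))            # 2.0 -- buckets double each step
print(base(3))            # ~1.0905 -- roughly 9% growth per bucket
print(upper_bound(0, 3))  # 8.0
```

Because every histogram at a given schema shares the same grid (and coarser schemas are strict subsets of finer ones), histograms remain mergeable across series and time.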

There are two main friction points in migrating to native histograms:

A) Native histograms come with some minor changes to how you query them via PromQL: you don't need the "_bucket" suffix, and there are new functions, e.g. histogram_avg, histogram_count, histogram_sum and histogram_fraction. For example, `histogram_quantile(0.9, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))` becomes `histogram_quantile(0.9, sum(rate(http_request_duration_seconds[5m])))`.
B) The new bucketing can, in theory, break some users who rely on classic histograms now. It also might not fit some use cases (?)

For B, luckily, native histograms were designed to allow a flexible bucketing "schema" logic. As a result, we recently proposed and started implementing the "custom native histogram", which is shorthand for a native histogram with a custom bucketing schema (also sometimes referred to as NHCB in the code and PRs). This gives the "classic histogram" bucketing model, while keeping many of the efficiency and transactionality benefits (it solves problems 1 to 3 above).
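A rough sketch of the idea (illustrative only, not the actual Prometheus data structures): an NHCB keeps the explicit boundaries and the counts together in one sample, instead of one series per bucket, and empty buckets need not be stored.

```python
# Illustrative sketch (hypothetical structure, not Prometheus internals):
# a custom native histogram carries its explicit bucket boundaries and a
# sparse set of bucket counts in a SINGLE sample, so empty buckets cost
# nothing and the whole histogram is ingested transactionally.
from dataclasses import dataclass

@dataclass
class CustomNativeHistogram:
    custom_bounds: list[float]  # explicit boundaries, e.g. from the old classic histogram
    counts: dict[int, int]      # sparse: bucket index -> count
    count: int                  # total number of observations
    sum: float                  # sum of all observations

h = CustomNativeHistogram(
    custom_bounds=[0.1, 0.5, 1.0],
    counts={0: 5, 2: 8},        # buckets 1 and the overflow bucket are empty and omitted
    count=13,
    sum=4.2,
)
print(len(h.counts))  # 2 stored buckets, versus one indexed series per bucket plus _sum and _count
```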

With custom native histograms, it's trivial to add Prometheus modes that translate ALL classic histograms to custom native ones on storage to get those benefits now. Prometheus can hide this fact and translate those back (to some degree) on the PromQL layer (addressing A). Similarly, converted histograms can be reliably (in terms of transactionality) and efficiently forwarded via remote write (Prometheus Remote-Write 2.0 now supports custom native histograms!). Receivers can choose to interpret those as native histograms or translate them back to classic histograms for PromQL compatibility.

Custom native histograms are also great for migrating to native histogram PromQL semantics, if one chooses to do so.

Note that custom native histograms are NOT the end goal. Ideally, most users migrate to native histograms with sparse, exponential bucketing for further efficiency gains. On top of that, the main disadvantage of custom bucketing compared to exponential bucketing is that, as with classic histograms, querying or aggregating histograms with different bucket layouts is simply wrong (the 4th problem above).

Potential Future

The obvious question is: do we still need classic histograms? I don't think so, but I wonder what I am missing.

Would you agree that the long-term plan is to replace and deprecate classic histograms? Can we at least share that intention?

Another question is the migration logic, which might be a separate discussion. So far we have discussed things like emitting both classic and native histograms, but that feels like overhead. Can we do better? Can we solve problem 1 (efficiency) straight away? Why not use custom native histograms for every classic histogram and then have compatibility modes on the query side if needed?

I am sure @Bjoern Rabenstein, @Jeanette and @krajorama already have some thoughts around this.

Kind Regards,
Bartek Płotka (@bwplotka)

Fabian Stäber

unread,
Jun 10, 2024, 6:05:03 AM
to Prometheus Developers
Thanks Bartek for bringing this up.

For context, here's what the Prometheus Java client library has done since release 1.x:
  • Internally, the client library tracks both the classic and the native histogram representation by default. The "performance" section in the docs recommends turning one of them off for high performance applications. https://prometheus.github.io/client_java/getting-started/performance/
  • If the Prometheus server scrapes text format, it will get the classic histogram.
  • If the Prometheus server scrapes the Protobuf format, it will get both the classic and the native histogram.
Fabian


Bartłomiej Płotka

unread,
Jun 10, 2024, 6:36:17 AM
to Prometheus Developers
Nice! That approach does not feel too bad. In practice, sending both should not "add" a lot of overhead (one extra series per histogram). We might reuse your approach in client_golang too.

However, my point is that with custom native histograms, we can immediately "improve" performance, by default (e.g. when proto is used). Thoughts?

Kind Regards,
Bartek

Bartłomiej Płotka

unread,
Jun 10, 2024, 10:32:42 AM
to Prometheus Developers
Fabian

>  Internally, the client library tracks both the classic and the native histogram representation by default. The "performance" section in the docs recommends turning one of them off for high performance applications. https://prometheus.github.io/client_java/getting-started/performance/

BTW, what exponential factor do you use for those "default" native histograms from classic ones? Do you take into consideration any aspect of the currently defined buckets?

Kind Regards,
Bartek

Bjoern Rabenstein

unread,
Jun 12, 2024, 9:33:45 AM
to Bartłomiej Płotka, Prometheus Developers
Here is my idea for a deprecation plan:

1. On the server side, including PromQL:

A future PromQL server should still be able to recognize classic
histograms when scraped, but only to convert them to native histograms
with custom buckets (NHCB). From that point on (storage, query, remote
write), it should have no notion of classic histograms anymore.

This is also a reason why I think a "reverse transparency" layer to
query NHCB as if they were classic histograms is undesirable. We want
to get rid of the query patterns for classic histograms. (Another
reason is that it will be really hard to implement such a "reverse
transparency" layer reliably.)


2. On the instrumentation side, including exposition formats:

In direct instrumentation, if you need a histogram, you should default
to a native histogram with a standard exponential schema (which
guarantees mergeability across time and space and is compatible with
OTel's exponential histogram). Only if you need custom bucket
boundaries for some reason should you use an NHCB. Generally, I think
the libraries can even keep their existing API for that. If you
instrument a histogram in the classic way, the API doesn't actually
tell you that this will result in a classic histogram. It just asks
you for the bucket boundaries, and then you call `Observe` as you do
for a native histogram, too. Hence, future libraries can just expose
an NHCB in that case.
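A minimal sketch of that point (class and method names here are hypothetical, not any real client library's API): the instrumentation API only takes bucket boundaries and observations, so nothing in it commits the library to exposing a classic histogram rather than an NHCB.

```python
# Hypothetical minimal histogram: the caller supplies boundaries and
# calls observe(); whether the library later exposes this as a classic
# histogram or as an NHCB is invisible at this API surface.
import bisect

class Histogram:
    def __init__(self, buckets):
        self.bounds = sorted(buckets)
        self.counts = [0] * (len(self.bounds) + 1)  # last slot is the +Inf bucket
        self.sum = 0.0
        self.count = 0

    def observe(self, value: float) -> None:
        # Identical call for either exposition flavor.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.sum += value
        self.count += 1

h = Histogram(buckets=[0.1, 0.5, 1.0])
for v in (0.05, 0.3, 2.0):
    h.observe(v)
print(h.counts)  # [1, 1, 0, 1]
```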

If you translate a histogram from a 3rd party source, you use a
suitable flavor of native histograms as the translation target. In the
unlikely case that the source histogram fits the exponential bucketing
schema, use a regular exponential native histogram. For specific types
of 3rd party histograms (e.g. DDSketch, but there are many more), we
might implement additional schemas of native histograms that directly
accommodate them. And finally, if nothing else fits, you do NHCB.
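The translation decision above can be sketched as a check of whether the source histogram's boundaries happen to lie on the standard exponential grid for some schema; if they do, a regular exponential native histogram is a lossless target, otherwise fall back to NHCB. (The helper below is illustrative, not a real API.)

```python
# Hypothetical helper: do the given bucket boundaries all fall on the
# exponential grid base = 2 ** (2 ** -schema)? If yes, translation to a
# regular exponential native histogram is lossless; if no, use NHCB.
import math

def fits_schema(bounds, schema: int, tol: float = 1e-9) -> bool:
    base_log = math.log(2.0) * (2.0 ** -schema)  # log of the bucket growth factor
    for b in bounds:
        index = math.log(b) / base_log  # which grid position this bound would be
        if abs(index - round(index)) > tol:
            return False
    return True

print(fits_schema([0.25, 0.5, 1.0, 2.0], schema=0))  # True: all powers of 2
print(fits_schema([0.1, 0.5, 1.0], schema=0))        # False: needs NHCB
```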

The exposition formats of the future should be able to represent all
flavors of native histograms, so that we don't need to expose classic
histograms anymore.

_However_, the existing Prometheus exposition formats are so
ubiquitous by now that I don't think they will ever die. For as long
as technically feasible, Prometheus servers should be able to
understand old exposition formats. Which circles back to the very
beginning: Any Prometheus server should still understand classic
histograms, but convert them into NHCB on scrape.

--
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in