Inf buckets in Native Histograms

Fabian Stäber

Sep 6, 2022, 5:50:37 AM
to Prometheus Developers
Hi,

I'm working on an experimental native histogram implementation in client_java.

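Looking at client_golang, it seems you can observe math.Inf(), and bucket index math.MaxInt32 is used to represent the Inf bucket.

https://github.com/prometheus/client_golang/blob/95cf173f1965388665dcb2a28971f35af280e3a5/prometheus/histogram.go#L589-L590
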
I'm wondering how to represent the Inf bucket as a BucketSpan in protobuf.
Initially I set the offset to current index minus previous index, but obviously that doesn't work if the current index is MaxInt32.
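
To make the overflow concrete, here is a minimal Python sketch. It is not client_java or client_golang code. It assumes bucket indices are signed 32-bit integers, that the span offset is stored in a signed 32-bit field, and that the offset is computed as "current index minus previous index" as described above. The helper names (span_offset, fits_int32, wrap_int32) are hypothetical.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


def span_offset(prev_index, cur_index):
    """Offset as described in the message: current index minus previous index."""
    return cur_index - prev_index


def fits_int32(value):
    """True if value can be stored in a signed 32-bit field without overflow."""
    return INT32_MIN <= value <= INT32_MAX


def wrap_int32(value):
    """Emulate two's-complement wraparound of 32-bit signed arithmetic."""
    return (value + 2**31) % 2**32 - 2**31


# Assumed scenario: the previous populated bucket has a negative index and
# the current one is the Inf bucket at index INT32_MAX.
offset = span_offset(-5, INT32_MAX)
overflows = not fits_int32(offset)   # the true difference exceeds INT32_MAX
wrapped = wrap_int32(offset)         # what 32-bit arithmetic would produce
```

With a positive previous index the difference still fits in 32 bits, so in this sketch the overflow only shows up when the previous index is negative.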

Any ideas?


Fabian

Bjoern Rabenstein

Sep 7, 2022, 2:15:10 PM
to Fabian Stäber, Prometheus Developers
On 06.09.22 02:50, 'Fabian Stäber' via Prometheus Developers wrote:
>
> Looking at client_golang, it seems you can observe math.Inf(), and bucket
> index math.MaxInt32 is used to represent the Inf bucket.
>
> https://github.com/prometheus/client_golang/blob/95cf173f1965388665dcb2a28971f35af280e3a5/prometheus/histogram.go#L589-L590
>
> I'm wondering how to represent the Inf bucket as a BucketSpan in protobuf.
> Initially I set the offset to current index minus previous index, but
> obviously that doesn't work if the current index is MaxInt32.
>
> Any ideas?

Yeah, very good question. And definitely something that needs to get
ironed out before coming up with a final spec for Native Histograms.

In practice, I think, observations of ±Inf will be irrelevant. They set
the sum of observations to ±Inf, too (or even to NaN if it was +Inf
before and then -Inf is observed or vice versa), thereby rendering the
sum useless.

My idea so far was to put observations of ±Inf and even NaN in no
bucket at all, let them "ruin" the sum of observations (setting it to
±Inf or NaN as appropriate), and increment the count of observations
as usual. In that way, the difference between observations in buckets
and observations in the count would account for all those
observations. The downside is that you cannot distinguish between the
three types of "weird" observations (+Inf, -Inf, NaN). On the other
hand, I don't think we should add a whole lot of costly plumbing
throughout the stack to store them separately.
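
A minimal Python sketch of this idea follows. It is not how any Prometheus client library is implemented. The class name SketchHistogram and its fields are hypothetical, and the bucket index (ceil(log2(|v|)), i.e. base-2 buckets) is a simplification chosen only for illustration.

```python
import math
from collections import Counter


class SketchHistogram:
    """Toy histogram: ±Inf and NaN go in no bucket but still affect count and sum."""

    def __init__(self):
        self.count = 0
        self.sum = 0.0
        self.zero_count = 0
        self.positive = Counter()  # bucket index -> count, for values > 0
        self.negative = Counter()  # bucket index -> count, for values < 0

    def observe(self, value):
        # Count and sum are always updated, so ±Inf or NaN "ruins" the sum.
        self.count += 1
        self.sum += value
        if math.isnan(value) or math.isinf(value):
            return  # deliberately placed in no bucket
        if value == 0:
            self.zero_count += 1
            return
        # Simplified, assumed bucketing: powers of two.
        index = math.ceil(math.log2(abs(value)))
        if value > 0:
            self.positive[index] += 1
        else:
            self.negative[index] += 1

    def unbucketed(self):
        """Observations counted but not bucketed (+Inf, -Inf, NaN combined)."""
        bucketed = (self.zero_count
                    + sum(self.positive.values())
                    + sum(self.negative.values()))
        return self.count - bucketed


h = SketchHistogram()
for v in (1.5, 0.0, -3.0, math.inf):
    h.observe(v)
sum_after_pos_inf = h.sum        # +Inf once +Inf has been observed
h.observe(-math.inf)             # +Inf plus -Inf turns the sum into NaN
```

After these five observations, h.unbucketed() is 2, but nothing in the histogram says which of the three "weird" kinds those two observations were.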

From a completionist's perspective, observations of very large
positive or negative numbers should be treated similarly to very small
observations, i.e. by adding an "overflow bucket" (or even two, for
negative and positive observations separately), similar to the zero
bucket we already have.

The reason for not doing it so far is mainly pragmatic: While it is
easy to accidentally create values close to zero (whether from some
calculation or from actual physical measurements), it is far less
likely (but not impossible, of course) to accidentally create numbers
with a very large absolute value, up to ±Inf.

This assumption might not hold, and that's exactly why the Native
Histograms are marked as experimental. We can still correct those
things if needed.
Yeah, that's weird. I filed
https://github.com/prometheus/client_golang/issues/1131 to investigate
more closely.

--
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in