[Feature/Proposal] Add histogram based trimmed mean (truncated mean).

Jacques Bernier

Aug 3, 2024, 2:50:02 AM
to Prometheus Developers
Tl;Dr; I'd like to implement histogram_trimmed_mean

"A truncated mean or trimmed mean is a statistical measure of central tendency, much like the mean and median. It involves the calculation of the mean after discarding given parts of a probability distribution or sample at the high and low end, and typically discarding an equal amount of both. This number of points to be discarded is usually given as a percentage of the total number of points, but may also be given as a fixed number of points." https://en.wikipedia.org/wiki/Truncated_mean 

CloudWatch added support for trimmed mean: https://aws.amazon.com/about-aws/whats-new/2021/07/amazon-cloudwatch-supports-trimmed-mean-statistics/ There are also a few mentions of why trimmed mean can be useful for teams here: https://www.youtube.com/watch?v=_uaaCiyJCFA (skip the first 8 minutes)

I will use CloudWatch as an example. If you have a metric, you can calculate pXX.XX to get percentiles: p50, p99, or p99.9. This is similar to histogram_quantile(0.5, a_metric), histogram_quantile(0.99, a_metric) and histogram_quantile(0.999, a_metric). The trimmed mean syntax looks very similar: tmXX.XX[:XX.XX]
tm99 is the mean with the last 1% removed, and tm1:99 is the mean with the first 1% and the last 1% removed.

I envision histogram_trimmed_mean(lower, up, metric) working in a similar way.

tm99 -> histogram_trimmed_mean(0, 0.99, metric)
tm1:99 -> histogram_trimmed_mean(0.01, 0.99, metric)
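
To make the tm99 and tm1:99 semantics concrete, here is a minimal sketch on raw samples rather than on a histogram. The function name trimmed_mean and the rounding of the cut points to whole samples are assumptions made for illustration; CloudWatch may handle fractional cut points differently.

```python
def trimmed_mean(samples, lower=0.0, upper=1.0):
    """Mean of the samples whose rank lies between the lower and upper fractions.

    Hypothetical helper for illustration only: cut points are rounded to
    whole samples, which is an assumption, not CloudWatch's documented rule.
    """
    if not 0.0 <= lower < upper <= 1.0:
        raise ValueError("need 0 <= lower < upper <= 1")
    ordered = sorted(samples)
    n = len(ordered)
    first_kept = round(n * lower)   # number of samples dropped at the low end
    end_kept = round(n * upper)     # index one past the last kept sample
    kept = ordered[first_kept:end_kept]
    if not kept:
        raise ValueError("no samples left after trimming")
    return sum(kept) / len(kept)


# 99 well-behaved samples (1..99) plus a single large outlier.
samples = list(range(1, 100)) + [10_000]

tm99 = trimmed_mean(samples, 0.0, 0.99)     # drops only the outlier
tm1_99 = trimmed_mean(samples, 0.01, 0.99)  # drops the smallest sample and the outlier
```

With these samples the plain mean is pulled up to 149.5 by the outlier, while tm99 is 50.0 and tm1:99 is 50.5.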

Similar to how histogram_quantile works, histogram_trimmed_mean would use the values gathered in the different buckets and extrapolate the trimmed mean.
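
As a rough sketch of what that estimation could look like for classic (cumulative) buckets: the code below assumes observations are spread uniformly inside each bucket, the same kind of linear interpolation histogram_quantile relies on, and assumes the lowest bucket starts at 0. The bucket representation as (upper_bound, cumulative_count) pairs and the error handling for the +Inf bucket are illustrative choices, not an existing Prometheus API.

```python
import math


def histogram_trimmed_mean(lower, upper, buckets):
    """Estimate the trimmed mean from cumulative buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by upper_bound.
    Assumptions (not taken from Prometheus): the lowest bucket starts at 0 and
    observations are uniformly distributed within each bucket.
    """
    if not 0.0 <= lower < upper <= 1.0:
        raise ValueError("need 0 <= lower < upper <= 1")
    total = buckets[-1][1]
    if total <= 0:
        raise ValueError("histogram has no observations")

    rank_lo, rank_hi = lower * total, upper * total
    weighted_sum = 0.0
    prev_bound, prev_cum = 0.0, 0.0
    for bound, cum in buckets:
        in_bucket = cum - prev_cum
        # Part of this bucket's rank range that survives the trimming.
        r0 = max(prev_cum, rank_lo)
        r1 = min(cum, rank_hi)
        if in_bucket > 0 and r1 > r0:
            if math.isinf(bound):
                raise ValueError("kept range reaches the +Inf bucket")
            width = bound - prev_bound
            v0 = prev_bound + width * (r0 - prev_cum) / in_bucket
            v1 = prev_bound + width * (r1 - prev_cum) / in_bucket
            # Uniform spread: the kept slice averages to the midpoint of v0..v1.
            weighted_sum += (r1 - r0) * (v0 + v1) / 2
        prev_bound, prev_cum = bound, cum
    return weighted_sum / (rank_hi - rank_lo)


# Buckets (0,1], (1,2], (2,4] holding 10, 10 and 20 observations.
example = [(1.0, 10), (2.0, 20), (4.0, 40)]
lower_half = histogram_trimmed_mean(0.0, 0.5, example)
middle_half = histogram_trimmed_mean(0.25, 0.75, example)
```

For the example buckets, keeping the lower half gives an estimate of 1.0 and keeping the middle half gives 2.0. The coarser the buckets, the more the uniform-spread assumption dominates the result.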

Another alternative would be if it were possible to do something similar to
avg_over_time(histogram_quantile(0.01, metric) < metric < histogram_quantile(0.99, metric))

Ben Kochie

Aug 3, 2024, 2:55:51 AM
to Jacques Bernier, Prometheus Developers
Seems like a nice function to me. 

I don't think this is complex enough to require a full design document.

I did a quick check of the issue tracker and can't find any previous proposals.

I would file a feature request issue: https://github.com/prometheus/prometheus/issues/new/choose

Bjoern Rabenstein

Aug 7, 2024, 6:34:46 AM
to Jacques Bernier, Prometheus Developers
On 02.08.24 21:21, Jacques Bernier wrote:
> Tl;Dr; I'd like to implement histogram_trimmed_mean
>
> "A truncated mean or trimmed mean is a statistical measure of central
> tendency, much like the mean and median. It involves the calculation of the
> mean after discarding given parts of a probability distribution or sample
> at the high and low end, and typically discarding an equal amount of both.
> This number of points to be discarded is usually given as a percentage of
> the total number of points, but may also be given as a fixed number of
> points." https://en.wikipedia.org/wiki/Truncated_mean

I vaguely remember that we discussed this before, but I cannot find
any reference to it right now.

Given that native histograms are so much better, I would focus on
implementing this for native histograms. (Trimming will almost always
involve interpolation, and that just goes horribly wrong with the low
resolution usually provided by classic histograms.)

As I remember it, the idea from the previous discussion was to "simply"
implement the `>` and `<` operators between native histograms and
scalars/floats. The result would be a new histogram with all the
observations below or above the threshold given by the scalar removed.
(This will be an estimate in most cases, but given the generally high
resolution of native histograms, that's OK.)

In that way, you can already "trim" a histogram at a given
threshold. This will return a histogram containing only requests that
lasted longer than 100ms:

request_duration_seconds > 0.1

You can then combine this with other PromQL expressions to implement
trimming at percentages. The following will exclude the 25% shortest
requests:

request_duration_seconds > histogram_quantile(0.25, request_duration_seconds)

From there, you can use all the other tools to do something with the
returned histogram, e.g. calculating a mean or a median or whatever
you want.
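
To illustrate the idea of trimming at a threshold and then computing a mean from what is left, here is a small sketch. It is not how Prometheus represents native histograms: the (lower, upper, count) bucket tuples, the function names trim_above and estimated_mean, and the linear split of the bucket that straddles the threshold are all assumptions made to keep the example short.

```python
def trim_above(buckets, threshold):
    """Keep only the observations estimated to lie above the threshold.

    buckets: list of (lower, upper, count) with non-cumulative counts.
    The bucket straddling the threshold is split linearly, which is an
    assumption for this sketch, not Prometheus's interpolation rule.
    """
    kept = []
    for lower, upper, count in buckets:
        if upper <= threshold:
            continue  # entirely at or below the threshold: dropped
        if lower >= threshold:
            kept.append((lower, upper, count))  # entirely above: kept as-is
        else:
            fraction = (upper - threshold) / (upper - lower)
            kept.append((threshold, upper, count * fraction))
    return kept


def estimated_mean(buckets):
    """Estimate the mean by placing each bucket's count at its midpoint."""
    total = sum(count for _, _, count in buckets)
    if total == 0:
        raise ValueError("histogram has no observations")
    return sum(count * (lower + upper) / 2 for lower, upper, count in buckets) / total


# Request durations in seconds: 50 in (0, 0.1], 30 in (0.1, 0.2], 20 in (0.2, 0.4].
durations = [(0.0, 0.1, 50), (0.1, 0.2, 30), (0.2, 0.4, 20)]

slow = trim_above(durations, 0.1)      # analogous to: request_duration_seconds > 0.1
slow_mean = estimated_mean(slow)       # mean of the requests slower than 100ms
slower = trim_above(durations, 0.3)    # threshold falls inside the last bucket
```

Trimming at a percentage would work the same way once the threshold is taken from a quantile estimate instead of a fixed value. With the high resolution of native histograms, the error introduced by splitting the straddling bucket should be much smaller than with coarse buckets like the ones in this example.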

--
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email] bjo...@rabenste.in

Jacques Bernier

Aug 8, 2024, 12:14:05 PM
to Bjoern Rabenstein, Prometheus Developers
Thanks to both of you. If we can implement the operators (> and <) on native histograms, that would be great. It would solve the problem and possibly open up even more possibilities than just trimmed mean.

Any pointers as to how this can or should be implemented? I’ll file a feature request at the link provided.

Bjoern Rabenstein

Aug 9, 2024, 1:42:08 PM
to Jacques Bernier, Prometheus Developers
On 08.08.24 09:13, Jacques Bernier wrote:
>
> Any pointers as to how this can or should be implemented? I’ll file a
> feature request at the link provided.

I filed https://github.com/prometheus/prometheus/issues/14651 based on
the discussion here.

Jacques Bernier

Aug 9, 2024, 5:05:30 PM
to Bjoern Rabenstein, Prometheus Developers
Great! Thanks.