On 02.08.24 21:21, Jacques Bernier wrote:
> Tl;Dr; I'd like to implement histogram_trimmed_mean
>
> "A truncated mean or trimmed mean is a statistical measure of central
> tendency, much like the mean and median. It involves the calculation of the
> mean after discarding given parts of a probability distribution or sample
> at the high and low end, and typically discarding an equal amount of both.
> This number of points to be discarded is usually given as a percentage of
> the total number of points, but may also be given as a fixed number of
> points."
https://en.wikipedia.org/wiki/Truncated_mean
I vaguely remember that we discussed this before, but I cannot find
any reference to it right now.
Given that native histograms are so much better, I would focus on
implementing this for native histograms. (Trimming will almost always
involve interpolation, and that just goes horribly wrong with the low
resolution usually provided by classic histograms.)
As far as I remember, the idea from the previous discussion was to
"simply" implement the `>` and `<` operators between native histograms
and scalars/floats. Such a comparison would return a new histogram
with all the observations below (for `>`) or above (for `<`) the
threshold given by the scalar removed. (This will be an estimate in
most cases, but given the generally high resolution of native
histograms, that's OK.)
In that way, you can already "trim" a histogram at a given
threshold. The following would return a histogram containing only
requests that lasted longer than 100ms:
    request_duration_seconds > 0.1
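To make the "estimate" part concrete, here is a rough Python sketch of
what such a `>` comparison could do. It is not Prometheus code: the
(lower, upper, count) bucket tuples and the name `keep_above` are made
up for illustration, and it assumes observations are spread evenly
within a bucket (plain linear interpolation).

```python
from typing import List, Tuple

# Hypothetical bucket model for this sketch: (lower bound, upper bound, count).
Bucket = Tuple[float, float, float]


def keep_above(buckets: List[Bucket], threshold: float) -> List[Bucket]:
    """Estimate `histogram > threshold` by dropping observations below it."""
    kept: List[Bucket] = []
    for lower, upper, count in buckets:
        if upper <= threshold:
            continue  # whole bucket lies below the threshold: drop it
        if lower >= threshold:
            kept.append((lower, upper, count))  # whole bucket lies above: keep it
            continue
        # The threshold cuts through this bucket. Assumption: observations are
        # uniformly distributed inside the bucket, so keep a proportional share.
        fraction = (upper - threshold) / (upper - lower)
        kept.append((threshold, upper, count * fraction))
    return kept


# Made-up request durations in seconds, trimmed at 100ms.
durations = [(0.0, 0.05, 10.0), (0.05, 0.2, 30.0), (0.2, 1.0, 20.0)]
print(keep_above(durations, 0.1))
```

Only the bucket that straddles the threshold needs interpolation, so
the narrower the buckets, the smaller the share of observations that
is guessed.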
You can then combine this with other PromQL expressions to implement
trimming at percentages. The following would exclude the 25% shortest
requests:
    request_duration_seconds > histogram_quantile(0.25, request_duration_seconds)
From there, you can use all the other tools to do something with the
returned histogram, e.g. calculating a mean or a median or whatever
you want.
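As a rough end-to-end illustration of "trim at a quantile, then take
the mean", here is a small Python sketch. Everything in it is an
assumption made for the example rather than Prometheus behaviour: the
(lower, upper, count) bucket tuples, the helper names, linear
interpolation within a bucket, and estimating the mean from bucket
midpoints.

```python
from typing import List, Tuple

# Hypothetical bucket model: (lower bound, upper bound, count), sorted ascending.
Bucket = Tuple[float, float, float]


def estimate_quantile(buckets: List[Bucket], q: float) -> float:
    """Estimate the q-quantile, interpolating linearly inside the hit bucket."""
    rank = q * sum(count for _, _, count in buckets)
    seen = 0.0
    for lower, upper, count in buckets:
        if count > 0 and seen + count >= rank:
            return lower + (upper - lower) * (rank - seen) / count
        seen += count
    return buckets[-1][1]


def keep_above(buckets: List[Bucket], threshold: float) -> List[Bucket]:
    """Estimate `histogram > threshold` by dropping observations below it."""
    kept: List[Bucket] = []
    for lower, upper, count in buckets:
        if upper <= threshold:
            continue  # entirely below the threshold
        if lower < threshold:
            # Straddling bucket: keep the share above the threshold.
            count *= (upper - threshold) / (upper - lower)
            lower = threshold
        kept.append((lower, upper, count))
    return kept


def estimate_mean(buckets: List[Bucket]) -> float:
    """Estimate the mean by placing each bucket's count at its midpoint."""
    total = sum(count for _, _, count in buckets)
    if total == 0:
        return float("nan")
    return sum((lower + upper) / 2 * count for lower, upper, count in buckets) / total


# Made-up request durations in seconds.
durations = [(0.0, 0.1, 25.0), (0.1, 0.2, 50.0), (0.2, 0.4, 25.0)]
cutoff = estimate_quantile(durations, 0.25)
print("mean of all requests:", estimate_mean(durations))
print("mean without the 25% shortest:", estimate_mean(keep_above(durations, cutoff)))
```

Trimming the long end as well would presumably work the same way,
using `<` with a high quantile as the threshold.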
--
Björn Rabenstein
[PGP-ID] 0x851C3DA17D748D03
[email]
bjo...@rabenste.in