Hello.
We use Metrics to track database access times and similar values in our project. The monitoring output is written to a log file every minute.
Recently we noticed some strange log entries (our comments are marked with -- below):
-- the application has had no load for several hours --
-- the following periodic traces show the latest state of the histogram (same trace for hours) --
16:26:55 type=HISTOGRAM, name=xxxxlatency, count=108288, min=1, max=137, mean=1.123272970726895, stddev=0.5904154125651903, median=1.0, p75=1.0, p95=2.0, p98=3.0, p99=4.0, p999=4.0
16:27:55 type=HISTOGRAM, name=xxxxlatency, count=108288, min=1, max=137, mean=1.123272970726895, stddev=0.5904154125651903, median=1.0, p75=1.0, p95=2.0, p98=3.0, p99=4.0, p999=4.0
16:28:55 type=HISTOGRAM, name=xxxxlatency, count=108288, min=1, max=137, mean=1.123272970726895, stddev=0.5904154125651903, median=1.0, p75=1.0, p95=2.0, p98=3.0, p99=4.0, p999=4.0
-- according to other logs and code analysis, the application now updates the histogram with a few values --
-- then the periodic trace changes to: --
16:29:55 type=HISTOGRAM, name=xxxxlatency, count=108292, min=1, max=2, mean=NaN, stddev=NaN, median=1.0, p75=1.0, p95=1.0, p98=1.0, p99=1.0, p999=1.0
Unfortunately, we did not log the values we pass to histogram.update.
But they are just time values (differences between successive calls to System.currentTimeMillis and System.nanoTime, in milliseconds).
We currently don't understand which (strange) long values we could possibly pass to histogram.update that would cause "mean" to stop being a number.
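To make the kind of input concrete, here is a minimal sketch of such a measurement. It is not our production code; the class ElapsedMillis and its method between are hypothetical names used only for illustration. The point is that whatever the two clock readings are, the result is an ordinary long (possibly negative if a clock jumps backwards), never a NaN.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only, not our production code.
final class ElapsedMillis {
    private ElapsedMillis() {}

    // Converts the difference of two System.nanoTime() readings to whole milliseconds.
    static long between(long startNanos, long endNanos) {
        return TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(5); // stands in for the database call being timed
        long elapsed = between(start, System.nanoTime());
        // A value like this is what gets passed to histogram.update(elapsed).
        System.out.println("elapsed ms = " + elapsed);
    }
}
```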
Could this be a bug in the histogram implementation?
Version and class information:
* metrics-core-3.1.0
* We use a Histogram with an ExponentiallyDecayingReservoir (which uses a WeightedSnapshot internally).
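
One guess on our side, which we have not verified against the metrics-core source: since the snapshot is weighted, the mean may be computed from weights that are normalized by their sum. If all weights became 0.0 (for example through floating-point underflow of an exponential decay factor after a long idle period), that normalization would be 0.0 / 0.0 and yield NaN regardless of the values. The sketch below only illustrates this arithmetic; WeightedMeanSketch and weightedMean are hypothetical names, and the weights are made up.

```java
// Hypothetical illustration only; this is NOT the metrics-core implementation.
final class WeightedMeanSketch {
    private WeightedMeanSketch() {}

    // Weighted mean computed by normalizing each weight by the total weight.
    static double weightedMean(long[] values, double[] weights) {
        double sumWeight = 0.0;
        for (double w : weights) {
            sumWeight += w;
        }
        double mean = 0.0;
        for (int i = 0; i < values.length; i++) {
            // If sumWeight is 0.0, this division is 0.0 / 0.0, which is NaN.
            mean += values[i] * (weights[i] / sumWeight);
        }
        return mean;
    }

    public static void main(String[] args) {
        long[] values = {1, 2, 1, 1};

        // Ordinary weights give an ordinary mean.
        System.out.println(weightedMean(values, new double[] {0.5, 0.25, 0.125, 0.125}));

        // Assumed scenario: a decay factor that has underflowed to 0.0.
        double underflowed = Math.exp(-1000.0);
        double[] zeroWeights = {underflowed, underflowed, underflowed, underflowed};
        System.out.println(weightedMean(values, zeroWeights));
    }
}
```

If something like this happens, it would match our trace: min, max and the percentiles stay plausible while only mean and stddev turn into NaN. Whether the reservoir can really end up with all-zero weights is exactly what we cannot tell.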
Any hints for us? - Thank you,
Michael