org.HdrHistogram questions


Janny

Jan 28, 2015, 9:42:01 AM
to mechanica...@googlegroups.com
Just started to use HdrHistogram; some usage is not clear, and there is not much documentation. Here are my questions (thanks for any help answering them):

a) What is the difference between AtomicHistogram and ConcurrentHistogram? I understand that in a concurrent environment I need to use one of them, but which one should I pick in which scenario?

b) Why might getStartTimestamp and getEndTimestamp in AbstractHistogram be required?

c) What is the difference between DoubleHistogram, IntCountsHistogram and ShortCountsHistogram? I just can't understand the word "resolution"; I need to see a clear picture with a good simple example.

d) Why might I need WriterReaderPhaser or SingleWriterDoubleRecorder, and where would I use them?

e) I want to record the Disruptor (ring buffer) queue size and I use the simple Histogram class with recordValue(), but when I print the output, it shows values below 1 with decimal places. Why? Is HdrHistogram good for recording Disruptor size?

Thanks.

Gil Tene

Jan 29, 2015, 2:09:29 AM
to mechanica...@googlegroups.com


On Wednesday, January 28, 2015 at 6:42:01 AM UTC-8, Janny wrote:
Just started to use HdrHistogram; some usage is not clear, and there is not much documentation. Here are my questions (thanks for any help answering them):

a) What is the difference between AtomicHistogram and ConcurrentHistogram? I understand that in a concurrent environment I need to use one of them, but which one should I pick in which scenario?

The JavaDoc for each is specific about what they provide. AtomicHistogram provides safe multi-threaded recording in fixed-size (as opposed to auto-sized) histograms. It does not provide safe auto-sizing or value shifting support when multiple recording threads are involved; ConcurrentHistogram does. Also, neither AtomicHistogram nor ConcurrentHistogram supports concurrent reading or querying of data (concurrent with the writers, that is). You'll want to use one of the Recorder variants if you need that.
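To make the "fixed-size, safe multi-threaded recording" idea concrete, here is a minimal sketch that does not use HdrHistogram at all. It models the counts as a fixed-length array of atomic counters from the JDK. The class name FixedAtomicCounts, the helper recordFromThreads, and the choice of 16 slots are all hypothetical, and whether the library is built this way internally is an assumption.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Simplified model, NOT HdrHistogram itself: a fixed-size array of atomic counters.
final class FixedAtomicCounts {
    private final AtomicLongArray counts;

    FixedAtomicCounts(int numberOfSlots) {
        this.counts = new AtomicLongArray(numberOfSlots);
    }

    // Safe to call from many threads: no increment is lost.
    // The array never grows, so an out-of-range value is rejected rather than resized for.
    void record(int value) {
        if (value < 0 || value >= counts.length()) {
            throw new ArrayIndexOutOfBoundsException("value outside fixed range: " + value);
        }
        counts.incrementAndGet(value);
    }

    long totalCount() {
        long total = 0;
        for (int i = 0; i < counts.length(); i++) {
            total += counts.get(i);
        }
        return total;
    }

    // Hypothetical helper: record from several threads, wait for them, return the total.
    static long recordFromThreads(int threads, int recordsPerThread) {
        FixedAtomicCounts h = new FixedAtomicCounts(16);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < recordsPerThread; i++) {
                    h.record(i % 16);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return h.totalCount(); // read only after all writers have finished
    }
}
```

Note that the total is read only after every writer thread has been joined, which mirrors the point above: safe concurrent recording does not by itself give you safe reading while writers are still running.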
 
b) Why might getStartTimestamp and getEndTimestamp in AbstractHistogram be required?

Histograms support set/get of start/end timestamps associated with the histogram. This can be generically useful, but is most often used in logging. E.g. the Recorder class (and variants) will produce interval histograms with the start/end timestamps set to indicate the time interval covered by the histogram. And the HistogramLogWriter will interpret the start/end time stamps on the histogram when producing log output. The two are often used together (e.g. see jHiccup for an example).  
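As a rough illustration of interval sampling with start/end timestamps, the sketch below swaps out the accumulated data on each sample and stamps the snapshot with the window it covers. It is a lock-based toy model using only the JDK, not the Recorder class; the names IntervalSampler, Interval and getIntervalSnapshot are hypothetical, and the caller passes the clock value in (for example System.currentTimeMillis()).

```java
// Simplified, lock-based model of interval sampling; NOT HdrHistogram's Recorder.
final class IntervalSampler {
    // Immutable snapshot: what was recorded, plus the time window it covers.
    static final class Interval {
        final long startTimestampMsec;
        final long endTimestampMsec;
        final long totalCount;
        final long maxValue;

        Interval(long start, long end, long totalCount, long maxValue) {
            this.startTimestampMsec = start;
            this.endTimestampMsec = end;
            this.totalCount = totalCount;
            this.maxValue = maxValue;
        }
    }

    private long activeCount = 0;
    private long activeMax = 0;
    private long intervalStartMsec;

    IntervalSampler(long nowMsec) {
        this.intervalStartMsec = nowMsec;
    }

    synchronized void recordValue(long value) {
        activeCount++;
        activeMax = Math.max(activeMax, value);
    }

    // Returns everything recorded since the previous snapshot and starts a new interval.
    synchronized Interval getIntervalSnapshot(long nowMsec) {
        Interval snapshot = new Interval(intervalStartMsec, nowMsec, activeCount, activeMax);
        activeCount = 0;
        activeMax = 0;
        intervalStartMsec = nowMsec; // the next interval begins where this one ended
        return snapshot;
    }
}
```

Each snapshot's end timestamp becomes the next snapshot's start timestamp, so a log of consecutive intervals covers the timeline without gaps or overlap.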
 
c) What is the difference between DoubleHistogram, IntCountsHistogram and ShortCountsHistogram? I just can't understand the word "resolution"; I need to see a clear picture with a good simple example.

The JavaDocs are quite specific for each. You may want to read the package Javadoc in detail, as well as the specific wording in the JavaDoc of each histogram variant.

All the histogram variants provide for recording of Values, and track the Counts of recorded values (to within the precision requested, e.g. to 3 significant decimal digits).
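For a picture of what "precision" or "resolution" means here, the following sketch quantizes a value the way a histogram with 3 significant digits might. It is a simplified model, not library code: the constant 11 (that is, 2048 equal-width slots per power-of-two range) is an assumption about how such bucketing can be done, and the class and method names are only illustrative. It handles non-negative values only.

```java
// Simplified model of value quantization at roughly 3 significant decimal digits.
// NOT library code; the slot layout below is an assumption.
final class ResolutionSketch {
    // Assumption: 2^11 = 2048 equal-width slots per power-of-two range of values.
    private static final int SUB_BUCKET_BITS = 11;

    // How many low-order bits of this (non-negative) value are not tracked.
    static int droppedBits(long value) {
        int bitLength = 64 - Long.numberOfLeadingZeros(value);
        return Math.max(0, bitLength - SUB_BUCKET_BITS);
    }

    // Smallest value that is counted in the same slot as 'value'.
    static long lowestEquivalentValue(long value) {
        int shift = droppedBits(value);
        return (value >>> shift) << shift;
    }

    // Number of distinct values that share a slot with 'value'.
    static long sizeOfEquivalentRange(long value) {
        return 1L << droppedBits(value);
    }
}
```

In this model, values below 2048 are tracked exactly, while 1,000,000 shares a slot with the 512 values starting at 999,936. That is an error of at most about 0.05% of the value, which is what "good to 3 significant digits" means in practice: the absolute error grows with the value, but the relative error stays below 0.1%.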

Values:
Histogram, IntCountsHistogram, ShortCountsHistogram (as well as AtomicHistogram and ConcurrentHistogram, Recorder, etc.) all deal with integer (long) recorded values, regardless of the count fidelity. In contrast, DoubleHistogram (and its Concurrent variant and DoubleRecorder variants) deals with recorded values that are doubles.

Counts:
Histogram uses long counts (i.e. can track up to Long.MAX_VALUE value recordings, even within a single internal bucket, with no overflow). The IntCounts and ShortCounts variants use int and short counts respectively (supporting "only" up to Integer.MAX_VALUE and only up to Short.MAX_VALUE recordings before overflow, respectively). The (only) benefit of the Int and Short counts variants is reduced footprint. This matters when you are tracking 40,000 different histograms in a single process (as some do), but doesn't much matter when you are only tracking tens or hundreds of histograms.
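The footprint trade-off is simple arithmetic, sketched below with the JDK only. The figure of 10,000 count slots per histogram in the usage note is a made-up number for illustration (the real count depends on the range and precision configured), and the class and method names are hypothetical. The second method shows what plain Java arithmetic does when a short counter goes past Short.MAX_VALUE; how the library itself reacts to overflow is not covered here.

```java
// Back-of-the-envelope model of count width versus footprint; NOT library code.
final class CountWidthSketch {
    // Bytes needed for the counts arrays alone (ignores object headers and other fields).
    static long countsFootprintBytes(long histograms, long slotsPerHistogram, int bytesPerCount) {
        return histograms * slotsPerHistogram * bytesPerCount;
    }

    // What a short counter holds after one increment past its maximum.
    static short incrementPastShortMax() {
        short count = Short.MAX_VALUE; // 32767
        count++;                       // wraps around silently in plain Java arithmetic
        return count;                  // now Short.MIN_VALUE (-32768)
    }
}
```

With 40,000 histograms and the assumed 10,000 slots each, long counts need 3,200,000,000 bytes, while short counts need 800,000,000 bytes. With only a hundred histograms the same difference is a few megabytes, which is why the narrower variants rarely matter at small scale.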
 
d) Why might I need WriterReaderPhaser or SingleWriterDoubleRecorder, and where would I use them?

WriterReaderPhaser is not really needed by common users of histograms. It just happens to have a home in HdrHistogram since it doesn't (yet) belong anywhere else, and may be generically useful as a synchronization primitive. See blog post here for details. Several classes in HdrHistogram do use WriterReaderPhaser internally though, and can be seen as good examples of using the synchronization primitive. E.g. see the implementation internals in Recorder and ConcurrentHistogram.

SingleWriterDoubleRecorder is a faster variant of DoubleRecorder that is only safe for use with a single writer (working concurrently with reader(s)). Very useful in (common) use cases where a single writer is updating values that need to be sampled and logged externally. Same goes for SingleWriterRecorder.
 
e) I want to record the Disruptor (ring buffer) queue size and I use the simple Histogram class with recordValue(), but when I print the output, it shows values below 1 with decimal places. Why?

Not clear what you mean here. You may want to post the specific code you use for instantiating the histogram and for doing the output, as well as the output itself.
 
Is HdrHistogram good for recording Disruptor size?

Yes, histograms are useful for recording all sorts of values: queue depth/size, latencies, color/intensity, etc. HdrHistograms are especially useful when the values recorded cover a wide dynamic range and you need to maintain good relative precision across that range.


Thanks.

Janny

Jan 30, 2015, 7:37:44 PM
to


On Thursday, January 29, 2015 at 3:09:29 PM UTC+8, Gil Tene wrote:

The JavaDoc for each is specific about what they provide.


Thanks, Gil. The JavaDoc is specific, but in real life, when there is no time and you need to find a solution as fast as possible and sell it to the business in your organization, it would be great to have a top-level tutorial with many examples that you can just copy, paste, and adjust later on. Realistically, we spent lots of time understanding how to use HdrHistogram because we needed to go through all the JavaDoc details and glue things together. Just a suggestion, which I think would help make this project widespread.

I have two more questions. We want to print a histogram, but sometimes it doesn't have any values, and getMin() still returns some strange large value. How should we handle this situation? Also, with around 2 elements in the histogram, it doesn't return a mean, just NaN. I can't be specific now, because all my records are at work, but I thought I'd ask in case it is something well known.


Gil Tene

Jan 30, 2015, 9:00:13 PM
to mechanica...@googlegroups.com

On Friday, January 30, 2015 at 4:37:44 PM UTC-8, Janny wrote:



I have two more questions. We want to print a histogram, but sometimes it doesn't have any values, and getMin() still returns some strange large value. How should we handle this situation? Also, with around 2 elements in the histogram, it doesn't return a mean, just NaN. I can't be specific now, because all my records are at work, but I thought I'd ask in case it is something well known.


I doubt that you are getting a NaN for a getMean on a non-empty histogram, but you would definitely be getting one on an empty one. The answers to things like Min/Max/Mean/StdDev on an empty histogram are [mathematically] undefined. There is no proper min or max (which is why in practice max was returning 0 and min was returning Long.MAX_VALUE). Similarly, there is no proper average or standard deviation for an empty set, which is why getMean() and getStdDeviation() properly return NaNs, because they are basically being asked to compute 0/0.
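The behavior described above falls out of how running statistics are usually tracked, and it suggests a simple way to handle it when printing: check the count first. The sketch below is a toy model using only the JDK, not HdrHistogram code; the class EmptyStatsSketch and its describe() method are hypothetical.

```java
// Toy model of running min/max/mean; NOT HdrHistogram code.
final class EmptyStatsSketch {
    private long count = 0;
    private long sum = 0;
    private long min = Long.MAX_VALUE; // sentinel: any recorded value replaces it
    private long max = 0;              // sentinel for non-negative values

    void record(long value) {
        count++;
        sum += value;
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    long getCount() { return count; }
    long getMin() { return min; }
    long getMax() { return max; }

    // With nothing recorded this computes 0.0 / 0, which is NaN in floating point.
    double getMean() { return (double) sum / count; }

    // Guard on the count before reporting statistics that are undefined when empty.
    String describe() {
        if (count == 0) {
            return "no values recorded";
        }
        return "count=" + count + " min=" + min + " max=" + max + " mean=" + getMean();
    }
}
```

On an empty instance, getMin() returns Long.MAX_VALUE, getMax() returns 0 and getMean() returns NaN, while describe() avoids printing any of them. The same guard (checking the total count before printing) works regardless of which placeholder values a given library version returns for an empty histogram.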

But since I agree that this may confuse people, I've pushed a change to make them all return 0 when the histogram is empty, and this will show up whenever I push out a new version of HdrHistogram to maven central. Note that 0 is not really any better than what is being returned right now. But it's not any worse either, so I don't mind plugging up the topic.
  

