LatencyUtils & LatencyStats

819 views
Skip to first unread message

Gil Tene

unread,
Nov 16, 2013, 12:30:36 AM11/16/13
to mechanica...@googlegroups.com
I just finished a first cut at LatencyUtils (which includes the LatencyStats class). LatencyStats is my attempt at a clean API for a magic, auto-coordinated-omission-correcting thing that can be dropped into the classic (and broken) way most in-process monitoring of latency is done. You can find it at http://latencyutils.github.io/LatencyUtils/

The problem LatencyStats is meant to solve is the following:

When processes internally measure and collect stats on operation latency, the code usually looks like some variant of the following:

long startTime = System.nanoTime();
doTheThingWeWantToMeasureLatencyFor();
statsBucket.recordLatency(System.nanoTime() - startTime);

There are two big issues with the above pattern:

1. When a large stall occurs during the measured operation, the resulting large latency will be recorded only once, ignoring the fact that other operations are likely pending outside of the timing window. As a result, the stats that will come out of the statsBucket will be highly skewed. This is classic coordinated omission

2. If a stall occurs outside of the timing window, even the single large latency will not be recorded.

While #1 can potentially be corrected, e.g. a stats bucket that uses HdrHistogram can correct for it given a known or estimated expected interval, #2 is harder to deal with, since missing the large result altogether leaves us with no indication that there is something to correct.

LatencyStats deal with these two problems simultaneously by employing a pause detector and an interval estimator. It uses both under the hood such that a latencyStats object can simply fit right into the code pattern above, and produce significantly more accurate/realistic histograms that account for stalls experienced by the system.

The code is fairly young (a week or so), and has not been beaten up much yet, so approach with caution. Feedback and flames are welcome.

Enjoy.

ryan rawson

unread,
Nov 16, 2013, 2:30:39 AM11/16/13
to mechanica...@googlegroups.com
Hey Gil,

That looks cool. I look forward to trying it out. 

What might be the equivalent with c-based non gc languages? 
--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Gil Tene

unread,
Nov 16, 2013, 10:15:32 AM11/16/13
to <mechanical-sympathy@googlegroups.com>, mechanica...@googlegroups.com
The detection and correction logic would be the same regardless of language. It deals with any process wide or system wide stalls regardless of cause. GC is obviously a big reason for such stalls, but systems naturally exhibit other stalls due to scheduling contention, swapping/paging, power management, and other underlying things like hypervisors, all of which can produce similar issues without GC. The LatencyStats logic compensates for any situation where runnable threads were not getting to run.

A key thing that this logic does not try to deal with or correct (yet?) is the effect of long operations that stall the processing pipe of the specific operation without a detectable pause. E.g. When the operation logic actually blocks for a long time for some reason, so that the effect is not detectable by other runnable threads. Such situations can only occur within the timing window (so a single long operation would be logged), but they can cause pending operations to be queued up outside the timing window without reflecting their longer latencies in the stats.

So LatencyStats corrects for stalls that occur outside if the operation logic, but not for ones caused directly by the logic itself. Adding that will take some more thought.

But the current correction is already showing 2-3 order of magnitude effects in real systems (think HFT), where the detected able stalls clearly dominate percentiles above 99%.

Sent from my iPad
You received this message because you are subscribed to a topic in the Google Groups "mechanical-sympathy" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/mechanical-sympathy/oZSv5QnpAYs/unsubscribe.
To unsubscribe from this group and all of its topics, send an email to mechanical-symp...@googlegroups.com.

Justin Mason

unread,
Nov 16, 2013, 4:50:26 PM11/16/13
to mechanica...@googlegroups.com
This looks great -- can't wait to give it a go.

What kind of performance impact does that pause detector implementation impose, if any?

--j.


--

Gil Tene

unread,
Nov 17, 2013, 1:59:11 AM11/17/13
to <mechanical-sympathy@googlegroups.com>
Pause detectors are pluggable, and run in the background. I.e. the recording path does not involve any interaction with the pause detectors. As far as overhead, this will depend on the pause detector choice and configuration. E.g. the default SimplePauseDetector uses three threads to establish pause consensus, with each thread sleeping and waking up 1,000 times per second. They'll detect pauses larger than 1 msec, and won't present much of a burden on modern servers. However, you can choose to use spinning threads instead of sleeping ones, and burn cores to do so. It appears that with 3 spinning threads, you can detect pauses as short as ~50usec with fairly good reliability. But I wouldn't waste resources on that unless you are in the high frequency game (e.g. 20K+ things per second handled by a single thread).

-- Gil. 

You received this message because you are subscribed to a topic in the Google Groups "mechanical-sympathy" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/mechanical-sympathy/oZSv5QnpAYs/unsubscribe.
To unsubscribe from this group and all of its topics, send an email to mechanical-symp...@googlegroups.com.

pron

unread,
Nov 19, 2013, 10:09:20 AM11/19/13
to mechanica...@googlegroups.com
Thank you, Gil. Will give it a try. Can it be used concurrently?

Also, as per our previous discussion, I've implemented a very simple reservoir for the Metrics Timer, that doesn't address coordinated omission, but does not use sampled measurements. It tracks the precise max and min latency, and an approximate mean (up to concurrency effect as the sum and the count are not updated atomically). It is available here.

Gil Tene

unread,
Nov 19, 2013, 8:19:31 PM11/19/13
to mechanica...@googlegroups.com
Yes. It's intended to support non blocking multithreaded recording into LatencyStats objects.

Latency recordings are done into AtomicHistograms. Pause corrections are done on a separate thread. Interval histograms are swapped without blocking the recorders, with the intervals sampler (run on a background timer thread) taking the synchronization burden and ordering against atomic updates down by the recorders. Interval and accumulated histograms are provided to callers as copies of non-moving data (non atomic, but synchronized against interval swaps such that the copy is always made from a non-chaging histogram).

-- Gil.

tm jee

unread,
Jan 24, 2014, 8:30:15 AM1/24/14
to mechanica...@googlegroups.com
Hi guys, 


LatencyStates.Builder b = LatencyStates.Builder.create();
// do configuration on b
LatencyStates s = b.build();

// then in our program
s.recordLatency(...);


// after we are done
Histogram h = s.getAccumulatedHistogram();


What is the simplest way to print out the histogram in console?

tia, Cheers

Gil Tene

unread,
Jan 24, 2014, 4:11:53 PM1/24/14
to mechanica...@googlegroups.com

h.getHistogramData().outputPercentileDistribution(System.out, 5, 1);

Reply all
Reply to author
Forward
0 new messages