jHiccup for low latency


Janny

Jan 30, 2015, 7:42:00 PM
to mechanica...@googlegroups.com
We want to use jHiccup to measure latency on market data delivery sources. As I understand it, jHiccup uses instrumentation and works as a Java agent, which may degrade performance; correct me if I'm wrong. Was jHiccup designed to work only in development environments, or can it also be used in production for HFT?

Michael Barker

Jan 30, 2015, 8:42:56 PM
to mechanica...@googlegroups.com
We (LMAX) run it in our performance environment and on some of our production servers. It has no measurable impact AFAICT.

Mike.


Gil Tene

Jan 31, 2015, 12:01:50 PM
jHiccup does not instrument any code. The agent just runs a separate observer thread that records its experience, and a logging thread that records data to a log file. The overall "cost" of a jHiccup agent is having a single (independent) thread wake up ~1000 times per second and do pretty much no work at all: measuring time and recording data in a histogram (the wakeup itself is much more costly than the work performed on each wakeup). In addition, a summary output line is written to a log file once every 5 seconds (typical output rate is ~40-50 bytes/sec). Other than the tiny amount of CPU and IO consumed by these two actions (which generally use otherwise-idle time on the plentiful CPU cores available on most systems), there is no impact at all on any other threads, and no change to the way any code is executed, so no performance degradation occurs in application execution. Running with "control mode" enabled (the -c option, which spawns a separate idle process that also runs jHiccup to produce a control hiccup log) is highly recommended; it doubles this overall (extremely small) background cost.
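
To make the observer-thread idea concrete, here is a minimal sketch of that kind of loop. It is not jHiccup's actual source: the class name HiccupSketch, the method names, and the use of a plain array in place of a histogram are all assumptions made so that the example runs with the JDK alone. It sleeps for a fixed resolution, measures how long the sleep really took, and treats the excess as the observed hiccup.

```java
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

/** Illustrative only: a simplified stand-in, not jHiccup's implementation. */
final class HiccupSketch {

    /**
     * Sleeps `samples` times for `resolutionNanos` each and returns, per wakeup,
     * how much longer than requested the thread was away (the "hiccup").
     * Stops early and returns the samples collected so far if interrupted.
     */
    static long[] observe(int samples, long resolutionNanos) {
        long[] hiccups = new long[samples];
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            try {
                TimeUnit.NANOSECONDS.sleep(resolutionNanos);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return Arrays.copyOf(hiccups, i);
            }
            long elapsed = System.nanoTime() - start;
            // Assumption: excess over the nominal sleep time counts as the hiccup.
            hiccups[i] = Math.max(0L, elapsed - resolutionNanos);
        }
        return hiccups;
    }

    /** Nearest-rank percentile over a copy of the samples (0 if there are none). */
    static long percentile(long[] values, double pct) {
        if (values.length == 0) {
            return 0L;
        }
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }
}
```

Calling HiccupSketch.observe(1000, 1_000_000L) would approximate one second of observation at a 1 ms resolution, and percentile(...) can then summarize the result. The per-wakeup work is only two clock reads and one array store, which is the point made above: the wakeup costs far more than the recording.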

I know of several places (LMAX included, per @mikeb01's comment) that run it in production in very latency sensitive environments. The fact that jHiccup (with a control jHiccup log) allows you to reconstruct a hiccup distribution for any time range in the past makes it a very useful triage tool for isolating or ruling out parts of the system as causes of experienced glitches (e.g. was it the system as a whole that glitched, the JVM that glitched, or the actual application code/network that showed delays?).

However, it's important to note that jHiccup will probably not directly provide you with what you say you are looking for: it's not a means for measuring latency, it's a means for measuring the disruption of execution in your platform. As such, it can tell you the best possible latency behavior you could have had (if all processing paths had no latency at all and only glitches observed at the JVM or system level caused latency). You can think of jHiccup as a "best case" latency indicator: it provides a good sanity check for other metrics, as anything reporting a better latency distribution (i.e. lower magnitudes at given percentiles) than jHiccup reports likely has a measurement problem. It is also a good triage tool (figure out if latency behavior is caused/dominated by hiccups or by your code or network) and an overall monitoring tool (see if your system or JVM is glitching, rather than wait for those glitches to hit you hard with actual transactions).
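
The sanity check described above can be sketched as a percentile-by-percentile comparison. Everything here is hypothetical (the class SanityCheckSketch, its method names, and the raw-array inputs); it assumes both sample sets use the same units and cover the same time window on the same system.

```java
import java.util.Arrays;

/** Illustrative only: compares a measured latency distribution to a hiccup floor. */
final class SanityCheckSketch {

    /** Nearest-rank percentile over a copy of the samples (0 if there are none). */
    static long percentile(long[] values, double pct) {
        if (values.length == 0) {
            return 0L;
        }
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    /**
     * Returns true if, at any of the given percentiles, the measured latency is
     * lower than the hiccup magnitude, which suggests a measurement problem.
     */
    static boolean looksSuspect(long[] measured, long[] hiccups, double... percentiles) {
        for (double p : percentiles) {
            if (percentile(measured, p) < percentile(hiccups, p)) {
                return true;
            }
        }
        return false;
    }
}
```

For example, looksSuspect(measured, hiccups, 50.0, 99.0, 99.9) flags a latency metric whose reported percentiles sit below what the platform itself could have delivered.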

If what you want to do is analyze the latency on market data delivery sources, you can use HdrHistogram to record those latencies in much the same way jHiccup uses it to record observed "time to do nothing" latencies. Since jHiccup is ~700 lines of code (half of which is a big comment and argument-parsing code), it serves as a good, simple example of how to do latency recording using HdrHistogram. You can record and produce similar histogram logs for your market data latencies, and plot them with the same tools and format that jHiccup uses to depict latency distribution by percentile and over time.
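
As a rough sketch of that recording pattern, the class below records one latency per message and reports percentiles. To keep it runnable with the JDK alone, a growable array stands in for HdrHistogram; in real code the record and valueAtPercentile methods would delegate to HdrHistogram's Histogram.recordValue and Histogram.getValueAtPercentile. The class name, the summary format, and the idea of passing source and receive timestamps are assumptions for illustration, and the two timestamps are assumed to come from comparable clocks.

```java
import java.util.Arrays;

/** Illustrative only: a JDK-only stand-in for an HdrHistogram-backed recorder. */
final class LatencyRecorderSketch {
    private long[] samples = new long[1024];
    private int count;

    /** Records one latency: receive time minus source timestamp, in nanoseconds. */
    void record(long sourceNanos, long receivedNanos) {
        if (count == samples.length) {
            samples = Arrays.copyOf(samples, count * 2); // grow when full
        }
        samples[count++] = Math.max(0L, receivedNanos - sourceNanos);
    }

    /** Nearest-rank percentile of everything recorded so far (0 if empty). */
    long valueAtPercentile(double pct) {
        if (count == 0) {
            return 0L;
        }
        long[] sorted = Arrays.copyOf(samples, count);
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct / 100.0 * count);
        return sorted[Math.max(0, rank - 1)];
    }

    /** One summary line, in the spirit of a periodic interval log entry. */
    String summary() {
        return String.format("count=%d p50=%dns p99=%dns max=%dns",
                count, valueAtPercentile(50.0), valueAtPercentile(99.0),
                valueAtPercentile(100.0));
    }
}
```

Sorting on every query is fine for a sketch but not for a hot path; a fixed-footprint structure with constant-time recording, which is what HdrHistogram provides, is the better fit for production use.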