JMeter and HdrHistogram Integration


Mark E. Dawson, Jr.

Dec 20, 2019, 4:18:12 PM
to mechanical-sympathy
So our company is evaluating a set of messaging platforms, and we're in the process of defining non-functional requirements. In preparation for evaluating performance, I was considering suggesting JMeter since it appears to support testing messaging platforms (with several specific tutorials online). However, these tutorials show the Response Time by Percentile graphs from the tool, and they all appear to show evidence of CO.

Does anyone know if the latest versions include support for HdrHistogram either out-of-the-box or via extra configuration?

Gil Tene

Dec 21, 2019, 12:54:27 PM
to mechanical-sympathy


On Friday, December 20, 2019 at 1:18:12 PM UTC-8, Mark E. Dawson, Jr. wrote:
So our company is evaluating a set of messaging platforms, and we're in the process of defining non-functional requirements. In preparation for evaluating performance, I was considering suggesting JMeter since it appears to support testing messaging platforms (with several specific tutorials online). However, these tutorials show the Response Time by Percentile graphs from the tool, and they all appear to show evidence of CO.

Does anyone know if the latest versions include support for HdrHistogram either out-of-the-box or via extra configuration?


Yeah, JMeter is a good tool for generating load, but not a good tool for reporting what client-experienced latency or response time behaviors would be if you care about things other than averages or medians. If you want an “ok approximation”, you can generate load at a constant rate from JMeter (none of those cool ramp-up or think-time things) so that you *know* what the expected intervals between logged events are supposed to be, and then take the detailed logs from JMeter and post-process them with some coordinated-omission correction tooling. E.g. jHiccup’s -f flag can be used to ingest a stream of timestamp, latency tuple lines (rather than measure anything), and its -r parameter can then be used to control the expected interval. It will then produce a histogram log that is a good approximation corrected for coordinated omission (it is conservative: it may under-correct, but it will not over-correct). (See https://github.com/giltene/jHiccup#using-jhiccup-to-process-latency-log-files). You can then plot those histogram logs using e.g. https://github.com/HdrHistogram/HistogramLogAnalyzer
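[For readers unfamiliar with the correction being described: the idea behind it, the same one HdrHistogram's recordValueWithExpectedInterval implements, can be sketched roughly as below. The function name and list-based form are illustrative only, not the actual jHiccup implementation, which records into histograms.]

```python
def co_corrected_samples(latency_ms, expected_interval_ms):
    """For one measured latency under a known constant request interval,
    return it plus synthetic samples for the requests stalled behind it.
    Conservative: it may under-correct, it will not over-correct."""
    samples = [latency_ms]
    missing = latency_ms - expected_interval_ms
    while missing >= expected_interval_ms:
        samples.append(missing)
        missing -= expected_interval_ms
    return samples

# A single 100 ms stall, with a request expected every 10 ms, hides
# nine delayed requests that a coordinated load generator never sent:
print(co_corrected_samples(100, 10))
# [100, 90, 80, 70, 60, 50, 40, 30, 20, 10]
```

This is why the constant rate matters: the correction is only well-defined when the expected interval between requests is known.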





Peter Booth

Dec 21, 2019, 2:21:35 PM
to mechanical-sympathy
Mark,

I don't know anything about your use-case, but the fact that you're posting on mechanical-sympathy
suggests that latency is important to you. If you are talking about low-latency messaging, such as that
found in electronic trading, then you should forget JMeter. Depending upon the age of your infrastructure
and its use of lower-latency NICs (Solarflare, Mellanox), low-latency switches, and TCP offload,
you could be interested in latency measurements in the low numbers of microseconds.

This testing is harder than it sounds when you consider the impact of messages of varying sizes, realistic
topologies, slow consumers, different reliability constraints, and so on.
It's easy to do this badly (I have, many times), and hard to do well.

I don't know your business context. If it's electronic trading and you are looking at
low-latency messaging products (like 60East's AMPS, 29West LBM (now Informatica), TIBCO's FTL,
IBM's low latency messaging (now Confinity), Aeron, ZeroMQ, Solace) then this is a solved
problem. All of the above come with benchmarking tools, but the best-designed independent messaging
benchmarks that I have seen are those performed by STAC Labs.
They have benchmarked many of the products I listed. See https://stacresearch.com/stac-testing-tools

Hope this helps.

Feel free to email me directly if you want to chat.

Peter
peter_booth *at* me.com

Mark Dawson

Dec 21, 2019, 3:10:07 PM
to mechanica...@googlegroups.com

Peter,

 

Yes, I do work in HFT, and we do use low-latency infrastructure for exchange-facing systems. However, the messaging platform deals with transferring pub/sub messages that can vary from 1KB to 1MB in size – that is decidedly *not* in the microsecond realm, even if it were on an InfiniBand network communicating via ibverbs.

 

My interest is in response time measurements of various Message Bus products, so we're talking millisecond scale, for which tools like JMeter are more than enough. My issue is with the error-prone reporting in their Response Time Percentile Graphs that I've noticed so far.

 

I think Gil’s suggestion gets me closer to what I thought I wanted. But the fact that I can’t make use of exponentially distributed request intervals, like those the new Precise Throughput Timer permits, makes me think we’ll have to craft our own tool (a fraught exercise all its own).

--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/mechanical-sympathy/c84555b9-5ed1-42a9-a844-fe1204cdf16a%40googlegroups.com.

 

Faraz Babar

Dec 21, 2019, 3:10:25 PM
to mechanica...@googlegroups.com
Correctly instrumented performance testing makes test subjects act like quantum particles: the act of observation changes the result. If you are only interested in black-box testing at the boundary, I have had some luck with wrk2, but do keep in mind that there are so many variables, from message size to network config to proximity between nodes, that precise accounting for all of them in black-box testing is easy to get wrong. And this does not even get into garbage collection or VM overheads where those are involved. Proceed with caution.

Sent from my iPhone


Mark Dawson

Dec 21, 2019, 4:12:34 PM
to mechanica...@googlegroups.com
Faraz,

I looked at wrk2 last week but it appears to only support HTTP.


Wojciech Kudla

Dec 21, 2019, 5:36:17 PM
to mechanical-sympathy
However, the messaging platform deals with transferring pub/sub messages that can vary from 1KB to 1MB in size – that is decidedly *not* in the microsecond realm, even if it were on an Infiniband network communicating via ibverbs

I'm afraid I have to disagree with this statement. 
Contemporary solutions (e.g. Solarflare) are capable of exactly that. What's more, you can see sub-microsecond latency for small payloads (in synthetic benchmarks).
Single-digit-microsecond latency is actually expected these days for disseminating (market) data over a single hop in a local network.


Mark Dawson

Dec 21, 2019, 6:33:55 PM
to mechanica...@googlegroups.com
Guys,

I think you're missing the point. Of course you can transfer small payloads with Solarflare/OpenOnload or IB/VMA in single digit microseconds in simple ping-pong tests. That is standard stuff in the industry in which I work.

This is NOT the kind of testing I'm doing. This Message Bus Load Testing involves simulating multiple clients serializing and sending medium to large messages from a Publisher Host (Host A) at predefined throughput levels to one or more Topics on a Message Broker (Host B), which may itself perform more processing on the message(s) for replication or persistence purposes, and then sink them to interested Subscribers on a separate host. This is far different from the kind of ping-pong tests you guys appear to be describing. It involves much more than kernel bypass, PCIe, NIC hardware, serialization, and propagation delay from one hop to the next.


Faraz Babar

Dec 22, 2019, 2:54:45 AM
to mechanica...@googlegroups.com
Two solutions to wrk2 being an HTTP-only tool:

1. wrk2 is open source, with a very nicely laid-out source structure. You can modify the source and replace the HTTP bit with a driver for your messaging platform (assuming there is a C binding available).

2. Write a simple web server (using Vert.x, for example) and invoke your messaging subject from there. You will need to take care of instrumenting the server and accumulating the metrics so you can subtract the overhead. This approach also gives you the added benefit of being able to deploy multiple instances of the carefully tuned web server behind an AWS elastic load balancer (or a physical one if you are not running this in AWS). This allows you to truly simulate the load without mistakenly reusing resources and connections from a single node running wrk2 with your own messaging driver.

You will also need to tune the OS, plus CPU pinning for the workload and Ethernet IRQ (interrupt request) handling, and a few other things like disk, connection pooling, TCP, etc.

You are going to have fun either way, I am always excited about work like this :)
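[The "subtract the overhead" step in option 2 might look something like the outline below. The function name and the pairing of end-to-end samples with calibration samples from a no-op endpoint are illustrative assumptions, not part of any real harness.]

```python
def subtract_overhead(e2e_samples_ms, noop_samples_ms):
    """Approximate the messaging layer's latency by subtracting a
    calibrated per-request wrapper overhead (measured against a no-op
    endpoint) from each end-to-end sample. Clamps at zero, since jitter
    can make a calibration sample exceed a fast end-to-end sample."""
    return [max(e - n, 0.0) for e, n in zip(e2e_samples_ms, noop_samples_ms)]

print(subtract_overhead([5.0, 12.0, 30.0], [1.0, 1.5, 2.0]))
# [4.0, 10.5, 28.0]
```

Note that subtracting summary statistics (e.g. one p99 from another) instead of per-sample values does not work; the subtraction has to happen before the percentiles are computed.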

Sent from my iPhone

Peter Booth

Dec 23, 2019, 10:20:35 AM
to mechanica...@googlegroups.com
I wonder whether we are talking past each other here, or if we are dealing with different contexts that we each have experience in?

I've jumped back and forth between electronic trading and high traffic web environments at shops that have performed at different levels.
The units of measure vary but the essential issues (coordinated omission, precision of measurements, use of inappropriate statistics, Heisenberg principle when measuring, 
how and where to timestamp, impact of confounding variables) have appeared in all settings.

I am not a fan of JMeter for the reasons that you stated, and because I have seen it report incorrect results in multiple settings.
I would choose any of Gatling, wrk2, or Tsung before it, even on a web project.

What does it mean when we say that a system "is or is not in the microsecond realm?"  

I'm currently testing a system that uses a messaging platform, is deployed on six-year-old hardware (Sandy Bridge),
has nodes in different countries/datacenters, uses a variety of slow application protocols (RMI, JSON over HTTP),
has message sizes that vary from 1KB to 1MB, and has persistence. The slowest operations in this system have latencies
in the range 45 ms to 500 ms. Some operations are consistently 5 to 15 ms. And some (not many) operations
are sub-millisecond. There is no NUMA, no kernel bypass, no sexy tech.

This sounds like it might be in the same ballpark as your system. Millisecond-precision timing is not sufficient here.
I am having a problem imagining a setting where it would be sufficient.




Mark Dawson

Dec 23, 2019, 7:27:53 PM
to mechanica...@googlegroups.com
So that we can put this thread to rest since Gil's initial response already closed out this query, my reply regarding "microseconds vs milliseconds" was in response to another responder who thought I was misrepresenting the scale of latency measurements achievable with Solarflare or Mellanox cards. My reply was that my aim is measuring Message Broker *response time*, for which most response time measurements come in at the millisecond range - each hop's NIC & PCIe latency is but a very small fraction of this whole thing.

It's really that simple, guys. My specific JMeter question was answered. And whether that answer works for me in the end will not be based on microsecond vs millisecond magnitudes, but on whether I'm willing to forgo simulating exponentially distributed inter-arrival times just so that I'm able to correct for JMeter's CO effects. But that's outside the scope of *this* particular question.

Have a Merry Christmas, all!
