Measuring latency of a function call


Nathan Fisher

Sep 22, 2017, 10:29:05 PM
to mechanical-sympathy
Hi All,

I'm doing some work to profile Clojure's start-up performance so I can figure out where the biggest bang-for-buck improvements could be made.

I wanted to verify that what I've done to instrument the call cost is correct. I know benchmarking is tricky, and I'm happy to take guidance from others.

In one of the classes I've created the following static atomic fields:

public static final AtomicLongArray latency = new AtomicLongArray(10000000);
public static final AtomicInteger index = new AtomicInteger(0);

Where latency is a preallocated array and the index is used to get the next available slot in that array.

I'm instrumenting the call sites as follows:

public static Var intern(Namespace ns, Symbol sym, Object root) {
    long start = System.nanoTime();

    // interesting work
    Var v = intern(ns, sym, root, true);

    long finish = System.nanoTime();
    int i = RT.index.getAndIncrement(); // claim the next slot id
    RT.latency.set(i, finish - start);  // record the elapsed time in that slot
    // (assumes fewer than 10,000,000 calls, so i never exceeds the array length)

    return v;
}

Once the app shuts down I calculate the results as follows:

int count = RT.index.get();
System.out.println("Total Var.intern calls: " + count);
ArrayList<Long> al = new ArrayList<>();

for (int i = 0; i < count; i++) {
    al.add(i, RT.latency.get(i));
}

List<Long> l = al.stream().sorted().collect(Collectors.toList());
Long total = al.stream().reduce((a, b) -> a + b).get() / 1_000_000;

double pctl99 = 0.99 * count;
double pctl50 = 0.5 * count;

System.out.println(
    " Min: " + l.get(0) + "ns," +
    " Median: " + l.get((int) pctl50) + "ns," +
    " 99PCTL: " + l.get((int) pctl99) + "ns," +
    " Max: " + l.get(count - 1) / 1_000_000 + "ms," +
    " Total: " + total + "ms");

An example run yields the following:

 Min: 162ns, Median: 907ns, 99PCTL: 30724ns, Max: 10ms, Total: 27ms
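For what it's worth, the same nearest-rank statistics can be computed without boxing by copying into a primitive array and sorting once. A sketch (the sample values here are made up; in the real code they would be copied out of RT.latency):

```java
// Sketch: nearest-rank percentiles over a primitive long[] (no boxing).
long[] samples = {907, 162, 30_724, 450, 1_200, 800, 2_500, 610, 95_000, 310};
java.util.Arrays.sort(samples);
int count = samples.length;
long min = samples[0];
long median = samples[count / 2];                              // nearest-rank p50
long p99 = samples[Math.min(count - 1, (int) (0.99 * count))]; // nearest-rank p99
long max = samples[count - 1];
System.out.println("Min: " + min + "ns, Median: " + median + "ns, 99PCTL: " + p99 + "ns, Max: " + max + "ns");
```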


Is this an adequate way to capture the cost of calling a specific function?
Is there a way to improve the measurements (unfortunately JMH wouldn't be straightforward to use at this point)?
Are there any obvious pitfalls I'm overlooking with this approach (e.g. out-of-order execution, etc.)?

Thanks in advance for any guidance.

Nathan

Kirk Pepperdine

Sep 23, 2017, 2:32:59 AM
to mechanica...@googlegroups.com
Hi Nathan,

The old-fashioned way of measuring the performance of such a small piece of code was to add up the clock cycles needed to execute each instruction. I'd agree that this is much more difficult for Java than for in-order runtimes, but the problem here is that the measurement is so close to the advertised resolution of the clock that it's likely dominated by noise. And if the timer being used here doesn't actually have nanosecond resolution, the measurement is only noise.
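This point can be sanity-checked by timing the timer itself. A rough sketch (numbers will vary by JVM, OS, and CPU):

```java
// Sketch: rough estimate of System.nanoTime()'s own cost, to gauge the
// noise floor of any measurement taken with it.
int n = 1_000_000;
long sink = 0;
for (int i = 0; i < n; i++) sink += System.nanoTime();  // warm-up for the JIT
long begin = System.nanoTime();
for (int i = 0; i < n; i++) sink += System.nanoTime();
long avgNs = (System.nanoTime() - begin) / n;
// Typical results are tens of nanos per call; anything measured near that
// scale is mostly timer overhead and jitter.
System.out.println("~" + avgNs + "ns per nanoTime() call (sink nonzero: " + (sink != 0) + ")");
```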

Kind regards,
Kirk

-- 
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mechanical-symp...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Martin Thompson

Sep 23, 2017, 9:58:20 AM
to mechanical-sympathy
This approach to measurement is likely to skew the results. I'd start with perf record via perf-map-agent and then use flame graphs.
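For reference, that workflow usually looks something like the following. This is a sketch: $JAVA_PID, the sampling rate, and the script locations are placeholders, and the stack-folding scripts come from Brendan Gregg's FlameGraph repository:

```shell
# Sketch of a perf + perf-map-agent + flame-graph workflow.
# Run the JVM with -XX:+PreserveFramePointer so perf can walk the stacks.
perf record -F 99 -g -p "$JAVA_PID" -- sleep 30          # sample stacks at 99 Hz for 30s
perf-map-agent/bin/create-java-perf-map.sh "$JAVA_PID"   # symbolize JIT-compiled code
perf script | FlameGraph/stackcollapse-perf.pl \
            | FlameGraph/flamegraph.pl > flamegraph.svg  # render the flame graph
```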


Nathan Fisher

Sep 23, 2017, 10:51:52 AM
to mechanical-sympathy
Thank you! I ran across an article by Brendan Gregg and was just starting to dig into honest-profiler. Looks like I'll spin up a Linux box instead to use perf-map-agent.

--
- sent from my mobile

Peter Booth

Sep 24, 2017, 12:45:21 PM
to mechanical-sympathy


Nathan,

You mentioned that it was Clojure startup time that you want to improve. Is it a general "all Clojure apps" issue or an "our Clojure apps" one? What are typical times for the entire startup that you observe? What do the Clojure apps actually do?

Some points:

Precision/noise:

As Kirk described, calling System.nanoTime() costs about 28 nanos on a one-year-old Haswell CPU. It just doesn't work to use it to measure operations that themselves take tens or hundreds of nanos.

Skewing:

Martin Thompson alluded to how measurement can skew the behaviour of the underlying system. JMH can't avoid the Heisenberg effect. perf-map-agent reduces the Heisenberg cost because you are tracing from outside the process (but still on the host). Taking measurements out-of-band is the only way I know to avoid it.

Host issues:

When you said "spin up a linux box", did you mean a physical box, not a VM or container?

I've had a bunch of consulting projects that were different variations on "performance issues that only occur in environment X or on hardware Y". It's common for people to assume performance is relative: "if this is a hotspot here, it will be a hotspot there".

All of the points described here require that you have root access to physical hosts that are representative of your target hardware. In larger (and some small) shops this isn't always easy to get.



Nathan Fisher

Sep 24, 2017, 3:48:40 PM
to mechanica...@googlegroups.com
Hi Peter,

Apologies, everyone, if I'm polluting the mailing list; this isn't the typical latency question.

Thanks Peter, answers to your questions are inline below.

On Sun, 24 Sep 2017 at 17:45, Peter Booth <pboo...@gmail.com> wrote:



You mentioned that it was clojure startup time that you want to improve. Is it a general "all clojure apps" issue or "our clojure apps?"


NF> Not just me, all Clojure apps. It takes about 700ms to load clojure.core from a fat jar and execute "(+ 1 2 3)" (i.e. 1 + 2 + 3). (See https://dev.clojure.org/display/design/Improving+Clojure+Start+Time)

What are typical times for the entire startup that you observe? What do the clojure apps actually do?

NF> Typically tens of seconds, except for new/small projects, which start in single-digit seconds.

Precision/noise:

As Kirk described, calling System.nanoTime() costs about 28 nanos on a one-year-old Haswell CPU. It just doesn't work to use it to measure operations that themselves take tens or hundreds of nanos.

NF> Thanks for the clarification. I wasn't actually sure how long the methods were taking, but it did give me the insight to look elsewhere. Naively, can I assume the approach is a usable, albeit crude, technique where the latency is much larger (e.g. > 100us)? I was considering using a dynamic proxy with that kind of instrumentation to collect data, but static methods prevent that. I also looked at AOP, but the site was down.

Skewing:

Martin Thompson alluded to how measurement can skew the behaviour of the underlying system. JMH can't avoid the Heisenberg effect. perf-map-agent reduces the Heisenberg cost because you are tracing from outside the process (but still on the host). Taking measurements out-of-band is the only way I know to avoid it.

NF> Yes, I figured this would be an issue. I was instrumenting one method at a time, so it only affected the caller and not the callee I was measuring. It was enough to identify that the method I was measuring at the time of the original e-mail might not yield a huge benefit.

The attached flame graph, generated with perf-map-agent, is what I was able to produce for the (+ 1 2 3) example. My (possibly mistaken) reading of it is that a significant amount of time is spent loading classes and interpreting bytecode (e.g. "Interpreter" is both wide and deep in the call stacks). At startup there are around 2000 classes loaded, so I've started looking into what about the class loading is slow. Some thoughts so far:
  • zip compression level (level 0 appears to save 40-80ms, which is a similar saving to loading from disk).
  • class load ordering (e.g. would loading based on a dependency graph help? would automatically loading a class from the jar as it's streamed past help? etc.).
  • static fields/initializer blocks (Clojure makes heavy use of static initialisation and fields that could be deferred until after start-up in a dev scenario).
  • a custom class loader (less inclined towards this, as it introduces another dependency to "get started").
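The zip-compression-level idea above can be sketched like this (the entry name and payload are stand-ins, not real Clojure artifacts; level 0 means entries are deflated with no compression, so reads at startup skip real inflation work):

```java
import java.io.*;
import java.util.zip.*;

// Sketch: write a jar/zip entry with NO_COMPRESSION (level 0).
byte[] payload = "stand-in for class bytes".getBytes();
ByteArrayOutputStream buf = new ByteArrayOutputStream();
ZipOutputStream zos = new ZipOutputStream(buf);
zos.setLevel(Deflater.NO_COMPRESSION);          // level 0: store bytes as-is
zos.putNextEntry(new ZipEntry("clojure/core__init.class"));
zos.write(payload);
zos.closeEntry();
zos.close();

// Read it back the way a class loader would.
ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(buf.toByteArray()));
ZipEntry entry = zis.getNextEntry();
byte[] back = zis.readAllBytes();               // reads to the end of this entry
System.out.println(entry.getName() + " round-tripped: " + (back.length == payload.length));
```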

Host issues

When you said "spin up a linux box" did you mean a physical box, not a VM or container?

NF> A VM. It's not something where I'm aiming for microsecond performance and a smooth latency curve; I just want to scratch an itch and see if I can make some improvements.
flamegraph-2461.svg

Shripad Agashe

Sep 25, 2017, 1:10:20 AM
to mechanical-sympathy
Also, you may want to have a look at JMH [1] and a presentation by Aleksey Shipilev on it.


Shripad


Nathan Fisher

Sep 25, 2017, 11:05:11 PM
to mechanical-sympathy
Thanks for the link to Aleksey's video. I'll give it a viewing!