lock free based load generator

ymo

Jul 15, 2014, 5:21:55 AM

to mechanica...@googlegroups.com

Hi All.

In my quest to build accurate load generators in Java, I came up with something that could be useful. The idea is very simple to implement, and I hope you can all provide your valuable feedback as you always do.

The issues one has to deal with when implementing an accurate load generator are summed up in this thread.

Normally, if you want to benchmark a piece of code, you make the call in a bracketed fashion like this:

long testStartTime = System.nanoTime();
doTheUberThingYouAreMeasuring();

long testTime = System.nanoTime() - testStartTime;

Now imagine you could remove those pesky nanoTime calls altogether and still get accurate timing. Here is how I think I would do it.

The thread performing the loop (We will call it THREAD-A) starts and does nothing else but:

1) wait (in a spin) for a shared *volatile* long variable (shared between THREAD-A and THREAD-B) to be incremented. We call this the SEQ[1] variable.

2) perform the doTheUberThingYouAreMeasuring();

3) go to 1)

Before entering the timed loop, THREAD-A starts a background thread (which we will call THREAD-B). This timing thread does nothing else but:

1) sleep for a constant amount of time (which we call TIME-X) using LockSupport.parkNanos.

2) increment the SEQ variable that is shared with the thread performing the loop (THREAD-A).

3) go to 1)
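
A minimal sketch of the two loops above, under some assumptions that are not in the original post: the class name PacedLoop and the method run are made up for illustration, the timer stops after a fixed number of ticks so the example terminates, and Thread.onSpinWait() requires Java 9 or later.

```java
import java.util.concurrent.locks.LockSupport;

final class PacedLoop {
    // SEQ: written only by the timer thread (THREAD-B), read by the worker (THREAD-A).
    private static volatile long seq = 0;

    /** Runs op once per observed tick until `ticks` ticks were emitted; returns the number of runs. */
    static long run(Runnable op, long ticks, long intervalNanos) {
        seq = 0;
        Thread timer = new Thread(() -> {
            for (long i = 0; i < ticks; i++) {
                LockSupport.parkNanos(intervalNanos); // 1) sleep for TIME-X (may wake early)
                seq++;                                // 2) publish the tick; safe because there is a single writer
            }
        }, "THREAD-B");
        timer.setDaemon(true);
        timer.start();

        long lastSeen = 0;
        long runs = 0;
        while (lastSeen < ticks) {
            long current;
            while ((current = seq) == lastSeen) {
                Thread.onSpinWait();                  // 1) spin until SEQ moves (Java 9+)
            }
            lastSeen = current;                       // may skip ticks if op overran TIME-X
            op.run();                                 // 2) the operation under test
            runs++;
        }
        return runs;
    }

    public static void main(String[] args) {
        long runs = run(() -> Math.sqrt(42.0), 1_000, 1_000_000L);
        System.out.println("runs performed: " + runs);
    }
}
```

If the operation always finishes within TIME-X, the number of runs should equal the number of ticks; a smaller count means some ticks were skipped.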

The obvious assumption here is that THREAD-A performs the tested operation in a time less than (or equal to) the expected TIME-X. In a way, both threads would stay in sync, and we would know that the operation we are measuring *ALWAYS* takes less than TIME-X. No need to compute percentiles. If that is the case, then we are done. However, we know this is not going to happen *ALL THE TIME*.

So the solution is for THREAD-A to record how far THREAD-B has gone ahead of it. By keeping track of the last position (or sequence) at which it saw THREAD-B, THREAD-A can determine that it is behind THREAD-B, store the difference in an array, reset its current SEQ to that of THREAD-B, and then continue as if nothing happened. This array would contain the number of times THREAD-A was behind THREAD-B, and by how much. From this array we can plot something like an HdrHistogram, with or without correction for coordinated omission.
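
The bookkeeping on THREAD-A's side could look roughly like this. It is only a sketch: LagRecorder and its methods are hypothetical names, and it assumes THREAD-A calls observe() with the SEQ value it has just read, right before running the operation.

```java
import java.util.Arrays;

final class LagRecorder {
    private final long[] missed; // missed[i] = ticks skipped before the i-th run
    private int runs = 0;
    private long lastSeen = 0;   // last SEQ value THREAD-A acted on

    LagRecorder(int capacity) {
        missed = new long[capacity];
    }

    /** Records and returns how many ticks THREAD-B emitted that THREAD-A never acted on. */
    long observe(long currentSeq) {
        long lag = currentSeq - lastSeen - 1; // 0 means THREAD-A kept up
        if (runs < missed.length) {
            missed[runs] = lag;               // samples beyond capacity are dropped
        }
        runs++;
        lastSeen = currentSeq;                // catch up to THREAD-B and carry on
        return lag;
    }

    /** Copy of the recorded lags, one entry per run. */
    long[] snapshot() {
        return Arrays.copyOf(missed, Math.min(runs, missed.length));
    }

    long totalMissed() {
        long sum = 0;
        for (long m : snapshot()) sum += m;
        return sum;
    }
}
```

A non-zero entry suggests the previous operation overran TIME-X by roughly that many intervals; the array can be fed into a histogram after the test.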

However, as Gil puts it, "When a pause occurs outside of the tracked operation (and outside of the tracked time window) no long latency value would be recorded, even though any requested operation would be stalled by the pause". We can remedy this by making one System.nanoTime() call right after THREAD-B wakes up from its sleep and recording the result in an array. By looking at this array (after the test) we can determine whether a long pause happened during the test.
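
As a rough illustration of that post-test check: the names WakeLog, collect and findPauses are made up for this sketch, and the tolerance parameter is an assumption about how much jitter one is willing to ignore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.LockSupport;

final class WakeLog {
    /** THREAD-B's loop: park for intervalNanos, then stamp the wake-up time. */
    static long[] collect(int ticks, long intervalNanos) {
        long[] wakeTimes = new long[ticks];
        for (int i = 0; i < ticks; i++) {
            LockSupport.parkNanos(intervalNanos);
            wakeTimes[i] = System.nanoTime(); // the single timestamp per wake-up
            // a real timer thread would also increment SEQ here
        }
        return wakeTimes;
    }

    /** Returns every index i whose gap to the previous wake-up exceeds interval + tolerance. */
    static List<Integer> findPauses(long[] wakeTimes, long intervalNanos, long toleranceNanos) {
        List<Integer> pauses = new ArrayList<>();
        for (int i = 1; i < wakeTimes.length; i++) {
            long gap = wakeTimes[i] - wakeTimes[i - 1];
            if (gap > intervalNanos + toleranceNanos) {
                pauses.add(i);
            }
        }
        return pauses;
    }
}
```

For example, with wake-up times {0, 100, 200, 900, 1000}, an interval of 100 and a tolerance of 50, only index 3 is flagged, because its gap of 700 is the only one above 150.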

Even with a single System.nanoTime() call per wake-up, we are still guaranteed that the tested operation (the SUT) will never be called faster than its expected latency. In practice this means that:

1) We are still calling the SUT in a fixed-rate fashion.

2) We are still not suffering from coordinated omission.

Moreover, the thread taking the timestamps (THREAD-B) does not even have to be in Java! It can be written in C/C++ to provide (more) accurate timing if needed. If you want, you can even start with a very long latency and do a binary search until the perfect latency per percentile is found. All one has to do is play with the TIME-X variable.
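
The binary search over TIME-X might be sketched as follows. This assumes something the post does not state: that a test run can be reduced to a yes/no answer ("did this TIME-X meet the target percentile?") and that the answer is monotone, so every interval longer than a passing one also passes. IntervalSearch and meetsTarget are hypothetical names.

```java
import java.util.function.LongPredicate;

final class IntervalSearch {
    /**
     * Returns the smallest interval in [lo, hi] for which meetsTarget is true.
     * Assumes meetsTarget is monotone and that hi itself passes.
     */
    static long smallestPassing(long lo, long hi, LongPredicate meetsTarget) {
        while (lo < hi) {
            long mid = lo + (hi - lo) / 2;  // avoids overflow of lo + hi
            if (meetsTarget.test(mid)) {
                hi = mid;                   // mid passes: try a shorter TIME-X
            } else {
                lo = mid + 1;               // mid fails: TIME-X must be longer
            }
        }
        return lo;
    }
}
```

In a real harness each call to meetsTarget would run a whole test iteration at that TIME-X, and since real measurements are noisy the monotonicity assumption only holds approximately.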

[1] The lock-free spin time is considered white noise!

Nathan Tippy

Jul 17, 2014, 2:23:05 PM

to mechanica...@googlegroups.com

What if this were done with 3 threads?
A - the thing measured and its incremented count.
B - the fixed size step and its incremented count.
C - observer of A and B and results capture.

The hand-off would be one-producer to one-consumer (which, as we have seen with the Disruptor, is very fast): A to C and B to C.

This would also give extra cycles in C for other work, such as transmitting/recording/analysing the details.

ymo

Jul 17, 2014, 3:54:06 PM

to mechanica...@googlegroups.com

Hi Nathan.

I found that in a load simulator environment I need to run a set of tests in a tight loop, as fast as possible, so as not to impact the actual test. As such, I only record minimal data that can be processed later, meaning between iterations. Once a test is completed, both THREAD-A and THREAD-B can be used to process the captured results before the second iteration (to measure run-to-run variance) is started. This is obviously limited by the amount of RAM and the number of measurements you can make.

If the test runs for a very long time, as opposed to being a repeated load (maybe in a long-running profiler), then a 3rd thread is needed. But then the speed at which C is able to catch up with the other two threads becomes a limiting factor. They found with the Disruptor that queues are either empty or full, so introducing more threads will only increase that contention.
