In my quest to build accurate load generators in Java, I came up with something that could be useful. The idea is very simple to implement, and I hope you can all provide your valuable feedback as you always do.
The issues one has to deal with when implementing an accurate load generator are summed up in this thread.
Normally, if you want to benchmark a piece of code, you would bracket the call with timestamps like this:
long testStartTime = System.nanoTime();
doTheUberThingYouAreMeasuring();
long testTime = System.nanoTime() - testStartTime;
Now imagine you could remove those pesky nanoTime() calls altogether and still get accurate timing. Here is how I think I would do it.
The thread performing the loop (we will call it THREAD-A) starts and does nothing else but:
1) wait (in a spin) for a *volatile* long variable shared between THREAD-A and THREAD-B to be incremented. We call this the SEQ variable.
2) perform doTheUberThingYouAreMeasuring();
3) go to 1)
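THREAD-A's three steps can be sketched roughly as follows. This is a minimal sketch, assuming Java 9+ for Thread.onSpinWait(); the names SeqHolder, ThreadALoop and runIterations are hypothetical, and the operation under test is passed in as a Runnable standing in for doTheUberThingYouAreMeasuring().

```java
// Hypothetical holder for the shared SEQ variable.
final class SeqHolder {
    // volatile: THREAD-A must observe every write made by THREAD-B.
    volatile long seq;
}

// Sketch of THREAD-A: spin on SEQ, run the operation, repeat.
final class ThreadALoop {
    static void runIterations(SeqHolder shared, long iterations, Runnable sut) {
        long lastSeen = shared.seq;
        for (long i = 0; i < iterations; i++) {
            // 1) spin until THREAD-B increments SEQ
            long current;
            while ((current = shared.seq) == lastSeen) {
                Thread.onSpinWait(); // hint to the CPU that we are busy-waiting
            }
            lastSeen = current;
            // 2) perform the operation being measured
            sut.run();
            // 3) loop back to step 1
        }
    }
}
```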
Before entering the timed loop, THREAD-A starts a background thread (which we will call THREAD-B). This timing thread does nothing else but:
1) sleep for a constant amount of time (which we call TIME-X) using LockSupport.parkNanos().
2) increment the SEQ variable that is shared with the thread performing the loop (THREAD-A).
3) go to 1)
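THREAD-B's loop could look something like this. It is a sketch rather than a tuned implementation: TickSeq and TickerThread are hypothetical names, and LockSupport.parkNanos() may return early (spuriously) or oversleep, so the real interval between increments is only approximately TIME-X.

```java
import java.util.concurrent.locks.LockSupport;

// Hypothetical holder for the shared SEQ variable (single writer: THREAD-B).
final class TickSeq {
    volatile long seq;
}

// Sketch of THREAD-B: park for TIME-X, bump SEQ, repeat until interrupted.
final class TickerThread extends Thread {
    private final TickSeq shared;
    private final long timeXNanos; // TIME-X in nanoseconds

    TickerThread(TickSeq shared, long timeXNanos) {
        super("THREAD-B");
        this.shared = shared;
        this.timeXNanos = timeXNanos;
        setDaemon(true); // do not keep the JVM alive after the test
    }

    @Override
    public void run() {
        while (!isInterrupted()) {
            // 1) sleep for roughly TIME-X (parkNanos may return early or late)
            LockSupport.parkNanos(timeXNanos);
            // 2) increment SEQ; a plain ++ on a volatile is safe with one writer
            shared.seq++;
            // 3) loop back to step 1
        }
    }
}
```

Because THREAD-B is the only writer, a plain ++ on the volatile field is enough; with more than one writer an AtomicLong would be needed instead.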
The obvious assumption here is that THREAD-A performs the tested operation in a time less than or equal to TIME-X. In that case both threads stay in sync, and we know that the operation we are measuring *ALWAYS* takes no longer than TIME-X. There is no need to compute percentiles: if that holds, we are done. However, we know this is not going to happen *ALL THE TIME*.
So the solution is for THREAD-A to record how far THREAD-B has gotten ahead of it. THREAD-A keeps track of the last SEQ value it observed; when it next reads SEQ and finds that it has advanced by more than one, it knows it is behind THREAD-B. It stores the difference in an array, resets its local sequence to THREAD-B's current SEQ, and continues as if nothing happened. This array contains the number of times THREAD-A fell behind THREAD-B and by how much. From this array we can build a latency histogram (for example with HdrHistogram), either corrected or uncorrected for coordinated omission.
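The bookkeeping described above can be sketched as a small recorder that THREAD-A calls each time its spin loop sees a new SEQ value. LagRecorder, observe() and recordedLags() are hypothetical names; the recorder stores the raw difference whenever it is greater than one, and it uses a growable array for simplicity, whereas a real load generator would probably preallocate.

```java
import java.util.Arrays;

// Hypothetical recorder for how far THREAD-A falls behind THREAD-B.
final class LagRecorder {
    private long[] lags = new long[16]; // recorded differences (> 1 only)
    private int size;
    private long lastSeq;               // last SEQ value THREAD-A acted on

    LagRecorder(long initialSeq) {
        this.lastSeq = initialSeq;
    }

    // Called by THREAD-A whenever it observes a new SEQ value.
    // A difference of 1 means THREAD-A kept up; more than 1 means at least
    // one tick elapsed on which no operation was started.
    long observe(long currentSeq) {
        long diff = currentSeq - lastSeq;
        if (diff > 1) {
            if (size == lags.length) {
                lags = Arrays.copyOf(lags, size * 2); // grow the array
            }
            lags[size++] = diff;
        }
        lastSeq = currentSeq; // resync with THREAD-B and carry on
        return diff;
    }

    long[] recordedLags() {
        return Arrays.copyOf(lags, size);
    }
}
```

Assuming ticks are exactly TIME-X apart and each operation starts on a tick, a stored difference d suggests the operation took at least about d × TIME-X. Those values could then be fed into a histogram; HdrHistogram's recordValueWithExpectedInterval() is one existing way to apply coordinated-omission correction, though the exact mapping from differences to latencies is an assumption of this sketch.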
However, as Gil puts it, "When a pause occurs outside of the tracked operation (and outside of the tracked time window) no long latency value would be recorded, even though any requested operation would be stalled by the pause". We can remedy this by making one System.nanoTime() call right after THREAD-B wakes up from its sleep and recording that timestamp in an array. By inspecting this array after the test, we can determine whether a long pause happened during the test.
Even with this single System.nanoTime() call, which is made on THREAD-B and not around the tested operation, we are still guaranteed that the tested operation (the SUT) is never called more often than once per TIME-X. In practice this means that:
1) We are still calling the SUT at a fixed rate.
2) We are still not suffering from coordinated omission.
Moreover, the thread taking the timestamps (THREAD-B) does not even have to be written in Java! It can be written in C/C++ to provide more accurate timing if needed. If you want, you can even start with a very long TIME-X and binary-search from there until the desired latency per percentile is achieved. All one has to do is tune the TIME-X variable.
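The TIME-X search can be sketched as a standard lower-bound binary search. This assumes a hypothetical callback meetsTarget that runs the load test at a given TIME-X and reports whether the chosen percentile target was met, and that the check is monotonic (if a TIME-X passes, every longer TIME-X passes too). Real measurements are noisy, so in practice each probe might need to be repeated.

```java
import java.util.function.LongPredicate;

// Hypothetical helper: binary search over TIME-X (nanoseconds).
final class TimeXSearch {
    // Returns the smallest TIME-X in [lo, hi] for which meetsTarget is true.
    // Preconditions (assumed): meetsTarget is monotonic, and
    // meetsTarget.test(hi) is true (the "very long TIME-X" starting point).
    static long smallestPassing(long lo, long hi, LongPredicate meetsTarget) {
        while (lo < hi) {
            long mid = lo + (hi - lo) / 2; // avoids overflow of (lo + hi)
            if (meetsTarget.test(mid)) {
                hi = mid;     // mid is good enough; try a shorter TIME-X
            } else {
                lo = mid + 1; // too aggressive; back off
            }
        }
        return lo;
    }
}
```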
The lock-free spin time is treated as white noise!