User threads get interrupted regardless (see /proc/interrupts), and pinning your thread/process to a particular CPU
does not guarantee that the OS won't run other threads on that CPU, only that yours won't be run on others.
More important in this case, however, is the fact that System.nanoTime() is used to keep the CPU busy.
System.nanoTime() translates to clock_gettime(CLOCK_MONOTONIC), and that is far more complex than
just 'rdtsc' plus a couple of arithmetic operations. For one thing, it occasionally makes system calls.
Not only do those system calls take much longer to execute, but you also relinquish control to the kernel
at a well-defined point, so it may decide to run work it wouldn't run during scheduling interrupts.
Anyway, all of that supports a simple idea: after 5 seconds of "spinning" in System.nanoTime(),
you can be sure that the cached 'calculation', 'rand', and whatever 'rand' uses internally are long gone.
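For concreteness, a busy pause of that kind might look like the following sketch. The name busyPause() and its signature are assumptions on my part, since the original code isn't shown here; the point is only that the loop body is nothing but repeated clock reads.

```java
// Hypothetical sketch of a busyPause(): spin on System.nanoTime()
// until the requested interval has elapsed. Each iteration goes
// through clock_gettime(CLOCK_MONOTONIC) under the hood.
public class BusyPause {
    static void busyPause(long nanos) {
        final long deadline = System.nanoTime() + nanos;
        while (System.nanoTime() < deadline) {
            // intentionally empty: the clock read itself is the "work"
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        busyPause(50_000_000L); // 50 ms, short enough for a demo
        long elapsed = System.nanoTime() - start;
        System.out.println(elapsed >= 50_000_000L);
    }
}
```

After 5 seconds of this, everything the measured computation touched has had ample opportunity to be evicted from the caches.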
That is to say, when you measure your computation for the first time after busyPause(),
you measure, among other things, how long it takes to read all those values back from the
lower levels of the memory hierarchy. My one-liner was intended to show that if we 'prefetch'
those values, the measured time goes down significantly.
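To illustrate the pattern (not the original one-liner, which isn't reproduced here, and with hypothetical names): run the computation once untimed so its data is pulled back into the caches, then measure.

```java
// Illustrative sketch: the same computation measured cold and then
// after an untimed warm-up pass ("prefetch") over the data.
import java.util.Random;

public class PrefetchDemo {
    static long sum(long[] data) {
        long s = 0;
        for (long v : data) s += v;
        return s;
    }

    public static void main(String[] args) {
        // 4M longs (~32 MB), comfortably larger than typical L1/L2.
        long[] data = new Random(42).longs(1 << 22).toArray();

        // First timed run: pays for cache/TLB misses on top of the arithmetic.
        long t0 = System.nanoTime();
        long cold = sum(data);
        long coldNs = System.nanoTime() - t0;

        // Untimed warm-up pass pulls the array back into the cache
        // hierarchy, then the second timed run measures mostly arithmetic.
        sum(data);
        long t1 = System.nanoTime();
        long warm = sum(data);
        long warmNs = System.nanoTime() - t1;

        // Same result both times; only the timings differ.
        System.out.println(cold == warm);
    }
}
```

On a quiet machine the warm timing is typically well below the cold one, which is the effect my one-liner was pointing at.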
Whatever other theory you may have, it now also has to account for that change.
This is not a full solution to the mystery you started with, but it is an important factor
not to be neglected.
-- Oleg