I am calling from C++ into Python code that has been compiled with Cython, and timing the performance. For example:
Cython code:
cdef void foo(int i):
    do_something_with_i(i)
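For context, the call path from C++ into the Cython module looks roughly like this sketch. The module name foo_module and its generated header are placeholders, and it assumes foo is actually declared cdef public (or exposed via cdef api) so that Cython emits a C-callable declaration:

#include <Python.h>
#include "foo_module.h"   // hypothetical header Cython generates for a `cdef public` foo

int main() {
    // Register the compiled module before initializing the interpreter,
    // then import it so its module-level code runs.
    PyImport_AppendInittab("foo_module", PyInit_foo_module);
    Py_Initialize();
    PyObject* mod = PyImport_ImportModule("foo_module");
    foo(42);              // the Cython function, callable as plain C
    Py_DECREF(mod);
    Py_FinalizeEx();
    return 0;
}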
C++:
for (int i = 0; i < 10000; i++) {
    start_timer();
    foo(i);
    end_timer();
}
For my function foo, this averages approximately 30 microseconds per evaluation (using a high-performance timer).
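For reference, the timer is something like the following sketch (assume std::chrono::steady_clock; the real implementation may differ):

#include <chrono>
#include <vector>

static std::vector<double> samples;                // per-call times in microseconds
static std::chrono::steady_clock::time_point t0;

void start_timer() {
    t0 = std::chrono::steady_clock::now();
}

void end_timer() {
    auto t1 = std::chrono::steady_clock::now();
    samples.push_back(std::chrono::duration<double, std::micro>(t1 - t0).count());
}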
But when I slow the C++ code down with busy work between the timed calls, for example:
double tmp = 0;
for (int i = 0; i < 10000; i++) {
    start_timer();
    foo(i);
    end_timer();
    for (int j = 0; j < niter; j++) tmp = exp(-tmp);
}
The average time jumps to around 200 microseconds per evaluation! Note that the loop calling exp sits outside the timed region.
It seems that the maximum elapsed time for foo() stays roughly constant as niter changes, but more and more of the calls take that maximum time as niter increases.
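To make that concrete, I sort the recorded samples and print a few percentiles, roughly like this (reusing the hypothetical samples vector from the timer sketch above):

#include <algorithm>
#include <cstdio>

void report() {
    std::sort(samples.begin(), samples.end());
    std::printf("min    %8.2f us\n", samples.front());
    std::printf("median %8.2f us\n", samples[samples.size() / 2]);
    std::printf("p99    %8.2f us\n", samples[samples.size() * 99 / 100]);
    std::printf("max    %8.2f us\n", samples.back());
}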
Any ideas why this might be happening?