Weird performance issues with odeint/custom integrator


Frank Hellmann

Feb 3, 2016, 8:01:30 AM
to Numba Public Discussion - Public
Hi everyone,

I am writing a simple ODE solver/time stepper in numba. The right-hand-side functions we will eventually have to deal with will be quite involved in terms of logic, delays, stochastic inputs, etc., so nothing that will be doable in pure numpy. In any case, that complexity will force us to use a fixed-time-step solver. My hope was that we could keep things reasonably fast by compiling the right-hand side as well as the solver using numba. A first test of that strategy runs into some weird performance characteristics, though:

L1 norm of the differences of the two right hand side functions on 100 random drawn vectors: 0.0
1000 x right_hand_side_numba: 0.0154027938843
1000 x right_hand_side_numpy: 0.0550789833069
Time taken to compile ode system: 0.469746112823
Time taken to integrate numpy rhs using odeint: 0.125674962997
Time taken to integrate numba rhs using odeint: 1.17314887047
Time taken to integrate using ode_compiled: 0.166043996811

So the numba compiled rhs is faster than the numpy one, good. The two functions also produce exactly the same numerical output.
But integrating the slower numpy function with odeint is faster than integrating the compiled function, by a factor of 10!

My completely trivial time stepper almost gets me back to the speed of odeint/numpy (mostly a bit slower, occasionally a sliver faster).
But I am quite puzzled as to how odeint can beat this completely trivial solver (which is surely much less accurate).

Does anyone have any insights into why this could possibly happen?

I have attached the script that produces the above numbers. The two important functions in the script are:


rhs_gen:
produces the two right-hand-side functions we are testing. They are at the minimum level of complexity I am interested in for real-world use (and about the maximum achievable while staying purely within numpy).


ode_system_compiler:
takes a compiled function, defines an ode integrator around that function, compiles that and passes it back.
Eliminating that closure does not seem to change the performance characteristics.
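The closure pattern described above can be sketched as follows. This is a minimal pure-numpy version with hypothetical names (the original script is not shown here); in the actual setup both `rhs` and the returned stepper would additionally be wrapped with `numba.jit`:

```python
import numpy as np

def ode_system_compiler(rhs):
    # In the real script this inner function (and `rhs`) would be
    # numba.jit-compiled; the closure structure is the same either way.
    def euler_integrate(y0, t):
        # Fixed-step explicit Euler: y[i] = y[i-1] + dt * rhs(y[i-1], t[i-1])
        out = np.empty((len(t), len(y0)))
        out[0] = y0
        for i in range(1, len(t)):
            dt = t[i] - t[i - 1]
            out[i] = out[i - 1] + dt * rhs(out[i - 1], t[i - 1])
        return out
    return euler_integrate
```

For example, `ode_system_compiler(lambda y, t: -y)` returns a stepper for exponential decay.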

I would be most grateful for any insights here. I never studied CS and am "learning as I go", so pointers to literature I should read would also be much appreciated.

Best,
Frank Hellmann
test.py

Joshua Adelman

Feb 3, 2016, 9:21:40 AM
to Frank Hellmann, numba...@continuum.io

Hi Frank,

I think what you’re seeing is the compile time for the jit code being lumped in with the run time. If I modify your code to call calculate states2 and states3 once outside of the timing loop, the output of the script is (using Numba 0.23.1 on OSX 10.9.5):

L1 norm of the differences of the two right hand side functions on 100 random drawn vectors: 0.0
1000 x right_hand_side_numba: 0.00724101066589
1000 x right_hand_side_numpy: 0.0181360244751
Time taken to compile ode system: 0.270935058594
Time taken to integrate numpy rhs using odeint: 0.0618920326233
Time taken to integrate numba rhs using odeint: 0.0392701625824
Time taken to integrate using ode_compiled: 0.0689558982849

Whenever you time lazily jit’d code, you need to “warm up” the code by running it once to produce the compiled version before you time it, unless you really care about the cost of compilation. That is why it is often good to time code using something like the timeit module, which runs your code multiple times and reports the fastest of the group, naturally removing the warm-up cost.
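The warm-up-then-time pattern can be sketched like this (the `rhs` here is a hypothetical stand-in; with numba it would carry an `@numba.jit` decorator):

```python
import timeit
import numpy as np

def rhs(y, t):
    # Stand-in for the right-hand side; with numba this function
    # would be decorated with @numba.jit and compiled on first call.
    return -y * t

y = np.random.rand(100)

rhs(y, 0.0)  # warm-up: for a lazily jit'd function this triggers compilation

# timeit.repeat runs the statement several times; taking the minimum
# discards one-off costs such as compilation.
best = min(timeit.repeat(lambda: rhs(y, 0.0), number=1000, repeat=3))
```

The key point is that the warm-up call uses the same argument types as the timed calls, so the timed loop hits the already-compiled specialization.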

Best,
Josh 

Frank Hellmann

Feb 3, 2016, 10:19:41 AM
to Numba Public Discussion - Public, cer...@gmail.com
Thank you!
I did call nrhs(y, 0) before running the timing loop, but odeint seems to run it with a different signature internally, e.g. nrhs(y, 0.).
Warming up and averaging at least removes the paradoxical performance.
I am still very much puzzled by the fact that odeint with a python callback can outperform a trivial numba compiled Euler loop on the same output array by a factor of two:

Time taken to integrate numba rhs using odeint: 0.866139888763
Time taken to integrate using ode_compiled: 1.75888204575

Any ideas on that? If there is no low-hanging fruit here, fair enough; performance-wise I am about where I need to be, I think...

Best,
Frank

Stanley Seibert

Feb 3, 2016, 10:26:09 AM
to Numba Public Discussion - Public, cer...@gmail.com
Can you compare the number of calls to rhs made by both methods?  odeint uses some very well-tuned Fortran libraries, which means it may converge in far fewer rhs calls than you make with ode_compiled.
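One way to count the callback invocations is a plain Python counter around the rhs (a sketch assuming scipy; the wrapper and the `-y` toy system are hypothetical, not from the attached script):

```python
import numpy as np
from scipy.integrate import odeint

calls = [0]  # mutable cell so the callback can update it

def counting_rhs(y, t):
    calls[0] += 1
    return -y

t = np.linspace(0.0, 10.0, 10000)
odeint(counting_rhs, np.array([1.0]), t)

# A fixed-step Euler loop would call rhs once per output point (10000),
# while the adaptive solver typically takes far fewer internal steps
# and interpolates the dense output between them.
print(calls[0])
```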


Frank Hellmann

Feb 3, 2016, 10:43:07 AM
to Numba Public Discussion - Public, cer...@gmail.com
That solved it: indeed, it seems that odeint calls rhs only around 1000 times and then interpolates the 10,000 output points from there.
Changing the integration parameters, I am now seeing cases where odeint is slower but more accurate, as expected.
Thank you very much for the help!

Best,
Frank