sph rendering and numba


Rok Roskar

Oct 18, 2013, 7:21:03 AM10/18/13
to pynbo...@googlegroups.com, pynbod...@googlegroups.com
Hey everyone,

thought I'd share some things I've been experimenting with. I wanted to try numba (one of these magic python packages that turns your slow python code into something C-fast) so as a test case I rewrote Andrew's pynbody sph renderer in pure python. You can see what it looks like in this gist along with some sample timings: https://gist.github.com/rokroskar/bdcf6c6b210ff0efc738

Numba is an open-source project, but I've got it installed as part of Anaconda, the Continuum Analytics python distribution (https://store.continuum.io/cshop/anaconda/). NumbaPro, the "accelerated" version of numba, promises GPU CUDA-like code and OpenMP-like loop parallelism via a prange for loop. It's free for academics, but I could never get these bits to work, and the documentation for them is even worse than numba's.

You'll see that it uses mostly pythonic syntax, but there are some cases where numba blows up if you try to do too much with list comprehensions etc. It also doesn't know how to do abstract loops (i.e. for bla in some_list_of_stuff). It can occasionally be a pain to get working, because the errors it gives during the compilation stage are sometimes less than informative and the documentation basically boils down to a set of (short) examples. But when it works, it's pretty cool. As you can see from the timings, it's sometimes faster than the pynbody renderer. In case it isn't clear -- the numba part of the code is just the @autojit, @jit, and @vectorize decorators... yes, it does seem too good to be true.
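To make the decorator-only workflow concrete, here is a toy sketch of the kind of pixel-deposition loop numba handles well (my own example, not from the gist -- the triangular kernel and the function name are illustrative stand-ins for the real cubic-spline renderer, and the try/except makes it run even without numba installed):

```python
import numpy as np

try:
    from numba import jit          # numba's JIT decorator
except ImportError:                # fall back to a no-op decorator if numba is absent
    jit = lambda func: func

@jit
def render_1d(pos, h, mass, image, dx):
    # Deposit each particle's mass onto the pixels within its smoothing
    # length h, using a simple triangular kernel as a stand-in for the
    # cubic spline. Plain loops like this are exactly what numba compiles.
    npix = image.shape[0]
    for i in range(pos.shape[0]):
        left = max(int((pos[i] - h[i]) / dx), 0)
        right = min(int((pos[i] + h[i]) / dx) + 1, npix)
        for j in range(left, right):
            d = abs((j + 0.5) * dx - pos[i])       # distance to pixel center
            w = max(1.0 - d / h[i], 0.0)           # triangular kernel weight
            image[j] += mass[i] * w

pos = np.array([0.5])
h = np.array([0.2])
mass = np.array([1.0])
image = np.zeros(10)
render_1d(pos, h, mass, image, 0.1)
```

With numba present, the first call triggers compilation; without it, the same code runs (slowly) in pure python.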

The second goal was to try to put this onto a GPU. After talking with Joachim Stadel about it, he pointed out that I should rewrite the algorithm to expose as much parallelism as possible by forcing the threads to do as much identical work as possible. So we came up with a scheme where the particles are sorted by how many pixels they need to draw on the image (basically by smoothing length).
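As a rough illustration of that sorting step (my own sketch -- the function name and the 2*h/dx footprint estimate are hypothetical, not taken from the actual code):

```python
import numpy as np

def sort_by_footprint(h, dx):
    """Order particles by pixel footprint so that threads processing
    adjacent particles do nearly identical amounts of work."""
    # number of pixels per side each particle's kernel touches
    npix_per_side = (2.0 * h / dx).astype(np.int64) + 1
    order = np.argsort(npix_per_side)      # smallest footprints first
    return order, npix_per_side[order]

h = np.array([0.5, 0.05, 0.2, 0.01])       # smoothing lengths
order, footprint = sort_by_footprint(h, 0.1)
```

After this, consecutive particles cover a similar number of pixels, which is what lets GPU threads in the same warp stay in lockstep.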

The resulting code is the second file under the same gist link. You can see some sample timings in the docstring at the top of the file -- for the Eris simulation (~40M dm particles), the template render code does ~6x better than the pynbody renderer, even including the very slow np.where and sort. I tried to put this on the GPU using the Anaconda Accelerate CUDA jit, but it was very painful -- no debugger, few useful error messages. I gave up on the numbapro CUDA route and implemented it in CUDA C, but getting it to work correctly and efficiently still needs some more work... it's close, though. :)
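The batching that a template renderer needs can be sketched like this (again my own toy version -- it uses np.searchsorted on the already-sorted footprints rather than the np.where call in the actual code, and the variable names are illustrative):

```python
import numpy as np

# Footprints after sorting: each value is the kernel size (pixels per side).
footprint = np.array([1, 1, 2, 2, 2, 5, 11, 11])

# Find the [start, end) slice for each distinct template size, so that one
# uniform rendering pass ("template") can handle each group of particles.
sizes = np.unique(footprint)                            # distinct kernel sizes
starts = np.searchsorted(footprint, sizes, side="left")
ends = np.searchsorted(footprint, sizes, side="right")
batches = list(zip(sizes, starts, ends))                # (size, start, end)
```

Each batch can then be dispatched as one GPU kernel launch in which every thread draws the same number of pixels.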

Take-away points: numba is great when it works -- you can write fairly naive python code and turn it into something relatively fast fairly easily. It can take a little bit of work reorganizing the code, but come on, with a single decorator you get fabulous speed-ups! Numbapro, on the other hand, might seriously test your nerves -- I don't think it's really ready for anything but experimentation, though maybe things will get better. They appear to have Nvidia behind the development now, so perhaps that will help things along.

The new renderer is pretty fast, and I'll try to find the time to write it in C and wrap it in cython to make it more portable than a numba-fied code. It needs some quality control, since the results don't quite agree with pynbody at the moment -- I think it's just a normalization somewhere that I missed. It should also be made multi-threaded, which will obviously speed things up even more. It would be cool if people tried it and helped find bugs...

Let me know if you end up trying any of this out! 


Rok

Andrew Pontzen

Oct 18, 2013, 7:27:36 AM10/18/13
to pynbo...@googlegroups.com, pynbod...@googlegroups.com
At a glance this looks very cool! It is definitely along the right track for the future.

Just one thing though: why didn't you use cython, which is a very popular, stable and well-developed python->C compiler?

If we are going to have separate GPU / CPU renderers, I would advocate writing the CPU one in cython and the GPU one in OpenCL. This should result in maximum future compatibility. I did once try playing with rendering in OpenCL, but didn't get anything stable (or fast) enough to be worth sharing.

Anyway, thanks for this work Rok, I'm looking forward to trying it out :-)

Cheers, Andrew
> --
> You received this message because you are subscribed to the Google Groups "pynbody-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to pynbody-dev...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.


Rok Roskar

Oct 18, 2013, 8:00:55 AM10/18/13
to pynbo...@googlegroups.com, pynbod...@googlegroups.com
Hey Andrew,

Yes, I did it in numba just as an experiment -- to see how far the autojit decorator can take you, and I think it's pretty impressive. I agree that for portability it'll have to be done in something like cython -- I think all it will take is ripping the CUDA calls out of my CUDA C code and wrapping it with cython.

I thought about OpenCL too, but Nvidia is dropping all support for it, so it's not clear that it's actually a future-proof direction to take. It's lame that they are forcing CUDA on people by doing this, but so it goes. What I have brewing now is CUDA C called via PyCUDA -- both of these are in very active development and very well supported.

I should also point out that the Anaconda python distribution from Continuum Analytics is pretty good -- it comes with a package manager (conda) that lets you really easily select package versions, create virtual environments for testing, etc. I've been using it for a while now instead of Enthought's Canopy, and I like that they stay on top of the latest releases (e.g. Canopy still ships with ipython 0.13 even though ipython 1.0 has been out for a while). I initially installed Anaconda just to simplify the numba install, but in the end I just stayed with it.


Rok

Rory Woods

Oct 18, 2013, 1:58:11 PM10/18/13
to pynbo...@googlegroups.com, pynbod...@googlegroups.com
Very cool stuff, Rok! Thanks for doing this. I won't be able to play
with it for a bit, but I'll let you know if I find any bugs when I do.

-Rory