numba + cuda: possible to write "Python macros" to help make it easier to write long kernels?


Brian Merchant

Apr 29, 2016, 8:50:32 PM
to Numba Public Discussion - Public
Hi all,

Let's say you had to compute a cross product twice within a kernel. You could write out the cross product in long form both times, but could you instead write a "Python macro" representing the cross product, which would get expanded into Python code in the cuda-decorated function before it is compiled for the CUDA target?

Kind regards,
Brian

Diogo Silva

Apr 30, 2016, 6:17:04 AM
to Numba Public Discussion - Public


Stanley Seibert

May 1, 2016, 7:41:14 PM
to Numba Public Discussion - Public
And important to note: the CUDA compiler is *extremely* aggressive about inlining functions, so device functions will have no call overhead, unless you do something like a recursive call (which Numba won't let you compile anyway).  This is one of the many reasons why compiling for the CUDA target is extremely slow...  :)
