numba + cuda: possible to write "Python macros" to help make it easier to write long kernels?


Brian Merchant

Apr 29, 2016, 8:50:32 PM
to Numba Public Discussion - Public
Hi all,

Let's say you had to compute a cross product twice within a kernel. You could write out the cross product in long form both times, but could you instead write a "Python macro" representing the cross product, which would get expanded into Python code in the cuda-decorated function before it is compiled for the CUDA target?

Kind regards,
Brian

Diogo Silva

Apr 30, 2016, 6:17:04 AM
to Numba Public Discussion - Public


Stanley Seibert

May 1, 2016, 7:41:14 PM
to Numba Public Discussion - Public
And important to note: the CUDA compiler is *extremely* aggressive about inlining functions, so device functions will have no call overhead, unless you do something like a recursive call (which Numba won't let you compile anyway).  This is one of the many reasons why compiling for the CUDA target is extremely slow...  :)
