--
You received this message because you are subscribed to the Google Groups "Numba Public Discussion - Public" group.
To unsubscribe from this group and stop receiving emails from it, send an email to numba-users+unsubscribe@continuum.io.
To post to this group, send email to numba...@continuum.io.
To view this discussion on the web visit https://groups.google.com/a/continuum.io/d/msgid/numba-users/d70976d6-f215-4ab8-8e6e-098135dc3db2%40continuum.io.
For more options, visit https://groups.google.com/a/continuum.io/d/optout.
If I understand your question correctly, you want to use Numba to compile a Python function to PTX and get a pointer to it so that a CUDA-C++ function can use it as a callback. That way, you can have a CUDA-C++ function minimizer (for example) and pass it user-defined functions to minimize, which are written in Python for user convenience.
Looking at the documentation (http://numba.pydata.org/numba-doc/0.13/CUDAJit.html), numba.cuda.jit has a "device" argument. When "True," the Python function is compiled as a __device__ function, rather than __global__. So that's how you can get your callback.
"pseudo-code" below:>>> from numba import cfunc
>>> def add(a,b):
return return
...:
>>> c_callback = cfunc("float64(float64, float64)", gpu=True, cpu=True)(add)
>>> from hydra import generic_algo
>>> gpu_result = generic_algo.gpu_calculation( c_callback.gpu_address)
>>> cpu_result = generic_algo.cpu_calculation( c_callback.cpu_address)
As for getting its pointer, there must be a way using ctypes. It will probably be in some llvmlite attribute rather than a Numba attribute. I might have an example for a CPU function, and it would be similar for GPU functions.

But... wait a minute... I thought that functions running on GPUs couldn't call functions through pointers. That would make recursion possible, something else I thought was impossible on GPUs, because the stack depth needs to be known at compile time.
The address of a __global__ function taken in host code cannot be used in device code (e.g. to launch the kernel). Similarly, the address of a __global__ function taken in device code cannot be used in host code.
It is not allowed to take the address of a __device__ function in host code.
__global__ functions do not support recursion.
So I think even using recursion in device functions is not a very smart thing to do, though there is no formal constraint forbidding it.
Do you have any idea how I could implement those ideas?
A.A.
Jim
On Jun 10, 2017 2:47 AM, "Antonio Augusto Alves Junior" <aalv...@gmail.com> wrote:
Hi folks, any suggestions?
Cheers
A.A.
On Friday, June 9, 2017 at 12:30:51 PM UTC+2, Antonio Augusto Alves Junior wrote:
Hi all,
Congratulations, this is a very interesting and useful project.
I am an absolute beginner with Numba... so I apologise for asking this very silly question.
I am designing python bindings for Hydra, a library for data analysis in high energy physics using multi-core CPU and GPU.
I would like to be able to define host/device functions in the Python interpreter, compile them using @jit decorators, then get their addresses and pass them to the foreign C++ interface of Hydra.
Currently, it is possible to do this for the CPU target using @cfunc, as documented here: http://numba.pydata.org/numba-doc/latest/user/cfunc.html. I would like to have similar functionality for the GPU.
Does someone have a suggestion?
Cheers
A.A.
The part of the documentation you quoted didn't say that you can't execute a __device__ function pointer on the device, but it didn't say you can, either.
nvlink warning : Stack size for entry function 'blablabla...' cannot be statically determined.
I was under the impression that PTX code can't have a function call stack that grows without bound at runtime, which is fundamentally unlike x86 machine code. This is what I have heard; I'm not an expert. Do you have evidence, independent of Numba, that you can pass and execute __device__ function pointers on the device? Have you seen a working example in pure CUDA code, with no Numba? Because if you haven't, then what you're trying to do in Numba might be impossible for the hardware, and it's not really a Numba problem.
Now, assuming that the above is true and you can't pass __device__ function pointers on the device, JIT compilation provides another way: you can take the user's Python code and compile an entry point around it, rather than passing it in as a function. The effect would be the same: users get to pass user-defined functions to a functor. The difference is that the user's function actually gets inlined into the framework and compiled fresh every time. I'll give an example if it's not clear what I mean.
Jim
I think I understand what you are suggesting, but having an example would be great. Could you please include in the example some interaction with a C/C++-coded routine?
@numba.jit
def doEverything(startingParams):
    iteration = 0
    params = startingParams
    while not isConverged():
        results = {}
        for p in params:
            # since we're auto-generating this, might as well include objective inline
            results[p] = objective(p)
        h = computeHessian(results)
        params = predictMinimum(h)
    return params