2012/5/30 Krzysztof Kamieniecki <kr...@kamieniecki.com>:
> I am currently walking a fine line between wanting to get the GPU integrated
> and not wanting to rebuild LLVM in Julia (and also not wanting to replicate
> what NVidia is doing with PTX generation in LLVM). I've been busy with other
> things, but
> I have been thinking about this, and I think I have finally figured out a
> simple/quick way to get basic PTX code generated, that can be replaced when
> the LLVM/PTX backend comes up to speed. Even once the GPU code
> is generated, there is still the question of deciding how, when, and how
> much data is moved to the GPU and back, and when to execute the kernels.
> I want that interface to match @parallel and DArray; although I may change
> how it behaves with the GPU, I hope the behavior will eventually converge
> once everyone has code to experiment with.
Based on my experience with GPUs, this will be hard to do if the goal
is good performance, but you noted some of the issues in your message.
I think a first step is to provide ways for Julia users to send data
to the GPU if they need it, even if that means changing or restructuring
the functions that will be executed on the GPU. That is, create a
lower-level layer for people who want to use it, and then build the
higher-level layers on top of it.
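To make that concrete, here is a rough sketch of what such a lower-level layer might look like. All of the names here (gpu_alloc, gpu_copy!, gpu_launch, GPUArray, saxpy_kernel) are hypothetical, not existing Julia functions:

```julia
# Hypothetical low-level layer: the user controls allocation,
# transfers, and kernel launches explicitly.
buf = gpu_alloc(Float32, length(x))             # allocate a device buffer
gpu_copy!(buf, x)                               # explicit host -> device copy
gpu_launch(saxpy_kernel, (256, 4), buf, 2.0f0)  # user picks grid/block sizes
gpu_copy!(x, buf)                               # explicit device -> host copy

# A higher-level layer (e.g. a GPUArray behaving like DArray) could then
# hide the transfers and launch kernels from map/reduce-style operations.
```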
Regarding OpenCL: as far as I know, it is designed precisely to give the
programmer control over the GPU, so I don't see how OpenCL "takes out"
the problematic elements of GPU programming. Programming OpenCL is
similar to programming CUDA at the Driver API level, which is
considerably lower-level than CUDA C. One notable difference is that
OpenCL has vector programming support, and the Intel OpenCL SDK has an
automatic vectoriser, but I don't think it does everything magically
(though I haven't used it).
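To illustrate how much lower-level that is, here is the canonical OpenCL 1.x host-side setup sequence, written as hypothetical Julia-style wrappers over the real C entry points of the same names (signatures simplified; no such wrappers exist today):

```julia
# Every step that CUDA C hides behind kernel<<<grid, block>>>(args)
# is explicit in OpenCL and in the CUDA Driver API:
platform = clGetPlatformIDs()[1]
device   = clGetDeviceIDs(platform)[1]
ctx      = clCreateContext(device)
queue    = clCreateCommandQueue(ctx, device)
prog     = clBuildProgram(clCreateProgramWithSource(ctx, kernel_src))
kern     = clCreateKernel(prog, "saxpy")
buf      = clCreateBuffer(ctx, CL_MEM_READ_WRITE, sizeof(x))
clEnqueueWriteBuffer(queue, buf, x)                          # host -> device
clSetKernelArg(kern, 0, buf)
clEnqueueNDRangeKernel(queue, kern, global_size, local_size) # launch
clEnqueueReadBuffer(queue, buf, x)                           # device -> host
```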