Support for GPU (Cuda/OpenCL)

Håvard Wormdal Høiby

unread,

Sep 8, 2014, 9:41:20 AM9/8/14

to juli...@googlegroups.com

Hi.

I'm considering to extend Julia with "native" support for writing kernels, i.e. not having to write them in C.

What is the current state of the GPU support in the language/libraries?

Is it reflected by https://github.com/JuliaGPU or are there other promissing efforts?

And are there any plans or desired directions for such an extension?

My primary target would be CUDA using the LLVM PTX backend.

On this list I've found Pieter Verstraete and Krysztof Kamieniecki working towards similar goals, what are your current results and future plans?

- Håvard

maleadt

unread,

Sep 9, 2014, 3:25:08 AM9/9/14

to juli...@googlegroups.com

Hi Håvard,

Op maandag 8 september 2014 15:41:20 UTC+2 schreef Håvard Wormdal Høiby:

On this list I've found Pieter Verstraete and Krysztof Kamieniecki working towards similar goals, what are your current results and future plans?

Pieter was a student at my lab who added preliminary native GPU support to Julia (limited to NVIDIA hardware) as part of his master's thesis. More concretely, the project compiled Julia code to NVPTX assembly using the existing codegen machinery, and invoked the CUDA driver to execute the code and communicate with the GPU.

Although it worked quite well, the support is pretty much "bolted on" (codegen riddled with 'if ctx.ptx', magic strings identifying functions and modules, ...) and definitely not ready for general consumption yet. I plan to continue his work, but it might take a while before I feel confident to put it on-line.

Tim

Valentin Churavy

unread,

Sep 9, 2014, 4:23:41 AM9/9/14

to juli...@googlegroups.com

So there is definitely interest in doing this. For the OpenCL side of things there is a branch that works on mapping Julia Code to OpenCL C and compiling this. We also have been discussing supporting SPIR (the OpenCL equivalent of NVPTX), but haven't made any progress on that. In https://github.com/JuliaGPU/OpenCL.jl/issues/29 Jake outlined why he thinks it is currently not feasible.

Best Valentin

Erik Schnetter

unread,

Sep 9, 2014, 8:53:57 PM9/9/14

to juli...@googlegroups.com

The pocl project <http://pocl.sourceforge.net/> compiles OpenCL C to LLVM. Translating Julia to OpenCL C and then compiling this could be significantly simplified if one used only the pocl back-end, which consists of a few custom LLVM passes for vectorisation and to handle barriers.

-erik

--
Erik Schnetter <schn...@cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/

Håvard Wormdal Høiby

unread,

Sep 12, 2014, 5:02:18 AM9/12/14

to juli...@googlegroups.com

Great, thanks for the feedback.

I'll look into both SPIR and the pocl back-end for inspiration to do this with OpenCL.

- Håvard

Håvard Wormdal Høiby

unread,

Sep 19, 2014, 5:45:11 AM9/19/14

to juli...@googlegroups.com

So after a bit of research into POCL I see this as an interesting project to adapt. But Erik, in your paper[1] on page 10 the footnote that GPU support is not in place at this time. As the main focus of my project is GPU, this might be a show stopper for me. Any new developments?

Any thoughts on the C++ AMP[2] effort by Microsoft? Both Intel[3] and AMD[4] have implemented the specification.

AMD's is open source based on LLVM and has the following output strategy:

- OpenCL C for any OpenCL compliant platform

- SPIR for AMD and Intel

- HSAIL for HSA compliant platforms

[1] http://download.springer.com/static/pdf/986/art%253A10.1007%252Fs10766-014-0320-y.pdf?auth66=1411113969_9fa120ba7ecda267df3966aa41b47891&ext=.pdf

[2] http://download.microsoft.com/download/4/0/E/40EA02D8-23A7-4BD2-AD3A-0BFFFB640F28/CppAMPLanguageAndProgrammingModel.pdf

[3] http://llvm.org/devmtg/2012-11/Sharlet-ShevlinPark.pdf

[4] https://bitbucket.org/multicoreware/cppamp-driver-ng/wiki/Home

- Håvard

Erik Schnetter

unread,

Sep 19, 2014, 10:19:45 AM9/19/14

to juli...@googlegroups.com

On Fri, Sep 19, 2014 at 5:45 AM, Håvard Wormdal Høiby
<havard...@gmail.com> wrote:
> So after a bit of research into POCL I see this as an interesting project to
> adapt. But Erik, in your paper[1] on page 10 the footnote that GPU support
> is not in place at this time. As the main focus of my project is GPU, this
> might be a show stopper for me. Any new developments?

There are two parts to supporting GPUs. One is generation code and
providing the built-in functions, and doing so with reasonable
efficiency. This pocl should already handle, thanks to LLVM as
back-end. The other is the very system specific mechanisms to transfer
code and data to and from the device, and run it there. This can
easily be added to pocl -- pocl supports three or four different
device types -- but there seems to be currently no volunteer e.g. for
Nvidia GPUs, Intel MICs, or other devices.

Since you use the term "GPU" without qualification, I assume you are
looking at Nvidia GPUs. In this case, one approach could be to make
LLVM emit PTX (can it do that?), and then use CUDA to run this PTX
code (is there an API for this)?

In essence, the idea is to convert Julia code to GPU-optimized LLVM
code instead of OpenCL C. LLVM code is system specific, and one needs
to perform certain transformations/optimizations to make code run
efficiently on a GPU (the code needs to be vectorized). I am hoping
that there is a rather straightforward way from having this LLVM code,
to running it on a GPU, but I'm not familiar enough with Nvidia's
design to be sure.

-erik

Tim Holy

unread,

Sep 19, 2014, 7:20:06 PM9/19/14

to juli...@googlegroups.com

Re running PTX code, Dahua's CUDA.jl first got this working. I stole it for
CUDArt.jl. You can find both here: https://github.com/JuliaGPU

--Tim

Jake Bolewski

unread,

Sep 19, 2014, 8:45:52 PM9/19/14

to juli...@googlegroups.com

You can also run compiled PTX code with Nvidia's OpenCL implementation (binary kernel).

Håvard Wormdal Høiby

unread,

Sep 20, 2014, 6:34:08 AM9/20/14

to juli...@googlegroups.com

NVIDIA PTX code can be emitted by the NVPTX LLVM backend contributed by NVIDIA, so it should be possible to make pocl emit PTX.

And as Jake Bolewski and Tom mentions support for running this is already in place.

Jake Bolewski

unread,

Sep 20, 2014, 12:54:56 PM9/20/14

to juli...@googlegroups.com

I don't know the status of the Ptx backend with llvm trunk, but if you want to use any of the nvidia compiler support libraries (nvvm) they use an ir from llvm 3.2, so the current llvm ir generated by julia is annoyingly incompatible. Don't know if they have plans to update the llvm version they are using any time soon.

Reply all

Reply to author

Forward