I am investigating the possibilities of a project which makes use
of Polly to parallelize loops and code-gen them into code that makes use
of the GPU. One of the options is to generate CUDA/OpenCL code - LLVM
currently does not have an OpenCL/CUDA code generator. Another option
is to generate PTX code - LLVM currently does have a PTX backend.
Tobias, have you thought about or evaluated both approaches? Any suggestions?
I have thought about it. ;-)
If you want to translate from LLVM-IR to GPU code there are, as you
pointed out, two possible approaches:
1. Generating C like CUDA/OpenCL code
To generate CUDA/OpenCL a backend like the LLVM C backend
needs to be written or the LLVM C backend needs to be extended. For
OpenCL this was already done in a bachelor's thesis. This approach
has both advantages and disadvantages:
- The result can be compiled by any OpenCL compiler
(though this only works if we solve the hard problem of
generating correct OpenCL code from LLVM-IR).
- Unreliable and buggy.
The C backend is known to be buggy (and a rewrite is needed to
fix it). An OpenCL backend based on the C backend will also
be unreliable if the root problems are not fixed.
Going back from LLVM-IR to OpenCL is actually just overhead.
All OpenCL compilers I know of (AMD, Intel, NVIDIA)
lower OpenCL back to LLVM-IR.
2. Use compiler backends for the GPU low level IRs
Here we directly go from LLVM-IR to PTX/AMD-IL/..
Again some positive and negative points:
- You need specific compiler backends
- Need to generate different meta-data for different GPUs
At the moment, the different back ends expect different
meta-data. This means your stuff would for the moment be
The existence of proprietary GPU backends shows that this
approach can yield production quality software.
- Existing open source backends
LLVM includes a PTX backend. AMD also open sourced their AMD IL
- Upcoming standard?!
I heard rumors that people are thinking about standardizing LLVM-IR
as an alternative OpenCL input format.
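To make approach 1 a bit more concrete, here is a minimal Python sketch of what "generating C-like OpenCL code" amounts to for a single parallel loop. The ParallelLoop record and the emit_opencl_kernel helper are hypothetical names made up for illustration; they are not part of Polly, LLVM, or the C backend. A real backend would have to start from LLVM-IR rather than from a ready-made C statement, which is exactly the hard part mentioned above.

```python
# Hypothetical sketch: emit OpenCL C source text for one parallel loop.
# ParallelLoop and emit_opencl_kernel are illustrative names only.
from dataclasses import dataclass


@dataclass
class ParallelLoop:
    name: str     # kernel name
    arrays: list  # names of the float arrays accessed in the body
    bound: str    # name of the trip-count parameter
    body: str     # C statement that uses the index variable `i`


def emit_opencl_kernel(loop: ParallelLoop) -> str:
    """Turn `for (i = 0; i < bound; i++) body` into an OpenCL kernel string."""
    params = [f"__global float *{a}" for a in loop.arrays]
    params.append(f"int {loop.bound}")
    lines = [
        f"__kernel void {loop.name}({', '.join(params)}) {{",
        "  int i = get_global_id(0);  /* one work-item per loop iteration */",
        f"  if (i < {loop.bound}) {{  /* guard against surplus work-items */",
        f"    {loop.body}",
        "  }",
        "}",
    ]
    return "\n".join(lines)


vector_add = ParallelLoop("vector_add", ["A", "B", "C"], "n", "C[i] = A[i] + B[i];")
kernel_source = emit_opencl_kernel(vector_add)
print(kernel_source)
```

The emitted string would then be handed to a vendor OpenCL compiler, which, as noted above, typically lowers it to LLVM-IR again. Approach 2 skips this round trip.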
Personally, I would recommend going with approach two. I think time is
better invested in working on advanced transformations that enable GPU
code generation than in fixing the C backend. If you plan to work on
optimizations, you may want to have a look at Muthu's work. In my
group we are currently working on improving this. The relevant code is
available in ppcg. For the moment, this is still a source-to-source
tool, but adapting/using these algorithms for Polly would be great.
Keep me up to date on what is happening here.
P.S.: Google Summer of Code is starting soon. If you know
any student interested in this, I believe this would be a great project.
 Automatic C-to-CUDA Code Generation for Affine Programs
Muthu Manikandan Baskaran, J. Ramanujam and P. Sadayappan
I have not looked into the backends yet, but as far as I can see,
there is only one PTX backend that generates code for the GPU. Isn't the
whole idea of an IR to abstract away the heterogeneity of the backends?
Sure, and it mostly works. However, the AMD-IL backend, for example,
stores additional information in meta-data, which is not standardized. I
don't think it's a big deal, but you should be aware that you might need
to generate slightly different annotations/meta-data to mark certain
constructs. Though, I think the differences are rather small.
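As a rough illustration of such per-backend differences, the Python sketch below emits the "this function is a kernel" marker in two flavors. It is an assumption-laden example: it uses the conventions of the NVPTX and AMDGPU backends as I understand them in recent LLVM releases (an nvvm.annotations metadata entry versus the amdgpu_kernel calling convention), not the PTX and AMD-IL backends discussed in this thread, and mark_kernel is a hypothetical helper, not an LLVM or Polly API.

```python
# Sketch only: the same "mark this function as a kernel" step, per target.
# mark_kernel is a hypothetical helper; the IR text follows the NVPTX and
# AMDGPU conventions of recent LLVM releases (an assumption, see lead-in).
def mark_kernel(target: str, name: str) -> str:
    """Return LLVM-IR text that declares `name` as a GPU kernel for `target`."""
    if target == "nvptx":
        # NVPTX: kernel-ness is recorded in module-level metadata.
        return (
            f"define void @{name}(ptr %A) {{\n  ret void\n}}\n"
            f"!nvvm.annotations = !{{!0}}\n"
            f'!0 = !{{ptr @{name}, !"kernel", i32 1}}'
        )
    if target == "amdgpu":
        # AMDGPU: kernel-ness is a calling convention on the function itself.
        return (
            f"define amdgpu_kernel void @{name}(ptr addrspace(1) %A) {{\n"
            f"  ret void\n}}"
        )
    raise ValueError(f"no kernel marker known for target {target!r}")


for target in ("nvptx", "amdgpu"):
    print(mark_kernel(target, "vector_add"))
    print()
```

The function body is identical in both cases; only the small marker differs, which matches the point that the differences are rather small but still have to be handled per backend.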
The relevant tool was just posted on the LLVM mailing list:
You may want to give it a look.