I am investigating the possibilities for a project which makes use
of Polly to parallelize loops and generate code that runs on the
GPU. One option is to generate CUDA/OpenCL code - LLVM
currently does not have an OpenCL/CUDA code generator. Another option
is to generate PTX code - LLVM currently does have a PTX backend.
Tobias, have you thought about/evaluated both approaches? Any suggestions?
Thanks
Xin
Hi Xin,
I have thought about it. ;-)
If you want to translate from LLVM-IR to GPU code there are, as you
pointed out, two possible approaches:
1. Generating C-like CUDA/OpenCL code
To generate CUDA/OpenCL, a backend like the LLVM C backend
needs to be written or the LLVM C backend needs to be extended. For
OpenCL this was already done in a bachelor's thesis [1]. This approach
has both advantages and disadvantages:
Advantages:
- The result can be compiled by any OpenCL compiler
(Though this only works if we solve the hard problem of
generating correct OpenCL code from LLVM-IR)
Disadvantages:
- Unreliable and buggy
The C backend is known to be buggy (and a rewrite is needed to
fix it). An OpenCL backend based on the C backend will also
be unreliable if the root problems are not fixed.
- Overhead
Going from LLVM-IR back to OpenCL is actually just overhead.
All OpenCL compilers I know of (AMD, Intel, NVIDIA)
lower OpenCL back to LLVM-IR.
2. Use compiler backends for the GPU low-level IRs
Here we directly go from LLVM-IR to PTX/AMD-IL/..
Again some positive and negative points:
Disadvantages:
- You need specific compiler backends
- You need to generate different meta-data for different GPUs
At the moment, the different backends expect different
meta-data. This means your code generation would, for the moment,
be backend-specific.
Advantages:
- Reliable
The existence of proprietary GPU backends shows that this
approach can yield production quality software.
- Existing open source backends
LLVM includes a PTX backend. AMD also open-sourced their AMD-IL
backend [2].
- Upcoming standard ?!
I heard rumors that people are thinking about standardizing LLVM-IR
as an alternative OpenCL input format.
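As a rough illustration of what approach 1 has to produce, here is a
minimal sketch of emitting C-like OpenCL source for a simple parallel
loop of the form "for i in [0, n): out[i] = expr". This is not code
from Polly, the LLVM C backend, or the thesis in [1]. The helper name
emit_opencl_kernel and its parameters are hypothetical, and the loop
body is passed in as a ready-made string instead of being derived from
LLVM-IR. Only get_global_id and the __kernel/__global qualifiers are
actual OpenCL C.

```python
# Hypothetical helper (an assumption for illustration, not an LLVM or Polly API):
# emit OpenCL C source for a one-dimensional, element-wise parallel loop.
def emit_opencl_kernel(name, inputs, output, expr):
    """Return OpenCL C source for: for i in [0, n): output[i] = expr."""
    # Every array lives in global memory; inputs are read-only.
    params = [f"__global const float *{a}" for a in inputs]
    params.append(f"__global float *{output}")
    params.append("const int n")
    lines = [
        f"__kernel void {name}({', '.join(params)})",
        "{",
        "    int i = get_global_id(0);  /* one work-item per loop iteration */",
        "    if (i < n)                 /* global size may be rounded up past n */",
        f"        {output}[i] = {expr};",
        "}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Emit a vector-add kernel: C[i] = A[i] + B[i]
    print(emit_opencl_kernel("vec_add", ["A", "B"], "C", "A[i] + B[i]"))
```

The sketch sidesteps the hard part: a real backend would have to
reconstruct the loop body and its types from LLVM-IR, which is exactly
where the C backend runs into trouble.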
Personally, I would recommend going with approach two. I think time is
better invested in working on advanced transformations that enable GPU
code generation than in fixing the C backend. If you plan to work on
optimizations, you may want to have a look at Muthu's work [3]. In my
group we are currently working on improving this. The relevant code is
available in ppcg [4]. For the moment, this is still a source-to-source
tool, but adapting/using these algorithms for Polly would be great.
Keep me up to date on what is happening here.
Cheers
Tobi
P.S.: The Google Summer of Code project is starting soon. If you know
any students interested in this, I believe this would be a great project.
[1] http://www.cdl.uni-saarland.de/publications/theses/moll_bsc.pdf
[2] http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-December/046136.html
[3] Automatic C-to-CUDA Code Generation for Affine Programs
Muthu Manikandan Baskaran, J. Ramanujam and P. Sadayappan
CC 2010
[4] http://repo.or.cz/w/ppcg.git
Thanks
I have not looked into the backends yet, but as far as I can see,
there is only one PTX backend that generates code for the GPU. Isn't the
whole idea of the IR to abstract away the heterogeneity in the backends?
Sure. It mostly works. However, the AMD-IL backend, for example, stores
additional information in meta-data, which is not standardized. I
don't think it's a big deal, but you should be aware that you might need
to generate slightly different annotations/meta-data to mark certain
constructs. I think the differences are rather small, though.
Cheers
Tobi
The relevant tool was just posted on the LLVM mailing list:
https://bitbucket.org/gnarf/axtor/
You may want to give it a look.
Tobi