Hi Ruyman,

Both the "lib" and "include" folders are inside "/opt/AMDAPP".
In my computer I only have an AMD GPU (HD 7750).
The OpenCL implementation I have installed (under /opt/AMDAPP) is the AMD-APP-SDK-v2.8-lnx64.
How should I set up the env-parameters.sh file, keeping in mind that I don't have CUDA (my GPU only supports OpenCL)?
By default the file looks as follows:
# CUDA and OpenCL PATH
export PATH=/usr/local/cuda/bin/:$PATH
export CUDADIR=/usr/local/cuda
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/lib:$LD_LIBRARY_PATH
export CPPFLAGS="-I/usr/local/cuda/include":$CPPFLAGS
export LDFLAGS="-L/usr/local/cuda/lib":$LDFLAGS
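I'm guessing an OpenCL-only variant would look something like this (untested; paths assume the AMD APP SDK v2.8 default install under /opt/AMDAPP, with the 64-bit libraries in lib/x86_64):

```shell
# OpenCL-only PATH (AMD APP SDK under /opt/AMDAPP, no CUDA)
export AMDAPPSDKROOT=/opt/AMDAPP
export LD_LIBRARY_PATH=$AMDAPPSDKROOT/lib/x86_64:$LD_LIBRARY_PATH
export CPPFLAGS="-I$AMDAPPSDKROOT/include":$CPPFLAGS
export LDFLAGS="-L$AMDAPPSDKROOT/lib/x86_64":$LDFLAGS
```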
Best regards,
Ricardo

2013/5/14 Ricardo Nobre <rjfnob...@gmail.com>
Hi Ruyman,

Thanks for your quick and complete answer!

Do you think collapse(3) would lead to significantly better performance compared with collapse(2)? Why?

Does your source-to-source compiler perform automatic loop tiling in order to improve thread- and data-level parallelism (good use of SIMD units)? Or do I have to set tiling parameters through pragmas?

Regards,
Ricardo

2013/5/13 Ruymán Reyes <rre...@ull.es>

Hi Ricardo,
Thank you for your interest in accULL.
accULL is an experimental implementation of OpenACC, so you may
experience some problems with complex codes. Release 0.2 is currently
available and works reasonably well. A new release (0.3) is expected
soon (<1 month), with many fixes for large codes and a more
user-friendly interface.
The major drawback of our implementation is that it will always
attempt to generate a GPU kernel (CUDA and/or OpenCL), whether or not
the code is really parallelizable (i.e., the independent clause
is always implicit).
Other than that, it is a good (and free) approach to OpenACC with
support for many of the 1.0 features.
Up-to-date details are at http://accull.wordpress.com/
With respect to performance, depending on the code you may or may not
get better performance. In simpler codes, where the PGI or CAPS
compiler cannot perform complex code transformations, accULL achieves the
same or better performance due to less overhead and better scheduling
of loop iterations. However, there are no polyhedral transformations in
accULL, so CAPS and particularly PGI will outperform it in situations
where this is an advantage.
Bear in mind that accULL produces clean CUDA/OpenCL source code, thus
it is suitable for further optimisation. This does not happen with
PGI/CAPS.
The initial idea of the framework was to leverage the development
effort rather than completely replace the CUDA and OpenCL languages.
> Can you suggest me the pragmas to use for best performance using accULL so I
> can test it myself?

You will need to change the addressing from [] to bare pointers. This
will help other OpenACC implementations as well, in particular CAPS.
For example (not tested!):

val = obstacles[i*n2*n3 + j*n3 + k]
With respect to the directive, I think that:

#pragma acc kernels loop collapse(2) copy(potential[0:n1*n2*n3]) \
        copyin(obstacles[0:n1*n2*n3]) private(acc, val)

should work.

It will help too if you declare acc and val within the loop, as here
(assuming double):

#pragma acc kernels loop collapse(2) copy(potential[0:n1*n2*n3]) \
        copyin(obstacles[0:n1*n2*n3])
> for (i = 1; i < (X - 1); i++) {
>   for (j = 1; j < (Y - 1); j++) {
>     for (k = 1; k < (Z - 1); k++) {
>       double val = obstacles[i][j][k];
>       double acc = potential[i-1][j][k] + potential[i+1][j][k] +
>         potential[i][j-1][k] + potential[i][j+1][k] +
>         potential[i][j][k-1] + potential[i][j][k+1];
>       potential[i][j][k] = acc * (1.0/6);

In this case, collapse(3) would work on CUDA but not in OpenCL due to
current implementation limitations. If you think you'll need this, I
can try to push the developers to put it on the next release - not
sure if they'll make it on time.
Best regards,
Ruyman Reyes,
>
> Best regards,
> Ricardo