Help requested : Satish sort on GPU


Viewon01

Jun 17, 2011, 12:31:56 PM
to clpp
Hi All,

For the key-value pair sort (clppSort_RadixSort.XXX), I am trying to:
1 - Sort on only 16 bits (or 8 bits...), but it fails!
2 - Sort on an NVidia GPU. It generally works, but it fails when there
is a large number of items!

Could someone take a look at these problems? I have surely missed
some stupid bugs!

Viewon01

Jun 20, 2011, 3:58:30 AM
to clpp
Hi,

The Satish sort for key-value pairs is now fixed for NVidia GPUs.
Sorry for the delay; I was hunting for a very complex bug... and in
fact it was just a very stupid one!

The good news is that the sort is now 41% FASTER than the previous
version :-D

I have not tested it on an ATI GPU... if someone has one, please
send me the benchmark results!

philipp...@gmail.com

Jun 21, 2011, 4:42:58 AM
to cl...@googlegroups.com
Hi Krys,

Well done. I am sorry, but for the moment I have absolutely no time to
work on the sorting. I am not able to compile either. Would it be possible
to rename the kernel sources to .cl, as usual? It would greatly simplify
my SConstruct file for Mac or Linux, and I would be able to test
on my AMD card.

Best,

PH


kr...@polarlights.net

Jun 21, 2011, 5:48:32 AM
to cl...@googlegroups.com
Hi Phil,

All the kernel sources are already ".cl"!

What is the problem?


crb002

Jul 2, 2011, 6:26:25 PM
to clpp
Here is my naive compile under OSX using the example on the main page.
Guess I will be tracking down these issues.
Using the version clpp_v1_beta2.zip.

crb002$ g++ test.cpp -framework OpenCL -I./ -Wall
test.cpp: In function ‘int main()’:
test.cpp:12: error: no matching function for call to ‘clpp::createBestSort(clppContext&, int)’
./clpp/clpp.h:16: note: candidates are: static clppSort* clpp::createBestSort(clppContext*, unsigned int, unsigned int)
test.cpp:12: error: cannot declare variable ‘sort’ to be of abstract type ‘clppSort’
./clpp/clppSort.h:13: note: because the following virtual functions are pure within ‘clppSort’:
./clpp/clppSort.h:16: note: virtual std::string clppSort::getName()
./clpp/clppSort.h:19: note: virtual void clppSort::sort()
./clpp/clppSort.h:31: note: virtual void clppSort::pushCLDatas(_cl_mem*, size_t)
./clpp/clppSort.h:34: note: virtual void clppSort::popDatas()
test.cpp:14: error: expected primary-expression before ‘...’ token
test.cpp:14: error: expected ‘,’ or ‘;’ before ‘...’ token
test.cpp:17: error: base operand of ‘->’ has non-pointer type ‘clppSort’
test.cpp:18: error: base operand of ‘->’ has non-pointer type ‘clppSort’
test.cpp:20: error: base operand of ‘->’ has non-pointer type ‘clppSort’
test.cpp:22: error: no match for call to ‘(clppSort) ()’
test.cpp:14: warning: unused variable ‘dataToSort’




philipp...@gmail.com

Jul 3, 2011, 4:47:16 AM
to cl...@googlegroups.com
I have written a scons script (SConstruct) for compiling clpp. You
would do better to install scons (http://www.scons.org/), open a terminal,
cd to the clpp folder and just type "scons". It works!
ph


kr...@polarlights.net

Jul 3, 2011, 5:09:22 AM
to cl...@googlegroups.com
Thanks for your interest.

I would just like to point out that one bug remains and that the architecture still needs to be improved. So it may be better to use the latest SVN version directly.

Feel free to contact us for help.

Krys


Viewon01

Jul 3, 2011, 5:35:01 AM
to clpp
I would like to say that I have just fixed the radix sort for NVidia
cards.

crb002

Jul 7, 2011, 10:06:08 PM
to clpp
Thanks. I installed scons and it is working fine. Now I am getting some
OpenCL errors. On my NVIDIA 9400M it is probably just not enough memory,
so I can go in and change the benchmark.

Here are the results on the i7; Apple's OpenCL does not seem to like
something:

crb002$ ./go
Platform[Apple] Device[Radeon HD 4850]


--------------- Radix sort Key
[CL_BUILD_PROGRAM_FAILURE] : OpenCL Error : clBuildProgram failed: could not build program for device 0 (0x0) (-1)
Break on OpenCLErrorBreak to debug.
[CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:
<program source>:71:15: error: __local automatic variables can only be declared in a kernel
__local uint localBuffer[64];
^

Break on OpenCLErrorBreak to debug.
Error: Failed to build program executable!
<program source>:71:15: error: __local automatic variables can only be declared in a kernel
__local uint localBuffer[64];
^

Program build failure
Assertion failed: (clStatus == CL_SUCCESS), function checkCLStatus, file src/clpp/clppProgram.cpp, line 180.
Abort trap





philipp...@gmail.com

Jul 8, 2011, 2:25:01 AM
to cl...@googlegroups.com
You have to replace

__local uint localBuffer[64];

with

uint localBuffer[64];

and do the same in the other places.

ph


kr...@polarlights.net

Jul 8, 2011, 3:45:58 AM
to cl...@googlegroups.com
Thanks Philippe,

But that is incorrect; it will not work that way. Why?

Because __local memory is shared between all the work-items in a work-group.
If you don't use __local, the array is not shared between the work-items.
Here we need to share it, because each work-item computes a value
and, right afterwards, another work-item uses that value. That is only
possible because the memory is shared.

So, in the end, there are 2 solutions:

1) Declare the array in the kernel and pass it as a parameter to the functions.
This is the correction I propose.

2) Allocate the __local array from C++ and pass it as a kernel parameter.

On an NVidia card I have tried both but, strangely, the second version gives me a 'slower' sort!

I expect to make the correction today.

Krys
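
For readers following along, solution 1 can be sketched roughly as below. This is only an illustration, not the clpp kernel: the names sumThroughLocal, example and kExampleKernelSource are made up, and the OpenCL C source is shown embedded in a C++ raw string, the way a host program could hand it to clCreateProgramWithSource. It assumes a work-group of at most 64 work-items.

```cpp
#include <string>

// Illustrative OpenCL C source held in a C++ raw string.
// All names here are hypothetical; this is not the clpp radix sort kernel.
const std::string kExampleKernelSource = R"CLC(
// Helper function: it may not declare __local storage itself,
// so it receives a pointer to the buffer owned by the kernel.
uint sumThroughLocal(__local uint* buffer, uint value)
{
    const uint lid = get_local_id(0);
    buffer[lid] = value;
    barrier(CLK_LOCAL_MEM_FENCE);   // make every work-item's write visible

    // Read the value written by the neighbouring work-item.
    const uint neighbour = buffer[(lid + 1) % get_local_size(0)];
    barrier(CLK_LOCAL_MEM_FENCE);   // nobody overwrites before all have read
    return value + neighbour;
}

__kernel void example(__global uint* data)
{
    // __local arrays are declared at kernel function scope.
    __local uint localBuffer[64];   // assumes local size <= 64

    const uint gid = get_global_id(0);
    data[gid] = sumThroughLocal(localBuffer, data[gid]);
}
)CLC";
```

The helper only receives a pointer to the array that the kernel owns, which is what the Apple compiler message "__local automatic variables can only be declared in a kernel" is asking for.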


philipp...@gmail.com

Jul 8, 2011, 3:59:22 AM
to cl...@googlegroups.com
Yes Krys, you are right, I answered too quickly...
I don't know why, but it seems that some OpenCL implementations do not
allow local memory to be allocated from the kernel.
I do not understand the difference between 1) and 2) :-O

ph

kr...@polarlights.net

Jul 8, 2011, 4:06:42 AM
to cl...@googlegroups.com
:-D

In the first version (1) you declare the local array inside your kernel, like this:
__local int localBuffer[64];

In the second, localBuffer is a kernel parameter. You allocate it from C++ with the following call (the size is given in bytes):
clSetKernelArg(kernel, 0, 64 * sizeof(cl_int), NULL);

I was expecting the same behaviour, but it seems that in some cases the first version is faster. I don't know why!

kr...@polarlights.net

Jul 8, 2011, 4:24:36 AM
to cl...@googlegroups.com
It is fixed and I have committed it :-)

It is a problem that I also have with the AMD GPU!

Thanks for your help.


crb002

Jul 9, 2011, 6:48:08 PM
to clpp
Apple's OpenCL on the Radeon HD 4850 now seems happy to compile,
but it is complaining about the work-group size.

crb002$ ./go
Platform[Apple] Device[Radeon HD 4850]


--------------- Satish radix sort Key
[CL_INVALID_WORK_GROUP_SIZE] : OpenCL Error : clEnqueueNDRangeKernel failed: total work group size (128) is greater than the device can support (64)
Break on OpenCLErrorBreak to debug.


kr...@polarlights.net

Jul 10, 2011, 1:38:45 PM
to cl...@googlegroups.com
What is the problem with the workgroup size ?

Thx

crb002

Jul 10, 2011, 1:46:44 PM
to clpp
The Radeon HD4850 can only take a workgroup of size up to 64. 128 is
too large.

http://forums.amd.com/forum/messageview.cfm?catid=390&threadid=127773&enterthread=y&STARTPAGE=2

So, either hard code the workgroup size for each family of GPUs or
figure this out dynamically.

I think CL_KERNEL_WORK_GROUP_SIZE should give the correct number
according to that AMD forum.

-Brew
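
A minimal sketch of the "figure this out dynamically" option, assuming the limit has already been read on the host with clGetKernelWorkGroupInfo and CL_KERNEL_WORK_GROUP_SIZE. The helper name pickLocalSize is hypothetical, and rounding down to a power of two is an assumption made here for illustration, not something clpp is known to do.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: choose a work-group size that does not exceed what the
// device reports for the kernel. `kernelLimit` is assumed to be the value
// returned for CL_KERNEL_WORK_GROUP_SIZE (e.g. 64 on the Radeon HD 4850).
std::size_t pickLocalSize(std::size_t requested, std::size_t kernelLimit) {
    const std::size_t cap = std::min(requested, kernelLimit);

    // Round down to a power of two (an assumption of this sketch).
    std::size_t size = 1;
    while (size * 2 <= cap) {
        size *= 2;
    }
    return size;
}
```

With the numbers from the error message, pickLocalSize(128, 64) gives 64, so the enqueue would no longer ask for more work-items than the device supports. The kernel itself still has to cope with the smaller group.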

kr...@polarlights.net

Jul 10, 2011, 2:27:13 PM
to cl...@googlegroups.com
In the first step of the radix sort, we sort 512 elements locally.

128 * 4 = 512, where 4 is the width of an int4.

You can try with 256, but you may have to adapt the algorithm.

Krys
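
The arithmetic above can be written out as a small sketch. It assumes each work-item handles exactly one int4 (4 values), and it reads "try with 256" as 64 work-items * 4; the names tileSize and groupCount are made up for illustration.

```cpp
#include <cstddef>

// Values packed in one int4; each work-item is assumed to handle one int4.
constexpr std::size_t kValuesPerItem = 4;

// Elements sorted locally by one work-group of `localSize` work-items.
constexpr std::size_t tileSize(std::size_t localSize) {
    return localSize * kValuesPerItem;
}

// Work-groups needed to cover `count` elements, rounded up.
constexpr std::size_t groupCount(std::size_t count, std::size_t localSize) {
    return (count + tileSize(localSize) - 1) / tileSize(localSize);
}

static_assert(tileSize(128) == 512, "128 work-items * int4 = 512 elements");
static_assert(tileSize(64) == 256, "64 work-items * int4 = 256 elements");
```

Halving the work-group size halves the locally sorted tile, so any later pass that assumes 512-element tiles would have to be adapted as well.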

Message has been deleted

philipp...@gmail.com

Jul 12, 2011, 11:59:21 AM
to cl...@googlegroups.com
The formula is correct, and ln (math) = log (C++).

ph

On 12/07/11 17:31, kr...@polarlights.net wrote:
> Hi Philippe,
>
> If you have a little time for this question: I have hit a bug in OpenCL's 'pow' function with Intel's SDK.
>
> So I am trying to implement it another way, and I believe that:
>
> pow(x,exponent) = exp(exponent*ln(x))
>
> with x and y being floats!
>
> Is that correct?
>
> However, I don't see an 'ln' function in OpenCL! I only have log, log2, log10, log1p and logb! What can I use in its place?
>
> Thanks in advance
>
> Krys
>
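
A small host-side sketch of that identity, using std::log as the natural logarithm. The function name powViaExpLog is hypothetical, and the rewrite only holds for x > 0 (the logarithm is undefined for negative x, and x = 0 needs separate handling).

```cpp
#include <cmath>

// Hypothetical helper: pow(x, y) rewritten as exp(y * ln(x)).
// Only valid for x > 0. std::log is the natural logarithm (ln).
float powViaExpLog(float x, float y) {
    return std::exp(y * std::log(x));
}
```

The same expression, exp(y * log(x)), should carry over to a kernel, since log in OpenCL C is also the natural logarithm. Expect small rounding differences compared with a native pow.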

kr...@polarlights.net

Jul 12, 2011, 4:10:34 PM
to cl...@googlegroups.com
Great,

Many thanks, once again.

Have a good evening

Krys
