Viewon01
unread,Feb 8, 2012, 10:23:22 AM2/8/12Sign in to reply to author
Sign in to forward
You do not have permission to delete messages in this group
Either email addresses are anonymous for this group or you need the view member email addresses permission to view the original message
to clpp
Dear CLPP team,
I'm pleased to announce that I have integrated the work of Eric
Bainville, the GPU version of the Bitonic sort in CLPP.
First, thanks to Eric for his work, this sort algorithm is simply...
great.
This sort algorithm is 2x (or more) faster than the Radix-Satish sort
for small data-set (< 1 M).
But, as usual we always discover some strange stuffs when playing with
this kind of algorithm on the GPU.
In the current case the Bitonic soft is 4X slower when I use the Key-
Value sort instead of the Key-only sort, at least on my GTX 590 !
It is strange and looks like a memory access problem... as often with
NVidia GPU.
Here are the benchmark results :
--------------- GPU : Key : Bitonic sort
Performance for data-set size[1024] time (ms): 0.717365 KPS[1427446]
Performance for data-set size[4096] time (ms): 1.28015 KPS[3199633]
Performance for data-set size[8192] time (ms): 1.18177 KPS[6931949]
Performance for data-set size[16384] time (ms): 1.25885 KPS[13015073]
Performance for data-set size[32768] time (ms): 1.31907 KPS[24841814]
Performance for data-set size[65536] time (ms): 1.55204 KPS[42225604]
Performance for data-set size[131072] time (ms): 2.26921 KPS[57760976]
Performance for data-set size[262144] time (ms): 2.84238 KPS[92226864]
Performance for data-set size[524288] time (ms): 3.96873
KPS[132104840]
Performance for data-set size[1048576] time (ms): 6.7722
KPS[154835264]
--------------- GPU : Key-Value : Bitonic sort
Performance for data-set size[1024] time (ms): 2.60164 KPS[393597]
Performance for data-set size[4096] time (ms): 3.87459 KPS[1057145]
Performance for data-set size[8192] time (ms): 5.03649 KPS[1626530]
Performance for data-set size[16384] time (ms): 6.50554 KPS[2518467]
Performance for data-set size[32768] time (ms): 7.20386 KPS[4548671]
Performance for data-set size[65536] time (ms): 8.35131 KPS[7847392]
Performance for data-set size[131072] time (ms): 8.77579 KPS[14935630]
Performance for data-set size[262144] time (ms): 10.4196 KPS[25158832]
Performance for data-set size[524288] time (ms): 16.8555 KPS[31104852]
Performance for data-set size[1048576] time (ms): 23.3437
KPS[44919068]
Krys