to clpp
Hi,
I've just added a new sort algorithm. It is very basic for now, but it is
already faster under some conditions. It is the bitonic sort algorithm; it
works fine on the CPU, and only for data-set sizes equal to 1 << N (powers of two).
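For readers who have not met it before, here is a minimal Python sketch of a key-value bitonic sorting network. It is not the clpp OpenCL kernel: the function name `bitonic_sort_key_value` and the in-place list interface are assumptions made for illustration. The inner `for i` loop stands in for what would be independent work-items on an OpenCL device, and the `i ^ j` partner computation is the reason the length has to be a power of two.

```python
def bitonic_sort_key_value(keys, values):
    """Sort keys ascending in place and move values along with them.

    Hypothetical helper for illustration; the length must be 2**N.
    """
    n = len(keys)
    if n != len(values):
        raise ValueError("keys and values must have the same length")
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two (1 << N)")

    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between compared elements
            for i in range(n):  # each i would be one work-item on a device
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    out_of_order = (
                        keys[i] > keys[partner]
                        if ascending
                        else keys[i] < keys[partner]
                    )
                    if out_of_order:
                        keys[i], keys[partner] = keys[partner], keys[i]
                        values[i], values[partner] = values[partner], values[i]
            j //= 2
        k *= 2
    return keys, values


if __name__ == "__main__":
    k = [5, 1, 7, 3, 8, 2, 6, 4]
    v = ["a", "b", "c", "d", "e", "f", "g", "h"]
    print(bitonic_sort_key_value(k, v))
```

Every pass over `i` touches disjoint pairs, so the passes map naturally onto parallel hardware; the price is O(n log² n) comparisons in total.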
Here are some performance comparisons (time in ms; KPS is keys per second, i.e. data-set size divided by time):
OpenCL Platform : AMD Accelerated Parallel Processing
OpenCL Device : Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz
--------------- Satish sort Key-Value
Performance for data-set size[1024] time (ms): 0.367715 KPS[2784767]
Performance for data-set size[4096] time (ms): 0.784064 KPS[5224064]
Performance for data-set size[8192] time (ms): 1.34049 KPS[6111194]
Performance for data-set size[16384] time (ms): 2.88351 KPS[5681965]
Performance for data-set size[32768] time (ms): 5.83458 KPS[5616172]
Performance for data-set size[65536] time (ms): 16.1906 KPS[4047783]
Performance for data-set size[131072] time (ms): 36.8477 KPS[3557125]
Performance for data-set size[262144] time (ms): 75.2805 KPS[3482230]
--------------- Bitonic sort Key-Value
Performance for data-set size[1024] time (ms): 0.0416843 KPS[24565588]
Performance for data-set size[4096] time (ms): 0.0979675 KPS[41809768]
Performance for data-set size[8192] time (ms): 0.242014 KPS[33849228]
Performance for data-set size[16384] time (ms): 0.616134 KPS[26591608]
Performance for data-set size[32768] time (ms): 1.48633 KPS[22046300]
Performance for data-set size[65536] time (ms): 3.33594 KPS[19645446]
Performance for data-set size[131072] time (ms): 7.37194 KPS[17779846]
Performance for data-set size[262144] time (ms): 16.3755 KPS[16008270]
OpenCL Platform : Intel(R) OpenCL
OpenCL Device : Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz
--------------- Satish sort Key-Value
Performance for data-set size[1024] time (ms): 0.230857 KPS[4435655]
Performance for data-set size[4096] time (ms): 0.620495 KPS[6601180]
Performance for data-set size[8192] time (ms): 14.5135 KPS[564439]
Performance for data-set size[16384] time (ms): 1.64213 KPS[9977301]
Performance for data-set size[32768] time (ms): 2.65385 KPS[12347366]
Performance for data-set size[65536] time (ms): 4.58262 KPS[14300995]
Performance for data-set size[131072] time (ms): 9.38908 KPS[13960054]
Performance for data-set size[262144] time (ms): 25.0086 KPS[10482146]
--------------- Bitonic sort Key-Value
Performance for data-set size[1024] time (ms): 0.0712739 KPS[14367112]
Performance for data-set size[4096] time (ms): 0.154489 KPS[26513174]
Performance for data-set size[8192] time (ms): 0.433946 KPS[18877912]
Performance for data-set size[16384] time (ms): 0.494113 KPS[33158378]
Performance for data-set size[32768] time (ms): 0.9924 KPS[33018930]
Performance for data-set size[65536] time (ms): 2.01328 KPS[32551806]
Performance for data-set size[131072] time (ms): 4.45903 KPS[29394730]
Performance for data-set size[262144] time (ms): 9.9656 KPS[26304882]
OpenCL Platform : NVIDIA CUDA
OpenCL Device : Quadro FX 380
--------------- Satish sort Key-Value
Performance for data-set size[1024] time (ms): 0.142871 KPS[7167282]
Performance for data-set size[4096] time (ms): 0.266221 KPS[15385716]
Performance for data-set size[8192] time (ms): 0.329403 KPS[24869212]
Performance for data-set size[16384] time (ms): 0.341379 KPS[47993612]
Performance for data-set size[32768] time (ms): 0.649352 KPS[50462600]
Performance for data-set size[65536] time (ms): 1.11045 KPS[59017408]
Performance for data-set size[131072] time (ms): 2.14495 KPS[61107376]
Performance for data-set size[262144] time (ms): 4.25322 KPS[61634312]
--------------- Bitonic sort Key-Value
Performance for data-set size[1024] time (ms): 0.441135 KPS[2321285]
Performance for data-set size[4096] time (ms): 0.685279 KPS[5977130]
Performance for data-set size[8192] time (ms): 0.737831 KPS[11102809]
Performance for data-set size[16384] time (ms): 1.29097 KPS[12691227]
Performance for data-set size[32768] time (ms): 2.07302 KPS[15806856]
Performance for data-set size[65536] time (ms): 4.35922 KPS[15033871]
Performance for data-set size[131072] time (ms): 8.78023 KPS[14928080]
Performance for data-set size[262144] time (ms): 18.0001 KPS[14563475]
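
To read the figures above, a small Python sketch can recompute the throughput and the relative speed of the two sorts. The helper names `keys_per_second` and `speedup` are hypothetical, and the assumption that KPS means keys per second is inferred from the numbers themselves (size divided by time reproduces the reported values to within rounding). The timings used are copied from the 262144-key rows of the log.

```python
def keys_per_second(size, time_ms):
    """Throughput in keys per second from a data-set size and a time in ms."""
    return size / (time_ms / 1000.0)


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline (>1 means faster)."""
    return baseline_ms / candidate_ms


if __name__ == "__main__":
    size = 262144
    # (platform, Satish sort ms, bitonic sort ms) taken from the log above
    rows = [
        ("AMD APP, Core 2 Duo E8400", 75.2805, 16.3755),
        ("Intel OpenCL, Core 2 Duo E8400", 25.0086, 9.9656),
        ("NVIDIA CUDA, Quadro FX 380", 4.25322, 18.0001),
    ]
    for name, satish_ms, bitonic_ms in rows:
        print(
            f"{name}: bitonic {keys_per_second(size, bitonic_ms):.0f} KPS, "
            f"bitonic vs Satish x{speedup(satish_ms, bitonic_ms):.2f}"
        )
```

A ratio above 1 means the bitonic sort is faster on that platform; on the two CPU platforms it is, while on the Quadro FX 380 the ratio falls below 1 and the Satish sort stays ahead.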