Frequency count with RadixSort?

0 views

Skip to first unread message

kirchen...@googlemail.com

unread,

May 18, 2015, 11:03:49 AM5/18/15

to numba...@continuum.io

Hi everyone,

I would like to use cudalib.RadixSort to sort an array and i need to count the frequency of each key.

I assume that RadixSort internally already calculates this count. Is there any way to return it?

If not, what is a fast (but not too complex) way of counting the frequencies of a sorted array?

(without using atomics)

I would be happy if someone could point me in the right direction.

Thanks!

Cheers,

Manuel

kirchen...@googlemail.com

unread,

Jun 3, 2015, 1:05:25 AM6/3/15

to numba...@continuum.io, kirchen...@googlemail.com

Hi,

I am answering my own question and share the things I have learned with you:

"How to do an easy parallel prefix sum on a sorted array"

As I found no possibility to access a parallel prefix sum trough the RadixSort library,

I needed to find another way to do this efficiently without too much effort. The following code is quite "cheap" compared to the Sort itself.

It was tested for 6m values and it took less then 3% of the time of the Sort itself.

If you have a sorted array (e.g.):

(012345678)

001112244

the parallel prefix sum (inclusive) will look like this:

(01234)

14668

so if you want to have all particles that have the value == 2:

you need to look for the parallel prefix value at the index 2: = 6

and subtract the value from index 1 from it: 6 - 4 = 2

which will give you the frequency of values for that index.

In order to do the parallel prefix sum, the following code will do the following:

- each thread represents one entry in the sorted array

- it will look for its own value and the value of the next entry of the array

- if the next value is greater than the own value, it will write its threadIdx to the parallel prefix sum array at the position of its value

- this is done within a "while loop" and the own value is incremented by one for each iteration

- the loop stops when the own value is not greater than the next value anymore

@cuda.jit('void(int32[:], int32[:])')

def incl_prefix_sum(sorted_idx, prefix_sum):

"""

Perform an inclusive parallel prefix sum on the sorted

index array.

Parameters

----------

sorted_idx : 1darray of integers

prefix_sum : 1darray of integers

"""

i = cuda.grid(1)

if i < sorted_idx.shape[0]-1:

si = sorted_idx[i]

si_next = sorted_idx[i+1]

while si < si_next:

prefix_sum[si] = i

si += 1

if i == sorted_idx.shape[0]-1:

si = sorted_idx[i]

prefix_sum[si] = i

Perhaps someone will consider this helpful.

Cheers,

Manuel

Reply all

Reply to author

Forward

0 new messages