Frequency count with RadixSort?

0 views
Skip to first unread message

kirchen...@googlemail.com

unread,
May 18, 2015, 11:03:49 AM5/18/15
to numba...@continuum.io
Hi everyone,

I would like to use cudalib.RadixSort to sort an array and i need to count the frequency of each key.
I assume that RadixSort internally already calculates this count. Is there any way to return it?

If not, what is a fast (but not too complex) way of counting the frequencies of a sorted array?
(without using atomics)

I would be happy if someone could point me in the right direction.

Thanks!

Cheers,
Manuel

kirchen...@googlemail.com

unread,
Jun 3, 2015, 1:05:25 AM6/3/15
to numba...@continuum.io, kirchen...@googlemail.com
Hi,

I am answering my own question and share the things I have learned with you:

"How to do an easy parallel prefix sum on a sorted array"

As I found no possibility to access a parallel prefix sum trough the RadixSort library,
I needed to find another way to do this efficiently without too much effort. The following code is quite "cheap" compared to the Sort itself.
It was tested for 6m values and it took less then 3% of the time of the Sort itself.

If you have a sorted array (e.g.):
(012345678)
 001112244

the parallel prefix sum (inclusive) will look like this:
(01234)
 14668

so if you want to have all particles that have the value == 2:

you need to look for the parallel prefix value at the index 2: = 6
and subtract the value from index 1 from it: 6 - 4 = 2
which will give you the frequency of values for that index.

In order to do the parallel prefix sum, the following code will do the following:

- each thread represents one entry in the sorted array
- it will look for its own value and the value of the next entry of the array
- if the next value is greater than the own value, it will write its threadIdx to the parallel prefix sum array at the position of its value
- this is done within a "while loop" and the own value is incremented by one for each iteration
- the loop stops when the own value is not greater than the next value anymore

@cuda.jit('void(int32[:], int32[:])')
def incl_prefix_sum(sorted_idx, prefix_sum):
    """
    Perform an inclusive parallel prefix sum on the sorted 
    index array.

    Parameters
    ----------
    sorted_idx : 1darray of integers
    
    prefix_sum : 1darray of integers
    """
    i = cuda.grid(1)
    if i < sorted_idx.shape[0]-1:
        si = sorted_idx[i]
        si_next = sorted_idx[i+1]
        while si < si_next:
            prefix_sum[si] = i
            si += 1
    if i == sorted_idx.shape[0]-1:
        si = sorted_idx[i]
        prefix_sum[si] = i

Perhaps someone will consider this helpful.

Cheers,
Manuel
Reply all
Reply to author
Forward
0 new messages