Hi,
I am answering my own question and share the things I have learned with you:
"How to do an easy parallel prefix sum on a sorted array"
As I found no possibility to access a parallel prefix sum trough the RadixSort library,
I needed to find another way to do this efficiently without too much effort. The following code is quite "cheap" compared to the Sort itself.
It was tested for 6m values and it took less then 3% of the time of the Sort itself.
If you have a sorted array (e.g.):
(012345678)
001112244
the parallel prefix sum (inclusive) will look like this:
(01234)
14668
so if you want to have all particles that have the value == 2:
you need to look for the parallel prefix value at the index 2: = 6
and subtract the value from index 1 from it: 6 - 4 = 2
which will give you the frequency of values for that index.
In order to do the parallel prefix sum, the following code will do the following:
- each thread represents one entry in the sorted array
- it will look for its own value and the value of the next entry of the array
- if the next value is greater than the own value, it will write its threadIdx to the parallel prefix sum array at the position of its value
- this is done within a "while loop" and the own value is incremented by one for each iteration
- the loop stops when the own value is not greater than the next value anymore
@cuda.jit('void(int32[:], int32[:])')
def incl_prefix_sum(sorted_idx, prefix_sum):
"""
Perform an inclusive parallel prefix sum on the sorted
index array.
Parameters
----------
sorted_idx : 1darray of integers
prefix_sum : 1darray of integers
"""
i = cuda.grid(1)
if i < sorted_idx.shape[0]-1:
si = sorted_idx[i]
si_next = sorted_idx[i+1]
while si < si_next:
prefix_sum[si] = i
si += 1
if i == sorted_idx.shape[0]-1:
si = sorted_idx[i]
prefix_sum[si] = i
Perhaps someone will consider this helpful.
Cheers,
Manuel