cudpp_hash will take a list of keys (or a list of key-value pairs) on the GPU and create a GPU hash table from it. It requires no data transfer to/from the CPU. It does leverage data parallelism across keys (many keys are inserted simultaneously).
What it doesn't do is incremental inserts. You can't build a hash table and then add more items to it afterwards.
JDO
I haven't understood the idea of the data already being located on the GPU. If I want to insert a collection of data into a <key: value (int list)> structure, won't I need to allocate the data on the CPU side and transfer the batch of <key: value> pairs?
One more thing: if the value of each pair is a linear structure, I have to pass a parameter giving the worst-case length of all the value lists inserted, right?
I just want to use an algorithm that works like a bucket sort, using the collisions to concatenate the lists of equal keys, with workloads that can be split into N batches.
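To make the grouping idea concrete, here is a minimal CPU-side sketch in Python. It is not the cudpp_hash API and says nothing about how the library works internally; `group_by_key`, `lookup`, and `build_from_batches` are hypothetical names made up for illustration. It assumes that, because incremental inserts are not supported, the N batches are concatenated first and the structure is built once. Values with equal keys are packed next to each other, so each key only needs an (offset, count) entry and no worst-case list length has to be chosen up front.

```python
def group_by_key(pairs):
    """Sort pairs by key and record where each key's run of values starts.

    Returns (index, values): index maps key -> (offset, count) into values.
    """
    # sorted() is stable, so values keep their insertion order within a key.
    ordered = sorted(pairs, key=lambda kv: kv[0])
    values = [v for _, v in ordered]
    index = {}
    for pos, (key, _) in enumerate(ordered):
        if key not in index:
            index[key] = (pos, 1)           # first value for this key
        else:
            start, count = index[key]
            index[key] = (start, count + 1)  # extend the run by one value
    return index, values


def lookup(index, values, key):
    """Return the list of values stored for key (empty if the key is absent)."""
    start, count = index.get(key, (0, 0))
    return values[start:start + count]


def build_from_batches(batches):
    """Assumption: no incremental inserts, so merge all batches, then build once."""
    all_pairs = [kv for batch in batches for kv in batch]
    return group_by_key(all_pairs)


# Three batches of (key, value) pairs; keys repeat across batches.
batches = [[(3, 30), (1, 10)], [(3, 31), (2, 20)], [(1, 11), (3, 32)]]
index, values = build_from_batches(batches)
```

With these batches, `lookup(index, values, 3)` collects the values for key 3 from all three batches. The total storage is exactly one slot per inserted pair, whatever the longest per-key list turns out to be.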