Hi,
There is no known way!
I'm not sure it would even give you the speedup you expect:
- GPU cores all have to run the same instruction at the same time. So you could trivially run the same run thousands of times in parallel, but I don't know how to handle several paths that don't perform the same event at the same time.
- GPUs are fast at computing but slow at transferring data. Even if you managed to design a way for each core to compute its own run, gathering the results from all these cores would likely be the bottleneck.
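
To make the first point concrete, here is a toy Python model of lockstep execution. It is only a sketch and not real GPU code: the function name `lockstep_cycles` and the event labels are made up for illustration, and I assume that in each cycle the group can issue only one kind of event, so paths waiting for a different kind sit idle.

```python
from collections import Counter

def lockstep_cycles(paths):
    """Count cycles needed when all paths share one instruction stream.

    `paths` is a list of event sequences, one per simulated core.
    Assumption: per cycle only ONE event kind can be issued; every path
    whose next event matches advances, all the others wait.
    """
    pos = [0] * len(paths)  # index of the next event in each path
    cycles = 0
    while any(p < len(path) for p, path in zip(pos, paths)):
        # Event kinds currently requested by the unfinished paths.
        heads = Counter(path[p] for p, path in zip(pos, paths) if p < len(path))
        # Issue the most requested kind (alphabetical order breaks ties).
        kind = max(sorted(heads), key=heads.get)
        for i, path in enumerate(paths):
            if pos[i] < len(path) and path[pos[i]] == kind:
                pos[i] += 1
        cycles += 1
    return cycles

same = [["A", "B", "C"]] * 4           # four identical runs
diverged = [["A", "B"], ["B", "A"]]    # two runs with events in a different order
print(lockstep_cycles(same), lockstep_cycles(diverged))
```

In this model the four identical runs finish in 3 cycles, the same as a single run, while the two diverging runs need 3 cycles for 2 events each, because they cannot both advance in every cycle.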
Best,
Pierre B.