I have a computation framework which uses multiprocessing.Process. Each of the spawned processes has to perform FFTs, which I want to offload to a Tesla GPU using reikna.
However, I get "pycuda._driver.LogicError: cuDeviceGetCount failed: initialization error" when using reikna within multiple processes.
See the example below.
Is there a way to use reikna within multiple processes?
best regards
Thomas
def init_reikna():
    from reikna import cluda
    # init reikna
    api=cluda.cuda_api()
    dev = api.get_platforms()[0].get_devices()[0]
    thr = api.Thread(dev)
    # do work here

if __name__=='__main__':
    import numpy as np
    from reikna import cluda
    from reikna.fft import FFT
    from multiprocessing import Process
    # init reikna
    api=cluda.cuda_api()
    dev = api.get_platforms()[0].get_devices()[0]
    thr = api.Thread(dev)
    # init data
    wind=np.random.rand(256,256,8)+1j*np.random.rand(256,256,8)
    data=np.random.rand(256,256,8)+1j*np.random.rand(256,256,8)
    # precompile the FFT for the zero-padded shape (256,256,32)
    d=np.pad(data,((0,0),(0,0),(0,32-data.shape[2])),'constant')
    fft = FFT(d, axes=(0,1,2))
    fftc = fft.compile(thr, fast_math=True)
    # start the child after CUDA has already been initialized in the parent
    p=Process(target=init_reikna)
    p.start()
    # do some dummy work in main
    for i in range(10):
        d=wind*data
        # pad the windowed data, not the raw data
        d=np.pad(d,((0,0),(0,0),(0,32-data.shape[2])),'constant')
        data_dev = thr.to_device(d)
        fftc(data_dev, data_dev)
        fwd = data_dev.get()
        print(fwd.shape)
    p.join()
Thanks for that tip. I'm using Linux, where the multiprocessing module forks by default. From what I found, forking is incompatible with pycuda: a forked child cannot use CUDA once the parent process has initialized it.
Using spawned processes instead of forked ones works.
However, startup time is much slower with spawned instead of forked processes.
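For anyone finding this later, here is a minimal sketch of the spawn-based variant, assuming Python 3.4 or later (where multiprocessing.get_context is available). make_spawned_process is just a hypothetical helper name I use here, not part of reikna or pycuda; the reikna calls are the same ones as in my example above. It is not a tuned solution, only the smallest change that avoids the fork.

```python
import multiprocessing as mp


def init_reikna():
    # Import reikna inside the child, so CUDA is initialized only in
    # the freshly spawned interpreter and nothing is inherited.
    from reikna import cluda

    api = cluda.cuda_api()
    dev = api.get_platforms()[0].get_devices()[0]
    thr = api.Thread(dev)
    # do work here, using thr


def make_spawned_process(target, *args):
    """Hypothetical helper: build (but do not start) a 'spawn' process."""
    # get_context leaves the global default start method untouched.
    ctx = mp.get_context("spawn")
    return ctx.Process(target=target, args=args)


if __name__ == "__main__":
    # The guard is required with 'spawn': the child re-imports this module.
    p = make_spawned_process(init_reikna)
    p.start()
    # ... the parent can use its own reikna Thread here ...
    p.join()
    print("child exit code:", p.exitcode)
```

Since each spawned child starts a new interpreter and re-imports the main module, keeping heavy imports and setup out of module level should help a bit with the startup time.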
best regards
Thomas