In the last link I found the recommendation for "one context per device per process". Hence, I tried the following with two GPUs and one CPU.
1. Call select_device for both GPUs from the host thread, and check the returned context.
2. Call select_device for both GPUs via two processes from a ProcessPoolExecutor, and check the returned context.
As a result I see this:
select() from host thread:
<CUDA context c_void_p(48215136) of device 0>
<CUDA context c_void_p(48215136) of device 0>
select() from child processes:
<CUDA context c_void_p(48314656) of device 0>
<CUDA context c_void_p(61306784) of device 1>
The host thread seems to use a single context (48215136). But it also appears that select_device() silently fails and always selects device 0?
"The host thread will have one context to communicate with both GPU's." Given this simple experiment, what is required to let the host thread communicate with the second GPU?
To reproduce, I used this code:
from concurrent.futures import ProcessPoolExecutor

from numba import cuda

def select(i):
    cuda.select_device(i)
    print(cuda.current_context())

def main():
    print('select() from host thread:')
    for i in range(2):
        select(i)

    print('\nselect() from child processes:')
    ex = ProcessPoolExecutor(max_workers=2)
    list(ex.map(select, range(2)))  # consume the iterator so the tasks finish and exceptions surface

if __name__ == '__main__':
    main()