Hi,
I've been looking into an issue where cifar10 fails to start training when running on multiple P100 GPU cards. A single P100 works fine, and one or more M40s work fine; the failure occurs only with multiple P100s.
cifar10 hangs at this line inside parallel.cpp: CUDA_CHECK(cudaStreamSynchronize(comm_stream_->get()));
Has anyone else encountered this problem and found a workaround or a fix?
Thanks,
Michael