I built TensorFlow 2.0 from source with CUDA 10 and cuDNN 7. When I run a multi-host job using NCCL, I get the following error:
p40-gpu-0004:25127:25578 [2] NCCL INFO rank 2 nranks 8
p40-gpu-0004:25127:25578 [3] NCCL INFO rank 3 nranks 8
NCCL version 2.3.5+cudaCUDA_MAJOR.CUDA_MINOR
p40-gpu-0004:25127:25578 [0] NCCL INFO rank 0 nranks 8
p40-gpu-0004:25127:25578 [1] NCCL INFO rank 1 nranks 8
p40-gpu-0004:25127:25592 [1] NCCL INFO comm 0x7f7c082f9b00 rank 1 nranks 8
p40-gpu-0004:25127:25590 [3] NCCL INFO comm 0x7f649b64be50 rank 3 nranks 8
p40-gpu-0004:25127:25589 [2] NCCL INFO comm 0x7f7bf807a0e0 rank 2 nranks 8
p40-gpu-0004:25127:25591 [0] NCCL INFO comm 0x7f7c0c2c9fd0 rank 0 nranks 8
p40-gpu-0004:25127:25592 [1] NCCL INFO CUDA Dev 1, IP Interfaces : eth0(PXB)
p40-gpu-0004:25127:25590 [3] NCCL INFO CUDA Dev 3, IP Interfaces : eth0(PXB)
p40-gpu-0004:25127:25589 [2] NCCL INFO CUDA Dev 2, IP Interfaces : eth0(PXB)
p40-gpu-0004:25127:25591 [0] NCCL INFO CUDA Dev 0, IP Interfaces : eth0(PXB)
p40-gpu-0004:25127:25591 [0] NCCL INFO Using 256 threads
p40-gpu-0004:25127:25591 [0] NCCL INFO Min Comp Cap 6
p40-gpu-0004:25127:25591 [0] NCCL INFO Ring 00 : 0 1 2 3 4 5 6 7
p40-gpu-0004:25127:25591 [0] NCCL INFO Ring 00 : 7 -> 0 via NET/Socket/0
p40-gpu-0004:25127:25591 [0] NCCL INFO Ring 00 : 0[0] -> 1[1] via direct shared memory
p40-gpu-0004:25127:25592 [1] NCCL INFO Ring 00 : 1[1] -> 2[2] via direct shared memory
p40-gpu-0004:25127:25589 [2] NCCL INFO Ring 00 : 2[2] -> 3[3] via direct shared memory
p40-gpu-0004:25127:25587 [0] NCCL INFO Launch mode Group/CGMD
p40-gpu-0004:25127:25595 [0] bazel-out/k8-opt/bin/external/nccl_archive/transport/net_socket.cu.cc:189 NCCL WARN Message truncated : received 174080 bytes instead of 8192
p40-gpu-0004:25127:25595 [0] NCCL INFO bazel-out/k8-opt/bin/external/nccl_archive/_virtual_includes/include_hdrs/net.h:28 -> 3
p40-gpu-0004:25127:25595 [0] NCCL INFO bazel-out/k8-opt/bin/external/nccl_archive/transport/net.cu.cc:474 -> 3
p40-gpu-0004:25127:25595 [0] bazel-out/k8-opt/bin/external/nccl_archive/transport.cu.cc:153 NCCL WARN bazel-out/k8-opt/bin/external/nccl_archive/transport.cu.cc:153 -> 3 [Proxy thread error]
p40-gpu-0004:25127:25595 [0] bazel-out/k8-opt/bin/external/nccl_archive/transport/net_socket.cu.cc:189 NCCL WARN Message truncated : received 976055552 bytes instead of 8192
p40-gpu-0004:25127:25595 [0] NCCL INFO bazel-out/k8-opt/bin/external/nccl_archive/_virtual_includes/include_hdrs/net.h:28 -> 3
p40-gpu-0004:25127:25595 [0] NCCL INFO bazel-out/k8-opt/bin/external/nccl_archive/transport/net.cu.cc:474 -> 3
p40-gpu-0004:25127:25595 [0] bazel-out/k8-opt/bin/external/nccl_archive/transport.cu.cc:153 NCCL WARN bazel-out/k8-opt/bin/external/nccl_archive/transport.cu.cc:153 -> 3 [Proxy thread error]
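
To make the size mismatches in the log easier to see, here is a minimal sketch that pulls the received and expected byte counts out of the "Message truncated" warnings. The helper name `find_truncated_messages` is my own and is not part of NCCL or TensorFlow; it assumes the warning text has exactly the wording shown above.

```python
import re

# Pattern for the warning text as it appears in the log above.
# Assumption: the wording "Message truncated : received N bytes instead of M"
# is stable for this NCCL build; other versions may phrase it differently.
TRUNCATED = re.compile(
    r"Message truncated : received (\d+) bytes instead of (\d+)"
)


def find_truncated_messages(log_text):
    """Return a list of (received, expected) byte counts, one per warning."""
    return [
        (int(match.group(1)), int(match.group(2)))
        for match in TRUNCATED.finditer(log_text)
    ]


if __name__ == "__main__":
    # Two warning lines copied from the log above.
    sample = (
        "net_socket.cu.cc:189 NCCL WARN Message truncated : "
        "received 174080 bytes instead of 8192\n"
        "net_socket.cu.cc:189 NCCL WARN Message truncated : "
        "received 976055552 bytes instead of 8192\n"
    )
    for received, expected in find_truncated_messages(sample):
        print(f"received {received} bytes, expected {expected}")
```

In both warnings the receiver expected 8192 bytes, while the received size differs each time (174080 and 976055552).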