There may be something wrong with the interface or cable. I will have the Clemson
folks check it out. FYI, at least a couple of your nodes are constantly
spitting out:
-----
[ 5138.736663] NVRM: GPU 0000:21:00.0 is already bound to nouveau.
[ 5138.736671] NVRM: GPU 0000:e2:00.0 is already bound to nouveau.
[ 5138.736692] NVRM: The NVIDIA probe routine was not called for 2 device(s).
[ 5138.736693] NVRM: This can occur when another driver was loaded and
NVRM: obtained ownership of the NVIDIA device(s).
[ 5138.736693] NVRM: Try unloading the conflicting kernel module (and/or
NVRM: reconfigure your kernel without the conflicting
NVRM: driver(s)), then try loading the NVIDIA kernel module
NVRM: again.
[ 5138.736694] NVRM: No NVIDIA devices probed.
[ 5138.736894] nvidia-nvlink: Unregistered Nvlink Core, major device number 509
-----
to "dmesg" and the console. That shouldn't have anything to do with the network
problem, though.
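
If it helps to see which nodes and GPUs are affected, here is a rough sketch that pulls the PCI addresses out of messages like the ones above. The function name and the regular expression are my own and are written only against the lines quoted here; other driver versions may word the message differently, so treat it as a starting point rather than a tested tool.

```python
import re

# Written against lines such as:
#   [ 5138.736663] NVRM: GPU 0000:21:00.0 is already bound to nouveau.
# Assumption: the message wording matches the excerpt quoted above.
BOUND_RE = re.compile(
    r"NVRM: GPU (?P<addr>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]) "
    r"is already bound to (?P<driver>[\w-]+)\."
)


def gpus_bound_elsewhere(log_text):
    """Return sorted, de-duplicated (pci_address, driver) pairs from log_text."""
    found = set()
    for line in log_text.splitlines():
        match = BOUND_RE.search(line)
        if match:
            found.add((match.group("addr"), match.group("driver")))
    return sorted(found)


# Sample input copied from the excerpt above.
SAMPLE = """\
[ 5138.736663] NVRM: GPU 0000:21:00.0 is already bound to nouveau.
[ 5138.736671] NVRM: GPU 0000:e2:00.0 is already bound to nouveau.
[ 5138.736692] NVRM: The NVIDIA probe routine was not called for 2 device(s).
"""

if __name__ == "__main__":
    for addr, driver in gpus_bound_elsewhere(SAMPLE):
        print(f"{addr} is held by {driver}")
```

On a node you would feed it the saved output of "dmesg" instead of SAMPLE. Because the messages repeat constantly, the set keeps each GPU listed only once.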