Avoiding control network traffic violations while using RDMA


ajma...@gmail.com

May 9, 2025, 12:09:03 PM
to cloudlab-users
Hi all,

If your experiments involve RDMA, make sure your configuration uses the correct interface. Specifying the internal IP (10.x.x.x) is often not enough to ensure that your RDMA traffic flows over the correct interface. In utilities such as ib_send_bw, for example, the internal IP is used only for the handshake that exchanges QP info; the physical port that actually carries the traffic is selected with a separate argument. By default, ib_send_bw selects the first port on the first RDMA-capable NIC it finds, which is the control network port on a number of CloudLab hardware types. Use utilities such as ibstat or ibv_devinfo to find the device you actually want to send on, and then make sure your application is configured to use it. With ib_send_bw, for example, you specify the device with the -d flag.
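As a rough illustration of the device lookup described above, here is a minimal Python sketch that lists the RDMA devices whose network interface holds an internal (10.x.x.x) address. It is not an official CloudLab tool, and all function names are made up for this example. It assumes a Linux node where each RDMA device exposes its network interfaces under /sys/class/infiniband/<device>/device/net, and that the internal address sits directly on that interface. If the address is on a VLAN sub-interface instead, this sketch will miss it, so cross-check against ibstat or ibv_devinfo.

```python
import ipaddress
import os
import subprocess


def rdma_netdevs(sysfs_root="/sys/class/infiniband"):
    """Map each RDMA device name to the network interfaces on the same PCI device."""
    mapping = {}
    if not os.path.isdir(sysfs_root):
        return mapping
    for dev in sorted(os.listdir(sysfs_root)):
        net_dir = os.path.join(sysfs_root, dev, "device", "net")
        mapping[dev] = sorted(os.listdir(net_dir)) if os.path.isdir(net_dir) else []
    return mapping


def parse_ipv4_addrs(ip_output):
    """Parse the output of `ip -o -4 addr show` into {interface: [address, ...]}."""
    addrs = {}
    for line in ip_output.splitlines():
        fields = line.split()
        # Expected shape: "<index>: <ifname> inet <addr>/<prefix> ..."
        if len(fields) >= 4 and fields[2] == "inet":
            addrs.setdefault(fields[1], []).append(fields[3].split("/")[0])
    return addrs


def experiment_devices(mapping, addrs, subnet="10.0.0.0/8"):
    """Return RDMA devices with an interface address inside `subnet`.

    The 10.0.0.0/8 default is an assumption based on the 10.x.x.x
    internal addresses mentioned above; adjust it for your experiment.
    """
    net = ipaddress.ip_network(subnet)
    matches = []
    for dev, ifaces in mapping.items():
        for iface in ifaces:
            if any(ipaddress.ip_address(a) in net for a in addrs.get(iface, [])):
                matches.append(dev)
                break
    return matches


def main():
    """Print candidate device names; call this on an experiment node."""
    out = subprocess.run(
        ["ip", "-o", "-4", "addr", "show"],
        capture_output=True, text=True, check=True,
    ).stdout
    for dev in experiment_devices(rdma_netdevs(), parse_ipv4_addrs(out)):
        print(dev)
```

Calling main() on a node prints the candidate device names, one per line. A printed name is what you would pass to ib_send_bw with -d. On NICs that expose several ports under a single device, you may also need to select the port, so treat the output as a starting point rather than a definitive answer.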

We have noticed an uptick in incidents of this nature lately, so hopefully this sheds some light on why it happens and helps people avoid it in the future.

Best,
 - Aleks