Multirail RDMA connections fall back to TCP: rdma_resolve_addr failed


Shenyue Chen

Dec 25, 2022, 9:40:12 PM
to beegfs-user
Hi BeeGFS users,

We're trying to make use of the client multirail feature to improve bandwidth and load balancing. https://doc.beegfs.io/latest/advanced_topics/rdma_support.html#client-multi-rail-support

Our test setup:
  • Two server nodes, each with 2 x ConnectX-5 cards on two different IP addresses, running Ubuntu 20.04
  • One client node with 2 x ConnectX-6 cards on two different IP addresses, running Ubuntu 22.04
  • All using MLNX OFED 5.4 LTS
  • Each server runs 1 meta + 2 storage services (2 meta + 4 storage in total)
I'm using beegfs-net to check the connections. When multirail is not enabled, all connections use RDMA:
[Screenshot: beegfs-net output with multirail disabled (Screen Shot 2022-12-26 at 10.21.40.png)]

However, when we turn on the multirail feature with the connRDMAInterfacesFile option, all connections fall back to TCP:
[Screenshot: beegfs-net output with multirail enabled (Screen Shot 2022-12-26 at 10.22.10.png)]

In both cases, we've made sure the configuration is correct:
  • /proc/fs/beegfs/<clientID>/client_info shows the expected interfaces
  • All machines are reachable with ibping

Additionally, after turning on the debug flags, we found that the RDMA connection failures originate in rdma_resolve_addr:

[14828.300006] beegfs: mount(33435): IBVSocket_connectByIP:189: rdma_resolve_addr failed, src client-ip1, dst node1-ip1

When the connection is established from an explicit source IP, the attempt fails; when it is established from src=NULL (any), it succeeds:

Establishing new RDMA connection from any to: beegfs-meta@node1-ip1
Connected: beegfs-meta@node1-ip1 (protocol: RDMA)
Preferred IP addr is xxxxx
Establishing new RDMA connection from client-ip1 to: beegfs-storage@node1-ip1
Connect failed: beegfs-storage@node1-ip1 (protocol: RDMA)
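Since the failure only appears with an explicit source address, it may be worth confirming which local interface actually owns client-ip1 on the client (for example, that it is the IPoIB netdev of the intended ConnectX-6 port). The following is a minimal userspace sketch, not part of BeeGFS, assuming Linux with getifaddrs(3); the helper name iface_for_ipv4 is hypothetical.

```c
#define _DEFAULT_SOURCE /* expose getifaddrs() and inet_pton() on glibc */
#include <arpa/inet.h>
#include <assert.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: write the name of the interface that owns the IPv4
 * address `ip` (dotted quad) into `out`. Returns 0 on success, -1 if the
 * address is malformed, not configured locally, or getifaddrs() fails. */
int iface_for_ipv4(const char *ip, char *out, size_t outlen)
{
    struct in_addr want;
    struct ifaddrs *list, *it;
    int rc = -1;

    assert(out != NULL && outlen > 0);
    if (inet_pton(AF_INET, ip, &want) != 1)
        return -1; /* not a valid IPv4 address */
    if (getifaddrs(&list) != 0)
        return -1;

    for (it = list; it != NULL; it = it->ifa_next) {
        /* Skip entries without an address and non-IPv4 entries. */
        if (it->ifa_addr == NULL || it->ifa_addr->sa_family != AF_INET)
            continue;
        const struct sockaddr_in *sin = (const struct sockaddr_in *)it->ifa_addr;
        if (sin->sin_addr.s_addr == want.s_addr) {
            snprintf(out, outlen, "%s", it->ifa_name);
            rc = 0;
            break;
        }
    }
    freeifaddrs(list);
    return rc;
}
```

Called with client-ip1, it should report one of the interfaces listed in connRDMAInterfacesFile; if it reports a different interface, the source address and the RDMA port may not line up.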

Things we've tried, none of which helped:
  • Reboot the machine
  • Give it some traffic
  • Disable TCP fallback using 
  • Upgrade firmware
  • Upgrade MLNX OFED driver

Any help or suggestions would be appreciated. Thank you!

Vinci Chow

Jan 28, 2023, 6:30:00 AM
to beegfs-user
Did you configure IP routing tables for multi-homing, so that traffic received on a network interface always goes back out through that same interface?
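The routing-table suggestion can be illustrated with a small model of source-based (policy) routing. This is a hypothetical sketch, not kernel or BeeGFS code. It assumes both client ports sit on the same subnet (10.0.0.0/24 via ib0 and ib1, client addresses 10.0.0.11 and 10.0.0.12, all made up), and that the main table resolves the tie in favor of ib0. Under those assumptions, traffic sourced from 10.0.0.12 leaves via ib0 when only the main table exists. One plausible explanation for rdma_resolve_addr failing with an explicit source is this kind of mismatch between the source address's interface and the routed egress interface. A per-source rule that selects a dedicated table keeps each source on its own port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build a host-byte-order IPv4 address as a constant expression. */
#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))
#define COUNT(arr) (sizeof(arr) / sizeof((arr)[0]))

struct route { uint32_t net; int prefix_len; const char *dev; };
/* A policy rule: "from <address> lookup <table>". */
struct rule { uint32_t from; const struct route *table; size_t n; };

static int route_matches(const struct route *r, uint32_t dst)
{
    assert(r->prefix_len >= 0 && r->prefix_len <= 32);
    uint32_t mask = r->prefix_len == 0 ? 0u : 0xFFFFFFFFu << (32 - r->prefix_len);
    return (dst & mask) == (r->net & mask);
}

/* Longest-prefix match in one table; on equal prefixes the first entry
 * wins (a simplification of the kernel's metric-based tie-breaking). */
static const char *lookup(const struct route *t, size_t n, uint32_t dst)
{
    const struct route *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (route_matches(&t[i], dst) && (best == NULL || t[i].prefix_len > best->prefix_len))
            best = &t[i];
    return best ? best->dev : NULL;
}

/* Consult source-address rules first, then fall back to the main table.
 * On Linux this corresponds roughly to iproute2 "ip rule add from <addr>
 * table <N>" plus a subnet route inside each table <N>. */
static const char *egress_dev(const struct rule *rules, size_t nrules,
                              const struct route *main_tbl, size_t nmain,
                              uint32_t src, uint32_t dst)
{
    for (size_t i = 0; i < nrules; i++) {
        if (rules[i].from != src)
            continue;
        const char *dev = lookup(rules[i].table, rules[i].n, dst);
        if (dev != NULL)
            return dev;
    }
    return lookup(main_tbl, nmain, dst);
}

/* Hypothetical example: both ports have a connected route to 10.0.0.0/24. */
static const struct route main_table[] = {
    { IPV4(10, 0, 0, 0), 24, "ib0" },
    { IPV4(10, 0, 0, 0), 24, "ib1" },
};
static const struct route table_ib0[] = { { IPV4(10, 0, 0, 0), 24, "ib0" } };
static const struct route table_ib1[] = { { IPV4(10, 0, 0, 0), 24, "ib1" } };
static const struct rule policy_rules[] = {
    { IPV4(10, 0, 0, 11), table_ib0, COUNT(table_ib0) },
    { IPV4(10, 0, 0, 12), table_ib1, COUNT(table_ib1) },
};
```

With policy_rules in place, traffic sourced from 10.0.0.12 toward a server address in the same subnet is routed out of ib1 instead of ib0, which is the "replies leave by the interface they arrived on" behavior the reply describes.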