Meta and Storage not using InfiniBand / RDMA


Omnia

Apr 9, 2024, 3:06:56 PM
to beegfs-user
Hello, I am using the following setup in a cluster:

OS: On all nodes Rocky Linux 8.9

IB Cards:  Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
MLNX_OFED: MLNX_OFED_LINUX-23.10-0.5.5.0-rhel8.8-x86_64.iso

BeeGFS Server: 
Version: 7.4.3

Management, Meta, and Storage servers run as VMs under Proxmox.

BeeGFS Clients:
Version 7.4.3
Two bare-metal machines with the same configuration


The host's IB cards (two in total) are passed through to the storage and meta VMs; the mgmt server has no IB card.

I can successfully use ibping from the storage server to the meta server and to all client nodes.

The two clients are recognized correctly and their connections are initialized as RDMA connections.
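
(A quick way to double-check this from the client side, assuming the beegfs-utils package is installed, is beegfs-net, which prints the established connections and their type per node:)

# run on a client node; shows per-node connections and whether they are TCP or RDMA
beegfs-net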

Log of the mgmt server:

Worker2 [Node registration] >> New node: beegfs-client 4B396-661566B6-gpu0.cluster [ID: 17]; RDMA; Source: 192.168.1.20:36986

My meta server doesn't connect through RDMA, and neither does the storage node.

[root@bee-meta ~]# tail -f /var/log/beegfs-meta.log
(2) Apr09 17:38:46 Main [Register node] >> Node registration successful.
(3) Apr09 17:38:46 Main [NodeConn (acquire stream)] >> Connected: beegfs...@192.168.1.11:8008 (protocol: TCP)
(2) Apr09 17:38:46 Main [printSyncResults] >> Nodes added (sync results): 1 (Type: beegfs-storage)
(3) Apr09 17:38:46 Main [App] >> Registration and management info download complete.
(3) Apr09 17:38:46 Main [DGramLis] >> Listening for UDP datagrams: any Port 8005
(3) Apr09 17:38:46 Main [ConnAccept] >> Listening for TCP connections: Port 8005
(3) Apr09 17:38:46 Main [App] >> Restored 1 sessions and 0 mirrored sessions
(1) Apr09 17:38:46 Main [App] >> Version: 7.4.3
(2) Apr09 17:38:46 Main [App] >> LocalNode: beegfs-meta bee-meta [ID: 2]
(2) Apr09 17:38:46 Main [App] >> Usable NICs: ens19(TCP) ens18(TCP)


Storage logs:
tail -f /var/log/beegfs-storage.log
(2) Apr09 17:38:50 Main [App] >> Usable NICs: ens19(TCP) ens18(TCP)
(2) Apr09 17:38:50 Main [App] >> Storage targets: 1
(3) Apr09 17:38:50 Main [RegDGramLis] >> Listening for UDP datagrams: any Port 8003
(2) Apr09 17:38:50 Main [Register node] >> Node registration successful.
(2) Apr09 17:38:50 Main [InternodeSyncer.cpp:607] >> Storage targets registration successful.
(2) Apr09 17:38:50 Main [Sync results] >> Nodes added: 1 (Type: beegfs-meta)
(3) Apr09 17:38:50 Main [App] >> Registration and management info download complete.
(3) Apr09 17:38:50 Main [DGramLis] >> Listening for UDP datagrams: any Port 8003
(3) Apr09 17:38:50 Main [ConnAccept] >> Listening for TCP connections: Port 8003
(3) Apr09 17:38:50 Main [App] >> 1 sessions restored.

So I can't explain why it behaves so differently when the setup is the same everywhere, yet the storage and meta servers won't recognize the InfiniBand cards.


The only difference I noticed is that lsmod shows slightly different results on meta and storage than on the clients:

[root@bee-meta ~]# lsmod | grep ib
ib_ipoib              147456  0
ib_cm                 118784  2 rdma_cm,ib_ipoib
ib_umad                28672  0
mlx5_ib               409600  0
ib_uverbs             159744  8 rdma_ucm,mlx5_ib
ib_core               401408  8 rdma_cm,ib_ipoib,iw_cm,ib_umad,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm
libcrc32c              16384  4 nf_conntrack,nf_nat,nf_tables,xfs
mlx5_core            1810432  1 mlx5_ib
libata                270336  2 ata_piix,ata_generic


Client:
[root@gpu0 ~]# lsmod | grep ib
ib_ipoib              155648  0
ib_cm                 114688  2 rdma_cm,ib_ipoib
ib_umad                28672  0
libnvdimm             200704  1 nfit
libcrc32c              16384  4 nf_conntrack,nf_nat,nf_tables,xfs
mlx5_ib               466944  0
ib_uverbs             143360  2 rdma_ucm,mlx5_ib
ib_core               442368  9 beegfs,rdma_cm,ib_ipoib,iw_cm,ib_umad,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm
mlx5_core            2146304  1 mlx5_ib
libahci                40960  1 ahci
libata                266240  2 libahci,ahci
mlx_compat             16384  12 beegfs,rdma_cm,ib_ipoib,mlxdevm,iw_cm,ib_umad,ib_core,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm,mlx5_core

I have tested almost everything and can't explain why the beegfs-meta and beegfs-storage services don't recognize the IB interfaces and only use TCP over the 10G network.
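
As far as I understand, the interfaces can also be pinned explicitly via connInterfacesFile in the server configs, and the NICs each registered node advertises can be listed with beegfs-ctl, roughly like this (the path /etc/beegfs/connifaces is just an example name, and ib1 stands for the IPoIB interface):

# example: restrict the meta daemon to the IPoIB interface
echo ib1 > /etc/beegfs/connifaces     # hypothetical file name
# then set in /etc/beegfs/beegfs-meta.conf:  connInterfacesFile = /etc/beegfs/connifaces
systemctl restart beegfs-meta

# run from a node with beegfs-utils (e.g. a client); shows which NICs each node registered
beegfs-ctl --listnodes --nodetype=meta --nicdetails
beegfs-ctl --listnodes --nodetype=storage --nicdetails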

Does anyone know of a similar case? 

Thanks in advance

Greetings
Omnia

Guan Xin

Apr 10, 2024, 12:42:39 AM
to beegfs-user
Hi,

1) Have you installed the libbeegfs-ib package?
2) Although ibping shows everything ok, could you also check with ibstat?
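
For example, on the meta and storage VMs, something like:

rpm -q libbeegfs-ib      # RDMA support library for the BeeGFS server daemons
ibstat                   # low-level port state, rate and SM lid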

Greetings
Guan

Omnia

Apr 10, 2024, 9:53:47 AM
to beegfs-user
Hi, 

Yes, the libbeegfs-ib package is already installed.

I also checked ibstat; it shows that the link is active:

[root@bee-meta ~]# ibstat
CA 'mlx5_0'
        CA type: MT4121
        Number of ports: 1
        Firmware version: 16.35.3502
        Hardware version: 0
        Node GUID: 0xb83fd20300be963a
        System image GUID: 0xb83fd20300be963a
        Port 1:
                State: Down
                Physical state: Disabled
                Rate: 10
                Base lid: 65535
                LMC: 0
                SM lid: 0
                Capability mask: 0xa651e848
                Port GUID: 0xb83fd20300be963a
                Link layer: InfiniBand
CA 'mlx5_1'
        CA type: MT4121
        Number of ports: 1
        Firmware version: 16.35.3502
        Hardware version: 0
        Node GUID: 0xb83fd20300be963b
        System image GUID: 0xb83fd20300be963a
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 100
                Base lid: 4
                LMC: 0
                SM lid: 1
                Capability mask: 0xa651e848
                Port GUID: 0xb83fd20300be963b
                Link layer: InfiniBand

I am using  Proxmox Virtual Environment 8.0.9 for virtualization.

cat /var/tmp/ibdiagnet2/ibdiagnet2.lst

(output attached as screenshot: Unbenannt.JPG)

The only difference I see in the ibdiagnet2 output is that the bee-storage and bee-meta cards are not listed as mlx5_0 but as HCA-2. Could this be causing the issues?

Quentin Le Burel

Apr 10, 2024, 11:02:27 AM
to fhgfs...@googlegroups.com
Hi,
Can you check whether "ip a" shows the IPoIB interfaces as up?
And did you try to simply restart the beegfs-meta and beegfs-storage services? Sometimes they start too early in the boot process, before ib0 is up, and then ignore the interface.
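
I.e. roughly (assuming the standard systemd unit names):

ip a show ib0                      # IPoIB interface should be UP and have an address
systemctl restart beegfs-meta      # on the meta VM
systemctl restart beegfs-storage   # on the storage VM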

Kind regards

Quentin




Guan Xin

Apr 10, 2024, 11:03:31 PM
to beegfs-user
On Wednesday, April 10, 2024 at 9:53:47 PM UTC+8 Omnia wrote:

The only difference I see in the ibdiagnet2 output is that the bee-storage and bee-meta cards are not listed as mlx5_0 but as HCA-2. Could this be causing the issues?

No. These names are used for diagnostics and can be changed at will under sysfs. 

Omnia

Apr 11, 2024, 4:39:23 AM
to beegfs-user
@Guan Xin Ah okay, I thought maybe the drivers weren't being recognized correctly.

@Quentin The state is marked as DOWN, but it is also marked DOWN on the client node where RDMA works fine, as you can see in the screenshot (gpu0 = client node).

(screenshot attached: ipa_ib0.jpg)

Quentin Le Burel

Apr 11, 2024, 4:55:39 AM
to fhgfs...@googlegroups.com
The links should be up for proper communication over IB.
Try "ip link set ib0 up" and then restart the beegfs services?






Omnia

Apr 11, 2024, 5:53:09 AM
to beegfs-user
When I run "ip link set ib0 up", the state stays DOWN.
But I think this might be a bug or something similar, as the client also shows DOWN yet the RDMA connection still gets initialized.

Might it be possible that I need to install the MLNX OFED drivers on the Proxmox host as well, and not only in the VMs?
At the moment I am trying to find the right MLNX OFED build for Proxmox 8.0.9, which is based on Debian 12.

May I ask whether you also use Proxmox with ConnectX-5 IB cards, and does it work?
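
For comparison, the driver stack on the host can be checked roughly like this (ofed_info is only present once MLNX OFED is installed, as far as I know):

modinfo mlx5_core | grep -i version   # inbox driver vs. Mellanox OFED build
ofed_info -s                          # prints the installed MLNX OFED version, if any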

Quentin Le Burel

Apr 11, 2024, 6:09:58 AM
to fhgfs...@googlegroups.com
Can you try ib1 instead of ib0?
Also set the IP on ib1.
Your ibstat output below shows that it is the second card that is connected, so that would be ib1.
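
Something like (the address is only an example, use whatever your IPoIB subnet is):

ip addr add 192.168.2.11/24 dev ib1   # example address for the IPoIB subnet
ip link set ib1 up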




Omnia

Apr 11, 2024, 7:35:29 AM
to beegfs-user
I also tried to set up ib1; an IP address was already configured there because I had previously tried the other port and switched the cable from ib0 to ib1.
But unfortunately it won't come up.

(screenshot attached: ib0_ib1_down.JPG)

Omnia

Apr 11, 2024, 7:39:11 AM
to beegfs-user
I just looked through dmesg for failures; it reports this:

[   14.885228] ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
[   16.929710] ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
[   20.961358] ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
[   29.153058] ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
[   45.537332] ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
[   61.921191] ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
[   78.305114] ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
[   94.689510] ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
[  111.073498] ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
[  127.457458] ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
[  143.841515] ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
[  160.225630] ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
[  176.609582] ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
[  192.993369] ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
[  209.377433] ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
[  225.761256] ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
[  242.145172] ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
[  258.529315] ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
[  274.912986] ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
[  291.297406] ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
[152524.538294] IPv6: ADDRCONF(NETDEV_UP): ib1: link is not ready
[152524.539204] ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
[152526.561563] ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
[152530.593071] ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22
[152539.104977] ib1: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22


Might that be causing the issues? I am using a Mellanox MLNX-OS MQM8700 switch with the subnet manager activated.
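
If it helps, my understanding is that whether the port actually reaches a subnet manager can be checked with sminfo from infiniband-diags, roughly:

sminfo                   # queries the master SM via the first active port
sminfo -C mlx5_1 -P 1    # or explicitly via the active HCA/port from my ibstat output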

Omnia

Apr 11, 2024, 8:28:44 AM
to beegfs-user
OK, never mind, as this message also occurs on the client nodes, which successfully establish an RDMA connection.

Omnia

Apr 11, 2024, 10:21:35 AM
to beegfs-user
Finally I found the solution: it seems to be necessary to install the MLNX OFED drivers on the Proxmox host machines as well.

The working setup for me was:
MLNX_OFED_LINUX-24.01-0.3.3.1-debian12.1-x86_64.iso
Proxmox 8.0.9 pve-manager/8.0.9/
Debian 12 (bookworm)
Linux 6.2.16-19-pve #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-19
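
For reference, the usual installer workflow for that ISO looks roughly like this on the host (the exact flags, e.g. whether --add-kernel-support is needed for the pve kernel, may differ):

mount -o loop MLNX_OFED_LINUX-24.01-0.3.3.1-debian12.1-x86_64.iso /mnt
/mnt/mlnxofedinstall --add-kernel-support    # rebuild against the running pve kernel if needed
/etc/init.d/openibd restart                  # reload the Mellanox stack
# afterwards restart the VMs / BeeGFS services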

Thank you for your guidance and help.

Allan Thrower

Apr 12, 2024, 6:38:57 PM
to fhgfs...@googlegroups.com
https://doc.beegfs.io/latest/release_notes.html

Is your installed OFED release on the supported list for 7.4.3 and RHEL 8.9?


Denis Anjos

Apr 12, 2024, 6:39:11 PM
to fhgfs...@googlegroups.com

Have you configured IPoIB on every interface? All IB interfaces need to have an IP configured in order to work.
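
On RHEL/Rocky 8 that can be done with NetworkManager, for example (the address is just a placeholder for your IPoIB subnet):

nmcli connection add type infiniband ifname ib1 con-name ib1 \
    ipv4.method manual ipv4.addresses 192.168.2.11/24
nmcli connection up ib1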


D.

 
