HA pacemaker setup and OFED


Giannis Kapetanakis

Jun 11, 2022, 4:21:53 AM
to beegfs-user
We're migrating our HPC storage from Lustre to BeeGFS 7.3.0, on top of InfiniBand.

Since we already had one shared SAN array for storage and one for metadata, we're keeping that layout. For HA we use 2 storage servers and 2 metadata servers. Each storage server sees both volumes on the storage array, so we have an active-active setup. The same goes for metadata.

The setup and the failover work well. The only strange thing I see is that the virtual IPs are not being used; the real IPs are used instead.

# pcs resource
  * Resource Group: beegfs_storage1:
    * VIP-storage1    (ocf::heartbeat:IPaddr2):     Started oss-1
    * disk-storage1    (ocf::heartbeat:Filesystem):     Started oss-1
    * beegfs-storage1    (systemd:beegfs-storage@storage1):     Started oss-1
  * Resource Group: beegfs_storage2:
    * VIP-storage2    (ocf::heartbeat:IPaddr2):     Started oss-2
    * disk-storage2    (ocf::heartbeat:Filesystem):     Started oss-2
    * beegfs-storage2    (systemd:beegfs-storage@storage2):     Started oss-2

VIP-storage1 is
VIP-storage2 is

The metadata resources look the same.

I'm using connNetFilterFile, which as I understand it filters on destination IP addresses only. Is this correct? It does not filter on the source IP address, so it is NOT something like bindToIP.

On the client I see:
# beegfs-net
storage1-ib0.example.com [ID: 1]
   Connections: RDMA: 1 (; (REAL IP HERE)
storage2-ib0.example.com [ID: 2]
   Connections: RDMA: 1 (; (REAL IP HERE)

# beegfs-check-servers
storage1-ib0.example.com [ID: 1]: reachable at (protocol: RDMA)
storage2-ib0.example.com [ID: 2]: reachable at (protocol: RDMA)
The real IPs are shown in the output above as well.

# beegfs-ctl --listnodes --nodetype=storage --nicdetails
storage1-ib0.example.com [ID: 1]
   Ports: UDP: 8003; TCP: 8003
   + ib0[ip addr:; type: RDMA] (REAL IP HERE)
   + ib0[ip addr:; type: RDMA] (VIP HERE)
   + ib0[ip addr:; type: TCP] (REAL IP HERE)
   + ib0[ip addr:; type: TCP] (VIP HERE)
storage2-ib0.example.com [ID: 2]
   Ports: UDP: 8004; TCP: 8004
   + ib0[ip addr:; type: RDMA] (REAL IP HERE)
   + ib0[ip addr:; type: RDMA] (VIP HERE)
   + ib0[ip addr:; type: TCP] (REAL IP HERE)
   + ib0[ip addr:; type: TCP] (VIP HERE)

So is there any way to force use of only the VIPs?
Listing only the VIPs in connNetFilterFile does not work.
connInterfacesFile lists only ib0.

If there is no such option, I probably don't need VIPs at all.
Or maybe I don't need the real IPs, although even in that case I would see both VIPs on one storage server after a failover, which would still be wrong.

Second question:
I'm running Rocky 8.5 since this is the latest version supported by MLNX_OFED 4.9-, which is the one for my ConnectX-3 cards.

Is anyone using the inbox drivers from RHEL instead of MLNX_OFED?
Are they stable?
Performance-wise, is there a big difference?

I would like to do faster updates this time, since in the past MLNX_OFED and Lustre kept the cluster badly outdated.



Guan Xin

Jun 12, 2022, 4:25:55 AM
to beegfs-user

1) The official Linux kernel driver also comes from Mellanox, so I have no idea why there should be a difference. You could also try kernel-lt or kernel-ml from ELRepo.

2) I never thought of using virtual IP addresses for meta and storage, but virtual servers can always be used when necessary.
  We put different groups of metadata servers into separate network namespaces, for example, to restrict them to (NUMA-)node-local InfiniBand ports.



Jun 13, 2022, 9:29:44 AM
to beegfs-user
Use iflabel on your VIP and in the connInterfacesFile; that way BeeGFS will only use the ib0 address with the specified label.


# cat /mnt/mgmtd_disk/mgmtd_config/connInterfacesFile.conf

# pcs resource create mgmtd-IP ocf:heartbeat:IPaddr2 ip=<VIP> iflabel=mgmtd nic=ib0 ...
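
The same idea applied to a storage resource might look like the sketch below. It is only an illustration under assumptions: the label storage1, the temporary directory standing in for the real config directory, and the resource name are hypothetical, and the `ib0:<label>` entry assumes that the labelled address appears to BeeGFS as an alias interface named `nic:iflabel`, which is how IPaddr2 labels addresses. The pcs command is printed rather than executed, and <VIP> stays a placeholder.

```shell
#!/bin/sh
# Sketch only: NIC, LABEL and CONF_DIR are hypothetical values, not taken from the thread.
NIC="ib0"
LABEL="storage1"            # value passed to IPaddr2 as iflabel
CONF_DIR="$(mktemp -d)"     # stand-in for the service's real config directory
CONF_FILE="$CONF_DIR/connInterfacesFile.conf"

# Linux interface names, label included, are limited to 15 characters.
IFNAME="$NIC:$LABEL"
if [ "${#IFNAME}" -gt 15 ]; then
    echo "label too long: $IFNAME" >&2
    exit 1
fi

# Following the advice above, list only the labelled alias instead of plain ib0.
printf '%s\n' "$IFNAME" > "$CONF_FILE"
cat "$CONF_FILE"

# Print (do not run) the matching resource definition; <VIP> is a placeholder.
echo "pcs resource create VIP-$LABEL ocf:heartbeat:IPaddr2 ip=<VIP> nic=$NIC iflabel=$LABEL"
```

After the resource is started, `ip addr show ib0` should list the VIP with the label next to it, and `beegfs-ctl --listnodes --nodetype=storage --nicdetails` can be used to check which addresses the service advertises.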

Giannis Kapetanakis

Jun 24, 2022, 2:33:24 AM
to beegfs-user
Thanks, iflabel made this work :)