HA pacemaker setup and OFED


Giannis Kapetanakis

Jun 11, 2022, 4:21:53 AM
to beegfs-user
Hi,
We're migrating our HPC storage from Lustre to BeeGFS 7.3.0, on top of InfiniBand.

Since we already had one shared SAN array for storage and one for metadata, we're keeping that layout. For HA we use two storage servers and two metadata servers. Each storage server sees both volumes on the storage array, so we have an active-active setup. The same goes for metadata.

The setup and the failover work well. The only "strange" thing I see is that the virtual IPs are not being used; the real IPs are used instead.

# pcs resource
  * Resource Group: beegfs_storage1:
    * VIP-storage1    (ocf::heartbeat:IPaddr2):     Started oss-1
    * disk-storage1    (ocf::heartbeat:Filesystem):     Started oss-1
    * beegfs-storage1    (systemd:beegfs-storage@storage1):     Started oss-1
  * Resource Group: beegfs_storage2:
    * VIP-storage2    (ocf::heartbeat:IPaddr2):     Started oss-2
    * disk-storage2    (ocf::heartbeat:Filesystem):     Started oss-2
    * beegfs-storage2    (systemd:beegfs-storage@storage2):     Started oss-2

VIP-storage1 is 10.1.7.4/32
VIP-storage2 is 10.1.7.5/32

The metadata setup looks the same.

I'm using connNetFilterFile, which, as I understand it, applies to destination IP addresses only. Is this correct? It does not filter on the source IP address, so it's NOT something like bindToIP.

On the client I see:
# beegfs-net
storage_nodes
=============
storage1-ib0.example.com [ID: 1]
   Connections: RDMA: 1 (10.1.7.21:8003); (REAL IP HERE)
storage2-ib0.example.com [ID: 2]
   Connections: RDMA: 1 (10.1.7.22:8004); (REAL IP HERE)

# beegfs-check-servers
 Storage
==========
storage1-ib0.example.com [ID: 1]: reachable at 10.1.7.21:8003 (protocol: RDMA)
storage2-ib0.example.com [ID: 2]: reachable at 10.1.7.22:8004 (protocol: RDMA)
The real IPs appear above as well.

# beegfs-ctl --listnodes --nodetype=storage --nicdetails
storage1-ib0.example.com [ID: 1]
   Ports: UDP: 8003; TCP: 8003
   Interfaces:
   + ib0[ip addr: 10.1.7.21; type: RDMA] (REAL IP HERE)
   + ib0[ip addr: 10.1.7.4; type: RDMA] (VIP HERE)
   + ib0[ip addr: 10.1.7.21; type: TCP] (REAL IP HERE)
   + ib0[ip addr: 10.1.7.4; type: TCP] (VIP HERE)
storage2-ib0.example.com [ID: 2]
   Ports: UDP: 8004; TCP: 8004
   Interfaces:
   + ib0[ip addr: 10.1.7.22; type: RDMA] (REAL IP HERE)
   + ib0[ip addr: 10.1.7.5; type: RDMA] (VIP HERE)
   + ib0[ip addr: 10.1.7.22; type: TCP] (REAL IP HERE)
   + ib0[ip addr: 10.1.7.5; type: TCP] (VIP HERE)

So is there any way to force only the VIPs to be used?
Listing only the VIPs in connNetFilterFile does not work.
connInterfacesFile lists only ib0.

If there is no such option, I probably don't need VIPs at all.
Or maybe I don't need the real IPs, although even in that case, after a failover I would see both VIPs on one storage server, which would still be wrong.

Second question:
I'm running Rocky 8.5, since this is the latest release supported by MLNX_OFED 4.9-0.1.7.0, the version for my ConnectX-3 cards.

Is anyone using the inbox drivers from RHEL instead of MLNX_OFED?
Are they stable?
Performance-wise, is there a big difference?

I would like to do faster updates this time, since in the past MLNX as well as Lustre kept the cluster badly outdated.

thanks,

Giannis

Guan Xin

Jun 12, 2022, 4:25:55 AM
to beegfs-user
Hi,

1) The driver in the official Linux kernel also comes from Mellanox, so I see no reason why there should be a difference. You could also try kernel-lt or kernel-ml from ELRepo.

2) I never thought of using virtual IP addresses for meta and storage, but virtual servers could always be used when necessary.
  We put different groups of metadata servers into separate network namespaces, for example, to restrict them to (NUMA-)node-local InfiniBand ports.

Guan

bmer...@cambridgecomputer.com

Jun 13, 2022, 9:29:44 AM
to beegfs-user
Use iflabel on your VIP and in the connInterfacesFile; that way BeeGFS will only use the ib0 address with the specified label.

E.g.:

# cat /mnt/mgmtd_disk/mgmtd_config/connInterfacesFile.conf
ib0:mgmtd

# pcs resource create mgmtd-IP ocf:heartbeat:IPaddr2 ip=<VIP> iflabel=mgmtd nic=ib0 ...
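
Applied to the storage groups from the first post, the same idea might look like the sketch below. It only prints the two steps for one resource group so they can be reviewed before running them. The resource name VIP-storage1 and the NIC ib0 come from the thread; the label suffix "storage1", the helper names, and the config path in the usage note are assumptions. The length check reflects the Linux limit of 15 characters for an interface label. Note that changing the parameters of a running pcs resource will normally restart it.

```shell
#!/bin/sh
# Sketch only: helper names and the label suffix are hypothetical.

# Build "<nic>:<suffix>" and reject labels longer than 15 characters
# (the kernel stores labels in an IFNAMSIZ-sized buffer).
make_iflabel() {
    label="$1:$2"
    if [ "${#label}" -gt 15 ]; then
        echo "label too long: $label" >&2
        return 1
    fi
    printf '%s\n' "$label"
}

# Print (do not run) the two steps for one resource group.
# $1 = pcs VIP resource, $2 = NIC, $3 = label suffix, $4 = connInterfacesFile path
print_vip_label_steps() {
    label=$(make_iflabel "$2" "$3") || return 1
    # Step 1: restrict BeeGFS to the labelled address only.
    printf "echo '%s' > %s\n" "$label" "$4"
    # Step 2: make IPaddr2 attach the VIP under that label.
    printf 'pcs resource update %s nic=%s iflabel=%s\n' "$1" "$2" "$3"
}
```

For example, `print_vip_label_steps VIP-storage1 ib0 storage1 /etc/beegfs/storage1.d/connInterfacesFile.conf` prints the two commands for the first storage group; the path is a placeholder for whatever connInterfacesFile the service's config actually points to.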

Giannis Kapetanakis

Jun 24, 2022, 2:33:24 AM
to beegfs-user
Thanks, iflabel made this work :)
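
One way to check the result is to scan the --nicdetails listing for addresses that are not VIPs. The sketch below assumes the output format shown in the first post (a node line containing "[ID: n]" followed by "+ ib0[ip addr: ...; type: ...]" lines); the function names are hypothetical and the parsing may need adjusting for other BeeGFS versions.

```shell
#!/bin/sh
# Sketch only: parses text in the format quoted earlier in this thread.

# Read a --nicdetails listing on stdin; print "node ip type" per interface.
list_advertised_ips() {
    awk '
        /\[ID: [0-9]+\]/ { node = $1; next }    # remember the current node name
        /ip addr:/ {
            line = $0
            sub(/.*ip addr: */, "", line)       # drop everything before the address
            ip = line;   sub(/;.*/, "", ip)     # keep the address only
            kind = line; sub(/.*type: */, "", kind); sub(/\].*/, "", kind)
            print node, ip, kind
        }
    '
}

# Print only the rows whose address is NOT among the VIPs given as arguments.
non_vip_rows() {
    allowed=" $* "
    list_advertised_ips | while read -r node ip kind; do
        case "$allowed" in
            *" $ip "*) ;;                        # address is an expected VIP
            *) echo "$node $ip $kind" ;;
        esac
    done
}
```

Usage: `beegfs-ctl --listnodes --nodetype=storage --nicdetails | non_vip_rows 10.1.7.4 10.1.7.5`. With the labels in place, the expectation is that this prints nothing; any line it does print names a node that still advertises a non-VIP address.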