Lost metadata buddy mirror group?

Hutcheson, Mike

Jun 29, 2021, 4:08:28 PM
to fhgfs...@googlegroups.com
This is one of those situations where multiple concurrent changes to a configuration can bite you when you're trying to solve a problem. We were running BeeGFS 7.2 on a CentOS 7.8 storage cluster with four storage nodes and two metadata nodes, with metadata buddy mirroring enabled. The storage cluster was connected to an HPC cluster running Omni-Path, and everything was working fine.

However, a new product we're introducing to the cluster doesn't support Omni-Path, so we replaced the Omni-Path network with HDR InfiniBand and installed the Mellanox OFED 5.3 driver stack. To complicate things further, we changed the IB network from 172.40/20 to 172.20/20, since 172.40/20 falls outside the RFC 1918 private address ranges.

In addition to the IB network, we have a 1Gb network between all the systems as well.
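
For context, after the renumbering the net filter file (connNetFilterFile) on the clients contains two networks, roughly like this:

172.20.0.0/20
192.168.0.0/16

I'm writing the second entry from memory, so treat it as illustrative; the "Net filters: 2" line in the client log below is at least consistent with it.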

The trouble started when I tried to rebuild the BeeGFS 7.2 client on a CentOS 7.8 system running the Mellanox OFED 5.3 stack. I (belatedly) checked the release notes for BeeGFS 7.2.2 and found that it is compatible with the 5.3 stack, so I uninstalled BeeGFS 7.2 and installed 7.2.2. The client built successfully after that. Good!
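
In case the build failure bites someone else: with Mellanox OFED installed, the client autobuild has to be pointed at the MOFED headers. The relevant line in /etc/beegfs/beegfs-client-autobuild.conf looks something like this (the include path below is the standard MOFED location; it may differ on other installs):

# /etc/beegfs/beegfs-client-autobuild.conf
# OFED_INCLUDE_PATH is the usual Mellanox OFED header location; adjust if needed
buildArgs=-j8 OFED_INCLUDE_PATH=/usr/src/ofa_kernel/default/include

After changing it, the client module can be rebuilt with /etc/init.d/beegfs-client rebuild.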

Even though 7.2.2 is documented as being compatible with 7.2, I went ahead and installed 7.2.2 on the management, storage, and metadata servers.

After starting the management server, I started the storage and metadata servers. The management server logged a message saying it had rejected a new target registration request. Hmmm. I figured that was due to changing the IP addresses of the storage cluster nodes, so, naively, I set sysAllowNewTargets = true on the management server, restarted beegfs-mgmtd, and then started the storage and metadata services. I didn't receive any rejection messages that time. I have since set sysAllowNewTargets back to false, and I've restarted mgmtd and all of the other services many times while troubleshooting.
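
For reference, this is the option I toggled while the servers re-registered (it's back to false now). I'm going from memory on the exact name, so double-check it against your own beegfs-mgmtd.conf:

# /etc/beegfs/beegfs-mgmtd.conf
# temporarily set to true so the re-addressed servers could register; back to false now
sysAllowNewTargets = true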

I then tried starting the client and it failed. Here's what is in the client log:

[root@n025 ~]# cat /var/log/data-client.log
(1) Jun29 12:38:37 Main [App] >> BeeGFS Helper Daemon Version: 7.2.2
(1) Jun29 12:38:37 Main [App] >> Client log messages will be prefixed with an asterisk (*) symbol.
(3) Jun29 14:10:02 *mount(4733) [DatagramListener (init sock)] >> Listening for UDP datagrams: Port 8054
(1) Jun29 14:10:02 *mount(4733) [App_logInfos] >> BeeGFS Client Version: 7.2.2
(2) Jun29 14:10:02 *mount(4733) [App_logInfos] >> ClientID: 127D-60DB700A-n025
(2) Jun29 14:10:02 *mount(4733) [App_logInfos] >> Usable NICs: ib0(TCP) ib0(RDMA)
(2) Jun29 14:10:02 *mount(4733) [App_logInfos] >> Net filters: 2
(2) Jun29 14:10:02 *beegfs_XNodeSyn(4735) [Init] >> Waiting for beegfs-mgmtd@mgmt001:8058...
(3) Jun29 14:10:02 *beegfs_XNodeSyn(4735) [NodeConn (acquire stream)] >> Connected: beegfs-...@127.0.0.1:8056 (protocol: TCP)
(2) Jun29 14:10:02 *beegfs_DGramLis(4734) [Heartbeat incoming] >> New node: beegfs-mgmtd sto001 [ID: 1];
(3) Jun29 14:10:02 *beegfs_XNodeSyn(4735) [Init] >> Management node found. Downloading node groups...
(3) Jun29 14:10:02 *beegfs_XNodeSyn(4735) [NodeConn (acquire stream)] >> Connected: beegfs...@192.168.4.120:8058 (protocol: TCP)
(2) Jun29 14:10:02 *beegfs_XNodeSyn(4735) [Sync] >> Nodes added (sync results): 2 (Type: beegfs-meta)
(2) Jun29 14:10:02 *beegfs_XNodeSyn(4735) [Sync] >> Nodes added (sync results): 4 (Type: beegfs-storage)
(3) Jun29 14:10:02 *beegfs_XNodeSyn(4735) [Init] >> Node registration...
(2) Jun29 14:10:02 *beegfs_XNodeSyn(4735) [Registration] >> Node registration successful.
(3) Jun29 14:10:02 *beegfs_XNodeSyn(4735) [Init] >> Init complete.
(0) Jun29 14:10:02 *mount(4733) [Stat root dir] >> Unable to proceed without a working root metadata node <-------- !!!!!!!!!!!!!
(0) Jun29 14:10:02 *mount(4733) [Mount sanity check] >> Retrieval of root directory entry failed. Are all metadata servers running and registered at the management daemon? (Error: Unknown node)
(2) Jun29 14:10:02 *mount(4733) [App (stop components)] >> Stopping components...
(2) Jun29 14:10:02 *beegfs_XNodeSyn(4735) [Deregistration] >> Node deregistration successful.
(2) Jun29 14:10:04 *mount(4733) [App (wait for component termination)] >> Still waiting for this component to stop: beegfs_AckMgr
(2) Jun29 14:10:05 *mount(4733) [App (wait for component termination)] >> Component stopped: beegfs_AckMgr
(1) Jun29 14:10:05 *mount(4733) [App (stop)] >> All components stopped.


Here's what is in the mgmtd log. Note that beegfs-mgmtd was started before the storage and metadata servers, which, if I understand correctly, explains the burst of Auto-offline messages at 12:28. I brought the storage servers online starting around 12:30, then the metadata servers at 12:32:55. The client was started at 14:10 (its registration shows up at the end of the log):

[root@mgmt001 log]# cat beegfs-data-mgmtd.log
(1) Jun29 12:22:56 Main [App] >> Version: 7.2.2
(2) Jun29 12:22:56 Main [App] >> LocalNode: beegfs-mgmtd sto001 [ID: 1]
(2) Jun29 12:22:56 Main [App] >> Usable NICs: ens192(TCP)
(2) Jun29 12:23:16 XNodeSync [Assign node to capacity pool] >> Metadata node capacity pool assignment updated. NodeID: 1; Pool: Emergency; Reason: No capacity report received.
(2) Jun29 12:23:16 XNodeSync [Assign node to capacity pool] >> Metadata node capacity pool assignment updated. NodeID: 2; Pool: Emergency; Reason: No capacity report received.
(2) Jun29 12:28:46 XNodeSync [Auto-offline] >> No state report received from storage target for 349 seconds. Setting storage target to offline. Storage target ID: 0
(2) Jun29 12:28:46 XNodeSync [Auto-offline] >> No state report received from storage target for 349 seconds. Setting storage target to offline. Storage target ID: 1
(2) Jun29 12:28:46 XNodeSync [Auto-offline] >> No state report received from storage target for 349 seconds. Setting storage target to offline. Storage target ID: 2
(2) Jun29 12:28:46 XNodeSync [Auto-offline] >> No state report received from storage target for 349 seconds. Setting storage target to offline. Storage target ID: 3
(2) Jun29 12:28:46 XNodeSync [Auto-offline] >> No state report received from storage target for 349 seconds. Setting storage target to offline. Storage target ID: 4
(2) Jun29 12:28:46 XNodeSync [Auto-offline] >> No state report received from storage target for 349 seconds. Setting storage target to offline. Storage target ID: 6866
(2) Jun29 12:28:46 XNodeSync [Auto-offline] >> No state report received from metadata node for 349 seconds. Setting metadata node to offline. Metadata node ID: 0
(2) Jun29 12:28:46 XNodeSync [Auto-offline] >> No state report received from metadata node for 349 seconds. Setting metadata node to offline. Metadata node ID: 1
(2) Jun29 12:28:46 XNodeSync [Auto-offline] >> No state report received from metadata node for 349 seconds. Setting metadata node to offline. Metadata node ID: 2
(2) Jun29 12:28:46 XNodeSync [Auto-offline] >> No state report received from metadata node for 349 seconds. Setting metadata node to offline. Metadata node ID: 4834
(2) Jun29 12:28:46 XNodeSync [Auto-offline] >> No state report received from metadata node for 349 seconds. Setting metadata node to offline. Metadata node ID: 51989
(2) Jun29 12:28:46 XNodeSync [Auto-offline] >> No state report received from metadata node for 349 seconds. Setting metadata node to offline. Metadata node ID: 59652
(2) Jun29 12:30:48 DirectWorker1 [Change consistency states] >> Storage target is coming online. ID: 1
(2) Jun29 12:30:51 XNodeSync [Assign target to capacity pool] >> Storage target capacity pool assignment updated. NodeID: 1; TargetID: 1; Pool: Normal.
(2) Jun29 12:31:22 DirectWorker1 [Change consistency states] >> Storage target is coming online. ID: 3
(2) Jun29 12:31:26 XNodeSync [Assign target to capacity pool] >> Storage target capacity pool assignment updated. NodeID: 2; TargetID: 3; Pool: Normal.
(2) Jun29 12:32:06 Worker20 [Change consistency states] >> Storage target is coming online. ID: 4
(2) Jun29 12:32:11 XNodeSync [Assign target to capacity pool] >> Storage target capacity pool assignment updated. NodeID: 3; TargetID: 4; Pool: Normal.
(2) Jun29 12:32:27 Worker17 [Change consistency states] >> Storage target is coming online. ID: 2
(2) Jun29 12:32:31 XNodeSync [Assign target to capacity pool] >> Storage target capacity pool assignment updated. NodeID: 4; TargetID: 2; Pool: Normal.
(2) Jun29 12:32:55 DirectWorker1 [Change consistency states] >> Metadata node is coming online. ID: 1
(2) Jun29 12:32:56 XNodeSync [Assign node to capacity pool] >> Metadata node capacity pool assignment updated. NodeID: 1; Pool: Normal; Reason: Free capacity threshold
(2) Jun29 12:33:25 Worker10 [Change consistency states] >> Metadata node is coming online. ID: 2
(2) Jun29 12:33:26 XNodeSync [Assign node to capacity pool] >> Metadata node capacity pool assignment updated. NodeID: 2; Pool: Normal; Reason: Free capacity threshold
(2) Jun29 14:10:03 Worker2 [Node registration] >> New node: beegfs-client 127D-60DB700A-n025 [ID: 126]; RDMA; Source: 192.168.1.25:38980
(2) Jun29 14:10:03 Worker3 [RemoveNodeMsgEx.cpp:66] >> Node removed. node: beegfs-client 127D-60DB700A-n025 [ID: 126]
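
If it would help, I can also dump the reachability and consistency states the management daemon currently holds; I'd run something like:

beegfs-ctl --listtargets --nodetype=meta --state --cfgFile=data.d/beegfs-client.conf

I haven't included that output here.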

This is the log of the first metadata server. Note that there's nothing in it about needing to resync with its partner; it doesn't appear to know it has a mirror buddy at all.

[root@stm001 beegfs]# cat /var/log/beegfs-data-meta.log
(3) Jun29 12:32:52 Main [App] >> Root directory loaded.
(1) Jun29 12:32:52 Main [App] >> Root metadata server (by possession of root directory): 1
(3) Jun29 12:32:52 Main [RegDGramLis] >> Listening for UDP datagrams: Port 8055
(1) Jun29 12:32:52 Main [App] >> Waiting for beegfs-mgmtd@mgmt001:8058...
(2) Jun29 12:32:52 RegDGramLis [Heartbeat incoming] >> New node: beegfs-mgmtd sto001 [ID: 1]
(3) Jun29 12:32:52 Main [RegDGramLis] >> Listening for UDP datagrams: Port 8055
(2) Jun29 12:32:52 Main [Register node] >> Node registration successful.
(3) Jun29 12:32:52 Main [NodeConn (acquire stream)] >> Connected: beegfs...@192.168.4.120:8058 (protocol: TCP)
(2) Jun29 12:32:52 Main [printSyncResults] >> Nodes added (sync results): 1 (Type: beegfs-meta)
(2) Jun29 12:32:52 Main [printSyncResults] >> Nodes added (sync results): 4 (Type: beegfs-storage)
(3) Jun29 12:32:52 Main [App] >> Registration and management info download complete.
(3) Jun29 12:32:52 Main [DGramLis] >> Listening for UDP datagrams: Port 8055
(3) Jun29 12:32:52 Main [ConnAccept] >> Listening for RDMA connections: Port 8055
(3) Jun29 12:32:52 Main [ConnAccept] >> Listening for TCP connections: Port 8055
(3) Jun29 12:32:52 Main [App] >> Restored 0 sessions and 1 mirrored sessions
(1) Jun29 12:32:52 Main [App] >> Version: 7.2.2
(2) Jun29 12:32:52 Main [App] >> LocalNode: beegfs-meta stm001 [ID: 1]
(2) Jun29 12:32:52 Main [App] >> Usable NICs: ib0(RDMA) ib0(TCP) em1(TCP)

Here's the log of the second metadata server. Again, nothing about resyncing with a buddy.

[root@stm002 beegfs]# cat /var/log/beegfs-data-meta.log
(3) Jun29 12:33:22 Main [App] >> Root directory loaded.
(1) Jun29 12:33:22 Main [App] >> Root metadata server (by possession of root directory): 1
(3) Jun29 12:33:22 Main [RegDGramLis] >> Listening for UDP datagrams: Port 8055
(1) Jun29 12:33:22 Main [App] >> Waiting for beegfs-mgmtd@mgmt001:8058...
(2) Jun29 12:33:22 RegDGramLis [Heartbeat incoming] >> New node: beegfs-mgmtd sto001 [ID: 1]
(3) Jun29 12:33:22 Main [RegDGramLis] >> Listening for UDP datagrams: Port 8055
(2) Jun29 12:33:22 Main [Register node] >> Node registration successful.
(3) Jun29 12:33:22 Main [NodeConn (acquire stream)] >> Connected: beegfs...@192.168.4.120:8058 (protocol: TCP)
(2) Jun29 12:33:22 Main [printSyncResults] >> Nodes added (sync results): 1 (Type: beegfs-meta)
(2) Jun29 12:33:22 Main [printSyncResults] >> Nodes added (sync results): 4 (Type: beegfs-storage)
(3) Jun29 12:33:22 Main [App] >> Registration and management info download complete.
(3) Jun29 12:33:22 Main [DGramLis] >> Listening for UDP datagrams: Port 8055
(3) Jun29 12:33:22 Main [ConnAccept] >> Listening for RDMA connections: Port 8055
(3) Jun29 12:33:22 Main [ConnAccept] >> Listening for TCP connections: Port 8055
(3) Jun29 12:33:22 Main [App] >> Restored 0 sessions and 1 mirrored sessions
(1) Jun29 12:33:22 Main [App] >> Version: 7.2.2
(2) Jun29 12:33:22 Main [App] >> LocalNode: beegfs-meta stm002 [ID: 2]
(2) Jun29 12:33:22 Main [App] >> Usable NICs: ib0(RDMA) ib0(TCP) em1(TCP)

Here's some info extracted from beegfs-ctl. Note that no mirror group is listed below.

[root@n025 beegfs]# beegfs-ctl --listtargets --nodetype=meta --cfgFile=data.d/beegfs-client.conf
TargetID     NodeID
========     ======
       1          1
       2          2

[root@n025 beegfs]# beegfs-ctl --listtargets --mirrorgroups --nodetype=meta --cfgFile=data.d/beegfs-client.conf
MirrorGroupID MGMemberType TargetID NodeID
============= ============ ======== ======
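
For completeness, I'd expect the dedicated mirror group listing to come back empty as well; I can post the output of this if it's useful:

beegfs-ctl --listmirrorgroups --nodetype=meta --cfgFile=data.d/beegfs-client.conf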


So, amid all of the changes, I managed to somehow break the metadata buddy mirroring configuration. Could the solution be as simple as creating a new metadata buddy group? (A sketch of what I imagine that would look like is below.) I imagine the concern would be knowing which metadata server was primary and which was secondary before the shutdown. However, maybe that's not an issue here: all of the clients were stopped before the storage cluster was shut down, so I'm not too worried about the mirrored metadata being out of sync between the two metadata servers.
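
If recreating the group is the right approach, I assume it would look something like the following. The group ID of 1 and the choice of node 1 (stm001) as primary are guesses on my part, since I no longer know what the original group looked like:

# hypothetical: group ID and primary/secondary assignments are guesses
beegfs-ctl --addmirrorgroup --nodetype=meta --primary=1 --secondary=2 --groupid=1 --cfgFile=data.d/beegfs-client.conf

# then watch the resync
beegfs-ctl --resyncstats --nodetype=meta --mirrorgroupid=1 --cfgFile=data.d/beegfs-client.conf

I don't want to run any of that until someone confirms it won't make things worse.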

What are your thoughts? Am I reading the problem correctly, i.e., that the metadata buddy mirror group has been lost? Or is there a different problem?

Thanks very much for your help,

Mike Hutcheson
Director of High Performance and Research Computing Services
Baylor University Information Technology Services


