BeeGFS node unreachable (ID reservation request was rejected by this mgmt node)


Austin Shen

Jul 12, 2024, 11:53:47 AM
to beegfs-user
Hi all,

Our BeeGFS storage servers went down after an infrastructure restart. Using beegfs-check-servers I can see that some of my storage nodes are unreachable:

Management
==========
beegfs-meta [ID: 1]: reachable at 192.168.0.72:8008 (protocol: TCP)

Metadata
==========
beegfs-meta [ID: 1]: reachable at 192.168.0.72:8005 (protocol: TCP)
beegfs-meta-0 [ID: 2]: reachable at 192.168.0.86:8005 (protocol: TCP)

Storage
==========
beegfs-node-0 [ID: 1]: UNREACHABLE
beegfs-node-2 [ID: 2]: UNREACHABLE
beegfs-node-3 [ID: 3]: UNREACHABLE
beegfs-node-1 [ID: 4]: reachable at 192.168.0.180:8003 (protocol: TCP)
beegfs-node-4 [ID: 5]: reachable at 192.168.0.190:8003 (protocol: TCP)

When I go into the storage node VM and check the logs (/var/log/beegfs-storage.log) I get the following:

(2) Jul12 02:17:26 Main [App.cpp:1567] >> UUIDs of targets underlying file systems have not been configured and will therefore not be checked. To prevent starting the server accidentally with the wrong data, it is strongly recommended to set the storeFsUUID config parameter to the appropriate UUIDs.
(3) Jul12 02:17:26 Main [App] >> Built with NVFS RDMA support.
(3) Jul12 02:17:26 Main [RegDGramLis] >> Listening for UDP datagrams: Port 8003
(1) Jul12 02:17:26 Main [App] >> Waiting for beegfs...@192.168.0.72:8008...
(2) Jul12 02:17:26 RegDGramLis [Heartbeat incoming] >> New node: beegfs-mgmtd beegfs-meta [ID: 1];
(3) Jul12 02:17:26 Main [NodeConn (acquire stream)] >> Connected: beegfs...@192.168.0.72:8008 (protocol: TCP)
(0) Jul12 02:17:26 Main [App] >> ID reservation request was rejected by this mgmt node: beegfs-meta [ID: 1]
(0) Jul12 02:17:26 Main [App] >> Node pre-registration at management node canceled

Does anybody know how to resolve this?

Cheers,
Austin

Joe McCormick

Jul 12, 2024, 12:05:13 PM
to beegfs-user
Hi, 

What does the management log say at this time? 

I suspect the storage target(s) for this storage node are not mounted properly, so the storage service is trying to register itself as a new node and the mgmtd is rejecting that request. This is why it is recommended to set the storeFsUUID parameter in the storage (and meta) config files, as the warning earlier in your log points out:

# [storeFsUUID]
# Requires the underlying file systems of the storage targets to have the same
# UUID as set here. This prevents the storage node from accidentally starting targets
# from a wrong device, e.g. when it is not properly mounted.
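
A minimal sketch of how one might check this on a storage node before starting beegfs-storage. The target path is hypothetical (use your own storeStorageDirectory value), and the sketch assumes the target directory is itself the mount point and that the util-linux tools `mountpoint` and `findmnt` are installed. The printed line only suggests a value for a single target; check your config file's own comments for the exact format.

```shell
#!/bin/sh
# Hypothetical target path; substitute the storeStorageDirectory
# value from your beegfs-storage.conf.
TARGET_DIR="${TARGET_DIR:-/data/beegfs_storage}"

# Succeeds only if the directory is itself a mount point.
is_mounted() {
    mountpoint -q "$1"
}

# Print the UUID of the file system that contains the directory.
# May be empty for file systems that do not expose a UUID this way.
fs_uuid() {
    findmnt -n -o UUID --target "$1"
}

if is_mounted "$TARGET_DIR"; then
    echo "storeFsUUID = $(fs_uuid "$TARGET_DIR")"
else
    echo "WARNING: $TARGET_DIR is not a mount point; do not start beegfs-storage" >&2
fi
```

If the warning is printed, the service would be starting on an empty directory of the root file system, which matches the "new node" behaviour described above.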

~Joe

Waltar

Jul 16, 2024, 10:59:56 PM
to beegfs-user
This really looks like ztupid zpools on the storage nodes that didn't import. I also think that in the case of ZFS there is no per-dataset UUID available to use for storeFsUUID, which could otherwise help BeeGFS confirm that the ZFS import and mount are done.
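
If the targets do sit on ZFS, a rough check along these lines might confirm whether the pool was imported and the dataset mounted before starting beegfs-storage. The pool and dataset names (`tank`, `tank/beegfs`) are hypothetical, and `pool_in_list` is just an illustrative helper, not a BeeGFS or ZFS tool.

```shell
#!/bin/sh
# Hypothetical names; replace with your own pool and dataset.
POOL="${POOL:-tank}"
DATASET="${DATASET:-tank/beegfs}"

# Succeeds if the given pool name appears as a whole line on stdin.
# Intended to be fed the output of `zpool list -H -o name`.
pool_in_list() {
    grep -qxF -- "$1"
}

# Only run the live check where the ZFS tools are installed.
if command -v zpool >/dev/null 2>&1; then
    if zpool list -H -o name | pool_in_list "$POOL"; then
        # `zfs get mounted` prints "yes" or "no" for the dataset.
        echo "pool $POOL imported; dataset mounted: $(zfs get -H -o value mounted "$DATASET")"
    else
        echo "pool $POOL is not imported; try: zpool import $POOL" >&2
    fi
fi
```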

Yiwei Guo

Oct 31, 2024, 5:53:23 AM
to beegfs-user
@Austin,

I am seeing a similar issue. How did you fix it?

kissu8

Oct 31, 2024, 5:53:26 AM
to beegfs-user
How to fix it?