Hi all,
Our BeeGFS storage servers went down after an infrastructure restart. Running beegfs-check-servers shows that three of my five storage nodes are unreachable:
Management
==========
beegfs-meta [ID: 1]: reachable at 192.168.0.72:8008 (protocol: TCP)
Metadata
==========
beegfs-meta [ID: 1]: reachable at 192.168.0.72:8005 (protocol: TCP)
beegfs-meta-0 [ID: 2]: reachable at 192.168.0.86:8005 (protocol: TCP)
Storage
==========
beegfs-node-0 [ID: 1]: UNREACHABLE
beegfs-node-2 [ID: 2]: UNREACHABLE
beegfs-node-3 [ID: 3]: UNREACHABLE
beegfs-node-1 [ID: 4]: reachable at 192.168.0.180:8003 (protocol: TCP)
beegfs-node-4 [ID: 5]: reachable at 192.168.0.190:8003 (protocol: TCP)
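In case it helps anyone compare against their own cluster, here is a minimal Python sketch that pulls the unreachable nodes out of output shaped like the above. The function name `unreachable_nodes` and the `SAMPLE` text are made up for illustration; the script only parses pasted text and does not call any BeeGFS tool.

```python
import re

# Matches lines such as "beegfs-node-0 [ID: 1]: UNREACHABLE".
NODE_RE = re.compile(
    r"^(?P<name>\S+) \[ID: (?P<id>\d+)\]: (?P<state>UNREACHABLE|reachable at)"
)
# Section headings as they appear in the pasted output.
SECTIONS = {"Management", "Metadata", "Storage"}


def unreachable_nodes(text):
    """Return (section, name, id) for every node reported as UNREACHABLE."""
    section = None
    found = []
    for raw in text.splitlines():
        line = raw.strip()
        if line in SECTIONS:
            section = line
            continue
        match = NODE_RE.match(line)
        if match and match.group("state") == "UNREACHABLE":
            found.append((section, match.group("name"), int(match.group("id"))))
    return found


# Hypothetical sample, shortened from the output above.
SAMPLE = """\
Storage
==========
beegfs-node-0 [ID: 1]: UNREACHABLE
beegfs-node-1 [ID: 4]: reachable at
192.168.0.180:8003 (protocol: TCP)
"""

if __name__ == "__main__":
    for section, name, node_id in unreachable_nodes(SAMPLE):
        print(section, name, node_id)
```

Address lines that wrap onto their own line are simply skipped, so the sketch works whether or not the address sits on the same line as the node name.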
When I log in to the storage node VM and check the log (/var/log/beegfs-storage.log), I get the following:
(2) Jul12 02:17:26 Main [App.cpp:1567] >> UUIDs of targets underlying file systems have not been configured and will therefore not be checked. To prevent starting the server accidentally with the wrong data, it is strongly recommended to set the storeFsUUID config parameter to the appropriate UUIDs.
(3) Jul12 02:17:26 Main [App] >> Built with NVFS RDMA support.
(3) Jul12 02:17:26 Main [RegDGramLis] >> Listening for UDP datagrams: Port 8003
(1) Jul12 02:17:26 Main [App] >> Waiting for beegfs...@192.168.0.72:8008...
(2) Jul12 02:17:26 RegDGramLis [Heartbeat incoming] >> New node: beegfs-mgmtd beegfs-meta [ID: 1];
(3) Jul12 02:17:26 Main [NodeConn (acquire stream)] >> Connected: beegfs...@192.168.0.72:8008 (protocol: TCP)
(0) Jul12 02:17:26 Main [App] >> ID reservation request was rejected by this mgmt node: beegfs-meta [ID: 1]
(0) Jul12 02:17:26 Main [App] >> Node pre-registration at management node canceled
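To make the failing lines easier to spot in a longer log, here is a rough Python sketch that keeps only the most severe entries. It assumes the number in parentheses is the log level and that lower means more severe (the two failures above are both at level 0); the function name `most_severe` and the `SAMPLE_LOG` list are hypothetical, and the pattern is inferred only from the lines pasted here.

```python
import re

# Pattern inferred from the pasted lines:
# "(level) MonDD HH:MM:SS Thread [Context] >> message"
LOG_RE = re.compile(
    r"^\((?P<level>\d+)\) (?P<date>\w{3}\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<thread>\S+) \[(?P<context>.*?)\] >> (?P<message>.*)$"
)


def most_severe(lines, max_level=0):
    """Return (time, context, message) for entries at or below max_level."""
    hits = []
    for line in lines:
        match = LOG_RE.match(line.strip())
        # Lines that do not fit the pattern (e.g. wrapped continuations) are skipped.
        if match and int(match.group("level")) <= max_level:
            hits.append(
                (match.group("time"), match.group("context"), match.group("message"))
            )
    return hits


# Sample lines copied from the log above.
SAMPLE_LOG = [
    "(3) Jul12 02:17:26 Main [RegDGramLis] >> Listening for UDP datagrams: Port 8003",
    "(0) Jul12 02:17:26 Main [App] >> ID reservation request was rejected by this mgmt node: beegfs-meta [ID: 1]",
    "(0) Jul12 02:17:26 Main [App] >> Node pre-registration at management node canceled",
]

if __name__ == "__main__":
    for time, context, message in most_severe(SAMPLE_LOG):
        print(time, context, message)
```

Against a real file, the same function could be fed `open("/var/log/beegfs-storage.log")` instead of the sample list.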
Does anybody know how to resolve this?
Cheers,
Austin