Rebooting servers without interruption

Andrew Smith

Dec 11, 2024, 2:42:35 PM
to beegfs-user

I have a newly installed BeeGFS 7.4.5 system. It has 3 storage servers and a single node that acts as both metadata server and management server.

I would like to be able to reboot the servers without client processes crashing. Instead, they should just hang while the servers are temporarily unavailable. This is the behavior I get from NFS.

As a test, I rsync data to the volume from a client, then reboot the nodes. If I reboot the meta/mgmt node, the system works fine: the rsync hangs for a few minutes waiting for the reboot, then picks up where it left off.

When I reboot the storage servers, the rsync fails with an error about 200 s after the servers disconnect.
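
To pin down the exact failure time independently of rsync, a small write probe could be run on the client during the reboot. The following is only a minimal sketch: the function name `probe_writes` and its parameters are hypothetical helpers made up for this test, not part of BeeGFS, and it assumes that a storage outage shows up either as a long-blocking `write`/`fsync` or as an `OSError`.

```python
import os
import time


def probe_writes(path, chunk_size=1 << 20, interval=1.0, max_chunks=None):
    """Append chunks to `path`, fsync each one, and time every attempt.

    Returns a list of (chunk_index, seconds_taken, error_or_None) tuples.
    Stops at the first I/O error, or after `max_chunks` chunks if given.
    """
    results = []
    payload = b"\0" * chunk_size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        i = 0
        while max_chunks is None or i < max_chunks:
            start = time.monotonic()
            err = None
            try:
                os.write(fd, payload)
                os.fsync(fd)  # force the data out to the servers
            except OSError as exc:
                err = exc
            took = time.monotonic() - start
            results.append((i, took, err))
            status = "OK" if err is None else f"FAILED: {err}"
            print(f"chunk {i}: {took:.1f}s {status}", flush=True)
            if err is not None:
                break  # report the first failure and stop
            i += 1
            time.sleep(interval)
    finally:
        try:
            os.close(fd)
        except OSError:
            pass  # closing may also fail while the servers are down
    return results
```

Calling `probe_writes("/mnt/beegfs/probe.dat")` (the mount path here is an assumption) and then rebooting a storage server should show one chunk that either blocks for the whole reboot or fails after some delay; that delay can then be compared against the configured timeouts.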

I would like the rsync to wait for the storage servers to come back instead of failing. I have set all the timeouts longer than the reboot time (about 250 s):

sysTargetOfflineTimeoutSecs = 900
connRDMATimeouts = all > 5000

I used '--timeout 10000' with rsync, so it is not rsync that is timing out.

Is there some setting I can change to make the system behave as I'd like? On the client or the server?
