I have a newly installed BeeGFS 7.4.5 system. It has 3 storage servers and a single node that serves as both metadata and management server.
I would like to be able to reboot the servers without client processes crashing; instead, they should just hang while the servers are temporarily unavailable. This is the behavior I get from NFS.
As a test, I rsync data from a client to the volume, then reboot the nodes. If I reboot the meta/mgmt node, the system works fine: the rsync hangs for a few minutes waiting for the reboot, then picks up where it left off.
When I reboot the storage servers, the rsync fails with an error about 200s after the servers disconnect.
I would like the rsync to wait for the storage servers to come back instead of failing. I have set all the timeouts to be longer than the reboot time (about 250s):
sysTargetOfflineTimeoutSecs = 900
connRDMATimeouts = all > 5000
I used '--timeout 10000' with rsync, so it is not rsync that is timing out.
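To pin down the time to failure more precisely, the rsync run could be wrapped in a small timing script. This is only a minimal sketch: the `time_command` helper is a hypothetical name, and it measures elapsed time from the start of the command, so the reboot time has to be noted separately to get the delay after the disconnect.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()          # monotonic clock is unaffected by clock adjustments
    result = subprocess.run(cmd)      # blocks until the command exits
    elapsed = time.monotonic() - start
    return result.returncode, elapsed


# Only run when a command is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    code, elapsed = time_command(sys.argv[1:])
    print(f"command exited with code {code} after {elapsed:.0f}s")
```

It would be called with the rsync command as arguments, for example `python3 time_cmd.py rsync -a --timeout 10000 /data/src/ /mnt/beegfs/dst/`, where the script name and both paths are placeholders.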
Is there some setting I can change to make the system behave as I'd like? On the client or the server?