beegfs-client sometimes fails to start with error "system call failed: Operation canceled".


Michael Heinz

May 18, 2023, 10:25:00 AM
to beegfs-user
Hello,

Sometimes (in fact, most of the time) when the nodes in my cluster are rebooted, beegfs-client.service fails to start. The message is always:

systemd[1]: Starting Start BeeGFS Client...
beegfs-client[4310]: Starting BeeGFS Client:
beegfs-client[4310]: - Loading BeeGFS modules
beegfs-client[4310]: - Mounting directories from /etc/beegfs/beegfs-mounts.conf
beegfs-client[4403]: mount: /mnt/beegfs: mount(2) system call failed: Operation canceled.
systemd[1]: beegfs-client.service: Main process exited, code=exited, status=32/n/a
systemd[1]: beegfs-client.service: Failed with result 'exit-code'.


The client log has nothing useful, except that it doesn't mention the NIC that the client is supposed to be using:
# cat beegfs-client.log
(1) May18 10:10:34 Main [App] >> BeeGFS Helper Daemon Version: 7.3.3
(1) May18 10:10:34 Main [App] >> Client log messages will be prefixed with an asterisk (*) symbol.
(2) May18 10:10:34 Main [App] >> Usable NICs: enp23s0(TCP)


Because starting the service by hand after the machine has booted always works, I suspect the cause is that (for reasons I can't really go into) the NIC the client is configured to use is a little slow to come up.

Any suggestions? Is there a way to prevent beegfs-client from starting until after the NIC is completely up?

Michael Heinz

May 18, 2023, 10:49:22 AM
to beegfs-user
I've confirmed that the NIC isn't available when beegfs-client is running:

May 18 10:45:52 node01 kernel: beegfs: mount(3917): App (init local node info): Couldn't find any usable NIC
May 18 10:45:52 node01 kernel: beegfs: mount(3917): Configuration error: Initialization of common objects failed. (Log file may provide additional information.)
May 18 10:45:52 node01 kernel: beegfs: mount(3917): App (stop components): Stopping components...
May 18 10:45:52 node01 kernel: beegfs: mount(3917): App (stop): All components stopped.
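Since these kernel messages show the client giving up because no usable NIC exists yet, one possible workaround is a pre-start check that polls until the interface is up. The sketch below is a hypothetical helper, not part of BeeGFS. It assumes Linux sysfs, where /sys/class/net/<iface>/operstate reads "up" once the link is operational, and it defaults to enp23s0 only because that name appears in the log earlier in the thread. Some drivers report "unknown" rather than "up", and a link that is up may still lack an IP address, so treat this as a starting point rather than a complete readiness test.

```python
# Hypothetical pre-start helper (not part of BeeGFS): wait until a network
# interface reports operstate "up" in Linux sysfs before the client starts.
import os
import time

SYSFS_NET = "/sys/class/net"   # standard Linux sysfs location for NICs
DEFAULT_IFACE = "enp23s0"      # assumption: name taken from the log in this thread


def iface_is_up(iface, sysfs_root=SYSFS_NET):
    """Return True if <sysfs_root>/<iface>/operstate reads 'up'."""
    path = os.path.join(sysfs_root, iface, "operstate")
    try:
        with open(path) as f:
            return f.read().strip() == "up"
    except OSError:
        # The interface (or its sysfs entry) does not exist yet.
        return False


def wait_for_iface(iface, timeout=60.0, interval=1.0, sysfs_root=SYSFS_NET):
    """Poll until the interface is up; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while not iface_is_up(iface, sysfs_root):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


def main(argv, sysfs_root=SYSFS_NET):
    """Exit status: 0 if the interface came up in time, 1 otherwise."""
    iface = argv[0] if argv else DEFAULT_IFACE
    return 0 if wait_for_iface(iface, sysfs_root=sysfs_root) else 1
```

Saved as a script that ends with raise SystemExit(main(sys.argv[1:])), it could be run from an ExecStartPre= line in a systemd drop-in for beegfs-client.service. The script location and the 60-second default are assumptions; keep the wait shorter than any start timeout configured for the unit.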

Chen Bill

May 19, 2023, 11:41:47 PM
to beegfs-user
Hi Michael,

I think beegfs-client is started by systemd after network-online.target; you can check /usr/lib/systemd/system/beegfs-client.service:
============================
After=network-online.target xxxxx
============================

You need to check systemctl status network: if the network works at boot time, beegfs-client should work.

So the better way is to fix the network service issue, or just remove the After=network-online.target dependency.

Cheers,
Bill
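Bill's suggestion depends on how network-online.target is wired up. In systemd, After= only orders startup; network-online.target is generally only reached when some unit pulls it in (typically with Wants=network-online.target) and a wait-online service such as NetworkManager-wait-online.service or systemd-networkd-wait-online.service is enabled. The sketch below is a simplified, hypothetical checker that reads a unit file's text and reports whether its [Unit] section both orders itself after and pulls in network-online.target. It ignores drop-in files, line continuations, and other dependency directives, so confirm the result on a real node with systemctl show -p After -p Wants beegfs-client.service.

```python
# Simplified, hypothetical checker: does a systemd unit both order itself
# after network-online.target (After=) and pull it in (Wants=/Requires=)?
# Assumptions: only the [Unit] section of one file is read; drop-ins,
# line continuations and other dependency types are ignored.

TARGET = "network-online.target"


def unit_network_deps(text, target=TARGET):
    """Return (ordered_after, pulled_in) for `target` in unit-file text."""
    section = None
    after, pulled = set(), set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "Unit" or "=" not in line:
            continue
        key, _, value = line.partition("=")
        names = value.split()  # values are whitespace-separated unit names
        if key.strip() == "After":
            after.update(names)
        elif key.strip() in ("Wants", "Requires"):
            pulled.update(names)
    return target in after, target in pulled
```

A result of (True, False) would suggest the unit is ordered after network-online.target but relies on some other unit to pull the target in, which is one way the ordering can end up having no effect at boot.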