Hi Toby,
Toby Darling wrote on 05/29/2014 06:43 PM:
> Does anyone have a reliable way of killing an fhgfs-client process that
> thinks a mount is in use, but it isn't (and nothing is reported by lsof)?
>
> I've tried killing the multitude of fhgfs_(Worker|DGramLis|...)
> processes - with no effect, they don't die.
it's normal that those cannot be killed, because they are kernel
threads. (But even if you could kill them, it wouldn't change the
kernel's opinion regarding the module being busy and thus being unremovable.)
> A manual 'umount -l' of the
> client mount point does remove it from /proc/mounts but leaves the
> client very confused. These only seem to exacerbate the situation; short
> of a reboot, I'm not sure what else to do.
Did you do the "umount -l" before or after lsof?
I don't see how "umount -l" could confuse the client, at least
from the fhgfs side of things: the client module won't
even notice that an "umount -l" happened and thus will continue to
operate normally (until the last reference is released and the kernel
executes the deferred actual umount). So did something strange happen
when you used "umount -l"?
However, it's definitely true that "umount -l" will complicate things if
you want to release all references from userspace processes to unload
the fhgfs client kernel module.
The main reason is that "umount -l" causes path
information to get lost. For instance, you can normally identify all
processes accessing a mount (and thus keeping the mount busy/referenced)
by looking at "ls -l /proc/<PID>/fd" and "ls -ld /proc/<PID>/cwd".
That will show processes accessing /mnt/fhgfs/somedir/somefile, similar
to what lsof does.
But after doing an "umount -l", these paths will be shown by the kernel
as "/somedir/somefile" (so the mountpoint will disappear from the path),
and then it gets really hard to identify the corresponding processes.
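As a rough illustration of the /proc check described above, the following sketch walks the cwd and fd links of every process and prints those that point below a given path. The function name procs_using_path and its optional second argument (an alternative proc root, only there to make testing easier) are my own inventions, not part of fhgfs or any standard tool. It needs to run as root to see other users' processes, and after an "umount -l" it will find nothing for /mnt/fhgfs, for exactly the reason given above.

```shell
#!/bin/sh
# Print "PID<TAB>target" for every process whose cwd or open file
# descriptor resolves to the given path or something below it.
# Usage: procs_using_path /mnt/fhgfs [proc_root]
# (proc_root is a hypothetical override, defaulting to /proc)
procs_using_path() {
    prefix=${1%/}              # strip a trailing slash, if any
    proc_root=${2:-/proc}
    for piddir in "$proc_root"/[0-9]*; do
        [ -d "$piddir" ] || continue
        pid=${piddir##*/}
        for link in "$piddir"/cwd "$piddir"/fd/*; do
            # Unreadable or vanished links are simply skipped
            target=$(readlink "$link" 2>/dev/null) || continue
            case $target in
                "$prefix"|"$prefix"/*)
                    printf '%s\t%s\n' "$pid" "$target"
                    ;;
            esac
        done
    done
}
```

Calling "procs_using_path /mnt/fhgfs" should report roughly the same processes as lsof does, just without the extra columns.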
So unless something went wrong, the normal way would be to do as root:
$ lsof /mnt/fhgfs
Then decide that you want to get rid of all the processes that you see, so:
$ fuser -k /mnt/fhgfs
Then umount and rmmod, e.g. via:
$ /etc/init.d/fhgfs-client stop
If some processes are hanging a bit longer, e.g. because the network or
a server is down, you might want to use this first to disable retries:
$ echo 0 > /proc/fs/fhgfs/<clientID>/conn_retries_enabled
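Put together, the steps above might look like the sketch below. It only prints what it would do unless RUN=1 is set, since "fuser -k" kills processes without asking. The names step, disable_retries and stop_fhgfs_client, and the MNT, PROC_DIR and RUN variables, are made up for this example. The loop disables retries for every client directory found under /proc/fs/fhgfs, which is an assumption on my part and only sensible if you want to stop all client instances on the node.

```shell
#!/bin/sh
# Sketch of the shutdown sequence; dry-run by default.
# Set RUN=1 (as root) to really execute the commands.
MNT=${MNT:-/mnt/fhgfs}                 # mount point used in this thread
PROC_DIR=${PROC_DIR:-/proc/fs/fhgfs}   # per-client procfs directories

# Run a command, or just print it when RUN is not 1
step() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf 'would run: %s\n' "$*"
    fi
}

# Write 0 into one client's conn_retries_enabled file
disable_retries() {
    echo 0 > "$1"
}

stop_fhgfs_client() {
    # Optional first step: let hanging processes give up sooner
    for f in "$PROC_DIR"/*/conn_retries_enabled; do
        [ -e "$f" ] || continue
        step disable_retries "$f"
    done
    step lsof "$MNT"                    # show who is using the mount
    step fuser -k "$MNT"                # kill those processes
    step /etc/init.d/fhgfs-client stop  # umount and rmmod
}
```

Running "stop_fhgfs_client" without RUN=1 first is a cheap way to check that the paths match your setup before anything gets killed.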
Best regards,
Sven Breuner
Fraunhofer