Killing an fhgfs-client

Toby Darling

May 29, 2014, 12:43:57 PM
to fhgfs...@googlegroups.com
Hi

Does anyone have a reliable way of killing an fhgfs-client process that
thinks a mount is in use, but it isn't (and nothing is reported by lsof)?

I've tried killing the multitude of fhgfs_(Worker|DGramLis|...)
processes - with no effect, they don't die. A manual 'umount -l' of the
client mount point does remove it from /proc/mounts but leaves the
client very confused. These only seem to exacerbate the situation; short
of a reboot, I'm not sure what else to do.

Cheers
Toby
--
Toby Darling, Scientific Computing (2N249)
MRC Laboratory of Molecular Biology
Francis Crick Avenue
Cambridge Biomedical Campus
Cambridge CB2 0QH
Phone 01223 267070

Alfonso Núñez Salgado

May 30, 2014, 6:08:11 AM
to fhgfs...@googlegroups.com
On 29/05/14 18:43, Toby Darling wrote:
> Hi
>
> Does anyone have a reliable way of killing an fhgfs-client process
> that thinks a mount is in use, but it isn't (and nothing is reported
> by lsof)?
>
> I've tried killing the multitude of fhgfs_(Worker|DGramLis|...)
> processes - with no effect, they don't die. A manual 'umount -l' of
> the client mount point does remove it from /proc/mounts but leaves the
> client very confused. These only seem to exacerbate the situation;
> short of a reboot, I'm not sure what else to do.
>
> Cheers
> Toby

If you have tried to stop the service (/etc/init.d/fhgfs-client stop)
and it fails, that means the fhgfs modules cannot be unloaded. In that
case, depending on how your kernel was compiled, you can:

1) Use the --force option of the rmmod command to force the module unload
2) Trace which process is currently using the module:

http://stackoverflow.com/questions/448999/is-there-a-way-to-figure-out-what-is-using-a-linux-kernel-module
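
As a rough sketch of the two options above: before trying a forced unload, it can help to look at the module's reference count. The module name "fhgfs", the helper name module_refcnt and the SYSFS_ROOT override are assumptions made for illustration; check the real module name with lsmod first.

```shell
#!/bin/sh
# Assumption: the client kernel module is called "fhgfs" (verify with lsmod).
# SYSFS_ROOT is a hypothetical override so the helper can be tried on a copy.

# Print the reference count of a module, or "not loaded" if sysfs has no entry.
module_refcnt() {
    refcnt_file="${SYSFS_ROOT:-/sys}/module/$1/refcnt"
    if [ -r "$refcnt_file" ]; then
        cat "$refcnt_file"
    else
        echo "not loaded"
    fi
}

# Example (not run automatically):
#   module_refcnt fhgfs
# On many distributions the kernel build options can be checked with:
#   grep CONFIG_MODULE_FORCE_UNLOAD "/boot/config-$(uname -r)"
```

A non-zero count means a plain rmmod will refuse to unload the module. rmmod --force only works on kernels built with CONFIG_MODULE_FORCE_UNLOAD, and forcing out a module that is still referenced can leave the system unstable.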

I hope this helps
Alfonso

Sven Breuner

Jun 2, 2014, 7:28:09 AM
to fhgfs...@googlegroups.com, Toby Darling
Hi Toby,

Toby Darling wrote on 05/29/2014 06:43 PM:
> Does anyone have a reliable way of killing an fhgfs-client process that
> thinks a mount is in use, but it isn't (and nothing is reported by lsof)?
>
> I've tried killing the multitude of fhgfs_(Worker|DGramLis|...)
> processes - with no effect, they don't die.

It's normal that those cannot be killed, because they are kernel
threads. (But even if you could kill them, it wouldn't change the
kernel's opinion that the module is busy and thus cannot be removed.)

> A manual 'umount -l' of the
> client mount point does remove it from /proc/mounts but leaves the
> client very confused. These only seem to exacerbate the situation; short
> of a reboot, I'm not sure what else to do.

Did you do the "umount -l" before or after lsof?

I don't see how "umount -l" could confuse the client - at least
from the fhgfs side of things. The client module won't even notice
that an "umount -l" happened and will therefore continue to
operate normally (until the last reference is released and the kernel
executes the deferred actual umount). So did something strange happen
when you used "umount -l"?

However, it's definitely true that "umount -l" will complicate things if
you want to release all references from userspace processes to unload
the fhgfs client kernel module.
The main reason is that "umount -l" causes path
information to get lost. For instance, you can normally identify all
processes accessing a mount (and thus keeping the mount busy/referenced)
by looking at "ls -l /proc/<PID>/fd" and "ls -ld /proc/<PID>/cwd".
That will show processes accessing /mnt/fhgfs/somedir/somefile, similar
to what lsof does.
But after doing an "umount -l", these paths will be shown by the kernel
as "/somedir/somefile" (so the mountpoint will disappear from the path),
and then it gets really hard to identify the corresponding processes.
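
The /proc check described above could be scripted roughly as follows. This is only a sketch: the function name pids_using_path is made up for illustration, it needs to run as root to see other users' processes, and it only works while the mount point is still part of the reported paths (so before any "umount -l").

```shell
#!/bin/sh
# List PIDs whose cwd or open file descriptors point at a path under PREFIX.
# PREFIX would be the mount point, e.g. /mnt/fhgfs.
pids_using_path() {
    prefix="${1%/}"                       # drop a trailing slash, if any
    for piddir in /proc/[0-9]*; do
        pid="${piddir#/proc/}"
        for link in "$piddir/cwd" "$piddir"/fd/*; do
            # Unreadable or vanished entries are skipped silently.
            target=$(readlink "$link" 2>/dev/null) || continue
            case "$target" in
                "$prefix"|"$prefix"/*)
                    echo "$pid"
                    break                 # one hit per process is enough
                    ;;
            esac
        done
    done
}

# Example (not run automatically):
#   pids_using_path /mnt/fhgfs
```

This prints one PID per line, which can then be inspected with ps before deciding whether to kill anything.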

So unless something went wrong, the normal way would be to do as root:
$ lsof /mnt/fhgfs
Then decide that you want to get rid of all the processes that you see, so:
$ fuser -k /mnt/fhgfs
Then umount and rmmod, e.g. via:
$ /etc/init.d/fhgfs-client stop

If some processes are hanging a bit longer, e.g. because the network or
a server is down, you might want to use this first to disable retries:
$ echo 0 > /proc/fs/fhgfs/<clientID>/conn_retries_enabled
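
If you don't want to look up the client ID by hand, a loop over all entries might look like the sketch below. The function name disable_conn_retries and the optional directory argument are assumptions for illustration; only the /proc/fs/fhgfs/<clientID>/conn_retries_enabled path comes from the command above.

```shell
#!/bin/sh
# Write 0 to conn_retries_enabled for every client instance found under the
# given directory (default: /proc/fs/fhgfs). Needs to run as root on /proc.
disable_conn_retries() {
    base="${1:-/proc/fs/fhgfs}"
    for f in "$base"/*/conn_retries_enabled; do
        [ -e "$f" ] || continue           # glob did not match: nothing mounted
        echo 0 > "$f" && echo "retries disabled: $f"
    done
}

# Example (not run automatically):
#   disable_conn_retries
```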

Best regards,
Sven Breuner
Fraunhofer