Dreaded "Nfs Stale File Handle"

William Unruh

Apr 20, 2014, 6:54:20 PM

I am again getting the "Stale NFS file handle" error on an NFS-mounted
directory. Nothing I do gets rid of it: umount, umount -f, umount -l,
mount and then umount. Both machines (client and server) are running
rpc.statd.

On the server, I get this message in /var/log/messages:
authenticated unmount request from 142.103.xxx.xxx:yyy for /local/wwwww.
So it seems that the server and client are talking to each other and
that the server does not see anything terrible.

It really really really should not be necessary to reboot a machine in
order to solve this problem.

I am not sure when this particular problem happened, or what caused it.
Sometimes it happens when the server is rebooted without the directory
first having been unmounted on the client.

But I thought -f was supposed to force an unmount. It has never ever
worked for me.
(In this case both client and server are Mageia 3 machines.)

Jim Beard

Apr 20, 2014, 8:52:17 PM

Are you running the commands on both machines?

Have you tried restarting nfs on both machines?

I have had the problem with stale file handles on rare occasions,
but the need for a reboot was extremely rare.

Cheers!

jim b.

--
UNIX is not user-unfriendly; it merely
expects users to be computer-friendly.

William Unruh

Apr 20, 2014, 9:17:24 PM

On 2014-04-21, Jim Beard <jdb...@patriot.net> wrote:
> On Sun, 20 Apr 2014 22:54:20 +0000, William Unruh wrote:
>
>> I am again getting the "Stale NFS file handle" error on an NFS-mounted
>> directory. Nothing I do gets rid of it: umount, umount -f, umount -l,
>> mount and then umount. Both machines (client and server) are running
>> rpc.statd.
>>
>> On the server, I get this message in /var/log/messages:
>> authenticated unmount request from 142.103.xxx.xxx:yyy for /local/wwwww.
>> So it seems that the server and client are talking to each other and
>> that the server does not see anything terrible.
>>
>>
>> It really really really should not be necessary to reboot a machine in
>> order to solve this problem.
>>
>> I am not sure when this particular problem happened, or what caused it.
>> Sometimes it happens when the server is rebooted without the directory
>> first having been unmounted on the client.
>>
>> But I thought -f was supposed to force an unmount. It has never ever
>> worked for me.
>> (In this case both client and server are Mageia 3 machines.)
>
> Are you running the commands on both machines?

Not sure what you mean by that. The umount commands (umount, umount -f,
umount -l, umount -l -f) were run on the client, which is where the
problem is. The exportfs -f was run on the server.

>
> Have you tried restarting nfs on both machines?

I ran
systemctl restart nfs-secure nfs-server nfs-mountd nfs-blkmap
on both. Did not help.

client:10.0[root]>umount /disk9/home
umount.nfs: /disk9/home: Stale NFS file handle
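
For reference, a minimal sketch of how one might check which NFS mounts on a client still respond, rather than finding out one umount at a time. It assumes a Linux client with a /proc/mounts style mount table and the coreutils `timeout` command. The function name `check_nfs_mounts` and the 5-second limit are illustrative choices, not anything reported in this thread.

```shell
#!/bin/sh
# Report each NFS mount point as "ok" or "FAIL".
# Assumption: the table is in /proc/mounts format
# (device, mount point, fs type, options, ...).
# Mount points containing spaces (escaped as \040) are not handled.
check_nfs_mounts() {
    table=${1:-/proc/mounts}
    # Match only types nfs and nfs4; this skips the server-side "nfsd" pseudo-fs.
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "$table" |
    while read -r mnt; do
        # stat fails on a stale handle and can block on a dead server,
        # so bound it with a timeout. On some hard mounts it may still hang.
        if timeout -s KILL 5 stat "$mnt" >/dev/null 2>&1; then
            echo "ok $mnt"
        else
            echo "FAIL $mnt"
        fi
    done
}
```

Calling `check_nfs_mounts` with no argument reads /proc/mounts; a mount such as /disk9/home in the state shown above would be expected to show up as FAIL.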

On the client I get this in /var/log/syslog:

Apr 20 18:06:10 boson systemd[1]: Started NFS Mount Daemon.
Apr 20 18:07:32 boson systemd[1]: Job dev-disk-by\x2duuid-7697a931\x2d5429\x2d4ddf\x2d9be4\x2d55d7e0935396.device/start timed out.
Apr 20 18:07:32 boson systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-7697a931\x2d5429\x2d4ddf\x2d9be4\x2d55d7e0935396.device.
Apr 20 18:07:32 boson systemd[1]: Dependency failed for /dev/disk/by-uuid/7697a931-5429-4ddf-9be4-55d7e0935396.
Apr 20 18:07:32 boson systemd[1]: Job dev-disk-by\x2duuid-7697a931\x2d5429\x2d4ddf\x2d9be4\x2d55d7e0935396.swap/start failed with result 'dependency'.
Apr 20 18:07:32 boson systemd[1]: Job dev-disk-by\x2duuid-7697a931\x2d5429\x2d4ddf\x2d9be4\x2d55d7e0935396.device/start failed with result 'timeout'.
Apr 20 18:07:40 boson nfsdcltrack[21388]: cltrack_legacy_gracedone: unable to rmdir /var/lib/nfs/v4recovery/.: -1
Apr 20 18:07:40 boson nfsdcltrack[21388]: cltrack_legacy_gracedone: unable to rmdir /var/lib/nfs/v4recovery/..: -1

I have no idea what that dev-disk-by... entry is, but that is the error
I get, though I am not sure what produces it. (When I tried
systemctl restart nfs-mountd
umount -f /disk9/home
I did not get those lines.)

It really seems that the only way of getting out of this is to reboot,
which is crazy. The Stale File handle really does seem to be a serious
bug in nfs that has been there for at least 10 years by now. One simply
should NOT have to reboot.

J.O. Aho

Apr 21, 2014, 3:10:37 AM

On 21/04/14 03:17, William Unruh wrote:

> It really seems that the only way of getting out of this is to reboot,
> which is crazy. The Stale File handle really does seem to be a serious
> bug in nfs that has been there for at least 10 years by now. One simply
> should NOT have to reboot.

Yes, the Linux implementation of NFS has had a lot of issues, and sadly
OpenSolaris wasn't released under the GPL; otherwise they might have
borrowed some code.

You could try to see which processes hold the files that cause the issue:

lsof | grep \/path\/to\/nfs\/share

Kill those, then try to remount the NFS share with

mount -o remount /path/to/nfs/share

not sure if it will work for you.

--

//Aho

William Unruh

Apr 21, 2014, 3:20:04 AM

On 2014-04-21, J.O. Aho <us...@example.net> wrote:
> On 21/04/14 03:17, William Unruh wrote:
>
>> It really seems that the only way of getting out of this is to reboot,
>> which is crazy. The Stale File handle really does seem to be a serious
>> bug in nfs that has been there for at least 10 years by now. One simply
>> should NOT have to reboot.
>
> Yes, the Linux implementation of NFS has had a lot of issues, and sadly
> OpenSolaris wasn't released under the GPL; otherwise they might have
> borrowed some code.
>
> You could try to see which processes hold the files that cause the issue:
>
> lsof | grep \/path\/to\/nfs\/share

lsof just gives "Stale NFS file handle".

Everything just gives that.
So there is no information as to whether anything is trying to own some
file in that directory.
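
One possible way to get that information when lsof itself fails is to read the symlinks under /proc/<pid>/cwd and /proc/<pid>/fd with readlink. The assumption here (worth verifying on the affected machine) is that readlink only reports the path the kernel has recorded and does not need to stat anything on the stale mount. A rough sketch, to be run as root; the function name `nfs_holders` and its optional second argument are made up for illustration.

```shell
#!/bin/sh
# Print the PIDs whose working directory or open files lie under a mount point.
# Usage: nfs_holders /disk9/home [procroot]
# procroot defaults to /proc; the second argument exists only for testing.
nfs_holders() {
    target=$1
    procroot=${2:-/proc}
    for link in "$procroot"/[0-9]*/cwd "$procroot"/[0-9]*/fd/*; do
        # readlink fails for vanished processes or unmatched globs; skip those.
        path=$(readlink "$link" 2>/dev/null) || continue
        case $path in
            "$target"|"$target"/*)
                pid=${link#"$procroot"/}   # strip the /proc/ prefix
                echo "${pid%%/*}"          # keep only the PID component
                ;;
        esac
    done | sort -un
}
```

For example, `nfs_holders /disk9/home` would list candidate PIDs to inspect with ps before deciding whether to kill them. Whether killing them is enough to let umount succeed is a separate question.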

>
> kill those, then try to remount the nfs with
>
> mount -o remount /path/to/nfs/share
>
> not sure if it will work for you.

Nope: Stale NFS file handle.

Jim Beard

Apr 21, 2014, 11:00:16 AM

I think that comes from mount.nfs or mount.nfs4, which is trying to
mount a disk partition by UUID, and timing out. Try man mount.nfs
and then run mount.nfs4 or mount.nfs with an option of -v for verbose
output, and see what that says. Note the man page suggestion that
LABEL= be used to identify partitions rather than UUID etc.

Just for grins, I would also remove /var/lib/nfs/v4recovery/ as there
is a complaint that it could not be removed. With nfs running on
my system, the directory exists, but has nothing in it.

> It really seems that the only way of getting out of this is to reboot,
> which is crazy. The Stale File handle really does seem to be a serious
> bug in nfs that has been there for at least 10 years by now. One simply
> should NOT have to reboot.

Inability to mount could be a problem on the server-side or on the
client-side. I have also seen the stale file handle problem when
dhcp provided a new lease from the router. I vaguely remember that
I wound up rebooting the router and the server and client on one
occasion, but do not know if all that was really necessary. NFS is
supposed to take such in stride, but if something gets corrupted...

William Unruh

Apr 21, 2014, 1:29:17 PM

Mine is also empty.

>
>> It really seems that the only way of getting out of this is to reboot,
>> which is crazy. The Stale File handle really does seem to be a serious
>> bug in nfs that has been there for at least 10 years by now. One simply
>> should NOT have to reboot.
>
> Inability to mount could be a problem on the server-side or on the
> client-side. I have also seen the stale file handle problem when
> dhcp provided a new lease from the router. I vaguely remember that
> I wound up rebooting the router and the server and client on one
> occasion, but do not know if all that was really necessary. NFS is
> supposed to take such in stride, but if something gets corrupted...

I have had numerous "Stale NFS file handle" errors over the years. The
only way out I have ever found is to reboot.

>
> Cheers!
>
> jim b.
>