Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Bug#880549: nfs-kernel-server: NFSv4 sec=krb5p option broken on stretch

21 views
Skip to first unread message

Roberto C. Sanchez

unread,
Nov 2, 2017, 12:30:03 AM11/2/17
to
Package: nfs-kernel-server
Version: 1:1.3.4-2.1
Severity: important

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

I have recently upgraded my NFS server from Jessie to Stretch. It was
the last system on my network that needed to be upgraded. After the
upgrade, I found that NFSv4 was nearly unusable. All user home
directories, as well as backup and scratch space, are mounted via NFS.
Users trying to log in to a default GNOME session would experience hangs
that appeared to never resolve themselves. Logging in through ssh
resulted in ~2 minutes delay before a bash prompt would appear.

I attempted several things to troubleshoot the problem, including
reverting to kernel 3.16.0 from Jessie and nfs-utils 1.2.8-9.1. I tried
going all the way back to nfs-utils 1.2.8-9, but the version of systemd
in Stretch has "nfs-common (<< 1:1.2.8-9.1)". Even after reverting the
problem persisted. Based on that, I think that the problem may have
been introduced as far back as 1.2.8-9.1.

The problem also manifested itself in the form of problems for users
logged directly into the NFS server. Even though file accesses are
local in that scenario, if an NFS client had attemped to access a file
and hung on it, then it would also cause problems on the NFS server.
For example, I have a large MBOX file that I access with mutt. If I
access the file from an NFS client then it appears to hang with messages
like this flooding syslog on the client:

vmunix: [918369.611122] NFS: nfs4_reclaim_open_state: Lock reclaim failed

Trying to then open that same mbox with mutt on the NFS server causes
mutt to segfault because the call to fcntl() returns an error.

The only thing that resolved the problem was switching my exports from
sec=krb5p to sec=krb5i. Of course, this is considerably less desirable
than krb5p in some cases.

I believe the that problem is with rpc.svcgssd. Here is part of what
got captured in syslog during a failure:

Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: libnfsidmap: using domain: example.com
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: libnfsidmap: Realms list: 'EXAMPLE.COM'
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: libnfsidmap: processing 'Method' list
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: libnfsidmap: loaded plugin /lib/x86_64-linux-gnu/libnfsidmap/nsswitch.so for method nsswitch
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: entering poll
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: leaving poll
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: handling null request
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: sname = nfs/ws1.exa...@EXAMPLE.COM
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: doing downcall
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509615211 (25678 from now), clnt: n...@ws1.example.com, uid: -1, gid: -1, num aux grps: 0:
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: sending null reply
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload]
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: finished handling null request
Nov 1 22:25:33 nfssrv rpc.svcgssd[30152]: entering poll
Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: leaving poll
Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: handling null request
Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel
Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: sname = nfs/ws2.exa...@EXAMPLE.COM
Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: doing downcall
Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509603982 (14448 from now), clnt: n...@ws2.example.com, uid: -1, gid: -1, num aux grps: 0:
Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: sending null reply
Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload]
Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: finished handling null request
Nov 1 22:25:34 nfssrv rpc.svcgssd[30152]: entering poll
Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: leaving poll
Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: handling null request
Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel
Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: sname = nfs/ws3.exa...@EXAMPLE.COM
Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: doing downcall
Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509603453 (13881 from now), clnt: n...@ws3.example.com, uid: -1, gid: -1, num aux grps: 0:
Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: sending null reply
Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload]
Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: finished handling null request
Nov 1 22:26:12 nfssrv rpc.svcgssd[30152]: entering poll
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: leaving poll
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: handling null request
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: sname = nfs/ws3.exa...@EXAMPLE.COM
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: doing downcall
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509603453 (13756 from now), clnt: n...@ws3.example.com, uid: -1, gid: -1, num aux grps: 0:
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: sending null reply
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload]
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: finished handling null request
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: entering poll
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: leaving poll
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: handling null request
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: sname = nfs/ws3.exa...@EXAMPLE.COM
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: doing downcall
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509603453 (13756 from now), clnt: n...@ws3.example.com, uid: -1, gid: -1, num aux grps: 0:
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: sending null reply
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload]
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: finished handling null request
Nov 1 22:28:17 nfssrv rpc.svcgssd[30152]: entering poll
Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: leaving poll
Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: handling null request
Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel
Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: sname = nfs/ws2.exa...@EXAMPLE.COM
Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: doing downcall
Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509603982 (14280 from now), clnt: n...@ws2.example.com, uid: -1, gid: -1, num aux grps: 0:
Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: sending null reply
Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload]
Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: finished handling null request
Nov 1 22:28:22 nfssrv rpc.svcgssd[30152]: entering poll
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: leaving poll
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: handling null request
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: sname = rob...@EXAMPLE.COM
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: doing downcall
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509623547 (33835 from now), clnt: <null>, uid: 3167, gid: 100, num aux grps: 17:
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 1) 100
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 2) 4
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 3) 24
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 4) 29
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 5) 33
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 6) 44
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 7) 46
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 8) 50
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 9) 60
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 10) 113
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 11) 115
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 12) 125
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 13) 132
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 14) 500
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 15) 501
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 16) 502
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: ( 17) 503
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: sending null reply
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload]
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: finished handling null request
Nov 1 22:28:32 nfssrv rpc.svcgssd[30152]: entering poll
Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: leaving poll
Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: handling null request
Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: svcgssd_limit_krb5_enctypes: Calling gss_set_allowable_enctypes with 7 enctypes from the kernel
Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: sname = nfs/ws2.exa...@EXAMPLE.COM
Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: doing downcall
Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: mech: krb5, hndl len: 4, ctx len 52, timeout: 1509603982 (14260 from now), clnt: n...@ws2.example.com, uid: -1, gid: -1, num aux grps: 0:
Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: sending null reply
Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: writing message: \x [snip message payload]
Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: finished handling null request
Nov 1 22:28:42 nfssrv rpc.svcgssd[30152]: entering poll
Nov 1 22:32:42 nfssrv rpc.svcgssd[30152]: exiting on signal 15


- -- Package-specific info:
- -- rpcinfo --
program vers proto port service
100000 4 tcp 111 portmapper
100000 3 tcp 111 portmapper
100000 2 tcp 111 portmapper
100000 4 udp 111 portmapper
100000 3 udp 111 portmapper
100000 2 udp 111 portmapper
391002 2 tcp 650 sgi_fam
100005 1 udp 32767 mountd
100005 1 tcp 32767 mountd
100005 2 udp 32767 mountd
100005 2 tcp 32767 mountd
100005 3 udp 32767 mountd
100005 3 tcp 32767 mountd
100003 3 tcp 2049 nfs
100003 4 tcp 2049 nfs
100227 3 tcp 2049
100003 3 udp 2049 nfs
100003 4 udp 2049 nfs
100227 3 udp 2049
100021 1 udp 52460 nlockmgr
100021 3 udp 52460 nlockmgr
100021 4 udp 52460 nlockmgr
100021 1 tcp 37351 nlockmgr
100021 3 tcp 37351 nlockmgr
100021 4 tcp 37351 nlockmgr
- -- /etc/default/nfs-kernel-server --
RPCNFSDCOUNT=8
RPCNFSDPRIORITY=0
RPCMOUNTDOPTS="-p 32767"
NEED_SVCGSSD="yes"
RPCSVCGSSDOPTS=""
- -- /etc/exports --
/srv/nfs 192.168.0.0/16(sync,rw,fsid=0,crossmnt,insecure,no_subtree_check,anonuid=65534,anongid=65534,sec=krb5i)
/srv/nfs/home 192.168.0.0/16(sync,rw,insecure,no_subtree_check,anonuid=65534,anongid=65534,sec=krb5i)
/srv/nfs/backup 192.168.0.0/16(sync,rw,insecure,no_subtree_check,anonuid=65534,anongid=65534,sec=krb5i)
/srv/nfs/scratch 192.168.0.0/16(sync,rw,insecure,no_subtree_check,anonuid=65534,anongid=65534,sec=krb5i)

- -- System Information:
Debian Release: 9.2
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.9.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8), LANGUAGE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages nfs-kernel-server depends on:
ii init-system-helpers 1.48
ii keyutils 1.5.9-9
ii libblkid1 2.29.2-1
ii libc6 2.24-11+deb9u1
ii libcap2 1:2.25-1
ii libsqlite3-0 3.16.2-5
ii libtirpc1 0.2.5-1.2
ii libwrap0 7.6.q-26
ii lsb-base 9.20161125
ii netbase 5.4
ii nfs-common 1:1.3.4-2.1
ii ucf 3.0036

nfs-kernel-server recommends no packages.

nfs-kernel-server suggests no packages.

- -- no debconf information

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEz9ERzDttUsU/BH8iLNd4Xt2nsg8FAln6mi4ACgkQLNd4Xt2n
sg+OPRAAksDf+IR6dTn1Gl8OM4IOMVi5S48i+S6n3sDy9Uv04zRX1sdvTuzvGF1d
4mRqPson+eQ3CnuMQIH8XOMZJdrZoB7bsjMWjZC9FgBn42oE7wOxcMJW+FC1qCYe
AJMan4ssGDec62XnBkkZY2byhvbXygEmSq/cCQlCKp3amwe3VEm+lqf1qFKl2euV
qrnkgk7aAgrH6zu+q5WgAyrEZzvDi7NbcGheUe0NitAKEu2zR+Iq6mMOEP9rd91s
sgJ2tCHvwZtE6FoUHk9GJrluZyzItBIGV+aaA9PFjJcoQgxB5mE5cS0vFlfXiVbm
xS9v67UJXp/H7R5S+mnoMDqfKi+TCW+FaLEj5s+4bpYd7sk6Pyqugj3yEj6OYKAw
u2G7iB4MA64roxGeIEghgPvW/Zmuvpa+pi6RjOUcyvO4lyNGMVUM7TP59fWuXbwv
kwQ+coBI6SxAUnKOiA9MADHZcUxjdhPWckkTFagijv9mipAY9+G6+fWBY3iFZrMO
vrGABHecEzV5TzKv4J+sSGl5581/+vImK3SIUA6ILBED1SkWmFFonekE9P0YDgFr
tR2+vcCvhQehYSQW/hifJpbY/3puUXYOvqQmv+Be3AVw6hBaxVfzY1yjkDemjKXr
oHmDZ5L76TokKoiv45JYIWAr8sjNsxScDUlhnP+Ou7Tvi8eUIqM=
=CbNN
-----END PGP SIGNATURE-----

Dietrich Clauss

unread,
Nov 9, 2019, 5:00:03 AM11/9/19
to
This bug is still present in Buster.

It can be workarounded by using the mount option "vers=4.0".

NFS versions 4.2 (default in Buster), and 4.1 are affected by this bug.
Version 4.0 works fine. Can anyone confirm this?

Dietrich Clauss

unread,
Nov 9, 2019, 5:30:03 AM11/9/19
to
Package: nfs-kernel-server
Followup-For: Bug #880549
-- System Information:
Debian Release: 10.1
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.19.0-6-amd64 (SMP w/4 CPU cores)
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8), LANGUAGE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages nfs-kernel-server depends on:
ii keyutils 1.6-6
ii libblkid1 2.33.1-0.1
ii libc6 2.28-10
ii libcap2 1:2.25-2
ii libsqlite3-0 3.27.2-3
ii libtirpc3 1.1.4-0.4
ii libwrap0 7.6.q-28
ii lsb-base 10.2019051400
ii netbase 5.6
ii nfs-common 1:1.3.4-2.5
ii ucf 3.0038+nmu1

Dietrich Clauss

unread,
Dec 18, 2023, 9:00:04 AM12/18/23
to
In Bookworm I no longer see the "lock reclaim failed" errors.

Further investigation revealed that the affected installations were also
affected by five-second delays as described here
<https://lore.kernel.org/linux-nfs/59682160-a246-395a...@mathematik.uni-wuerzburg.de/>.
These show similar symptoms and they also occur on nfs v4.{1,2} only.

So I had to change the lines in <etc/exports> from

| /srv/nfs gss/krb5p(rw,fsid=0,crossmnt,no_subtree_check)

to

| /srv/nfs *(sec=krb5p,rw,fsid=0,crossmnt,no_subtree_check)

Done that and everything runs fine.

In my experience the bug report can be closed.
signature.asc
0 new messages