
Bug#884284: nfs-kernel-server: NFSv4 broken


Anton Ivanov

Dec 13, 2017, 6:40:03 AM
Package: nfs-kernel-server
Version: 1:1.3.4-2.1
Severity: important

Dear Maintainer,

NFSv4 in stretch is broken and unusable.

After some time the server exporting the directories starts throwing

[1130732.440356] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
[1130734.801510] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
[1173981.176268] NFS: nfs4_reclaim_open_state: Lock reclaim failed!

messages, reads and writes slow to a crawl, and in the end there is
no choice but to reboot the server. Restarting nfs-kernel-server,
unmounting from all known clients and remounting does not help.

I have now been forced to downgrade back to NFSv3 across the board.
The same setup works fine with NFSv3.

NFSv4 used to work perfectly fine in jessie and before that.

I am not sure if this started from the stretch upgrade or after one
of the stretch mid-life kernel updates (I think it is the latter).

Setup: Standard mid-size classic Linux/Unix multiuser install. Server(s)
exporting $HOME and other directories to a local network. Clients mount
via autofs when needed. Most directories are mounted from at least 2 (usually
more) clients.


-- Package-specific info:
-- rpcinfo --
program vers proto port service
100000 4 tcp 111 portmapper
100000 3 tcp 111 portmapper
100000 2 tcp 111 portmapper
100000 4 udp 111 portmapper
100000 3 udp 111 portmapper
100000 2 udp 111 portmapper
100005 1 udp 58357 mountd
100005 1 tcp 37131 mountd
100005 2 udp 54135 mountd
100005 2 tcp 32951 mountd
100005 3 udp 47587 mountd
100005 3 tcp 41773 mountd
100003 3 tcp 2049 nfs
100003 4 tcp 2049 nfs
100227 3 tcp 2049
100003 3 udp 2049 nfs
100003 4 udp 2049 nfs
100227 3 udp 2049
100021 1 udp 46283 nlockmgr
100021 3 udp 46283 nlockmgr
100021 4 udp 46283 nlockmgr
100021 1 tcp 40039 nlockmgr
100021 3 tcp 40039 nlockmgr
100021 4 tcp 40039 nlockmgr
100004 2 udp 856 ypserv
100004 1 udp 856 ypserv
100004 2 tcp 857 ypserv
100004 1 tcp 857 ypserv
100009 1 udp 866 yppasswdd
600100069 1 udp 874 fypxfrd
600100069 1 tcp 875 fypxfrd
100007 2 udp 969 ypbind
100007 1 udp 969 ypbind
100007 2 tcp 970 ypbind
100007 1 tcp 970 ypbind
100024 1 udp 44513 status
100024 1 tcp 58657 status
-- /etc/default/nfs-kernel-server --
RPCNFSDCOUNT=8
RPCNFSDPRIORITY=0
RPCMOUNTDOPTS="--manage-gids"
NEED_SVCGSSD=""
RPCSVCGSSDOPTS=""
-- /etc/exports --
/exports 192.168.0.0/16(rw,async,no_root_squash,no_subtree_check,nohide,fsid=root) 127.0.0.0/8(rw,async,no_root_squash,no_subtree_check,nohide,fsid=root)
/exports/md0 192.168.0.0/16(rw,async,no_root_squash,no_subtree_check,nohide) 127.0.0.0/8(rw,async,no_root_squash,no_subtree_check,nohide)
/exports/md1 192.168.0.0/16(rw,async,no_root_squash,no_subtree_check,nohide) 127.0.0.0/8(rw,async,no_root_squash,no_subtree_check,nohide)
/exports/md2 192.168.0.0/16(rw,async,no_root_squash,no_subtree_check,nohide) 127.0.0.0/8(rw,async,no_root_squash,no_subtree_check,nohide)
-- /proc/fs/nfs/exports --
# Version 1.1
# Path Client(Flags) # IPs
/exports/md0 192.168.0.0/16(rw,no_root_squash,async,wdelay,nohide,no_subtree_check,uuid=a114f04d:9e54427e:b051ce17:4dc02e9f,sec=1)
/exports 192.168.0.0/16(rw,no_root_squash,async,wdelay,nohide,no_subtree_check,fsid=0,uuid=a3734f7a:774744b7:b41d4cea:bc2a4f0f,sec=1)
/exports 127.0.0.0/8(rw,no_root_squash,async,wdelay,nohide,no_subtree_check,fsid=0,uuid=a3734f7a:774744b7:b41d4cea:bc2a4f0f,sec=1)
/exports/md0 127.0.0.0/8(rw,no_root_squash,async,wdelay,nohide,no_subtree_check,uuid=a114f04d:9e54427e:b051ce17:4dc02e9f,sec=1)

-- System Information:
Debian Release: 9.2
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.9.0-4-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), LANGUAGE=en_GB:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages nfs-kernel-server depends on:
ii init-system-helpers 1.48
ii keyutils 1.5.9-9
ii libblkid1 2.29.2-1
ii libc6 2.24-11+deb9u1
ii libcap2 1:2.25-1
ii libsqlite3-0 3.16.2-5
ii libtirpc1 0.2.5-1.2
ii libwrap0 7.6.q-26
ii lsb-base 9.20161125
ii netbase 5.4
ii nfs-common 1:1.3.4-2.1
ii ucf 3.0036

nfs-kernel-server recommends no packages.

nfs-kernel-server suggests no packages.

-- no debconf information

Daniel Smolik

Mar 30, 2018, 7:20:02 PM
Dear Maintainer,
I have the same experience with NFSv4. My customer runs a PHP7/Laravel application on a server which I maintain. I put the data directory with pictures and caches on NFSv4.
The log fills with "nfs4_reclaim_open_state: Lock reclaim failed!" and after a while the system is unusable: I can log in with ssh, but Apache doesn't serve any pages, and strace shows nothing.
Only a reboot helps. I tried upgrading to 4.14.31 and it is still the same. The only thing that helps is switching to NFSv4 local_lock=all.

Regards
Dan Smolik



--
Mydatex s r.o.
http://www.mydatex.cz
email: smo...@mydatex.cz
mob: 604200362

Daniel Smolik

Mar 31, 2018, 3:40:02 PM
Correction: NFSv3 local_lock=all

Debian Bug Tracking System

Apr 10, 2018, 5:40:02 PM
Processing control commands:

> retitle -1 nfs4_reclaim_open_state: Lock reclaim failed
Bug #884284 [nfs-kernel-server] nfs-kernel-server: NFSv4 broken
Changed Bug title to 'nfs4_reclaim_open_state: Lock reclaim failed' from 'nfs-kernel-server: NFSv4 broken'.

--
884284: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=884284
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems

Sergio Gelato

Apr 10, 2018, 5:40:02 PM
Control: retitle -1 nfs4_reclaim_open_state: Lock reclaim failed

For what it's worth, I've seen the same symptoms in jessie (kernel 3.16.36
at the time) and Ubuntu trusty (3.13.0-93). In my experience, NFSv4 in
stretch is no worse than in jessie.

Rate-limiting those "Lock reclaim failed!" messages would be useful. I've
had to add a filter for them in rsyslog to prevent a DoS on my central
logging infrastructure. I don't see them often, but when a client gets stuck
it can emit this message *many* times.
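
As an illustration of the kind of filtering described above, here is a minimal Python sketch that collapses long runs of the message in a stream of log lines. It is not the rsyslog filter mentioned in this message; the function name `rate_limit` and the `burst` parameter are made up for the example, and it limits by consecutive run length rather than by time.

```python
import re

# The kernel message discussed in this thread.
RECLAIM_RE = re.compile(r"nfs4_reclaim_open_state: Lock reclaim failed!")


def rate_limit(lines, burst=3):
    """Yield lines, keeping at most `burst` consecutive reclaim-failure lines.

    When a run of matching lines ends, one summary line reports how many
    were dropped. Non-matching lines always pass through unchanged.
    """
    run = 0  # length of the current run of matching lines
    for line in lines:
        if RECLAIM_RE.search(line):
            run += 1
            if run <= burst:
                yield line
            continue
        if run > burst:
            yield f"[filter] suppressed {run - burst} 'Lock reclaim failed!' messages"
        run = 0
        yield line
    # Flush the summary if the input ended in the middle of a run.
    if run > burst:
        yield f"[filter] suppressed {run - burst} 'Lock reclaim failed!' messages"
```

Something like `for line in rate_limit(open("/var/log/kern.log")): print(line, end="")` would apply it to a log file; the path is only an example.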

There is definitely more than one trigger for these. I'm under the impression
that network partitioning events generate short bursts of such messages, but
this is usually benign and does not require a reboot for recovery. Not sure
what causes the more severe incidents (I haven't had one in a while, and
my NFS environment is intentionally v4-only).

My troubleshooting checklist for the next incident includes
echo 1 > /sys/kernel/debug/tracing/events/nfs4/nfs4_lock_reclaim/enable
but I haven't had a chance to put this into practice yet.
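
A hedged sketch of the same checklist step in Python, for anyone who wants to script it. tracefs normally nests events as events/<subsystem>/<event>/enable, so rather than hard-coding the directory, the sketch searches for the event by name. The helper names `find_event_enable` and `set_event` are invented for this example, and writing to the real tracing directory requires root.

```python
from pathlib import Path


def find_event_enable(tracing_root, event):
    """Return the `enable` files for `event` under a tracefs-style tree."""
    root = Path(tracing_root) / "events"
    # Accept both events/<event>/enable and events/<subsystem>/<event>/enable.
    hits = list(root.glob(f"{event}/enable")) + list(root.glob(f"*/{event}/enable"))
    return sorted(hits)


def set_event(tracing_root, event, on=True):
    """Write 1 or 0 to every matching enable file; return the paths written."""
    paths = find_event_enable(tracing_root, event)
    for path in paths:
        path.write_text("1" if on else "0")
    return paths
```

On a live system this would be called as `set_event("/sys/kernel/debug/tracing", "nfs4_lock_reclaim")`; an empty return value means no such event was found under that root.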

Stefan K

Jan 8, 2019, 9:10:02 AM
Package: nfs-common
Version: 1:1.3.4-2.1
Followup-For: Bug #884284

Dear Maintainer,

The bug still exists. Any news on it?
Today I got this error for the first time after 4 months of uptime with no problems; after a restart I got it again 5 hours later :(
I'm running a MySQL database and the Nextcloud shares on it
(output from the NFS server, which uses a ZFS filesystem):
nc_storage 79T 5.4M 79T 1% /nc_storage
nc_storage/clouddata 84T 4.7T 79T 6% /nc_storage/clouddata
nc_storage/localdata 80T 905G 79T 2% /nc_storage/localdata
nc_storage/mariadb 79T 8.9G 79T 1% /nc_storage/mariadb

Here is a syslog snippet:
Jan 8 12:45:01 web-cloud-01 CRON[15298]: (www-data) CMD (php -f /var/www/nextcloud/cron.php)
Jan 8 13:00:01 web-cloud-01 CRON[16849]: (www-data) CMD (php -f /var/www/nextcloud/cron.php)
Jan 8 13:09:01 web-cloud-01 CRON[17742]: (root) CMD ( [ -x /usr/lib/php/sessionclean ] && if [ ! -d /run/systemd/system ]; then /usr/lib/php/sessionclean; fi)
Jan 8 13:09:44 web-cloud-01 systemd[1]: Starting Clean php session files...
Jan 8 13:09:44 web-cloud-01 systemd[1]: Started Clean php session files.
Jan 8 13:15:01 web-cloud-01 CRON[18436]: (www-data) CMD (php -f /var/www/nextcloud/cron.php)
Jan 8 13:17:01 web-cloud-01 CRON[18645]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jan 8 13:30:01 web-cloud-01 CRON[20060]: (www-data) CMD (php -f /var/www/nextcloud/cron.php)
Jan 8 13:39:01 web-cloud-01 CRON[20997]: (root) CMD ( [ -x /usr/lib/php/sessionclean ] && if [ ! -d /run/systemd/system ]; then /usr/lib/php/sessionclean; fi)
Jan 8 13:39:57 web-cloud-01 systemd[1]: Starting Clean php session files...
Jan 8 13:39:57 web-cloud-01 systemd[1]: Started Clean php session files.
Jan 8 13:45:01 web-cloud-01 CRON[21694]: (www-data) CMD (php -f /var/www/nextcloud/cron.php)
Jan 8 13:49:01 web-cloud-01 kernel: [12254.157888] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Jan 8 13:49:01 web-cloud-01 kernel: [12254.158279] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Jan 8 13:49:01 web-cloud-01 kernel: [12254.158671] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Jan 8 13:49:01 web-cloud-01 kernel: [12254.159068] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Jan 8 13:49:01 web-cloud-01 kernel: [12254.159441] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Jan 8 13:49:01 web-cloud-01 kernel: [12254.159816] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Jan 8 13:49:01 web-cloud-01 kernel: [12254.160197] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Jan 8 13:49:01 web-cloud-01 kernel: [12254.160570] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Jan 8 13:49:01 web-cloud-01 kernel: [12254.161089] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Jan 8 13:49:01 web-cloud-01 kernel: [12254.161504] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Jan 8 13:49:02 web-cloud-01 systemd[1]: mariadb.service: Main process exited, code=killed, status=6/ABRT
Jan 8 13:49:02 web-cloud-01 systemd[1]: mariadb.service: Unit entered failed state.
Jan 8 13:49:02 web-cloud-01 systemd[1]: mariadb.service: Failed with result 'signal'.
Jan 8 13:49:07 web-cloud-01 systemd[1]: mariadb.service: Service hold-off time over, scheduling restart.
Jan 8 13:49:07 web-cloud-01 systemd[1]: Stopped MariaDB database server.
Jan 8 13:49:07 web-cloud-01 systemd[1]: Starting MariaDB database server...
Jan 8 13:49:08 web-cloud-01 mysqld[22291]: 2019-01-08 13:49:08 139707211149888 [Note] /usr/sbin/mysqld (mysqld 10.1.26-MariaDB-0+deb9u1) starting as process 22291 ...
Jan 8 13:49:08 web-cloud-01 kernel: [12261.350159] nfs4_reclaim_open_state: 920 callbacks suppressed
Jan 8 13:49:08 web-cloud-01 kernel: [12261.350160] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Jan 8 13:49:08 web-cloud-01 kernel: [12261.363555] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Jan 8 13:49:08 web-cloud-01 kernel: [12261.370344] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Jan 8 13:49:08 web-cloud-01 kernel: [12261.396344] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Jan 8 13:49:09 web-cloud-01 kernel: [12262.010512] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Jan 8 13:49:09 web-cloud-01 kernel: [12262.016810] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Jan 8 13:49:09 web-cloud-01 kernel: [12262.025524] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Jan 8 13:49:09 web-cloud-01 kernel: [12262.026284] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Jan 8 13:49:09 web-cloud-01 kernel: [12262.036269] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Jan 8 13:49:09 web-cloud-01 kernel: [12262.037036] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Jan 8 13:49:09 web-cloud-01 systemd[1]: mariadb.service: Main process exited, code=killed, status=6/ABRT
Jan 8 13:49:09 web-cloud-01 systemd[1]: Failed to start MariaDB database server.
Jan 8 13:49:09 web-cloud-01 systemd[1]: mariadb.service: Unit entered failed state.
Jan 8 13:49:09 web-cloud-01 systemd[1]: mariadb.service: Failed with result 'signal'.
Jan 8 13:49:14 web-cloud-01 systemd[1]: mariadb.service: Service hold-off time over, scheduling restart.
Jan 8 13:49:14 web-cloud-01 systemd[1]: Stopped MariaDB database server.
Jan 8 13:49:14 web-cloud-01 systemd[1]: Starting MariaDB database server...
Jan 8 13:49:15 web-cloud-01 mysqld[22461]: 2019-01-08 13:49:15 140417113449024 [Note] /usr/sbin/mysqld (mysqld 10.1.26-MariaDB-0+deb9u1) starting as process 22461 ...
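
The "callbacks suppressed" line in the excerpt above comes from the kernel's own printk rate limiting, so the lines that reach syslog undercount the failures. A small sketch for tallying both from a log, assuming the message formats shown above (the function name `count_reclaim_failures` is invented for this example):

```python
import re

# Formats taken from the syslog excerpt in this message.
VISIBLE_RE = re.compile(r"NFS: nfs4_reclaim_open_state: Lock reclaim failed!")
SUPPRESSED_RE = re.compile(r"nfs4_reclaim_open_state: (\d+) callbacks suppressed")


def count_reclaim_failures(lines):
    """Return (logged, suppressed) counts of lock-reclaim failure messages."""
    logged = 0
    suppressed = 0
    for line in lines:
        match = SUPPRESSED_RE.search(line)
        if match:
            # The kernel reports how many messages it dropped since the last one.
            suppressed += int(match.group(1))
        elif VISIBLE_RE.search(line):
            logged += 1
    return logged, suppressed
```

The sum of the two numbers is a rough estimate of how many times the client actually hit the failure.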



Any idea how to fix it?


best regards
Stefan

-- Package-specific info:
-- rpcinfo --
program vers proto port service
100000 4 tcp 111 portmapper
100000 3 tcp 111 portmapper
100000 2 tcp 111 portmapper
100000 4 udp 111 portmapper
100000 3 udp 111 portmapper
100000 2 udp 111 portmapper
-- /etc/default/nfs-common --
NEED_STATD=
STATDOPTS=
NEED_IDMAPD=
NEED_GSSD=
-- /etc/idmapd.conf --
[General]
Verbosity = 0
Pipefs-Directory = /run/rpc_pipefs
[Mapping]
Nobody-User = nobody
Nobody-Group = nogroup
-- /etc/fstab --
172.16.101.70:/nc_storage /nfs_nc_storage nfs4 rw,_netdev 0 0
-- /proc/mounts --
172.16.101.70:/nc_storage /nfs_nc_storage nfs4 rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=194.94.224.89,local_lock=none,addr=172.16.101.70 0 0
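
Since an earlier message in this thread mentions local_lock as a workaround, it can help to check which NFS version and local_lock setting a client actually negotiated. A minimal sketch that pulls both out of a /proc/mounts line like the one above (the function names are invented for this example):

```python
def parse_mount_line(line):
    """Split one /proc/mounts line into its fields and an options dict."""
    device, mountpoint, fstype, opts, _dump, _passno = line.split()
    options = {}
    for item in opts.split(","):
        key, sep, value = item.partition("=")
        # Flags without a value (rw, hard, relatime, ...) are stored as True.
        options[key] = value if sep else True
    return {"device": device, "mountpoint": mountpoint,
            "fstype": fstype, "options": options}


def nfs_lock_summary(line):
    """Return (mountpoint, vers, local_lock) for one mount line."""
    mount = parse_mount_line(line)
    opts = mount["options"]
    return mount["mountpoint"], opts.get("vers"), opts.get("local_lock")
```

To scan a client, run `nfs_lock_summary` over every line of /proc/mounts whose filesystem type starts with `nfs`.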

-- System Information:
Debian Release: 9.5
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.9.0-8-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages nfs-common depends on:
ii adduser 3.115
ii init-system-helpers 1.48
ii keyutils 1.5.9-9
ii libc6 2.24-11+deb9u3
ii libcap2 1:2.25-1
ii libcomerr2 1.43.4-2
ii libdevmapper1.02.1 2:1.02.137-2
ii libevent-2.0-5 2.0.21-stable-3
ii libgssapi-krb5-2 1.15-1+deb9u1
ii libk5crypto3 1.15-1+deb9u1
ii libkeyutils1 1.5.9-9
ii libkrb5-3 1.15-1+deb9u1
ii libmount1 2.29.2-1+deb9u1
ii libnfsidmap2 0.25-5.1
ii libtirpc1 0.2.5-1.2
ii libwrap0 7.6.q-26
ii lsb-base 9.20161125
ii rpcbind 0.2.3-0.6
ii ucf 3.0036

Versions of packages nfs-common recommends:
ii python 2.7.13-2

Versions of packages nfs-common suggests:
pn open-iscsi <none>
pn watchdog <none>

-- no debconf information

Dietrich Clauss

Nov 9, 2019, 5:00:02 AM
Duplicate of #880549?

Dietrich Clauss

Nov 9, 2019, 5:30:02 AM
Package: nfs-kernel-server
Followup-For: Bug #884284

Duplicate of #880549?

-- System Information:
Debian Release: 10.1
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.19.0-6-amd64 (SMP w/4 CPU cores)
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8), LANGUAGE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages nfs-kernel-server depends on:
ii keyutils 1.6-6
ii libblkid1 2.33.1-0.1
ii libc6 2.28-10
ii libcap2 1:2.25-2
ii libsqlite3-0 3.27.2-3
ii libtirpc3 1.1.4-0.4
ii libwrap0 7.6.q-28
ii lsb-base 10.2019051400
ii netbase 5.6
ii nfs-common 1:1.3.4-2.5
ii ucf 3.0038+nmu1