Anyway, I noticed recently that my NFS server at home seems to
have trouble with locking. I have 2 clients which use it to host
home directories (1 Debian woody, 1 SuSE 8). I first noticed it about
a week ago when trying to load gnp (GNOME Notepad, my favorite X editor):
it didn't load, it just hung, and I was getting this in my local (client)
kernel log:
Aug 25 13:56:37 aphro kernel: lockd: task 173568 can't get a request slot
Aug 25 13:57:59 aphro kernel: lockd: task 173597 can't get a request slot
Aug 25 13:58:49 aphro kernel: lockd: task 173597 can't get a request slot
Aug 25 13:59:39 aphro kernel: lockd: task 173597 can't get a request slot
Aug 25 14:00:29 aphro kernel: lockd: task 173597 can't get a request slot
Aug 25 14:01:19 aphro kernel: lockd: task 173597 can't get a request slot
I was getting this in my server kernel log:
lockd: cannot monitor 10.10.10.10
statd: server localhost not responding, timed out
nsm_mon_unmon: rpc failed, status=-5
lockd: cannot monitor 10.10.10.10
statd: server localhost not responding, timed out
nsm_mon_unmon: rpc failed, status=-5
One website said this is the result of an overloaded server, but I
don't think it's overloaded with only 2 clients (usually only 1 of them
is in use at a time, since these systems share the same KVM switch). I
can usually work around it short term by restarting the NFS services.
Not many apps seem to be affected by it: gnome-terminal, AfterStep,
Mozilla, Opera and StarOffice 6 all work fine, so I can only
assume that they either don't use locking or do it in another
manner.
I have the NFS server (Debian 3.0 / kernel 2.2.19 / kernel NFS server) set
to start 19 nfsd server threads; it also loads the lockd service (kernel level).
Querying the server from the client:
[root@aphro:~]# rpcinfo -p gateway
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  19662  status
    100024    1   tcp   7617  status
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100021    1   udp  19663  nlockmgr
    100021    3   udp  19663  nlockmgr
    100021    4   udp  19663  nlockmgr
    100005    1   udp  19664  mountd
    100005    1   tcp   7618  mountd
    100005    2   udp  19664  mountd
    100005    2   tcp   7618  mountd
    100005    3   udp  19664  mountd
    100005    3   tcp   7618  mountd
Querying the client from the client:
[root@aphro:~]# rpcinfo -p
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100021    1   udp   1024  nlockmgr
    100021    3   udp   1024  nlockmgr
    100021    4   udp   1024  nlockmgr
    100024    1   udp   1025  status
    100024    1   tcp   1025  status
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100005    1   udp   1026  mountd
    100005    1   tcp   1026  mountd
    100005    2   udp   1026  mountd
    100005    2   tcp   1026  mountd
    100005    3   udp   1026  mountd
    100005    3   tcp   1026  mountd
Running nfsstat on the server shows the following:
Server rpc stats:
calls      badcalls   badauth    badclnt    xdrcall
11900099   1420       0          1420       0

Server nfs v3:
null         getattr      setattr      lookup       access       readlink
15        0% 7292735  61% 171766    1% 625793    5% 1426891  11% 389       0%
read         write        create       mkdir        symlink      mknod
830197    6% 1053611   8% 150175    1% 2889      0% 979       0% 3         0%
remove       rmdir        rename       link         readdir      readdirplus
132602    1% 3179      0% 1195      0% 333       0% 18594     0% 2901      0%
fsstat       fsinfo       pathconf     commit
395       0% 305       0% 0         0% 185152    1%
All other stats reported by nfsstat are 0. (The clients mount the
filesystem with the option nfsvers=3.) My next step is to switch to
nfsvers=2 and see if it helps at all.
All 3 machines are on the same VLAN of my Summit 48-port switch; with
a 17 Gbit/s backplane, I am certain there are no bandwidth issues. One website
recommended running a flood ping (ping -f) between the server and client to see
whether there is packet loss, so I did it anyway just to see the results:
server to client:
--- aphro.aphroland.org ping statistics ---
60496 packets transmitted, 60494 packets received, 0% packet loss
round-trip min/avg/max = 0.1/0.1/3.4 ms
client to server:
--- gateway.aphroland.org ping statistics ---
78989 packets transmitted, 78983 packets received, 0% packet loss
round-trip min/avg/max = 0.1/0.2/44.0 ms
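The "0% packet loss" figures above are rounded: a few packets out of tens of thousands did go missing in each direction. A minimal sketch that recomputes the exact loss from the ping summary line, assuming the summary keeps the "N packets transmitted, M ... received" shape shown above (`ping_loss` is a hypothetical helper name, not part of ping):

```shell
# Hypothetical helper: read ping's summary on stdin and print how many
# packets were lost, with the loss percentage to four decimal places.
ping_loss() {
    awk '/packets transmitted/ && $1 > 0 {
        sent = $1          # "60496 packets transmitted, ..."
        recv = $4          # "... 60494 packets received, ..."
        lost = sent - recv
        printf "%d lost of %d (%.4f%%)\n", lost, sent, 100 * lost / sent
    }'
}
```

Feeding it the server-to-client summary line prints `2 lost of 60496 (0.0033%)`, and the client-to-server line gives `6 lost of 78989 (0.0076%)`, so the loss is real but well under a hundredth of a percent.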
server is:
P3-800
1GB ram
dual Western Digital 100GB Special Edition (8MB cache each) drives in RAID1
2.2.19 kernel
client1 is:
Athlon 1300
768MB ram
9.1GB ultrawide SCSI disk
2.2.19 kernel
client2 is:
P3-500
512MB ram
12GB IBM IDE disk
2.4.18 kernel
One thing that is curious: I ran lsof to see the open ports used
by rpc.statd, and it is using 2 at the moment, one of which is 7617/udp. I
ran a UDP nmap scan against localhost and nmap reported that port as
closed. I ran an nmap scan against the same port from my client and it
reported the port as open. My firewall rules only affect the eth0 interface,
so I am not sure why statd stops responding to localhost connections,
which seems to be the heart of the problem.
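Since the open question is whether statd answers on localhost at all, asking statd itself may be more direct than nmap's view of a single port. A minimal sketch, assuming `rpcinfo -u` and `rpcinfo -t` (which call procedure 0 of an RPC program over UDP or TCP) exit non-zero when the call fails or times out; `probe_statd` and the `RPCINFO` override are hypothetical names added so the logic can be tried without a live portmapper:

```shell
# Sketch: send a null RPC call to rpc.statd (program 100024, version 1),
# once over UDP and once over TCP, and report which transport answers.
# RPCINFO is a hypothetical override hook; by default the real rpcinfo runs.
probe_statd() {
    host=${1:-localhost}
    rpcinfo_cmd=${RPCINFO:-rpcinfo}
    for proto in udp tcp; do
        flag=-u
        [ "$proto" = tcp ] && flag=-t
        # expected to exit non-zero if statd does not answer
        if $rpcinfo_cmd "$flag" "$host" 100024 1 >/dev/null 2>&1; then
            echo "$proto ok"
        else
            echo "$proto FAIL"
        fi
    done
}
```

Running `probe_statd localhost` on the server while the problem is happening, and `probe_statd gateway` from a client, would show whether statd has stopped answering everyone or only local callers.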
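My RPC firewall rules:
# reject outside (eth0) access to every port currently registered with the portmapper
PORTS="`rpcinfo -p | awk '{print $4}' | grep '[0-9]'`"
for rpcport in $PORTS
do
# both TCP and UDP are rejected for each port, whichever protocol registered it
/sbin/ipchains -A input -s 0/0 -d 0/0 $rpcport -j REJECT -p tcp -i eth0
/sbin/ipchains -A input -s 0/0 -d 0/0 $rpcport -j REJECT -p udp -i eth0
done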
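The loop above adds a TCP and a UDP rule for every line of `rpcinfo -p`, so ports listed several times (2049, the mountd ports) get duplicate rules, and a port registered only over UDP still gets a TCP rule. A sketch of a deduplicated variant that only prints the commands for review; `rpc_reject_rules` is a hypothetical helper, and the ipchains syntax is copied from the rules above:

```shell
# Sketch: read "rpcinfo -p" output on stdin and print one ipchains REJECT
# rule per distinct (protocol, port) pair. Nothing is applied here.
rpc_reject_rules() {
    # keep lines whose 3rd field is a protocol and 4th field a port number
    awk '$4 ~ /^[0-9]+$/ && ($3 == "tcp" || $3 == "udp") { print $3, $4 }' |
    LC_ALL=C sort -u |
    while read -r proto port; do
        echo "/sbin/ipchains -A input -s 0/0 -d 0/0 $port -p $proto -i eth0 -j REJECT"
    done
}
```

Usage would be `rpcinfo -p | rpc_reject_rules | sh` once the output looks right. Assuming the rules are built once at boot, note that the statd and mountd ports are not fixed (they differ between the server and client listings above), so restarting nfs-common may move statd to a port the existing rules do not cover.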
The 2nd port that rpc.statd is listening on (807/udp) is reported as
open by a UDP nmap scan against localhost on the server:
[root@portal:/etc/init.d]# nmap -sU -vv -p 807,7617 localhost
Starting nmap V. 2.54BETA31 ( www.insecure.org/nmap/ )
Host debian (127.0.0.1) appears to be up ... good.
Initiating UDP Scan against debian (127.0.0.1)
The UDP Scan took 2 seconds to scan 2 ports.
Adding open port 807/udp
Interesting ports on debian (127.0.0.1):
(The 1 port scanned but not shown below is in state: closed)
Port State Service
807/udp open unknown
Nmap run completed -- 1 IP address (1 host up) scanned in 2 seconds
thanks for any ideas!
nate
--
To UNSUBSCRIBE, email to debian-us...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listm...@lists.debian.org
A few weeks ago I installed Debian Woody on a SunBlade 100 (Sun/SPARC)
just to see what it was like. A few days before I had to convert it back to
Solaris for production use, I got it mounting home
directories via NFS (from a Solaris server). Everything worked fine for
a few days; then, when the other 35 machines in the area were imaged with
Solaris and put on the network (but before people started logging in and
using them), my Debian box started having issues like the ones you describe.
IIRC, it locked up the machine so I couldn't do anything but hard reset
the box. Since I was about to convert the machine back to Solaris in a
couple of days, I didn't try tracking it down, but thought I'd mention
it here just in case it provides a piece to the puzzle.
Kent
I have had problems with NFS file locking as well. Some parts
of GNOME use file locking, so with NFS-mounted home
directories the users could not really run GNOME properly.
After investigating, it looks like the user-space NFS server does
not do file locking but the kernel NFS server does. In Debian
it is easy enough to switch: just install the package
nfs-kernel-server, and it automatically conflicts with and removes
the user-space NFS server. I guess the kernel you are running would
need NFS server support built in as well.
Now file locking is working over here and GNOME is looking a lot
better. To actually test that file locking is really working, here
are two ways to do it that don't need GNOME:
1) Try getting mutt to read mail from a file in the NFS-mounted directory.
If file locking is working, it will work. If file locking is not working,
mutt will report an error and state that the file is read-only.
2) There is a system call to lock a file or parts of a file. In the
Stevens book "Advanced Programming in the UNIX Environment" there
is a whole section on file locking, including a C program to lock files.
If locking is working, this program runs without error. If locking does
not work, it reports some weird message. I can send you the code if you
want. The program is about 70 lines long, so I am not sure if I can
post it to the list. It exists on the net, also: I found it on a GNOME
mailing list where Havoc Pennington posted it in response to a GNOME
file locking problem. A Google search on "NFS file locking gnome" might
be how I stumbled on it.
3) A GNOME test is to start nautilus on the command line. It will spit
out a whole slew of error messages and then die if file locking isn't
working on the home directory of the user starting nautilus.
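Since the symptom in this thread is a hang rather than an error, a lock test that gives up after a timeout can be handy. A minimal sketch, assuming util-linux `flock(1)` and GNU coreutils `timeout(1)` are installed (neither comes from the suggestions above; `nfs_lock_test` and the scratch file name are hypothetical). One caveat: on Linux kernels before 2.6.12, flock() locks on NFS are purely local and never reach lockd, so on 2.2/2.4 clients like the ones here this test can pass even when lockd is broken; on those kernels an fcntl()-based test program is the better check. Also, if the mount is hard and not interruptible, timeout may be unable to kill a flock stuck in the kernel.

```shell
# Sketch: take and release an exclusive lock on a scratch file inside DIR,
# giving up after SECS seconds so a wedged lock shows up as a timeout
# rather than a hung terminal. Assumes flock(1) and timeout(1) exist.
nfs_lock_test() {
    dir=${1:-.}
    secs=${2:-10}
    f="$dir/locktest.$$"          # hypothetical scratch file name
    : > "$f" || { echo "cannot create $f"; return 2; }
    # flock -x FILE CMD holds an exclusive lock on FILE while CMD runs
    timeout "$secs" flock -x "$f" true
    rc=$?
    rm -f "$f"
    case $rc in
        0)   echo "lock ok" ;;
        124) echo "lock timed out after ${secs}s" ;;   # timeout's exit code
        *)   echo "lock failed (exit $rc)" ;;
    esac
    return "$rc"
}
```

For example, `nfs_lock_test "$HOME" 10` run from an NFS-mounted home directory prints one of the three messages instead of hanging the shell indefinitely.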
HTH
> I have had problems with NFS file locking as well. Some parts
> of gnome like to use file locking, so with nfs mounted home
> directories the users could not really run gnome properly.
That would explain some other problems I had. I was testing
my mom's GNOME profile (same OS rev, SuSE 8) and it flat out wouldn't
load; now it loads fine after I restarted nfs-common.
> the user nfs server. I guess the kernel you are running would
> need nfs built-in as well.
In my particular case I am already using nfs-kernel-server. The problem
seems to be in the rpc.statd service rather than the NFS service:
once I ran /etc/init.d/nfs-common restart, the problem immediately
corrected itself. Now to figure out why statd stops working (even though
the process doesn't exit).
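Until the root cause turns up, the restart workaround can be automated. A hypothetical cron-style watchdog sketch, assuming `rpcinfo -u` exits non-zero when statd does not answer; `statd_watchdog` and the `RPCINFO`/`RESTART` overrides are made-up names for illustration. It only papers over the problem and says nothing about why statd stops responding.

```shell
# Hypothetical watchdog: if rpc.statd on this host stops answering a null
# RPC call over UDP, restart nfs-common. RPCINFO and RESTART are made-up
# override hooks so the logic can be exercised without touching services.
statd_watchdog() {
    rpcinfo_cmd=${RPCINFO:-rpcinfo}
    restart_cmd=${RESTART:-/etc/init.d/nfs-common restart}
    if $rpcinfo_cmd -u localhost 100024 1 >/dev/null 2>&1; then
        echo "statd ok"
        return 0
    fi
    echo "statd not answering on localhost, restarting nfs-common"
    # unquoted on purpose so the command string splits into words
    $restart_cmd
}
```

Run from root's crontab every few minutes, cron would mail any output, which also leaves a record of when statd stopped answering.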
> 3) A gnome test is to start nautilus on the command line. It will spit
> out a whole slew of error messages and then die if file locking isn't
> working on the home directory of the user initiating nautilus.
I tried this and it didn't behave that way: when locking is broken, nautilus just
hangs forever for me (I only have nautilus on SuSE 8). Once I restarted
nfs-common, it started immediately.
This seems to be a common problem with very little information available. I
tend to think it is probably software bugs in the Linux NFS server
(which is nothing new; historically Linux has always had weak NFS,
though it's gotten better in the past 2-3 years).
nate
Hugo Graumann <grau...@ucalgary.ca> [2002-09-04 17:15:14 -0600]:
> 2) There is a system call to lock a file or parts of a file. In the
> Stevens book "Advanced Programming in the UNIX Environment" there
> is a whole section on file locking including a C program to lock files.
> If locking is working this program runs without error. If locking does
> not work, It reports some weird message. I can send you the code if you
> want. The program is about 70 lines long, so I am not sure if I can
> post it to the list. It exists on the net, also. I found it in a gnome
> mailing list where Havoc Pennington posted it in response to a gnome
> file locking problem. A google search on "NFS file locking gnome" might
> be how I stumbled on it.
Additionally, Perl, Ruby, etc. scripts can call the C library file-locking
routines. Here is a Perl script which can test the rpc.lockd
functionality. I wrote this script for that reason, though I was
having the trouble on other OSes rather than on Linux. It is a simple,
brute-force check.
Bob
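#!/usr/bin/perl
use strict;
use warnings;
use Fcntl ':flock'; # import LOCK_* constants
# Note: on Linux, Perl's flock() normally uses flock(2), which on kernels
# before 2.6.12 only locks locally and does not go through lockd over NFS.
print("Just before opening file.\n");
open(my $lockfh, '>', 'testlockfile') or
  die "Error: Could not write to lock file: testlockfile: $!\n";
print("Just before locking file.\n");
flock($lockfh, LOCK_EX) or die "Error: Could not lock testlockfile: $!\n";
print("Just before unlocking file.\n");
flock($lockfh, LOCK_UN) or die "Error: Could not unlock testlockfile: $!\n";
print("All done. File locked and unlocked.\n");
close($lockfh);
unlink("testlockfile");
exit(0);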