
Why is NFSv4 so slow?


Rick C. Petty

Jun 27, 2010, 6:44:07 PM
to freebsd...@freebsd.org
First off, many thanks to Rick Macklem for making NFSv4 possible in
FreeBSD!

I recently updated my NFS server and clients to v4, but have since noticed
significant performance penalties. For instance, when I try "ls a b c" (if
a, b, and c are empty directories) on the client, it takes up to 1.87
seconds (wall time) whereas before it always finished in under 0.1 seconds.
If I repeat the test, it takes the same amount of time in v4 (in v3, wall
time was always under 0.01 seconds for subsequent requests, as if the
directory listing was cached).
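
For concreteness, the test amounts to this (recreated against a local temp
directory here; on the client the three directories live on the NFS mount):

```shell
# Recreate the test case: three empty directories, then time listing them.
dir=$(mktemp -d)
mkdir "$dir/a" "$dir/b" "$dir/c"
cd "$dir"
time ls a b c
```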

If I try to play an h264 video file on the filesystem using mplayer, it
often jitters and skipping around in time introduces up to a second or so
pause. With NFSv3 it behaved more like the file was on local disk (no
noticeable pauses or jitters).

Has anyone seen this behavior upon switching to v4 or does anyone have any
suggestions for tuning?

Both client and server are running the same GENERIC kernel, 8.1-PRERELEASE
as of 2010-May-29. They are connected via gigabit. Both v3 and v4 tests
were performed on the exact same hardware and I/O, CPU, network loads.
All I did was toggle nfsv4_server_enable (and nfsuserd/nfscbd of course).

It seems like a server-side issue, because if I try an nfs3 client mount
to the nfs4 server and run the same tests, I see only a slight improvement
in performance. In both cases, my mount options were
"rdirplus,bg,intr,soft" (and "nfsv4" added in the one case, obviously).

On the server, I have these tunables explicitly set:

kern.ipc.maxsockbuf=524288
vfs.newnfs.issue_delegations=1

On the client, I just have the maxsockbuf setting (this is twice the
default value). I'm open to trying other tunables or patches. TIA,

-- Rick C. Petty
_______________________________________________
freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stabl...@freebsd.org"

Rick Macklem

Jun 27, 2010, 7:48:35 PM
to Rick C. Petty, freebsd...@freebsd.org

On Sun, 27 Jun 2010, Rick C. Petty wrote:

> First off, many thanks to Rick Macklem for making NFSv4 possible in
> FreeBSD!
>
> I recently updated my NFS server and clients to v4, but have since noticed
> significant performance penalties. For instance, when I try "ls a b c" (if
> a, b, and c are empty directories) on the client, it takes up to 1.87
> seconds (wall time) whereas before it always finished in under 0.1 seconds.
> If I repeat the test, it takes the same amount of time in v4 (in v3, wall
> time was always under 0.01 seconds for subsequent requests, as if the
> directory listing was cached).
>

Weird, I don't see that here. The only thing I can think of is that the
experimental client/server will try to do I/O at the size of MAXBSIZE
by default, which might be causing a burst of traffic your net interface
can't keep up with. (This can be turned down to 32K via the
rsize=32768,wsize=32768 mount options. I found this necessary to avoid
abysmal performance on some Macs for the Mac OS X port.)

The other thing that can really slow it down is if the uid<->login-name
(and/or gid<->group-name) mapping is messed up, but this would normally only
show up for things like "ls -l". (Beware having multiple password database
entries for the same uid, such as "root" and "toor".)
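
One quick way to check for that (a sketch; the helper takes the passwd
file as a parameter so it can be tested against a copy):

```shell
# Print any numeric uid that appears more than once in a passwd(5)-style
# file -- e.g. the stock "root"/"toor" pair both mapping to uid 0.
dup_uids() {
    awk -F: 'NF >= 3 { print $3 }' "$1" | sort -n | uniq -d
}
dup_uids /etc/passwd
```

An empty result means no uid is shared by two login names.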

> If I try to play an h264 video file on the filesystem using mplayer, it
> often jitters and skipping around in time introduces up to a second or so
> pause. With NFSv3 it behaved more like the file was on local disk (no
> noticeable pauses or jitters).
>
> Has anyone seen this behavior upon switching to v4 or does anyone have any
> suggestions for tuning?
>
> Both client and server are running the same GENERIC kernel, 8.1-PRERELEASE
> as of 2010-May-29. They are connected via gigabit. Both v3 and v4 tests
> were performed on the exact same hardware and I/O, CPU, network loads.
> All I did was toggle nfsv4_server_enable (and nfsuserd/nfscbd of course).
>
> It seems like a server-side issue, because if I try an nfs3 client mount
> to the nfs4 server and run the same tests, I see only a slight improvement
> in performance. In both cases, my mount options were
> "rdirplus,bg,intr,soft" (and "nfsv4" added in the one case, obviously).
>

I don't recommend the use of "intr" or "soft" for NFSv4 mounts, but they
wouldn't affect performance for trivial tests. You might want to try:
"nfsv4,rsize=32768,wsize=32768" and see how that works.
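
In /etc/fstab terms, that suggestion would look something like this (the
server name and paths are placeholders for this setup):

```
server:/vol  /vol  nfs  rw,nfsv4,rsize=32768,wsize=32768  0  0
```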

When you did the nfs3 mount, did you specify "newnfs" or "nfs" for the
file system type? (I'm wondering if you still saw the problem with the
regular "nfs" client against the server? Others have had good luck using
the server for NFSv3 mounts.)

> On the server, I have these tunables explicitly set:
>
> kern.ipc.maxsockbuf=524288
> vfs.newnfs.issue_delegations=1
>
> On the client, I just have the maxsockbuf setting (this is twice the
> default value). I'm open to trying other tunables or patches. TIA,
>

When I see abysmal NFS perf. it is usually an issue with the underlying
transport. Looking at things like "netstat -i" or "netstat -s" might
give you a hint?
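
As a sketch, the interesting columns can be pulled out mechanically
(header-driven, so it follows the Ierrs/Oerrs names netstat prints; rows
with a missing field, like lo0's link line, are skipped):

```shell
# Print the name of any interface whose "netstat -i" row shows nonzero
# input or output errors.
check_errs() {
    awk 'NR == 1 { hdr = NF; for (i = 1; i <= NF; i++) col[$i] = i; next }
         NF == hdr && ($(col["Ierrs"]) > 0 || $(col["Oerrs"]) > 0) { print $1 }'
}
if command -v netstat >/dev/null 2>&1; then
    netstat -i | check_errs
fi
```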

Having said that, the only difference I can think of between the two
NFS subsystems that might affect the transport layer is the default
I/O size, as noted above.

rick

Rick C. Petty

Jun 27, 2010, 11:16:54 PM
to Rick Macklem, freebsd...@freebsd.org
On Sun, Jun 27, 2010 at 08:04:28PM -0400, Rick Macklem wrote:
>
> Weird, I don't see that here. The only thing I can think of is that the
> experimental client/server will try to do I/O at the size of MAXBSIZE
> by default, which might be causing a burst of traffic your net interface
> can't keep up with. (This can be turned down to 32K via the
> rsize=32768,wsize=32768 mount options. I found this necessary to avoid
> abysmal performance on some Macs for the Mac OS X port.)

Hmm. When I mounted the same filesystem with nfs3 from a different client,
everything started working at almost normal speed (still a little slower
though).

Now on that same host I saw a file get corrupted. On the server, I see
the following:

% hd testfile | tail -4
00677fd0 2a 24 cc 43 03 90 ad e2 9a 4a 01 d9 c4 6a f7 14 |*$.C.....J...j..|
00677fe0 3f ba 01 77 28 4f 0f 58 1a 21 67 c5 73 1e 4f 54 |?..w(O.X.!g.s.OT|
00677ff0 bf 75 59 05 52 54 07 6f db 62 d6 4a 78 e8 3e 2b |.uY.RT.o.b.Jx.>+|
00678000

But on the client I see this:

% hd testfile | tail -4
00011ff0 1e af dc 8e d6 73 67 a2 cd 93 fe cb 7e a4 dd 83 |.....sg.....~...|
00012000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00678000

The only thing I could do to fix it was to copy the file on the server,
delete the original file on the client, and move the copied file back.

Not only is it affecting random file reads, but it has started breaking
src and ports builds in random places. In one situation, portmaster failed
because of a port checksum. It then tried to refetch and failed with the
same checksum problem. I manually deleted the file, tried again and it
built just fine. The ports tree and distfiles are nfs4 mounted.
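
A quick way to catch this kind of zero-fill corruption is comparing hashes
of the two views of a file (a sketch; it uses sha256sum, where FreeBSD 8
would use "sha256 -q", and fetching the server's copy is left implied):

```shell
# Succeed when two copies of a file hash identically; in this situation
# one path would be the server's copy and the other the client's view
# through the NFS mount.
same_file() {
    a=$(sha256sum "$1" | cut -d' ' -f1)
    b=$(sha256sum "$2" | cut -d' ' -f1)
    [ "$a" = "$b" ]
}
```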

> The other thing that can really slow it down is if the uid<->login-name
> (and/or gid<->group-name) is messed up, but this would normally only
> show up for things like "ls -l". (Beware having multiple password database
> entries for the same uid, such as "root" and "toor".)

I use the same UIDs/GIDs on all my boxes, so that can't be it. But thanks
for the idea.

> I don't recommend the use of "intr or soft" for NFSv4 mounts, but they
> wouldn't affect performance for trivial tests. You might want to try:
> "nfsv4,rsize=32768,wsize=32768" and see how that works.

I'm trying that right now (with rdirplus also) on one host. If I start to
see the delays again, I'll compare between hosts.

> When you did the nfs3 mount did you specify "newnfs" or "nfs" for the
> file system type? (I'm wondering if you still saw the problem with the
> regular "nfs" client against the server? Others have had good luck using
> the server for NFSv3 mounts.)

I used "nfs" for FStype. So I should be using "newnfs"? This wasn't very
clear in the man pages. In fact "newnfs" wasn't mentioned in
"man mount_newnfs".

> When I see abysmal NFS perf. it is usually an issue with the underlying
> transport. Looking at things like "netstat -i" or "netstat -s" might
> give you a hint?

I suspected it might be transport-related. I didn't see anything out of
the ordinary from netstat, but then again I don't know what's "ordinary"
with NFS. =)

~~

One other thing I noticed, though I'm not sure if it's a bug or expected
behavior (unrelated to the delays or corruption), is that I have the
following filesystems on the server:

/vol/a
/vol/a/b
/vol/a/c

I export all three volumes and set my NFS V4 root to "/". On the client,
I'll "mount ... server:vol /vol" and the "b" and "c" directories show up
but when I try "ls /vol/a/b /vol/a/c", they show up empty. In dmesg I see:

kernel: nfsv4 client/server protocol prob err=10020

After unmounting /vol, I discovered that my client already had /vol/a/b and
/vol/a/c directories (because pre-NFSv4, I had to mount each filesystem
separately). Once I removed those empty dirs and remounted, the problem
went away. But it did drive me crazy for a few hours.
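
A pre-mount sanity check would have caught this (a sketch; /vol is the
mount point from above):

```shell
# Succeed only if the directory is empty; leftover entries under a mount
# point (like the old /vol/a/b and /vol/a/c here) are what tripped up
# the NFSv4 mount.
mountpoint_is_empty() {
    [ -z "$(ls -A "$1")" ]
}
```

e.g. `mountpoint_is_empty /vol || echo "warning: /vol is not empty"`.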

-- Rick C. Petty

Rick C. Petty

Jun 27, 2010, 11:49:21 PM
to Rick Macklem, freebsd...@freebsd.org
On Sun, Jun 27, 2010 at 08:04:28PM -0400, Rick Macklem wrote:
>
> Weird, I don't see that here. The only thing I can think of is that the
> experimental client/server will try to do I/O at the size of MAXBSIZE
> by default, which might be causing a burst of traffic your net interface
> can't keep up with. (This can be turned down to 32K via the
> rsize=32768,wsize=32768 mount options. I found this necessary to avoid
> abysmal performance on some Macs for the Mac OS X port.)

I just ran into the speed problem again after remounting. This time
I tried to do a "make buildworld" and make got stuck on [newnfsreq] for
ten minutes, with no other filesystem activity on either client or server.

The file system corruption is still pretty bad. I can no longer build any
ports on one machine, because after the port is extracted, the config.sub
files are being filled with all zeros. It took me a while to track this
down while trying to build devel/libtool22:

+ ac_build_alias=amd64-portbld-freebsd8.1
+ test xamd64-portbld-freebsd8.1 = x
+ test xamd64-portbld-freebsd8.1 = x
+ /bin/sh libltdl/config/config.sub amd64-portbld-freebsd8.1
+ ac_cv_build=''
+ printf '%s\n' 'configure:4596: result: '
+ printf '%s\n' ''

+ as_fn_error 'invalid value of canonical build' 4600 5
+ as_status=0
+ test 0 -eq 0
+ as_status=1
+ test 5

And although my work dir is on local disk,

% hd work/libtool-2.2.6b/libltdl/config/config.sub:

00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|................|
*
00007660

Again, my ports tree is mounted as FSType nfs with option nfsv4.
FreeBSD/amd64 8.1-PRERELEASE r208408M GENERIC kernel.

Rick Macklem

Jun 28, 2010, 12:14:42 AM
to Rick C. Petty, freebsd...@freebsd.org

On Sun, 27 Jun 2010, Rick C. Petty wrote:

>
> Hmm. When I mounted the same filesystem with nfs3 from a different client,
> everything started working at almost normal speed (still a little slower
> though).
>
> Now on that same host I saw a file get corrupted. On the server, I see
> the following:
>
> % hd testfile | tail -4
> 00677fd0 2a 24 cc 43 03 90 ad e2 9a 4a 01 d9 c4 6a f7 14 |*$.C.....J...j..|
> 00677fe0 3f ba 01 77 28 4f 0f 58 1a 21 67 c5 73 1e 4f 54 |?..w(O.X.!g.s.OT|
> 00677ff0 bf 75 59 05 52 54 07 6f db 62 d6 4a 78 e8 3e 2b |.uY.RT.o.b.Jx.>+|
> 00678000
>
> But on the client I see this:
>
> % hd testfile | tail -4
> 00011ff0 1e af dc 8e d6 73 67 a2 cd 93 fe cb 7e a4 dd 83 |.....sg.....~...|
> 00012000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
> *
> 00678000
>
> The only thing I could do to fix it was to copy the file on the server,
> delete the original file on the client, and move the copied file back.
>
> Not only is it affecting random file reads, but started breaking src
> and ports builds in random places. In one situation, portmaster failed
> because of a port checksum. It then tried to refetch and failed with the
> same checksum problem. I manually deleted the file, tried again and it
> built just fine. The ports tree and distfiles are nfs4 mounted.
>

I can't explain the corruption, beyond the fact that "soft,intr" can
cause all sorts of grief. If mounts without "soft,intr" still show
corruption problems, try disabling delegations (either kill off the
nfscbd daemons on the client or set vfs.newnfs.issue_delegations=0
on the server). It is disabled by default because it is the "greenest"
part of the subsystem.

>> The other thing that can really slow it down is if the uid<->login-name
>> (and/or gid<->group-name) is messed up, but this would normally only
>> show up for things like "ls -l". (Beware having multiple password database
>> entries for the same uid, such as "root" and "toor".)
>
> I use the same UIDs/GIDs on all my boxes, so that can't be it. But thanks
> for the idea.
>

Make sure you don't have multiple entries for the same uid, such as "root"
and "toor" both for uid 0 in your /etc/passwd. (ie. get rid of one of
them, if you have both)

>
>> When you did the nfs3 mount did you specify "newnfs" or "nfs" for the
>> file system type? (I'm wondering if you still saw the problem with the
>> regular "nfs" client against the server? Others have had good luck using
>> the server for NFSv3 mounts.)
>
> I used "nfs" for FStype. So I should be using "newnfs"? This wasn't very
> clear in the man pages. In fact "newnfs" wasn't mentioned in
> "man mount_newnfs".
>

When you specify "nfs" for an NFSv3 mount, you get the regular client.
When you specify "newnfs" for an NFSv3 mount, you get the experimental
client. When you specify "nfsv4" you always get the experimental NFS
client, and it doesn't matter which FStype you've specified.
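
Put in /etc/fstab terms (server and paths are placeholders), the three
cases look like:

```
server:/vol  /vol  nfs     rw        0  0   # regular NFSv3 client
server:/vol  /vol  newnfs  rw        0  0   # experimental client, NFSv3
server:/vol  /vol  nfs     rw,nfsv4  0  0   # experimental client, NFSv4
```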

>
> One other thing I noticed but I'm not sure if it's a bug or expected
> behavior (unrelated to the delays or corruption), is I have the following
> filesystems on the server:
>
> /vol/a
> /vol/a/b
> /vol/a/c
>
> I export all three volumes and set my NFS V4 root to "/". On the client,
> I'll "mount ... server:vol /vol" and the "b" and "c" directories show up
> but when I try "ls /vol/a/b /vol/a/c", they show up empty. In dmesg I see:
>

If you are using UFS/FFS on the server, this should work and I don't know
why the empty directories under /vol on the client confused it. If your
server is using ZFS, everything from / including /vol needs to be exported.

> kernel: nfsv4 client/server protocol prob err=10020
>

This error indicates that there wasn't a valid FH (file handle) for the server. I
suspect that the mount failed. (It does a loop of Lookups from "/" in
the kernel during the mount and it somehow got confused part way through.)

> After unmounting /vol, I discovered that my client already had /vol/a/b and
> /vol/a/c directories (because pre-NFSv4, I had to mount each filesystem
> separately). Once I removed those empty dirs and remounted, the problem
> went away. But it did drive me crazy for a few hours.
>

I don't know why these empty dirs would confuse it. I'll try a test
here, but I suspect the real problem was that the mount failed and
then happened to succeed after you deleted the empty dirs.

It still smells like some sort of transport/net interface/... issue
is at the bottom of this. (see response to your next post)

rick

Rick Macklem

Jun 28, 2010, 12:19:32 AM
to Rick C. Petty, freebsd...@freebsd.org

On Sun, 27 Jun 2010, Rick C. Petty wrote:

> On Sun, Jun 27, 2010 at 08:04:28PM -0400, Rick Macklem wrote:
>>
>> Weird, I don't see that here. The only thing I can think of is that the
>> experimental client/server will try to do I/O at the size of MAXBSIZE
>> by default, which might be causing a burst of traffic your net interface
>> can't keep up with. (This can be turned down to 32K via the
>> rsize=32768,wsize=32768 mount options. I found this necessary to avoid
>> abysmal performance on some Macs for the Mac OS X port.)
>
> I just ran into the speed problem again after remounting. This time
> I tried to do a "make buildworld" and make got stuck on [newnfsreq] for
> ten minutes, with no other filesystem activity on either client or server.
>

Being stuck in "newnfsreq" means that it is trying to establish a TCP
connection with the server (again smells like some networking issue).

> The file system corruption is still pretty bad. I can no longer build any
> ports on one machine, because after the port is extracted, the config.sub
> files are being filled with all zeros. It took me a while to track this
> down while trying to build devel/libtool22:
>

Assuming your mounts are not using "soft,intr", I can't explain the
corruption. Disabling delegations is the next step. (They aren't
required for correct behaviour and are disabled by default because
they are the "greenest" part of the implementation.)
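
Concretely, that means either killing the nfscbd daemons on each client
or reverting the server to the default via sysctl.conf:

```
# /etc/sysctl.conf on the server: back to the default (no delegations)
vfs.newnfs.issue_delegations=0
```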

rick

Jeremy Chadwick

Jun 28, 2010, 12:59:55 AM
to Rick C. Petty, Rick Macklem, freebsd...@freebsd.org

This sounds like NFSv4 is "tickling" some kind of bug in your NIC driver
but I'm not entirely sure. Can you provide output from:

1) ifconfig -a (you can X out the IPs + MACs if you want)
2) netstat -m
3) vmstat -i
4) prtconf -lvc (only need the Ethernet-related entries)
5) sysctl dev.XXX.N (ex. for em0, XXX=em, N=0)

And also check "dmesg" to see if there's any messages the kernel has
been spitting out which look relevant? Thanks.
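
A sketch for collecting all of that in one pass (the file names and outdir
are arbitrary; the collection calls are commented out so the helper stands
alone):

```shell
# Capture each diagnostic command's output to its own file under $outdir.
outdir=${outdir:-/tmp/nfsdiag}
mkdir -p "$outdir"
gather() {
    name=$1; shift
    "$@" > "$outdir/$name.txt" 2>&1
}
# On the FreeBSD hosts, uncomment:
# gather ifconfig  ifconfig -a
# gather netstat_m netstat -m
# gather vmstat_i  vmstat -i
# gather pciconf   pciconf -lvc
# gather sysctl_re sysctl dev.re.0
# gather dmesg     dmesg
```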

--
| Jeremy Chadwick j...@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |

Rick C. Petty

Jun 28, 2010, 10:02:11 AM
to Rick Macklem, freebsd...@freebsd.org
On Mon, Jun 28, 2010 at 12:30:30AM -0400, Rick Macklem wrote:
>
> I can't explain the corruption, beyond the fact that "soft,intr" can
> cause all sorts of grief. If mounts without "soft,intr" still show
> corruption problems, try disabling delegations (either kill off the
> nfscbd daemons on the client or set vfs.newnfs.issue_delegations=0
> on the server). It is disabled by default because it is the "greenest"
> part of the subsystem.

I tried without soft,intr and "make buildworld" failed with what looks like
file corruption again. I'm trying without delegations now.

> Make sure you don't have multiple entries for the same uid, such as "root"
> and "toor" both for uid 0 in your /etc/passwd. (ie. get rid of one of
> them, if you have both)

Hmm, that's a strange requirement, since FreeBSD by default comes with
both. That should probably be documented in the nfsv4 man page.

> When you specify "nfs" for an NFSv3 mount, you get the regular client.
> When you specify "newnfs" for an NFSv3 mount, you get the experimental
> client. When you specify "nfsv4" you always get the experimental NFS
> client, and it doesn't matter which FStype you've specified.

Ok. So my comparison was with the regular and experimental clients.

> If you are using UFS/FFS on the server, this should work and I don't know
> why the empty directories under /vol on the client confused it. If your
> server is using ZFS, everything from / including /vol need to be exported.

Nope, UFS2 only (on both clients and server).

> > kernel: nfsv4 client/server protocol prob err=10020
>
> This error indicates that there wasn't a valid FH for the server. I
> suspect that the mount failed. (It does a loop of Lookups from "/" in
> the kernel during the mount and it somehow got confused part way through.)

If the mount failed, why would it allow me to "ls /vol/a" and see both "b"
and "c" directories as well as other files/directories on /vol/ ?

> I don't know why these empty dirs would confuse it. I'll try a test
> here, but I suspect the real problem was that the mount failed and
> then happened to succeed after you deleted the empty dirs.

It doesn't seem likely. I spent an hour mounting and unmounting and each
mount looked successful in that there were files and directories besides
the two I was trying to descend into.

> It still smells like some sort of transport/net interface/... issue
> is at the bottom of this. (see response to your next post)

It's possible. I just had another NFSv4 client (with the same server) lock
up:

load: 0.00 cmd: ls 17410 [nfsv4lck] 641.87r 0.00u 0.00s 0% 1512k

and:

load: 0.00 cmd: make 87546 [wait] 37095.09r 0.01u 0.01s 0% 844k

That make has been hung for hours, and the ls(1) was executed during that
lockup. I wish there was a way I could unhang these processes and unmount
the NFS mount without panicking the kernel, but alas even this fails:

# umount -f /sw
load: 0.00 cmd: umount 17479 [nfsclumnt] 1.27r 0.00u 0.04s 0% 788k

A "shutdown -p now" resulted in a panic with the speaker beeping
constantly and no console output.

It's possible the NICs are all suspect, but all of this worked fine a
couple of days ago when I was only using NFSv3.

-- Rick C. Petty

Rick C. Petty

Jun 28, 2010, 10:21:31 AM
to Jeremy Chadwick, Rick Macklem, freebsd...@freebsd.org
On Sun, Jun 27, 2010 at 09:58:53PM -0700, Jeremy Chadwick wrote:
> >
> > Again, my ports tree is mounted as FSType nfs with option nfsv4.
> > FreeBSD/amd64 8.1-PRERELEASE r208408M GENERIC kernel.
>
> This sounds like NFSv4 is "tickling" some kind of bug in your NIC driver
> but I'm not entirely sure. Can you provide output from:
>
> 1) ifconfig -a (you can X out the IPs + MACs if you want)

On the NFSv4 server:

nfe0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=8010b<RXCSUM,TXCSUM,VLAN_MTU,TSO4,LINKSTATE>
ether 00:22:15:b4:2d:XX
inet 172.XX.XX.4 netmask 0xffffff00 broadcast 172.XX.XX.255
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
options=3<RXCSUM,TXCSUM>
inet6 fe80::1 prefixlen 64 scopeid 0x2
inet6 ::1 prefixlen 128
inet 127.0.0.1 netmask 0xff000000
nd6 options=3<PERFORMNUD,ACCEPT_RTADV>

On one of the clients:

re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=389b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC>
ether e0:cb:4e:cd:d3:XX
inet 172.XX.XX.9 netmask 0xffffff00 broadcast 172.XX.XX.255
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
options=3<RXCSUM,TXCSUM>
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
inet6 ::1 prefixlen 128
inet 127.0.0.1 netmask 0xff000000
nd6 options=3<PERFORMNUD,ACCEPT_RTADV>

> 2) netstat -m

server:

1739/1666/3405 mbufs in use (current/cache/total)
257/1257/1514/25600 mbuf clusters in use (current/cache/total/max)
256/547 mbuf+clusters out of packet secondary zone in use (current/cache)
0/405/405/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
948K/4550K/5499K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

client:

264/2046/2310 mbufs in use (current/cache/total)
256/1034/1290/25600 mbuf clusters in use (current/cache/total/max)
256/640 mbuf+clusters out of packet secondary zone in use (current/cache)
3/372/375/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
590K/4067K/4657K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

> 3) vmstat -i

Server:

interrupt total rate
irq1: atkbd0 24 0
irq18: atapci1 1883933 0
irq20: nfe0 ohci1 1712603504 793
cpu0: timer 4315963536 1999
irq256: hdac0 12 0
irq257: ahci0 139934363 64
cpu2: timer 4315960172 1999
cpu1: timer 4315960172 1999
cpu3: timer 4315960172 1999
Total 19118265888 8858

Client:

interrupt total rate
irq1: atkbd0 1063022 0
irq16: hdac0 16013959 6
irq17: atapci0+++ 6 0
irq18: ohci0 ohci1* 5324486 2
irq19: atapci1 7500968 2
irq20: ahc0 19 0
irq21: ahc1 112390 0
cpu0: timer 5125670841 1999
irq256: hdac1 2 0
irq257: re0 742537149 289
cpu1: timer 5125664297 1999
Total 11023887139 4301

> 4) prtconf -lvc (only need the Ethernet-related entries)

I'll assume you meant to type "pciconf", on the server:

nfe0@pci0:0:10:0: class=0x020000 card=0x82f21043 chip=0x076010de rev=0xa2 hdr=0x00
vendor = 'NVIDIA Corporation'
device = 'NForce Network Controller (MCP78 NIC)'
class = network
subclass = ethernet
cap 01[44] = powerspec 2 supports D0 D1 D2 D3 current D0
cap 05[50] = MSI supports 16 messages, 64 bit, vector masks
cap 08[6c] = HT MSI fixed address window disabled at 0xfee00000

client:

re0@pci0:1:0:0: class=0x020000 card=0x84321043 chip=0x816810ec rev=0x06 hdr=0x00
vendor = 'Realtek Semiconductor'
device = 'Gigabit Ethernet NIC(NDIS 6.0) (RTL8168/8111/8111c)'
class = network
subclass = ethernet
cap 01[40] = powerspec 3 supports D0 D1 D2 D3 current D0
cap 05[50] = MSI supports 1 message, 64 bit enabled with 1 message
cap 10[70] = PCI-Express 2 endpoint IRQ 2 max data 128(256) link x1(x1)
cap 11[b0] = MSI-X supports 4 messages in map 0x20
cap 03[d0] = VPD

> 5) sysctl dev.XXX.N (ex. for em0, XXX=em, N=0)

server:

dev.nfe.0.%desc: NVIDIA nForce MCP77 Networking Adapter
dev.nfe.0.%driver: nfe
dev.nfe.0.%location: slot=10 function=0 handle=\_SB_.PCI0.NMAC
dev.nfe.0.%pnpinfo: vendor=0x10de device=0x0760 subvendor=0x1043 subdevice=0x82f2 class=0x020000
dev.nfe.0.%parent: pci0
dev.nfe.0.process_limit: 192
dev.nfe.0.stats.rx.frame_errors: 0
dev.nfe.0.stats.rx.extra_bytes: 0
dev.nfe.0.stats.rx.late_cols: 0
dev.nfe.0.stats.rx.runts: 0
dev.nfe.0.stats.rx.jumbos: 0
dev.nfe.0.stats.rx.fifo_overuns: 0
dev.nfe.0.stats.rx.crc_errors: 0
dev.nfe.0.stats.rx.fae: 0
dev.nfe.0.stats.rx.len_errors: 0
dev.nfe.0.stats.rx.unicast: 1762645090
dev.nfe.0.stats.rx.multicast: 1
dev.nfe.0.stats.rx.broadcast: 7608
dev.nfe.0.stats.tx.octets: 2036479975330
dev.nfe.0.stats.tx.zero_rexmits: 2090186021
dev.nfe.0.stats.tx.one_rexmits: 0
dev.nfe.0.stats.tx.multi_rexmits: 0
dev.nfe.0.stats.tx.late_cols: 0
dev.nfe.0.stats.tx.fifo_underuns: 0
dev.nfe.0.stats.tx.carrier_losts: 0
dev.nfe.0.stats.tx.excess_deferrals: 0
dev.nfe.0.stats.tx.retry_errors: 0
dev.nfe.0.stats.tx.unicast: 0
dev.nfe.0.stats.tx.multicast: 0
dev.nfe.0.stats.tx.broadcast: 0
dev.nfe.0.wake: 0

client:

dev.re.0.%desc: RealTek 8168/8111 B/C/CP/D/DP/E PCIe Gigabit Ethernet
dev.re.0.%driver: re
dev.re.0.%location: slot=0 function=0
dev.re.0.%pnpinfo: vendor=0x10ec device=0x8168 subvendor=0x1043 subdevice=0x8432 class=0x020000
dev.re.0.%parent: pci1

> check "dmesg" to see if there's any messages the kernel has
> been spitting out which look relevant? Thanks.

server, immediately after restarting all of the NFS scripts (rpcbind nfsclient nfsuserd nfsserver mountd nfsd statd lockd nfscbd):

Jun 27 18:04:44 rpcbind: cannot get information for udp6
Jun 27 18:04:44 rpcbind: cannot get information for tcp6
NLM: failed to contact remote rpcbind, stat = 5, port = 28416
Jun 27 18:05:12 amanda kernel: NLM: failed to contact remote rpcbind, stat = 5, port = 28416

client, when noticing the mounting-over-directories problem:

NLM: failed to contact remote rpcbind, stat = 5, port = 28416


nfsv4 client/server protocol prob err=10020
nfsv4 client/server protocol prob err=10020

..

No other related messages were found in /var/log/messages either.

-- Rick C. Petty

Jeremy Chadwick

Jun 28, 2010, 10:56:44 AM
to Rick C. Petty, Rick Macklem, freebsd...@freebsd.org
On Mon, Jun 28, 2010 at 09:20:25AM -0500, Rick C. Petty wrote:
> > >
> > > Again, my ports tree is mounted as FSType nfs with option nfsv4.
> > > FreeBSD/amd64 8.1-PRERELEASE r208408M GENERIC kernel.
> >
> > This sounds like NFSv4 is "tickling" some kind of bug in your NIC driver
> > but I'm not entirely sure. Can you provide output from:
> >
> > 1) ifconfig -a (you can X out the IPs + MACs if you want)
>
>

Yes sorry -- I spend my days at work dealing with Solaris (which is
where prtconf comes from :-) ).

Three other things to provide output from if you could (you can X out IPs
and MACs too), from both client and server:

6) netstat -idn
7) sysctl hw.pci | grep msi
8) Contents of /etc/sysctl.conf

Thanks.

> server, immediately after restarting all of nfs scripts (rpcbind
> nfsclient nfsuserd nfsserver mountd nfsd statd lockd nfscbd):
>
> Jun 27 18:04:44 rpcbind: cannot get information for udp6
> Jun 27 18:04:44 rpcbind: cannot get information for tcp6

These two usually indicate you removed IPv6 support from the kernel,
except your ifconfig output (I've removed it) on the server shows you do
have IPv6 support. I've been trying to get these warnings removed for
quite some time (PR kern/96242). They're harmless, but the
inconsistency here is a little weird -- are you explicitly disabling
IPv6 on nfe0?

The remaining messages in your kernel log Rick can probably explain.

--
| Jeremy Chadwick j...@parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |


Rick C. Petty

Jun 28, 2010, 11:19:43 AM
to Jeremy Chadwick, Rick Macklem, freebsd...@freebsd.org
On Mon, Jun 28, 2010 at 07:56:00AM -0700, Jeremy Chadwick wrote:
>
> Three other things to provide output from if you could (you can X out IPs
> and MACs too), from both client and server:
>
> 6) netstat -idn

server:

Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll Drop
nfe0 1500 <Link#1> 00:22:15:b4:2d:XX 1767890778 0 0 872169302 0 0 0
nfe0 1500 172.XX.XX.0/2 172.XX.XX.4 1767882158 - - 1964274616 - - -
lo0 16384 <Link#2> 3728 0 0 3728 0 0 0
lo0 16384 (28)00:00:00:00:00:00:fe:80:00:02:00:00:00:00:00:00:00:00:00:00:00:01 3728 0 0 3728 0 0 0
lo0 16384 (28)00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:01 3728 0 0 3728 0 0 0
lo0 16384 127.0.0.0/8 127.0.0.1 3648 - - 3664 - - -

client:

Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll Drop
re0 1500 <Link#1> e0:cb:4e:cd:d3:XX 955288523 0 0 696819089 0 0 0
re0 1500 172.XX.XX.0/2 172.XX.XX.2 955279721 - - 696814499 - - -
lo0 16384 <Link#2> 3148 0 0 3148 0 0 0
lo0 16384 (28)00:00:00:00:00:00:fe:80:00:02:00:00:00:00:00:00:00:00:00:00:00:01 3148 0 0 3148 0 0 0
lo0 16384 (28)00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:01 3148 0 0 3148 0 0 0
lo0 16384 127.0.0.0/8 127.0.0.1 3112 - - 3112 - - -

> 7) sysctl hw.pci | grep msi

both server and client:

hw.pci.honor_msi_blacklist: 1
hw.pci.enable_msix: 1
hw.pci.enable_msi: 1

> 8) Contents of /etc/sysctl.conf

server and client:

# 4 virtual channels
dev.pcm.0.play.vchans=4
# Read modules from /usr/local/modules
kern.module_path=/boot/kernel;/boot/modules;/usr/local/modules
# Remove those annoying ARP moved messages:
net.link.ether.inet.log_arp_movements=0
# 32MB write cache on disk controllers system-wide
vfs.hirunningspace=33554432
# Allow users to mount file systems
vfs.usermount=1
# misc
net.link.tap.user_open=1
net.inet.ip.forwarding=1
compat.linux.osrelease=2.6.16
debug.ddb.textdump.pending=1
# for NFSv4
kern.ipc.maxsockbuf=524288

> > server, immediately after restarting all of nfs scripts (rpcbind
> > nfsclient nfsuserd nfsserver mountd nfsd statd lockd nfscbd):
> >
> > Jun 27 18:04:44 rpcbind: cannot get information for udp6
> > Jun 27 18:04:44 rpcbind: cannot get information for tcp6
>
> These two usually indicate you removed IPv6 support from the kernel,
> except your ifconfig output (I've removed it) on the server shows you do
> have IPv6 support. I've been trying to get these warnings removed for
> quite some time (PR kern/96242). They're harmless, but the
> inconsistency here is a little weird -- are you explicitly disabling
> IPv6 on nfe0?

I have WITHOUT_IPV6= in my make.conf on all my machines (or I have
problems with jdk1.6) and WITHOUT_INET6= in my src.conf. I'm not sure
why the rpcbind/ifconfig binaries have a different concept than the
kernel since I always "make buildworld kernel" and keep things in sync
with mergemaster when I reboot. I'm building new worlds/kernels now
to see if that makes any difference.

-- Rick C. Petty

Rick C. Petty

Jun 28, 2010, 11:39:03 AM6/28/10
to Rick Macklem, freebsd...@freebsd.org
On Mon, Jun 28, 2010 at 12:35:14AM -0400, Rick Macklem wrote:
>
> Being stuck in "newnfsreq" means that it is trying to establish a TCP
> connection with the server (again smells like some networking issue).
> <snip>

> Disabling delegations is the next step. (They aren't
> required for correct behaviour and are disabled by default because
> they are the "greenest" part of the implementation.)

After disabling delegations, I was able to build world and kernel on two
different clients, and my port build problems went away as well.

I'm still left with a performance problem, although not quite as bad as I
originally reported. Directory listings are snappy once again, but playing
h264 video is choppy, particularly when seeking around: there's almost a
full second delay before it kicks in, no matter where I seek. With NFSv3
the delay on seeks was less than 0.1 seconds and the playback was never
jittery.

I can try it again with v3 client and v4 server, if you think that's
worthy of pursuit. If it makes any difference, the server's four CPUs are
pegged at 100% (running "nice +4" cpu-bound jobs). But that was the case
before I enabled v4 server too.

-- Rick C. Petty

Jeremy Chadwick

Jun 28, 2010, 12:32:19 PM6/28/10
to Rick C. Petty, Rick Macklem, freebsd...@freebsd.org
On Mon, Jun 28, 2010 at 10:18:35AM -0500, Rick C. Petty wrote:
> > 8) Contents of /etc/sysctl.conf
>
> server and client:
>
> # for NFSv4
> kern.ipc.maxsockbuf=524288

You might want to discuss this one with Rick a bit (I'm not sure of the
implications). Regarding heavy network I/O (I don't use NFS but Samba),
I've found that the following tunables do in fact make a performance
difference -- you might try and see if these have some impact (or, try
forcing a specific protocol type for NFS, e.g. TCP-only; I'm not
familiar with NFSv4 though). These can be set in sysctl.conf and are also
adjustable in real time via sysctl(8).

# Increase send/receive buffer maximums from 256KB to 16MB.
# FreeBSD 7.x and later will auto-tune the size, but only up to the max.
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216

# Double send/receive TCP datagram memory allocation. This defines the
# amount of memory taken up by default *per socket*.
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=131072
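If you want to experiment with these before committing them to
/etc/sysctl.conf, they take effect immediately when set from the command
line (values are the ones suggested above; adjust to taste):

```shell
# Apply the suggested limits at runtime -- no reboot needed.
# (/etc/sysctl.conf is only read at boot.)
sysctl net.inet.tcp.sendbuf_max=16777216
sysctl net.inet.tcp.recvbuf_max=16777216
sysctl net.inet.tcp.sendspace=65536
sysctl net.inet.tcp.recvspace=131072

# Verify what the kernel actually accepted:
sysctl net.inet.tcp.sendbuf_max net.inet.tcp.recvbuf_max
```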

That's about all I can comment on -- if NFSv3 works OK for you
(performance-wise), then I'm not sure where the bottleneck could be.

> > > Jun 27 18:04:44 rpcbind: cannot get information for udp6
> > > Jun 27 18:04:44 rpcbind: cannot get information for tcp6
> >
> > These two usually indicate you removed IPv6 support from the kernel,
> > except your ifconfig output (I've removed it) on the server shows you do
> > have IPv6 support. I've been trying to get these warnings removed for
> > quite some time (PR kern/96242). They're harmless, but the
> > inconsistency here is a little weird -- are you explicitly disabling
> > IPv6 on nfe0?
>
> I have WITHOUT_IPV6= in my make.conf on all my machines (or I have
> problems with jdk1.6) and WITHOUT_INET6= in my src.conf. I'm not sure
> why the rpcbind/ifconfig binaries have a different concept than the
> kernel since I always "make buildworld kernel" and keep things in sync
> with mergemaster when I reboot. I'm building new worlds/kernels now
> to see if that makes any difference.

make.conf WITHOUT_IPV6 would affect ports, src.conf WITHOUT_INET6 would
affect the base system (thus rpcbind). The src.conf entry is what's
causing rpcbind to spit out the above "cannot get information" messages,
even though IPv6 is available in your kernel (see below).

However: your kernel configuration file must contain "options INET6" or
else you wouldn't have IPv6 addresses on lo0. So even though your
kernel and world are synchronised, IPv6 capability-wise they probably
aren't. This may be your intended desire though, and if so, no biggie.

If you wanted to work around the problem, you can supposedly comment out
the udp6 and tcp6 lines in /etc/netconfig. I choose not to do this (put
up with the warning messages) since I'm not sure of the repercussions of
adjusting this file (e.g. will something else down the road break).



Rick Macklem

Jun 28, 2010, 7:35:10 PM6/28/10
to Rick C. Petty, freebsd...@freebsd.org

On Mon, 28 Jun 2010, Rick C. Petty wrote:

> On Mon, Jun 28, 2010 at 12:35:14AM -0400, Rick Macklem wrote:
>>
>> Being stuck in "newnfsreq" means that it is trying to establish a TCP
>> connection with the server (again smells like some networking issue).
>> <snip>
>> Disabling delegations is the next step. (They aren't
>> required for correct behaviour and are disabled by default because
>> they are the "greenest" part of the implementation.)
>
> After disabling delegations, I was able to build world and kernel on two
> different clients, and my port build problems went away as well.
>

Ok, it sounds like you found some kind of race condition in the delegation
handling. (I'll see if I can reproduce it here. It could be fun to find:-)

> I'm still left with a performance problem, although not quite as bad as I
> originally reported. Directory listings are snappy once again, but playing
> h264 video is choppy, particularly when seeking around: there's almost a
> full second delay before it kicks in, no matter where I seek. With NFSv3
> the delay on seeks was less than 0.1 seconds and the playback was never
> jittery.
>

Hmm, see below w.r.t. 100% cpu.

> I can try it again with v3 client and v4 server, if you think that's
> worthy of pursuit. If it makes any difference, the server's four CPUs are
> pegged at 100% (running "nice +4" cpu-bound jobs). But that was the case
> before I enabled v4 server too.
>

It would be interesting to see if the performance problem exists for
NFSv3 mounts against the experimental (nfsv4) server.

Since the CPUs are 100% busy, it might be a scheduling issue w.r.t.
the nfsd threads (ie. the ones in the experimental server don't have
as high a priority as for the regular server?). I've always tested
on a machine where the CPU (I only have single core) are nowhere near
100% busy. If this theory is correct, the performance issue should
still be noticeable for an NFSv3 mount to the experimental server.

I'll try running something compute bound on the server here and see
what happens.

rick

Rick Macklem

Jun 28, 2010, 7:55:52 PM6/28/10
to Rick C. Petty, freebsd...@freebsd.org

On Mon, 28 Jun 2010, Rick C. Petty wrote:

>
>> Make sure you don't have multiple entries for the same uid, such as "root"
>> and "toor" both for uid 0 in your /etc/passwd. (ie. get rid of one of
>> them, if you have both)
>
> Hmm, that's a strange requirement, since FreeBSD by default comes with
> both. That should probably be documented in the nfsv4 man page.
>

Well, if the mapping from uid->name is not unique, getpwuid() will just
return one of them and it probably won't be the expected one. Having
both "root" and "toor" only causes weird behaviour when "root" tries to
use a mount point. I had thought it was in the man pages, but I now
see it isn't mentioned. I'll try and remember to add it.

>>
>> This error indicates that there wasn't a valid FH for the server. I
>> suspect that the mount failed. (It does a loop of Lookups from "/" in
>> the kernel during the mount and it somehow got confused part way through.)
>
> If the mount failed, why would it allow me to "ls /vol/a" and see both "b"
> and "c" directories as well as other files/directories on /vol/ ?
>
>> I don't know why these empty dirs would confuse it. I'll try a test
>> here, but I suspect the real problem was that the mount failed and
>> then happened to succeed after you deleted the empty dirs.
>
> It doesn't seem likely. I spent an hour mounting and unmounting and each
> mount looked successful in that there were files and directories besides
> the two I was trying to decend into.
>

My theory was that, since you used "soft", one of the Lookups during
the mounting process in the kernel failed with ETIMEDOUT. It isn't
coded to handle that. There are lots of things that will break in
the NFSv4 client if "soft" or "intr" are used. (That is in the mount_nfs
man page, but right at the end, so it could get missed.)
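For reference, a "hard" NFSv4 mount just omits those options entirely; a
sketch (server name and paths here are placeholders, not from the
reporter's setup):

```shell
# Hard mount: without soft/intr, requests retry indefinitely instead
# of failing with ETIMEDOUT partway through an operation.
mount -t nfs -o nfsv4 server:/vol /vol

# Or the equivalent /etc/fstab entry:
# server:/vol  /vol  nfs  rw,nfsv4  0  0
```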

Maybe "broken mount" would have been a better term than "failed mount".

If more recent mount attempts are without "soft", then I would expect
them to work reliably. (If you feel daring, add the empty subdirs back
and see if it fails?)

I will try a case with empty subdirs on the client, to see if there is
a problem when I do it. (It should just cover them up until umount, but
it could certainly be broken:-)

>> It still smells like some sort of transport/net interface/... issue
>> is at the bottom of this. (see response to your next post)
>
> It's possible. I just had another NFSv4 client (with the same server) lock
> up:
>
> load: 0.00 cmd: ls 17410 [nfsv4lck] 641.87r 0.00u 0.00s 0% 1512k
>
> and:
>
> load: 0.00 cmd: make 87546 [wait] 37095.09r 0.01u 0.01s 0% 844k
>
> That make has been hung for hours, and the ls(1) was executed during that
> lockup. I wish there was a way I could unhang these processes and unmount
> the NFS mount without panicking the kernel, but alas even this fails:
>
> # umount -f /sw
> load: 0.00 cmd: umount 17479 [nfsclumnt] 1.27r 0.00u 0.04s 0% 788k
>

The plan is to implement a "hard forced" umount (something like -ff)
which will throw away data, but get the umount done, but it hasn't been
coded yet. (For 8.2 maybe?)

> A "shutdown -p now" resulted in a panic with the speaker beeping
> constantly and no console output.
>
> It's possible the NICs are all suspect, but all of this worked fine a
> couple of days ago when I was only using NFSv3.
>

Yea, if NFSv3 worked fine with the same kernel, it seems more likely
an experimental NFS server issue, possibly related to scheduling the
busy CPUs. (If it was a NIC related problem, it is most likely related
to the driver, but if the NFSv3 case was using the same driver, that
doesn't seem likely.)

You are now using "rsize=32768,wsize=32768" aren't you?
(If you aren't yet using that, try it, since larger bursts of
traffic can definitely "tickle" NIC driver problems, to borrow
Jeremy's term.)

rick

Rick C. Petty

Jun 28, 2010, 10:33:20 PM6/28/10
to Rick Macklem, freebsd...@freebsd.org
On Mon, Jun 28, 2010 at 10:09:21PM -0400, Rick Macklem wrote:

>
>
> On Mon, 28 Jun 2010, Rick C. Petty wrote:
>
> > If it makes any difference, the server's four CPUs are
> >pegged at 100% (running "nice +4" cpu-bound jobs). But that was the case
> >before I enabled v4 server too.
>
> If it is practical, it would be interesting to see what effect killing
> off the cpu bound jobs has w.r.t. performance.

I sent SIGTSTP to all those processes and brought the CPUs to idle. The
jittering/stuttering is still present when watching h264 video. So that
rules out scheduling issues. I'll be investigating Jeremy's TCP tuning
suggestions next.

Rick C. Petty

Jun 28, 2010, 10:41:57 PM6/28/10
to Jeremy Chadwick, Rick Macklem, freebsd...@freebsd.org
On Mon, Jun 28, 2010 at 09:29:11AM -0700, Jeremy Chadwick wrote:
>
> # Increase send/receive buffer maximums from 256KB to 16MB.
> # FreeBSD 7.x and later will auto-tune the size, but only up to the max.
> net.inet.tcp.sendbuf_max=16777216
> net.inet.tcp.recvbuf_max=16777216
>
> # Double send/receive TCP datagram memory allocation. This defines the
> # amount of memory taken up by default *per socket*.
> net.inet.tcp.sendspace=65536
> net.inet.tcp.recvspace=131072

I tried adjusting to these settings, on both the client and the server.
I still see the same jittery/stuttery video behavior. Thanks for your
suggestions though, these are probably good settings to have around anyway
since I have 12 GB of RAM on the client and 8 GB of RAM on the server.

> make.conf WITHOUT_IPV6 would affect ports, src.conf WITHOUT_INET6 would
> affect the base system (thus rpcbind). The src.conf entry is what's
> causing rpcbind to spit out the above "cannot get information" messages,
> even though IPv6 is available in your kernel (see below).
>
> However: your kernel configuration file must contain "options INET6" or
> else you wouldn't have IPv6 addresses on lo0. So even though your
> kernel and world are synchronised, IPv6 capability-wise they probably
> aren't. This may be your intended desire though, and if so, no biggie.

Oh forgot about that. I'll have to add the "nooptions" since I like to
build as close to GENERIC as possible. Mostly the WITHOUT_* stuff in
/etc/src.conf is to reduce my overall build times, since I don't need some
of those tools.

I'm okay with the messages though; I'll probably comment out WITHOUT_INET6.

Thanks again for your suggestions,

-- Rick C. Petty

Rick C. Petty

Jun 28, 2010, 11:12:30 PM6/28/10
to Rick Macklem, freebsd...@freebsd.org
On Mon, Jun 28, 2010 at 07:48:59PM -0400, Rick Macklem wrote:
>
> Ok, it sounds like you found some kind of race condition in the delegation
> handling. (I'll see if I can reproduce it here. It could be fun to find:-)

Good luck with that! =)

> >I can try it again with v3 client and v4 server, if you think that's
> >worthy of pursuit. If it makes any difference, the server's four CPUs are
> >pegged at 100% (running "nice +4" cpu-bound jobs). But that was the case
> >before I enabled v4 server too.
> >
> It would be interesting to see if the performance problem exists for
> NFSv3 mounts against the experimental (nfsv4) server.

Hmm, I couldn't reproduce the problem. Once I unmounted the nfsv4 client
and tried v3, the jittering stopped. Then I unmounted v3 and tried v4
again, no jitters. I played with a couple of combinations back and forth
(toggling the presence of "nfsv4" in the options) and sometimes I saw
jittering but only with v4, but nothing like what I was seeing before.
Perhaps this is a result of Jeremy's TCP tuning tweaks.

This is also a difficult thing to test, because the server and client have
so much memory, they cache the data blocks. So if I try my stutter test
on the same video a second time, I only notice stutters if I skip to parts
I haven't skipped to before. I can comment that it seemed like more of a
latency issue than a throughput issue to me. But the disks aren't ever
under a high load. But it's hard to determine accurate load when the
disks are seeking. Oh, I'm using the AHCI controller mode/driver on those
disks instead of ATA, if that matters.

One time when I mounted the v4 again, it broke subdirectories like I was
talking about before. Essentially it would give me a readout of all the
top-level directories but wouldn't descend into subdirectories which
reflect different mountpoints on the server. An unmount and a remount
(without changes to /etc/fstab) fixed the problem. I'm wondering if there
isn't some race condition that seems to affect crossing mountpoints on the
server. When the situation happens, it affects all mountpoints equally
and persists for the duration of that mount. And of course, I can't
reproduce the problem when I try.

I saw the broken mountpoint crossing on another client (without any TCP
tuning) but each time it happened I saw this in the logs:

nfscl: consider increasing kern.ipc.maxsockbuf

Once I doubled that value, the problem went away.. at least with this
particular v4 server mountpoint.
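For anyone else hitting that message, checking and raising the limit
looks something like this (1048576 is simply double the 524288 value
used earlier in the thread; pick whatever the console message suggests):

```shell
# Inspect the current socket-buffer ceiling:
sysctl kern.ipc.maxsockbuf

# Raise it at runtime:
sysctl kern.ipc.maxsockbuf=1048576

# And persist it across reboots:
echo 'kern.ipc.maxsockbuf=1048576' >> /etc/sysctl.conf
```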

At the moment, things are behaving as expected. The v4 file system seems
just as fast as v3 did, and I don't need a dozen mountpoints specified
on each client thanks to v4. Once again, I thank you, Rick, for all your
hard work!

-- Rick C. Petty

Ian Smith

Jun 29, 2010, 12:05:31 AM6/29/10
to Rick Macklem, Rick C. Petty, freebsd...@freebsd.org
On Mon, 28 Jun 2010, Rick Macklem wrote:
> On Mon, 28 Jun 2010, Rick C. Petty wrote:
>
> >
> > > Make sure you don't have multiple entries for the same uid, such as
> > > "root"
> > > and "toor" both for uid 0 in your /etc/passwd. (ie. get rid of one of
> > > them, if you have both)
> >
> > Hmm, that's a strange requirement, since FreeBSD by default comes with
> > both. That should probably be documented in the nfsv4 man page.
> >
>
> Well, if the mapping from uid->name is not unique, getpwuid() will just
> return one of them and it probably won't be the expected one. Having
> both "root" and "toor" only cause weird behaviour when "root" tries to
> use a mount point. I had thought it was in the man pages, but I now
> see it isn't mentioned. I'll try and remember to add it.

Not wanting to hijack this (interesting) thread, but ..

I have to concur with Rick P - that's rather an odd requirement when each
FreeBSD install since at least 2.2 has come with root and toor (in that
order) in /etc/passwd. I don't use toor, but often enough read about
folks who do, and don't recall it ever being an issue with NFSv3. Are
you sure this is a problem that cannot be coded around in NFSv4?

cheers, Ian

Rick Macklem

Jun 29, 2010, 10:43:15 AM6/29/10
to Ian Smith, Rick C. Petty, freebsd...@freebsd.org

On Tue, 29 Jun 2010, Ian Smith wrote:

>
> Not wanting to hijack this (interesting) thread, but ..
>
> I have to concur with Rick P - that's rather an odd requirement when each
> FreeBSD install since at least 2.2 has come with root and toor (in that
> order) in /etc/passwd. I don't use toor, but often enough read about
> folks who do, and don't recall it ever being an issue with NFSv3. Are
> you sure this is a problem that cannot be coded around in NFSv4?
>

Currently when the nfsuserd needs to translate a uid (such as 0) into a
name (NFSv4 uses names instead of the numbers used by NFSv3), it calls
getpwuid() and uses whatever name is returned. If there are more than
one name for the uid (such as the above case for 0), then you get one
of them and that causes confusion.

I suppose if the FreeBSD world feels that "root" and "toor" must both
exist in the password database, then "nfsuserd" could be hacked to handle
the case of translating uid 0 to "root" without calling getpwuid(). It
seems ugly, but if deleting "toor" from the password database upsets
people, I can do that.
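A quick way to see the ambiguity on a stock install (the second command
goes through getpwuid(3) under the hood, so it shows which name the
lookup actually hands back for uid 0):

```shell
# Both of these entries map to uid 0 on a default FreeBSD system:
grep -E '^(root|toor):' /etc/passwd

# id(1) resolves the name via getpwuid(0), so this shows which
# of the two the library actually returns:
id -un 0
```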

rick

Rick C. Petty

Jun 29, 2010, 11:34:00 AM6/29/10
to Adam Vande More, Rick Macklem, freebsd...@freebsd.org, Ian Smith
On Tue, Jun 29, 2010 at 10:20:57AM -0500, Adam Vande More wrote:

> On Tue, Jun 29, 2010 at 9:58 AM, Rick Macklem <rmac...@uoguelph.ca> wrote:
>
> > I suppose if the FreeBSD world feels that "root" and "toor" must both
> > exist in the password database, then "nfsuserd" could be hacked to handle
> > the case of translating uid 0 to "root" without calling getpwuid(). It
> > seems ugly, but if deleting "toor" from the password database upsets
> > people, I can do that.
>
> I agree with Ian on this. I don't use toor either, but have seen people use
> it, and sometimes it will get recommended here for various reasons e.g.
> running a root account with a different default shell. It wouldn't bother
> me having to do this provided it was documented, but having to do so would
> be a POLA violation to many users I think.

To be fair, I'm not sure this is even a problem. Rick M. only suggested it
as a possibility. I would think that getpwuid() would return the first
match which has always been root. At least that's what it does when
scanning the passwd file; I'm not sure about NIS. If someone can prove
that this will cause a problem with NFSv4, we could consider hacking it.
Otherwise I don't think we should change this behavior yet.

-- Rick C. Petty

Rick Macklem

Jun 29, 2010, 12:01:11 PM6/29/10
to Rick C. Petty, Adam Vande More, freebsd...@freebsd.org, Ian Smith

On Tue, 29 Jun 2010, Rick C. Petty wrote:

>
> To be fair, I'm not sure this is even a problem. Rick M. only suggested it
> as a possibility. I would think that getpwuid() would return the first
> match which has always been root. At least that's what it does when
> scanning the passwd file; I'm not sure about NIS. If someone can prove
> that this will cause a problem with NFSv4, we could consider hacking it.
> Otherwise I don't think we should change this behavior yet.
>

I do know that it causes problems from my testing. I think getpwuid() gets
"toor" because of the way /etc/passwd gets stored in the database created
from it via "vipw".

I have no problem coding it as a special case for nfsuserd and documenting
it. I just won't guarantee how soon it will happen:-)

rick

Rick Macklem

Jun 29, 2010, 12:15:08 PM6/29/10
to Rick C. Petty, freebsd...@freebsd.org

On Mon, 28 Jun 2010, Rick C. Petty wrote:

>>>
>> It would be interesting to see if the performance problem exists for
>> NFSv3 mounts against the experimental (nfsv4) server.
>
> Hmm, I couldn't reproduce the problem. Once I unmounted the nfsv4 client
> and tried v3, the jittering stopped. Then I unmounted v3 and tried v4
> again, no jitters. I played with a couple of combinations back and forth
> (toggling the presence of "nfsv4" in the options) and sometimes I saw
> jittering but only with v4, but nothing like what I was seeing before.
> Perhaps this is a result of Jeremy's TCP tuning tweaks.
>
> This is also a difficult thing to test, because the server and client have
> so much memory, they cache the data blocks. So if I try my stutter test
> on the same video a second time, I only notice stutters if I skip to parts
> I haven't skipped to before. I can comment that it seemed like more of a
> latency issue than a throughput issue to me. But the disks aren't ever
> under a high load. But it's hard to determine accurate load when the
> disks are seeking. Oh, I'm using the AHCI controller mode/driver on those
> disks instead of ATA, if that matters.
>

I basically don't have a clue what might be the source of the problem. I
do agree that it sounds like an intermittent latency issue.

The only thing I can think of that you might try is simply increasing
the number of nfsd threads on the server. They don't add much overhead
and the default of '4' is pretty small. (It's just the "-n N" option on
nfsd, just in case you weren't aware of it.)
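A sketch of how that would look in rc.conf (the -u/-t flags here are an
assumption; check nfsd(8) on your release for the exact set):

```shell
# /etc/rc.conf: serve UDP and TCP with 16 nfsd threads
# instead of the default 4.
nfs_server_flags="-u -t -n 16"
```

then restart the server side (e.g. `/etc/rc.d/nfsd restart`) for the new
thread count to take effect.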

> One time when I mounted the v4 again, it broke subdirectories like I was
> talking about before. Essentially it would give me a readout of all the
> top-level directories but wouldn't descend into subdirectories which
> reflect different mountpoints on the server. An unmount and a remount
> (without changes to /etc/fstab) fixed the problem. I'm wondering if there
> isn't some race condition that seems to affect crossing mountpoints on the
> server. When the situation happens, it affects all mountpoints equally
> and persists for the duration of that mount. And of course, I can't
> reproduce the problem when I try.
>

If it happened for a "hard mount" (no "soft,intr" mount options) then it
is a real bug. The server mount point crossings are detected via a change
in the value of the fsid attribute. I suspect that under some
circumstances, the wrong value of fsid is getting cached in the client.
(I just remembered that you use "rdirplus" and it "might" not be caching
the server's notion of fsid in the right place.)

If you were really keen (if you ever look up "keen" in Webster's, it's
not what we tend to use it for at all. It was actually a "wail for the
dead" and a keener was a professional wailer for the dead, hired for
funerals of important but maybe not that well liked individuals. But I
digress...), you could try a bunch of mounts/dismounts without "rdirplus"
and see if you can even get it to fail without the option.

> I saw the broken mountpoint crossing on another client (without any TCP
> tuning) but each time it happened I saw this in the logs:
>
> nfscl: consider increasing kern.ipc.maxsockbuf
>
> Once I doubled that value, the problem went away.. at least with this
> particular v4 server mountpoint.
>

If this had any effect, it was probably timing/latency related to a bug
w.r.t. caching of the server's notion of fsid. I'll poke around and see
if I can spot where this might be broken.

> At the moment, things are behaving as expected. The v4 file system seems
> just as fast as v3 did, and I don't need a dozen mountpoints specified
> on each client thanks to v4. Once again, I thank you, Rick, for all your
> hard work!
>

Btw, if the mountpoint crossing bug gets too irritating, you can do the
multiple mounts for NFSv4 just like NFSv3. (That's what you have to do
do for the Solaris 10 NFSv4 client, because it's completely broken w.r.t.
mountpoint crossings.)

rick

Dan Nelson

Jun 29, 2010, 12:17:28 PM6/29/10
to Rick C. Petty, Adam Vande More, Rick Macklem, freebsd...@freebsd.org, Ian Smith
In the last episode (Jun 29), Rick C. Petty said:
> On Tue, Jun 29, 2010 at 10:20:57AM -0500, Adam Vande More wrote:
> > On Tue, Jun 29, 2010 at 9:58 AM, Rick Macklem <rmac...@uoguelph.ca> wrote:
> >
> > > I suppose if the FreeBSD world feels that "root" and "toor" must both
> > > exist in the password database, then "nfsuserd" could be hacked to
> > > handle the case of translating uid 0 to "root" without calling
> > > getpwuid(). It seems ugly, but if deleting "toor" from the password
> > > database upsets people, I can do that.
> >
> > I agree with Ian on this. I don't use toor either, but have seen people
> > use it, and sometimes it will get recommended here for various reasons
> > e.g. running a root account with a different default shell. It
> > wouldn't bother me having to do this provided it was documented, but
> > having to do so would be a POLA violation to many users I think.
>
> To be fair, I'm not sure this is even a problem. Rick M. only suggested
> it as a possibility. I would think that getpwuid() would return the first
> match which has always been root. At least that's what it does when
> scanning the passwd file; I'm not sure about NIS. If someone can prove
> that this will cause a problem with NFSv4, we could consider hacking it.
> Otherwise I don't think we should change this behavior yet.

If there are multiple users that map to the same userid, nscd on Linux will
select one name at random and return it for getpwuid() calls. I haven't
seen this behaviour on FreeBSD or Solaris, though. They always seem to
return the first entry in the passwd file.

--
Dan Nelson
dne...@allantgroup.com

Ian Smith

Jun 30, 2010, 12:16:11 AM6/30/10
to Dan Nelson, Rick C. Petty, Rick Macklem, freebsd...@freebsd.org, Adam Vande More
On Tue, 29 Jun 2010, Dan Nelson wrote:
> In the last episode (Jun 29), Rick C. Petty said:
> > On Tue, Jun 29, 2010 at 10:20:57AM -0500, Adam Vande More wrote:
> > > On Tue, Jun 29, 2010 at 9:58 AM, Rick Macklem <rmac...@uoguelph.ca> wrote:
> > >
> > > > I suppose if the FreeBSD world feels that "root" and "toor" must both
> > > > exist in the password database, then "nfsuserd" could be hacked to
> > > > handle the case of translating uid 0 to "root" without calling
> > > > getpwuid(). It seems ugly, but if deleting "toor" from the password
> > > > database upsets people, I can do that.
> > >
> > > I agree with Ian on this. I don't use toor either, but have seen people
> > > use it, and sometimes it will get recommended here for various reasons
> > > e.g. running a root account with a different default shell. It
> > > wouldn't bother me having to do this provided it was documented, but
> > > having to do so would be a POLA violation to many users I think.
> >
> > To be fair, I'm not sure this is even a problem. Rick M. only suggested
> > it as a possibility. I would think that getpwuid() would return the first
> > match which has always been root. At least that's what it does when
> > scanning the passwd file; I'm not sure about NIS. If someone can prove
> > that this will cause a problem with NFSv4, we could consider hacking it.
> > Otherwise I don't think we should change this behavior yet.
>
> If there are multiple users that map to the same userid, nscd on Linux will
> select one name at random and return it for getpwuid() calls. I haven't
> seen this behaviour on FreeBSD or Solaris, though. They always seem to
> return the first entry in the passwd file.

I wondered whether this might be a Linux thing. On my 7.2 system,

% find /usr/src -name "*.[ch]" -exec grep -Hw getpwuid {} \; > file

returns 195 lines, many in the form getpwuid(getuid()), in many base and
contrib components - including id(1), bind, sendmail etc - that could be
nondeterministic if getpwuid(0) ever returned other than root.

Just one mention of 'toor' in /usr/src/usr.sbin/makefs/compat/pwcache.c

Not claiming to know how the lookups in /usr/src/lib/libc/gen/getpwent.c
work under the hood, but this seems likely to be a non-issue on FreeBSD.

cheers, Ian

Rick Macklem

Jun 30, 2010, 7:57:20 PM6/30/10
to Ian Smith, Rick C. Petty, Dan Nelson, freebsd...@freebsd.org, Adam Vande More

On Wed, 30 Jun 2010, Ian Smith wrote:

>
> I wondered whether this might be a Linux thing. On my 7.2 system,
>
> % find /usr/src -name "*.[ch]" -exec grep -Hw getpwuid {} \; > file
>
> returns 195 lines, many in the form getpwuid(getuid()), in many base and
> contrib components - including id(1), bind, sendmail etc - that could be
> nondeterministic if getpwuid(0) ever returned other than root.
>
> Just one mention of 'toor' in /usr/src/usr.sbin/makefs/compat/pwcache.c
>
> Not claiming to know how the lookups in /usr/src/lib/libc/gen/getpwent.c
> work under the hood, but this does seem likely a non-issue on FreeBSD.
>

I remember it causing some confusion while testing, but I can't remember
when or where. It might have been Linux or I might have been logged in as
"toor" or ????

I think I will hardcode the "root" case in nfsuserd, just to be safe.
(I also might have edited /etc/passwd and reordered entries without
paying attention to it.)

rick

Rick Macklem

Jul 1, 2010, 10:51:12 AM7/1/10
to Rick C. Petty, freebsd...@freebsd.org

On Mon, 28 Jun 2010, Rick C. Petty wrote:

> On Mon, Jun 28, 2010 at 12:35:14AM -0400, Rick Macklem wrote:
>>
>> Being stuck in "newnfsreq" means that it is trying to establish a TCP
>> connection with the server (again smells like some networking issue).
>> <snip>
>> Disabling delegations is the next step. (They aren't
>> required for correct behaviour and are disabled by default because
>> they are the "greenest" part of the implementation.)
>
> After disabling delegations, I was able to build world and kernel on two
> different clients, and my port build problems went away as well.
>

I was able to reproduce a problem when delegations are enabled and the
"rdirplus" option was used on a mount. Since I haven't done non-trivial
testing with "rdirplus" set, but have done quite a bit with delegations
enabled for mounts without "rdirplus", I suspect the problem is related
to using "rdirplus" on NFSv4 mounts.

So, I'd recommend against using "rdirplus" on NFSv4 mounts until the
problem gets resolved.

You could try re-enabling delegations and then try mounts without
"rdirplus" to see if the problems during builds still show up?

Thanks for your help with testing, rick

Rick Macklem

Jul 5, 2010, 11:58:51 PM7/5/10
to Rick C. Petty, freebsd...@freebsd.org

On Sun, 27 Jun 2010, Rick C. Petty wrote:

> First off, many thanks to Rick Macklem for making NFSv4 possible in
> FreeBSD!
>
> I recently updated my NFS server and clients to v4, but have since noticed
> significant performance penalties. For instance, when I try "ls a b c" (if
> a, b, and c are empty directories) on the client, it takes up to 1.87
> seconds (wall time) whereas before it always finished in under 0.1 seconds.
> If I repeat the test, it takes the same amount of time in v4 (in v3, wall
> time was always under 0.01 seconds for subsequent requests, as if the
> directory listing was cached).
>
> If I try to play an h264 video file on the filesystem using mplayer, it
> often jitters and skipping around in time introduces up to a second or so
> pause. With NFSv3 it behaved more like the file was on local disk (no
> noticeable pauses or jitters).
>
I just came across a case where things get really slow during testing
of some experimental caching stuff. (It was caused by the experimental
stuff not in head, but...)

It turns out that if numvnodes > desiredvnodes, it sleeps for 1sec before
allocating a new vnode. This might explain your approx. 1sec delays.

When this happens, "ps axlH" will probably show a process sleeping on
"vlruwk" and desiredvnodes can be increased by setting a larger value
for kern.maxvnodes. (numvnodes can be seen as vfs.numvnodes)

I don't think there is actually a vnode leak, but you might find that
the experimental nfs subsystem is a vnode hog.
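
A rough client-side check for this condition, as a sketch (only the
threshold helper below is runnable as-is; the sysctl invocations are
FreeBSD-specific and shown as comments, and the sample numbers are purely
illustrative, not from the thread):

```shell
# Does the vnode count exceed the limit, i.e. will vnode allocation sleep?
needs_raise() { [ "$1" -ge "$2" ] && echo yes || echo no; }
needs_raise 100001 100000   # illustrative numbers
# On a live client:
#   needs_raise "$(sysctl -n vfs.numvnodes)" "$(sysctl -n kern.maxvnodes)"
# If it prints "yes", raise the limit, e.g. sysctl kern.maxvnodes=150000
# (a process sleeping on "vlruwk" in "ps axlH" is the other telltale)
```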

Rick C. Petty

Aug 28, 2010, 11:51:00 PM8/28/10
to Rick Macklem, freebsd...@freebsd.org
Hi. I'm still having problems with NFSv4 being very laggy on one client.
When the NFSv4 server is at 50% idle CPU and the disks are < 1% busy, I am
getting horrible throughput on an idle client. Using dd(1) with 1 MB block
size, when I try to read a > 100 MB file from the client, I'm getting
around 300-500 KiB/s. On another client, I see upwards of 20 MiB/s with
the same test (on a different file). On the broken client:

# uname -mv
FreeBSD 8.1-STABLE #5 r211534M: Sat Aug 28 15:53:10 CDT 2010 us...@example.com:/usr/obj/usr/src/sys/GENERIC i386

# ifconfig re0
re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=389b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC>
ether 00:e0:4c:xx:yy:zz
inet xx.yy.zz.3 netmask 0xffffff00 broadcast xx.yy.zz.255
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active

# netstat -m
267/768/1035 mbufs in use (current/cache/total)
263/389/652/25600 mbuf clusters in use (current/cache/total/max)
263/377 mbuf+clusters out of packet secondary zone in use (current/cache)
0/20/20/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
592K/1050K/1642K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/5/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile
0 calls to protocol drain routines

# netstat -idn
Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll Drop
re0 1500 <Link#1> 00:e0:4c:xx:yy:zz 232135 0 0 68984 0 0 0
re0 1500 xx.yy.zz.0/2 xx.yy.zz.3 232127 - - 68979 - - -
nfe0* 1500 <Link#2> 00:22:15:xx:yy:zz 0 0 0 0 0 0 0
plip0 1500 <Link#3> 0 0 0 0 0 0 0
lo0 16384 <Link#4> 42 0 0 42 0 0 0
lo0 16384 fe80:4::1/64 fe80:4::1 0 - - 0 - - -
lo0 16384 ::1/128 ::1 0 - - 0 - - -
lo0 16384 127.0.0.0/8 127.0.0.1 42 - - 42 - - -

# sysctl kern.ipc.maxsockbuf
kern.ipc.maxsockbuf: 1048576
# sysctl net.inet.tcp.sendbuf_max
net.inet.tcp.sendbuf_max: 16777216
# sysctl net.inet.tcp.recvbuf_max
net.inet.tcp.recvbuf_max: 16777216
# sysctl net.inet.tcp.sendspace
net.inet.tcp.sendspace: 65536
# sysctl net.inet.tcp.recvspace
net.inet.tcp.recvspace: 131072

# sysctl hw.pci | grep msi
hw.pci.honor_msi_blacklist: 1
hw.pci.enable_msix: 1
hw.pci.enable_msi: 1

# vmstat -i
interrupt total rate
irq14: ata0 47 0
irq16: re0 219278 191
irq21: ohci0+ 5939 5
irq22: vgapci0+ 77990 67
cpu0: timer 2294451 1998
irq256: hdac0 44069 38
cpu1: timer 2293983 1998
Total 4935757 4299

Any ideas?

-- Rick C. Petty

Rick Macklem

Aug 29, 2010, 11:45:15 AM8/29/10
to rick-fre...@kiwi-computer.com, freebsd...@freebsd.org
> Hi. I'm still having problems with NFSv4 being very laggy on one
> client.
> When the NFSv4 server is at 50% idle CPU and the disks are < 1% busy,
> I am
> getting horrible throughput on an idle client. Using dd(1) with 1 MB
> block
> size, when I try to read a > 100 MB file from the client, I'm getting
> around 300-500 KiB/s. On another client, I see upwards of 20 MiB/s
> with
> the same test (on a different file). On the broken client:
>

Since other client(s) are working well, that seems to suggest that it
is a network related problem and not a bug in the NFS code.

First off, the obvious question: How does this client differ from the
one that performs much better?
Do they both use the same "re" network interface for the NFS traffic?
(If the answer is "no", I'd be suspicious that the "re" hardware or
device driver is the culprit.)

Things that I might try in an effort to isolate the problem:
- switch the NFS traffic to use the nfe0 net interface.
- put a net interface identical to the one on the client that
works well in the machine and use that for the NFS traffic.
- turn off TXCSUM and RXCSUM on re0
- reduce the read/write data size, using rsize=N,wsize=N on the
mount. (It will default to MAXBSIZE and some net interfaces don't
handle large bursts of received data well. If you drop it to
rsize=8192,wsize=8192 and things improve, then increase N until it
screws up.)
- check the port configuration on the switch end, to make sure it
is also 1000bps-full duplex.
- move the client to a different net port on the switch or even a
different switch (and change the cable, while you're at it).
- Look at "netstat -s" and see if there are a lot of retransmits
going on in TCP.

If none of the above seems to help, you could look at a packet trace
and see what is going on. Look for TCP reconnects (SYN, SYN-ACK...)
or places where there is a large time delay/retransmit of a TCP
segment.

Hopefully others who are more familiar with the networking side
can suggest other things to try, rick

Rick C. Petty

Aug 30, 2010, 1:24:02 PM8/30/10
to Rick Macklem, freebsd...@freebsd.org
On Sun, Aug 29, 2010 at 11:44:06AM -0400, Rick Macklem wrote:
> > Hi. I'm still having problems with NFSv4 being very laggy on one
> > client.
> > When the NFSv4 server is at 50% idle CPU and the disks are < 1% busy,
> > I am
> > getting horrible throughput on an idle client. Using dd(1) with 1 MB
> > block
> > size, when I try to read a > 100 MB file from the client, I'm getting
> > around 300-500 KiB/s. On another client, I see upwards of 20 MiB/s
> > with
> > the same test (on a different file). On the broken client:
>
> Since other client(s) are working well, that seems to suggest that it
> is a network related problem and not a bug in the NFS code.

Well I wouldn't say "well". Every client I've set up has had this issue,
and somehow through tweaking various settings and restarting nfs a bunch of
times, I've been able to make it tolerable for most clients. Only one
client is behaving well, and that happens to be the only machine I haven't
rebooted since I enabled NFSv4. Other clients are seeing 2-3 MiB/s on my
dd(1) test.

I should point out that caching is an issue. The second time I run a dd on
the same input file, I get upwards of 20-35 MiB/s on the "bad" client. But
I can "invalidate" the cache by unmounting and remounting the file system
so it looks like client-side caching.

I'm not sure how you can say it's network-related and not NFS. Things
worked just fine with NFSv3 (in fact NFSv3 client using the same NFSv4
server doesn't have this problem). Using rsync over ssh I get around 15-20
MiB/s throughput, and dd piped through ssh gets almost 40 MiB/s (neither
one is using compression)!

> First off, the obvious question: How does this client differ from the
> one that performs much better?

Different hardware (CPU, board, memory). I'm also hoping it was some
sysctl tweak I did, but I can't seem to determine what it was.

> Do they both use the same "re" network interface for the NFS traffic?
> (If the answer is "no", I'd be suspicious that the "re" hardware or
> device driver is the culprit.)

That's the same thing you and others said about the *other* NFSv4 clients
I set up. How is v4 that much different than v3 in terms of network
traffic? The other clients are all using re0 and exactly the same
ifconfig "options" and flags, including the client that's behaving fine.

> Things that I might try in an effort to isolate the problem:
> - switch the NFS traffic to use the nfe0 net interface.

I'll consider it. I'm not convinced it's a NIC problem yet.

> - put a net interface identical to the one on the client that
> works well in the machine and use that for the NFS traffic.

It's already close enough. Bad client:

re0@pci0:1:7:0: class=0x020000 card=0x816910ec chip=0x816910ec rev=0x10 hdr=0x00
vendor = 'Realtek Semiconductor'
device = 'Single Gigabit LOM Ethernet Controller (RTL8110)'
class = network
subclass = ethernet

Good client:

re0@pci0:1:0:0: class=0x020000 card=0x84321043 chip=0x816810ec rev=0x06 hdr=0x00
vendor = 'Realtek Semiconductor'
device = 'Gigabit Ethernet NIC(NDIS 6.0) (RTL8168/8111/8111c)'
class = network
subclass = ethernet

Mediocre client:

re0@pci0:1:0:0: class=0x020000 card=0x84321043 chip=0x816810ec rev=0x06 hdr=0x00
vendor = 'Realtek Semiconductor'
device = 'Gigabit Ethernet NIC(NDIS 6.0) (RTL8168/8111/8111c)'
class = network
subclass = ethernet

The mediocre and good clients have exactly identical hardware, and often
I'll witness the "slow client" behavior on the mediocre client, and rarely
on the "good client" although in previous emails to you, it was the "good
client" which was behaving the worst of all.

Other differences:
good client = 8.1 GENERIC r210227M amd64 12GB RAM Athlon II X2 255
med. client = 8.1 GENERIC r209555M i386 4GB RAM Athlon II X2 255
bad client = 8.1 GENERIC r211534M i386 2GB RAM Athlon 64 X2 5200+

> - turn off TXCSUM and RXCSUM on re0

Tried that, didn't help although it seemed to slow things down a little.

> - reduce the read/write data size, using rsize=N,wsize=N on the
> mount. (It will default to MAXBSIZE and some net interfaces don't
> handle large bursts of received data well. If you drop it to
> rsize=8192,wsize=8192 and things improve, then increase N until it
> screws up.)

8k didn't improve things at all.

> - check the port configuration on the switch end, to make sure it
> is also 1000bps-full duplex.

It is, and has been.

> - move the client to a different net port on the switch or even a
> different switch (and change the cable, while you're at it).

I've tried that too. The switches are great and my cables are fine.
Like I said, NFSv3 on the same mount point works just fine (dd does
around 30-35 MiB/s).

> - Look at "netstat -s" and see if there are a lot of retransmits
> going on in TCP.

2 of 40k TCP packets retransmitted, 7k of 40k duplicate acks received.
I don't see anything else in "netstat -s" with numbers larger than 10.

> If none of the above seems to help, you could look at a packet trace
> and see what is going on. Look for TCP reconnects (SYN, SYN-ACK...)
> or places where there is a large time delay/retransmit of a TCP
> segment.

Is that something easily scriptable with tcpdump? I'd rather not look
for such things manually.

> Hopefully others who are more familiar with the networking side
> can suggest other things to try, rick

I'm still not convinced it's a network issue. Here are some specs I
tested with dd(1) on the same file on the file server, listed in the
order I performed these tests:

client mount first attempt second attempt
------ ----- ------------- --------------
bad NFSv3 32954968 B/s 643911087 B/s
bad NFSv4 439672 B/s 6694992 B/s
med. NFSv4 333837 B/s 617387 B/s
med. NFSv3 95043062 B/s 1617717600 B/s
good NFSv4 64276844 B/s 2488692465 B/s
good NFSv3 98051629 B/s 2697787313 B/s
bad NFSv4 580284 B/s 13554608 B/s
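
For easier comparison, the byte rates above convert to MiB/s like this
(a throwaway helper, not part of the thread's tooling):

```shell
# Convert dd's bytes/sec figures into MiB/s
to_mib() { awk -v b="$1" 'BEGIN { printf "%.2f MiB/s\n", b / 1048576 }'; }
to_mib 439672      # bad client, NFSv4 first attempt
to_mib 32954968    # bad client, NFSv3 first attempt
```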

It seems pretty obvious to me that v3 outperforms v4, and that there are
some caching effects on the client. But even with the cache, the
performance from v4 is pretty pitiful, except for one of my clients. I'm
not sure what I tweaked. I'll include a diff of the relevant "sysctl -a"
outputs from those two machines:

--- bad client
+++ good client
-kern.version: FreeBSD 8.1-STABLE #5 r211534M: Sat Aug 28 15:53:10 CDT 2010
+kern.version: FreeBSD 8.1-PRERELEASE #1 r210227M: Sun Jul 18 23:24:16 CDT 2010
-kern.ipc.maxsockbuf: 1048576
+kern.ipc.maxsockbuf: 524288
-kern.ipc.max_datalen: 124
+kern.ipc.max_datalen: 92
-kern.ipc.pipekva: 114688
-kern.ipc.maxpipekva: 16777216
+kern.ipc.pipekva: 589824
+kern.ipc.maxpipekva: 207671296
-kern.ipc.numopensockets: 59
+kern.ipc.numopensockets: 202
-kern.ipc.nsfbufspeak: 5
-kern.ipc.nsfbufs: 6656
+kern.ipc.nsfbufspeak: 0
+kern.ipc.nsfbufs: 0
-kern.openfiles: 144
+kern.openfiles: 632
-kern.maxssiz: 67108864
+kern.maxssiz: 536870912
-kern.maxdsiz: 536870912
+kern.maxdsiz: 34359738368
-kern.maxbcache: 209715200
+kern.maxbcache: 0
-kern.nbuf: 7224
+kern.nbuf: 79194
-vfs.ufs.dirhash_lowmemcount: 351
+vfs.ufs.dirhash_lowmemcount: 1725
-vfs.ufs.dirhash_mem: 1123139
-vfs.freevnodes: 24960
+vfs.freevnodes: 25000
vfs.wantfreevnodes: 25000
-vfs.numvnodes: 36966
+vfs.numvnodes: 88150
-net.inet.icmp.bmcastecho: 0
+net.inet.icmp.bmcastecho: 1
-net.inet.tcp.sendspace: 65536
-net.inet.tcp.recvspace: 131072
+net.inet.tcp.sendspace: 32768
+net.inet.tcp.recvspace: 65536
-net.inet.tcp.hostcache.count: 3
+net.inet.tcp.hostcache.count: 8
-net.inet.tcp.recvbuf_max: 16777216
+net.inet.tcp.recvbuf_max: 262144
-net.inet.tcp.sendbuf_max: 16777216
+net.inet.tcp.sendbuf_max: 262144
-net.inet.tcp.reass.overflows: 0
+net.inet.tcp.reass.overflows: 1993
-net.inet.tcp.pcbcount: 16
+net.inet.tcp.pcbcount: 34
-machdep.tsc_freq: 2712350646
+machdep.tsc_freq: 3110426281


-- Rick

Rick Macklem

Aug 30, 2010, 10:00:40 PM8/30/10
to rick-fre...@kiwi-computer.com, freebsd...@freebsd.org
>
> Well I wouldn't say "well". Every client I've set up has had this
> issue,
> and somehow through tweaking various settings and restarting nfs a
> bunch of
> times, I've been able to make it tolerable for most clients. Only one
> client is behaving well, and that happens to be the only machine I
> haven't
> rebooted since I enabled NFSv4. Other clients are seeing 2-3 MiB/s on
> my
> dd(1) test.
>
All I can tell you is that, for my old hardware (100Mbps networking)
I see 10Mbytes/sec (all you can hope for) using the regular NFSv3
client. I see about 10% slower for NFSv3 and NFSv4 using the experimental
client (NFSv3 and NFSv4 about identical). The 10% doesn't surprise me,
since the experimental client is based on a FreeBSD6 client and,
although I plan on carrying all the newer client changes over to
it, I haven't gotten around to doing that. If it is still 10% slower
after the changes are carried over, I will be looking at why.

I don't tune anything with sysctl, I just use what I get from an
install from CD onto i386 hardware. (I don't even bother to increase
kern.ipc.maxsockbuf although I suggest that in the mount message.)

I also do not specify any mount options other than the protocol
version. My mount commands look like:
# mount -t nfs -o nfsv3 <server>:/path /mnt
# mount -t newnfs -o nfsv3 <server>:/path /mnt
# mount -t nfs -o nfsv4 <server>:/path /mnt

So, I don't see dramatically slower NFSv4 and expect to get the 10%
perf. reduction fixed when I bring the exp. client in line with
the current one, but can't be sure.

So, I have no idea what you are seeing. It might be an issue
that will be fixed when I bring the exp. client up to date,
but I have no idea if that's the case? (It will be a few
months before the client update happens.)

The only thing I can suggest is trying:
# mount -t newnfs -o nfsv3 <server>:/path /mnt
and seeing if that performs like the regular NFSv3 or has
the perf. issue you see for NFSv4?

If this does have the perf. issue, then the exp. client
is most likely the cause and may get better in a few months
when I bring it up-to-date.

rick

Rick Macklem

Aug 30, 2010, 10:25:55 PM8/30/10
to rick-fre...@kiwi-computer.com, freebsd...@freebsd.org
> On Sun, Aug 29, 2010 at 11:44:06AM -0400, Rick Macklem wrote:
> > > Hi. I'm still having problems with NFSv4 being very laggy on one
> > > client.
> > > When the NFSv4 server is at 50% idle CPU and the disks are < 1%
> > > busy,
> > > I am
> > > getting horrible throughput on an idle client. Using dd(1) with 1
> > > MB
> > > block
> > > size, when I try to read a > 100 MB file from the client, I'm
> > > getting
> > > around 300-500 KiB/s. On another client, I see upwards of 20 MiB/s
> > > with
> > > the same test (on a different file). On the broken client:
> >
> > Since other client(s) are working well, that seems to suggest that
> > it
> > is a network related problem and not a bug in the NFS code.
>

Oh, one more thing...Make sure that the user and group name/number
space is consistent across all machines and nfsuserd is working on
them all. (Look at "ls -lg" on the clients and see that the
correct user/group names are showing up.) If this mapping isn't
working correctly, it will do an upcall to the userland nfsuserd for
every RPC and that would make NFSv4 run very slowly. It will also
use the domain part (after first '.') of each machine's hostname,
so make sure that all the hostnames (all clients and server) are
the same. ie: server.cis.uoguelph.ca, client1.cis.uoguelph.ca,...
are all .cis.uoguelph.ca.
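
A quick consistency check for that rule, as a sketch (the helper just
applies the "everything after the first '.'" rule described above):

```shell
# Extract the NFSv4 name-mapping domain: the hostname after the first '.'
nfsv4_domain() { echo "$1" | sed 's/^[^.]*\.//'; }
nfsv4_domain server.cis.uoguelph.ca    # -> cis.uoguelph.ca
# Run `nfsv4_domain "$(hostname)"` on every client and the server;
# the output must be identical everywhere
```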

If that is the problem,
# mount -t newnfs -o nfsv3 <server>:/path /mnt

will work fine, since NFSv3 doesn't use the mapping daemon.

>
> Is that something easily scriptable with tcpdump? I'd rather not look
> for such things manually.
>

I've always done this manually and, although tcpdump can be used
to do the packet capture, wireshark actually understands NFS packets
and, as such, is much better for looking at the packets.

rick

Hannes Hauswedell

Sep 1, 2010, 10:15:22 AM9/1/10
to freebsd...@freebsd.org
Hi everyone,

I am experiencing similar issues with newnfs:

1) I have two clients that each get around 0.5MiB/s to 2.6MiB/s reading
from the NFS4-share on Gbit-Lan

2) Mounting with -t newnfs -o nfsv3 results in no performance gain
whatsoever.

3) Mounting with -t nfs results in 58MiB/s ! (Netcat has similar
performance) → not a hardware/driver issue from my pov

Is there anything I can do to help fix this?

Thank you,
--
┌─────────────────────────────────────────┐
Best Regards, │ Free Software Foundation Europe █▉ │
Hannes Hauswedell │ German Team █▉█▉█▉ │
│ Coordinator for pdfreaders.org ▉▉ │
└─────────────────────────────────────────┘

Rick Macklem

Sep 1, 2010, 11:47:38 AM9/1/10
to Hannes Hauswedell, freebsd...@freebsd.org
> Hi everyone,
>
> I am experiencing similar issues with newnfs:
>
> 1) I have two clients that each get around 0.5MiB/s to 2.6MiB/s
> reading
> from the NFS4-share on Gbit-Lan
>
> 2) Mounting with -t newnfs -o nfsv3 results in no performance gain
> whatsoever.
>
> 3) Mounting with -t nfs results in 58MiB/s ! (Netcat has similar
> performance) → not a hardware/driver issue from my pov
>

The experimental client does reads in larger MAXBSIZE chunks,
which did cause a similar problem in Mac OS X until
rsize=32768,wsize=32768 was specified. Rick already tried that,
but you might want to try it for your case.

> Is there anything I can do to help fix this?
>

Ok, so it does sound like an issue in the experimental client and
not NFSv4. For the most part, the read code is the same as
the regular client, but it hasn't been brought up-to-date
with recent changes.

One thing you could try is building a kernel without SMP enabled
and see if that helps? (I only have single core hardware, so I won't
see any SMP races.) If that helps, I can compare the regular vs
experimental client for smp locking in the read stuff.
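
One way to produce such a kernel config, as a sketch (only the filter is
runnable as-is; the paths and build command in the comments assume a
stock 8.x source tree):

```shell
# Drop the "options SMP" line from a kernel config file
strip_smp() { awk '!($1 == "options" && $2 == "SMP")'; }
# Typical use (assumed paths):
#   cd /usr/src/sys/i386/conf
#   strip_smp < GENERIC > NOSMP
#   cd /usr/src && make buildkernel KERNCONF=NOSMP && make installkernel KERNCONF=NOSMP
```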

Rick C. Petty

Sep 3, 2010, 10:32:20 PM9/3/10
to Rick Macklem, freebsd...@freebsd.org
On Mon, Aug 30, 2010 at 09:59:38PM -0400, Rick Macklem wrote:
>
> I don't tune anything with sysctl, I just use what I get from an
> install from CD onto i386 hardware. (I don't even bother to increase
> kern.ipc.maxsockbuf although I suggest that in the mount message.)

Sure. But maybe you don't have server mount points with 34k+ files in
them? I notice when I increase maxsockbuf, the problem of "disappearing
files" goes away, mostly. Often a "find /mnt" fixes the problem
temporarily, until I unmount and mount again.

> The only thing I can suggest is trying:
> # mount -t newnfs -o nfsv3 <server>:/path /mnt
> and seeing if that performs like the regular NFSv3 or has
> the perf. issue you see for NFSv4?

Yes, that has the same exact problem. However, if I use:
mount -t nfs <server>:/path /mnt
The problem does indeed go away! But it means I have to mount all the
subdirectories independently, which I'm trying to avoid and is the
reason I went to NFSv4.

> If this does have the perf. issue, then the exp. client
> is most likely the cause and may get better in a few months
> when I bring it up-to-date.

Then that settles it-- the newnfs client seems to be the problem. Just
to recap... These two are *terribly* slow (e.g. a VBR mp3 avg 192kbps
cannot be played without skips):
mount -t newnfs -o nfsv4 server:/path /mnt
mount -t newnfs -o nfsv3 server:/path /mnt

But this one works just fine (H.264 1080p video does not skip):
mount -t nfs server:/path /mnt

I guess I will have to wait for you to bring the v4 client up to date.
Thanks again for all of your contributions and for porting NFSv4 to
FreeBSD!

-- Rick C. Petty

Rick C. Petty

Sep 3, 2010, 10:37:01 PM9/3/10
to Rick Macklem, Hannes Hauswedell, freebsd...@freebsd.org
On Wed, Sep 01, 2010 at 11:46:30AM -0400, Rick Macklem wrote:
> >
> > I am experiencing similar issues with newnfs:
> >
> > 1) I have two clients that each get around 0.5MiB/s to 2.6MiB/s
> > reading
> > from the NFS4-share on Gbit-Lan
> >
> > 2) Mounting with -t newnfs -o nfsv3 results in no performance gain
> > whatsoever.
> >
> > 3) Mounting with -t nfs results in 58MiB/s ! (Netcat has similar
> > > performance) → not a hardware/driver issue from my pov

>
> Ok, so it does sound like an issue in the experimental client and
> not NFSv4. For the most part, the read code is the same as
> the regular client, but it hasn't been brought up-to-date
> with recent changes.

Do you (or will you soon) have some patches I/we could test? I'm
willing to try anything to avoid mounting ten or so subdirectories in
each of my mount points.

> One thing you could try is building a kernel without SMP enabled
> and see if that helps? (I only have single core hardware, so I won't
> see any SMP races.) If that helps, I can compare the regular vs
> experimental client for smp locking in the read stuff.

I can try disabling SMP too. Should that really matter, if you're not
even pegging one CPU? The locks shouldn't have *that* much overhead...

-- Rick C. Petty

Rick Macklem

Sep 4, 2010, 12:25:48 PM9/4/10
to rick-fre...@kiwi-computer.com, Hannes Hauswedell, freebsd...@freebsd.org
>
> Do you (or will you soon) have some patches I/we could test? I'm
> willing to try anything to avoid mounting ten or so subdirectories in
> each of my mount points.
>

Attached is a small patch for the only difference I can spot in the read
code between the regular and experimental NFS client.

I have asked jhb@ to try and do some testing, to see if he can reproduce
it. If he does reproduce it, maybe he can figure out what is going on.
(I don't think I'll have any further patches to try, unless he spots
something.)

> > One thing you could try is building a kernel without SMP enabled
> > and see if that helps? (I only have single core hardware, so I won't
> > see any SMP races.) If that helps, I can compare the regular vs
> > experimental client for smp locking in the read stuff.
>
> I can try disabling SMP too. Should that really matter, if you're not
> even pegging one CPU? The locks shouldn't have *that* much overhead...
>
> -- Rick C. Petty
>

If running UMP fixes the problem, it is most likely a missing lock that
allows a race to put things in a weird state. But for these things, it
is often something I'd never expect that turns out to be the culprit.

rick

nfsclbio.patch

Rick Macklem

Sep 4, 2010, 12:56:40 PM9/4/10
to rick-fre...@kiwi-computer.com, Hannes Hauswedell, freebsd...@freebsd.org

----- Original Message -----
> On Wed, Sep 01, 2010 at 11:46:30AM -0400, Rick Macklem wrote:
> > >
> > > I am experiencing similar issues with newnfs:
> > >
> > > 1) I have two clients that each get around 0.5MiB/s to 2.6MiB/s
> > > reading
> > > from the NFS4-share on Gbit-Lan
> > >
> > > 2) Mounting with -t newnfs -o nfsv3 results in no performance gain
> > > whatsoever.
> > >
> > > 3) Mounting with -t nfs results in 58MiB/s ! (Netcat has similar
> > > performance) → not a hardware/driver issue from my pov
> >
> > Ok, so it does sound like an issue in the experimental client and
> > not NFSv4. For the most part, the read code is the same as
> > the regular client, but it hasn't been brought up-to-date
> > with recent changes.
>
> Do you (or will you soon) have some patches I/we could test? I'm
> willing to try anything to avoid mounting ten or so subdirectories in
> each of my mount points.
>

One other thing you could do is run this in a loop while you have a
slow read running. The client threads must be blocked somewhere a
lot if the read rate is so slow. (Then take a look at "xxx" and please
email it to me too.)

while :; do
    ps axHl >> xxx
    sleep 1
done

rick

Rick Macklem

Sep 12, 2010, 11:45:53 AM9/12/10
to rick-fre...@kiwi-computer.com, Hannes Hauswedell, freebsd...@freebsd.org
> On Wed, Sep 01, 2010 at 11:46:30AM -0400, Rick Macklem wrote:
> > >
> > > I am experiencing similar issues with newnfs:
> > >
> > > 1) I have two clients that each get around 0.5MiB/s to 2.6MiB/s
> > > reading
> > > from the NFS4-share on Gbit-Lan
> > >
> > > 2) Mounting with -t newnfs -o nfsv3 results in no performance gain
> > > whatsoever.
> > >
> > > 3) Mounting with -t nfs results in 58MiB/s ! (Netcat has similar
> > > performance) → not a hardware/driver issue from my pov
> >
> > Ok, so it does sound like an issue in the experimental client and
> > not NFSv4. For the most part, the read code is the same as
> > the regular client, but it hasn't been brought up-to-date
> > with recent changes.
>
> Do you (or will you soon) have some patches I/we could test? I'm
> willing to try anything to avoid mounting ten or so subdirectories in
> each of my mount points.
>
> > One thing you could try is building a kernel without SMP enabled
> > and see if that helps? (I only have single core hardware, so I won't
> > see any SMP races.) If that helps, I can compare the regular vs
> > experimental client for smp locking in the read stuff.
>
> I can try disabling SMP too. Should that really matter, if you're not
> even pegging one CPU? The locks shouldn't have *that* much overhead...
>
> -- Rick C. Petty
>
Just FYI, I asked folks to run read tests on the clients (over on
freebsd-fs). So far, I've only gotten one response, but they didn't
see the problem you are (they did see a factor of 2 slower, but it
is still 50Mbytes/sec). Maybe you can take a look at their email
and compare his client with yours? His message is:
http://docs.FreeBSD.org/cgi/mid.cgi?01NRSE7GZJEC0022AD

Btw, if anyone who didn't see the posting on freebsd-fs and would
like to run a quick test, it would be appreciated.
Basically, do both kinds of mount using a FreeBSD 8.1 or later client
and then read a greater than 100Mbyte file with dd.

# mount -t nfs -o nfsv3 <server>:/path /<mnt-path>
- cd anywhere in mount that has > 100Mbyte file
# dd if=<file> of=/dev/null bs=1m
# umount /<mnt-path>

Then repeat with
# mount -t newnfs -o nfsv3 <server>:/path /<mnt-path>

and post the results along with the client machine's info
(machine arch/# of cores/memory/net interface used for NFS traffic).
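
The whole A/B run can be wrapped in a throwaway script; every value below
is a placeholder, and DRYRUN=echo makes it print the commands instead of
running them until you clear it:

```shell
#!/bin/sh
# A/B read test: regular (nfs) vs. experimental (newnfs) client, NFSv3 both times
SERVER=server.example.com EXPORT=/path FILE=big.file MNT=/mnt   # placeholders
DRYRUN=echo                       # set DRYRUN= to actually mount and read
for fstype in nfs newnfs; do
    $DRYRUN mount -t "$fstype" -o nfsv3 "$SERVER:$EXPORT" "$MNT"
    $DRYRUN dd if="$MNT/$FILE" of=/dev/null bs=1m
    $DRYRUN umount "$MNT"
done
```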

Thanks in advance to anyone who runs the test, rick

Oliver Fromme

Sep 13, 2010, 11:05:24 AM9/13/10
to freebsd...@freebsd.org, freeb...@freebsd.org, rmac...@uoguelph.ca
Rick Macklem wrote:
> Btw, if anyone who didn't see the posting on freebsd-fs and would
> like to run a quick test, it would be appreciated.
> Basically, do both kinds of mount using a FreeBSD 8.1 or later client
> and then read a greater than 100Mbyte file with dd.
>
> # mount -t nfs -o nfsv3 <server>:/path /<mnt-path>
> - cd anywhere in mount that has > 100Mbyte file
> # dd if=<file> of=/dev/null bs=1m
> # umount /<mnt-path>
>
> Then repeat with
> # mount -t newnfs -o nfsv3 <server>:/path /<mnt-path>
>
> and post the results along with the client machine's info
> (machine arch/# of cores/memory/net interface used for NFS traffic).
>
> Thanks in advance to anyone who runs the test, rick

Ok ...

NFS server:
- FreeBSD 8.1-PRERELEASE-20100620 i386
- intel Atom 330 (1.6 GHz dual-core with HT --> 4-way SMP)
- 4 GB RAM
- re0: <RealTek 8168/8111 B/C/CP/D/DP/E PCIe Gigabit Ethernet>

NFS client:
- FreeBSD 8.1-STABLE-20100908 i386
- AMD Phenom II X6 1055T (2.8 GHz + "Turbo Core", six-core)
- 4 GB RAM
- re0: <RealTek 8168/8111 B/C/CP/D/DP/E PCIe Gigabit Ethernet>

The machines are connected through a Netgear GS108T
gigabit ethernet switch.

I umounted and re-mounted the NFS path after every single
dd(1) command, so the data actually comes from the server
instead of from the local cache. I also made sure that
the file was in the cache on the server, so the server's
disk speed is irrelevant.

Testing with "mount -t nfs":

183649990 bytes transferred in 2.596677 secs (70725002 bytes/sec)
183649990 bytes transferred in 2.578746 secs (71216779 bytes/sec)
183649990 bytes transferred in 2.561857 secs (71686277 bytes/sec)
183649990 bytes transferred in 2.629028 secs (69854708 bytes/sec)
183649990 bytes transferred in 2.535422 secs (72433702 bytes/sec)

Testing with "mount -t newnfs":

183649990 bytes transferred in 5.361544 secs (34253192 bytes/sec)
183649990 bytes transferred in 5.401471 secs (33999996 bytes/sec)
183649990 bytes transferred in 5.052138 secs (36350946 bytes/sec)
183649990 bytes transferred in 5.311821 secs (34573829 bytes/sec)
183649990 bytes transferred in 5.537337 secs (33165760 bytes/sec)

So, nfs is roughly twice as fast as newnfs, indeed.

Best regards
Oliver

--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606, Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758, Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr: http://www.secnetix.de/bsd

"A language that doesn't have everything is actually easier
to program in than some that do."
-- Dennis M. Ritchie

Rick Macklem

Sep 13, 2010, 11:16:47 AM
to freebsd...@freebsd.org, freeb...@freebsd.org, rmac...@uoguelph.ca
Thanks for doing the test. I think I can find out what causes the
factor of 2 someday. What is really weird is that some people see
several orders of magnitude slower (a few Mbytes/sec).

Your case was also useful, because you are using the same net
interface/driver as the original report of a few Mbytes/sec, so it
doesn't appear to be an re problem.

Have a good week, rick

Rick C. Petty

Sep 13, 2010, 11:40:00 AM
to Rick Macklem, freeb...@freebsd.org, freebsd...@freebsd.org
On Mon, Sep 13, 2010 at 11:15:34AM -0400, Rick Macklem wrote:
> >
> > instead of from the local cache. I also made sure that
> > the file was in the cache on the server, so the server's
> > disk speed is irrelevant.
> >

<snip>

> > So, nfs is roughly twice as fast as newnfs, indeed.

Hmm, I have the same network switch as Oliver, and I wasn't caching the
file on the server before. When I cache the file on the server, I get
about 1 MiB/s faster throughput, so that doesn't seem to make the
difference to me (but with higher throughputs, I would imagine it would).

> Thanks for doing the test. I think I can find out what causes the
> factor of 2 someday. What is really weird is that some people see
> several orders of magnitude slower (a few Mbytes/sec).
>
> Your case was also useful, because you are using the same net
> interface/driver as the original report of a few Mbytes/sec, so it
> doesn't appear to be an re problem.

I believe I said something to that effect. :-P

The problem I have is that the magnitude of throughput varies randomly.
Sometimes I can repeat the test and see 3-4 MB/s. Then my server's
motherboard failed last week so I swapped things around and now I have 9-10
MB/s on the same client (but using 100Mbit interface instead of gigabit, so
those speeds make sense).
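[Editorial sanity check on those figures; the framing-overhead percentage is a ballpark assumption, not a measured value:]

```python
# A 100 Mbit/s link carries at most 12.5 MB/s raw; after Ethernet/IP/TCP
# framing, roughly 94% of that is usable payload (rough assumption).
link_bits_per_s = 100_000_000                 # Fast Ethernet
raw_bytes_per_s = link_bits_per_s / 8         # 12.5 MB/s upper bound
payload_bytes_per_s = raw_bytes_per_s * 0.94  # minus framing overhead
print(f"usable payload ceiling: {payload_bytes_per_s / 1e6:.1f} MB/s")
# So 9-10 MB/s over NFS is close to wire speed for that interface.
```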

One thing I noticed is the lag seems to have disappeared after the reboots.
Another thing I had to change was that I was using an NFSv3 mount for /home
(with the v3 client, not the experimental v3/v4 client) and now I'm using
NFSv4 mounts exclusively. Too much hardware changed because of that board
failing (AHCI was randomly dropping disks, and it got to the point that it
wouldn't pick up drives after a cold start and then the board failed to
POST 11 of 12 times), so I haven't been able to reliably reproduce any
problems. I also had to reboot the "bad" client because of the broken
NFSv3 mountpoints, and the server was auto-upgraded to a newer 8.1-stable
(I run "make buildworld kernel" regularly, so any reboot automatically
brings up a newer kernel).

There's definite evidence that the newnfs mounts are slower than plain nfs,
and sometimes orders of magnitude slower (as others have shown). But the
old nfs is so broken in other ways that I'd rather have slower but more stable.
Thanks again for all your help, Rick!

-- Rick C. Petty

Goran Lowkrantz

Sep 13, 2010, 12:56:41 PM
to Rick Macklem, rick-fre...@kiwi-computer.com, Hannes Hauswedell, freebsd...@freebsd.org
--On September 12, 2010 11:44:40 -0400 Rick Macklem <rmac...@uoguelph.ca>
wrote:

>> On Wed, Sep 01, 2010 at 11:46:30AM -0400, Rick Macklem wrote:
<snip>

My results seem to confirm a factor of two (or 1.5), but throughput is stable:
new nfs nfsv4
3999969792 bytes transferred in 71.932692 secs (55607119 bytes/sec)
3999969792 bytes transferred in 66.806218 secs (59874214 bytes/sec)
3999969792 bytes transferred in 65.127972 secs (61417079 bytes/sec)
3999969792 bytes transferred in 64.493585 secs (62021204 bytes/sec)

old nfs nfsv3
3999969792 bytes transferred in 42.290365 secs (94583478 bytes/sec)
3999969792 bytes transferred in 42.135682 secs (94930700 bytes/sec)
3999969792 bytes transferred in 41.404841 secs (96606332 bytes/sec)
3999969792 bytes transferred in 41.461210 secs (96474989 bytes/sec)

new nfs nfsv3
3999969792 bytes transferred in 63.172592 secs (63318121 bytes/sec)
3999969792 bytes transferred in 64.149324 secs (62354044 bytes/sec)
3999969792 bytes transferred in 62.447537 secs (64053284 bytes/sec)
3999969792 bytes transferred in 57.203868 secs (69924813 bytes/sec)
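[Editorial check: averaging the runs above confirms the "factor of two (or 1.5)" reading:]

```python
# bytes/sec figures copied from the runs above
new_nfs_v4 = [55607119, 59874214, 61417079, 62021204]
old_nfs_v3 = [94583478, 94930700, 96606332, 96474989]
new_nfs_v3 = [63318121, 62354044, 64053284, 69924813]

def avg(xs):
    return sum(xs) / len(xs)

ratio_v4 = avg(old_nfs_v3) / avg(new_nfs_v4)
ratio_v3 = avg(old_nfs_v3) / avg(new_nfs_v3)
print(f"old v3 / new v4: {ratio_v4:.2f}")  # ~1.60
print(f"old v3 / new v3: {ratio_v3:.2f}")  # ~1.47
```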

Client:
FreeBSD 8.1-STABLE #200: Sun Sep 12 12:03:25 CEST 2010
ro...@skade.glz.hidden-powers.com:/usr/obj/usr/src/sys/GENERIC amd64
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) 64 X2 Dual Core Processor 5000+ (2600.26-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x60fb2  Family = f  Model = 6b  Stepping = 2

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
    ether 00:1b:21:2e:7d:3c
    inet 10.255.253.3 netmask 0xffffff00 broadcast 10.255.253.255
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active

Server:
FreeBSD 8.1-STABLE #74: Sun Sep 5 18:47:12 CEST 2010
ro...@midgard.glz.hidden-powers.com:/usr/obj/usr/src/sys/SERVER amd64
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Phenom(tm) 9550 Quad-Core Processor (2210.08-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x100f23  Family = 10  Model = 2  Stepping = 3

re0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=3898<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC>
    ether 00:1f:d0:59:d8:e2
    inet 10.255.253.1 netmask 0xffffff00 broadcast 10.255.253.255
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active

Network:
Systems connected via two Netgear GS108T, one system to each switch, the
switches connected via TP cable.

Patches:
stable-8-v15.patch
zfs_metaslab_v2.patch
zfs_abe_stat_rrwlock.patch
arc.c.9.patch
r211970.patch

Cheers,
Göran

---
"There is hopeful symbolism in the fact that flags do not wave in a vacuum."
-- Arthur C. Clarke