
Re: Unstable NFS on recent CURRENT


Ronald Klop

Mar 7, 2016, 9:32:11 AM
On Sun, 06 Mar 2016 02:57:03 +0100, Paul Mather <pa...@gromit.dlib.vt.edu>
wrote:

> On my BeagleBone Black running 11-CURRENT (r296162) lately I have been
> having trouble with NFS. I have been doing a buildworld and buildkernel
> with /usr/src and /usr/obj mounted via NFS. Recently, this process has
> resulted in the buildworld failing at some point, with a variety of
> errors (Segmentation fault; Permission denied; etc.). Even a "ls -alR"
> of /usr/src doesn't manage to complete. It errors out thus:
>
> =====
> [[...]]
> total 0
> ls: ./.svn/pristine/fe: Permission denied
>
> ./.svn/pristine/ff:
> total 0
> ls: ./.svn/pristine/ff: Permission denied
> ls: fts_read: Permission denied
> =====
>
> On the console, I get the following:
>
> newnfs: server 'chumby.chumby.lan' error: fileid changed. fsid
> 94790777:a4385de: expected fileid 0x4, got 0x2. (BROKEN NFS SERVER OR
> MIDDLEWARE)
>
>
> I am using a FreeBSD/amd64 10.3-PRERELEASE (r296412) as the NFS server.
> On the BeagleBone Black, I am mounting /usr/src and /usr/obj via
> /etc/fstab as follows:
>
> chumby.chumby.lan:/build/src/head /usr/src nfs rw,nfsv4 0 0
> chumby.chumby.lan:/build/obj/bbb /usr/obj nfs rw,nfsv4 0 0
>
>
> /build/src/head and /build/obj/bbb are both ZFS file systems.
>
> Has anyone else encountered this? It has only started happening
> recently for me, it seems. Prior to this, I have been able to do a
> buildworld and buildkernel successfully over NFS.
>
> Cheers,
>
> Paul.

I have cc'd this to freebsd-fs for you.

Ronald.
_______________________________________________
freeb...@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-fs
To unsubscribe, send any mail to "freebsd-fs-...@freebsd.org"

Rick Macklem

Mar 7, 2016, 9:56:11 PM
Is it possible that a ZFS file system has gotten to the point where the
i-node# exceeds 32 bits? ZFS does support more than 32 bits for i-node#s,
but FreeBSD does not (it truncates to the low-order 32 bits).
I know diddly about ZFS, so I don't know whether you actually have to create
more than 4 billion files to get the i-node# to exceed 32 bits, or whether
it can happen sooner.

There has been work done on making ino_t 64 bits, but it hasn't made it
into FreeBSD-current and I have no idea when it might.

If you could try a build on newly created file systems (or UFS ones
instead of ZFS), that would tell you if the above might be the problem.
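
To make the truncation concern concrete, here is a minimal sketch in sh. The helper name truncate_ino is hypothetical, and it assumes the shell does 64-bit arithmetic. It only shows how two different 64-bit i-node numbers can collapse to the same low-order 32 bits; it is not the kernel's code.

```shell
# Hypothetical helper: keep only the low-order 32 bits of an i-node number,
# mimicking the truncation described above (assumes 64-bit shell arithmetic).
truncate_ino() {
    echo $(( $1 & 0xFFFFFFFF ))
}

truncate_ino 4294967300   # 2^32 + 4, truncates to 4
truncate_ino 4            # already fits in 32 bits, stays 4
```

Both calls print 4, so after truncation the two files would be indistinguishable by i-node number.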

rick

Paul Mather

Mar 8, 2016, 9:01:52 AM
I don't think I have that big of a ZFS pool (it's 2 TB). :-)

It doesn't seem that there are excessive numbers of inodes, and the counts match up between the NFS client and server sides.

In the information below, chumby is the NFS server and beaglebone the client:

pmather@beaglebone:~ % mount
/dev/mmcsd0s2a on / (ufs, local, noatime, soft-updates)
devfs on /dev (devfs, local)
/dev/mmcsd0s1 on /boot/msdos (msdosfs, local, noatime)
tmpfs on /tmp (tmpfs, local)
tmpfs on /var/log (tmpfs, local)
tmpfs on /var/tmp (tmpfs, local)
chumby.chumby.lan:/build/src/head on /usr/src (nfs, nfsv4acls)
chumby.chumby.lan:/build/obj/bbb on /usr/obj (nfs, nfsv4acls)
pmather@beaglebone:~ % df -i /usr/src /usr/obj
Filesystem 1K-blocks Used Avail Capacity iused ifree %iused Mounted on
chumby.chumby.lan:/build/src/head 2097152 1344484 752668 64% 147835 1505336 9% /usr/src
chumby.chumby.lan:/build/obj/bbb 530875884 1949364 528926520 0% 70814 1057853040 0% /usr/obj


paul@chumby:/home/paul> df -i /build/src/head /build/obj/bbb
Filesystem 1K-blocks Used Avail Capacity iused ifree %iused Mounted on
zroot/SHARED/build/src/head 2097152 1344484 752668 64% 147835 1505336 9% /build/src/head
zroot/SHARED/build/obj/bbb 530876268 1949364 528926904 0% 70814 1057853808 0% /build/obj/bbb


On the NFS client system, these are the only NFS-related settings I have in /etc/rc.conf:

nfsuserd_enable="YES"
nfscbd_enable="YES"


Would you recommend I try it with nfscbd_enable="NO"?

I will try NFS from other clients to see whether it's just this FreeBSD/arm system that's having problems.

Cheers,

Paul.



Rick Macklem

Mar 8, 2016, 7:21:20 PM
Paul Mather wrote:
> >>> On the console, I get the following:
> >>>
> >>> newnfs: server 'chumby.chumby.lan' error: fileid changed. fsid
> >>> 94790777:a4385de: expected fileid 0x4, got 0x2. (BROKEN NFS SERVER OR
> >>> MIDDLEWARE)
> >>>
I have no idea how the fileid (i-node# in old terminology) could change.
I haven't heard anyone else reporting this, so I can't explain it (except
for the case where the i-node# exceeds 32 bits, which you note below isn't likely).
Unless you have enabled delegations, nfscbd isn't needed, so you can try
running without it. However, I doubt it will make any difference. (Callbacks
are only used for delegations in NFSv4.0. As such, it doesn't matter whether
they are working unless delegations are enabled.)

> I will try NFS from other clients to see whether it's just this FreeBSD/arm
> system that's having problems.
>
There probably are few people using NFSv4 on arm (you may be the only one),
so I wouldn't be surprised if there is some size/alignment bug in the client
that arm finds.

Trying non-arm clients may certainly be useful. You could also try an NFSv3 mount,
since I would think others are using NFSv3 on arm.
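
As a sketch of what the NFSv3 test could look like: the helper below (nfs_fstab_entry is a hypothetical name) just prints /etc/fstab entries like the ones quoted earlier in the thread, with the protocol option swapped. It assumes the same server, exports, and mount points as the original NFSv4 entries.

```shell
# Hypothetical helper: print an fstab(5) entry for an NFS mount.
#   $1 = server:/export   $2 = mount point   $3 = nfsv3 or nfsv4
nfs_fstab_entry() {
    printf '%s\t%s\tnfs\trw,%s\t0\t0\n' "$1" "$2" "$3"
}

# Entries for the two file systems from the original report, using NFSv3.
nfs_fstab_entry chumby.chumby.lan:/build/src/head /usr/src nfsv3
nfs_fstab_entry chumby.chumby.lan:/build/obj/bbb  /usr/obj nfsv3

# A one-off test mount without editing fstab might look like this (as root):
#   mount -t nfs -o nfsv3 chumby.chumby.lan:/build/src/head /usr/src
```

If the failures persist over NFSv3, that would point away from an NFSv4-specific client bug on arm.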

Good luck with it, rick

Rick Macklem

Mar 8, 2016, 7:49:48 PM
Paul Mather wrote:
> >>> On the console, I get the following:
> >>>
> >>> newnfs: server 'chumby.chumby.lan' error: fileid changed. fsid
> >>> 94790777:a4385de: expected fileid 0x4, got 0x2. (BROKEN NFS SERVER OR
> >>> MIDDLEWARE)
> >>>
Oh, I had forgotten this. Here's the comment related to this error.
(about line#445 in sys/fs/nfsclient/nfs_clport.c):
446 * BROKEN NFS SERVER OR MIDDLEWARE
447 *
448 * Certain NFS servers (certain old proprietary filers ca.
449 * 2006) or broken middleboxes (e.g. WAN accelerator products)
450 * will respond to GETATTR requests with results for a
451 * different fileid.
452 *
453 * The WAN accelerator we've observed not only serves stale
454 * cache results for a given file, it also occasionally serves
455 * results for wholly different files. This causes surprising
456 * problems; for example the cached size attribute of a file
457 * may truncate down and then back up, resulting in zero
458 * regions in file contents read by applications. We observed
459 * this reliably with Clang and .c files during parallel build.
460 * A pcap revealed packet fragmentation and GETATTR RPC
461 * responses with wholly wrong fileids.
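
One rough way to check for a fileid mismatch by hand is to compare the i-node number of the same file as seen on the client and on the server. The sketch below uses a hypothetical helper, fileid_of, and assumes the fileid the client sees normally equals the server-side i-node number. It prints the value in hex so it can be compared with the console message.

```shell
# Hypothetical helper: print a path's i-node number (the NFS fileid) in hex.
fileid_of() {
    # "ls -di" prints "<inode> <path>"; keep only the number.
    ino=$(ls -di -- "$1" | awk '{print $1}')
    printf '0x%x\n' "$ino"
}

fileid_of /                     # harmless demonstration on the local root

# On the client:  fileid_of /usr/src
# On the server:  fileid_of /build/src/head
```

If the two values differ for the same directory, that would be consistent with the "fileid changed" message. If they match, the mismatch is more likely transient.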

If you can connect the client->server with a simple switch (or just an RJ45 cable), it
might be worth testing that way. (I don't recall the name of the middleware product, but
I think it was shipped by one of the major switch vendors. I also don't know if the product
supports NFSv4?)

rick

Paul Mather

Mar 9, 2016, 11:13:05 AM
Currently, the client is connected to the server via a dumb gigabit switch, so it is already fairly direct.

As for the above error, it appeared on the console only once. (Sorry if I made it sound like it appears every time.)

I just tried another buildworld attempt via NFS and it failed again. This time, I get this on the BeagleBone Black console:

nfs_getpages: error 13
vm_fault: pager read error, pid 5401 (install)


The other thing I have noticed is that if I induce heavy load on the NFS server (e.g., by starting a Poudriere bulk build), that provokes the client to crash much more readily. For example, I started an NFS buildworld on the BeagleBone Black, and it seemed to be chugging along nicely. The moment I kicked off a Poudriere build update of my packages on the NFS server, it crashed the buildworld on the NFS client.

I have had problems with swap on FreeBSD/arm before. Swapping to a file does not appear to work for me. As a result, I switched to swapping to a partition on the SD card. Maybe this is unreliable, too?

Cheers,

Paul.

Rick Macklem

Mar 9, 2016, 9:00:54 PM
Error 13 is EACCES and could be caused by what I mention below: any mount of a
file system on the server can trigger it, unless "-S" is specified as a flag
for mountd.

>
> The other thing I have noticed is that if I induce heavy load on the NFS
> server---e.g., by starting a Poudriere bulk build---then that provokes the
> client to crash much more readily. For example, I started a NFS buildworld
> on the BeagleBone Black, and it seemed to be chugging along nicely. The
> moment I kicked off a Poudriere build update of my packages on the NFS
> server, it crashed the buildworld on the NFS client.
>
Try adding "-S" to mountd_flags on the server. Any time file systems are mounted
(and Poudriere likes to do that, I am told), mount sends a SIGHUP to mountd to
reload /etc/exports. While /etc/exports is being reloaded, there will be access
errors for mounts (that are temporarily not exported) unless you specify "-S"
(which makes mountd suspend the nfsd threads during the reload of /etc/exports).
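
A minimal sketch of adding the flag on the server, assuming mountd is started by the stock rc.d script and configured through mountd_flags in /etc/rc.conf. The helper add_S_flag is hypothetical, and "-r" is only an example of a value that might already be set.

```shell
# Hypothetical helper (not part of FreeBSD): append -S to a flags string
# unless it is already present.
add_S_flag() {
    case " $1 " in
        *" -S "*) printf '%s\n' "$1" ;;
        *)        printf '%s\n' "${1:+$1 }-S" ;;
    esac
}

add_S_flag "-r"      # prints: -r -S
add_S_flag "-r -S"   # prints: -r -S (unchanged)

# On the NFS server, as root (assumes sysrc(8) is available):
#   sysrc mountd_flags="$(add_S_flag "$(sysrc -n mountd_flags)")"
#   service mountd restart
```

Setting mountd_flags by hand in /etc/rc.conf and restarting mountd should have the same effect.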

rick

Paul Mather

Mar 10, 2016, 9:29:46 AM
Bingo! I think we may have a winner. I added that flag to mountd_flags on the server and the "instability" appears to have gone away.

It may be that all along the NFS problems on the client just coincided with Poudriere runs on the server. I build custom packages for my local machines using Poudriere so I use it quite a lot. Maybe the Poudriere port should come with a warning at install to those using NFS that it may provoke disruption and suggest the addition of "-S"? (Alternatively, maybe "-S" could become a default for mountd_flags? Is there a downside from using it that means making it a default option is unsuitable?)
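
To check whether client-side failures really line up with Poudriere runs, a simple probe on the client can log when an NFS-mounted directory becomes inaccessible. This is only a sketch. The helper name probe_once and the one-second interval are assumptions, and /usr/src is the mount point from the original report.

```shell
# Hypothetical helper: report whether a directory can currently be listed.
probe_once() {
    if ls "$1" >/dev/null 2>&1; then
        echo "ok $1"
    else
        echo "FAIL $1"
    fi
}

probe_once /    # sanity check on a local directory

# On the NFS client, leave this running while starting a build on the server:
#   while :; do printf '%s ' "$(date '+%H:%M:%S')"; probe_once /usr/src; sleep 1; done
```

Timestamps of any FAIL lines can then be compared with when file systems were mounted or unmounted on the server.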

Anyway, many, many thanks for all the help, Rick. I'll keep monitoring my BeagleBone Black, but it looks for now as if this has solved the NFS "instability."

Rick Macklem

Mar 10, 2016, 8:08:40 PM
Well, the first time I proposed "-S" the collective felt it wasn't the appropriate
solution to the "export reload" problem. The second time, the "collective" agreed
that it was ok as a non-default option. (Part of this story was an alternative to
mountd called nfse which did update exports atomically, but it never made it into
FreeBSD.) The only downside to making it a default is that it does change behaviour
and some might consider that a POLA violation. Others would consider it just a bug fix.
There was one report of long delays before exports got updated on a very busy server.
(I have a one line patch that fixes this, but that won't be committed into FreeBSD-current
until April.)

Now that "-S" has been in FreeBSD for a couple of years, I am planning on asking
the "collective" (I usually post these kinds of things on freebsd-fs@) to make it the
default in FreeBSD-current, because this problem seems to crop up fairly frequently.
I will probably post w.r.t. this in April when I can again do svn commits.

I only recently found out that Poudriere does mounts and causes this problem.
I may also commit a man page update (which can be MFC'd) that mentions that if
you are using Poudriere, you want this flag.
Having the same thing mentioned in the Poudriere port install might be nice, too.

Thanks for testing this, rick

Marc Goroff

Mar 10, 2016, 8:51:38 PM
I was amazed when I discovered the "-S" option last year and even more
amazed that it wasn't the default. The lack of -S caused us enormous
problems in our production ZFS environment last year and nearly caused
us to abandon FreeBSD altogether. Every time we'd provision a new ZFS
file system from the zpool, all our NFS clients would start throwing
alerts due to I/O errors! I'm unclear why it would be considered
acceptable to have a reload of an exports file cause spurious I/O errors
for NFS clients. I'd think such incorrect behavior would be clearly
considered a blatant POLA violation. Causing a delay and returning
correct data is far superior to incorrectly returning access errors that
screw up well behaved applications on NFS clients. NFS is capable of
handling network delays. Why choose instead to return invalid I/O errors?

IMHO, -S should be the default. I'm unable to think of any good reason
for it to be optional and the lack of it as the default behavior has
clearly caused lots of needless pain and suffering in the user community.

Marc

Julian Elischer

Mar 13, 2016, 7:31:59 PM
I agree with this. ZFS has changed things.
With ZFS file systems coming and going, we need this.

Rick, if you send me a patch, I'm looking for a commit to stave
off the grim (commit bit) reaper :-)