NFS - only one client at a time can read files

David Brown

Sep 20, 2013, 10:07:05 AM
I have a strange problem with NFS.

I have NFS serving enabled on my Fedora 14 workstation, exporting a
directory with these options:

(ro,no_root_squash,sync,no_subtree_check)


I have an embedded Linux card that is getting its kernel and rootfs from
this export over NFS. The card then copies a subdirectory of this
export onto a file system mounted from flash, using rsync (roughly
"rsync -av /unpacked/ /mnt/").

When I run with one card, this works fine.

When I have multiple cards connected, each one gets its kernel and
mounts its rootfs fine, but it seems that only one client card can read
at a time. If I watch the progress of the rsyncs, I can see one card
will run for a bit, then stop and complain about nfs timeouts. Another
card will run for a bit, before it too stops with a timeout. This goes
back and forth - at any given time, only one client is successfully reading.


Any ideas as to what might be wrong, or what I can check, would be
appreciated.


David

unruh

Sep 20, 2013, 11:15:01 AM
A disk's read heads all move together on one arm. It cannot be reading in two places at once.


Chris Davies

Sep 20, 2013, 11:54:20 AM
David Brown <da...@westcontrol.removethisbit.com> wrote:
> I have a strange problem with NFS.

> If I watch the progress of the rsyncs, I can see one card will run for
> a bit, then stop and complain about nfs timeouts. Another card will
> run for a bit, before it too stops with a timeout. This goes back and
> forth - at any given time, only one client is successfully reading.

This sounds like you don't have anywhere near enough nfsd threads
running on your NFS server.

On my Debian box there's a file /etc/default/nfs-kernel-server that
defines the number of kernel nfsd "processes" to start at boot time. (I
don't know where your equivalent configuration file will live.) The
default on my system is 8, but you probably want to increase it to 32
or even 64.

To test the theory, count the number of nfsd processes already running:

    ps -ef | grep -w '[n]fsd' | wc -l

And then increase it (as root), for example from 8 to 32:

    rpc.nfsd 32

If this works, you can configure it for boot-time.
Chris
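
As a rough sketch of the check described above: on Linux systems where the nfsd filesystem is mounted, the running thread count can also be read from /proc/fs/nfsd/threads, which avoids counting ps output. The function name and its optional file argument are illustrative only and do not come from any package.

```shell
#!/bin/sh
# Print the number of nfsd threads recorded in a "threads" file.
# With no argument it reads /proc/fs/nfsd/threads, which only exists
# while the kernel NFS server is loaded and the nfsd filesystem is mounted.
nfsd_thread_count() {
    ntc_file="${1:-/proc/fs/nfsd/threads}"
    if [ -r "$ntc_file" ]; then
        cat "$ntc_file"
    else
        echo "cannot read $ntc_file (is the NFS server running?)" >&2
        return 1
    fi
}

# Usage on the server (the second command needs root):
#   nfsd_thread_count      # current number of threads, e.g. 8
#   rpc.nfsd 32            # raise the count until the next restart
```

For a boot-time setting, the variable is usually RPCNFSDCOUNT in /etc/default/nfs-kernel-server (Debian) or /etc/sysconfig/nfs (Fedora of that era), but check the file on your own system before relying on that name.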

Tauno Voipio

Sep 20, 2013, 1:37:13 PM
This feels like rsync is taking exclusive access to the files concerned,
to prevent shooting at a moving target.

For boot disk copying, an image file and dd may be better.

--

Tauno Voipio

Chris Davies

Sep 21, 2013, 9:00:57 AM
Tauno Voipio <tauno....@notused.fi.invalid> wrote:
> This feels like rsync is taking exclusive access to the files concerned,
> to prevent shooting at a moving target.

I've never seen rsync grab exclusive access to files. It could more likely
occur over SMB/CIFS, which provides file locking by default, but not
over NFS.

Chris

Tauno Voipio

Sep 21, 2013, 11:18:52 AM
Thanks for the correction. I was too lazy to wade through the sources.

--

-Tauno

David Brown

Sep 23, 2013, 3:28:35 AM
Disks are not the bottleneck. The whole shared area is small enough to
be in ram cache on the server.


David Brown

Sep 23, 2013, 3:34:16 AM
On 21/09/13 15:00, Chris Davies wrote:
> Tauno Voipio <tauno....@notused.fi.invalid> wrote:
>> This feels like rsync is taking exclusive access to the files concerned,
>> to prevent shooting at a moving target.

That would sound right - except that I agree with Chris's point
that rsync does not lock files in any way. (I've often seen large
rsyncs end with a message saying that some files changed during the
rsync run.)

David Brown

Sep 23, 2013, 3:44:24 AM
This sounds like a possible explanation. A quick check shows that I
have 8 nfsd threads running. Rsync almost certainly needs several
connections while it is working, as it runs through the source tree to
see what it should be copying - contention for the nfs connection
threads could be the cause.

I'll have to translate your commands here from "Debian" into "Fedora
14", but now that I know what I am looking for, Google can help with the
translation. Later on, this whole thing will run on a Debian server -
but at the moment I am prototyping on my (outdated) Fedora desktop.

Additionally, the mere act of talking about the problem has suggested an
alternative solution. I am copying a bunch of data from one computer to
another computer using rsync. Why not just use an rsync server? (The
historical answer is that the copy was originally a "cp -a" rather than
"rsync -a".)

Thanks for the help,

David




David Brown

Sep 23, 2013, 5:51:45 AM
I've now changed the thread count in /etc/sysconfig/nfs to 64 and
restarted the NFS server. It made no difference that I could see, but
my testing was done with a copy to tmpfs on the clients rather than to
the NAND filesystem (since that takes 20 seconds rather than 12
minutes). So I am not convinced that the nfsd threads are the whole
answer, but I can't yet rule them out. And it should certainly do no
harm to leave them at 64.

In the end, I am copying a compressed tarball from the server onto the
client's tmpfs with a simple "cp" on NFS - this takes about 3 seconds.
It will not matter if it takes x * 3 seconds for "x" cards in parallel.
Unpacking these tarballs into the NAND is now an entirely local
operation on the cards, and will therefore be free from any issues with
the server or network. It is also faster even for one card.
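
The tarball approach described above might look roughly like this on a card. It is a sketch under stated assumptions: the tarball is gzip-compressed, and the function name and the paths in the usage comment are placeholders rather than the real layout.

```shell
#!/bin/sh
# stage_and_unpack SRC_TARBALL STAGING_DIR TARGET_DIR
# Copy a gzip-compressed tarball from the NFS mount to local staging
# (tmpfs), then unpack it into the target. Only the cp touches the
# network; the slow unpack into flash is entirely local.
stage_and_unpack() {
    if [ "$#" -ne 3 ]; then
        echo "usage: stage_and_unpack SRC_TARBALL STAGING_DIR TARGET_DIR" >&2
        return 2
    fi
    sau_copy="$2/$(basename "$1")"
    cp "$1" "$sau_copy" || return 1
    tar -xzf "$sau_copy" -C "$3" || return 1
    rm -f "$sau_copy"
}

# Hypothetical usage (all three paths are assumptions):
#   stage_and_unpack /nfsroot/images/rootfs.tar.gz /tmp /mnt
```

Keeping the copy and the unpack as separate steps means that a stalled NFS read can only affect the short cp, not the long write to NAND.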

Other than that, I will also do some testing of the network setup here.
The cards are currently running across our main LAN, which is well
overdue for a re-organisation after many years of "organic" growth.

But I am happy for now with the tarball copying solution.

Chris Davies

Sep 27, 2013, 5:40:32 AM
David Brown <da...@westcontrol.removethisbit.com> wrote:
> Additionally, the mere act of talking about the problem has suggested an
> alternative solution. I am copying a bunch of data from one computer to
> another computer using rsync. Why not just use an rsync server? (The
> historical answer is that the copy was originally a "cp -a" rather than
> "rsync -a".)

If your file transfer is network-bound then rsync running as two
separate processes (client and server) should be faster than a single
process accessing a remote filesystem, because single-process rsync
falls back to a basic whole-file copy. If the bottleneck is elsewhere
it won't help.

Chris

David Brown

Oct 10, 2013, 10:28:04 AM
I figured out my problem - I'm noting it here in case anyone ever reads
this thread in the archives.

It turned out that there was a configuration fault in the rootfs I had
mounted, leading to all the cards getting the same fixed IP address
shortly after root was mounted. Which card was in contact with the
server at any moment therefore depended on which one answered the ARP
requests first - I'm surprised everything worked in the end. The fix
was obviously quite simple once I had found the problem (thanks to
wireshark and a managed switch with port mirroring).
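
For anyone chasing a similar fault, one way to spot the symptom is to look for an IP address claimed by more than one MAC address. The sketch below assumes you have already extracted "IP MAC" pairs, one per line, from a capture or from ARP replies; the function name and the input format are made up for illustration.

```shell
#!/bin/sh
# Read "IP MAC" pairs on stdin (one pair per line) and print every IP
# address that appears with more than one distinct MAC address.
find_duplicate_ips() {
    # Normalise spacing and MAC case, then drop repeated identical pairs.
    awk 'NF >= 2 { print $1, tolower($2) }' |
        sort -u |
        awk '
            { count[$1]++; macs[$1] = macs[$1] " " $2 }
            END {
                for (ip in count)
                    if (count[ip] > 1)
                        print ip ":" macs[ip]
            }
        ' |
        sort
}

# Hypothetical usage, with pairs.txt holding one "IP MAC" pair per line:
#   find_duplicate_ips < pairs.txt
```

Any line this prints means two or more devices are answering for the same address, which matches the behaviour described above.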
