I've now changed the thread count in /etc/sysconfig/nfs to 64 and
restarted the nfs server - it made no difference that I could see, but
my testing was done with a copy to tmpfs on the clients rather than to
the NAND filesystem (since that takes 20 seconds rather than 12
minutes). So I am not convinced that the nfs threads are the whole
answer, but can't yet rule them out. And it should certainly do no harm
to leave them at 64.
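For anyone wanting to confirm that such a change actually took effect after the restart, here is a minimal sketch. It assumes the Red Hat style RPCNFSDCOUNT setting in /etc/sysconfig/nfs and the Linux kernel nfsd interface at /proc/fs/nfsd/threads; the function names are made up for illustration.

```shell
# Print the nfsd thread count configured in a sysconfig-style file.
# Assumption: the file uses the Red Hat style RPCNFSDCOUNT=<n> setting.
# Commented-out lines are ignored; the last active setting wins.
configured_nfsd_threads() {
    sed -n 's/^[[:space:]]*RPCNFSDCOUNT="\{0,1\}\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}

# Print the thread count the running server is using.
# Assumption: Linux kernel nfsd with the nfsd filesystem mounted.
running_nfsd_threads() {
    cat /proc/fs/nfsd/threads
}
```

Comparing `configured_nfsd_threads /etc/sysconfig/nfs` with `running_nfsd_threads` on the server should show 64 in both cases if the new setting was picked up.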
In the end, I am copying a compressed tarball from the server onto the
client's tmpfs with a simple "cp" over NFS - this takes about 3 seconds.
It will not matter if it takes x * 3 seconds for "x" cards in parallel.
Unpacking these tarballs into the NAND is now an entirely local
operation on the cards, and will therefore be free from any issues with
the server or network. It is also faster even for one card.
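The two steps can be sketched roughly as below. This is only an illustration of the idea, not the exact script in use; the function name and the paths in the usage line are hypothetical.

```shell
# stage_and_unpack TARBALL STAGING_DIR TARGET_DIR
# Hypothetical helper: copy a compressed tarball from an NFS mount into a
# local staging directory (e.g. tmpfs), then unpack it locally.
stage_and_unpack() {
    tarball=$1
    staging=$2
    target=$3
    local_copy="$staging/$(basename "$tarball")"

    # Step 1: a single sequential read over NFS into the staging area.
    cp "$tarball" "$local_copy" || return 1

    # Step 2: purely local unpack; no NFS traffic is involved from here on.
    mkdir -p "$target" || return 1
    tar -xzf "$local_copy" -C "$target" || return 1

    # Free the staging space again once the unpack has succeeded.
    rm -f "$local_copy"
}
```

On a card this might be called as `stage_and_unpack /mnt/nfs/rootfs.tar.gz /tmp /mnt/nand`, where all three paths are placeholders for the real mount points.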
Other than that, I will also do some testing of the network setup here.
The cards are currently running across our main LAN, which is well
overdue for a re-organisation after many years of "organic" growth.
But I am happy for now with the tarball copying solution.