
NFSD high CPU usage


Adam Guimont

Apr 1, 2015, 4:06:21 PM
I have an issue where NFSD will max out the CPU (1200% in this case)
when a client workstation runs out of memory while trying to write via
NFS. When this happens, the TCP Recv-Q also fills up and causes
connection timeouts for any other client trying to use the NFS server.

I can reproduce the issue by running stress on a low-end client
workstation: change into the NFS-mounted directory, then use stress
to write via NFS while exhausting memory, for example:

stress --cpu 2 --io 4 --vm 20 --hdd 4

The client workstation will eventually run out of memory trying to write
into the NFS directory, fill the TCP Recv-Q on the NFS server, and then
NFSD will max out the CPU.
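For reference, the full sequence on the client looks roughly like this
(server:/export and /mnt/nfs are placeholders for the actual export and
mount point):

# placeholders: substitute the real export and mount point
mount -t nfs -o vers=3,proto=tcp server:/export /mnt/nfs
cd /mnt/nfs
# exhaust client memory while writing into the NFS directory
stress --cpu 2 --io 4 --vm 20 --hdd 4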

The actual client workstations (~50) are not running stress when this
happens; they run a mixture of EDA tools (simulation and verification).

For what it's worth, this is how I've been monitoring the TCP buffer
queues where "xx.xxx.xx.xxx" is the IP address of the NFS server:

cmdwatch -n1 'netstat -an | grep -e "Proto" -e "tcp4" | grep -e "Proto" -e "xx.xxx.xx.xxx.2049"'

I have tried several tuning recommendations, but none of them has
solved the problem.

Has anyone else experienced this and is anyone else able to reproduce it?

---
NFS server specs:

OS = FreeBSD 10.0-RELEASE
CPU = E5-1650 v3
Memory = 96GB
Disks = 24x ST6000NM0034 in 4x raidz2
HBA = LSI SAS 9300-8i
NIC = Intel 10Gb X540-T2
---
/boot/loader.conf

autoboot_delay="3"
geom_mirror_load="YES"
mpslsi3_load="YES"
cc_htcp_load="YES"
---
/etc/rc.conf

hostname="***"
ifconfig_ix0="inet *** netmask 255.255.248.0 -tso -vlanhwtso"
defaultrouter="***"
sshd_enable="YES"
ntpd_enable="YES"
zfs_enable="YES"
sendmail_enable="NO"
nfs_server_enable="YES"
nfs_server_flags="-h *** -t -n 128"
nfs_client_enable="YES"
rpcbind_enable="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"
samba_enable="YES"
atop_enable="YES"
atop_interval="5"
zabbix_agentd_enable="YES"
---
/etc/sysctl.conf

vfs.nfsd.server_min_nfsvers=3
vfs.nfsd.cachetcp=0
kern.ipc.maxsockbuf=16777216
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendspace=1048576
net.inet.tcp.recvspace=1048576
net.inet.tcp.sendbuf_inc=32768
net.inet.tcp.recvbuf_inc=65536
net.inet.tcp.keepidle=10000
net.inet.tcp.keepintvl=2500
net.inet.tcp.always_keepalive=1
net.inet.tcp.cc.algorithm=htcp
net.inet.tcp.cc.htcp.adaptive_backoff=1
net.inet.tcp.cc.htcp.rtt_scaling=1
net.inet.tcp.sack.enable=0
kern.ipc.soacceptqueue=1024
net.inet.tcp.mssdflt=1460
net.inet.tcp.minmss=1300
net.inet.tcp.tso=0
---
Client workstations:

OS = CentOS 6.6 x64
Mount options from `cat /proc/mounts` =
rw,nosuid,noatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=***,mountvers=3,mountport=916,mountproto=udp,local_lock=none,addr=***
---


Regards,

Adam Guimont


Rick Macklem

Apr 1, 2015, 10:28:25 PM
I can think of two explanations for this.
1 - The server nfsd threads get confused when the TCP recv Q fills
and start looping around.
OR
2 - The client is sending massive #s of RPCs (or garbage that isn't
complete RPCs).

To get a better idea w.r.t. what is going on, I'd suggest that
you capture packets (for a relatively short period) when the
server is 100% CPU busy.
# tcpdump -s 0 -w out.pcap host <nfs-client>
- run on the server should do it.
Then look at out.pcap in wireshark and see what the packets
look like. (wireshark understands NFS, whereas tcpdump doesn't)
If #1, I'd guess very little traffic (maybe TCP layer stuff),
if #2, I'd guess you'll see a lot of RPC requests or garbage
that isn't a valid request. (This latter case would suggest a
CentOS problem.)
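If you can't run the wireshark GUI, tshark gives a quick read from the
command line (a sketch; assumes tshark is installed and the capture
decodes as RPC):

# summarize RPC calls per program/procedure seen in the capture
tshark -r out.pcap -q -z rpc,programs
# or list only the packets that decode as RPC
tshark -r out.pcap -Y rpc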

If you capture the packets but can't look at them in wireshark,
you could email me the packet capture as an attachment and I
can look at it after Apr. 10, when I get home.

rick

Adam Guimont

Apr 2, 2015, 11:14:14 AM
Rick Romero wrote:
> Does your ZFS pool have log devices?
> How does gstat -d look?
>
> If the drives are busy, try adding
> vfs.nfsd.async: 0

No log devices but the disks are not busy when this happens.

I have an atop snapshot from the last time it happened:
http://pastebin.com/raw.php?i=LQjbKTXR

Adam Guimont

Apr 2, 2015, 3:26:04 PM
Rick Romero wrote:
> Are the disks busy before it happens? I'm far from an expert, but when
> running ZFS with NFS, I've had a lot of issues. My final resolutions were
> to turn ASYNC off and have log devices and I even have SSD volumes now.
> Otherwise under load the NFS server gets hung up. It never seemed to happen
> on UFS, but due to the number of small files I have, ZFS provides the best
> backup functionality. I'm now trying to move all functions from NFS (to
> more TCP client/server).
>
> You have different info than I've gathered, and it might be because of
> usage. I actively use the system that I've seen NFS dump on, so I see the
> slowness beginning. Once NFS dies, the drive load goes back to normal. I
> wonder, if maybe you are just managing a system for others, and you don't
> see it until after the fact? Just a thought based on my limited
> experience.

No, the disks are not busy before this happens.

I use the server every day and keep a pretty close eye on it. The disks
can get busy, but that doesn't spike nfsd much and usually doesn't last
more than a few seconds.

When this particular issue happens with the nfsd CPU spike, it lasts
until the job running on the client workstation is killed or the
workstation is rebooted. After that, it takes a few seconds for the TCP
buffers to flush out and allow other clients to connect again.

Adam Guimont

Apr 3, 2015, 5:33:49 PM
Rick Macklem wrote:
> I can think of two explanations for this.
> 1 - The server nfsd threads get confused when the TCP recv Q fills
> and start looping around.
> OR
> 2 - The client is sending massive #s of RPCs (or garbage that isn't
> complete RPCs).
> [...]
> If you capture the packets but can't look at them in wireshark,
> you could email me the packet capture as an attachment and I
> can look at it after Apr. 10, when I get home.

Thanks Rick,

I was able to capture this today while it was happening. The capture is
for about 100 seconds. I took a look at it in wireshark and to me it
appears like the #2 situation you were describing.

If you would like to confirm that, I've uploaded the pcap file here:

https://www.dropbox.com/s/pdhwj5z5tz7iwou/out.pcap.20150403

I will continue running some tests and trying to gather as much data as
I can.

Rick Macklem

Apr 14, 2015, 8:36:45 PM
Well, I took a look, but I'll admit I couldn't figure out much from it.
It appears that the TCP connection is in a pretty degraded state.
- FreeBSD is sending a whole bunch of TCP segments with 164 bytes of
  data (which appears to be the same for each one, but I didn't look at
  them closely). Each of them has a window size == 0 (PUSH + ACK).
  --> Linux responds with an ACK and no data (which makes sense because
      of the 0-length window)
Eventually FreeBSD does open up the window after something like 1200 of
the above TCP segments.
--> It is possible that all these segments are RPC replies to similar
    requests, but Wireshark just thinks they're all RPC continuations
    and doesn't recognize an RPC message. (I couldn't be bothered to
    try and decode one manually.)
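For what it's worth, one way to count those zero-window segments is via
Wireshark's TCP analysis flags, e.g.:

# count segments where the advertised receive window is zero
tshark -r out.pcap -Y "tcp.analysis.zero_window" | wc -l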

One thing I see is that the Linux window size is 24576. If TSO is
enabled in FreeBSD's net device, you might try disabling TSO, in
case it is sending too much or somehow getting confused.
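On FreeBSD that would be something like this, where ix0 is the
interface from the rc.conf above (a sketch):

# show the interface options line (look for TSO4/TSO6)
ifconfig ix0 | grep options
# disable TSO at runtime
ifconfig ix0 -tso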

Other than that, I think it would take a packet capture just when
the trouble starts to try and figure out how things get messed up.

I'm not good enough w.r.t. TCP to have any idea what might be
happening. Maybe someone conversant with TCP can look at the trace?

rick