I recently upgraded to 2.0.0 in production and have had a couple of
instances (always in the middle of the night, of course) where our
secondary node spikes to 100% CPU on all cores and vsize starts
growing rapidly. Load average was 50-60, and queries all started
timing out. This time, I had some logging set up.
I have 3 nodes (1 master, 1 arbiter, and 1 slave) in a replica set. I
send lots of reads to the slave. Journaling is off on the slave but
enabled on the master.
Here is mongostat:
http://pastebin.com/ZX75Gfem
The only interesting thing is that around the same time CPU spiked to
100%, vsize started increasing rapidly while mapped stayed the same.
Here is db.serverStatus():
http://pastebin.com/qR83Di0c
The queue is backed up because I had already issued the restart
request, so you can ignore that. Before I sent the HUP, the read and
write queues were both at 0.
There wasn't a lot of interesting stuff in mongod.log except for lots
of these:
Wed Sep 21 11:31:43 [conn3927] warning: virtual size (17783MB) - mapped size (12443MB) is large. could indicate a memory leak
and a lot of these:
Wed Sep 21 11:26:49 [conn2133184] serverStatus was very slow: { after basic: 0, middle of mem: 1120, after mem: 1120, after connections: 1120, after extra info: 1120, after counters: 1120, after repl: 1120, after asserts: 1120, after dur: 1120, at end: 1120 }
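For reference, the gap reported in the first warning works out to 5340 MB of virtual memory that is not accounted for by mapped files. Here is a rough Python sketch for pulling that number out of mongod.log lines. The regex is only my guess at the format, based on the single sample line above, and the function name is made up for illustration.

```python
import re

# Pattern inferred from the one sample warning line; this is an assumption,
# not an official description of the mongod.log format.
WARNING_RE = re.compile(
    r"virtual size \((\d+)MB\) - mapped size \((\d+)MB\) is large"
)


def unmapped_mb(line):
    """Return virtual minus mapped size in MB, or None if the line does not match."""
    match = WARNING_RE.search(line)
    if match is None:
        return None
    virtual_mb, mapped_mb = (int(g) for g in match.groups())
    return virtual_mb - mapped_mb


sample = (
    "Wed Sep 21 11:31:43 [conn3927] warning: virtual size (17783MB) - "
    "mapped size (12443MB) is large. could indicate a memory leak"
)
print(unmapped_mb(sample))  # 5340
```

Running this over the whole log should show whether the gap keeps growing during the spike.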
Here is vmstat:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff   cache  si  so   bi    bo    in    cs   us  sy  id  wa  st
17  0  72324 382292  27216 9811836   0   0  346   166    20     8   21   2  66  12   0
15  0  72324 381192  27216 9811864   0   0    0     0  1717  2202   99   1   0   0   0
15  0  72324 379960  27224 9811868   0   0    0     8  1904  2252   99   1   0   0   0
16  0  72324 378256  27240 9811892   0   0    0  1230  1527  1771  100   0   0   0   0
15  0  72324 375264  27240 9811912   0   0    0     0  1635  1951   99   1   0   0   0
16  0  72324 374924  27264 9811932   0   0    0   108  1705  1792  100   0   0   0   0
iostat showed plenty of available IOPS.
I'm running CentOS 5.6 on a Rackspace server. The machine has a NUMA
processor, but I boot with numa=off:
$ numactl --hardware
available: 1 nodes (0)
node 0 size: 16142 MB
node 0 free: 1035 MB
node distances:
node 0
0: 10
Possibly related:
http://groups.google.com/group/mongodb-user/browse_thread/thread/65cf5f6c0b456642
https://jira.mongodb.org/browse/SERVER-3822?page=com.atlassian.jira.plugin.system.issuetabpanels%3Achangehistory-tabpanel
Hope you can help! I think I will downgrade to 1.8.6 in the meantime.