We've experienced a significant slowdown of our application over the past week. I got involved with troubleshooting last evening and am a bit confused, although I think the problem may be caused by a missed file limit (I'm waiting for a window to restart mongo).
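The limit the running mongod was actually given can be read without a restart. Below is a minimal sketch, assuming Linux, where `/proc/<pid>/limits` lists the per-process limits; the helper names `parse_max_open_files` and `read_max_open_files` are made up for illustration, and the pid has to be looked up separately.

```python
def _to_limit(value):
    """Convert a limits field to an int, or None when it is 'unlimited'."""
    return None if value == "unlimited" else int(value)


def parse_max_open_files(limits_text):
    """Return (soft, hard) for the 'Max open files' row, or None if absent.

    limits_text is the content of /proc/<pid>/limits (Linux only).
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # Row layout: Max open files <soft> <hard> files
            return _to_limit(fields[3]), _to_limit(fields[4])
    return None


def read_max_open_files(pid):
    """Read the open-file limits of a running process (hypothetical helper)."""
    with open("/proc/{}/limits".format(pid)) as handle:
        return parse_max_open_files(handle.read())
```

A soft limit of 1024 on the mongod pid would point to the default never having been raised for that process.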
The servers have 144GB of memory and 8x 10k RPM drives in hardware RAID 10.
The server architecture is a 3 node replica set with the application connecting via mongos (we're not sharded at this point). All connections are happening over a private gigabit network.
10gen's MMS service shows our write lock % spiking on January 29. On that day, we had around 90GB of indexes. I'm not sure how to read the memory graph provided (10gen graphs: http://imgur.com/a/3AoMm).
So I got to thinking that maybe we had too many indexes. Knowing that this would destroy read performance, we dropped them and restarted mongo last night. Nothing was reading from the database when we turned the write client back on, and the write lock shot right back up to the 80-90% range.
The first mongostat output shows mongo with the indexes still on the database, along with some IO statistics. The second mongostat output shows mongo after the clients were stopped, the indexes dropped, and mongo restarted. The write lock % goes from 0 to >85% instantly.
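To cross-check the mongostat figure, the lock percentage over an interval can be derived from two `serverStatus` samples. This is a rough sketch, assuming the server reports cumulative `globalLock.totalTime` and `globalLock.lockTime` counters in microseconds (older MongoDB releases do; whether this deployment's version still exposes them is an assumption). The function name `write_lock_percent` is hypothetical, and the two dicts stand in for the `globalLock` sub-document captured at two points in time.

```python
def write_lock_percent(before, after):
    """Percentage of the sampling interval spent holding the global lock.

    before/after are dicts with cumulative 'totalTime' and 'lockTime'
    counters (assumed to be microseconds), taken from two samples.
    """
    elapsed = after["totalTime"] - before["totalTime"]
    locked = after["lockTime"] - before["lockTime"]
    if elapsed <= 0:
        raise ValueError("samples must be taken in order, some time apart")
    return 100.0 * locked / elapsed


# Made-up sample values, one second apart, for illustration only.
first = {"totalTime": 1000000, "lockTime": 100000}
second = {"totalTime": 2000000, "lockTime": 950000}
print(write_lock_percent(first, second))
```

Taking the difference between samples matters because the counters are cumulative since startup, so a single sample only gives the lifetime average.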
Does anyone have any ideas on where to look?