Re: Slaves slow down responding to clients and stop syncing with master

gregor

Aug 20, 2012, 6:29:37 AM
to mongod...@googlegroups.com
It looks like the SECONDARY is getting read-starved. At the top of https://gist.github.com/3363374 there are no inserts or updates coming in from the PRIMARY (replicated operations are the ones mongostat marks with "*"). At this point the PRIMARY is loaded with writes and updates and is holding the write lock, so the SECONDARY cannot get a read lock on the oplog.

What kind of reads are you using? It looks like all the reads are now going to the SECONDARY and none to the PRIMARY. So replication falls behind while the SECONDARY is heavily loaded with reads, which shows up as a high read queue (qr).

When the writes complete on the PRIMARY, the SECONDARY is able to get a read lock and replicate operations again, as mongostat shows. The PRIMARY can answer queries again and the number of queries on the SECONDARY falls. At this point the SECONDARY also shows a high write lock as it applies the operations replicated from the PRIMARY.

Can you do 
mongostat --discover
when the freeze occurs to show the PRIMARY and the SECONDARY at the same time?
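
To have both views on record when the freeze happens, one option is to leave mongostat running and timestamp its output. This is only a minimal sketch: stamp_lines and the log path are hypothetical names, not part of mongostat, and the trailing 2 is the polling interval in seconds.

```shell
# Hypothetical helper: prefix every line read from stdin with a UTC timestamp,
# so the moment a freeze starts can be matched against other graphs later.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$line"
    done
}

# Usage (assumed log path; run it against any one member of the replica set):
#   mongostat --discover 2 | stamp_lines >> /tmp/mongostat-discover.log
```

Timestamps are in UTC here on purpose, so they do not depend on the timezone of the host that collects them.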

On Wednesday, August 15, 2012 10:02:14 PM UTC+1, Roman Skvazh wrote:
Mongostat (all is fine, then the problem occurred)
https://gist.github.com/3363512

On Thursday, August 16, 2012 0:43:09 UTC+4, Roman Skvazh wrote:
Hi everyone!
We have a problem with replica set slaves. Sometimes (under high load) they freeze, stop applying updates from the master, and respond slowly to clients.
Configuration: 3 identical servers (Xeon, 24 GB RAM, SSD) in a replica set with 1 master and 2 slaves. MongoDB 2.0.7, PHP driver 1.2.12, Ubuntu Server 12.04 with all updates.

Mongostat (the problem occurred, then the slave unfreezes)

slave's iostat -xm 2

Mongod config:
...
nojournal = true
syncdelay = 30
...

Roman Skvazh

Aug 21, 2012, 4:01:20 AM
to mongod...@googlegroups.com
Freeze occurred
https://gist.github.com/3413304

When I stopped all queries, the slave unfroze and continued syncing...

On Monday, August 20, 2012 14:29:37 UTC+4, gregor wrote:

gregor

Aug 21, 2012, 7:28:47 AM
to mongod...@googlegroups.com
In the first gist it looks like host d5 is blocked: it has a high query rate and no inserts (the other secondaries are replicating inserts). When you kill the queries, d5 starts to replicate inserts again.
Are these hosts in MMS? Could you add them? It only takes about 5-10 minutes. https://mms.10gen.com/help/install.html
Could you also run db.currentOp() on d5 when it is blocked like this?
Finally, could you run iostat -xm 2 on d5 when it is blocked?

Roman Skvazh

Aug 21, 2012, 7:48:16 AM
to mongod...@googlegroups.com
All these hosts are already in MMS.

iostat on d5 seems fine when it's blocked.

currentOp shows many queries from our application. I suppose our app sends many queries when slave lag is growing...

But why is system CPU higher than user CPU on all our nodes?

PS. The timezone in MMS is Europe/Moscow (+0400).

On Tuesday, August 21, 2012 15:28:47 UTC+4, gregor wrote:

gregor

Aug 21, 2012, 9:44:45 AM
to mongod...@googlegroups.com
What is the MMS group name?

Roman Skvazh

Aug 21, 2012, 11:11:43 AM
to mongod...@googlegroups.com
Flysoft / replica set if_rs_1

On Tuesday, August 21, 2012 17:44:45 UTC+4, gregor wrote:

gregor

Aug 21, 2012, 12:15:45 PM
to mongod...@googlegroups.com
OK, can you tell me at what time (and in which timezone) you last saw this freeze so I can check in MMS?

Roman Skvazh

Aug 21, 2012, 1:09:41 PM
to mongod...@googlegroups.com
The last one was at 7:49 (GMT+04). At 8:19 I restarted the application and the freezing node, d14.

Thank you for your help.

On Tuesday, August 21, 2012 20:15:45 UTC+4, gregor wrote:

gregor

Aug 22, 2012, 7:23:12 AM
to mongod...@googlegroups.com
OK, I had a look. I can see that d11 restarts at 07:41 Moscow time on 22/08/12.
Also, d14 restarts at 07:51 Moscow time on 22/08/12.
Did these two nodes freeze at this time? I can see replica lag increasing up to these times on both nodes. When this happens again, can you run db.currentOp() on the nodes to see what operations are running? It is possible that a running operation is blocking things.
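
Since db.currentOp() can return a very long list on a busy node, it may help to print only the operations that have been running for a while. A rough sketch, assuming a 5 second threshold is a sensible starting point: slow_ops_js is a hypothetical helper name, and d5 is just the host from earlier in this thread.

```shell
# Hypothetical helper: build a mongo shell snippet that prints only the
# operations which have been running for at least MIN_SECS seconds.
# db.currentOp() returns a document whose "inprog" array lists the operations;
# entries without a secs_running field are skipped by the comparison.
slow_ops_js() {
    min_secs=${1:-5}   # assumed threshold, adjust as needed
    printf 'db.currentOp().inprog.forEach(function (op) { if (op.secs_running >= %d) printjson(op); });\n' "$min_secs"
}

# Usage (host name taken from the thread; the default port is assumed):
#   mongo --quiet --host d5 --eval "$(slow_ops_js 5)"
```

The longest-running entries are the first candidates to check for an operation that is blocking replication.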

Roman Skvazh

Aug 22, 2012, 7:15:33 PM
to mongod...@googlegroups.com
When d13 froze and all the other slaves stopped syncing at 2:38 (Europe/Moscow) on 23 Aug 2012, currentOp showed a great many operations.
Some of them:
https://gist.github.com/3430459

Normally all these queries are fast.
The number of open cursors is very high at this moment on the d13 slave. How can I find out which cursors are open?

On Wednesday, August 22, 2012 15:23:12 UTC+4, gregor wrote:

gregor

Aug 23, 2012, 10:22:45 AM
to mongod...@googlegroups.com
Where we've seen high CPU and queues building, using tcmalloc has sometimes helped - this is why 2.2.0rc1 uses tcmalloc. If you want to try this with your existing 2.0.7 nodes you can:

  1. Install Google perftools (e.g. apt-get install libgoogle-perftools0)
  2. Stop mongod/mongos
  3. Start mongod/mongos with the following environment variable defined (note the paths may need to be updated; detailed instructions here):
LD_PRELOAD="/usr/lib/libtcmalloc.so"

This is probably worth trying.  
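
Because the library path differs between systems, the steps above could be wrapped roughly like this. It is a sketch under assumptions, not an official procedure: find_tcmalloc and maps_has_tcmalloc are hypothetical helpers, and the directories and config path in the usage comment are guesses for a typical Ubuntu install.

```shell
# Hypothetical helper: print the first libtcmalloc shared object found in the
# given directories. The installed file may carry a version suffix (for
# example libtcmalloc.so.0), so match on the prefix rather than one exact name.
find_tcmalloc() {
    for dir in "$@"; do
        for lib in "$dir"/libtcmalloc.so*; do
            if [ -f "$lib" ]; then
                printf '%s\n' "$lib"
                return 0
            fi
        done
    done
    return 1
}

# Hypothetical helper: succeed if a /proc/<pid>/maps style file mentions
# libtcmalloc, i.e. the library really was mapped into the process.
maps_has_tcmalloc() {
    grep -q 'libtcmalloc' "$1"
}

# Usage (directories and config path are assumptions):
#   lib=$(find_tcmalloc /usr/lib /usr/lib/x86_64-linux-gnu) || exit 1
#   LD_PRELOAD="$lib" mongod --config /etc/mongodb.conf
#   maps_has_tcmalloc "/proc/$(pidof mongod)/maps" && echo "tcmalloc is loaded"
```

Checking the process maps after the restart is a quick way to confirm the preload took effect before judging whether it changed anything.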

Roman Skvazh

Aug 23, 2012, 4:51:13 PM
to mongod...@googlegroups.com
Ok, we will try this.
Thank you!

On Thursday, August 23, 2012 18:22:45 UTC+4, gregor wrote:

Roman Skvazh

Aug 24, 2012, 3:18:16 AM
to mongod...@googlegroups.com
Yeahhh!
I switched to tcmalloc and last night passed without problems.
System CPU decreased and everything was stable.

Thank you very much!

Why not add this to the documentation on mongodb.org?

On Thursday, August 23, 2012 18:22:45 UTC+4, gregor wrote:

gregor

Aug 24, 2012, 3:47:05 AM
to mongod...@googlegroups.com
Great! Glad it worked :) I will get it added to the documentation.

gregor

Aug 24, 2012, 3:51:18 AM
to mongod...@googlegroups.com
Created documentation ticket https://jira.mongodb.org/browse/DOCS-441