Mongostat: replica activity but no primary activity


Kevin Rice

Aug 6, 2012, 6:24:09 PM
to mongod...@googlegroups.com
Hello!

We have a sharded cluster of 4 shards, each a 2-member replica set, spread across 4 boxes. We shut off all reads/writes to the db, but:

mongostat still shows activity (inserts/deletes) on the replicas.  Why?

            in  q  u   dele  gm  cmd    fl   mapped vsz    res     faults    locks   idxmiss    etc.       
box1:30000   0  0  0      0   0   16     0  78.1g   169g    23g      0        0          0       0|0     0|0     1k     9k   100 shard-00    M
box1:30200   0  0  0      0   0   15     0   116g   245g  32.2g      0        0          0       0|0     0|0   926b     9k   102 shard-02    M
box1:40000   0  0  0      0   0    1               2.02g    30m      0                                          62b   811b     5           RTR
box2:30100   0  0  0      0   1   16     0  74.1g   161g  36.1g      0        0          0       0|0     1|0     1k     9k   102 shard-01    M
box2:30300   0  0  0      0   1   16     0  72.1g   157g  40.3g      0        0          0       0|0     1|0     1k     9k   103 shard-03    M
box3:30001  *0 *0 *0    *60   0  5|0     0   148g   304g  25.1g     29       41          0       0|0     0|0   310b     3k    51 shard-00  SEC
box3:30201  *0 *0 *0    *56   0  5|0     0   188g   384g  30.9g     45     30.3          0       0|0     0|0   310b     3k    53 shard-02  SEC
box4:30101  *0 *0 *0     *0   0  6|0     0   144g   296g  34.6g      0        0          0       0|0     0|0   455b     4k    52 shard-01  SEC
box4:30301  *0 *0 *0     *0   0  6|0     0   144g   296g  40.2g      0        0          0       0|0     0|0   455b     4k    51 shard-03  SEC


In the replicas' logs I'm seeing a lot of:

Mon Aug  6 17:21:55 [conn16766] command admin.$cmd command: { getlasterror: 1, fsync: 1 } ntoreturn:1 reslen:97 110ms
Mon Aug  6 17:21:57 [conn24096] CMD fsync:  sync:1 lock:0

Where do I go to see what's really going on? Does this 'getlasterror: 1' mean there actually was an error?

andre.defrere

Aug 6, 2012, 9:42:59 PM
to mongod...@googlegroups.com
Hi Kevin,

The secondaries in a replica set do lag behind the primary. Because the secondaries tail the oplog, there is always going to be some amount of lag, caused by anything from network traffic to I/O contention. This replication lag might explain why you are still seeing inserts and deletes on the secondaries after the primary has finished.

You can check the state of the replication lag with "rs.status()" from the shell. You will see the optime (and optimeDate) of the primary and of each secondary. The difference between these dates is the replication lag. You can see more information, including the size of the oplog, with "db.printReplicationInfo()".
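
To make that subtraction concrete, here is a minimal sketch of how the lag could be computed from the document that rs.status() returns. The helper name replicationLagSeconds is made up for illustration; it only assumes the members array with the name, stateStr and optimeDate fields that rs.status() reports.

```javascript
// Hypothetical helper (not part of the mongo shell): given an
// rs.status()-shaped document, return the lag of each secondary
// behind the primary, in seconds, keyed by member name.
function replicationLagSeconds(status) {
  var primary = null;
  status.members.forEach(function (m) {
    if (m.stateStr === "PRIMARY") primary = m;
  });
  // Without a visible primary there is nothing to compare against.
  if (primary === null) return null;

  var lag = {};
  status.members.forEach(function (m) {
    if (m.stateStr === "SECONDARY") {
      // Subtracting two Date objects yields milliseconds.
      lag[m.name] = (primary.optimeDate - m.optimeDate) / 1000;
    }
  });
  return lag;
}
```

In the shell, connected to a member of one shard's replica set, this could be called as printjson(replicationLagSeconds(rs.status())). A value that keeps shrinking toward 0 after writes stop would be consistent with the secondaries simply catching up.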

How long after shutting down reads and writes are you still seeing activity on the secondaries?  How are you enforcing no more writes and reads?

On the second point, regarding getLastError: this does not actually indicate that an error has occurred. By default a write operation in Mongo does not wait for a response; to get one, you issue the write and then check for the last error with a call to getLastError. The drivers often do this for you in what is known as 'safe mode' writes. What I would suggest is happening is that the replication writes are using getLastError to determine that the writes were successful.
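
As a rough illustration of that write-then-check pattern, here is a small sketch for the shell of this era. The function name checkedInsert is hypothetical; db.getCollection(), insert() and db.runCommand({ getlasterror: 1 }) are the shell calls it relies on, and the check has to run on the same connection as the write.

```javascript
// Hypothetical helper: fire a write, then ask the server whether
// the last operation on this connection failed.
function checkedInsert(db, collectionName, doc) {
  // The insert itself returns without waiting for the server's verdict.
  db.getCollection(collectionName).insert(doc);

  // getlasterror reports on the previous operation; "err" is null
  // when that operation succeeded.
  var res = db.runCommand({ getlasterror: 1 });
  if (res.err) {
    throw new Error("write failed: " + res.err);
  }
  return res;
}
```

So a log line containing { getlasterror: 1, fsync: 1 } just means some client asked for confirmation of its last write (and for a flush to disk), not that an error was reported.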

Hope this helps,
André