Mongostat: replica activity but no primary activity


Kevin Rice

Aug 6, 2012, 6:24:09 PM

to mongod...@googlegroups.com

Hello!

We have a cluster of 4 shards, each a 2-member replica set, on 4 boxes. We shut off all reads/writes to the db, but mongostat still shows activity (inserts/deletes) on the replicas. Why?

host        in   q   u dele gm cmd fl mapped   vsz   res faults locks idxmiss qr|qw ar|aw netIn netOut conn      set repl
box1:30000   0   0   0    0  0  16  0  78.1g  169g   23g      0     0       0   0|0   0|0    1k     9k  100 shard-00    M
box1:30200   0   0   0    0  0  15  0   116g  245g 32.2g      0     0       0   0|0   0|0  926b     9k  102 shard-02    M
box1:40000   0   0   0    0  0   1           2.02g   30m      0                             62b   811b    5           RTR
box2:30100   0   0   0    0  1  16  0  74.1g  161g 36.1g      0     0       0   0|0   1|0    1k     9k  102 shard-01    M
box2:30300   0   0   0    0  1  16  0  72.1g  157g 40.3g      0     0       0   0|0   1|0    1k     9k  103 shard-03    M
box3:30001  *0  *0  *0  *60  0 5|0  0   148g  304g 25.1g     29    41       0   0|0   0|0  310b     3k   51 shard-00  SEC
box3:30201  *0  *0  *0  *56  0 5|0  0   188g  384g 30.9g     45  30.3       0   0|0   0|0  310b     3k   53 shard-02  SEC
box4:30101  *0  *0  *0   *0  0 6|0  0   144g  296g 34.6g      0     0       0   0|0   0|0  455b     4k   52 shard-01  SEC
box4:30301  *0  *0  *0   *0  0 6|0  0   144g  296g 40.2g      0     0       0   0|0   0|0  455b     4k   51 shard-03  SEC

I'm seeing a lot of lines like these in the replicas' logs:

Mon Aug 6 17:21:55 [conn16766] command admin.$cmd command: { getlasterror: 1, fsync: 1 } ntoreturn:1 reslen:97 110ms
Mon Aug 6 17:21:57 [conn24096] CMD fsync: sync:1 lock:0

Where do I go to see what's really going on? Does this 'getlasterror: 1' mean there was actually an error?

andre.defrere

Aug 6, 2012, 9:42:59 PM

to mongod...@googlegroups.com

Hi Kevin,

The secondaries in a replica set do lag behind the primary. Because the secondaries tail the oplog, there is always going to be some amount of lag, caused by anything from network traffic to I/O contention. This replication lag might explain why you are still seeing inserts and deletes on the secondaries after the primary has finished.

You can check the replication lag with "rs.status()" from the shell. It shows the optime of the primary and of each secondary; the difference between the corresponding dates (optimeDate) is the replication lag. You can see more information, including the size of the oplog, with "db.printReplicationInfo()".
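As a rough sketch of the calculation described above: the helper below (replicationLagSeconds is a name made up for this example, not a shell built-in) takes a document shaped like the output of rs.status() and subtracts each secondary's optimeDate from the primary's. The sample document and its timestamps are invented purely for illustration.

```javascript
// Compute per-secondary replication lag, in seconds, from a status
// document shaped like the output of rs.status().
// NOTE: replicationLagSeconds is a hypothetical helper, not part of the shell.
function replicationLagSeconds(status) {
  var primary = null;
  status.members.forEach(function (m) {
    if (m.stateStr === "PRIMARY") {
      primary = m;
    }
  });
  if (primary === null) {
    throw new Error("no primary found in status document");
  }

  var lag = {};
  status.members.forEach(function (m) {
    if (m.stateStr === "SECONDARY") {
      // optimeDate is a Date; the difference is in milliseconds.
      lag[m.name] =
        (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000;
    }
  });
  return lag;
}

// Invented sample data (host names borrowed from the mongostat output above).
var sample = {
  set: "shard-00",
  members: [
    {
      name: "box1:30000",
      stateStr: "PRIMARY",
      optimeDate: new Date("2012-08-06T17:21:55Z")
    },
    {
      name: "box3:30001",
      stateStr: "SECONDARY",
      optimeDate: new Date("2012-08-06T17:21:50Z")
    }
  ]
};

var sampleLag = replicationLagSeconds(sample);
```

In the mongo shell, connected to a replica set member, the same function could be called as printjson(replicationLagSeconds(rs.status())). A lag that stays above zero after writes have stopped would suggest the secondaries are still catching up.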

How long after shutting down reads and writes are you still seeing activity on the secondaries? How are you enforcing no more writes and reads?

On the second point, regarding getLastError: this does not actually indicate that an error has occurred. By default, a write operation in Mongo does not wait for a response; to get one, you issue the write and then check for the last error with a call to getLastError. Drivers often do this for you in what is known as "safe mode" writes. What I would suggest is happening is that the replication writes are using getLastError to confirm that the writes were successful.
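To illustrate how a getLastError reply is read, here is a minimal sketch. describeGetLastError is a hypothetical helper written for this example, and the sample replies are made up; the assumption is only that the reply carries an "ok" field and an "err" field, with "err" being null when the previous write on that connection succeeded.

```javascript
// Interpret a getLastError reply document.
// NOTE: describeGetLastError is a hypothetical helper for illustration.
function describeGetLastError(reply) {
  if (reply.ok !== 1) {
    // The getLastError command itself failed to run.
    return "command failed: " + reply.errmsg;
  }
  if (reply.err === null || reply.err === undefined) {
    // err is null when the preceding write did not produce an error.
    return "no error";
  }
  return "write error: " + reply.err;
}

// Invented sample replies.
var okReply = { n: 0, err: null, ok: 1 };
var failedReply = { n: 0, err: "E11000 duplicate key error", code: 11000, ok: 1 };

var okText = describeGetLastError(okReply);
var failedText = describeGetLastError(failedReply);
```

In the shell, a reply like these would come from db.runCommand({ getlasterror: 1, fsync: true }) issued right after a write on the same connection, which matches the command shape seen in the log lines in the original question.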

Hope this helps,

André
