Hi All,
I configured a replica set with 3 members: one primary, one secondary, and one arbiter.
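For reference, the set was initiated roughly like this (a mongo shell sketch; the set name and host names are placeholders rather than my real configuration, only the primary's port 27011 matches the logs below):

rs.initiate({
    _id: "rs0",                                                // placeholder set name
    members: [
        { _id: 0, host: "PrimaryHost:27011" },                 // primary
        { _id: 1, host: "SecondaryHost:27011" },               // secondary
        { _id: 2, host: "ArbiterHost:27011", arbiterOnly: true } // arbiter
    ]
});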
After a few days, the secondary's mongod service stopped unexpectedly. I found the following error in the mongod log:
==========================================================================================
2014-08-14T15:20:00.139+0530 [RangeDeleter] warning: Error encountered while trying to delete range: Error encountered while deleting range: database.collection from { _id: "value" } -> { _id: "value" }, cause by: :: caused by :: 10107 waitForReplication called but not master anymore
2014-08-14T15:20:00.139+0530 [RangeDeleter] Helpers::removeRangeUnlocked time spent waiting for replication: 0ms
2014-08-14T15:20:00.139+0530 [RangeDeleter] rangeDeleter deleted 0 documents for database.collection from { _id: "value" } -> { _id: "value" }
2014-08-14T15:20:00.139+0530 [RangeDeleter] Assertion: 10107:replicatedToNum called but not master anymore
2014-08-14T15:20:00.142+0530 [RangeDeleter] 0x11c0e91 0x1163109 0x11477e6 0xebb7ab 0xeb5cfd 0xde2830 0xddd2dd 0x1205829 0x3518607851 0x3517ee890d
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11c0e91]
/usr/bin/mongod(_ZN5mongo10logContextEPKc+0x159) [0x1163109]
/usr/bin/mongod(_ZN5mongo11msgassertedEiPKc+0xe6) [0x11477e6]
/usr/bin/mongod(_ZN5mongo13SlaveTracking18opReplicatedEnoughENS_6OpTimeERKSs+0x8b) [0xebb7ab]
/usr/bin/mongod(_ZN5mongo18opReplicatedEnoughENS_6OpTimeENS_11BSONElementE+0xdd) [0xeb5cfd]
/usr/bin/mongod(_ZN5mongo17RangeDeleterDBEnv11deleteRangeERKNS_10StringDataERKNS_7BSONObjES6_S6_bPSs+0xbb0) [0xde2830]
/usr/bin/mongod(_ZN5mongo12RangeDeleter6doWorkEv+0x86d) [0xddd2dd]
/usr/bin/mongod() [0x1205829]
/lib64/libpthread.so.0() [0x3518607851]
/lib64/libc.so.6(clone+0x6d) [0x3517ee890d]
2014-08-14T15:20:00.162+0530 [RangeDeleter] SEVERE: terminate() called, printing stack (if implemented for platform):
0x11c0e91 0x11c0118 0x351cebcbe6 0x351cebcc13 0x351cebcd0e 0x114788e 0xebb7ab 0xeb5cfd 0xde2830 0xddd2dd 0x1205829 0x3518607851 0x3517ee890d
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11c0e91]
/usr/bin/mongod() [0x11c0118]
/usr/lib64/libstdc++.so.6() [0x351cebcbe6]
/usr/lib64/libstdc++.so.6() [0x351cebcc13]
/usr/lib64/libstdc++.so.6() [0x351cebcd0e]
/usr/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x18e) [0x114788e]
/usr/bin/mongod(_ZN5mongo13SlaveTracking18opReplicatedEnoughENS_6OpTimeERKSs+0x8b) [0xebb7ab]
/usr/bin/mongod(_ZN5mongo18opReplicatedEnoughENS_6OpTimeENS_11BSONElementE+0xdd) [0xeb5cfd]
/usr/bin/mongod(_ZN5mongo17RangeDeleterDBEnv11deleteRangeERKNS_10StringDataERKNS_7BSONObjES6_S6_bPSs+0xbb0) [0xde2830]
/usr/bin/mongod(_ZN5mongo12RangeDeleter6doWorkEv+0x86d) [0xddd2dd]
/usr/bin/mongod() [0x1205829]
/lib64/libpthread.so.0() [0x3518607851]
/lib64/libc.so.6(clone+0x6d) [0x3517ee890d]
========================================================================================
After that I started the secondary service manually. It stayed in the RECOVERING state because the secondary could not catch up with the primary, so I then started a full resync of the member from scratch (roughly the procedure sketched after the log below). However, it remains stuck in the RECOVERING state. The following is the error log:
============================================================================================
2014-08-15T05:13:00.357+0530 [rsSync] building index using bulk method
2014-08-15T05:13:03.000+0530 [rsSync] Index Build: 674600/1018567 66%
2014-08-15T05:13:08.609+0530 [rsSync] build index done. scanned 1018567 total records. 8.252 secs
2014-08-15T05:13:08.751+0530 [rsSync] replSet initial sync data copy, starting syncup
2014-08-15T05:13:08.751+0530 [rsSync] oplog sync 1 of 3
2014-08-15T05:13:09.780+0530 [rsBackgroundSync] replSet syncing to: PrimaryHost:27011
2014-08-15T05:13:09.781+0530 [rsBackgroundSync] replSet error RS102 too stale to catch up, at least from PrimaryHost:27011
2014-08-15T05:13:09.781+0530 [rsBackgroundSync] replSet our last optime : Aug 14 23:45:46 53ecfcd2:42
2014-08-15T05:13:09.781+0530 [rsBackgroundSync] replSet oldest at PrimaryHost:27011 : Aug 15 03:59:32 53ed384c:4c8
2014-08-15T05:13:09.781+0530 [rsBackgroundSync] replSet See
http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember
2014-08-15T05:13:10.054+0530 [rsBackgroundSync] replSet error RS102 too stale to catch up
2014-08-15T05:13:10.054+0530 [rsBackgroundSync] replSet RECOVERING
2014-08-15T05:13:12.408+0530 [conn1339] end connection Arbiter:53931 (26 connections now open)
2014-08-15T05:13:12.408+0530 [initandlisten] connection accepted from Arbiter:53938 #1341 (27 connections now open)
2014-08-15T05:13:24.103+0530 [conn1340] end connection PrimaryHost:45803 (26 connections now open)
2014-08-15T05:13:24.104+0530 [initandlisten] connection accepted from PrimaryHost:45809 #1342 (27 connections now open)
2014-08-15T05:13:42.421+0530 [conn1341] end connection Arbiter:53938 (26 connections now open)
2014-08-15T05:13:42.421+0530 [initandlisten] connection accepted from Arbiter:53944 #1343 (27 connections now open)
2014-08-15T05:13:54.115+0530 [conn1342] end connection PrimaryHost:45809 (26 connections now open)
2014-08-15T05:13:54.116+0530 [initandlisten] connection accepted from PrimaryHost:45815 #1344 (27 connections now open)
2014-08-15T05:14:08.748+0530 [clientcursormon] mem (MB) res:91991 virt:233739
2014-08-15T05:14:08.748+0530 [clientcursormon] mapped (incl journal view):232428
2014-08-15T05:14:08.748+0530 [clientcursormon] connections:27
2014-08-15T05:14:08.748+0530 [clientcursormon] replication threads:32
2014-08-15T05:14:12.432+0530 [conn1343] end connection Arbiter:53944 (26 connections now open)
2014-08-15T05:14:12.432+0530 [initandlisten] connection accepted from Arbiter:53950 #1345 (27 connections now open)
2014-08-15T05:14:24.128+0530 [conn1344] end connection PrimaryHost:45815 (26 connections now open)
2014-08-15T05:14:24.128+0530 [initandlisten] connection accepted from PrimaryHost:45821 #1346 (27 connections now open)
2014-08-15T05:14:42.444+0530 [conn1345] end connection Arbiter:53950 (26 connections now open)
2014-08-15T05:14:42.444+0530 [initandlisten] connection accepted from Arbiter:53956 #1347 (27 connections now open)
2014-08-15T05:14:54.145+0530 [conn1346] end connection PrimaryHost:45821 (26 connections now open)
2014-08-15T05:14:54.146+0530 [initandlisten] connection accepted from PrimaryHost:45827 #1348 (27 connections now open)
2014-08-15T05:15:12.455+0530 [conn1347] end connection Arbiter:53956 (26 connections now open)
2014-08-15T05:15:12.455+0530 [initandlisten] connection accepted from Arbiter:53962 #1349 (27 connections now open)
2014-08-15T05:15:15.055+0530 [rsBackgroundSync] replSet not trying to sync from PrimaryHost:27011, it is vetoed for 474 more seconds
2014-08-15T05:15:15.055+0530 [rsBackgroundSync] replSet not trying to sync from PrimaryHost:27011, it is vetoed for 474 more seconds
2014-08-15T05:15:20.056+0530 [rsBackgroundSync] replSet not trying to sync from PrimaryHost:27011, it is vetoed for 469 more seconds
2014-08-15T05:15:20.056+0530 [rsBackgroundSync] replSet not trying to sync from PrimaryHost:27011, it is vetoed for 469 more seconds
2014-08-15T05:15:24.164+0530 [conn1348] end connection PrimaryHost:45827 (26 connections now open)
2014-08-15T05:15:24.165+0530 [initandlisten] connection accepted from PrimaryHost:45833 #1350 (27 connections now open)
2014-08-15T05:15:25.056+0530 [rsBackgroundSync] replSet not trying to sync from PrimaryHost:27011, it is vetoed for 464 more seconds
2014-08-15T05:15:25.057+0530 [rsBackgroundSync] replSet not trying to sync from PrimaryHost:27011, it is vetoed for 464 more seconds
2014-08-15T05:15:29.692+0530 [conn8] command admin.$cmd command: writebacklisten { writebacklisten: ObjectId('53bfc43d429aac2a013bbe25') } ntoreturn:1 keyUpdates:0 numYields:0 reslen:44 300000ms
2014-08-15T05:15:29.710+0530 [conn10] command admin.$cmd command: writebacklisten { writebacklisten: ObjectId('53bfda689b3f9287fd61fa97') } ntoreturn:1 keyUpdates:0 numYields:0 reslen:44 300000ms
2014-08-15T05:15:29.711+0530 [conn12] command admin.$cmd command: writebacklisten { writebacklisten: ObjectId('53bfc5344fe0e2debbacc83d') } ntoreturn:1 keyUpdates:0 numYields:0 reslen:44 300000ms
2014-08-15T05:15:29.713+0530 [conn11] command admin.$cmd command: writebacklisten { writebacklisten: ObjectId('53bfe37114393bcbfc59e2dd') } ntoreturn:1 keyUpdates:0 numYields:0 reslen:44 300000ms
2014-08-15T05:15:29.713+0530 [conn9] command admin.$cmd command: writebacklisten { writebacklisten: ObjectId('53b7238fa3c2fc4f399d430f') } ntoreturn:1 keyUpdates:0 numYields:0 reslen:44 300000ms
2014-08-15T05:15:29.724+0530 [conn13] command admin.$cmd command: writebacklisten { writebacklisten: ObjectId('53bfc50ccbac799a6d9da8c8') } ntoreturn:1 keyUpdates:0 numYields:0 reslen:44 300000ms
2014-08-15T05:15:30.057+0530 [rsBackgroundSync] replSet not trying to sync from PrimaryHost:27011, it is vetoed for 459 more seconds
2014-08-15T05:15:30.057+0530 [rsBackgroundSync] replSet not trying to sync from PrimaryHost:27011, it is vetoed for 459 more seconds
====================================================================================================
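The resync I attempted was a full initial sync, along the lines of the procedure on the dochub page referenced in the log above (sketch only, exact paths omitted): stop the stale secondary's mongod, empty its data directory (dbpath), and start it again with the same replSet options so it performs a fresh initial sync. While it runs, I watch the member states from the mongo shell like this:

// Connected to any member; prints each member's current replication state.
rs.status().members.forEach(function (m) {
    print(m.name + " : " + m.stateStr + " (optime " + m.optimeDate + ")");
});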
I observed that the oplog size is 10 GB.

My questions are: why did the secondary member stop unexpectedly? Why does the resync not succeed? Does the resync depend on the oplog size?
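For context, this is roughly how I checked the oplog size and replication window on the primary (mongo shell sketch):

// From a mongo shell connected to the primary:
db.printReplicationInfo()   // configured oplog size and log length start to end
// or, to read the same values programmatically:
var info = db.getReplicationInfo();
print("oplog size (MB): " + info.logSizeMB + ", window (hours): " + info.timeDiffHours);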
Best Regards,
Satya D