I shut down a server for maintenance and replaced the motherboard. It came back up with a different time offset, which NTP did not handle well.
At first it started as a secondary to catch up with the master. A few hours later I received this:
Thu Sep 1 06:15:18 [conn22215] Assertion: 13312:replSet error : logOp() but not primary?
and then an exception:
Thu Sep 1 06:15:19 [conn22215] update storage.object query: { _id: 2397448961 } exception 13312 replSet error : logOp() but not primary? 127ms
Thu Sep 1 06:15:19 [conn22531] Assertion: 13312:replSet error : logOp() but not primary?
0x55f5aa 0x713191 0x711052 0x66a467 0x66cd30 0x7588bf 0x75b251 0x8a8fce 0x8bb630 0x3022c0673d 0x30220d44bd
Thu Sep 1 06:15:19 [initandlisten] connection accepted from xx.xx.xx.237:53888 #23793
Thu Sep 1 06:15:19 [conn23790] end connection xx.xx.xx.232:60094
/usr/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x12a) [0x55f5aa]
/usr/bin/mongod [0x713191]
/usr/bin/mongod(_ZN5mongo5logOpEPKcS1_RKNS_7BSONObjEPS2_Pb+0x42) [0x711052]
/usr/bin/mongod(_ZN5mongo14_updateObjectsEbPKcRKNS_7BSONObjES2_bbbRNS_7OpDebugEPNS_11RemoveSaverE+0x1327) [0x66a467]
/usr/bin/mongod(_ZN5mongo13updateObjectsEPKcRKNS_7BSONObjES2_bbbRNS_7OpDebugE+0x130) [0x66cd30]
/usr/bin/mongod(_ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE+0x47f) [0x7588bf]
/usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE+0x1941) [0x75b251]
/usr/bin/mongod(_ZN5mongo10connThreadEPNS_13MessagingPortE+0x21e) [0x8a8fce]
/usr/bin/mongod(thread_proxy+0x80) [0x8bb630]
/lib64/libpthread.so.0 [0x3022c0673d]
/lib64/libc.so.6(clone+0x6d) [0x30220d44bd]
Then the primary went into recovery mode and could not roll back because of the time difference.
I have changed the server time, but this does not help.
shard0001:PRIMARY> db.printReplicationInfo()
configured oplog size: 48958.344531250004MB
log length start to end: 463340secs (128.71hrs)
oplog first event time: Sat Aug 27 2011 08:23:01 GMT+0000 (GMT)
oplog last event time: Thu Sep 01 2011 17:05:21 GMT+0000 (GMT)
now: Thu Sep 01 2011 10:04:51 GMT+0000 (GMT)
shard0001:ROLLBACK> db.printReplicationInfo()
configured oplog size: 44682.4892578125MB
log length start to end: 384524secs (106.81hrs)
oplog first event time: Sat Aug 27 2011 19:22:49 GMT+0000 (GMT)
oplog last event time: Thu Sep 01 2011 06:11:33 GMT+0000 (GMT)
now: Thu Sep 01 2011 10:07:26 GMT+0000 (GMT)
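To make the time difference in the two outputs above concrete, here is a minimal sketch in plain JavaScript that compares each node's last oplog event time with its "now". The timestamps are copied from the outputs and rewritten as ISO 8601 UTC strings. The `nodes` array and the `oplogLeadSeconds` helper are names made up for this illustration and are not part of MongoDB.

```javascript
// Timestamps copied from the two db.printReplicationInfo() outputs above,
// rewritten as ISO 8601 UTC strings so that Date parsing is unambiguous.
const nodes = [
  { state: "PRIMARY",  lastOplog: "2011-09-01T17:05:21Z", now: "2011-09-01T10:04:51Z" },
  { state: "ROLLBACK", lastOplog: "2011-09-01T06:11:33Z", now: "2011-09-01T10:07:26Z" },
];

// Hypothetical helper (not a MongoDB API): number of seconds by which the
// last oplog entry is ahead of the node's own "now".
// A positive value means the entry is stamped in the future.
function oplogLeadSeconds(node) {
  return (Date.parse(node.lastOplog) - Date.parse(node.now)) / 1000;
}

for (const node of nodes) {
  console.log(node.state, oplogLeadSeconds(node));
}
```

With these values, the PRIMARY's last oplog entry is 25230 seconds (about 7 hours) ahead of its own clock, while the ROLLBACK node's last entry is 14153 seconds (just under 4 hours) behind its clock.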
Currently the only thing I can do is delete the data on the problematic node, but I understand that the oplog (local db) on the primary also needs to be deleted, since it is causing problems.
What really bothers me is the way MongoDB handles these time differences: if a server's clock goes out of sync, it may cause a lot of problems.
Is there any other way to fix it?