We had an issue with Mongo 2.2.0 on a replica set of 3 servers.
Two of our secondary servers couldn't start due to a duplicate key error while syncing.
Because we have 3 servers in total (2 secondaries and 1 primary), the primary was then unable to remain primary, since no other replica set member was alive to maintain a majority. So in total we had 2 mongod instances down and 1 secondary up.
In my case, I believe the primary (MONGO1) became unreachable for the two secondary servers, so one secondary (MONGO3) became primary and the other secondary (MONGO2) started replicating from that newly elected primary (MONGO3).
But it looks like when the network came back for the first primary (MONGO1), the two other mongod instances (MONGO2 and MONGO3) both tried to become secondary again and roll back, but failed due to a duplicate key error.
Here are the logs from one replica set member (MONGO2):
//003 PRIMARY at that stage
Tue Jan 21 10:49:31 [rsHealthPoll] couldn't connect to MONGO1
Tue Jan 21 10:49:32 [rsHealthPoll] replSet member MONGO1 is now in state UNKNOWN
//MONGO1 UNKNOWN
Tue Jan 21 10:49:34 [rsHealthPoll] replSet member MONGO1 is now in state PRIMARY
//MONGO1 PRIMARY
Tue Jan 21 10:49:34 [rsMgr] replSet info two primaries (transiently)
//MONGO1 AND MONGO3 BOTH PRIMARY
Tue Jan 21 10:49:34 [rsSyncNotifier] replset tracking exception: exception: 10278 dbclient error communicating with server: MONGO3
Tue Jan 21 10:49:34 [rsBackgroundSync] replSet db exception in producer: 10278 dbclient error communicating with server: MONGO3
Tue Jan 21 10:49:48 [rsBackgroundSync] replSet we are ahead of the primary, will try to roll back
Tue Jan 21 10:49:48 [rsBackgroundSync] replSet rollback 0
Tue Jan 21 10:49:48 [rsBackgroundSync] replSet ROLLBACK
Tue Jan 21 10:49:48 [rsBackgroundSync] replSet rollback 1
Tue Jan 21 10:49:48 [rsBackgroundSync] replSet rollback 2 FindCommonPoint
Tue Jan 21 10:49:48 [rsBackgroundSync] replSet info rollback our last optime: Jan 21 10:50:15:a
Tue Jan 21 10:49:48 [rsBackgroundSync] replSet info rollback their last optime: Jan 21 10:49:46:2a
Tue Jan 21 10:49:48 [rsBackgroundSync] replSet info rollback diff in end of log times: 29 seconds
Tue Jan 21 10:49:48 [rsBackgroundSync] replSet rollback found matching events at Jan 21 10:49:08:2
Tue Jan 21 10:49:48 [rsBackgroundSync] replSet rollback findcommonpoint scanned : 984
Tue Jan 21 10:49:48 [rsBackgroundSync] replSet replSet rollback 3 fixup
Tue Jan 21 10:49:48 [rsBackgroundSync] replSet rollback 3.5
Tue Jan 21 10:49:48 [rsBackgroundSync] replSet rollback 4 n:64
Tue Jan 21 10:49:48 [rsBackgroundSync] replSet minvalid=Jan 21 10:49:47 52de50cb:16
Tue Jan 21 10:49:48 [rsBackgroundSync] replSet rollback 4.6
Tue Jan 21 10:49:48 [rsBackgroundSync] replSet rollback 4.7
Tue Jan 21 10:49:48 [rsBackgroundSync] replSet rollback 5 d:0 u:64
Tue Jan 21 10:49:48 [rsBackgroundSync] replSet rollback 6
Tue Jan 21 10:49:48 [rsBackgroundSync] replSet rollback 7
Tue Jan 21 10:49:48 [rsBackgroundSync] replSet rollback done
Tue Jan 21 10:49:48 [rsBackgroundSync] replSet RECOVERING
Tue Jan 21 10:49:48 [rsSync] replSet still syncing, not yet to minValid optime 52de50cb:16
Tue Jan 21 10:49:48 [repl writer worker 6] ERROR: writer worker caught exception: E11000 duplicate key error index: production.table.$_id_ dup key: { : "EDO9wSyAV_1zY5QmDcZVuGA" } on: { ts: Timestamp 1390301410000|9, h: 7393169436717622182, op: "u", ns: "production.table", o2: { _id: "EDO9wSyAV_1zY5QmDcZVuGA"}, $set: { Version: 151 } } }
Tue Jan 21 10:49:48 [repl writer worker 6] Fatal Assertion 16360
0xade6e1 0x802e03 0x64f77d 0x77d3dd 0x7c3659 0x7f18517aee9a 0x7f1850ac44bd
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xade6e1]
/usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xa3) [0x802e03]
/usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x12d) [0x64f77d]
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x26d) [0x77d3dd]
/usr/bin/mongod() [0x7c3659]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f18517aee9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f1850ac44bd]
Tue Jan 21 10:49:48 [repl writer worker 6]
***aborting after fassert() failure
Tue Jan 21 10:49:48 Got signal: 6 (Aborted).
Tue Jan 21 10:49:48 Backtrace:
0xade6e1 0x5582d9 0x7f1850a084c0 0x7f1850a08445 0x7f1850a0bbab 0x802e3e 0x64f77d 0x77d3dd 0x7c3659 0x7f18517aee9a 0x7f1850ac44bd
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xade6e1]
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x5582d9]
/lib/x86_64-linux-gnu/libc.so.6(+0x364c0) [0x7f1850a084c0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f1850a08445]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7f1850a0bbab]
/usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xde) [0x802e3e]
/usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x12d) [0x64f77d]
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x26d) [0x77d3dd]
/usr/bin/mongod() [0x7c3659]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f18517aee9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f1850ac44bd]
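For anyone hitting the same fatal assertion 16360: a member that aborts mid-rollback like this typically cannot recover on its own and has to be resynced from scratch. Below is a minimal sketch of a full resync; the service name `mongodb` and dbpath `/var/lib/mongodb` are assumptions, so adjust both for your deployment.

```shell
# Assumed service name and dbpath -- adjust for your deployment.
sudo service mongodb stop

# Move the data files aside so the member performs a full initial sync.
# Keep the old directory until the resync has completed successfully.
sudo mv /var/lib/mongodb /var/lib/mongodb.broken
sudo mkdir /var/lib/mongodb
sudo chown mongodb:mongodb /var/lib/mongodb

sudo service mongodb start
# The member rejoins the set, copies all data from another member
# (visible as STARTUP2/RECOVERING in rs.status()), then returns to
# SECONDARY once the initial sync finishes.
```

Note that while one member is resyncing, the set must still have enough healthy members to elect a primary, so with a 3-member set resync one member at a time.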