Hi,
We have a situation in our MongoDB cluster. In one of the replica sets (say machines A and B), B went down due to a server failure and required repair. Unfortunately, right before the repair of B was finished, A went down due to a segfault:
Mon Nov 12 16:37:24 [conn233105274] Uncaught std::exception: St9bad_alloc, terminating
Mon Nov 12 16:37:24 dbexit:
Mon Nov 12 16:37:24 Backtrace:
0x8ad399 0x8ad970 0x367ee0eb70 0x2279a90
mongod(_ZN5mongo10abruptQuitEi+0x399) [0x8ad399]
mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x220) [0x8ad970]
/lib64/libpthread.so.0 [0x367ee0eb70]
We then restarted A while B was master. At that point, A went into the ROLLBACK state, as expected.
mongostat shows "UNK" for A and "M" for B.
In the log, we see entries such as:
Mon Nov 12 17:53:18 [replica set sync] replSet info rollback of renameCollection is slow in this version of mongod
Mon Nov 12 17:53:18 [replica set sync] replSet WARNING ignoring op on rollback no _id TODO : xs.system.indexes { ts: Timestamp 1351583022000|218, h: 5274814664110145128, op: "i", ns: "xs.system.indexes", o: { ns: "xs.tmp.mr.profile_tmp.mrs.profile_1351583022_86493_1139205_inc", key: { 0: 1 }, name: "0_1", v: 0 } }
However, for quite some time now, we only see:
Mon Nov 12 18:41:16 [initandlisten] connection accepted from 10.28.6.91:55271 #155
and no more ROLLBACK logs. We have also checked currentOp, and it shows entries like the one below:
{
    "opid" : "rs_c:1265572266",
    "active" : false,
    "waitingForLock" : false,
    "op" : "none",
    "ns" : "?xs.profile",
    "query" : {
    },
    "client_s" : "(NONE)",
    "desc" : "replica set sync"
},
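A minimal sketch of how this check could be scripted, assuming the currentOp entries have been copied into a Python list of dicts. The helper name find_sync_ops is made up for illustration. It only picks out the entries whose "desc" is "replica set sync" and reports whether each one is active; by itself it cannot tell whether the rollback is progressing or stuck.

```python
def find_sync_ops(inprog):
    """Return (opid, active, ns) for every entry described as 'replica set sync'.

    `inprog` is assumed to be a list of dicts shaped like the currentOp
    entries; `find_sync_ops` is a hypothetical helper, not a MongoDB API.
    """
    return [
        (op.get("opid"), op.get("active"), op.get("ns"))
        for op in inprog
        if op.get("desc") == "replica set sync"
    ]


# Sample entry copied from the currentOp output above.
sample = [
    {
        "opid": "rs_c:1265572266",
        "active": False,
        "waitingForLock": False,
        "op": "none",
        "ns": "?xs.profile",
        "query": {},
        "client_s": "(NONE)",
        "desc": "replica set sync",
    },
]

for opid, active, ns in find_sync_ops(sample):
    print(opid, active, ns)
```

Running the same filter on output captured a few minutes apart would at least show whether the sync entry ever becomes active or changes namespace.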
So, is the system still in the process of rolling back, or is it stuck?
*The DB cluster is running MongoDB v1.8.3.
Thanks.