[Rollback State] Is it making progress, and should we wait?

Steve

Nov 12, 2012, 10:19:02 PM
to mongod...@googlegroups.com
Hi,

We have a situation in our MongoDB cluster. In one of the replica sets (machines A and B), B went down due to a server failure and needed a repair. Unfortunately, right before B finished repairing, A went down too, apparently due to a segfault, although the log reports an uncaught std::bad_alloc:

Mon Nov 12 16:37:24 [conn233105274]   Uncaught std::exception: St9bad_alloc, terminating
Mon Nov 12 16:37:24 dbexit:
Mon Nov 12 16:37:24 Backtrace:
0x8ad399 0x8ad970 0x367ee0eb70 0x2279a90 
 mongod(_ZN5mongo10abruptQuitEi+0x399) [0x8ad399]
 mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x220) [0x8ad970]
 /lib64/libpthread.so.0 [0x367ee0eb70]
 [0x2279a90]

We then restarted A while B was master. At that point, A went into the ROLLBACK state, as expected.

In mongostat, A shows "UNK" and B shows "M".

In the log, we see some rollback messages:

Mon Nov 12 17:53:18 [replica set sync] replSet info rollback of renameCollection is slow in this version of mongod
Mon Nov 12 17:53:18 [replica set sync] replSet WARNING ignoring op on rollback no _id TODO : xs.system.indexes { ts: Timestamp 1351583022000|218, h: 5274814664110145128, op: "i", ns: "xs.system.indexes", o: { ns: "xs.tmp.mr.profile_tmp.mrs.profile_1351583022_86493_1139205_inc", key: { 0: 1 }, name: "0_1", v: 0 } }

However, for quite some time now, we have only seen:

Mon Nov 12 18:41:04 [initandlisten] connection accepted from 10.28.120.169:44420 #154
Mon Nov 12 18:41:04 [conn154] end connection 10.28.120.169:44420
Mon Nov 12 18:41:16 [conn153] end connection 10.28.6.91:55265
Mon Nov 12 18:41:16 [initandlisten] connection accepted from 10.28.6.91:55271 #155

and no more ROLLBACK log entries. We also checked currentOp, which shows entries like the one below:

{
    "opid" : "rs_c:1265572266",
    "active" : false,
    "waitingForLock" : false,
    "op" : "none",
    "ns" : "?xs.profile",
    "query" : {
    },
    "client_s" : "(NONE)",
    "desc" : "replica set sync"
},
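
To put a number on how long the sync thread has been quiet, the log timestamps can be compared directly. Below is a minimal Python sketch, assuming the log line layout shown in the excerpts above ("Mon Nov 12 17:53:18 [thread] message"). The function names `parse_line` and `sync_silence` are hypothetical, and the year has to be passed in because the log lines do not include it. It only measures how long the "replica set sync" thread has been silent in the log; it cannot tell whether the rollback itself is still making progress.

```python
from datetime import datetime


def parse_line(line, year):
    """Return (timestamp, thread) for a 'Mon Nov 12 17:53:18 [thread] msg' line, else None."""
    parts = line.split(None, 4)
    # Backtrace lines and lines without a [thread] tag (e.g. "dbexit:") are skipped.
    if len(parts) < 5 or not parts[4].startswith("[") or "]" not in parts[4]:
        return None
    try:
        # The log omits the year, so the caller supplies it (assumption).
        # %b expects English month abbreviations (default C locale).
        ts = datetime.strptime(f"{year} {parts[1]} {parts[2]} {parts[3]}",
                               "%Y %b %d %H:%M:%S")
    except ValueError:
        return None
    thread = parts[4][1:parts[4].index("]")]
    return ts, thread


def sync_silence(lines, year, thread="replica set sync"):
    """Time between the last line from `thread` and the newest parsed line, or None."""
    last_sync = newest = None
    for line in lines:
        parsed = parse_line(line, year)
        if parsed is None:
            continue
        ts, name = parsed
        if newest is None or ts > newest:
            newest = ts
        if name == thread and (last_sync is None or ts > last_sync):
            last_sync = ts
    if last_sync is None:
        return None
    return newest - last_sync


# Lines copied from the log excerpts in this post.
sample = [
    "Mon Nov 12 17:53:18 [replica set sync] replSet info rollback of renameCollection is slow in this version of mongod",
    "Mon Nov 12 18:41:04 [initandlisten] connection accepted from 10.28.120.169:44420 #154",
    "Mon Nov 12 18:41:16 [conn153] end connection 10.28.6.91:55265",
]
print(sync_silence(sample, 2012))  # 0:47:58
```

On the excerpts above this gives a gap of about 48 minutes between the last rollback message and the newest connection line. The sketch does not handle a log that spans a year boundary.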

So, is the system still in the process of rolling back, or is it stuck?

* The DB cluster is running v1.8.3.

Thanks.