Message from discussion
Crash while performing initial sync (2.2)
Received: by 10.66.85.136 with SMTP id h8mr3491456paz.46.1346730947805;
Mon, 03 Sep 2012 20:55:47 -0700 (PDT)
X-BeenThere: mongodb-user@googlegroups.com
Received: by 10.68.194.202 with SMTP id hy10ls20255740pbc.0.gmail; Mon, 03 Sep
2012 20:55:36 -0700 (PDT)
Received: by 10.68.129.69 with SMTP id nu5mr2934065pbb.17.1346730936466;
Mon, 03 Sep 2012 20:55:36 -0700 (PDT)
Date: Mon, 3 Sep 2012 20:55:36 -0700 (PDT)
From: Braden Evans <bra...@numericalorder.com>
To: mongodb-user@googlegroups.com
Message-Id: <4b9bc55c-0bfc-44bc-9bee-2fa0cddf15a9@googlegroups.com>
In-Reply-To: <09042996-fd83-4369-b3a7-9df02f6a5736@googlegroups.com>
References: <09042996-fd83-4369-b3a7-9df02f6a5736@googlegroups.com>
Subject: Re: Crash while performing initial sync (2.2)
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="----=_Part_1761_23390276.1346730936089"
------=_Part_1761_23390276.1346730936089
Content-Type: multipart/alternative;
boundary="----=_Part_1762_30211953.1346730936089"
------=_Part_1762_30211953.1346730936089
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
2nd attempt has failed in the same way, sync fails with "ERROR: exception:
Can no longer connect to initial sync source: 192.168.1.40:27017" and then
an endless cascade of these duplicate key assertions.
I am guessing now there are two issues here, one is that for some reason
sync is interrupted and two that it can not resume.
Log of the 2nd failure:
Mon Sep 3 20:13:56 [repl writer worker 5] repl: couldn't connect to server
192.168.1.40:27017
Mon Sep 3 20:13:56 [repl writer worker 5] ERROR: exception: Can no longer
connect to initial sync source: 192.168.1.40:27017 on: { ts: Timestamp
1346717498000|261, h: -4332373531262227304, op: "u", ns: "[collection]",
o2: { _id: ObjectId('500e8550003c0b0b380cd9e9') }, o: { $set: { st: 1, la:
new Date(1346717775888) } } }
Mon Sep 3 20:13:56 [repl writer worker 5] Fatal Assertion 16361
0xade6e1 0x802e03 0x6505b6 0x77d3dd 0x7c3659 0x7f39b0e75efc 0x7f39b020759d
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xade6e1]
/usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xa3) [0x802e03]
/usr/bin/mongod(_ZN5mongo7replset21multiInitialSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x1d6)
[0x6505b6]
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x26d) [0x77d3dd]
/usr/bin/mongod() [0x7c3659]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7efc) [0x7f39b0e75efc]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f39b020759d]
Mon Sep 3 20:13:56 [repl writer worker 5]
***aborting after fassert() failure
Mon Sep 3 20:13:56 Got signal: 6 (Aborted).
Mon Sep 3 20:13:56 Backtrace:
0xade6e1 0x5582d9 0x7f39b015a420 0x7f39b015a3a5 0x7f39b015db0b 0x802e3e
0x6505b6 0x77d3dd 0x7c3659 0x7f39b0e75efc 0x7f39b020759d
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xade6e1]
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x5582d9]
/lib/x86_64-linux-gnu/libc.so.6(+0x36420) [0x7f39b015a420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f39b015a3a5]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7f39b015db0b]
/usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xde) [0x802e3e]
/usr/bin/mongod(_ZN5mongo7replset21multiInitialSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x1d6)
[0x6505b6]
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x26d) [0x77d3dd]
/usr/bin/mongod() [0x7c3659]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7efc) [0x7f39b0e75efc]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f39b020759d]
On Monday, September 3, 2012 3:35:06 PM UTC-7, Braden Evans wrote:
>
> We just updated our secondary to 2.2 and while performing the inital sync
> (near the end while replaying the oplog) we get an fassert:
>
> Mon Sep 3 15:00:38 [repl writer worker 7] ERROR: writer worker caught
> exception: E11000 duplicate key error index: [collection].$_id_ dup key: {
> : ObjectId('5044fe2d293c0b02a893576b') } on: { ts: Timestamp
> 1346698527000|3315, h: -2145615260524114526, op: "u", ns: "[collection]",
> o2: { _id: ObjectId('5044fe2d293c0b02a893576b'), pc: { $size: 35 } }, o: {
> $push: { pc: { u: ObjectId('0000000000000000002283a9'), c: [ 12, 20, 43,
> 51, 73, 5, 24, 31, 48, 67, 1, 23, 34, 58, 69, 3, 30, 36, 53, 71, 11, 19,
> 39, 59, 68 ] } } } }
> Mon Sep 3 15:00:38 [repl writer worker 7] Fatal Assertion 16360
> 0xade6e1 0x802e03 0x64f77d 0x77d3dd 0x7c3659 0x7f4a859c1efc 0x7f4a84d5359d
> /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xade6e1]
> /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xa3) [0x802e03]
> /usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x12d)
> [0x64f77d]
> /usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x26d) [0x77d3dd]
> /usr/bin/mongod() [0x7c3659]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x7efc) [0x7f4a859c1efc]
> /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f4a84d5359d]
> Mon Sep 3 15:00:38 [repl writer worker 7]
>
> ***aborting after fassert() failure
>
> Is there any way to work around this problem? Manually deleting the record
> (on the secondary) does nothing, I'm not sure how to proceed except
> resyncing and hoping it doesn't happen again.
>
> This collection is updated frequently, I wonder if it is this case:
> https://jira.mongodb.org/browse/SERVER-6671
>
>
>
>
------=_Part_1762_30211953.1346730936089
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: quoted-printable
<div>2nd attempt has failed in the same way, sync fails with "ERROR: except=
ion: Can no longer connect to initial sync source: 192.168.1.40:27017" and =
then an endless cascade of these duplicate key assertions. <br></div><=
div><br></div><div>I am guessing now there are two issues here, one is that=
for some reason sync is interrupted and two that it can not resume. <=
/div><div><br></div><div><br></div><div>Log of the 2nd failure:</div><div><=
br></div><div><br></div><div><div>Mon Sep 3 20:13:56 [repl writer wor=
ker 5] repl: couldn't connect to server 192.168.1.40:27017</div><div>Mon Se=
p 3 20:13:56 [repl writer worker 5] ERROR: exception: Can no longer c=
onnect to initial sync source: 192.168.1.40:27017 on: { ts: Timestamp 13467=
17498000|261, h: -4332373531262227304, op: "u", ns: "[collection]", o2: { _=
id: ObjectId('500e8550003c0b0b380cd9e9') }, o: { $set: { st: 1, la: new Dat=
e(1346717775888) } } }</div><div>Mon Sep 3 20:13:56 [repl writer work=
er 5] Fatal Assertion 16361</div><div>0xade6e1 0x802e03 0x6505b6 0x7=
7d3dd 0x7c3659 0x7f39b0e75efc 0x7f39b020759d</div><div> /usr/bin/mongo=
d(_ZN5mongo15printStackTraceERSo+0x21) [0xade6e1]</div><div> /usr/bin/=
mongod(_ZN5mongo13fassertFailedEi+0xa3) [0x802e03]</div><div> /usr/bin=
/mongod(_ZN5mongo7replset21multiInitialSyncApplyERKSt6vectorINS_7BSONObjESa=
IS2_EEPNS0_8SyncTailE+0x1d6) [0x6505b6]</div><div> /usr/bin/mongod(_ZN=
5mongo10threadpool6Worker4loopEv+0x26d) [0x77d3dd]</div><div> /usr/bin=
/mongod() [0x7c3659]</div><div> /lib/x86_64-linux-gnu/libpthread.so.0(=
+0x7efc) [0x7f39b0e75efc]</div><div> /lib/x86_64-linux-gnu/libc.so.6(c=
lone+0x6d) [0x7f39b020759d]</div><div>Mon Sep 3 20:13:56 [repl writer=
worker 5]</div></div><div><br></div><div><div><br></div><div>***aborting a=
fter fassert() failure</div><div><br></div><div><br></div><div>Mon Sep &nbs=
p;3 20:13:56 Got signal: 6 (Aborted).</div><div><br></div><div>Mon Sep &nbs=
p;3 20:13:56 Backtrace:</div><div>0xade6e1 0x5582d9 0x7f39b015a420 0x7f39b0=
15a3a5 0x7f39b015db0b 0x802e3e 0x6505b6 0x77d3dd 0x7c3659 0x7f39b0e75efc 0x=
7f39b020759d</div><div> /usr/bin/mongod(_ZN5mongo15printStackTraceERSo=
+0x21) [0xade6e1]</div><div> /usr/bin/mongod(_ZN5mongo10abruptQuitEi+0=
x399) [0x5582d9]</div><div> /lib/x86_64-linux-gnu/libc.so.6(+0x36420) =
[0x7f39b015a420]</div><div> /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x=
35) [0x7f39b015a3a5]</div><div> /lib/x86_64-linux-gnu/libc.so.6(abort+=
0x17b) [0x7f39b015db0b]</div><div> /usr/bin/mongod(_ZN5mongo13fassertF=
ailedEi+0xde) [0x802e3e]</div><div> /usr/bin/mongod(_ZN5mongo7replset2=
1multiInitialSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x1d=
6) [0x6505b6]</div><div> /usr/bin/mongod(_ZN5mongo10threadpool6Worker4=
loopEv+0x26d) [0x77d3dd]</div><div> /usr/bin/mongod() [0x7c3659]</div>=
<div> /lib/x86_64-linux-gnu/libpthread.so.0(+0x7efc) [0x7f39b0e75efc]<=
/div><div> /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f39b020759d=
]</div></div><div><br></div><br>On Monday, September 3, 2012 3:35:06 PM UTC=
-7, Braden Evans wrote:<blockquote class=3D"gmail_quote" style=3D"margin: 0=
;margin-left: 0.8ex;border-left: 1px #ccc solid;padding-left: 1ex;">We just=
updated our secondary to 2.2 and while performing the inital sync (near th=
e end while replaying the oplog) we get an fassert:<div><br></div><div><div=
>Mon Sep 3 15:00:38 [repl writer worker 7] ERROR: writer worker caugh=
t exception: E11000 duplicate key error index: [collection].$_id_ dup=
key: { : ObjectId('<wbr>5044fe2d293c0b02a893576b') } on: { ts: Timestamp 1=
346698527000|3315, h: -2145615260524114526, op: "u", ns: "[collection]", o2=
: { _id: ObjectId('<wbr>5044fe2d293c0b02a893576b'), pc: { $size: 35 } }, o:=
{ $push: { pc: { u: ObjectId('<wbr>0000000000000000002283a9'), c: [ 12, 20=
, 43, 51, 73, 5, 24, 31, 48, 67, 1, 23, 34, 58, 69, 3, 30, 36, 53, 71, 11, =
19, 39, 59, 68 ] } } } }<br></div><div>Mon Sep 3 15:00:38 [repl write=
r worker 7] Fatal Assertion 16360</div><div>0xade6e1 0x802e03 0x64f7=
7d 0x77d3dd 0x7c3659 0x7f4a859c1efc 0x7f4a84d5359d</div><div> /usr/bin=
/mongod(_<wbr>ZN5mongo15printStackTraceERSo+<wbr>0x21) [0xade6e1]</div><div=
> /usr/bin/mongod(_<wbr>ZN5mongo13fassertFailedEi+<wbr>0xa3) [0x802e03=
]</div><div> /usr/bin/mongod(_<wbr>ZN5mongo7replset14multiSyncApp<wbr>=
lyERKSt6vectorINS_<wbr>7BSONObjESaIS2_EEPNS0_<wbr>8SyncTailE+0x12d) [0x64f7=
7d]</div><div> /usr/bin/mongod(_<wbr>ZN5mongo10threadpool6Worker4lo<wb=
r>opEv+0x26d) [0x77d3dd]</div><div> /usr/bin/mongod() [0x7c3659]</div>=
<div> /lib/x86_64-linux-gnu/<wbr>libpthread.so.0(+0x7efc) [0x7f4a859c1=
efc]</div><div> /lib/x86_64-linux-gnu/libc.<wbr>so.6(clone+0x6d) [0x7f=
4a84d5359d]</div><div>Mon Sep 3 15:00:38 [repl writer worker 7]</div>=
<div><br></div><div>***aborting after fassert() failure</div><div><br></div=
><div>Is there any way to work around this problem? Manually deleting the r=
ecord (on the secondary) does nothing, I'm not sure how to proceed except r=
esyncing and hoping it doesn't happen again. <br></div><div><br></div>=
<div><div>This collection is updated frequently, I wonder if it is this cas=
e: <a href=3D"https://jira.mongodb.org/browse/SERVER-6671" target=3D"_=
blank">https://jira.mongodb.<wbr>org/browse/SERVER-6671</a> </div><br>=
</div><div><br></div><div><br></div></div></blockquote>
------=_Part_1762_30211953.1346730936089--
------=_Part_1761_23390276.1346730936089--