Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Message from discussion Crash while performing initial sync (2.2)

Date: Fri, 21 Sep 2012 17:17:38 -0700 (PDT)
From: William Zola <william.z...@10gen.com>
To: mongodb-user@googlegroups.com
Message-Id: <f08fa0a5-67ac-4fde-b33c-6cfa26d920c1@googlegroups.com>
In-Reply-To: <946cf965-8359-4184-9326-e0ed2af85f7c@googlegroups.com>
References: <09042996-fd83-4369-b3a7-9df02f6a5736@googlegroups.com>
 <4b9bc55c-0bfc-44bc-9bee-2fa0cddf15a9@googlegroups.com>
 <946cf965-8359-4184-9326-e0ed2af85f7c@googlegroups.com>
Subject: Re: Crash while performing initial sync (2.2)
MIME-Version: 1.0
Content-Type: multipart/mixed; 
	boundary="----=_Part_384_26254200.1348273058872"

------=_Part_384_26254200.1348273058872
Content-Type: multipart/alternative; 
	boundary="----=_Part_385_13566946.1348273058873"

------=_Part_385_13566946.1348273058873
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

Hi Tim!

This problem is different from the one that Braden reported.  If you 
examine the stack trace, you'll notice that the addresses and symbols are 
different from the ones in his report.

There should be some additional lines in the log file ahead of these: can 
you post those as well?

 -William 


On Wednesday, September 12, 2012 7:58:53 AM UTC-7, Tim wrote:
>
> Experiencing  the exact same problem can anyone give us a resolution?
> 0xade6e1 0x802e03 0x5e29f6 0x6538aa 0x65394a 0x653d58 0x7c3659 
> 0x39fec07851 0x39fe8e811d
>  /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xade6e1]
>  /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xa3) [0x802e03]
>  /usr/bin/mongod(_ZN5mongo11ReplSetImpl17syncDoInitialSyncEv+0x46) 
> [0x5e29f6]
>  /usr/bin/mongod(_ZN5mongo11ReplSetImpl11_syncThreadEv+0x6a) [0x6538aa]
>  /usr/bin/mongod(_ZN5mongo11ReplSetImpl10syncThreadEv+0x2a) [0x65394a]
>  /usr/bin/mongod(_ZN5mongo15startSyncThreadEv+0xa8) [0x653d58]
>  /usr/bin/mongod() [0x7c3659]
>  /lib64/libpthread.so.0() [0x39fec07851]
>  /lib64/libc.so.6(clone+0x6d) [0x39fe8e811d]
> Wed Sep 12 15:43:35 [rsSync]
> ***aborting after fassert() failure
>  
>
> On Tuesday, 4 September 2012 04:55:36 UTC+1, Braden Evans wrote:
>
>> 2nd attempt has failed in the same way, sync fails with "ERROR: 
>> exception: Can no longer connect to initial sync source: 
>> 192.168.1.40:27017" and then an endless cascade of these duplicate key 
>> assertions. 
>>
>> I am guessing now there are two issues here, one is that for some reason 
>> sync is interrupted and two that it can not resume. 
>>
>>
>> Log of the 2nd failure:
>>
>>
>> Mon Sep  3 20:13:56 [repl writer worker 5] repl: couldn't connect to 
>> server 192.168.1.40:27017
>> Mon Sep  3 20:13:56 [repl writer worker 5] ERROR: exception: Can no 
>> longer connect to initial sync source: 192.168.1.40:27017 on: { ts: 
>> Timestamp 1346717498000|261, h: -4332373531262227304, op: "u", ns: 
>> "[collection]", o2: { _id: ObjectId('500e8550003c0b0b380cd9e9') }, o: { 
>> $set: { st: 1, la: new Date(1346717775888) } } }
>> Mon Sep  3 20:13:56 [repl writer worker 5]   Fatal Assertion 16361
>> 0xade6e1 0x802e03 0x6505b6 0x77d3dd 0x7c3659 0x7f39b0e75efc 0x7f39b020759d
>>  /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xade6e1]
>>  /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xa3) [0x802e03]
>>  /usr/bin/mongod(_ZN5mongo7replset21multiInitialSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x1d6) 
>> [0x6505b6]
>>  /usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x26d) [0x77d3dd]
>>  /usr/bin/mongod() [0x7c3659]
>>  /lib/x86_64-linux-gnu/libpthread.so.0(+0x7efc) [0x7f39b0e75efc]
>>  /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f39b020759d]
>> Mon Sep  3 20:13:56 [repl writer worker 5]
>>
>>
>> ***aborting after fassert() failure
>>
>>
>> Mon Sep  3 20:13:56 Got signal: 6 (Aborted).
>>
>> Mon Sep  3 20:13:56 Backtrace:
>> 0xade6e1 0x5582d9 0x7f39b015a420 0x7f39b015a3a5 0x7f39b015db0b 0x802e3e 
>> 0x6505b6 0x77d3dd 0x7c3659 0x7f39b0e75efc 0x7f39b020759d
>>  /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xade6e1]
>>  /usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x5582d9]
>>  /lib/x86_64-linux-gnu/libc.so.6(+0x36420) [0x7f39b015a420]
>>  /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f39b015a3a5]
>>  /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7f39b015db0b]
>>  /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xde) [0x802e3e]
>>  /usr/bin/mongod(_ZN5mongo7replset21multiInitialSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x1d6) 
>> [0x6505b6]
>>  /usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x26d) [0x77d3dd]
>>  /usr/bin/mongod() [0x7c3659]
>>  /lib/x86_64-linux-gnu/libpthread.so.0(+0x7efc) [0x7f39b0e75efc]
>>  /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f39b020759d]
>>
>>
>> On Monday, September 3, 2012 3:35:06 PM UTC-7, Braden Evans wrote:
>>>
>>> We just updated our secondary to 2.2 and while performing the inital 
>>> sync (near the end while replaying the oplog) we get an fassert:
>>>
>>> Mon Sep  3 15:00:38 [repl writer worker 7] ERROR: writer worker caught 
>>> exception: E11000 duplicate key error index: [collection].$_id_  dup key: { 
>>> : ObjectId('5044fe2d293c0b02a893576b') } on: { ts: Timestamp 
>>> 1346698527000|3315, h: -2145615260524114526, op: "u", ns: "[collection]", 
>>> o2: { _id: ObjectId('5044fe2d293c0b02a893576b'), pc: { $size: 35 } }, o: { 
>>> $push: { pc: { u: ObjectId('0000000000000000002283a9'), c: [ 12, 20, 43, 
>>> 51, 73, 5, 24, 31, 48, 67, 1, 23, 34, 58, 69, 3, 30, 36, 53, 71, 11, 19, 
>>> 39, 59, 68 ] } } } }
>>> Mon Sep  3 15:00:38 [repl writer worker 7]   Fatal Assertion 16360
>>> 0xade6e1 0x802e03 0x64f77d 0x77d3dd 0x7c3659 0x7f4a859c1efc 
>>> 0x7f4a84d5359d
>>>  /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xade6e1]
>>>  /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xa3) [0x802e03]
>>>  /usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x12d) 
>>> [0x64f77d]
>>>  /usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x26d) [0x77d3dd]
>>>  /usr/bin/mongod() [0x7c3659]
>>>  /lib/x86_64-linux-gnu/libpthread.so.0(+0x7efc) [0x7f4a859c1efc]
>>>  /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f4a84d5359d]
>>> Mon Sep  3 15:00:38 [repl writer worker 7]
>>>
>>> ***aborting after fassert() failure
>>>
>>> Is there any way to work around this problem? Manually deleting the 
>>> record (on the secondary) does nothing, I'm not sure how to proceed except 
>>> resyncing and hoping it doesn't happen again. 
>>>
>>> This collection is updated frequently, I wonder if it is this case: 
>>> https://jira.mongodb.org/browse/SERVER-6671 
>>>
>>>
>>>
>>>
------=_Part_385_13566946.1348273058873
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Hi Tim!<br><br>This problem is different from the one that Braden reported.=
&nbsp; If you examine the stack trace, you'll notice that the addresses and=
 symbols are different from the ones in his report.<br><br>There should be =
some additional lines in the log file ahead of these: can you post those as=
 well?<br><br>&nbsp;-William <br><br><br>On Wednesday, September 12, 2012 7=
:58:53 AM UTC-7, Tim wrote:<blockquote class=3D"gmail_quote" style=3D"margi=
n: 0;margin-left: 0.8ex;border-left: 1px #ccc solid;padding-left: 1ex;"><di=
v><font color=3D"#000000" face=3D"Times New Roman" size=3D"3">

</font></div><p style=3D"margin:0cm 0cm 0pt" class=3D"MsoNormal"><font size=
=3D"3"><font color=3D"#000000"><font face=3D"Calibri">Experiencing <span>&n=
bsp;</span>the
exact same problem can anyone give us a resolution?</font></font></font></p=
><div><font color=3D"#000000" face=3D"Times New Roman" size=3D"3">

</font></div><div><font color=3D"#000000" face=3D"Times New Roman" size=3D"=
3">0xade6e1 0x802e03 0x5e29f6 0x6538aa 0x65394a 0x653d58 0x7c3659 0x39fec07=
851 0x39fe8e811d<br>&nbsp;/usr/bin/mongod(_<wbr>ZN5mongo15printStackTraceER=
So+<wbr>0x21) [0xade6e1]<br>&nbsp;/usr/bin/mongod(_<wbr>ZN5mongo13fassertFa=
iledEi+<wbr>0xa3) [0x802e03]<br>&nbsp;/usr/bin/mongod(_<wbr>ZN5mongo11ReplS=
etImpl17syncDoI<wbr>nitialSyncEv+0x46) [0x5e29f6]<br>&nbsp;/usr/bin/mongod(=
_<wbr>ZN5mongo11ReplSetImpl11_<wbr>syncThreadEv+0x6a) [0x6538aa]<br>&nbsp;/=
usr/bin/mongod(_<wbr>ZN5mongo11ReplSetImpl10syncThr<wbr>eadEv+0x2a) [0x6539=
4a]<br>&nbsp;/usr/bin/mongod(_<wbr>ZN5mongo15startSyncThreadEv+<wbr>0xa8) [=
0x653d58]<br>&nbsp;/usr/bin/mongod() [0x7c3659]<br>&nbsp;/lib64/libpthread.=
so.0() [0x39fec07851]<br>&nbsp;/lib64/libc.so.6(clone+0x6d) [0x39fe8e811d]<=
br>Wed Sep 12 15:43:35 [rsSync]</font></div><div><font color=3D"#000000" fa=
ce=3D"Times New Roman" size=3D"3">***aborting after fassert() failure<br></=
font></div><div><font color=3D"#000000" face=3D"Times New Roman" size=3D"3"=
></font>&nbsp;</div><div><br>On Tuesday, 4 September 2012 04:55:36 UTC+1, B=
raden Evans  wrote:</div><blockquote style=3D"margin:0px 0px 0px 0.8ex;padd=
ing-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;borde=
r-left-style:solid" class=3D"gmail_quote"><div>2nd attempt has failed in th=
e same way, sync fails with "ERROR: exception: Can no longer connect to ini=
tial sync source: <a href=3D"http://192.168.1.40:27017" target=3D"_blank">1=
92.168.1.40:27017</a>" and then an endless cascade of these duplicate key a=
ssertions.&nbsp;<br></div><div><br></div><div>I am guessing now there are t=
wo issues here, one is that for some reason sync is interrupted and two tha=
t it can not resume.&nbsp;</div><div><br></div><div><br></div><div>Log of t=
he 2nd failure:</div><div><br></div><div><br></div><div><div>Mon Sep &nbsp;=
3 20:13:56 [repl writer worker 5] repl: couldn't connect to server <a href=
=3D"http://192.168.1.40:27017" target=3D"_blank">192.168.1.40:27017</a></di=
v><div>Mon Sep &nbsp;3 20:13:56 [repl writer worker 5] ERROR: exception: Ca=
n no longer connect to initial sync source: <a href=3D"http://192.168.1.40:=
27017" target=3D"_blank">192.168.1.40:27017</a> on: { ts: Timestamp 1346717=
498000|261, h: -4332373531262227304, op: "u", ns: "[collection]", o2: { _id=
: ObjectId('<wbr>500e8550003c0b0b380cd9e9') }, o: { $set: { st: 1, la: new =
Date(1346717775888) } } }</div><div>Mon Sep &nbsp;3 20:13:56 [repl writer w=
orker 5] &nbsp; Fatal Assertion 16361</div><div>0xade6e1 0x802e03 0x6505b6 =
0x77d3dd 0x7c3659 0x7f39b0e75efc 0x7f39b020759d</div><div>&nbsp;/usr/bin/mo=
ngod(_<wbr>ZN5mongo15printStackTraceERSo+<wbr>0x21) [0xade6e1]</div><div>&n=
bsp;/usr/bin/mongod(_<wbr>ZN5mongo13fassertFailedEi+<wbr>0xa3) [0x802e03]</=
div><div>&nbsp;/usr/bin/mongod(_<wbr>ZN5mongo7replset21multiInitial<wbr>Syn=
cApplyERKSt6vectorINS_<wbr>7BSONObjESaIS2_EEPNS0_<wbr>8SyncTailE+0x1d6) [0x=
6505b6]</div><div>&nbsp;/usr/bin/mongod(_<wbr>ZN5mongo10threadpool6Worker4l=
o<wbr>opEv+0x26d) [0x77d3dd]</div><div>&nbsp;/usr/bin/mongod() [0x7c3659]</=
div><div>&nbsp;/lib/x86_64-linux-gnu/<wbr>libpthread.so.0(+0x7efc) [0x7f39b=
0e75efc]</div><div>&nbsp;/lib/x86_64-linux-gnu/libc.<wbr>so.6(clone+0x6d) [=
0x7f39b020759d]</div><div>Mon Sep &nbsp;3 20:13:56 [repl writer worker 5]</=
div></div><div><br></div><div><div><br></div><div>***aborting after fassert=
() failure</div><div><br></div><div><br></div><div>Mon Sep &nbsp;3 20:13:56=
 Got signal: 6 (Aborted).</div><div><br></div><div>Mon Sep &nbsp;3 20:13:56=
 Backtrace:</div><div>0xade6e1 0x5582d9 0x7f39b015a420 0x7f39b015a3a5 0x7f3=
9b015db0b 0x802e3e 0x6505b6 0x77d3dd 0x7c3659 0x7f39b0e75efc 0x7f39b020759d=
</div><div>&nbsp;/usr/bin/mongod(_<wbr>ZN5mongo15printStackTraceERSo+<wbr>0=
x21) [0xade6e1]</div><div>&nbsp;/usr/bin/mongod(_<wbr>ZN5mongo10abruptQuitE=
i+0x399) [0x5582d9]</div><div>&nbsp;/lib/x86_64-linux-gnu/libc.<wbr>so.6(+0=
x36420) [0x7f39b015a420]</div><div>&nbsp;/lib/x86_64-linux-gnu/libc.<wbr>so=
.6(gsignal+0x35) [0x7f39b015a3a5]</div><div>&nbsp;/lib/x86_64-linux-gnu/lib=
c.<wbr>so.6(abort+0x17b) [0x7f39b015db0b]</div><div>&nbsp;/usr/bin/mongod(_=
<wbr>ZN5mongo13fassertFailedEi+<wbr>0xde) [0x802e3e]</div><div>&nbsp;/usr/b=
in/mongod(_<wbr>ZN5mongo7replset21multiInitial<wbr>SyncApplyERKSt6vectorINS=
_<wbr>7BSONObjESaIS2_EEPNS0_<wbr>8SyncTailE+0x1d6) [0x6505b6]</div><div>&nb=
sp;/usr/bin/mongod(_<wbr>ZN5mongo10threadpool6Worker4lo<wbr>opEv+0x26d) [0x=
77d3dd]</div><div>&nbsp;/usr/bin/mongod() [0x7c3659]</div><div>&nbsp;/lib/x=
86_64-linux-gnu/<wbr>libpthread.so.0(+0x7efc) [0x7f39b0e75efc]</div><div>&n=
bsp;/lib/x86_64-linux-gnu/libc.<wbr>so.6(clone+0x6d) [0x7f39b020759d]</div>=
</div><div><br></div><br>On Monday, September 3, 2012 3:35:06 PM UTC-7, Bra=
den Evans wrote:<blockquote style=3D"margin:0px 0px 0px 0.8ex;padding-left:=
1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-st=
yle:solid" class=3D"gmail_quote">We just updated our secondary to 2.2 and w=
hile performing the inital sync (near the end while replaying the oplog) we=
 get an fassert:<div><br></div><div><div>Mon Sep &nbsp;3 15:00:38 [repl wri=
ter worker 7] ERROR: writer worker caught exception: E11000 duplicate key e=
rror index: [collection].$_id_ &nbsp;dup key: { : ObjectId('<wbr>5044fe2d29=
3c0b02a893576b') } on: { ts: Timestamp 1346698527000|3315, h: -214561526052=
4114526, op: "u", ns: "[collection]", o2: { _id: ObjectId('<wbr>5044fe2d293=
c0b02a893576b'), pc: { $size: 35 } }, o: { $push: { pc: { u: ObjectId('<wbr=
>0000000000000000002283a9'), c: [ 12, 20, 43, 51, 73, 5, 24, 31, 48, 67, 1,=
 23, 34, 58, 69, 3, 30, 36, 53, 71, 11, 19, 39, 59, 68 ] } } } }<br></div><=
div>Mon Sep &nbsp;3 15:00:38 [repl writer worker 7] &nbsp; Fatal Assertion =
16360</div><div>0xade6e1 0x802e03 0x64f77d 0x77d3dd 0x7c3659 0x7f4a859c1efc=
 0x7f4a84d5359d</div><div>&nbsp;/usr/bin/mongod(_<wbr>ZN5mongo15printStackT=
raceERSo+<wbr>0x21) [0xade6e1]</div><div>&nbsp;/usr/bin/mongod(_<wbr>ZN5mon=
go13fassertFailedEi+<wbr>0xa3) [0x802e03]</div><div>&nbsp;/usr/bin/mongod(_=
<wbr>ZN5mongo7replset14multiSyncApp<wbr>lyERKSt6vectorINS_<wbr>7BSONObjESaI=
S2_EEPNS0_<wbr>8SyncTailE+0x12d) [0x64f77d]</div><div>&nbsp;/usr/bin/mongod=
(_<wbr>ZN5mongo10threadpool6Worker4lo<wbr>opEv+0x26d) [0x77d3dd]</div><div>=
&nbsp;/usr/bin/mongod() [0x7c3659]</div><div>&nbsp;/lib/x86_64-linux-gnu/<w=
br>libpthread.so.0(+0x7efc) [0x7f4a859c1efc]</div><div>&nbsp;/lib/x86_64-li=
nux-gnu/libc.<wbr>so.6(clone+0x6d) [0x7f4a84d5359d]</div><div>Mon Sep &nbsp=
;3 15:00:38 [repl writer worker 7]</div><div><br></div><div>***aborting aft=
er fassert() failure</div><div><br></div><div>Is there any way to work arou=
nd this problem? Manually deleting the record (on the secondary) does nothi=
ng, I'm not sure how to proceed except resyncing and hoping it doesn't happ=
en again.&nbsp;<br></div><div><br></div><div><div>This collection is update=
d frequently, I wonder if it is this case:&nbsp;<a href=3D"https://jira.mon=
godb.org/browse/SERVER-6671" target=3D"_blank">https://jira.mongodb.<wbr>or=
g/browse/SERVER-6671</a>&nbsp;</div><br></div><div><br></div><div><br></div=
></div></blockquote></blockquote></blockquote>
------=_Part_385_13566946.1348273058873--

------=_Part_384_26254200.1348273058872--