rs member died because other member crashed?
Date: Wed, 26 Sep 2012 04:11:19 -0700 (PDT)
From: Daniel <daniel.brue...@gmail.com>
To: mongodb-user@googlegroups.com
Message-Id: <2c5cfae9-8501-45cc-b41f-8f15ad5542b0@googlegroups.com>
In-Reply-To: <c03d3521-599b-4e3c-ae2c-873c034db790@googlegroups.com>
References: <c03d3521-599b-4e3c-ae2c-873c034db790@googlegroups.com>
Subject: Re: rs member died because other member crashed?
This "ERROR: moveChunk commit failed: version is at" just happened again.
This time it was in a 3-member replica set where 2 members were in status STARTUP.
:(
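The two values in that error look like chunk versions. Judging only from how they are printed in the log below, each seems to have the layout <major>|<minor>||<epoch>: the config metadata reported 907|1 where member2 expected 908|1 after its commit (the log prints no space between "at" and the first value). That reading of the format is an assumption, and ChunkVersion, parse_chunk_version and counters_match are hypothetical helpers, not MongoDB APIs. A minimal Python sketch of the comparison:

```python
from typing import NamedTuple


class ChunkVersion(NamedTuple):
    """Hypothetical model of a chunk version string as printed in the log."""
    major: int
    minor: int
    epoch: str


def parse_chunk_version(text: str) -> ChunkVersion:
    # Assumed layout: "<major>|<minor>||<epoch>", e.g. "908|1||50604b9fb961dd917fdc2316".
    counters, epoch = text.split("||")
    major, minor = counters.split("|")
    return ChunkVersion(int(major), int(minor), epoch)


def counters_match(found: ChunkVersion, expected: ChunkVersion) -> bool:
    # Compare only major|minor. The epoch is ignored because this thread does
    # not establish what the all-zero epoch in the "found" value means.
    return (found.major, found.minor) == (expected.major, expected.minor)


# Values copied from the member2 log line ("version is at ... instead of ...").
found = parse_chunk_version("907|1||000000000000000000000000")
expected = parse_chunk_version("908|1||50604b9fb961dd917fdc2316")

print(found.major, expected.major)      # 907 908
print(counters_match(found, expected))  # False
```

Read this way, the config metadata still showed major version 907 rather than the 908 member2 was waiting for, which fits the earlier "moveChunk commit outcome ongoing" warning: member2 could not confirm its commit before it terminated.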
On Tuesday, September 25, 2012 2:04:45 PM UTC+2, Daniel wrote:
>
> Hi,
>
> setup: 9 servers, 3 shards with 3 rs members each. MongoDB 2.2
>
> One member (member1) in a set went down because its server crashed. This
> server also runs 1 of 3 config servers. After that, another member (member2)
> crashed, apparently because it couldn't reach the crashed server? This is the
> log on member2:
>
>> Tue Sep 25 13:34:56 [conn538] DBClientCursor::init call() failed
>> Tue Sep 25 13:34:56 [conn538] scoped connection to config1:27019,config2:27019,config3:27019 not being returned to the pool
>> Tue Sep 25 13:34:56 [conn538] warning: 13104 SyncClusterConnection::findOne prepare failed: 10276 DBClientBase::findN: transport error: config3:27019 ns: admin.$cmd query: { fsync: 1 } config3:27019:{}
>> Tue Sep 25 13:34:56 [conn538] warning: moveChunk commit outcome ongoing: { applyOps: [ { op: "u", b: false, ns: "config.chunks", o: { _id: "db.coll1-uuid_"38f9dbbe-86ec-444b-9e6a-483eab0f9bb2"_id_ObjectId('50444151e4b0c4a3a8c5cf74')", lastmod: Timest$
>> Tue Sep 25 13:34:57 [rsHealthPoll] couldn't connect to member1:27018: couldn't connect to server member1:27018
>> Tue Sep 25 13:34:59 [rsHealthPoll] couldn't connect to member1:27018: couldn't connect to server member1:27018
>> Tue Sep 25 13:35:01 [rsHealthPoll] couldn't connect to member1:27018: couldn't connect to server member1:27018
>> Tue Sep 25 13:35:01 [rsHealthPoll] couldn't connect to member1:27018: couldn't connect to server member1:27018
>> Tue Sep 25 13:35:01 [rsHealthPoll] couldn't connect to member1:27018: couldn't connect to server member1:27018
>> Tue Sep 25 13:35:03 [rsHealthPoll] couldn't connect to member1:27018: couldn't connect to server member1:27018
>> Tue Sep 25 13:35:05 [rsHealthPoll] couldn't connect to member1:27018: couldn't connect to server member1:27018
>> Tue Sep 25 13:35:06 [conn538] ERROR: moveChunk commit failed: version is at907|1||000000000000000000000000 instead of 908|1||50604b9fb961dd917fdc2316
>> Tue Sep 25 13:35:06 [conn538] ERROR: TERMINATING
>> Tue Sep 25 13:35:06 dbexit:
>> Tue Sep 25 13:35:06 [conn538] shutdown: going to close listening sockets...
>> Tue Sep 25 13:35:06 [conn538] closing listening socket: 6
>> Tue Sep 25 13:35:06 [conn538] closing listening socket: 7
>> Tue Sep 25 13:35:06 [conn538] shutdown: going to flush diaglog...
>> Tue Sep 25 13:35:06 [conn538] shutdown: going to close sockets...
>> Tue Sep 25 13:35:06 [conn538] shutdown: waiting for fs preallocator...
>> Tue Sep 25 13:35:06 [conn538] shutdown: lock for final commit...
>> Tue Sep 25 13:35:06 [conn538] shutdown: final commit...
>> Tue Sep 25 13:35:06 [conn1] end connection member2_IP:41925 (21 connections now open)
>> Tue Sep 25 13:35:06 [initandlisten] now exiting
>> Tue Sep 25 13:35:06 dbexit: ; exiting immediately
>
>
> This is really weird, because the redundancy of 3 servers should provide
> some kind of failover, right? But if one member drags down another member,
> then that's really ugly.
>
> Is this a bug?
>
> Thanks & regards
>
> Daniel
>
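On the failover question: in this release line, replica set redundancy and config server redundancy follow different rules. A replica set can have a primary only while a strict majority of its voting members is healthy, so a 3-member shard survives losing one member. The MongoDB 2.x documentation states that with the three mirrored config servers, if any of them is unavailable the cluster metadata becomes read-only and no chunk migrations or splits occur until all three are back. The crashed server also hosted a config server, and member2's log shows a transport error to config3:27019 just before the moveChunk commit, so the second rule appears to be the one involved here. The toy Python model below contrasts the two rules; replica_set_has_primary and config_metadata_writable are hypothetical names, and the sketch models only the counting, not MongoDB's actual code.

```python
def replica_set_has_primary(healthy_voters: int, total_voters: int) -> bool:
    # A primary requires a strict majority of the set's voting members.
    return healthy_voters > total_voters // 2


def config_metadata_writable(reachable_config_servers: int,
                             total_config_servers: int = 3) -> bool:
    # Assumption (mirrored config servers, as documented for 2.x): metadata
    # changes such as a chunk migration commit need every config server.
    return reachable_config_servers == total_config_servers


# One physical server lost: it hosted one member of a 3-member shard
# and one of the 3 config servers.
print(replica_set_has_primary(healthy_voters=2, total_voters=3))  # True
print(config_metadata_writable(reachable_config_servers=2))       # False
```

With a config server colocated on a shard member's machine, one hardware failure hits both the replica set and the config server set at the same time.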