rs member died because other member crashed?


Daniel

Sep 25, 2012, 8:04:45 AM
to mongod...@googlegroups.com
Hi,

setup: 9 servers, 3 shards with 3 rs members each.  MongoDB 2.2

One member (member1) of a replica set went down because its server crashed. That server also runs 1 of the 3 config servers. After that, another member (member2) crashed, apparently because it couldn't reach the crashed server. This is the log on member2:

Tue Sep 25 13:34:56 [conn538] DBClientCursor::init call() failed
Tue Sep 25 13:34:56 [conn538] scoped connection to config1:27019,config2:27019,config3:27019 not being returned to the pool
Tue Sep 25 13:34:56 [conn538] warning: 13104 SyncClusterConnection::findOne prepare failed: 10276 DBClientBase::findN: transport error: config3:27019 ns: admin.$cmd query: { fsync: 1 } config3:27019:{}
Tue Sep 25 13:34:56 [conn538] warning: moveChunk commit outcome ongoing: { applyOps: [ { op: "u", b: false, ns: "config.chunks", o: { _id: "db.coll1-uuid_"38f9dbbe-86ec-444b-9e6a-483eab0f9bb2"_id_ObjectId('50444151e4b0c4a3a8c5cf74')", lastmod: Timest$
Tue Sep 25 13:34:57 [rsHealthPoll] couldn't connect to member1:27018: couldn't connect to server member1:27018
Tue Sep 25 13:34:59 [rsHealthPoll] couldn't connect to member1:27018: couldn't connect to server member1:27018
Tue Sep 25 13:35:01 [rsHealthPoll] couldn't connect to member1:27018: couldn't connect to server member1:27018
Tue Sep 25 13:35:01 [rsHealthPoll] couldn't connect to member1:27018: couldn't connect to server member1:27018
Tue Sep 25 13:35:01 [rsHealthPoll] couldn't connect to member1:27018: couldn't connect to server member1:27018
Tue Sep 25 13:35:03 [rsHealthPoll] couldn't connect to member1:27018: couldn't connect to server member1:27018
Tue Sep 25 13:35:05 [rsHealthPoll] couldn't connect to member1:27018: couldn't connect to server member1:27018
Tue Sep 25 13:35:06 [conn538] ERROR: moveChunk commit failed: version is at907|1||000000000000000000000000 instead of 908|1||50604b9fb961dd917fdc2316
Tue Sep 25 13:35:06 [conn538] ERROR: TERMINATING
Tue Sep 25 13:35:06 dbexit:
Tue Sep 25 13:35:06 [conn538] shutdown: going to close listening sockets...
Tue Sep 25 13:35:06 [conn538] closing listening socket: 6
Tue Sep 25 13:35:06 [conn538] closing listening socket: 7
Tue Sep 25 13:35:06 [conn538] shutdown: going to flush diaglog...
Tue Sep 25 13:35:06 [conn538] shutdown: going to close sockets...
Tue Sep 25 13:35:06 [conn538] shutdown: waiting for fs preallocator...
Tue Sep 25 13:35:06 [conn538] shutdown: lock for final commit...
Tue Sep 25 13:35:06 [conn538] shutdown: final commit...
Tue Sep 25 13:35:06 [conn1] end connection member2_IP:41925 (21 connections now open)
Tue Sep 25 13:35:06 [initandlisten] now exiting
Tue Sep 25 13:35:06 dbexit: ; exiting immediately
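To make the fatal line easier to compare across incidents, here is a minimal Python sketch that pulls the two versions out of the "moveChunk commit failed" message. The function name parse_commit_failure is hypothetical, and reading each version as major|minor||epoch is an assumption based on how the values appear in this log, not something stated in the thread.

```python
import re

# Assumed layout of a version string in this log: major|minor||epoch.
_VERSION = r"(\d+)\|(\d+)\|\|([0-9a-f]+)"
_COMMIT_FAILED = re.compile(
    r"moveChunk commit failed: version is at\s*"
    + _VERSION
    + r" instead of "
    + _VERSION
)


def parse_commit_failure(line):
    """Return the found and expected versions from a commit-failure line, or None."""
    match = _COMMIT_FAILED.search(line)
    if match is None:
        return None
    f_major, f_minor, f_epoch, e_major, e_minor, e_epoch = match.groups()
    return {
        "found": (int(f_major), int(f_minor), f_epoch),
        "expected": (int(e_major), int(e_minor), e_epoch),
    }


# The line is copied from the member2 log above.
line = (
    "Tue Sep 25 13:35:06 [conn538] ERROR: moveChunk commit failed: "
    "version is at907|1||000000000000000000000000 instead of "
    "908|1||50604b9fb961dd917fdc2316"
)
print(parse_commit_failure(line))
```

For the line above, the found version has major number 907 and an all-zero third field, while the expected version has major number 908.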

This is really weird, because the redundancy of 3 servers should provide some kind of failover, right? But if one member drags down another member, then that's really ugly.

Is this a bug ?

Thanks & regards

Daniel 

Daniel

Sep 26, 2012, 7:11:19 AM
to mongod...@googlegroups.com
This "ERROR: moveChunk commit failed: version is at" just happened again. This time in a 3-member replica set where 2 members were in status STARTUP. :(

Daniel

Sep 27, 2012, 11:11:53 AM
to mongod...@googlegroups.com
Oh, sorry, the log files are gone. I have started from scratch. I am really trying to get a stable system, but for days errors like this have kept us from going into production. Thanks anyway.

Daniel

On Wednesday, September 26, 2012 9:43:54 PM UTC+2, Shaun wrote:
Hi Daniel,

It looks like your config server went down halfway through a chunk migration.  When you have fewer than 3 config servers, the config metadata goes read only.
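As a rough illustration of the point about needing all 3 config servers, the following Python sketch checks whether each config server accepts a TCP connection. The host names and port come from the log in the original post; the function names are hypothetical, and a successful TCP connect only shows that the port is open, not that the config server is healthy.

```python
import socket

# Host names and port taken from the log line in the original post.
CONFIG_SERVERS = [("config1", 27019), ("config2", 27019), ("config3", 27019)]


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def unreachable_config_servers(servers, probe=tcp_probe):
    """Return the (host, port) pairs that the probe could not reach."""
    return [(host, port) for host, port in servers if not probe(host, port)]
```

Calling unreachable_config_servers(CONFIG_SERVERS) returns an empty list only when all three answer. The probe is passed in as a parameter so the check can be exercised without touching the network.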

It does look like something that should be made more robust, or at least be more clearly defined.  Could you post all of your logfiles from when this happened?

Thanks,
-Shaun

Shaun

Sep 27, 2012, 5:45:51 PM
to mongod...@googlegroups.com
All right.  We will continue to look into it using the log you gave us.  It may mitigate the problem a bit if you make sure to put config servers on hosts by themselves so that if a host goes down you don't lose multiple servers.
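One way to audit a deployment for this is sketched below in Python: given a map of hosts to the processes they run, it lists the hosts where a config server shares the machine with something else. The function name, server names, and role labels are all hypothetical and only illustrate the idea.

```python
def hosts_sharing_config_server(topology):
    """Return hosts that run a config server alongside any other process."""
    return sorted(
        host
        for host, roles in topology.items()
        if "config" in roles and len(roles) > 1
    )


# Hypothetical layout: server names and role labels are illustrative only.
topology = {
    "server1": ["shard1-member", "config"],
    "server2": ["shard1-member"],
    "server3": ["shard1-member"],
    "server4": ["config"],
}
print(hosts_sharing_config_server(topology))  # prints ['server1']
```

Any host in the result takes down both a replica set member and a config server when it fails, which is the situation described in the original post.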

Daniel

Sep 28, 2012, 9:12:49 AM
to mongod...@googlegroups.com
Ok, thanks. I was inspired by this setup (http://www.mongodb.org/display/DOCS/Simple+Initial+Sharding+Architecture), which also runs config servers on the same hosts as replica set members.

Shaun

Oct 5, 2012, 2:59:08 PM
to mongod...@googlegroups.com
Hi Daniel,

Thanks for reporting this issue.  We've filed a server ticket and you can track the progress here: https://jira.mongodb.org/browse/SERVER-7271

-Shaun