I've been experimenting with network partitions in a multi-site
replication scenario (much like the one described at
http://www.mongodb.org/display/DOCS/Data+Center+Awareness)
before putting my first Mongo deployment into production, and I've run
into some undesirable behaviour under one particular kind of network
partition.
Being new at all this, I'd value some community input - after all,
this is my first foray into Mongo replication, so I could be going
about this entirely the wrong way :)
Three hosts (all running v1.6.5) in a replica set:
config = {_id: 'test1', members: [
{_id: 0, host: 'sf1'},
{_id: 1, host: 'ny1'},
{_id: 2, host: 'uk1'}]
}
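For reference, the set was brought up with nothing fancier than the
standard shell calls, roughly as follows (run in the mongo shell on
sf1, using the config object above):
// initiate the three-member set, then check that sf1 comes up
// PRIMARY and ny1/uk1 come up SECONDARY
rs.initiate(config)
rs.status()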
sf1 is master.
A 'routing issue' occurs ( root@uk1:~# route add -host sf1 reject ),
such that:
sf1 can talk to ny1.
ny1 can talk to uk1.
sf1 cannot talk to uk1.
sf1 notices uk1 has gone quiet, and remains a master. (it's a master,
it can see a majority, so that's reasonable)
uk1 votes for itself. (it can see a majority, but no master, so
that's also reasonable)
ny1 votes for uk1. (that's probably less sensible, given that it can
already see a master)
ny1 then bemoans the fact that there are two primaries.
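While the set is in this state, a quick way to check who each node
believes is primary is to ask each member's own shell, with something
like the snippet below (the results line up with the logs that follow):
// run in the local mongo shell on each of sf1, ny1 and uk1
rs.status().members.forEach(function (m) {
    print(m.name + " -> " + m.stateStr);   // e.g. PRIMARY or SECONDARY
});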
Log entries:
sf1:
Thu Feb 10 17:29:39 [conn2] end connection uk1:35740
Thu Feb 10 17:29:57 [ReplSetHealthPollTask] replSet info uk1 is now
down (or slow to respond)
ny1:
Thu Feb 10 17:29:37 [conn4] replSet info voting yea for 2
Thu Feb 10 17:29:39 [ReplSetHealthPollTask] replSet uk1 PRIMARY
Thu Feb 10 17:29:39 [rs Manager] replSet warning DIAG two primaries
(transiently)
Thu Feb 10 17:29:45 [rs Manager] replSet warning DIAG two primaries
(transiently)
Thu Feb 10 17:29:51 [rs Manager] replSet warning DIAG two primaries
(transiently)
(etc.; the situation doesn't resolve until uk1 is un-partitioned again)
uk1:
Thu Feb 10 17:29:37 [ReplSetHealthPollTask] replSet info sf1 is now
down (or slow to respond)
Thu Feb 10 17:29:37 [rs Manager] replSet info electSelf 2
Thu Feb 10 17:29:37 [rs Manager] replSet PRIMARY
The impact of this is probably mitigated somewhat in the real world:
if I repeat the scenario with frequent writes going to sf1, then uk1,
when partitioned in this way, holds back, saying "[rs Manager] replSet
info not electing self, we are not freshest".
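The "frequent writes" are nothing clever - just a trivial loop along
these lines run against sf1 (the database and collection names here
are throwaway ones I made up for the test):
// in the mongo shell connected to sf1, while the partition is in place
for (var i = 0; i < 100000; i++) {
    db.getSiblingDB("parttest").ping.insert({i: i, ts: new Date()});
    sleep(100);   // ~10 writes/sec keeps sf1 ahead, so uk1 sees itself as not freshest
}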
So I guess my question is: is this a reasonable topology, and should I
be logging this behaviour as a bug?