Two primaries with network partition in replica set


Sam

Feb 10, 2011, 1:40:20 PM
to mongodb-user
I've been experimenting with network partitions in a multi-site
replication scenario (much like the one described at
http://www.mongodb.org/display/DOCS/Data+Center+Awareness) before
putting my first Mongo deployment into production, and I've run into
some undesirable behaviour in one particular partition scenario.

Being new at all this, I'd value some community input; this is my
first foray into Mongo replication, so I could be going about it
entirely the wrong way :)

Three hosts (all running v1.6.5) in a replica set:

config = {_id: 'test1', members: [
{_id: 0, host: 'sf1'},
{_id: 1, host: 'ny1'},
{_id: 2, host: 'uk1'}]
}

sf1 is master.
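
(For anyone reproducing this: I started each mongod with --replSet
test1 and initiated the set from the mongo shell in the usual way. A
minimal sketch, assuming the default port 27017 everywhere:)

// in the mongo shell on sf1, with config defined as above
rs.initiate(config)
rs.status()        // wait until all three members report a state
db.isMaster()      // 'ismaster: true' here confirms sf1 is the primary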

A 'routing issue' occurs (root@uk1:~# route add -host sf1 reject),
such that:

sf1 can talk to ny1.
ny1 can talk to uk1.
sf1 cannot talk to uk1.
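
(A quick way to check the partition looks as intended, straight from
the mongo shell; the exact error text below is from memory, so treat
it as approximate:)

// in the mongo shell on uk1
new Mongo("ny1:27017")   // succeeds: uk1 can still reach ny1
new Mongo("sf1:27017")   // throws a connect exception once the reject route is in place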

sf1 notices uk1 has gone quiet, and remains a master. (it's a master,
it can see a majority, so that's reasonable)
uk1 votes for itself. (it can see a majority, but no master, so
that's also reasonable)
ny1 votes for uk1. (that's probably less sensible, given that it can
already see a master)
ny1 then bemoans the fact that there are two primaries.
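
(You can watch this from ny1's side as it happens; a minimal sketch
using plain rs.status():)

// in the mongo shell on ny1, while the partition is in place
rs.status().members.forEach(function (m) {
    print(m.name + " : " + m.stateStr);
});
// in the broken state this prints PRIMARY for both sf1 and uk1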

Log entries:

sf1:
Thu Feb 10 17:29:39 [conn2] end connection uk1:35740
Thu Feb 10 17:29:57 [ReplSetHealthPollTask] replSet info uk1 is now down (or slow to respond)

ny1:
Thu Feb 10 17:29:37 [conn4] replSet info voting yea for 2
Thu Feb 10 17:29:39 [ReplSetHealthPollTask] replSet uk1 PRIMARY
Thu Feb 10 17:29:39 [rs Manager] replSet warning DIAG two primaries (transiently)
Thu Feb 10 17:29:45 [rs Manager] replSet warning DIAG two primaries (transiently)
Thu Feb 10 17:29:51 [rs Manager] replSet warning DIAG two primaries (transiently)
(and so on; the situation doesn't resolve until uk1 is un-partitioned again)

uk1:
Thu Feb 10 17:29:37 [ReplSetHealthPollTask] replSet info sf1 is now down (or slow to respond)
Thu Feb 10 17:29:37 [rs Manager] replSet info electSelf 2
Thu Feb 10 17:29:37 [rs Manager] replSet PRIMARY



The impact of this is probably mitigated in the real world: if I
repeat the scenario with frequent writes going to sf1, then uk1,
partitioned in this way, holds back, saying "[rs Manager] replSet
info not electing self, we are not freshest".
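
(The freshness comparison is visible from the shell too; a minimal
sketch, though I believe the optime fields in rs.status() output may
vary by version, so worth double-checking on your build:)

// on any reachable member: see how up-to-date each member's oplog is
rs.status().members.forEach(function (m) {
    print(m.name + " : " + m.optimeDate);
});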

So I guess my question is: is this a reasonable topology, and should
I therefore be logging this behaviour as a bug?

Dwight Merriman

Feb 11, 2011, 9:33:56 AM
to mongod...@googlegroups.com
What version of mongod?


Dwight Merriman

Feb 11, 2011, 9:35:45 AM
to mongod...@googlegroups.com
Sorry, I missed the version number.

Can you try v1.7.5 for your tests? This may have been fixed already.

Also: is the dual primary transient, or does it last?


Dwight Merriman

Feb 11, 2011, 9:38:55 AM
to mongod...@googlegroups.com
If this happens with 1.7.5 and isn't transient, please create a Jira. Thanks.



Sam

Feb 13, 2011, 6:33:00 AM
to mongodb-user, Dwight Merriman
Hi Dwight, thanks for the response.

I'm able to reproduce this with 1.7.5, so I have created a Jira at
http://jira.mongodb.org/browse/SERVER-2544