MongoDB Replica Pair Fail-Over Issues


Brian Carpio

Jun 10, 2010, 11:37:34 AM
to mongodb-user
Hi,

Currently I am using "mongodb-linux-x86_64-1.5.1" for my testing. Here
is the situation:

I have two servers, let's call them mongodb01 and mongodb02, and they
are running in replica pair mode:

mongodb01 = Master
mongodb02 = Slave

If I hard shut down mongodb01 (just power it off), mongodb02 takes over
as master, as it should. Then I bring mongodb01 back up, and generally
its DB is corrupt since I just powered it off and crashed it. Mongo
comes up and complains about an old lock file etc., which is all
great.

Then I delete everything in my dbpath, since the data on mongodb01 is
corrupt and it will just resync from mongodb02. After I delete the
data I restart mongo, and it appears to be replicating data from the
master, which is now mongodb02.

So once the data is synced I crash mongodb02 (which is now the master),
and mongodb01 never takes over as master. Never. I waited about 10
minutes.
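
For anyone repeating this test, the wait can be bounded with a small polling loop instead of watching by hand. This is only a sketch: `run_ismaster` is a hypothetical callable (not part of any MongoDB driver) that you would supply yourself; it is assumed to run the `ismaster` command against a host and return the reply as a dict. The 600-second default just mirrors the roughly 10 minutes mentioned above.

```python
import time


def wait_for_master(run_ismaster, host, timeout_s=600, interval_s=5,
                    sleep=time.sleep):
    """Poll `host` until it reports itself as master or the timeout expires.

    run_ismaster is a hypothetical, caller-supplied callable: it takes a
    host string and returns the reply of the `ismaster` command as a dict,
    or raises an exception if the host cannot be reached.
    """
    waited = 0
    while waited <= timeout_s:
        try:
            reply = run_ismaster(host)
            # Accept either a numeric 1 or a boolean True.
            if reply.get("ismaster") in (1, True):
                return True
        except Exception:
            # Treat an unreachable host as "not master yet" and keep polling.
            pass
        sleep(interval_s)
        waited += interval_s
    return False
```

Calling `wait_for_master(my_runner, "mongodb01:27717")` right after crashing mongodb02 would return False in the failing case described here, and True in the case where the extra restart was done.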

However, if I do the following, it works:

1. Hard shut down mongodb01 (master); mongodb02 takes over as master.
2. Recover mongodb01. Its mongo service doesn't start due to the lock
   file.
3. Remove everything in the dbpath and start the mongo service.
4. Wait for the mongo instances to start syncing, then restart the
   mongo service on mongodb01 once more.
5. Hard crash mongodb02 (which is now the master); mongodb01 takes over
   again as the master.

Here is an odd thing I see in the logs right before mongodb01 starts
to resync with mongodb02 after the hard crash:

REPLICA PAIR NON-MASTER: received 'forcedead' command, replication
forced to stop

MASTER:
Thu Jun 10 02:14:02 remote slave log filled, forcing slave resync
Thu Jun 10 02:14:02
**********************************************************
Thu Jun 10 02:14:02 Sending forcedead command to slave to stop its
replication
Thu Jun 10 02:14:02 Host: mongodb01:27717 paired: 1

This is displayed after the indexes are rebuilt and such, after I
delete the data under the dbpath. When I do the second restart, I
see the mongo replica pair connect, but I do not see this message.
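
To compare the two restarts, one option is to scan each log for the 'forcedead' message rather than eyeballing it. The helper below is a minimal sketch; the function name is made up for illustration, and the sample lines are copied from the log excerpt above.

```python
def find_forcedead_events(log_lines):
    """Return (line_number, text) for each log line mentioning 'forcedead'."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, start=1)
        if "forcedead" in line
    ]


sample = [
    "Thu Jun 10 02:14:02 remote slave log filled, forcing slave resync",
    "Thu Jun 10 02:14:02 Sending forcedead command to slave to stop its",
    "replication",
]
print(find_forcedead_events(sample))
```

The same function accepts an open file object, so it can be pointed at the real mongod log on each server (the path depends on how mongod was started).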

Thanks,
Brian Carpio

Eliot Horowitz

Jun 10, 2010, 4:02:51 PM
to mongod...@googlegroups.com
What's in the log when 01 won't take back over from 02?


Brian Carpio

Jun 10, 2010, 4:51:21 PM
to mongodb-user
I'm not sure what you mean. Basically, when I hard crash the master
(after a failover), the other member of the replica pair doesn't take
over as master. When I tail the log file on the replica pair that is
not up, there is no additional info; it stops at the last sync:

Thu Jun 10 03:10:57 getmore local.oplog.$main cid:558380028805474902
ntoreturn:0 getMore: { ts: { $gte: new Date(5481069690711179265) } }
bytes:20 nreturned:0 3112ms

I can clear the logs, do the entire test again and attach them
somewhere, so you can see both hard crashes. Let me know.

Brian

On Jun 10, 2:02 pm, Eliot Horowitz <eliothorow...@gmail.com> wrote:
> What's in the log when 01 won't take back over from 02?
>

Eliot Horowitz

Jun 10, 2010, 5:19:21 PM
to mongod...@googlegroups.com
Attaching everything to a Jira case would be ideal.