Errors after replica master change

Michael

May 17, 2011, 2:23:59 AM

to mongodb-user

Hello.

I run a replica set with 2 servers + 1 arbiter on MongoDB 1.8.1.

Today I had a network split, after which the replica set elected another
primary. After this event I started to see records like this in the logs
of the former primary:

Tue May 17 05:24:16 [ReplSetHealthPollTask] ERROR:
MessagingPort::call() wrong id got:378e5614 expect:378e563c
toSend op: 2004
response msgid:1514574491
response len: 141
response op: 1

Tue May 17 05:30:49 [ReplSetHealthPollTask] Assertion failure false
util/message.cpp 512
Tue May 17 05:30:51 [ReplSetHealthPollTask] ERROR:
MessagingPort::call() wrong id got:378e65d8 expect:378e65dd
toSend op: 2004
response msgid:213976870
response len: 141
response op: 1

After the network connection was restored, I tried to switch the primary
back (with rs.stepDown()) and found that we lost about 10 minutes of
data, written while the other server was acting as primary. Is this
expected or a bug?

Alvin Richards

May 17, 2011, 2:43:49 AM

to mongodb-user

https://jira.mongodb.org/browse/SERVER-2933

Do you have the logs available for the primary that took over and the
primary that was stepped down?

-Alvin

Michael

May 17, 2011, 2:54:58 AM

to mongodb-user

Yes, where can I send them privately? They contain IP addresses,
which I would rather not expose.

Dhruva Sagar

May 17, 2011, 2:57:47 AM

to mongod...@googlegroups.com

Hi Michael,

Just a tip: you should remove any private or identifiable information from the logs before sending them to anybody.

--

Thanks & Regards,

Dhruva Sagar

Michael

May 17, 2011, 3:48:28 AM

to mongodb-user

I found the 10 minutes of lost data, written while the second server was
primary, in the rollback folder. Will this happen every time the primary
changes due to a network split? Or is there a misconfiguration on my part?

Eliot Horowitz

May 17, 2011, 8:22:43 AM

to mongod...@googlegroups.com

This probably means a replica wasn't keeping up.
To ensure this doesn't happen, you can call getLastError (or safe
mode) with w=2.
On writes, that will wait for a secondary to acknowledge the write
before continuing, guaranteeing this won't happen.
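
As a rough illustration of the getLastError suggestion above, here is a minimal sketch in mongo shell style JavaScript. The helper names buildGetLastError and writeWasReplicated, the collection name "events", and the 5000 ms timeout are assumptions made for this example, not anything from the thread. The reply fields it inspects (ok, err, wtimeout) are the ones getLastError is generally documented to return, but the exact shape may differ between server versions.

```javascript
// Builds a getLastError command document that asks the server to wait
// until the write has reached `w` replica set members (the primary counts
// as one). wtimeoutMs, when given, bounds the wait in milliseconds so the
// call does not block indefinitely if a secondary is unreachable.
function buildGetLastError(w, wtimeoutMs) {
  var cmd = { getLastError: 1, w: w };
  if (typeof wtimeoutMs === "number") {
    cmd.wtimeout = wtimeoutMs;
  }
  return cmd;
}

// Interprets a getLastError reply: true only when the command succeeded,
// reported no error, and did not hit the replication timeout.
function writeWasReplicated(reply) {
  return reply.ok === 1 && reply.err == null && reply.wtimeout !== true;
}

// Possible use in the mongo shell (collection name is hypothetical):
//   db.events.insert({ ts: new Date() });
//   var reply = db.runCommand(buildGetLastError(2, 5000));
//   if (!writeWasReplicated(reply)) {
//     print("write not confirmed on 2 members");
//   }
```

With w=2 in a 2 servers + 1 arbiter set, the second member must be reachable for the write to be confirmed, so a timeout is worth setting.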

Michael

May 17, 2011, 9:37:16 AM

to mongodb-user

Writes would fail with w=2, because the former primary was not accessible.
The situation was:

1) Connection server1 (primary) -> server2 (secondary) is lost
2) server2 was voted to be primary (with the help of the arbiter)
3) Connection was restored, and I did an rs.stepDown() on server2
4) All data written while server2 was primary was lost.

How can I deal with such a situation? Should I have more servers?

Eliot Horowitz

May 17, 2011, 10:13:44 AM

to mongod...@googlegroups.com

I see.
The semantics of stepdown in 1.8 imply you know the data is synced.
For 2.0, we've changed it so you can't do that without adding a
"force" option saying you know things are out of sync and are ok with
it.
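
For readers on 2.0 or later, the "force" option mentioned above is, as far as I know, a field of the replSetStepDown command. A small sketch follows; the helper name buildStepDown and the 60 second value are assumptions made for illustration.

```javascript
// Builds a replSetStepDown command document. `seconds` is how long the
// node should refrain from seeking re-election. `force`, when true, asks
// the primary to step down even if no secondary is caught up; it is left
// out of the document otherwise.
function buildStepDown(seconds, force) {
  var cmd = { replSetStepDown: seconds };
  if (force === true) {
    cmd.force = true;
  }
  return cmd;
}

// Possible use in the mongo shell, connected to the current primary:
//   db.adminCommand(buildStepDown(60));        // refused if secondaries lag
//   db.adminCommand(buildStepDown(60, true));  // accepts out-of-sync data
```

Leaving force out by default keeps the safer behaviour and makes the out-of-sync case an explicit choice.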

Michael

May 17, 2011, 1:03:11 PM

to mongodb-user

What is the correct procedure before stepDown in such a case?

Eliot Horowitz

May 17, 2011, 1:35:22 PM

to mongod...@googlegroups.com

Looking at rs.status() or db.printReplicationInfo() to make sure there
is a secondary that is caught up.
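
The rs.status() check described above could be scripted roughly as follows. This is a sketch under assumptions: the function name caughtUpSecondaries and the 10 second threshold are made up for the example, and it assumes each member document in the rs.status() output carries state (1 = PRIMARY, 2 = SECONDARY), health, name, and an optimeDate Date, which may not hold on every server version.

```javascript
// Returns the names of SECONDARY members whose last applied operation is
// at most maxLagSecs behind the primary's, based on an rs.status() document.
// Returns an empty array when no primary (or no primary optimeDate) is found.
function caughtUpSecondaries(status, maxLagSecs) {
  var members = status.members || [];
  var primary = null;
  var i;
  for (i = 0; i < members.length; i++) {
    if (members[i].state === 1) {
      primary = members[i];
    }
  }
  if (primary === null || !primary.optimeDate) {
    return [];
  }
  var caughtUp = [];
  for (i = 0; i < members.length; i++) {
    var m = members[i];
    // Skip non-secondaries (including arbiters), members reported as
    // down, and members with no optimeDate.
    if (m.state !== 2 || m.health === 0 || !m.optimeDate) {
      continue;
    }
    var lagSecs = (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000;
    if (lagSecs <= maxLagSecs) {
      caughtUp.push(m.name);
    }
  }
  return caughtUp;
}

// Possible use in the mongo shell, connected to the current primary:
//   if (caughtUpSecondaries(rs.status(), 10).length > 0) {
//     rs.stepDown();
//   } else {
//     print("no secondary is caught up; not stepping down");
//   }
```

The check only compares timestamps at the moment it runs, so writes arriving between the check and the step-down can still be left behind.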
