Errors after replica master change


Michael

May 17, 2011, 2:23:59 AM
to mongodb-user
Hello.

I run a replica set with 2 servers + 1 arbiter on MongoDB 1.8.1.

Today I had a network split, after which the replica set elected
another primary. After this event I started to see records like this
in the logs of the former primary:

Tue May 17 05:24:16 [ReplSetHealthPollTask] ERROR:
MessagingPort::call() wrong id got:378e5614 expect:378e563c
toSend op: 2004
response msgid:1514574491
response len: 141
response op: 1

Tue May 17 05:30:49 [ReplSetHealthPollTask] Assertion failure false
util/message.cpp 512
Tue May 17 05:30:51 [ReplSetHealthPollTask] ERROR:
MessagingPort::call() wrong id got:378e65d8 expect:378e65dd
toSend op: 2004
response msgid:213976870
response len: 141
response op: 1

After the network connection was restored, I tried to switch the
primary back (with rs.stepDown()) and found that we lost about 10
minutes of data, written while the other server was acting as primary.
Is this expected or a bug?

Alvin Richards

May 17, 2011, 2:43:49 AM
to mongodb-user
https://jira.mongodb.org/browse/SERVER-2933

Do you have the logs available for the primary that took over and the
primary that was stepped down?

-Alvin

Michael

May 17, 2011, 2:54:58 AM
to mongodb-user
Yes, where can I send them privately? They contain IP addresses that
I would rather not expose.

Dhruva Sagar

May 17, 2011, 2:57:47 AM
to mongod...@googlegroups.com
Hi Michael,

Just a tip: you should remove any private or identifiable information from the logs before sending them to anybody.


Michael

May 17, 2011, 3:48:28 AM
to mongodb-user
I found the 10 minutes of lost data, written while the second server
was primary, in the rollback folder. Will this happen every time the
primary changes due to a network split? Or is there a
misconfiguration on my part?

Eliot Horowitz

May 17, 2011, 8:22:43 AM
to mongod...@googlegroups.com
This probably means a replica wasn't keeping up.
To ensure this doesn't happen, you can call getLastError (or use safe
mode) with w=2 on writes.
That will wait for a secondary to acknowledge the write before
continuing, guaranteeing this won't happen.
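
A minimal sketch of the w=2 check in mongo shell JavaScript, for illustration only. The helper names buildGetLastError and writeWasReplicated and the collection name "events" are hypothetical; the reply fields checked (ok, err, wtimeout) are assumed to match what getLastError returns on this server version. A wtimeout is included because, without one, the call can block for as long as no secondary is reachable.

```javascript
// Build a getLastError command document that waits for replication.
// w: number of members (including the primary) that must have the write.
// wtimeoutMs: stop waiting after this many milliseconds.
function buildGetLastError(w, wtimeoutMs) {
  return { getlasterror: 1, w: w, wtimeout: wtimeoutMs };
}

// Interpret a getLastError reply: true only if the command succeeded,
// did not time out waiting for replication, and reported no error.
// Assumes the reply carries `ok`, `err` and (on timeout) `wtimeout`.
function writeWasReplicated(reply) {
  if (!reply || reply.ok !== 1) return false;
  if (reply.wtimeout) return false;
  return reply.err === null || reply.err === undefined;
}

// Assumed usage in the mongo shell ("events" is a hypothetical collection):
//   db.events.insert({ ts: new Date() });
//   var reply = db.runCommand(buildGetLastError(2, 5000));
//   if (!writeWasReplicated(reply)) { /* retry, queue, or alert */ }
```

With 2 data-bearing servers, w=2 can only be satisfied while both are reachable, so the application has to decide what to do when the check fails.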

Michael

May 17, 2011, 9:37:16 AM
to mongodb-user
Writes would fail with w=2, because the former primary was not
accessible. The situation was:

1) Connection server1 (primary) -> server2 (secondary) was lost
2) server2 was voted primary (with the help of the arbiter)
3) Connection was restored, and I ran rs.stepDown() on server2
4) All data written while server2 was primary was lost.

How can I deal with such a situation? Should I have more servers?

Eliot Horowitz

May 17, 2011, 10:13:44 AM
to mongod...@googlegroups.com
I see.
The semantics of stepdown in 1.8 imply you know the data is synced.
For 2.0, we've changed it so you can't do that without adding a
"force" option saying you know things are out of sync and are ok with
it.

Michael

May 17, 2011, 1:03:11 PM
to mongodb-user
What is the correct procedure before stepDown in such a case?

Eliot Horowitz

May 17, 2011, 1:35:22 PM
to mongod...@googlegroups.com
Look at rs.status() or db.printReplicationInfo() to make sure there
is a secondary that is caught up.
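
As a rough sketch of that check, the function below takes an object shaped like rs.status() output and lists the secondaries whose last applied operation is within a chosen lag of the primary's. The name caughtUpSecondaries and the maxLagMs threshold are hypothetical, and it assumes each data-bearing member reports name, stateStr, health and optimeDate (a Date); verify the field names against your own rs.status() output before relying on it.

```javascript
// Return the names of healthy SECONDARY members whose last applied
// operation is at most maxLagMs behind the PRIMARY's.
// Assumes status.members[i] has: name, stateStr, health, optimeDate.
function caughtUpSecondaries(status, maxLagMs) {
  var members = status.members || [];
  var primary = null;
  for (var i = 0; i < members.length; i++) {
    if (members[i].stateStr === "PRIMARY") primary = members[i];
  }
  // Without a visible primary there is nothing to compare against.
  if (!primary) return [];

  var result = [];
  for (var j = 0; j < members.length; j++) {
    var m = members[j];
    // Skip arbiters, unhealthy members, and the primary itself.
    if (m.stateStr !== "SECONDARY" || m.health !== 1) continue;
    var lag = primary.optimeDate.getTime() - m.optimeDate.getTime();
    if (lag <= maxLagMs) result.push(m.name);
  }
  return result;
}

// Assumed usage in the mongo shell, run on the current primary:
//   if (caughtUpSecondaries(rs.status(), 1000).length > 0) {
//     rs.stepDown();
//   }
```

Stepping down only when the list is non-empty avoids handing the primary role to a member that is missing recent writes.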
