Removing a Replica Set node


Damon C

Dec 14, 2011, 6:54:29 PM
to mongod...@googlegroups.com
Hi all,

I'd like some clarity on removing a replica set's primary node, given some behavior I saw when I did so recently. Here are the steps I took on MongoDB 1.8.x:

1. Ask the primary to step down: db.adminCommand( { replSetStepDown : 60 } )
2. Wait for a secondary to take over as primary
3. Remove the old primary from the replica set: rs.remove("hostname:27018")
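For reference, step 3 boils down to a replica set reconfiguration. The sketch below approximates what the shell's rs.remove() helper does: copy the current config, drop the member whose host matches, increment "version", and submit the result with replSetReconfig. The function name configWithoutMember is hypothetical, and the usage lines assume you are connected to the new primary after step 2.

```javascript
// Hypothetical helper: returns a new replica set config without `hostPort`.
// Approximates rs.remove(): drop the matching member and bump "version" so
// the remaining members accept the new configuration.
function configWithoutMember(currentConfig, hostPort) {
  var remaining = currentConfig.members.filter(function (m) {
    return m.host !== hostPort;
  });
  if (remaining.length === currentConfig.members.length) {
    throw new Error("host not found in replica set config: " + hostPort);
  }
  // Shallow-copy every top-level field (e.g. "settings") so nothing is lost.
  var next = {};
  for (var k in currentConfig) {
    if (currentConfig.hasOwnProperty(k)) {
      next[k] = currentConfig[k];
    }
  }
  next.version = currentConfig.version + 1;
  next.members = remaining;
  return next;
}

// Usage in a mongo shell connected to the NEW primary (assumption):
//   var cfg = rs.conf();
//   db.adminCommand({ replSetReconfig: configWithoutMember(cfg, "hostname:27018") });
```

The input config is left untouched, so if the reconfig fails you can inspect or retry with the original.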

At this point I started getting "WriteBackListener exception : socket exception" errors in my mongos log. I assume these came from attempts to connect to the old host, which was now down. Eventually they stopped. So I brought everything back up, and it ran fine until I tried a slave_ok read. Whether I issued the read from the Ruby driver or directly from a mongo shell, it failed with a "not master or secondary, can't read" error. Flushing the router config didn't appear to help. At that point I reconfigured all clients *not* to use slave_ok and to point at the primary, and all operations worked fine.

When I came back to the problem several hours later, slave_ok queries suddenly worked without a problem.

Can somebody explain what's happening here? We process real-time data, and being unable to use slave_ok queries for an indeterminate period after removing a replica set node is unsettling.

Thanks,

Damon

Richard Kreuter

Dec 14, 2011, 8:05:20 PM
to mongodb-user
Did you leave the old primary online for a while after removing it from the set?

Damon C

Dec 14, 2011, 8:47:00 PM
to mongod...@googlegroups.com
It looks like that may have been what happened. 

rs.remove shut the database down:
Tue Dec 13 22:23:49 dbexit: removed from replica set
Tue Dec 13 22:23:49 [rs Manager] shutdown: going to close listening sockets...

But it looks like it actually started back up shortly after that, and that's where the assertions were coming from:
Tue Dec 13 22:53:47 [conn9] assertion 13436 not master or secondary, can't read ns:db_name.collection_name query:{ _id: "xxx" }
Tue Dec 13 22:53:47 [conn9]  ntoskip:0 ntoreturn:-1
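Assertion 13436 reflects a simple rule: a member answers slave_ok reads only while it is a primary or a secondary. As a rough illustration, assuming per-host replies carrying the ismaster and secondary flags that the isMaster command reports, a hypothetical helper that narrows a router's seed hosts down to those able to serve such a read might look like this. It is a sketch of the rule, not mongos's actual routing code.

```javascript
// Hypothetical helper. Each entry pairs a host with the ismaster/secondary
// flags that isMaster returned when run against that host. A node that was
// removed from the set and then restarted typically reports neither flag,
// so it is excluded here; a read sent to it anyway fails with 13436.
function slaveOkCandidates(hostReplies) {
  return hostReplies
    .filter(function (r) {
      return r.reply.ismaster === true || r.reply.secondary === true;
    })
    .map(function (r) {
      return r.host;
    });
}
```

In the scenario above, the trouble was that a stale host was still reachable and still in the router's view, so a read could be routed to a member that fails this check.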

That makes a little more sense, but it's still odd that mongos would be sending queries to that system...

Nat

Dec 14, 2011, 8:57:34 PM
to mongod...@googlegroups.com
Did you update your mongos config to remove that node from the replica set seed? You should update it both at the replica set level and at the shard level.
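To make "remove that node from the seed" concrete: for a replica-set shard, the config database records the shard's address as a seed string of the form "<setName>/<host:port>,<host:port>,...". The thread does not say how that entry should be updated, so the hypothetical helper below only illustrates the string transformation; it is not a documented procedure for editing mongos metadata.

```javascript
// Hypothetical helper: drops one host from a "<setName>/<hosts>" seed string
// and leaves the replica set name intact.
function seedWithoutHost(seed, hostPort) {
  var slash = seed.indexOf("/");
  if (slash < 0) {
    throw new Error("expected '<setName>/<hosts>' seed: " + seed);
  }
  var setName = seed.substring(0, slash);
  var hosts = seed
    .substring(slash + 1)
    .split(",")
    .filter(function (h) {
      return h !== hostPort;
    });
  return setName + "/" + hosts.join(",");
}

// Example (assumed host names):
//   seedWithoutHost("rs0/a:27018,hostname:27018,c:27018", "hostname:27018")
//   yields "rs0/a:27018,c:27018"
```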

Damon C

Dec 15, 2011, 1:55:06 PM
to mongod...@googlegroups.com, nat....@gmail.com
I did not. I looked for official documentation on whether that was necessary, and how to do it, but didn't have much luck.

Nothing in the mongo documentation indicates that this is a necessary step. :\

Richard Kreuter

Dec 15, 2011, 2:27:57 PM
to mongodb-user
Damon,

We're going to make a change that should prevent the "not master or secondary" error from being possible via mongos. See:

https://jira.mongodb.org/browse/SERVER-4501

Regards,
Richard
