Removing a Replica Set node

Damon C

Dec 14, 2011, 6:54:29 PM

to mongod...@googlegroups.com

Hi all,

I'd like some clarity on removing a replica set primary node, given some behavior I saw when I tried to do so recently. Here are the steps I took on MongoDB 1.8.x:

1. Request primary step down: db.adminCommand( { replSetStepDown : 60 } )

2. Wait for secondary to assume primary role

3. Remove old primary node from replica set: rs.remove("hostname:27018")
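
For reference, step 2 above (waiting for a secondary to take over) can be checked rather than guessed at. The following is a minimal sketch, not an official procedure: it assumes the document returned by rs.status() lists each member with `name` and `stateStr` fields, and the helper names `findPrimary` and `hasNewPrimary` are made up for illustration.

```javascript
// Sketch only: helpers that inspect a document shaped like the output of rs.status().
// Assumption: each entry in status.members has `name` ("host:port") and `stateStr`.

// Return the "host:port" of the member reporting PRIMARY, or null if there is none.
function findPrimary(status) {
  for (var i = 0; i < status.members.length; i++) {
    if (status.members[i].stateStr === "PRIMARY") {
      return status.members[i].name;
    }
  }
  return null;
}

// True once a member other than oldPrimary has taken over as primary.
function hasNewPrimary(status, oldPrimary) {
  var primary = findPrimary(status);
  return primary !== null && primary !== oldPrimary;
}

// Rough flow in the mongo shell (hostname:27018 is the placeholder from the steps above):
//   db.adminCommand({ replSetStepDown: 60 })       // step 1, run on the old primary
//   hasNewPrimary(rs.status(), "hostname:27018")   // step 2, repeat until it returns true
//   rs.remove("hostname:27018")                    // step 3, run on the new primary
```

The helpers are kept free of shell globals so they can be pasted into a shell session and called with whatever rs.status() returns at that moment.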

At this point, I started getting "WriteBackListener exception : socket exception" errors in my mongos log. I assume these came from attempts to connect to the old host, which was now down. Eventually they stopped, so I fired everything back up and it was going fine until I tried a slave_ok read. That read, whether from the Ruby driver or directly from a mongo shell, failed with a "not master or secondary, can't read" error. I tried flushing the router config, but that didn't appear to help. I then reconfigured all clients *not* to use slave_ok and to point to the primary, and all operations worked perfectly fine.

After coming back to the problem several hours later, I was suddenly able to do slave_ok queries without a problem.

Can somebody explain what's happening here? We process real-time data and being unable to use slave_ok queries for some indeterminate period of time after removing a replica set node is a little unsettling.

Thanks,

Damon

Richard Kreuter

Dec 14, 2011, 8:05:20 PM

to mongodb-user

Did you leave the old primary online for a while after removing it
from the set?

Damon C

Dec 14, 2011, 8:47:00 PM

to mongod...@googlegroups.com

It looks like that may have been what happened.

rs.remove shut the database down:

Tue Dec 13 22:23:49 dbexit: removed from replica set

Tue Dec 13 22:23:49 [rs Manager] shutdown: going to close listening sockets...

But it looks like it actually started back up shortly after that and that's where the assertions were coming from:

Tue Dec 13 22:53:47 [conn9] assertion 13436 not master or secondary, can't read ns:db_name.collection_name query:{ _id: "xxx" }

Tue Dec 13 22:53:47 [conn9] ntoskip:0 ntoreturn:-1

That makes a little more sense, but it's still odd that mongos would be sending queries to that system...

Nat

Dec 14, 2011, 8:57:34 PM

to mongod...@googlegroups.com

Did you update your mongos config to remove that node from the replica set seed list? You should update it both at the replica set level and at the shard level.
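
To make the "shard level" part concrete: this sketch assumes the shard's entry in the config database's shards collection stores its host as a seed string of the form "setName/host1:port,host2:port". The helper `removeHostFromSeed` is hypothetical and only computes what the string would look like without the removed node; it does not write anything back, and how the change should be applied on a given version is not covered here.

```javascript
// Hypothetical helper: drop one host from a "setName/host1:port,host2:port" seed string.
// Assumption: this is the format of the `host` field in config.shards for a replica set shard.
function removeHostFromSeed(seed, host) {
  var slash = seed.indexOf("/");
  if (slash < 0) {
    // No set name prefix, so this is not a replica set seed string; leave it untouched.
    return seed;
  }
  var setName = seed.substring(0, slash);
  var hosts = seed.substring(slash + 1).split(",");
  var kept = [];
  for (var i = 0; i < hosts.length; i++) {
    if (hosts[i] !== host) {
      kept.push(hosts[i]);
    }
  }
  return setName + "/" + kept.join(",");
}

// From a mongo shell connected to mongos, the current entries can be inspected with:
//   db.getSiblingDB("config").shards.find()
```

Comparing the stored seed string with the output of this helper shows whether mongos still has the removed node listed.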

Damon C

Dec 15, 2011, 1:55:06 PM

to mongod...@googlegroups.com, nat....@gmail.com

I did not. I looked around for official documentation on whether that was necessary and how to do it, but didn't have much luck.

Nothing in the mongo documentation indicates that this is a necessary step. :\

Richard Kreuter

Dec 15, 2011, 2:27:57 PM

to mongodb-user

Damon,

We're going to make a change that should prevent the "not master or
slave" error from being possible via the mongos. See: