I'd like some clarity on removing a replica set primary node, given some behavior I saw when I tried to do so recently. Here are the steps I took on MongoDB 1.8.x:
1. Request primary step down: db.adminCommand( { replSetStepDown : 60 } )
2. Wait for a secondary to assume the primary role
3. Remove the old primary node from the replica set: rs.remove("hostname:27018")
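For reference, step 2 can be checked rather than eyeballed. The following is only a minimal sketch in mongo shell JavaScript: the helper names `currentPrimary` and `hasNewPrimary` are hypothetical, and it assumes the status document has the shape returned by rs.status(), with a `members` array whose entries carry `name` and `stateStr`.

```javascript
// Sketch only: helpers for step 2, deciding when another member has taken over.
// Assumption: `status` looks like the result of rs.status(), e.g.
//   { members: [ { name: "host:port", stateStr: "PRIMARY" }, ... ] }

// Return the name of the member currently reporting PRIMARY, or null if none.
function currentPrimary(status) {
  var members = (status && status.members) || [];
  for (var i = 0; i < members.length; i++) {
    if (members[i].stateStr === "PRIMARY") {
      return members[i].name;
    }
  }
  return null;
}

// True once a member other than `oldPrimary` reports itself as PRIMARY.
function hasNewPrimary(status, oldPrimary) {
  var primary = currentPrimary(status);
  return primary !== null && primary !== oldPrimary;
}

// Hypothetical usage in the mongo shell:
//   on the old primary:    db.adminCommand({ replSetStepDown: 60 })
//   on another member:     hasNewPrimary(rs.status(), "hostname:27018")
//   on the new primary:    rs.remove("hostname:27018")
```

Polling `hasNewPrimary(rs.status(), ...)` until it returns true before running the removal would make the hand-off explicit. It says nothing about what mongos believes about the set, which is a separate question.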
At this point, I started getting "WriteBackListener exception : socket exception" errors in my mongos log. I assume these came from mongos attempting to connect to the old host, which was now down. Eventually, they stopped. So I fired everything back up and it was going fine until I tried to do a slave_ok read. Whether from the Ruby driver or directly from a mongo shell, that read returned a "not master or secondary, can't read" error. I tried flushing the router config, but that didn't appear to help. I then reconfigured all clients *not* to use slave_ok and to point to the primary, and all operations worked perfectly fine.
After coming back to the problem several hours later, I was suddenly able to do slave_ok queries without a problem.
Can somebody explain what's happening here? We process real-time data, and being unable to use slave_ok queries for some indeterminate period of time after removing a replica set node is a little unsettling.
-----Original Message-----
From: Damon C <d.lifehac...@gmail.com>
Sender: mongodb-user@googlegroups.com
Date: Wed, 14 Dec 2011 17:47:00
To: <mongodb-user@googlegroups.com>
Reply-To: mongodb-user@googlegroups.com
Subject: [mongodb-user] Re: Removing a Replica Set node
It looks like that may have been what happened.
rs.remove shut the database down:
Tue Dec 13 22:23:49 dbexit: removed from replica set
Tue Dec 13 22:23:49 [rs Manager] shutdown: going to close listening sockets...
But it looks like it actually started back up shortly after that and that's where the assertions were coming from:
Tue Dec 13 22:53:47 [conn9] assertion 13436 not master or secondary, can't read ns:db_name.collection_name query:{ _id: "xxx" }
Tue Dec 13 22:53:47 [conn9] ntoskip:0 ntoreturn:-1
That makes a little more sense, but still odd that mongos would be sending queries to that system...
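One way to confirm that a given host is the one rejecting reads is to connect to it directly and look at its isMaster reply. This is only a rough sketch: the helper name `canServeSlaveOkRead` is made up for illustration, and it assumes the reply carries the boolean `ismaster` and `secondary` fields. It does not explain why mongos routed reads to that host; it only shows whether the host considers itself a primary or a secondary.

```javascript
// Sketch only: decide from an isMaster-style reply whether a node would
// accept a slave_ok read. A node that is neither primary nor secondary
// (for example, one restarted after being removed from the set) is the
// kind of node that answers "not master or secondary, can't read".
function canServeSlaveOkRead(isMasterReply) {
  if (!isMasterReply) {
    return false;
  }
  return isMasterReply.ismaster === true || isMasterReply.secondary === true;
}

// Hypothetical usage, connected directly to the suspect host in the mongo shell:
//   canServeSlaveOkRead(db.isMaster())
```

If this returns false on the old primary while mongos is still sending it slave_ok reads, that matches the assertions in the log above.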