Force replica set reconfig from pymongo?


David Paulsen

Mar 10, 2015, 1:50:44 PM
to mongod...@googlegroups.com
I'm new to MongoDB and am investigating how to reconfigure a 2-node replica set (no arbiters) when one node goes down (panic, power outage, etc.). I'm starting with MongoDB 2.6.8 and may consider 3.0 soon.

From the mongo client, one can forcefully remove a replica set member according to the procedure described in
http://docs.mongodb.org/manual/tutorial/remove-replica-set-member/

For example, if the secondary crashes or panics, one can do this on the surviving node:
rs100:PRIMARY>
2015-03-10T03:29:17.271-0700 DBClientCursor::init call() failed
2015-03-10T03:29:17.273-0700 trying reconnect to 127.0.0.1:27017 (127.0.0.1) failed
2015-03-10T03:29:17.274-0700 reconnect 127.0.0.1:27017 (127.0.0.1) ok
rs100:SECONDARY> cfg = rs.conf()
{
        "_id" : "rs100",
        "version" : 1039759,
        "members" : [
                {
                        "_id" : 13,
                        "host" : "sasp-dev-100:27017"
                },
                {
                        "_id" : 14,
                        "host" : "sasp-dev-101:27017"
                }
        ]
}
rs100:SECONDARY> cfg.members = [cfg.members[0]]
[ { "_id" : 13, "host" : "sasp-dev-100:27017" } ]
rs100:SECONDARY> rs.reconfig(cfg, {force : true})
{ "ok" : 1 }
rs100:SECONDARY> rs.conf()
{
        "_id" : "rs100",
        "version" : 1118210,
        "members" : [
                {
                        "_id" : 13,
                        "host" : "sasp-dev-100:27017"
                }
        ]
}
rs100:PRIMARY>

Later on, when the secondary is repaired and back online, I can go ahead and do rs.add("sasp-dev-101:27017") and we're back to normal operating state.

Picked up a hint about doing this from pymongo in the post
https://groups.google.com/forum/#!topic/mongodb-user/BpPc9nlS6nY

And I hacked up a little test script:
#!/usr/local/bin/python3.4
import sys
import pymongo
import pprint as pp

if len(sys.argv) < 2:
    print('Specify replica set host to remove: {0} "host:port"'.format(sys.argv[0]))
    sys.exit(1)

client = pymongo.MongoClient()

cfgDict = client.local.system.replset.find_one()

pp.pprint(cfgDict)

ndx = 0
for mem in cfgDict['members']:
    if mem.get('host') == sys.argv[1]:
        print("FOUND MATCH ... RECONFIGURING!")
        del cfgDict['members'][ndx]
        cfgDict['version'] = cfgDict['version'] + 1
        print("New Replica Set Config:")
        pp.pprint(cfgDict)
        input("HIT ENTER to try it :) ")
        try:
            client.admin.command({'replSetReconfig': cfgDict}, {'force': True})
        except pymongo.errors.ConnectionFailure:
            pass
        break
    ndx = ndx + 1
print("Done...")


What seems to be lacking is the effect of "{force: true}" in the replSetReconfig command issued to the admin db. This script works fine as long as the secondary is up. But if the secondary is down and the primary has degraded to the secondary state, it raises an OperationFailure. Output from the script:
New Replica Set Config:
{'_id': 'rs100',
 'members': [{'_id': 13, 'host': 'sasp-dev-100:27017'}],
 'version': 1118214}
HIT ENTER to try it :)
Traceback (most recent call last):
  File "./rm2.py", line 28, in <module>
    client.admin.command({'replSetReconfig': cfgDict}, {'force': True})
  File "/usr/local/lib/python3.4/site-packages/pymongo/database.py", line 439, in command
    uuid_subtype, compile_re, **kwargs)[0]
  File "/usr/local/lib/python3.4/site-packages/pymongo/database.py", line 345, in _command
    msg, allowable_errors)
  File "/usr/local/lib/python3.4/site-packages/pymongo/helpers.py", line 182, in _check_command_response
    raise OperationFailure(msg % errmsg, code, response)
pymongo.errors.OperationFailure: command {'replSetReconfig': {'version': 1118214, 'members': [{'host': 'sasp-dev-100:27017', '_id': 13}], '_id': 'rs100'}} on namespace admin.$cmd failed: replSetReconfig command must be sent to the current replica set primary.


My gut this morning says that the client.admin.command(...) call on line 28 is not getting the same 'force: true' effect that I get in the mongo shell: either I'm not passing it through properly, or the pymongo command method won't do what I want. I'm going to step through this in pdb, but any insight is welcome!
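
The config-editing part of the script above can be separated from the server call, which makes it checkable without a running replica set. The following is only a sketch under that assumption: remove_member is a hypothetical helper name (not part of pymongo), and the sample config is copied from the rs.conf() output earlier in this post. It works on a copy, so the original dict is left untouched, and it bumps the version by one as the script does.

```python
import copy


def remove_member(cfg, host):
    """Return a copy of cfg without the member whose 'host' equals host.

    Hypothetical helper for illustration. The version is incremented by
    one, mirroring the test script. Raises ValueError if nothing matches.
    """
    new_cfg = copy.deepcopy(cfg)
    kept = [m for m in new_cfg['members'] if m.get('host') != host]
    if len(kept) == len(new_cfg['members']):
        raise ValueError('no member with host {0!r}'.format(host))
    new_cfg['members'] = kept
    new_cfg['version'] = new_cfg['version'] + 1
    return new_cfg


# Sample config taken from the shell session above.
cfg = {
    '_id': 'rs100',
    'version': 1039759,
    'members': [
        {'_id': 13, 'host': 'sasp-dev-100:27017'},
        {'_id': 14, 'host': 'sasp-dev-101:27017'},
    ],
}
new_cfg = remove_member(cfg, 'sasp-dev-101:27017')
print(new_cfg)
```

The returned dict is what would then be handed to the replSetReconfig command; how to pass the force option correctly is the open question of this post.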


Will Berkeley

Mar 10, 2015, 2:10:03 PM
to mongod...@googlegroups.com
Wait, why are you ejecting a replica set member when it goes down? You should allow it to come back up (or intervene to bring it back up) and wait for it to reconnect to the set and recover.

-Will

David Paulsen

Mar 10, 2015, 2:22:15 PM
to mongod...@googlegroups.com
Yes, I may be off in the weeds here.  What I really want is for the (still living) primary to be able to perform updates (inserts, whatever) while the secondary is recovering, and I'm assuming an outage could be anywhere from minutes to hours to days, understanding that we're vulnerable as long as we're in this degraded state.  Forcefully evicting the non-functioning member seemed to be a way to get that behavior.  (Again, I'm living with a limit of a 2-node system ... if I had a 3rd available, arbiters could, I realize, make life wonderful.)

Will Berkeley

Mar 10, 2015, 2:39:15 PM
to mongod...@googlegroups.com
Ah, I misunderstood what you meant when you said "2-node (no arbiters)", somehow. You really want to run with at least three nodes; otherwise, you don't benefit from the automated failover. Can you colocate an arbiter with a data-bearing node? Obviously, there are still problems if the machine holding both processes goes down, but it's better than a 2-member replica set. If you can't have enough nodes for a fault-tolerant system, why are you running a replica set at all? To try to keep a copy of the data?

-Will


David Paulsen

Mar 10, 2015, 2:47:12 PM
to mongod...@googlegroups.com
I'd tried adding an arbiter on each node, but saw complaints in the log to the effect of "I can't elect myself..." At any rate, reading through the excellent docstrings in pymongo (the command method in pymongo/database.py), I worked out that I can pass force=True as a kwarg and get the desired effect:
client.admin.command({'replSetReconfig': cfgDict}, force=True)

LOTS of testing to see if this will play out.

Will Berkeley

Mar 10, 2015, 3:06:18 PM
to mongod...@googlegroups.com
You would just add 1 arbiter. You want an odd number of members/votes in a replica set.

-Will
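
The odd-number advice comes down to simple arithmetic: electing (or keeping) a primary requires a strict majority of the voting members. A small sketch of that arithmetic follows; the function names are made up for illustration and this is not how the server itself is implemented.

```python
def majority(voting_members):
    """Votes needed for a strict majority of the voting members."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members, reachable):
    """True if the reachable voters still form a majority."""
    return reachable >= majority(voting_members)


# Compare a 2-member set with a 3-member set after losing one member.
for total in (2, 3):
    print(total, 'members, one down ->', can_elect_primary(total, total - 1))
# prints:
# 2 members, one down -> False
# 3 members, one down -> True
```

With two members, the survivor holds 1 of 2 votes, which is not a majority, so it cannot remain primary. Adding a single arbiter makes it 2 of 3.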

David Paulsen

Mar 11, 2015, 5:32:35 PM
to mongod...@googlegroups.com
In practice, my little snippet here proves to be unreliable.

client.admin.command({'replSetReconfig': cfgDict}, force=True)

Sometimes it works, other times I'll end up getting stuff like:
'errmsg': 'no such cmd: force', 'code': 59

Something to do with the way pymongo is constructing a SON object...

If the command that reaches def _command in database.py is:
SON([('replSetReconfig', {'version': 2668533, 'members': [{'host': 'sasp-dev-100:27017', '_id': 15}], '_id': 'rs100'}), ('force', True)])
this will work.

However sometimes (now that I want to post about it, I can't reproduce it!) I've seen errors such as:
{'ok': 0.0, 'bad cmd': {'force': True, 'replSetReconfig': {'_id': 'rs100', 'members': [{'_id': 15, 'host': 'sasp-dev-100:27017'}], 'version': 2668533}}, 'errmsg': 'no such cmd: force', 'code': 59}
... which leads me to believe that the lower levels of pymongo are seeing the first dict key as "force", an unknown command, as opposed to seeing the first dict key as "replSetReconfig". 

Bernie Hackett

May 27, 2015, 3:48:42 PM
to mongod...@googlegroups.com, djp...@gmail.com
The reason you are having this problem is that you are passing a Python dict to command. The dict type does not preserve order, so 'force' can end up as the first key, and the server treats the first key as the command name. The API of Database.command is meant to work around this problem. This should be how you call it:

client.admin.command('replSetReconfig', cfgDict, force=True)

The driver will create a SON under the covers. bson.son.SON is very similar to collections.OrderedDict, but works back to Python 2.4.
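
To make the ordering point concrete, here is a rough sketch of the idea using collections.OrderedDict in place of SON, so that it runs without the driver installed. build_command is a hypothetical name and this is not pymongo's actual code; it only illustrates why passing the command name separately guarantees that it ends up as the first key.

```python
from collections import OrderedDict


def build_command(name, value=1, **kwargs):
    """Build a command document with the command name as the first key.

    Hypothetical illustration only; the real driver uses bson.son.SON.
    """
    cmd = OrderedDict([(name, value)])  # command name always goes first
    cmd.update(kwargs)                  # options such as force follow it
    return cmd


# Config values taken from the earlier post in this thread.
cfg = {'_id': 'rs100', 'version': 2668533,
       'members': [{'_id': 15, 'host': 'sasp-dev-100:27017'}]}
cmd = build_command('replSetReconfig', cfg, force=True)
print(list(cmd.keys()))  # prints ['replSetReconfig', 'force']
```

By contrast, merging force into a plain dict that already holds replSetReconfig leaves the key order up to the dict implementation, which matches the intermittent 'no such cmd: force' error reported above.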

David Paulsen

May 28, 2015, 3:11:26 PM
to mongod...@googlegroups.com
Thanks Bernie!  Looking back at my code, that's exactly what I ended up doing:
self.client.admin.command('replSetReconfig', cfgDict, force=True)
Though it's still an open issue whether we're going to take this approach at all.