Using replica sets with mongokit

Éric Araujo

Apr 27, 2012, 7:06:39 PM
to mongod...@googlegroups.com
Hello,

I’ve been learning how to use MongoDB with Python for a few days in order to update some code so that reads go to a secondary replica.
After much reading and experimenting I have hit a wall and would like to confirm a few things to see what direction I should take. I don’t mind being told to
RTFM if you point me to the specific friendly manual, or being redirected to the mongokit mailing list if you tell me the Mongo setup
is fine and the problem is on the Python side.

The code has many classes that subclass mongokit.Document and share a common connection for find, save and remove.  In my test
suite I start three mongod nodes, give them 30 seconds to create their files and connect to the repl set primary with this idiom:
mongokit.Connection('mongodb://127.0.0.1:27031/?replicaSet=tstest').  After configuring the repl set and waiting 30 more seconds I can see
that writes propagate to all replicas, but I can’t get reads to go to the secondaries.  From the pymongo docs I thought that I would be able to
use the same connection object and pass read_preference=SECONDARY (this supersedes slave_ok) to the find method, but that does not
work.  I get no error but the setting seems to have no effect (I check with system.profile.find; more on that later).  Is that expected?

There is also a ReplicaSetConnection in pymongo but I don’t understand when one should use Connection or ReplicaSetConnection, and
anyway I have to use mongokit.Connection which inherits from pymongo.Connection.  I tried testing with a subclass of
pymongo.ReplicaSetConnection and mongokit.Connection but quickly stopped after seeing they did not work together.

I finally read on the mongokit ML and a blog that one needs to open a second connection to a secondary in order to send reads to it.  Is
this the only way?  It would be a bit inconvenient for the code I’m working on.  Currently the Mongo configuration is one string in a config
file, i.e. a full URI to the repl set primary with the repl set name embedded, which works great for different environments (dev/stage/etc)
and automatically handles changes in the repl set config.  If I had to put the URI of a secondary server in the config files I would lose
the auto-discovery and adaptation to config changes, which are great features of replica sets.  Then in the code I would have to open a
second connection in addition to the existing one, and handle failover myself.
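
To illustrate why the single config string can stay the only setting: the seed list and the set name that a replica-set-aware connection needs are both already in the URI. The following is a minimal sketch using only the standard library; the helper name `split_replica_set_uri` is made up for illustration, and pymongo ships its own parser in `pymongo.uri_parser`, which would be the better choice in real code.

```python
from urllib.parse import urlsplit, parse_qs


def split_replica_set_uri(uri):
    """Return (seed_hosts, replica_set_name) for a mongodb:// URI.

    Hypothetical helper, not part of pymongo or mongokit.
    """
    parts = urlsplit(uri)
    if parts.scheme != "mongodb":
        raise ValueError("expected a mongodb:// URI, got %r" % uri)
    # Drop optional "user:password@" credentials, then split the seed list.
    host_part = parts.netloc.rpartition("@")[2]
    seeds = [host for host in host_part.split(",") if host]
    # parse_qs maps each option name to a list of values.
    options = parse_qs(parts.query)
    set_name = options.get("replicaSet", [None])[0]
    return seeds, set_name


seeds, set_name = split_replica_set_uri(
    "mongodb://127.0.0.1:27031/?replicaSet=tstest"
)
print(seeds, set_name)
```

Any one member is enough as a seed, so the config file does not need to name a specific secondary.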

Last, I’m not sure that my test code is right.  To check that the reads go to the secondaries, I wanted to use profiling, but again I can’t get
information just for a secondary.  If I use my main connection object with read_preference=SECONDARY I get the same info as from
the primary, and if I try to make a second connection and get profiling info it fails with AutoReconnect('master has changed').  Is profiling the
right call for what I want to do?  Is it normal that I can’t work with two parallel connections?  I could of course change the code, run the
application on a test server and check the logs manually, but if possible I would prefer to have an automated test to make sure that reads go
to secondaries.

Versions: mongod 2.0.4, pymongo 2.1.1, mongokit 0.7.2.

Thanks in advance for any idea; I’ll check email for some time today and then on Monday.
Best regards

Bernie Hackett

Apr 27, 2012, 7:30:24 PM
to mongod...@googlegroups.com
Distributing reads to secondaries requires the use of
ReplicaSetConnection with ReadPreference.SECONDARY.

http://api.mongodb.org/python/current/examples/replica_set.html#replicasetconnection
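
A rough sketch of what that looks like with plain pymongo, assuming the pymongo 2.x API used in this thread (`ReplicaSetConnection` was removed in pymongo 3). The function name `open_secondary_reads` and its injectable parameters are made up for illustration; they let the wiring be exercised without a running replica set.

```python
def open_secondary_reads(uri, set_name, connection_cls=None, read_preference=None):
    """Open a replica-set-aware connection whose reads go to secondaries.

    Hypothetical helper. With no overrides it assumes pymongo 2.x, where
    ReplicaSetConnection takes replicaSet and read_preference keywords.
    """
    if connection_cls is None:
        from pymongo import ReplicaSetConnection  # pymongo 2.x only
        connection_cls = ReplicaSetConnection
    if read_preference is None:
        from pymongo import ReadPreference
        read_preference = ReadPreference.SECONDARY
    return connection_cls(uri, replicaSet=set_name, read_preference=read_preference)


class FakeConnection(object):
    """Stand-in that records its arguments, so the sketch runs without mongod."""

    def __init__(self, uri, **options):
        self.uri = uri
        self.options = options


conn = open_secondary_reads(
    "mongodb://127.0.0.1:27031", "tstest",
    connection_cls=FakeConnection, read_preference="SECONDARY",
)
print(conn.uri, sorted(conn.options))
```

Against a real replica set the two overrides would simply be left out.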

Looking at the MongoKit source, it doesn't look like it currently
supports ReplicaSetConnection but implementing it doesn't seem like it
would be hard. Take a look at mongokit/connection.py

https://github.com/namlook/mongokit/blob/master/mongokit/connection.py#L47

You may also want to ask on the mongokit list:

https://groups.google.com/group/mongokit

Éric Araujo

Apr 30, 2012, 9:28:08 AM
to mongod...@googlegroups.com
Hi,


On Friday, April 27, 2012 7:30:24 PM UTC-4, Bernie Hackett wrote:
> Distributing reads to secondaries requires the use of
> ReplicaSetConnection with ReadPreference.SECONDARY.
>
> http://api.mongodb.org/python/current/examples/replica_set.html#replicasetconnection

Thanks for clarifying that.  I did have a good hard look at the doc for read_preference but it was not enough.

> Looking at the MongoKit source, it doesn't look like it currently
> supports ReplicaSetConnection but implementing it doesn't seem like it
> would be hard. Take a look at mongokit/connection.py

I will take this to the mongokit list.

I’d be grateful for feedback on my other queries: Is it right to use the profiling collection to check that a secondary got the read requests instead of the primary?  Is it normal that I can’t do a direct connection to a secondary node?

Cheers

Bernie Hackett

Apr 30, 2012, 7:52:09 PM
to mongod...@googlegroups.com
> Is it right to use the profiling collection to check that a secondary got the read requests instead of the primary?

Assuming MongoKit adds support for ReplicaSetConnection, that should
work fine. You'll have to enable the profiler on your secondary(s).

> Is it normal that I can’t do a direct connection to a secondary node?

You should be able to:

>>> from mongokit.connection import Connection
>>> c = Connection(port=27018)
>>> c.admin.command('ismaster')['ismaster']
False
>>> c.admin.command('ismaster')['secondary']
True
>>> from pymongo import ReadPreference
>>> c.foo.foo.find(read_preference=ReadPreference.SECONDARY)
<mongokit.cursor.Cursor object at 0x7f65feb018d0>

Éric Araujo

May 1, 2012, 6:54:49 PM
to mongod...@googlegroups.com
Hello,


>> Is it right to use the profiling collection to check that a secondary got the read requests instead of the primary?
> Assuming MongoKit adds support for ReplicaSetConnection, that should
> work fine.
 
Using two connections is the way to do it now.  There is an open bug requesting addition of RSC to mongokit but it has no replies.
 
> You'll have to enable the profiler on your secondary(s).

That was a useful bit of info.  I’m still learning how replicas work; it makes sense that I have to enable profiling on the secondary, and that I must call drop_collection on the primary only.

>> Is it normal that I can’t do a direct connection to a secondary node?
> You should be able to: [snip]

With trial and error I managed to find the right way to create the connection objects to do my requests.  I query the system.profile collection on each node to see where the writes and reads ended up and it finally works.  My thanks!
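
For readers wanting to automate the same check, here is a minimal sketch of the comparison step. It assumes each profile record carries `op` and `ns` fields, as the profiler in mongod 2.0 writes them; the helper name `summarize_profile` and the sample records are made up and stand in for the results of querying system.profile on each node.

```python
from collections import Counter


def summarize_profile(profile_docs, namespace):
    """Count profiled operations on one namespace, grouped by operation type."""
    return Counter(
        doc.get("op") for doc in profile_docs if doc.get("ns") == namespace
    )


# Made-up records standing in for db.system.profile.find() on two nodes.
primary_docs = [
    {"op": "insert", "ns": "test.users"},
    {"op": "command", "ns": "test.$cmd"},
]
secondary_docs = [
    {"op": "query", "ns": "test.users"},
    {"op": "query", "ns": "test.users"},
]

primary = summarize_profile(primary_docs, "test.users")
secondary = summarize_profile(secondary_docs, "test.users")

# The test passes when the reads show up on the secondary and not on the primary.
assert primary["query"] == 0
assert secondary["query"] == 2
```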

Cheers

Éric Araujo

May 10, 2012, 6:15:52 PM
to mongod...@googlegroups.com
Hi,


>> Is it right to use the profiling collection to check that a secondary got the read requests instead of the primary?
> Assuming MongoKit adds support for ReplicaSetConnection, that should
> work fine.

I ended up writing a ReplicaSetConnection class for mongokit and it works fine.  The last thing I need to fix is related
to test cleanup: I want to delete all profiling records after a test, but I can’t manage to do that.  I loop over collection
names and call drop_collection on my ReplicaSetConnection object, which causes the primary and secondary nodes
to delete the collections, except for system.profile: on the secondaries, this collection persists between two runs.  I
connect to each secondary with pymongo.Connection and call db.system.profile.remove (I can’t call drop_collection on
a secondary, there is an error if I try) but it has no effect.  Any idea?
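
One possible workaround, offered as a sketch and not as an answer to why the removal has no effect: leave the old records in place and have each test ignore anything profiled before it started. This assumes every profile record has a `ts` timestamp and that it is read back as a naive UTC datetime (pymongo's default); the helper name `records_since` and the sample data are made up.

```python
from datetime import datetime, timedelta


def records_since(profile_docs, started_at):
    """Keep only the profile records written at or after started_at."""
    return [doc for doc in profile_docs if doc["ts"] >= started_at]


# Made-up records: one left over from an earlier run, one from this run.
test_start = datetime(2012, 5, 10, 18, 0, 0)
profile_docs = [
    {"op": "query", "ns": "test.users", "ts": test_start - timedelta(minutes=5)},
    {"op": "query", "ns": "test.users", "ts": test_start + timedelta(seconds=2)},
]

fresh = records_since(profile_docs, test_start)
print(len(fresh))
```

The same cut-off could probably be applied on the server side with a `{"ts": {"$gte": test_start}}` filter when querying system.profile.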

Cheers