Hello,
I’ve been learning how to use MongoDB with Python for a few days in order to update some code so that reads go to a secondary replica.
After much reading and trying I have hit a wall and would like to confirm a few things to see what direction I should take. I don’t mind being
told to RTFM if you point me to the specific friendly manual, or being redirected to the mongokit mailing list if you tell me the Mongo setup
is fine and the problem is on the Python side.
The code has many classes that subclass mongokit.Document and share a common connection for find, save and remove. In my test
suite I start three mongod nodes, give them 30 seconds to create their files and connect to the repl set primary with this idiom:
mongokit.Connection('mongodb://127.0.0.1:27031/?replicaSet=tstest'). After configuring the repl set and waiting 30 more seconds I can see
that writes propagate to all replicas, but I can’t get reads to go to the secondaries. From the pymongo docs I thought I would be able to
use the same connection object and pass read_preference=SECONDARY (which supersedes slave_okay) to the find method, but that does not
work. I get no error, but the setting seems to have no effect (I check with system.profile.find; more on that later). Is that expected?
There is also a ReplicaSetConnection in pymongo, but I don’t understand when one should use Connection and when ReplicaSetConnection, and
in any case I have to use mongokit.Connection, which inherits from pymongo.Connection. I tried testing with a class that subclasses both
pymongo.ReplicaSetConnection and mongokit.Connection but quickly stopped after seeing they did not work together.
I finally read on the mongokit ML and a blog that one needs to open a second connection to a secondary in order to send reads to it. Is
this the only way? It would be a bit inconvenient for the code I’m working on. Currently the Mongo configuration is one string in a config
file, i.e. a full URI to the repl set primary with the repl set name embedded, which works great for different environments (dev/stage/etc)
and automatically handles changes in the repl set config. If I had to put the URI of a secondary server in the config files I would lose
auto-discovery and adaptation to config changes, which are great features of replica sets. Then in the code I would have to open a
second connection in addition to the existing one, and handle failover myself.
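
To make the point about the config string concrete, here is a minimal stdlib-only sketch (Python 3) of what that one URI carries: a seed host list and the replica set name. The names describe_mongo_uri and CONFIG_URI are hypothetical and only for illustration; this is not how pymongo or mongokit parse the URI internally.

```python
from urllib.parse import urlsplit, parse_qs


def describe_mongo_uri(uri):
    """Split a mongodb:// URI into its seed hosts and replica set name.

    Hypothetical helper for illustration only; it ignores the database
    name and every option other than replicaSet.
    """
    parts = urlsplit(uri)
    # Drop optional "user:password@" credentials, then split the seed list.
    seeds = parts.netloc.rsplit('@', 1)[-1].split(',')
    options = parse_qs(parts.query)
    # parse_qs maps each key to a list of values; take the first, if any.
    replica_set = options.get('replicaSet', [None])[0]
    return seeds, replica_set


# The same URI as in the test setup above.
CONFIG_URI = 'mongodb://127.0.0.1:27031/?replicaSet=tstest'
seeds, replica_set = describe_mongo_uri(CONFIG_URI)
```

Nothing in that string names a secondary, which is why hard-coding a secondary's address next to it would tie the config to one particular replica set layout.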
Last, I’m not sure that my test code is right. To check that the reads go to the secondaries, I wanted to use profiling, but again I can’t get
information just for a secondary. If I use my main connection object with read_preference=SECONDARY I get the same info as from
the primary, and if I try to make a second connection and get profiling info it fails with AutoReconnect('master has changed'). Is profiling the
right call for what I want to do? Is it normal that I can’t work with two parallel connections? I could of course change the code, run the
application on a test server and check the logs manually, but if possible I would prefer to have an automated test to make sure that reads go
to secondaries.
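
As a sketch of the kind of automated check described above: the profiler writes to a per-node system.profile collection that is not replicated, so the count has to be taken through a direct connection to each node. The helper below is hypothetical (count_profiled_queries is not a pymongo or mongokit function). It assumes that `db` behaves like a pymongo Database (a `name` attribute, item access to collections, and `find(spec)` returning an iterable), that profiling is already enabled on the node, and that profile entries carry `op` and `ns` fields.

```python
def count_profiled_queries(db, collection_name):
    """Count 'query' entries in db's system.profile for one collection.

    Assumptions (not verified here): `db` is a pymongo-style Database
    reached through a direct connection to a single node, profiling is
    enabled on that node, and entries have 'op' and 'ns' fields.
    """
    namespace = '%s.%s' % (db.name, collection_name)
    spec = {'op': 'query', 'ns': namespace}
    # Iterate the cursor rather than relying on a driver-specific count call.
    return sum(1 for _ in db['system.profile'].find(spec))
```

The idea would be to take the count on the primary and on a secondary, run one read with read_preference=SECONDARY, and take both counts again: the secondary's count should grow while the primary's stays the same.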
Versions: mongod 2.0.4, pymongo 2.1.1, mongokit 0.7.2.
Thanks in advance for any ideas; I’ll check email for some time today and then on Monday.
Best regards