mongo + ruby slow after node is disconnected from replica set

Brian Takita

Mar 7, 2011, 4:49:23 PM
to mongod...@googlegroups.com
Hello,

We have a replica set, which performs well until a node (master or
slave) is disconnected. If that node remains in the replica set
configuration, data access to mongo becomes painfully slow.
Has anybody else encountered this issue? If so, what is the
recommended resolution?

Thank you,
Brian

Brian Takita

Mar 7, 2011, 4:52:11 PM
to mongod...@googlegroups.com
Here is my replica set status in the slow state (after I shut down one
of the mongod instances).

> rs.status()
{
    "set" : "tcsf",
    "date" : "Mon Mar 07 2011 21:50:50 GMT+0000 (UTC)",
    "myState" : 1,
    "members" : [
        {
            "_id" : 2,
            "name" : "...",
            "health" : 1,
            "state" : 1,
            "self" : true
        },
        {
            "_id" : 3,
            "name" : "...",
            "health" : 0,
            "state" : 2,
            "uptime" : 0,
            "lastHeartbeat" : "Mon Mar 07 2011 21:50:47 GMT+0000 (UTC)",
            "errmsg" : "connect/transport error"
        }
    ],
    "ok" : 1
}

Kyle Banker

Mar 7, 2011, 5:01:47 PM
to mongod...@googlegroups.com
Can you provide more details? How much slower? Are you seeing any errors?
What version of the driver? What version of MongoDB?


Brian Takita

Mar 7, 2011, 5:32:25 PM
to mongod...@googlegroups.com, Kyle Banker
no :-)

I was seeing ~5 seconds per query vs. the usual <100 ms.

I tried to reproduce the issue and I'm getting a different error:

Read error: #<Mongo::ConnectionFailure: Failed to connect any given host:port>
/usr/local/rvm/gems/ree-1.8.7-2010.02@honk/gems/mongo-1.2.0/lib/../lib/mongo/repl_set_connection.rb:121:in `connect'
/data/honk/releases/20110301211724/vendor/gems/mongo-reconnect-0.0.1/lib/mongo_reconnect.rb:9:in `call'
/usr/local/rvm/gems/ree-1.8.7-2010.02@honk/gems/rails-2.3.10/lib/rails/rack/static.rb:31:in `call'
/usr/local/rvm/gems/ree-1.8.7-2010.02@honk/gems/rack-1.1.0/lib/rack/urlmap.rb:47:in `call'
/usr/local/rvm/gems/ree-1.8.7-2010.02@honk/gems/rack-1.1.0/lib/rack/urlmap.rb:41:in `each'
/usr/local/rvm/gems/ree-1.8.7-2010.02@honk/gems/rack-1.1.0/lib/rack/urlmap.rb:41:in `call'
/usr/local/rvm/gems/ree-1.8.7-2010.02@honk/gems/rails-2.3.10/lib/rails/rack/log_tailer.rb:17:in `call'
/usr/local/rvm/gems/ree-1.8.7-2010.02@honk/gems/unicorn-3.0.0/lib/unicorn/http_server.rb:519:in `process_client'
/usr/local/rvm/gems/ree-1.8.7-2010.02@honk/gems/unicorn-3.0.0/lib/unicorn/http_server.rb:594:in `worker_loop'
/usr/local/rvm/gems/ree-1.8.7-2010.02@honk/gems/unicorn-3.0.0/lib/unicorn/http_server.rb:592:in `each'
/usr/local/rvm/gems/ree-1.8.7-2010.02@honk/gems/unicorn-3.0.0/lib/unicorn/http_server.rb:592:in `worker_loop'
/usr/local/rvm/gems/ree-1.8.7-2010.02@honk/gems/honkster-newrelic_rpm-2.13.1/lib/new_relic/control/../agent/instrumentation/unicorn_instrumentation.rb:7:in `call'
/usr/local/rvm/gems/ree-1.8.7-2010.02@honk/gems/honkster-newrelic_rpm-2.13.1/lib/new_relic/control/../agent/instrumentation/unicorn_instrumentation.rb:7:in `worker_loop'
/usr/local/rvm/gems/ree-1.8.7-2010.02@honk/gems/unicorn-3.0.0/lib/unicorn/http_server.rb:482:in `spawn_missing_workers'
/usr/local/rvm/gems/ree-1.8.7-2010.02@honk/gems/unicorn-3.0.0/lib/unicorn/http_server.rb:479:in `fork'
/usr/local/rvm/gems/ree-1.8.7-2010.02@honk/gems/unicorn-3.0.0/lib/unicorn/http_server.rb:479:in `spawn_missing_workers'
/usr/local/rvm/gems/ree-1.8.7-2010.02@honk/gems/unicorn-3.0.0/lib/unicorn/http_server.rb:475:in `each'
/usr/local/rvm/gems/ree-1.8.7-2010.02@honk/gems/unicorn-3.0.0/lib/unicorn/http_server.rb:475:in `spawn_missing_workers'
/usr/local/rvm/gems/ree-1.8.7-2010.02@honk/gems/unicorn-3.0.0/lib/unicorn/http_server.rb:489:in `maintain_worker_count'
/usr/local/rvm/gems/ree-1.8.7-2010.02@honk/gems/unicorn-3.0.0/lib/unicorn/http_server.rb:163:in `start'
/usr/local/rvm/gems/ree-1.8.7-2010.02@honk/gems/unicorn-3.0.0/lib/unicorn.rb:13:in `run'
/usr/local/rvm/gems/ree-1.8.7-2010.02@honk/gems/unicorn-3.0.0/bin/unicorn_rails:208
/usr/local/rvm/gems/ree-1.8.7-2010.02@honk/bin/unicorn_rails:19:in `load'
/usr/local/rvm/gems/ree-1.8.7-2010.02@honk/bin/unicorn_rails:19

It seems like automatic failover is not working at all now :-(

I didn't see any errors when access was slow.

MongoDB v1.6.5
Ruby mongo driver v1.2.0

Thanks,
Brian

Kyle Banker

Mar 7, 2011, 8:09:01 PM
to mongod...@googlegroups.com, Brian Takita
Brian,

Quick question: what are you expecting the driver to do when the
replica set fails over? Do you expect no ConnectionFailure exceptions
at all? The driver doesn't provide that functionality.

See these docs:
http://api.mongodb.org/ruby/current/file.REPLICA_SETS.html

Also, there's an FAQ entry in the docs that explains the reasoning behind this:
http://api.mongodb.org/ruby/current/file.FAQ.html#I_periodically_see_connection_failures_between_the_driver_and_MongoDB._Why_can't_the_driver_retry_the_operation_automatically_

I _think_ the issue here is that we have different expectations for
how the driver is supposed to react. All the driver does is attempt to
connect on the subsequent request using everything it knows about the
replica set.
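
For what it's worth, here's a rough sketch of what that looks like from
the application side (host names, db name, and the sleep are made up
for illustration): after a ConnectionFailure, the very next operation
is what triggers the reconnect attempt.

require 'rubygems'
require 'mongo'

con = Mongo::ReplSetConnection.new(['db1.example.com', 27017],
                                   ['db2.example.com', 27017])
db  = con.db('mydb')

begin
  doc = db['users'].find_one
rescue Mongo::ConnectionFailure
  # The failed call closes the connection; this retry makes the driver
  # attempt a fresh connect using everything it knows about the set.
  sleep(1)
  doc = db['users'].find_one
end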

Kyle

Brian Takita

Mar 8, 2011, 7:07:44 PM
to Kyle Banker, mongod...@googlegroups.com
I expect a retry against the next node, with a ConnectionFailure
raised only when all nodes fail. Also, I'd like this behavior on all
of my calls by default.
I guess I'll monkey-patch it for now :-(

It would be nice if there were a supported way to override the default
behavior, which is to raise a ConnectionFailure when any node fails.
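
Concretely, the kind of wrapper I have in mind is something like this
(my own sketch; MongoRetry and the retry count are mine, not driver API):

module MongoRetry
  MAX_TRIES = 3

  # Run the block, retrying on connection failures so the driver can
  # reconnect to whichever node is master now; re-raise once we give up.
  def self.call
    tries = 0
    begin
      yield
    rescue Mongo::ConnectionFailure
      tries += 1
      raise if tries >= MAX_TRIES
      sleep(1)
      retry
    end
  end
end

# Every data-access call would then go through the wrapper:
MongoRetry.call { db['users'].find_one }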

Kyle Banker

Mar 8, 2011, 10:26:56 PM
to mongod...@googlegroups.com
If a genuine replica set failover is in progress, then the next node
won't be available for anywhere between 2 and 60 seconds. There's no
generic, ideal implementation of this on the driver side. The right
course of action is application-specific.
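
So if an application does choose to retry, the loop has to be sized to
that window. A sketch, assuming db is your database handle (the
60-second budget just mirrors the worst case above):

deadline = Time.now + 60        # cover the worst-case election window
begin
  result = db['events'].find_one
rescue Mongo::ConnectionFailure
  raise if Time.now > deadline  # budget exhausted: surface the failure
  sleep(2)                      # wait out part of the election, then retry
  retry
end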

More on this:
http://groups.google.com/group/mongodb-user/msg/7038a88b0f005413

Brian Takita

Mar 9, 2011, 5:14:25 PM
to mongod...@googlegroups.com, Kyle Banker
OK, that makes sense. I guess the behavior that's strange to me is
that it never recovers when one of the nodes is having issues, even
after 60 seconds.

I'll do the monkey patch for now, but I imagine the driver could
expose a configuration method that takes a lambda (or block) to handle
the connection-error logic; see the sketch below.

Would such a patch be welcome?
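
Something with roughly this shape is what I'm picturing (entirely
hypothetical; on_connection_failure is not a real driver method):

# Hypothetical API: the driver yields each connection failure to a
# user-supplied block, which decides whether to retry or re-raise.
Mongo::ReplSetConnection.on_connection_failure do |error, attempt|
  if attempt < 3
    sleep(2 ** attempt)  # back off a little longer each time
    :retry
  else
    :raise               # give up; surface the ConnectionFailure
  end
end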

Brian Takita

Mar 11, 2011, 8:49:51 PM
to mongod...@googlegroups.com, Kyle Banker
It turns out that I was running a two-member replica set without an
arbiter. If one of the nodes goes down, there is no elected master
(a majority, 2 of the 2 votes, is required to elect one) and the
client fails.

I fixed this by adding another replica set node.
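
For anyone who hits the same thing: a lightweight arbiter also
restores the voting majority without holding data. From the mongo
shell on the primary (host name is a placeholder):

> rs.addArb("arbiter.example.com:27017")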

Kyle Banker

Mar 11, 2011, 10:21:41 PM
to mongodb-user
Ah, that would do it. Thanks for reporting back.

Brian Takita

Mar 14, 2011, 10:07:44 PM
to mongod...@googlegroups.com, Kyle Banker
OK, it turns out the slowness was due to a piece of Rack middleware
that I wrote that reconnected on *every single* request.

It's completely unnecessary, since the mongo driver attempts to
reconnect on its own whenever it isn't connected.
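
Roughly what the middleware was doing (reconstructed, not the exact
code): forcing a connect up front, so every request paid the
connection-setup latency.

class MongoReconnect
  def initialize(app, connection)
    @app = app
    @connection = connection
  end

  def call(env)
    @connection.connect  # eager reconnect on every request: unnecessary
    @app.call(env)
  end
end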

Here are the gory details:
http://jira.mongodb.org/browse/RUBY-248

Kyle, thanks for your help,
Brian

Kyle Banker

Mar 15, 2011, 11:28:30 AM
to Brian Takita, mongod...@googlegroups.com
Great. Glad to see it resolved!