Re: Does the Ruby driver not reestablish connections?

roger

Feb 11, 2011, 2:57:29 PM

to mongodb-user

Like other drivers, you'll have to catch the error and reconnect.
Reconnecting will be successful when the new driver is selected.
-Roger

On Feb 11, 11:41 am, tsxn <tsx...@gmail.com> wrote:
> I am running a replicaset setup with a master, secondary, and arbiter
> with reads to secondary enabled using a ReplSetConnection. My master
> server went down and was started back up as a secondary. Now my reads
> are producing Mongo::ConnectionFailure exceptions. Does the Ruby
> driver not reestablish connections to restarted MongoDB servers?
>
> Here's the stack trace:
>
> Operation failed with the following exception: Broken pipe - send(2)
> /home/jetty/.rvm/gems/ree-1.8.7-2010...@api.ign.com/gems/mongo-1.2.0/lib/../lib/mongo/connection.rb:746:in `send_message_on_socket'
> /home/jetty/.rvm/gems/ree-1.8.7-2010...@api.ign.com/gems/mongo-1.2.0/lib/../lib/mongo/connection.rb:418:in `receive_message'
> /home/jetty/.rvm/gems/ree-1.8.7-2010...@api.ign.com/gems/mongo-1.2.0/lib/../lib/mongo/connection.rb:417:in `synchronize'
> /home/jetty/.rvm/gems/ree-1.8.7-2010...@api.ign.com/gems/mongo-1.2.0/lib/../lib/mongo/connection.rb:417:in `receive_message'
> /home/jetty/.rvm/gems/ree-1.8.7-2010...@api.ign.com/gems/mongo-1.2.0/lib/../lib/mongo/cursor.rb:382:in `send_initial_query'
> /home/jetty/.rvm/gems/ree-1.8.7-2010...@api.ign.com/gems/mongo-1.2.0/lib/../lib/mongo/cursor.rb:348:in `refresh'
> /home/jetty/.rvm/gems/ree-1.8.7-2010...@api.ign.com/gems/mongo-1.2.0/lib/../lib/mongo/cursor.rb:72:in `next_document'
> /home/jetty/.rvm/gems/ree-1.8.7-2010...@api.ign.com/gems/mongo-1.2.0/lib/../lib/mongo/collection.rb:230:in `find_one'
> /home/jetty/.rvm/gems/ree-1.8.7-2010...@api.ign.com/gems/plucky-0.3.6/lib/plucky/query.rb:62:in `find_one'
> /home/jetty/.rvm/gems/ree-1.8.7-2010...@api.ign.com/gems/plucky-0.3.6/lib/plucky/query.rb:79:in `first'
> /home/jetty/.rvm/gems/ree-1.8.7-2010...@api.ign.com/gems/ign-mongo_mapper-0.8.6.2/lib/mongo_mapper/plugins/querying/decorator.rb:29:in `first'
> /home/jetty/.rvm/gems/ree-1.8.7-2010...@api.ign.com/gems/plucky-0.3.6/lib/plucky/query.rb:68:in `find'

Kyle Banker

Feb 11, 2011, 2:57:47 PM

to mongod...@googlegroups.com

It does try to reconnect. However, you'll get connection failures
until a new primary is elected. If you want seamless failover, you'll
need to catch these exceptions until a connection can be
reestablished. See these docs:

http://api.mongodb.org/ruby/current/file.REPLICA_SETS.html#Recovery
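The catch-and-retry advice can be sketched as a small wrapper modeled on the `rescue_connection_failure` helper in the recovery section of the docs linked above. The docs' exact code may differ, and the retry count and delay here are illustrative. The wrapper retries the block whenever `Mongo::ConnectionFailure` is raised. It sleeps between attempts so the set has time to elect a new primary, and it re-raises once the retry budget is used up.

```ruby
# Stand-in so the sketch also runs without the mongo gem; when the driver is
# loaded, Mongo::ConnectionFailure already exists and this line is skipped.
module Mongo
  class ConnectionFailure < StandardError; end unless const_defined?(:ConnectionFailure, false)
end

# Retry the block on connection failures, sleeping `delay` seconds between
# attempts; re-raise the last failure after `max_retries` retries.
def rescue_connection_failure(max_retries = 60, delay = 0.5)
  retries = 0
  begin
    yield
  rescue Mongo::ConnectionFailure
    retries += 1
    raise if retries > max_retries # budget spent: let the application see it
    sleep(delay)                   # give the replica set time to elect a primary
    retry
  end
end
```

With the defaults above, a wrapped call such as `rescue_connection_failure { @db.collection('users').count }` keeps retrying for roughly 30 seconds (60 retries at 0.5 s each) before the error reaches the application.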

roger

Feb 11, 2011, 2:58:03 PM

to mongodb-user

When the new primary is selected, I mean :-)

tsxn

Feb 11, 2011, 3:06:44 PM

to mongodb-user

I am receiving those exceptions long after the primary has been
selected. There isn't a network problem seeing as I can connect to
both the new primary and secondary with the mongo shell on the server
hosting the application.

Kyle Banker

Feb 11, 2011, 3:18:03 PM

to mongod...@googlegroups.com

The broken pipe exception could be almost anything. Are you having
network connectivity issues?

The driver definitely reconnects after failures. If you're resetting
your app, you do need to make sure you've specified seed nodes. If you
can provide a reproducible test case, that'd be very helpful.
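One way to keep the seed list in a single place is sketched below. It assumes the 1.2-era `Mongo::ReplSetConnection.new` signature, which takes `[host, port]` seed pairs followed by an optional options hash. `parse_seeds` and the `MONGO_SEEDS` variable are hypothetical names for this sketch.

```ruby
# Hypothetical helper: turn "host:port,host:port" (e.g. from an environment
# variable) into the [host, port] pairs used as replica set seeds.
# Note: this simple split does not handle IPv6 literals.
def parse_seeds(spec, default_port = 27017)
  spec.split(',').map(&:strip).reject(&:empty?).map do |entry|
    host, port = entry.split(':', 2)
    [host, port ? Integer(port) : default_port]
  end
end
```

With the driver loaded, something like `Mongo::ReplSetConnection.new(*parse_seeds(ENV['MONGO_SEEDS']), :read_secondary => true)` would seed every listed member. That way a restarted app does not depend on any single host being up.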

Tobias Schlottke

Feb 15, 2011, 12:37:04 PM

to mongodb-user

I'm having the same problem:
One of my servers (in a replica set) was restarted today.
Some nodes have not reconnected.
They're dying with the following backtrace:

[GEM_ROOT]/gems/mongo-1.2.0/lib/mongo/util/pool.rb:76:in `rescue in checkout_new_socket'
[GEM_ROOT]/gems/mongo-1.2.0/lib/mongo/util/pool.rb:72:in `checkout_new_socket'
[GEM_ROOT]/gems/mongo-1.2.0/lib/mongo/util/pool.rb:110:in `block (2 levels) in checkout':
[GEM_ROOT]/gems/mongo-1.2.0/lib/mongo/util/pool.rb:106:in `block in checkout'
[GEM_ROOT]/gems/mongo-1.2.0/lib/mongo/util/pool.rb:99:in `loop'
[GEM_ROOT]/gems/mongo-1.2.0/lib/mongo/util/pool.rb:99:in `checkout'
[GEM_ROOT]/gems/mongo-1.2.0/lib/mongo/repl_set_connection.rb:276:in `checkout_reader'
[GEM_ROOT]/gems/mongo-1.2.0/lib/mongo/connection.rb:414:in `receive_message'
[GEM_ROOT]/gems/mongo-1.2.0/lib/mongo/cursor.rb:382:in `send_initial_query'
[GEM_ROOT]/gems/mongo-1.2.0/lib/mongo/cursor.rb:348:in `refresh'
[GEM_ROOT]/gems/mongo-1.2.0/lib/mongo/cursor.rb:72:in `next_document'
[GEM_ROOT]/gems/mongo-1.2.0/lib/mongo/collection.rb:230:in `find_one'
[GEM_ROOT]/gems/mongoid-2.0.0.rc.6/lib/mongoid/collections/master.rb:15:in `block (2 levels) in <class:Master>'
[GEM_ROOT]/gems/mongoid-2.0.0.rc.6/lib/mongoid/collection.rb:70:in `find_one'

Any ideas?

Best,

Tobias

Kyle Banker

Feb 15, 2011, 12:48:16 PM

to mongod...@googlegroups.com

Can you provide more information?

What's the current status of the replica set?

How are you connecting to the replica set?

Chuck Remes

Feb 15, 2011, 1:38:18 PM

to mongod...@googlegroups.com

I added a line after:

https://github.com/mongodb/mongo-ruby-driver/blob/master/lib/mongo/util/pool.rb#L77

I put in:

socket.setsockopt(Socket::SOL_SOCKET, Socket::SO_REUSEADDR, true)

That seems to cure some issues with reconnection. I haven't submitted it as a patch because I can't seem to produce a failure case on a consistent basis. If you have code that can do so, that would be really helpful.

If this seems reasonable, then we probably want to add a similar line in other source files where a socket is allocated.

cr
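If you want to try Chuck's change in isolation, the sketch below sets `SO_REUSEADDR` on a plain Ruby socket and reads it back. It assumes Ruby 1.9.2+ for `Socket::Option#bool`, and it is not the driver's pool code. `SO_REUSEADDR` mainly governs whether `bind()` may reuse a local address, such as one still in `TIME_WAIT`. That may be why its effect on an outgoing client connection is hard to observe.

```ruby
require 'socket'

# Create a TCP socket and enable SO_REUSEADDR, mirroring the one-line change above.
def open_socket_with_reuseaddr
  sock = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM)
  sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_REUSEADDR, true)
  sock
end

# Read the option back; Socket::Option#bool decodes the integer flag.
def reuseaddr_enabled?(sock)
  sock.getsockopt(Socket::SOL_SOCKET, Socket::SO_REUSEADDR).bool
end
```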

Kyle Banker

Feb 15, 2011, 3:32:05 PM

to mongod...@googlegroups.com

Those of you seeing problems reconnecting: which version of MongoDB
are you running?

tsxn

Feb 15, 2011, 5:03:30 PM

to mongodb-user

I am running the 1.6.5 Linux 64-bit version. I have not had the
chance to attempt to reproduce the issue in a non-production
environment.

Kyle Banker

Feb 15, 2011, 5:12:48 PM

to mongod...@googlegroups.com

Just pushed 1.2.2, which contains a minor replica set failover fix
that may help you guys out.

All the tests I use to verify that replica set failover works live here:
https://github.com/mongodb/mongo-ruby-driver/tree/master/test/replica_sets

Feel free to run them in your environments with the following Rake task:
rake test:rs

@Chuck. Thanks for the note about Socket::SO_REUSEADDR. My first
thought is that this would be unnecessary given that all sockets are
closed on any connection failure.

Again, if anyone can find any reproducible scenarios, or simply
provide more details, for when the driver fails to reconnect, that'd
be much appreciated.

Tobias Schlottke

Feb 16, 2011, 1:28:37 AM

to mongodb-user

Yes, I am connecting to a four-node replica set using mongoid.
After restarting the process, everything was fine again.
All nodes are up and running.
The only problem is that the client did not reconnect after one node
went down for a while.
Any ideas? Any information you need?

Best,

Tobias

Kyle Banker

Feb 16, 2011, 9:29:02 AM

to mongod...@googlegroups.com

Okay. I believe I fixed that in the latest driver release. Can you test?

Reid

Feb 16, 2011, 10:02:00 AM

to mongodb-user

I have been looking into automated failover when the primary server in
a replica set fails. The server-side behavior is exceptional, with
failover usually occurring in well under 3 seconds.

I am driving the adoption of MongoDB within our organization and would
like to not have to make programmers wrap all the calls as follows:

# Wrapping a call to #count
rescue_connection_failure do
  @db.collection('users').count
end

After looking at the Ruby MongoDB driver code, it appears that we could
handle this transparently in the following methods:
- Connection#send_message
- Connection#send_message_with_safe_check
- Connection#receive_message

I put together a new Connection class that is derived from
ReplSetConnection to implement this behavior.

After some manual testing this technique appears to work very well
with no errors being reported to the application when a server fails
over in the replica set.
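Below is a rough sketch of the transparent approach. It is written as a mixin rather than Reid's actual class, which isn't shown. The mixin wraps the three methods listed above and retries on `Mongo::ConnectionFailure`, assuming the including class inherits those methods from a driver connection class. `AutoReconnect`, `@reconnect_attempts` and `@reconnect_delay` are hypothetical names, and the sketch requires Ruby 1.9+.

```ruby
# Stand-in so the sketch runs without the mongo gem (skipped if already defined).
module Mongo
  class ConnectionFailure < StandardError; end unless const_defined?(:ConnectionFailure, false)
end

# Hypothetical mixin: include it in a subclass of the connection class so the
# low-level I/O methods are retried without any per-call wrapper.
module AutoReconnect
  RETRIED_METHODS = [:send_message, :send_message_with_safe_check, :receive_message]

  RETRIED_METHODS.each do |name|
    define_method(name) do |*args, &blk|
      tries = 0
      begin
        super(*args, &blk) # the superclass's original implementation
      rescue Mongo::ConnectionFailure
        tries += 1
        raise if tries > reconnect_attempts # give up and surface the error
        sleep(reconnect_delay)              # give the set time to elect a primary
        retry
      end
    end
  end

  # Defaults to 0 retries, i.e. behave exactly like the unwrapped class.
  def reconnect_attempts
    @reconnect_attempts || 0
  end

  def reconnect_delay
    @reconnect_delay || 0.5
  end
end
```

With `@reconnect_attempts` left unset, the wrapper re-raises on the first failure, which matches the opt-in default Reid describes.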

If this is something we would want in the core product, let me know
and I can make the changes in a fork. Also, I'm not sure whether such a
capability would be better in Connection or only in ReplSetConnection.
Would users of Connection ever want to make use of an auto-reconnection
feature? (It would allow the master to be restarted without impacting
active clients.)

By default this auto-reconnect feature is disabled. To enable it, the end
user would supply an additional parameter to the Connection class.
This auto-reconnect behavior is based on what I saw implemented in the
HornetQ Client.

The current parameters for controlling auto-reconnect are:
- :reconnect_attempts => Number of times to attempt to reconnect. Default = 0 (no retry)
- :reconnect_retry_seconds => Initial delay before retrying
- :reconnect_retry_multiplier => Multiply the delay by this number with each retry, to avoid overwhelming the server
- :reconnect_max_retry_seconds => Maximum number of seconds to wait before retrying again
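Together these four options describe a capped exponential backoff. The sketch below computes the delay schedule they could produce. The option names come from the post; the formula and the default values are assumptions for illustration.

```ruby
# Hypothetical: seconds to wait before each reconnect attempt, derived from the
# four options above. Each delay is the previous one times the multiplier,
# capped at :reconnect_max_retry_seconds. The defaults here are assumptions.
def reconnect_delays(opts = {})
  attempts   = opts.fetch(:reconnect_attempts, 0) # 0 = no retry
  first      = opts.fetch(:reconnect_retry_seconds, 0.5)
  multiplier = opts.fetch(:reconnect_retry_multiplier, 2.0)
  max_delay  = opts.fetch(:reconnect_max_retry_seconds, 10.0)

  Array.new(attempts) { |i| [first * multiplier**i, max_delay].min }
end
```

For example, `reconnect_delays(:reconnect_attempts => 4, :reconnect_retry_seconds => 0.5, :reconnect_retry_multiplier => 2, :reconnect_max_retry_seconds => 3.0)` returns `[0.5, 1.0, 2.0, 3.0]`. The fourth delay would be 4 seconds but is capped at 3.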

Clearly it does not yet address every scenario, but it can be expanded
upon over time. Some I can think of:
- A failure occurs in the middle of an open cursor
- Fails to connect to a server on startup

Let me know if this is of any interest and I will submit a fork for
anyone to play with.

Kyle Banker

Feb 16, 2011, 12:27:33 PM

to mongod...@googlegroups.com

The reason we haven't built this into the driver is that there can be
a lot of uncertainty in a failover situation, and the right course of
action is always going to be application dependent.

For instance, if you're writing to the database without safe mode
enabled, and a failover occurs, then you have no idea how many recent
writes you've lost, and you may want to do more than simply retry the
previous write.

If you are running in safe mode, you still don't know for certain if
the previous write arrived so, again, retrying the write automatically
may not be best.

Thus, the question of whether to retry really depends on the operation
being performed and the needs of the application. That's why we leave
this up to the app developer. A global driver setting to automatically
retry everything suggests that this can be a good policy for a lot of
applications. But this is almost never the case, and that's why we
don't support it at the moment.
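The sketch below leaves that decision with the application. The caller states what kind of operation it is. Reads are retried, while writes are re-raised so the application can decide what to do, for instance by checking whether a document already exists before re-inserting it. `with_failover` and its `kind`/`attempts`/`delay` parameters are hypothetical, and the sketch assumes Ruby 2.0+ for keyword arguments.

```ruby
# Stand-in so the sketch runs without the mongo gem (skipped if already defined).
module Mongo
  class ConnectionFailure < StandardError; end unless const_defined?(:ConnectionFailure, false)
end

# Hypothetical per-operation policy. Reads are retried up to `attempts` tries;
# writes are re-raised at once, because after a failover the caller cannot
# know whether the previous attempt reached the old primary.
def with_failover(kind, attempts: 3, delay: 0.5)
  tries = 0
  begin
    yield
  rescue Mongo::ConnectionFailure
    tries += 1
    raise unless kind == :read && tries < attempts
    sleep(delay)
    retry
  end
end
```

Assuming `users` is a collection, `with_failover(:read) { users.find_one(:email => email) }` would ride out a short election, while `with_failover(:write) { users.insert(doc) }` hands the decision straight back to the caller.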

Definitely interested in hearing more thoughts on the issue, and
certainly feel free to open a JIRA for further discussion.

Kyle

Reid

Feb 16, 2011, 2:48:53 PM

to mongodb-user

Have a look at how HornetQ Client does HA and automatic failover at
the client end:
http://hornetq.sourceforge.net/docs/hornetq-2.0.0.GA/user-manual/en/html/ha.html

The HornetQ Client takes care of everything. Of course it is not
simple and does involve some overhead, since the client keeps track of
everything it is doing and is able to recreate its current state on a
failed-over server. It also has to deal with unacknowledged messages,
and that is where message-loss considerations are made at an
architectural level rather than on a per-developer, per-case basis.

We are happy with the HA choices above and understand that any
in-flight messages could be lost unless safe mode is enabled and each
write reaches at least 2 servers in our MongoDB replica set. I prefer
setting a global standard to relying on each developer to come up
with their own idea of HA and message loss.

For any cases where we want finer control of the behavior we will just
set :reconnect_attempts to 0 (which is the default) and handle the
complexity of failover in just those specialized scenarios.

Tobias Schlottke

Feb 17, 2011, 5:42:39 AM

to mongodb-user

Hi Kyle,

I'll test it as soon as mongoid pushes a new version with 1.2.2
support.
We had another problem regarding timeouts:
A server accidentally read-locked everything, and all nodes that tried
to read from it hung in the read(...) call for hours.
Is this fixed as well?
I mean: shouldn't there be some sort of default timeout?

Best,

Tobias

Kyle Banker

Feb 17, 2011, 10:00:09 AM

to mongod...@googlegroups.com

> We are happy with the HA choices above and understand that any
> inflight messages could be lost unless safe mode is enabled and it is
> written to at least 2 servers in our MongoDB replica set. I prefer
> setting a global standard than relying on each developer to come up
> with their own idea of HA and message loss.

If these are the ideal choices for your application, then you can
easily build a thin layer atop the driver to handle them. But everyone
has to make different choices here, so we're not convinced that
building a given failover architecture into the driver makes sense.
That said, you're welcome to post a ticket to
http://jira.mongodb.org/browse/RUBY where we can continue the
discussion and allow other users to vote and comment on the issue as
well.

Kyle Banker

Feb 17, 2011, 10:08:32 AM

to mongod...@googlegroups.com

Tobias,

There is no timeout at the moment, but I've just created a ticket for it:

http://jira.mongodb.org/browse/RUBY-236

This hasn't been implemented yet for two reasons. One is simply lack
of demand and the other is that socket timeouts aren't easy to
implement well in Ruby. In any case, I'll start looking into it and
hopefully get something added before the next release.

Kyle
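One common way to bound a blocking read in Ruby is to wait on `IO.select` with a timeout before reading. The sketch below shows that approach; it is not the driver's implementation. `read_with_timeout` and `SocketReadTimeout` are hypothetical names, and the code assumes Ruby 2.1+ for `Process.clock_gettime`.

```ruby
# Hypothetical error raised when the server does not answer in time.
class SocketReadTimeout < StandardError; end

# Read exactly `length` bytes from `io`, failing if they have not all
# arrived within `timeout` seconds (an overall deadline, not per chunk).
def read_with_timeout(io, length, timeout)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  buf = String.new # empty binary (ASCII-8BIT) buffer
  while buf.bytesize < length
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    # IO.select returns nil if io does not become readable within `remaining`
    if remaining <= 0 || IO.select([io], nil, nil, remaining).nil?
      raise SocketReadTimeout, "no reply within #{timeout}s"
    end
    # readpartial returns whatever bytes are available (EOFError if closed)
    buf << io.readpartial(length - buf.bytesize)
  end
  buf
end
```

On a timeout, the caller could close the socket and treat the error like any other connection failure.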

Lincoln

Mar 13, 2011, 1:20:30 PM

to mongodb-user

Reid,

I'd be very interested in seeing your connection class. I'm trying to
implement something similar to what you describe but am having some
problems getting the details right. Would you mind sharing your code
(or at least some snippets of the relevant portions)?

Thanks
-lincoln
