SERVER-3062 Read from nearest slave
From: Pierre Ynard <linkfa...@yahoo.fr>
Date: Tue, 3 Apr 2012 16:56:31 +0200
Local: Tues, Apr 3 2012 10:56 am
Subject: [RFC] SERVER-3062 Read from nearest slave
Hello,

This patch changes the way that mongos chooses slaves among a replica
set. It uses the ping statistics that the replica set monitor maintains
to read only from the nearest slaves. This is especially useful in
configurations where a cluster is spread across several datacenters.

As it is, this patch is probably incomplete. This significantly changes
the default behavior, so a configuration option or a protocol parameter
to enable/disable it is probably needed. The metric here checks if the
ping of a server is within 20% of the ping of the fastest server, but
that threshold could be made configurable and/or another metric could be
used altogether.
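
As a minimal standalone sketch of the metric described above: the `NodeSketch` type and the `nearestSet` helper are hypothetical names used only for illustration and are not part of the patch. The comparison is the same rule as `isNearerThan` in the diff, rewritten without the division, and `nearestSet` collects the nodes for which no other node is clearly nearer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ReplicaSetMonitor::Node; only the ping matters here.
struct NodeSketch {
    int pingTimeMillis;
};

// True if `a` is clearly nearer than `b`, i.e. b's ping exceeds a's ping by
// more than 20% of a's ping. For non-negative pings this is the same test as
// a / (b - a) < 5 in integer arithmetic, written without the division.
bool isNearerThan(const NodeSketch& a, const NodeSketch& b) {
    assert(a.pingTimeMillis >= 0 && b.pingTimeMillis >= 0);
    if (a.pingTimeMillis >= b.pingTimeMillis)
        return false;
    return a.pingTimeMillis < 5 * (b.pingTimeMillis - a.pingTimeMillis);
}

// Indices of the nodes that no other node is clearly nearer than.
std::vector<std::size_t> nearestSet(const std::vector<NodeSketch>& nodes) {
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        bool tooFar = false;
        for (std::size_t j = 0; j < nodes.size(); ++j) {
            if (j != i && isNearerThan(nodes[j], nodes[i])) {
                tooFar = true;
                break;
            }
        }
        if (!tooFar)
            result.push_back(i);
    }
    return result;
}
```

With pings of 10, 11, 12 and 50 ms, the first three nodes are treated as equivalent and the 50 ms node is excluded.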

This addresses SERVER-3062. We're looking forward to seeing what
direction the implementation of this feature can take.

diff --git a/src/mongo/client/dbclient_rs.cpp b/src/mongo/client/dbclient_rs.cpp
index d4dfee2..8b11983 100644
--- a/src/mongo/client/dbclient_rs.cpp
+++ b/src/mongo/client/dbclient_rs.cpp
@@ -226,6 +226,14 @@ namespace mongo {

+    bool ReplicaSetMonitor::Node::isNearerThan( const Node &other ) const {
+        if ( pingTimeMillis >= other.pingTimeMillis )
+            return false;
+
+        // TODO: make threshold configurable
+        return ( pingTimeMillis / ( other.pingTimeMillis - pingTimeMillis ) < 5 );
+    }
+
     HostAndPort ReplicaSetMonitor::getMaster() {
         {
             scoped_lock lk( _lock );
@@ -244,6 +252,7 @@ namespace mongo {
         // make sure its valid

         bool wasFound = false;
+        bool wasTooFar = false;
         bool wasMaster = false;

         // This is always true, since checked in port()
@@ -257,7 +266,18 @@ namespace mongo {
                 wasFound = true;

                 if ( _nodes[i].okForSecondaryQueries() )
-                    return prev;
+                {
+                    // TODO: make this behavior configurable
+                    for ( unsigned ii=0; ii<_nodes.size(); ii++ ) {
+                        if ( ii != i && _nodes[ii].okForSecondaryQueries() && _nodes[ii].isNearerThan( _nodes[i] ) )
+                        {
+                            wasTooFar = true;
+                            break;
+                        }
+                    }
+                    if ( ! wasTooFar )
+                        return prev;
+                }

                 wasMaster = _nodes[i].ok && ! _nodes[i].secondary;

@@ -266,8 +286,9 @@ namespace mongo {
         }

         if( prev.host().size() ){
-            if( wasFound ){ LOG(1) << "slave '" << prev << ( wasMaster ? "' is master node, trying to find another node" :
-                                                                         "' is no longer ok to use" ) << endl; }
+            if( wasFound ){ LOG(1) << "slave '" << prev << ( wasTooFar ? "' is no longer among the nearest nodes, switching to a nearer slave" :
+                                                           ( wasMaster ? "' is master node, trying to find another node" :
+                                                                         "' is no longer ok to use" ) ) << endl; }
             else{ LOG(1) << "slave '" << prev << "' was not found in the replica set" << endl; }
         }
         else LOG(1) << "slave '" << prev << "' is not initialized or invalid" << endl;
@@ -280,14 +301,24 @@ namespace mongo {

         scoped_lock lk( _lock );

+        int slave = -1;
         for ( unsigned ii = 0; ii < _nodes.size(); ii++ ) {
-            _nextSlave = ( _nextSlave + 1 ) % _nodes.size();
-            if ( _nextSlave != _master ) {
-                if ( _nodes[ _nextSlave ].okForSecondaryQueries() )
-                    return _nodes[ _nextSlave ].addr;
-                LOG(2) << "dbclient_rs getSlave not selecting " << _nodes[_nextSlave] << ", not currently okForSecondaryQueries" << endl;
+            int node = ( _nextSlave + 1 + ii ) % _nodes.size();
+            if ( node != _master ) {
+                if ( _nodes[ node ].okForSecondaryQueries() )
+                {
+                    // TODO: make this behavior configurable
+                    if ( slave < 0 || _nodes[ node ].isNearerThan( _nodes[ slave ] ) )
+                        slave = node;
+                }
+                else
+                    LOG(2) << "dbclient_rs getSlave not selecting " << _nodes[node] << ", not currently okForSecondaryQueries" << endl;
             }
         }
+        if ( slave >= 0 ) {
+            _nextSlave = slave;
+            return _nodes[ _nextSlave ].addr;
+        }
         uassert(15899, str::stream() << "No suitable member found for slaveOk query in replica set: " << _name, _master >= 0 && _nodes[_master].ok);

         // Fall back to primary
diff --git a/src/mongo/client/dbclient_rs.h b/src/mongo/client/dbclient_rs.h
index e35ba96..160a4b0 100644
--- a/src/mongo/client/dbclient_rs.h
+++ b/src/mongo/client/dbclient_rs.h
@@ -167,6 +167,14 @@ namespace mongo {
                 return ok && secondary && ! hidden;
             }

+            /**
+             * This is used to establish a set of nearest nodes, using some metric; nodes to connect to
+             * can then be picked from this set. Two nodes may either be equivalent and both to be used,
+             * or one of them may be clearly nearer and should be exclusively preferred over the other.
+             * @return true if this node is nearer, false if other is nearer or if they are equivalent
+             */
+            bool isNearerThan( const Node &other ) const;
+
             BSONObj toBSON() const {
                 return BSON( "addr" << addr.toString() <<
                              "isMaster" << ismaster <<

Best regards,

--
Pierre Ynard
"Une âme dans un corps, c'est comme un dessin sur une feuille de papier."


 
From: Eliot Horowitz <el...@10gen.com>
Date: Fri, 6 Apr 2012 10:17:16 -0400
Local: Fri, Apr 6 2012 10:17 am
Subject: Re: [mongodb-dev] [RFC] SERVER-3062 Read from nearest slave
There is already a spec that the other drivers are using, which is
what the server will end up doing.
The spec is based heavily on what the Java driver currently does,
which works very well for a lot of people.


 
From: Pierre Ynard <linkfa...@yahoo.fr>
Date: Wed, 11 Apr 2012 18:09:26 +0200
Local: Wed, Apr 11 2012 12:09 pm
Subject: Re: [RFC] SERVER-3062 Read from nearest slave

> There is already a spec that the other drivers are using, which is
> what the server will end up doing.
> The spec is based heavily on what the Java driver currently does,
> which works very well for a lot of people.

I assume you are referring to read preferences, as described in
https://jira.mongodb.org/browse/JAVA-428 ?

This patch doesn't yet tackle the issue of read tagging, but I might
get to work on it later. The ping buckets mentioned there are essentially
the same approach as the proportional threshold used in the patch, only
less flexible; although the Java driver actually uses a constant 15 ms
threshold. I can change my patch to do the same, and even use an ugly
environment variable to configure it, but the spec doesn't really
address the question of how to configure that nearest-slave selection.
How would one adjust the ping settings of a particular replica set
monitor from the mongos shell, or even just display the current ping
values between the servers and the mongos?

Passing it through additional arguments to addShard doesn't sound like a
very good idea. Perhaps it could be a setting stored in the config db,
like the chunk size?
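
For comparison, here is a minimal sketch of a constant-threshold selection like the 15 ms rule mentioned above. The function name `withinConstantThreshold`, the inclusive comparison, and passing the threshold as a parameter are assumptions made for illustration; this is not the Java driver's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the indices of the nodes whose ping is at most thresholdMillis
// above the fastest ping. The default of 15 ms follows the value quoted in
// this thread; treating the bound as inclusive is an assumption.
std::vector<std::size_t> withinConstantThreshold(const std::vector<int>& pingsMillis,
                                                 int thresholdMillis = 15) {
    assert(thresholdMillis >= 0);
    std::vector<std::size_t> result;
    if (pingsMillis.empty())
        return result;
    const int fastest = *std::min_element(pingsMillis.begin(), pingsMillis.end());
    for (std::size_t i = 0; i < pingsMillis.size(); ++i) {
        if (pingsMillis[i] - fastest <= thresholdMillis)
            result.push_back(i);
    }
    return result;
}
```

With pings of 2, 10, 17 and 18 ms, this rule keeps the first three nodes, whereas the 20% proportional rule from the patch would keep only the 2 ms node. The two approaches differ most when all pings are small.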

--
Pierre Ynard
"Une âme dans un corps, c'est comme un dessin sur une feuille de papier."


 
From: Grégoire Seux <kamaradclim...@gmail.com>
Date: Thu, 19 Apr 2012 08:40:54 -0700 (PDT)
Local: Thurs, Apr 19 2012 11:40 am
Subject: Re: [RFC] SERVER-3062 Read from nearest slave

It seems to me that reading from a set of close secondaries is a good
refinement of the slaveOk strategy. Large deployments of MongoDB usually
span several datacenters, and you don't want to read from another
datacenter (unless forced).
Of course, it could be done with tags describing your setup, but this
requires more configuration work.

Storing the threshold in the config db seems to be a good idea.

--
Gregoire


 
From: Scott Hernandez <scotthernan...@gmail.com>
Date: Thu, 19 Apr 2012 10:12:05 -0700
Local: Thurs, Apr 19 2012 1:12 pm
Subject: Re: [mongodb-dev] Re: [RFC] SERVER-3062 Read from nearest slave
This will actually be an option for each read, not just a static or
global variable, since each application may have different
requirements.


 
From: Gregoire Seux <kamaradclim...@gmail.com>
Date: Thu, 19 Apr 2012 22:17:07 +0200
Local: Thurs, Apr 19 2012 4:17 pm
Subject: Re: [mongodb-dev] Re: [RFC] SERVER-3062 Read from nearest slave

On Thu, Apr 19, 2012 at 10:12:05AM -0700, Scott Hernandez wrote:
> This will actually be an option for each read, not just a static or
> global variable, since each application may have different
> requirements.

OK, is there already a draft for this kind of query specifier?
$readPref, for instance?

I can think of:
$readPref : {primary : 1}
$readPref : {secondaries : 1}
$readPref : {tags : [...,...]}

with optional refinements like:
$readPref : {secondaries : 1, primaryOk : 1, maxPing : 5}

I am quite interested in having a way to specify which servers I
prefer to read from, so I am eager to help.
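
To make the proposal concrete, here is a rough sketch of how such a per-read preference could filter replica set members. Everything in it is hypothetical: the `ReadPrefSketch` and `MemberSketch` types, the reading of maxPing as milliseconds, and the choice not to apply maxPing to the primary fallback are assumptions for illustration, not an existing API. Tags are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class ReadMode { Primary, Secondaries };

// Hypothetical per-read preference mirroring the fields proposed above.
struct ReadPrefSketch {
    ReadMode mode;
    bool primaryOk;     // fall back to the primary if no secondary qualifies
    int maxPingMillis;  // assumed to be milliseconds; negative means no limit
};

// Hypothetical view of a replica set member.
struct MemberSketch {
    bool isPrimary;
    bool healthy;
    int pingTimeMillis;
};

// Returns the indices of the members a read with this preference may use.
std::vector<std::size_t> eligibleMembers(const std::vector<MemberSketch>& members,
                                         const ReadPrefSketch& pref) {
    std::vector<std::size_t> primaries, secondaries;
    for (std::size_t i = 0; i < members.size(); ++i) {
        const MemberSketch& m = members[i];
        if (!m.healthy)
            continue;
        if (m.isPrimary)
            primaries.push_back(i);
        else if (pref.maxPingMillis < 0 || m.pingTimeMillis <= pref.maxPingMillis)
            secondaries.push_back(i);
    }
    assert(primaries.size() <= 1);  // a replica set has at most one primary
    if (pref.mode == ReadMode::Primary)
        return primaries;
    if (secondaries.empty() && pref.primaryOk)
        return primaries;
    return secondaries;
}
```

Because the preference is passed with each read, two applications sharing the same mongos could use different values without any global setting.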

--
Greg


 