This patch changes the way that mongos chooses slaves among a replica set. It uses the ping statistics that the replica set monitor maintains to read only from the nearest slaves. This is especially useful in configurations where a cluster is spread across several datacenters.
As it is, this patch is probably incomplete. This significantly changes the default behavior, so a configuration option or a protocol parameter to enable/disable it is probably needed. The metric here checks if the ping of a server is within 20% of the ping of the fastest server, but that threshold could be made configurable and/or another metric could be used altogether.
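To make the metric concrete, here is a minimal sketch of a pairwise 20% comparison and of a selection loop shaped like the one in the patch. It is not the patch's actual `isNearerThan` implementation, which is not shown in this message. The names `NodeSketch`, `pingMillis`, `kPingTolerance` and `pickSlave` are hypothetical, and so is the assumption that the monitor exposes one ping value per node in milliseconds.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the monitor's per-node state; the real Node
// struct in dbclient_rs.h carries more fields than this.
struct NodeSketch {
    bool okForSecondaryQueries;
    int pingMillis;  // assumed: last measured round-trip time in ms
};

// Assumed tolerance: pings within 20% of each other count as equivalent.
const double kPingTolerance = 0.20;

// True only if `a` is clearly nearer than `b`; false if `b` is nearer or
// if the two are equivalent.
inline bool isNearerThan(const NodeSketch& a, const NodeSketch& b) {
    assert(a.pingMillis >= 0 && b.pingMillis >= 0);
    return a.pingMillis * (1.0 + kPingTolerance) < b.pingMillis;
}

// Scans every node once, starting just after `lastUsed`, and returns the
// index of the chosen slave, or -1 if no node is eligible.
inline int pickSlave(const std::vector<NodeSketch>& nodes, int master, int lastUsed) {
    int chosen = -1;
    const int n = static_cast<int>(nodes.size());
    for (int i = 0; i < n; ++i) {
        const int idx = (lastUsed + 1 + i) % n;
        if (idx == master || !nodes[idx].okForSecondaryQueries)
            continue;
        // Keep the first eligible node unless a later one is clearly nearer.
        if (chosen < 0 || isNearerThan(nodes[idx], nodes[chosen]))
            chosen = idx;
    }
    return chosen;
}
```

Because the scan starts just after the previously used node, equivalent nodes are still visited in round-robin order, while a clearly farther node is only used when nothing nearer is eligible.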
This addresses SERVER-3062. We're looking forward to seeing what direction the implementation of this feature can take.
         if( prev.host().size() ){
-            if( wasFound ){ LOG(1) << "slave '" << prev << ( wasMaster ? "' is master node, trying to find another node" :
-                                                                         "' is no longer ok to use" ) << endl; }
+            if( wasFound ){ LOG(1) << "slave '" << prev << ( wasTooFar ? "' is no longer among the nearest nodes, switching to a nearer slave" :
+                                                           ( wasMaster ? "' is master node, trying to find another node" :
+                                                                         "' is no longer ok to use" ) ) << endl; }
             else{ LOG(1) << "slave '" << prev << "' was not found in the replica set" << endl; }
         }
         else LOG(1) << "slave '" << prev << "' is not initialized or invalid" << endl;
@@ -280,14 +301,24 @@ namespace mongo {
 
         scoped_lock lk( _lock );
 
+        int slave = -1;
         for ( unsigned ii = 0; ii < _nodes.size(); ii++ ) {
-            _nextSlave = ( _nextSlave + 1 ) % _nodes.size();
-            if ( _nextSlave != _master ) {
-                if ( _nodes[ _nextSlave ].okForSecondaryQueries() )
-                    return _nodes[ _nextSlave ].addr;
-                LOG(2) << "dbclient_rs getSlave not selecting " << _nodes[_nextSlave] << ", not currently okForSecondaryQueries" << endl;
+            int node = ( _nextSlave + 1 + ii ) % _nodes.size();
+            if ( node != _master ) {
+                if ( _nodes[ node ].okForSecondaryQueries() )
+                {
+                    // TODO: make this behavior configurable
+                    if ( slave < 0 || _nodes[ node ].isNearerThan( _nodes[ slave ] ) )
+                        slave = node;
+                }
+                else
+                    LOG(2) << "dbclient_rs getSlave not selecting " << _nodes[node] << ", not currently okForSecondaryQueries" << endl;
             }
         }
+        if ( slave >= 0 ) {
+            _nextSlave = slave;
+            return _nodes[ _nextSlave ].addr;
+        }
         uassert(15899, str::stream() << "No suitable member found for slaveOk query in replica set: " << _name, _master >= 0 && _nodes[_master].ok);
 
         // Fall back to primary
diff --git a/src/mongo/client/dbclient_rs.h b/src/mongo/client/dbclient_rs.h
index e35ba96..160a4b0 100644
--- a/src/mongo/client/dbclient_rs.h
+++ b/src/mongo/client/dbclient_rs.h
@@ -167,6 +167,14 @@ namespace mongo {
                 return ok && secondary && ! hidden;
             }
 
+            /**
+             * This is used to establish a set of nearest nodes, using some metric; nodes to connect to
+             * can then be picked from this set. Two nodes may either be equivalent and both to be used,
+             * or one of them may be clearly nearer and should be exclusively preferred over the other.
+             * @return true if this node is nearer, false if other is nearer or if they are equivalent
+             */
+            bool isNearerThan( const Node &other ) const;
+
             BSONObj toBSON() const {
                 return BSON( "addr" << addr.toString() <<
                              "isMaster" << ismaster <<
Best regards,
-- 
Pierre Ynard
"Une âme dans un corps, c'est comme un dessin sur une feuille de papier."
There is already a spec that the other drivers are using, which is what the server will end up doing. The spec is largely based on what the Java driver currently does, which works very well for a lot of people.
On Tue, Apr 3, 2012 at 10:56 AM, Pierre Ynard <linkfa...@yahoo.fr> wrote:
> [...]
> --
> You received this message because you are subscribed to the Google Groups "mongodb-dev" group.
> To post to this group, send email to mongodb-dev@googlegroups.com.
> To unsubscribe from this group, send email to mongodb-dev+unsubscribe@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/mongodb-dev?hl=en.
> There is already a spec that the other drivers are using, which is
> what the server will end up doing.
> The spec is largely based on what the Java driver currently does,
> which works very well for a lot of people.
This patch doesn't yet tackle the issue of read tagging, but I might get to work on it later. The ping buckets mentioned in the spec are essentially the same approach as the proportional threshold used in the patch, only less flexible; the Java driver, however, actually uses a constant 15 ms threshold. I can change my patch to do the same, and even use an ugly environment variable to configure it, but the spec doesn't really address the question of how to configure that nearest-slave selection. How would one adjust the ping settings of a particular replica set monitor from the mongos shell, or even just display the current ping values between the servers and the mongos?
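For comparison with the proportional threshold, a constant-window selection could look roughly like the sketch below. It only illustrates the idea of "everything within 15 ms of the fastest eligible node" and is not taken from the Java driver or from the spec. `PingSample`, `nearestSet` and `windowMillis` are hypothetical names, and the window is left as a parameter because how to configure it is still an open question.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical per-node sample; not a type from the MongoDB sources.
struct PingSample {
    bool okForSecondaryQueries;
    int pingMillis;  // assumed: last measured round-trip time in ms
};

// Returns the indexes of all eligible nodes whose ping is at most
// `windowMillis` above the fastest eligible node's ping.
inline std::vector<std::size_t> nearestSet(const std::vector<PingSample>& nodes,
                                           int windowMillis = 15) {
    assert(windowMillis >= 0);

    // First pass: find the fastest eligible node.
    int fastest = INT_MAX;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        if (nodes[i].okForSecondaryQueries)
            fastest = std::min(fastest, nodes[i].pingMillis);
    }

    // Second pass: keep every eligible node inside the window.
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        if (nodes[i].okForSecondaryQueries &&
            nodes[i].pingMillis - fastest <= windowMillis)
            result.push_back(i);
    }
    return result;
}
```

Unlike the 20% ratio, a fixed window does not shrink when all pings are small, so two nodes at 1 ms and 3 ms stay equivalent instead of the faster one being exclusively preferred.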
Passing it through additional arguments to addShard doesn't sound like a very good idea. Perhaps it could be a setting stored in the config db, like the chunk size?
-- 
Pierre Ynard
"Une âme dans un corps, c'est comme un dessin sur une feuille de papier."
It seems to me that reading from a set of close secondaries is a good refinement of the slaveOk strategy. Large deployments of MongoDB usually span several datacenters, and you don't want to read from another datacenter (unless forced). Of course, it could be done with tags describing your setup, but this requires more configuration work.
Storing the threshold in the config db seems to be a good idea.
On Thu, Apr 19, 2012 at 8:40 AM, Grégoire Seux <kamaradclim...@gmail.com> wrote:
> [...]
On Thu, Apr 19, 2012 at 10:12:05AM -0700, Scott Hernandez wrote:
> This will actually be an option for each read not just a static or
> global variable since each application may have different
> requirements.
OK, is there already a draft for this kind of query specifier? $readPref, for instance?
I can think of:
    $readPref : {primary : 1}
    $readPref : {secondaries : 1}
    $readPref : {tags : [...,...]}
with optional refinements like:
    $readPref : {secondaries : 1, primaryOk : 1, maxPing : 5}
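As a rough illustration of how such per-read options might be evaluated against a replica set member, here is a small sketch. Nothing in it comes from an existing spec or driver: `ReadPrefSketch`, `MemberSketch` and `matches` are hypothetical names, and it assumes that `maxPing` applies to every member, that a negative value means "no limit", and that all listed tags must be present on the member.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory form of the proposed $readPref document.
struct ReadPrefSketch {
    bool secondaries;               // secondaries may serve the read
    bool primaryOk;                 // the primary may serve the read
    int maxPingMillis;              // assumed: negative means no limit
    std::vector<std::string> tags;  // assumed: all must match; empty = any
};

// Hypothetical view of one replica set member.
struct MemberSketch {
    bool isPrimary;
    int pingMillis;
    std::set<std::string> tags;
};

// True if the member is allowed to serve a read with this preference.
inline bool matches(const ReadPrefSketch& pref, const MemberSketch& m) {
    assert(m.pingMillis >= 0);
    if (m.isPrimary ? !pref.primaryOk : !pref.secondaries)
        return false;
    if (pref.maxPingMillis >= 0 && m.pingMillis > pref.maxPingMillis)
        return false;
    for (std::size_t i = 0; i < pref.tags.size(); ++i) {
        if (m.tags.count(pref.tags[i]) == 0)
            return false;
    }
    return true;
}
```

Under these assumptions, `{secondaries : 1, primaryOk : 1, maxPing : 5}` would map to `ReadPrefSketch{true, true, 5, {}}`, and `{primary : 1}` to `ReadPrefSketch{false, true, -1, {}}`.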
I am quite interested in having a way to specify which servers I prefer to read from, so I am eager to help.