This patch changes how mongos chooses slaves within a replica set. It
uses the ping statistics that the replica set monitor already maintains
so that reads go only to the nearest slaves. This is especially useful
in configurations where a cluster is spread across several datacenters.
As it stands, this patch is probably incomplete. It significantly
changes the default behavior, so a configuration option or a protocol
parameter to enable or disable it is probably needed. The metric used
here treats a server as near enough if its ping time is within 20% of
the ping time of the fastest server; that threshold could be made
configurable, or a different metric could be used altogether.
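To illustrate the 20% rule outside of the patch, here is a minimal
sketch of the same comparison written with a multiplication instead of
the integer division. The free function and the thresholdPercent
parameter are assumptions made for illustration only; the patch itself
hard-codes the threshold inside ReplicaSetMonitor::Node::isNearerThan.

```cpp
#include <cassert>

// Hypothetical standalone version of the comparison in the patch.
// thresholdPercent is an assumed parameter; the patch hard-codes 20%.
bool isNearerThan(int pingMillis, int otherPingMillis, int thresholdPercent = 20) {
    assert(pingMillis >= 0 && otherPingMillis >= 0 && thresholdPercent >= 0);

    // A node that is not strictly faster is never "nearer".
    if (pingMillis >= otherPingMillis)
        return false;

    // "Clearly nearer" means the other node is slower by more than
    // thresholdPercent of this node's ping time.
    return 100LL * (otherPingMillis - pingMillis) >
           static_cast<long long>(thresholdPercent) * pingMillis;
}
```

With the default threshold, a node at 10 ms is nearer than one at 13 ms,
but is treated as equivalent to one at 12 ms, since 12 ms is exactly 20%
above 10 ms and the comparison is strict.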
This addresses SERVER-3062. We're looking forward to seeing what
direction the implementation of this feature can take.
diff --git a/src/mongo/client/dbclient_rs.cpp b/src/mongo/client/dbclient_rs.cpp
index d4dfee2..8b11983 100644
--- a/src/mongo/client/dbclient_rs.cpp
+++ b/src/mongo/client/dbclient_rs.cpp
@@ -226,6 +226,14 @@ namespace mongo {
+ bool ReplicaSetMonitor::Node::isNearerThan( const Node &other ) const {
+ if ( pingTimeMillis >= other.pingTimeMillis )
+ return false;
+
+ // TODO: make threshold configurable
+ return ( pingTimeMillis / ( other.pingTimeMillis - pingTimeMillis ) < 5 );
+ }
+
HostAndPort ReplicaSetMonitor::getMaster() {
{
scoped_lock lk( _lock );
@@ -244,6 +252,7 @@ namespace mongo {
// make sure its valid
bool wasFound = false;
+ bool wasTooFar = false;
bool wasMaster = false;
// This is always true, since checked in port()
@@ -257,7 +266,18 @@ namespace mongo {
wasFound = true;
if ( _nodes[i].okForSecondaryQueries() )
- return prev;
+ {
+ // TODO: make this behavior configurable
+ for ( unsigned ii=0; ii<_nodes.size(); ii++ ) {
+ if ( ii != i && _nodes[ii].okForSecondaryQueries() && _nodes[ii].isNearerThan( _nodes[i] ) )
+ {
+ wasTooFar = true;
+ break;
+ }
+ }
+ if ( ! wasTooFar )
+ return prev;
+ }
wasMaster = _nodes[i].ok && ! _nodes[i].secondary;
@@ -266,8 +286,9 @@ namespace mongo {
}
if( prev.host().size() ){
- if( wasFound ){ LOG(1) << "slave '" << prev << ( wasMaster ? "' is master node, trying to find another node" :
- "' is no longer ok to use" ) << endl; }
+ if( wasFound ){ LOG(1) << "slave '" << prev << ( wasTooFar ? "' is no longer among the nearest nodes, switching to a nearer slave" :
+ ( wasMaster ? "' is master node, trying to find another node" :
+ "' is no longer ok to use" ) ) << endl; }
else{ LOG(1) << "slave '" << prev << "' was not found in the replica set" << endl; }
}
else LOG(1) << "slave '" << prev << "' is not initialized or invalid" << endl;
@@ -280,14 +301,24 @@ namespace mongo {
scoped_lock lk( _lock );
+ int slave = -1;
for ( unsigned ii = 0; ii < _nodes.size(); ii++ ) {
- _nextSlave = ( _nextSlave + 1 ) % _nodes.size();
- if ( _nextSlave != _master ) {
- if ( _nodes[ _nextSlave ].okForSecondaryQueries() )
- return _nodes[ _nextSlave ].addr;
- LOG(2) << "dbclient_rs getSlave not selecting " << _nodes[_nextSlave] << ", not currently okForSecondaryQueries" << endl;
+ int node = ( _nextSlave + 1 + ii ) % _nodes.size();
+ if ( node != _master ) {
+ if ( _nodes[ node ].okForSecondaryQueries() )
+ {
+ // TODO: make this behavior configurable
+ if ( slave < 0 || _nodes[ node ].isNearerThan( _nodes[ slave ] ) )
+ slave = node;
+ }
+ else
+ LOG(2) << "dbclient_rs getSlave not selecting " << _nodes[node] << ", not currently okForSecondaryQueries" << endl;
}
}
+ if ( slave >= 0 ) {
+ _nextSlave = slave;
+ return _nodes[ _nextSlave ].addr;
+ }
uassert(15899, str::stream() << "No suitable member found for slaveOk query in replica set: " << _name, _master >= 0 && _nodes[_master].ok);
// Fall back to primary
diff --git a/src/mongo/client/dbclient_rs.h b/src/mongo/client/dbclient_rs.h
index e35ba96..160a4b0 100644
--- a/src/mongo/client/dbclient_rs.h
+++ b/src/mongo/client/dbclient_rs.h
@@ -167,6 +167,14 @@ namespace mongo {
return ok && secondary && ! hidden;
}
+ /**
+ * This is used to establish a set of nearest nodes, using some metric; nodes to connect to
+ * can then be picked from this set. Two nodes may either be equivalent and both to be used,
+ * or one of them may be clearly nearer and should be exclusively preferred over the other.
+ * @return true if this node is nearer, false if other is nearer or if they are equivalent
+ */
+ bool isNearerThan( const Node &other ) const;
+
BSONObj toBSON() const {
return BSON( "addr" << addr.toString() <<
"isMaster" << ismaster <<
Best regards,
--
Pierre Ynard
"Une âme dans un corps, c'est comme un dessin sur une feuille de papier."
I assume you are referring to read preferences, as described in
https://jira.mongodb.org/browse/JAVA-428?
This patch doesn't yet tackle read tagging, but I might get to work on
it later. The ping buckets mentioned there are essentially the same
approach as the proportional threshold used in the patch, only less
flexible; the Java driver, though, actually uses a constant 15 ms
threshold. I can change my patch to do the same, and even use an ugly
environment variable to configure it, but... well, the spec doesn't
really address the question of how to configure that nearest-slave
selection. How would one adjust the ping settings of a particular
replica set monitor from the mongos shell, or even just display the
current ping values between the servers and the mongos?
Passing it through additional arguments to addShard doesn't sound like a
very good idea. Perhaps it could be a setting stored in the config db,
like the chunk size?
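To make the difference between the two thresholds concrete, here is a
small sketch that computes the set of "nearest" nodes under each rule.
Both function names are hypothetical, the inclusive comparison for the
fixed window is an assumption (I have not checked how the Java driver
treats the boundary), and neither function is part of the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Indices of nodes whose ping is at most windowMillis above the fastest
// ping (the constant-threshold approach, e.g. windowMillis = 15).
std::vector<std::size_t> nearestByFixedWindow(const std::vector<int>& pings,
                                              int windowMillis) {
    assert(windowMillis >= 0);
    std::vector<std::size_t> out;
    if (pings.empty())
        return out;
    const int best = *std::min_element(pings.begin(), pings.end());
    for (std::size_t i = 0; i < pings.size(); ++i) {
        if (pings[i] - best <= windowMillis)
            out.push_back(i);
    }
    return out;
}

// Indices of nodes whose ping is at most percent% above the fastest
// ping (the proportional approach, e.g. percent = 20).
std::vector<std::size_t> nearestByRatio(const std::vector<int>& pings,
                                        int percent) {
    assert(percent >= 0);
    std::vector<std::size_t> out;
    if (pings.empty())
        return out;
    const int best = *std::min_element(pings.begin(), pings.end());
    for (std::size_t i = 0; i < pings.size(); ++i) {
        if (100LL * (pings[i] - best) <= static_cast<long long>(percent) * best)
            out.push_back(i);
    }
    return out;
}
```

With pings of 2, 10 and 40 ms, the 15 ms window keeps the first two
nodes while the 20% ratio keeps only the first. With pings of 100, 118
and 200 ms it is the other way around: the window keeps only the first
node and the ratio keeps the first two. So the proportional rule is
stricter on a fast local network and more tolerant across datacenters.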
--
Pierre Ynard
"Une âme dans un corps, c'est comme un dessin sur une feuille de papier."
OK, is there already a draft for this kind of query specifier?
$readPref, for instance?
I can think of:
$readPref : {primary : 1}
$readPref : {secondaries : 1}
$readPref : {tags : [...,...]}
with optional refinements like:
$readPref : {secondaries : 1, primaryOk : 1, maxPing : 5}
I am quite interested in having a way to specify which servers I
prefer to read from, so I am eager to help.
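As a rough illustration of how such a specifier might be applied on the
client side, here is a C++ sketch that filters replica set members by
the options above. Everything in it is hypothetical: the Member and
ReadPref structs, the candidates function, and the assumption that
maxPing is expressed in milliseconds. No such API exists yet, and tags
are left out.

```cpp
#include <string>
#include <vector>

// Hypothetical view of a replica set member as seen by the client.
struct Member {
    std::string host;
    bool primary;
    bool ok;
    int pingMillis;
};

// Hypothetical in-memory form of a $readPref document.
struct ReadPref {
    bool secondaries = true;   // allow reads from secondaries
    bool primaryOk = false;    // allow reads from the primary
    int maxPingMillis = -1;    // negative means no limit (assumed unit: ms)
};

// Hosts that satisfy the preference, in their original order.
std::vector<std::string> candidates(const std::vector<Member>& members,
                                    const ReadPref& pref) {
    std::vector<std::string> out;
    for (const Member& m : members) {
        if (!m.ok)
            continue;  // skip members that are down
        if (pref.maxPingMillis >= 0 && m.pingMillis > pref.maxPingMillis)
            continue;  // skip members that are too far away
        if (m.primary ? pref.primaryOk : pref.secondaries)
            out.push_back(m.host);
    }
    return out;
}
```

A driver would still have to pick one host from the returned list, for
instance round-robin or by lowest ping.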
--
Greg