[RFC] SERVER-3062 Read from nearest slave


Pierre Ynard

Apr 3, 2012, 10:56:31 AM
to mongo...@googlegroups.com
Hello,

This patch changes the way that mongos chooses slaves among a replica
set. It uses the ping statistics that the replica set monitor maintains
to read only from the nearest slaves. This is especially useful in
configurations where a cluster is spread across several datacenters.
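A simplified, standalone sketch of the selection loop that the patch adds to getSlave(), for trying the rule outside mongos. `Node` is a hypothetical stand-in for `ReplicaSetMonitor::Node` reduced to a ping time and a usability flag, and `pickNearestSlave` is an illustrative name, not a mongos function:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ReplicaSetMonitor::Node with only the fields
// this sketch needs.
struct Node {
    int pingTimeMillis;
    bool okForSecondaryQueries;
};

// Same 20% rule as the patch's isNearerThan(): `a` is nearer only if b's
// ping exceeds a's by more than a fifth of a's ping.
bool isNearerThan(const Node& a, const Node& b) {
    if (a.pingTimeMillis >= b.pingTimeMillis)
        return false;
    return a.pingTimeMillis / (b.pingTimeMillis - a.pingTimeMillis) < 5;
}

// Scans the nodes in round-robin order starting after `lastSlave`.
// Returns -1 when no secondary is usable (the caller falls back to the primary).
int pickNearestSlave(const std::vector<Node>& nodes, int master, int lastSlave) {
    assert(lastSlave >= -1);
    int slave = -1;
    const std::size_t n = nodes.size();
    for (std::size_t ii = 0; ii < n; ii++) {
        const int node = static_cast<int>((static_cast<std::size_t>(lastSlave + 1) + ii) % n);
        if (node == master || !nodes[node].okForSecondaryQueries)
            continue;
        // Keep the first usable secondary in rotation order and switch only
        // to one that is clearly nearer, so equivalent nodes still rotate.
        if (slave < 0 || isNearerThan(nodes[node], nodes[slave]))
            slave = node;
    }
    return slave;
}
```

With secondaries at 40, 10 and 11 ms, successive calls alternate between the 10 ms and 11 ms nodes and never return the 40 ms one, because the first usable node in rotation order is replaced only by one that is clearly nearer.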

As it is, this patch is probably incomplete. This significantly changes
the default behavior, so a configuration option or a protocol parameter
to enable or disable it is probably needed. The metric used here treats a
server as equivalent to the fastest one when its ping is within 20% of the
fastest server's ping; that threshold could be made configurable, or a
different metric could be used altogether.
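The patch encodes the 20% rule as `pingTimeMillis / (other.pingTimeMillis - pingTimeMillis) < 5`. Because the pings are non-negative integers and floor(x) < 5 exactly when x < 5, that test holds exactly when `other > 1.2 * ping`, so integer division does not distort it. A minimal sketch of a configurable version, assuming a hypothetical `thresholdPercent` parameter (not an existing mongos option), and a check that both forms agree at 20%:

```cpp
#include <cassert>

// The patch's test in isolation: `ping` is nearer only if `otherPing`
// is more than 20% higher.
bool patchIsNearer(int ping, int otherPing) {
    if (ping >= otherPing)
        return false;  // also guards the division below against zero
    return ping / (otherPing - ping) < 5;
}

// Hypothetical configurable variant: nearer iff
// otherPing > ping * (1 + thresholdPercent / 100), computed in integers.
bool isNearerWithThreshold(int ping, int otherPing, int thresholdPercent) {
    assert(ping >= 0 && otherPing >= 0 && thresholdPercent >= 0);
    return 100LL * otherPing > static_cast<long long>(ping) * (100 + thresholdPercent);
}
```

Multiplying rather than dividing removes the need for the early return that guards against dividing by zero.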

This addresses SERVER-3062. We're looking forward to seeing what
direction the implementation of this feature can take.


diff --git a/src/mongo/client/dbclient_rs.cpp b/src/mongo/client/dbclient_rs.cpp
index d4dfee2..8b11983 100644
--- a/src/mongo/client/dbclient_rs.cpp
+++ b/src/mongo/client/dbclient_rs.cpp
@@ -226,6 +226,14 @@ namespace mongo {



+ bool ReplicaSetMonitor::Node::isNearerThan( const Node &other ) const {
+ if ( pingTimeMillis >= other.pingTimeMillis )
+ return false;
+
+ // Nearer only if other's ping exceeds ours by more than 20%. TODO: make threshold configurable
+ return ( pingTimeMillis / ( other.pingTimeMillis - pingTimeMillis ) < 5 );
+ }
+
HostAndPort ReplicaSetMonitor::getMaster() {
{
scoped_lock lk( _lock );
@@ -244,6 +252,7 @@ namespace mongo {
// make sure its valid

bool wasFound = false;
+ bool wasTooFar = false;
bool wasMaster = false;

// This is always true, since checked in port()
@@ -257,7 +266,18 @@ namespace mongo {
wasFound = true;

if ( _nodes[i].okForSecondaryQueries() )
- return prev;
+ {
+ // TODO: make this behavior configurable
+ for ( unsigned ii=0; ii<_nodes.size(); ii++ ) {
+ if ( ii != i && _nodes[ii].okForSecondaryQueries() && _nodes[ii].isNearerThan( _nodes[i] ) )
+ {
+ wasTooFar = true;
+ break;
+ }
+ }
+ if ( ! wasTooFar )
+ return prev;
+ }

wasMaster = _nodes[i].ok && ! _nodes[i].secondary;

@@ -266,8 +286,9 @@ namespace mongo {
}

if( prev.host().size() ){
- if( wasFound ){ LOG(1) << "slave '" << prev << ( wasMaster ? "' is master node, trying to find another node" :
- "' is no longer ok to use" ) << endl; }
+ if( wasFound ){ LOG(1) << "slave '" << prev << ( wasTooFar ? "' is no longer among the nearest nodes, switching to a nearer slave" :
+ ( wasMaster ? "' is master node, trying to find another node" :
+ "' is no longer ok to use" ) ) << endl; }
else{ LOG(1) << "slave '" << prev << "' was not found in the replica set" << endl; }
}
else LOG(1) << "slave '" << prev << "' is not initialized or invalid" << endl;
@@ -280,14 +301,24 @@ namespace mongo {

scoped_lock lk( _lock );

+ int slave = -1;
for ( unsigned ii = 0; ii < _nodes.size(); ii++ ) {
- _nextSlave = ( _nextSlave + 1 ) % _nodes.size();
- if ( _nextSlave != _master ) {
- if ( _nodes[ _nextSlave ].okForSecondaryQueries() )
- return _nodes[ _nextSlave ].addr;
- LOG(2) << "dbclient_rs getSlave not selecting " << _nodes[_nextSlave] << ", not currently okForSecondaryQueries" << endl;
+ int node = ( _nextSlave + 1 + ii ) % _nodes.size();
+ if ( node != _master ) {
+ if ( _nodes[ node ].okForSecondaryQueries() )
+ {
+ // TODO: make this behavior configurable
+ if ( slave < 0 || _nodes[ node ].isNearerThan( _nodes[ slave ] ) )
+ slave = node;
+ }
+ else
+ LOG(2) << "dbclient_rs getSlave not selecting " << _nodes[node] << ", not currently okForSecondaryQueries" << endl;
}
}
+ if ( slave >= 0 ) {
+ _nextSlave = slave;
+ return _nodes[ _nextSlave ].addr;
+ }
uassert(15899, str::stream() << "No suitable member found for slaveOk query in replica set: " << _name, _master >= 0 && _nodes[_master].ok);

// Fall back to primary
diff --git a/src/mongo/client/dbclient_rs.h b/src/mongo/client/dbclient_rs.h
index e35ba96..160a4b0 100644
--- a/src/mongo/client/dbclient_rs.h
+++ b/src/mongo/client/dbclient_rs.h
@@ -167,6 +167,14 @@ namespace mongo {
return ok && secondary && ! hidden;
}

+ /**
+ * This is used to establish a set of nearest nodes, using some metric; nodes to connect to
+ * can then be picked from this set. Two nodes may either be equivalent, in which case both
+ * may be used, or one of them may be clearly nearer and then be exclusively preferred.
+ * @return true if this node is nearer, false if other is nearer or if they are equivalent
+ */
+ bool isNearerThan( const Node &other ) const;
+
BSONObj toBSON() const {
return BSON( "addr" << addr.toString() <<
"isMaster" << ismaster <<


Best regards,

--
Pierre Ynard
"A soul in a body is like a drawing on a sheet of paper."

Eliot Horowitz

Apr 6, 2012, 10:17:16 AM
to mongo...@googlegroups.com
There is already a spec that the other drivers are using, which is
what the server will end up doing.
The spec is based heavily on what the Java driver currently does,
which works very well for a lot of people.


Pierre Ynard

Apr 11, 2012, 12:09:26 PM
to mongo...@googlegroups.com
> There is already a spec that the other drivers are using, which is
> what the server will end up doing.
> The spec is based heavily on what the Java driver currently does,
> which works very well for a lot of people.

I assume you are referring to read preferences, as described in
https://jira.mongodb.org/browse/JAVA-428 ?

This patch does not yet tackle read tagging, but I might work on it
later. The ping buckets mentioned there are essentially the same
approach as the proportional threshold used in my patch, only less
flexible, although the Java driver actually uses a constant 15 ms
threshold. I can change my patch to do the same, and even use an ugly
environment variable to configure it, but the spec doesn't really
address how to configure nearest-slave selection. How would one tune
the ping settings of a particular replica set monitor from the mongos
shell, or even just display the current ping times between the servers
and the mongos?
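For comparison, a fixed window like the 15 ms one mentioned above selects every usable member whose ping is within that many milliseconds of the fastest. The sketch below illustrates that rule under this assumption; it is not the Java driver's code, and `Member` and `nearestWithinWindow` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a replica set member as seen by the monitor.
struct Member {
    int pingTimeMillis;
    bool okForSecondaryQueries;
};

// Returns the indices of usable members whose ping is at most `windowMillis`
// above the fastest usable member's ping (an absolute window, e.g. 15 ms).
std::vector<int> nearestWithinWindow(const std::vector<Member>& members, int windowMillis) {
    assert(windowMillis >= 0);
    bool found = false;
    int best = 0;
    for (const Member& m : members) {
        if (m.okForSecondaryQueries && (!found || m.pingTimeMillis < best)) {
            best = m.pingTimeMillis;
            found = true;
        }
    }
    std::vector<int> result;
    if (!found)
        return result;  // nothing usable: the caller would fall back to the primary
    for (std::size_t i = 0; i < members.size(); i++) {
        const Member& m = members[i];
        // Subtracting avoids overflow in best + windowMillis;
        // for usable members m.pingTimeMillis >= best.
        if (m.okForSecondaryQueries && m.pingTimeMillis - best <= windowMillis)
            result.push_back(static_cast<int>(i));
    }
    return result;
}
```

The two rules diverge at the extremes: with a fastest ping of 2 ms, a 15 ms window also admits a 16 ms member, whereas the 20% rule treats only members up to about 2.4 ms as equivalent; with a fastest ping of 100 ms, the window admits members up to 115 ms and the 20% rule up to 120 ms.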

Passing it as additional arguments to addShard doesn't sound like a
very good idea. Perhaps it could be a setting stored in the config db,
like the chunk size?


Grégoire Seux

Apr 19, 2012, 11:40:54 AM
to mongo...@googlegroups.com
It seems to me that reading from a set of close secondaries is a good refinement of the slaveOk strategy. Large MongoDB deployments usually span several datacenters, and you don't want to read from another datacenter unless forced to.
Of course, it could be done with tags describing your setup, but this requires more configuration work.

Storing the threshold in the config db seems to be a good idea.

-- 
Gregoire

Scott Hernandez

Apr 19, 2012, 1:12:05 PM
to mongo...@googlegroups.com
This will actually be an option for each read, not just a static or
global variable, since each application may have different
requirements.
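If the threshold becomes a per-read option while a cluster-wide value is also kept in the config db, as suggested earlier in the thread, one plausible precedence is: per-read value, then cluster-wide setting, then a built-in default. A sketch assuming that order (C++17); the function and constant names are hypothetical, and the 20% default simply mirrors the patch:

```cpp
#include <cassert>
#include <optional>

// Hypothetical built-in default, mirroring the 20% rule hard-coded in the patch.
constexpr int kDefaultNearestThresholdPercent = 20;

// Assumed precedence: a per-read value wins, then a cluster-wide setting
// (for example one stored in the config db), then the default.
int effectiveThresholdPercent(std::optional<int> perRead, std::optional<int> clusterWide) {
    const int resolved = perRead.value_or(clusterWide.value_or(kDefaultNearestThresholdPercent));
    assert(resolved >= 0);  // thresholds are percentages and must not be negative
    return resolved;
}
```

Keeping the default in one place means an application that never sets the option keeps the patch's current behavior.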


Gregoire Seux

Apr 19, 2012, 4:17:07 PM
to mongo...@googlegroups.com
On Thu, Apr 19, 2012 at 10:12:05AM -0700, Scott Hernandez wrote:
> This will actually be an option for each read not just a static or
> global variable since each application may have different
> requirements.
>

OK, is there already a draft for this kind of query specifier,
$readPref for instance?

I can think of:
$readPref : {primary : 1}
$readPref : {secondaries : 1}
$readPref : {tags : [...,...]}

with optional refinements like:
$readPref : {secondaries : 1, primaryOk : 1, maxPing : 5}

I am quite interested in having a way to specify which servers I
prefer to read from, so I am eager to help.
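One way to make the proposed forms concrete is to model them as a small struct and filter the replica set members against it. Everything here is hypothetical: `ReadPref`, `Mode`, `Member` and `eligibleHosts` are illustrative names, tags are plain strings, and `primaryOk` is assumed to mean "fall back to the primary when nothing else qualifies", which the proposal leaves open:

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the proposed $readPref forms; not an existing MongoDB API.
enum class Mode { Primary, Secondaries, Tags };

struct ReadPref {
    Mode mode = Mode::Primary;
    std::vector<std::string> tags;  // only used with Mode::Tags
    bool primaryOk = false;         // assumed: fall back to the primary if nothing else matches
    int maxPing = -1;               // assumed: milliseconds, -1 means "no limit"
};

struct Member {
    std::string host;
    bool isPrimary;  // simplification: every non-primary member counts as a secondary
    int pingTimeMillis;
    std::set<std::string> tags;
};

// Returns the hosts eligible under `pref`, in member order.
std::vector<std::string> eligibleHosts(const ReadPref& pref, const std::vector<Member>& members) {
    auto withinPing = [&](const Member& m) {
        return pref.maxPing < 0 || m.pingTimeMillis <= pref.maxPing;
    };
    auto matches = [&](const Member& m) {
        switch (pref.mode) {
        case Mode::Primary:
            return m.isPrimary;
        case Mode::Secondaries:
            return !m.isPrimary;
        case Mode::Tags:  // the member must carry every requested tag
            return std::all_of(pref.tags.begin(), pref.tags.end(),
                               [&](const std::string& t) { return m.tags.count(t) > 0; });
        }
        return false;
    };
    std::vector<std::string> hosts;
    for (const Member& m : members)
        if (withinPing(m) && matches(m))
            hosts.push_back(m.host);
    if (hosts.empty() && pref.primaryOk)
        for (const Member& m : members)
            if (m.isPrimary && withinPing(m))
                hosts.push_back(m.host);
    return hosts;
}
```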

--
Greg
