Monotonic Read Consistency and multiple read-only client applications using the Java driver

ahubold

unread,

Apr 3, 2013, 6:03:09 AM4/3/13

to mongod...@googlegroups.com

Hi,

we plan a setup of one (or very few) applications that write to a MongoDB cluster and lots of read-only applications. It's okay for the read-only applications to see slightly stale data as long as they see updates in the same order as they happen on the primary.

The MongoDB blog series "On distributed consistency" explained that MongoDB supports Monotonic Read Consistency which is exactly what we need for our reading clients. We also want to scale read load by distributing it over all replica nodes. Due to MongoDBs in-order replication, this works fine as long as a reading application is connected to exactly one MongoDB node.

But it does not work if we let the client connect to the replica set to achieve high availability, because the Java driver then distributes reads of a single application across multiple secondaries which may have different replication lags. For scaling reads across secondaries we use the read preference "secondaryPreferred". I cannot find a way to tell the driver that it should connect to just one mongod and provide Monotonic Read Consistency. While there are methods to use a single node for the current thread (with requestStart(), requestEnd()) this is not sufficient for the whole application. Using only one thread to read data is not an option as it would introduce performance problems.

Of course we could just use read preference "primary" for all clients. But then we'd need to use sharding to distribute the read-load. The additional complexity does not make sense in our case where the write load and the amount of data are rather low and can easily handled by a single mongod server. We have the secondaries running anyway (for HA) and for cost-efficiency we should use them to scale reads.

What's your recommendation for this use case?

Are there plans to support such a setup in the Java driver? (Maybe a new read preference "Monotonic Read Consistency" with failover support, i.e. where the driver connects to a single mongod but also supports failover to a new mongod after one was found that does not lag further behind).

As somewhat dirty workaround we currently let our read-only clients connect to single mongod servers, try to detect unavailability and perform a failover to a different mongod in our application code. As you can imagine it's not very nice to implement this in the application code. But the real problem is that if at some time in the future the write load of our application may grow and we have to use sharding. With our workaround we cannot use sharding because while we can reimplement some parts of the Java driver in the application it's not possible to implement the mongos logic in the application code. That's why I'm asking for proper support of this use case in MongoDB.

Regards,
Andreas

ahubold

unread,

Apr 9, 2013, 10:10:52 AM4/9/13

to mongod...@googlegroups.com

Hi,

does really nobody have an answer to this question? Somebody from 10gen maybe?

Regards,
Andreas

William Zola

unread,

Apr 9, 2013, 4:53:13 PM4/9/13

to mongod...@googlegroups.com

Hi Andreas

According to your problem description, you want both strong read consistency, and to be able to do read scaling by reading from secondaries. Unfortunately, the MongoDB Java driver does not support this, nor are there any plans that I know of to introduce this into the Java driver.

If you have a properly-functioning replica set using release 2.2.3 or later, you shouldn't be seeing significant replica lag anyway. If you've provisioned all of the nodes in the replica set identically, the secondaries should always be able to keep within a few seconds of the primary. If you really need read-your-own-writes semantics, then you should use a ReadPreference of Primary.

In general, I don't think it's a good idea to do read scaling via replica sets. MongoDB is designed to scale horizontally via sharding; in my personal opinion trying to do read scaling via secondary reads usually leads to problems down the road.

Let me know if you have further questions.

-William

ahubold

unread,

Apr 10, 2013, 3:22:44 AM4/10/13

to mongod...@googlegroups.com

Hi William,

thanks for your answer but it seems you have misunderstood me.

I do not need strong consistency, I only need monotonic read consistency. Please read http://blog.mongodb.org/post/523516007/on-distributed-consistency-part-6-consistency-chart for an explanation.
In the described use case a client should just see the time moving forward. The client does not need to read its own writes (especially if the client doesn't write at all) or see writes of other clients as they happen.

Even if replica lags are typically small in current Mongo versions, they are significant. A client may read old data after new one (even for the same document), which means you don't have monotonic reads.

I still think this would be a good feature for Mongo.

What are the mentioned usual problems when using replica sets for scaling reads? Do you have a pointer to some documentation or blog for me?

Regards,
Andreas

Sam Millman

unread,

Apr 10, 2013, 3:35:14 AM4/10/13

to mongod...@googlegroups.com

Those blog posts are old, i.e. they still show master slave replication which is deprecated, either way MongoDB by default goes higher level than that so I don't think you need to worry.

"Even if replica lags are typically small in current Mongo versions, they are significant. A client may read old data after new one (even for the same document), which means you don't have monotonic reads."

That is why you do replica set acked writes or you improve your network.

Tbh it sounds like you want synchronous replication...

"What are the mentioned usual problems when using replica sets for scaling reads?"

The documentation comes in handy here: http://docs.mongodb.org/manual/core/read-preference/

I would be careful about reading blog posts since they are not regularly updated to new features.

--
--
You received this message because you are subscribed to the Google
Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com
To unsubscribe from this group, send email to
mongodb-user...@googlegroups.com
See also the IRC channel -- freenode.net#mongodb

---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

ahubold

unread,

Apr 10, 2013, 5:45:11 AM4/10/13

to mongod...@googlegroups.com

The mentioned blog post may be old but the types of consistency described are still valid. They aren't specific to MongoDB or its replication mechanism.

Again, I do *not* need strong consistency or synchronous replication. The Java driver already supports Monotonic Read Consistency (MRC) for a single application thread (via #requestStart / #requestEnd). I'm just asking for the same feature for multiple or all threads in a single application with failover support. No need to change replication semantics.

Acked writes or an improved network cannot ensure MRC if the driver reads from multiple secondaries.

I haven't found any "usual problems when using replica sets for scaling reads" on the documentation page you provided. Instead it says

an application that does not require fully up-to-date data, you can improve read throughput, or reduce latency, by distributing some or all reads to secondary members of the replica set.

Again, in the described use case we do not require fully up-to-date data.

Regards,
Andreas

Sam Millman

unread,

Apr 10, 2013, 6:34:34 AM4/10/13

to mongod...@googlegroups.com

Hmm that's annoying I was fairly certain that page did house some notes and waring messages, must be another page.

You are contradicting yourself, in your reply before you said:

"Even if replica lags are typically small in current Mongo versions, they are significant. A client may read old data after new one (even for the same document), which means you don't have monotonic reads."

but then in your last reply you say:

"Again, in the described use case we do not require fully up-to-date data."

Which is it?

Sam Millman

unread,

Apr 10, 2013, 6:37:58 AM4/10/13

to mongod...@googlegroups.com

The "usual problems when using replica sets for scaling reads" is:

"Even if replica lags are typically small in current Mongo versions, they are significant. A client may read old data after new one (even for the same document), which means you don't have monotonic reads."

Which judging by your past replies you are trying to avoid, yet you also say you don't care about.

If you don't care about old data then you can just use secondaryPreferred read preferences in the Java driver, no need to setup some complex infrastructure specific to your scenario which may or may not actually be of any use in the future.

Sam Millman

unread,

Apr 10, 2013, 6:45:55 AM4/10/13

to mongod...@googlegroups.com

"Acked writes or an improved network cannot ensure MRC if the driver reads from multiple secondaries."

Does not happen. The dirver should hold onto one secondarey for as long as that seocndary is connected to the replica set. This is to stop these exact cases.

ahubold

unread,

Apr 10, 2013, 7:02:42 AM4/10/13

to mongod...@googlegroups.com

No. Both are correct.

Okay let's provide a simple example. I have a replica set with a primary P and two secondaries S1 and S2. A single document D is written three times to different values V1, V2, V3 in exactly that order.

Let's say at some point in time: P has D=V3, S1 has D=V2 and S2 has D=V1.

Now a client connects with read preference secondaryPreferred.
1) It uses one thread to read D and sees V2. The driver used S1 for this thread. That's fine. It's not a problem for the application that it does not yet see V3.
2) Now the application reads D again from a different thread and sees V1. The driver used S2 for this different thread. The driver may use different servers for different threads
3) After some time the application reads D again and sees eventually V3.

The application saw the following order of changes for D: V2, V1, V3
While the correct order was V1, V2, V3

The application relies on the correct order, it needs Monotonic Read Consistency.

Fully up-to-date data is not needed. The application can read from a single secondary without problems. The problems occur as soon as it reads from different secondaries and sees data jumping around in a different order than it happened on the primary.

Sam Millman

unread,

Apr 10, 2013, 7:02:10 AM4/10/13

to mongod...@googlegroups.com

I am trying to find the doc page where I read that last night but it is buried in some sentence some where in one of the pages and kinda hard to find it without going through each page but from what I read it should be the case where it tries to keep hold of one secondary for as long as possible.

Sam Millman

unread,

Apr 10, 2013, 7:09:18 AM4/10/13

to mongod...@googlegroups.com

hmm, yea the only way to get true monotonic in the manner you speak of is either:

a- single thread

b- synchronous whereby operations are applied as they come in, not picked from a queue by multiple threads async.

If you forced, from the application end, to ack a write to all nodes of the set then it *should* block other writes in the order you wish to do them in the application, this means you can garauntee the order...maybe.

I have not read the docs on the deprecated master-slave replication however I think that also supports true monotonic.

ahubold

unread,

Apr 10, 2013, 7:09:52 AM4/10/13

to mongod...@googlegroups.com

"Does not happen. The dirver should hold onto one secondarey for as long as that seocndary is connected to the replica set. This is to stop these exact cases."

No. At least the documentation tells me the opposite. See http://docs.mongodb.org/ecosystem/drivers/java-concurrency/#java-driver-concurrency
It says

"""
Additionally in the case of a replica set with slaveOk option turned on, the read operations will be distributed evenly across all slaves. This means that within the same thread, a write followed by a read may be sent to different servers (master then slave).

Sam Millman

unread,

Apr 10, 2013, 7:11:50 AM4/10/13

to mongod...@googlegroups.com

If you are using multithreading in Java of course the application end won't work.

Sam Millman

unread,

Apr 10, 2013, 7:13:48 AM4/10/13

to mongod...@googlegroups.com

Yea I may have been reading about replica set internals there not the driver read preference, my bad.

ahubold

unread,

Apr 10, 2013, 7:21:42 AM4/10/13

to mongod...@googlegroups.com

If you are using multithreading in Java of course the application end won't work.

Okay, now it's getting funny. As written in my first message:

"While there are methods to use a single node for the current thread (with requestStart(), requestEnd()) this is not sufficient for the whole application. Using only one thread to read data is not an option as it would introduce performance problems."

The network latency will kill the application performance if you just use one thread for reading.

Sam Millman

unread,

Apr 10, 2013, 7:35:04 AM4/10/13

to mongod...@googlegroups.com

indeed...

Sam Millman

unread,

Apr 10, 2013, 7:39:41 AM4/10/13

to mongod...@googlegroups.com

You know, you could instead use sharding for this. It is designed to use multiple servers to spread out writes and reads but you won't be able to distribute reads over a single range of data like you could in a replica, so the sharding is of course designed to distribute reads across all your data while replicas are more designed for a range.

ahubold

unread,

Apr 10, 2013, 7:52:22 AM4/10/13

to mongod...@googlegroups.com

Yes, thank you. Of course I could use sharding. But sharding comes with an additional cost. That's the whole point of my original question.

Sam Millman

unread,

Apr 10, 2013, 7:58:19 AM4/10/13

to mongod...@googlegroups.com

Yea I'm out, apart from using master-slave replication which I think that blog post you linked actually talks about I am unsure if you can get true monotonic in MongoDB.

Jeff Yemin

unread,

Apr 10, 2013, 5:11:00 PM4/10/13

to mongod...@googlegroups.com

While not ideal, you may be able to achieve your goals with tagged read preferences (see http://docs.mongodb.org/ecosystem/drivers/java-replica-set-semantics/#read-preferences-and-tagging). If you give each of your replica set nodes a unique "id" (you could use any name you prefer) tag ("rs0", "rs1", "rs2", etc), you can construct a read preference that will order the application's preference for a particular node, with failover to the next preferred, and so on. You can create such a read preference like this:

ReadPreference pref = ReadPreference.secondary(new BasicDBObject("id", "rs0"),

new BasicDBObject("id", "rs1"),

new BasicDBObject("id", "rs2"));

The effect of this read preference will be to route all queries to the (one) node in the replica set that is currently a secondary and tagged with {"id" : "rs0"}, if one exists. If none exists, it will try for "rs1", and so on.

The problem with this approach is that you could go back in time without being notified, in the case of a failure of, say, the node tagged with "rs0".

If you want to make certain you don't go back in time, you could use a read preference like:

ReadPreference pref = ReadPreference.secondaryPreferred(new BasicDBObject("id", "rs0"));

This will force all queries to the secondary tagged with {"id" : "rs0"}, if one exists. If that node is down or is no longer a secondary, it will route the queries to the primary (which may be "rs0" if it was elected.

I could see this approach working for your application if you could segment it into logical units of work, and use a read preference with a different value for the "id" tag for each of these units. That way you could load balance the units of work across multiple secondaries, with the caveat of if that secondary happens to be down, queries will go to the primary, and if in addition there is no primary (say during an election), the queries will fail.

Regards,

Jeff

ahubold

unread,

Apr 11, 2013, 3:20:07 AM4/11/13

to mongod...@googlegroups.com

Hi Jeff,

thanks for joining this discussion.

We thought about using tagged read preferences as well. We really need to ensure that we can't go back in time even in the case of failovers.

If you want to make certain you don't go back in time, you could use a read preference like:

ReadPreference pref = ReadPreference.secondaryPreferred(new BasicDBObject("id", "rs0"));

There is still a problem with this approach. If the tagged secondary cannot be reached, the client will read from the primary. That's fine. But if the secondary comes back again, the client will again use the secondary at some time. It may have read newer state from the primary then and will afterwards read possibly older state from the secondary. Time can still go back in back.

Furthermore it could even be possible with this approach that some threads read from the primary (after a very short network hickup to the secondary) while other threads still use the secondary. But I'm not totally sure on this one. I don't know the driver's internals.

It seems that it's currently not possible to achieve true monotonic read consistency when reading from secondaries with MongoDB. As workaround we will now use direct connections to single replica nodes and perform failover in our application code.

However I think this would be great feature for some future MongoDB version. I should probably open a feature request in JIRA, right?

Regards,
Andreas

Jeff Yemin

unread,

Apr 11, 2013, 7:52:42 AM4/11/13

to mongod...@googlegroups.com

It looks like mgo, the MongoDB driver for Go, has the behavior you want: http://godoc.org/labix.org/v2/mgo. See the discussion on Monotonic consistency.

Yes, please file an issue in the JAVA project.

Regards,

Jeff

William Zola

unread,

Apr 11, 2013, 9:23:05 AM4/11/13

to mongod...@googlegroups.com

Hi Andreas!

Previously, I said:

In general, I don't think it's a good idea to do read scaling via replica sets. MongoDB is designed to scale horizontally via sharding; in my personal opinion trying to do read scaling via secondary reads usually leads to problems down the road.

You then asked:

What are the mentioned usual problems when using replica sets for scaling reads? Do you have a pointer to some documentation or blog for me?

You've just run into one of the problems: most of the drivers don't support monotonic read consistency.

The other problem is a scalability/availability issue. Let's say you have a three-node replica set, and you've used secondary reads to perform read scaling. You've done this since your primary node doesn't have the IOPS to sustain your normal workload; you need 3x the IOPS, and you've achieved this by having 2/3 of the read operations go to the secondaries. Now: what happens when you lose a node? More importantly: what happens when you have to take a node down for scheduled maintenance? That's right: you no longer have enough IOPS to sustain your normal workload.

You can get around this by using tag sets to make sure that your application only uses (e.g.) two of the three nodes for reading, but most folks don't do this.

It looks like you're going to continue down the replica set road. I invite you to consider what you'll do in the event of a node failure when you design both your application and your capacity model.

-William

On Wednesday, April 10, 2013 4:02:42 AM UTC-7, ahubold wrote:

No. Both are correct.

Okay let's provide a simple example. I have a replica set with a primary P and two secondaries S1 and S2. A single document D is written three times to different values V1, V2, V3 in exactly that order.

Let's say at some point in time: P has D=V3, S1 has D=V2 and S2 has D=V1.

Now a client connects with read preference secondaryPreferred.
1) It uses one thread to read D and sees V2. The driver used S1 for this thread. That's fine. It's not a problem for the application that it does not yet see V3.
2) Now the application reads D again from a different thread and sees V1. The driver used S2 for this different thread. The driver may use different servers for different threads
3) After some time the application reads D again and sees eventually V3.

The application saw the following order of changes for D: V2, V1, V3
While the correct order was V1, V2, V3

The application relies on the correct order, it needs Monotonic Read Consistency.

Fully up-to-date data is not needed. The application can read from a single secondary without problems. The problems occur as soon as it reads from different secondaries and sees data jumping around in a different order than it happened on the primary.

[snip]

Andreas Hubold

unread,

Apr 12, 2013, 3:03:15 AM4/12/13

to mongod...@googlegroups.com

Hi William,

yes, it's important to reserve enough capacity to actually be able to handle the load after failures. I think this can be achieved with additional hidden secondaries, i.e. nodes with hidden:true in the replica set configuration. With some monitoring in place, we can unhide these secondaries automatically when needed. For some setups it may also be okay if these nodes are part of the active nodes - but of course one has to understand the load characteristics and test for failures.

In theory it sounds nice to handle read scaling and availability as completely orthogonal aspects: use sharding to scale and replication for availability. However in practice this comes at a much higher cost in terms of number of machines, operational costs and complexity (for read heavy applications that don't need strong consistency). Of course if the load increases you have to consider sharding at some point (e.g. increased write load or maximum number of replica set nodes reached).

Regards,
Andreas

--

--
You received this message because you are subscribed to the Google
Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com
To unsubscribe from this group, send email to
mongodb-user...@googlegroups.com
See also the IRC channel -- freenode.net#mongodb

---

You received this message because you are subscribed to a topic in the Google Groups "mongodb-user" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/mongodb-user/0LG7e9OAEPs/unsubscribe?hl=en.
To unsubscribe from this group and all its topics, send an email to mongodb-user...@googlegroups.com.

ahubold

unread,

Apr 12, 2013, 3:53:16 AM4/12/13

to mongod...@googlegroups.com

Thank you. I've opened https://jira.mongodb.org/browse/JAVA-804

Regards,
Andreas

Sam Millman

unread,

Apr 12, 2013, 4:01:24 AM4/12/13

to mongod...@googlegroups.com

One concern I have here which ahs changed into a question is of putting this all in the driver is that to ensure monotonic read upon one secondary you must ensure that the async batch processor does multi document operations in sequence, which the docs hint that it might not. I can see how operations on a single document would be monotonic however is it true in saying that the threads will pick all operations and write all operations in a monotonic manner to a single secondary?

Reply all

Reply to author

Forward