Query oplog.rs is slow

3,754 views
Skip to first unread message

richardwilly98

unread,
Sep 28, 2013, 8:33:44 AM9/28/13
to mongod...@googlegroups.com
Hi,

db.oplog.rs.count() shows 18878152 records

An external java app query periodically oplog.rs with the following code snippet:

private DBCursor oplogCursor(final BSONTimestamp timestampOverride) {
BSONTimestamp time = timestampOverride == null
? MongoDBRiver.getLastTimestamp(client, definition) : timestampOverride;
DBObject indexFilter = getOplogFilter(time);
if (indexFilter == null) {
return null;
}

int options = Bytes.QUERYOPTION_TAILABLE
| Bytes.QUERYOPTION_AWAITDATA | Bytes.QUERYOPTION_NOTIMEOUT;

// Using OPLOGREPLAY to improve performance:
if (indexFilter.containsField(MongoDBRiver.OPLOG_TIMESTAMP)) {
options = options | Bytes.QUERYOPTION_OPLOGREPLAY;
}
return oplogCollection.find(indexFilter)
.sort(new BasicDBObject(MongoDBRiver.MONGODB_NATURAL_OPERATOR, 1))
.setOptions(options);
}

OPLOG_REPLAY flag is set but it currently takes 5 seconds for this query (see traces below).

Wed Sep 18 21:11:05.118 [conn68] getmore local.oplog.rs query: { $query: { ts: { $gt: Timestamp 1379529386000|1 }, $or: [ { ns: "vendop.vendor" }, { ns: "vendop.$cmd" } ] }, $orderby: { $natural: 1 } } cursorid:1102542858630512437 ntoreturn:0 keyUpdates:0 locks(micros) r:188 nreturned:0 reslen:20 5011ms

Is it expected response time?
Is there any way to improve it?

Thanks,
   Richard.

Евгений Власенко

unread,
Sep 30, 2013, 11:51:55 AM9/30/13
to mongod...@googlegroups.com
+1

Asya Kamsky

unread,
Oct 1, 2013, 12:07:31 AM10/1/13
to mongodb-user
So the problem is that the special OPLOG_REPLAY flag is for tailing
the oplog. That's it.
You are trying to use it not for tailing the oplog but to selectively
tail some of the oplog AND you are requesting that it also be
"re-sorted".

First, remove sort $natural:1 - that is already the default.
Second, are you sure you need that other clause to filter "ns"? If
you do, I would re-write it without the $or ($or is not going to be
properly optimized and should never be used if you can use $in
instead).
I would change the whole $or to just "ns":{"$in:[ "vendop.vendor",
"vendop.$cmd"]} (or just eliminate it to see how that improves the
performance).

By the way, note this "r:188 5011ms" <- this means the reading took
188 microseconds, the rest of the time was spent waiting.
Is there a lot going on with the oplog (i.e. a lot of writes happening
on that mongod?)

Asya
> --
> --
> You received this message because you are subscribed to the Google
> Groups "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com
> To unsubscribe from this group, send email to
> mongodb-user...@googlegroups.com
> See also the IRC channel -- freenode.net#mongodb
>
> ---
> You received this message because you are subscribed to the Google Groups
> "mongodb-user" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to mongodb-user...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.

Asya Kamsky

unread,
Oct 1, 2013, 9:20:01 AM10/1/13
to mongodb-user
It was pointed out to me that I didn't mention that what is happening conceptually is this:
the query directs the server:
"If there is no new data to return, please wait for data rather than returning immediately"
(AWAITDATA option).  

Since this instance returned nothing (nreturned:0) it's expected that it didn't return 
immediately but rather waited in case more data would show up.

Did you expect that there *would* be data in this particular call?

Asya

David Blado

unread,
Oct 2, 2013, 1:59:03 PM10/2/13
to mongod...@googlegroups.com
The mongod has very little load on it.  It's hosted on an AWS m1.large that's dedicated to mongod and the oplog queries and the occasional read/write from pymongo is all that's happening.

Thanks for explaining the r:188 5011ms -- didn't know that.

> See also the IRC channel -- freenode.net#mongodb
>
> ---
> You received this message because you are subscribed to the Google Groups
> "mongodb-user" group.
> To unsubscribe from this group and stop receiving emails from it, send an

richardwilly98

unread,
Oct 7, 2013, 7:05:19 AM10/7/13
to mongod...@googlegroups.com
I have added to additional filter about vendor because I am only interested to replicate data from vendor collection. I will replace $or by $in

richardwilly98

unread,
Nov 27, 2013, 5:24:41 PM11/27/13
to mongod...@googlegroups.com
Hi,

It looks like using $in instead of $or does not help. The additional filter were trying to limit the traffic on the network as I am only interested in entries from specific database / collection.

I want to try to use OPLOG_REPLAY the correct way to tail oplog.rs in case the connection is lost do I need to tail oplog.rs from the beginning event if I know the last row processed by its "ts" field?

Thanks,
   Richard.

Asya Kamsky

unread,
Nov 28, 2013, 12:16:39 AM11/28/13
to mongodb-user

You want to tail it from last processed event when you requery to avoid getting entire oplog being queried on reconnection.

Asya

richardwilly98

unread,
Nov 28, 2013, 5:27:02 AM11/28/13
to mongod...@googlegroups.com
Yes - I know the timestamp of the last oplog.rs entry processed so I'd like to skip all old entries.

How can I do that?

Thanks,
   Richard.

Richard Louapre

unread,
Nov 28, 2013, 6:48:18 AM11/28/13
to mongod...@googlegroups.com
I was using this example to tail from a specific timestamp [1]. Is this example still valid and efficient?



You received this message because you are subscribed to a topic in the Google Groups "mongodb-user" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/mongodb-user/E7BSv624nBg/unsubscribe.
To unsubscribe from this group and all its topics, send an email to mongodb-user...@googlegroups.com.

William Zola

unread,
Nov 28, 2013, 10:47:02 AM11/28/13
to mongod...@googlegroups.com
You should probably also add  QUERYOPTION_OPLOGREPLAY to your list of options that you set: it will optimize the process of finding the latest timestamp.

 -William 

Dwight Merriman

unread,
Nov 29, 2013, 7:08:10 PM11/29/13
to mongod...@googlegroups.com
i think the various post answer the question but to summarize:

- if there is not new data, QUERYOPTION_AWAITDATA will wait for 3 seconds before returning...
- the oplog is a collection without any indexes, and also, huge -- 100GB not unusual.  so scanning it would be slow.  looking at the end with OPLOG_REPLAY is reasonably efficient, but the entire last extent might be scanned, and that may be 2GB.  

so that could easily take 5 seconds.  so in general, oplog.rs.find({ ts : { $gte : ... }, ... }).sort({$natural:1} w/oplog replay will not be instantaneous.  the best thing to do is use tailable cursors and *not* issue the query very often at all, and then you should be able to get back a continuous stream of tail'd data once you get going without delay.

richardwilly98

unread,
Nov 30, 2013, 12:10:45 PM11/30/13
to mongod...@googlegroups.com
Hi Dwight,

Thanks a lot to summarize all suggestions.
Let me add few comments just to make sure I don't misunderstand any of these suggestions:
- QUERYOPTION_AWAITDATA will wait for 3 seconds before returning... At Java driver level the code blocks anyway waiting for the next entry. So the 3 seconds are not really relevant from the driver point of view. Is that correct?
- So my best option are
1. Tailable cursor using OPLOG_REPLAY with ts filter to be positioned at soon as possible on the interesting record. Will the cursor still use ts filter will looking for the next entry?
2. Tailable cursor using OPLOG_REPLAY without ts filter. Positioning on the interesting record will take time but once there getting the next entry will not cost to Mongo instances.

Basically what will be the extra cost on MongoDB side of using oplog.rs.find({ ts : { $gte : ... }, ... }).sort({$natural:1} vs oplog.rs.find().sort({$natural:1}?

Thanks,
   Richard.

Dwight Merriman

unread,
Dec 5, 2013, 12:04:32 AM12/5/13
to mongod...@googlegroups.com


Basically what will be the extra cost on MongoDB side of using oplog.rs.find({ ts : { $gte : ... }, ... }).sort({$natural:1} vs oplog.rs.find().sort({$natural:1}?

let's say you have a large drive and the oplog contains a billion entries.  
 
 oplog.rs.find().sort({$natural:1}

will return a billion documents.  if you really just wanted the ones from the last 2 minutes, 

oplog.rs.find({ ts : { $gte : <2minutesagotime> }).sort({$natural:1}

will only return those -- maybe that's thousands of documents.  so very different queries then.  

one very important detail though: for the latter query, without OPLOG_REPLAY it will scan the whole oplog to get those last 2 minutes, as it will table scan.  so it would still scan a billion it just doesn't return the ones not matching.  so that would be very inefficient.  OPLOG_REPLAY will make it reasonably efficient, but still take seconds or a minute to skip old data as there is no index, it is just looking "near the end" of the billion heuristically.  so it is important to use OPLOG_REPLAY to look at the end of the oplog, and in addition to use tailable, so that you don't keep issuing the query again and again.

AWAIT_DATA is as you say optional -- it will simply make the latency lower to get the new records that are born.  if you don't use AWAIT_DATA be sure to sleep a little before you call more() so that you don't do thousands of GET_MORE's per second if say, there is no new data...

Dwight Merriman

unread,
Dec 5, 2013, 12:06:33 AM12/5/13
to mongod...@googlegroups.com
clarifying note that all this is rather specific to tailing the oplog -- i.e. to tailing a continually being inserted in, gigantic, capped collection with no index.

in other circumstances with capped collections these extra options aren't typically needed.

also : sort( { $natural : 1 } ) is the default btw.  the reason it exists explicitly is so that you could also do { $natural : -1 } if desired on a capped collection.  so you can leave that off if desired for brevity, with same result, at least with all versions to date.
Reply all
Reply to author
Forward
0 new messages