Is there any advice for optimising date queries?

53 views
Skip to first unread message

Roman Gaufman

unread,
Apr 17, 2018, 6:52:31 PM4/17/18
to mongodb-user
Dear MongoDB User Group,

I have an optimisation question, I have a bunch of "events" and each one has a zond_id, starts_at and ends_at -- there's a new trigger coming from an IOT device, this device may have been offline for a while, so I need to find the relevant event for it. The query looks like this:

find { find: "events", filter: { zone_id: ObjectId('55fad4fb3d1f921a09000007'), starts_at: { $lte: new Date(1523985883000) }, ends_at: { $gte: new Date(1523985943000) } }, limit: 1, sort: { starts_at: -1 }, singleBatch: true } planSummary: IXSCAN { zone_id: 1, starts_at: -1, complete: -1 } keysExamined:19634 docsExamined:19634 cursorExhausted:1 numYields:154 nreturned:0 reslen:121 locks:{ Global: { acquireCount: { r: 310 } }, Database: { acquireCount: { r: 155 } }, Collection: { acquireCount: { r: 155 } } } protocol:op_query 176ms

Note, as shown above, it's already using an index { zone_id: 1, starts_at: -1, complete: -1 } but it's still very slow. Any advice what I can do to speed things up?

mdewit

unread,
Apr 18, 2018, 7:21:17 AM4/18/18
to mongodb-user
If you can change the initial find query into a covered query (query of which the projected fields are all present in the index) then you should see a speed increase. 

Can you create an index that includes the "ends_at" field?
Eg: index { zone_id: 1, starts_at: -1, ends_at: 1, complete: -1 }

Then, project only the fields contained in this index.
eg: { _id: 0, zone_id: 1, starts_at: 1, ends_at: 1, complete: 1}

See https://docs.mongodb.com/manual/core/query-optimization/#covered-query

Roman Gaufman

unread,
Apr 23, 2018, 5:57:05 PM4/23/18
to mongodb-user
Thank you for the reply, I have created a "covered index" but for some reason, it tries it a few times and then switches back to the original index, hmm - maybe I did something wrong, my indexes are:

> db.events.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "timeline_shared_production.events"
},
{
"v" : 1,
"key" : {
"timeline_id" : 1
},
"name" : "timeline_id_1",
"ns" : "timeline_shared_production.events",
"background" : true
},
{
"v" : 1,
"key" : {
"zone_id" : 1
},
"name" : "zone_id_1",
"ns" : "timeline_shared_production.events",
"background" : true
},
{
"v" : 1,
"key" : {
"location_id" : 1
},
"name" : "location_id_1",
"ns" : "timeline_shared_production.events",
"background" : true
},
{
"v" : 1,
"key" : {
"zone_id" : 1,
"starts_at" : -1,
"ends_at" : 1,
"complete" : -1
},
"name" : "zone_id_1_starts_at_-1_ends_at_1_complete_-1",
"ns" : "timeline_shared_production.events",
"background" : true,
"sparse" : true
},
{
"v" : 1,
"key" : {
"zone_id" : 1,
"starts_at" : -1,
"complete" : -1
},
"name" : "zone_id_1_starts_at_-1_complete_-1",
"ns" : "timeline_shared_production.events",
"background" : true
},
{
"v" : 2,
"key" : {
"timeline_id" : 1,
"starts_at" : -1,
"complete" : -1
},
"name" : "timeline_id_1_starts_at_-1_complete_-1",
"ns" : "timeline_shared_production.events",
"background" : true,
"sparse" : true
},
{
"v" : 2,
"key" : {
"zone_id" : 1,
"starts_at" : -1,
"ends_at" : -1,
"complete" : -1
},
"name" : "zone_id_1_starts_at_-1_ends_at_-1_complete_-1",
"ns" : "timeline_shared_production.events",
"background" : true
}
]

But in the logs I'm still seeing mostly:

2018-04-23T22:55:07.546+0100 I COMMAND  [conn2771] command timeline_shared_production.events command: find { find: "events", filter: { zone_id: ObjectId('524ecbb2c5399fa6b705c33d'), starts_at: { $lte: new Date(1524520446000) }, ends_at: { $gte: new Date(1524520506000) } }, limit: 1, sort: { starts_at: -1 }, singleBatch: true, $db: "timeline_shared_production" } planSummary: IXSCAN { zone_id: 1, starts_at: -1, complete: -1 } keysExamined:12442 docsExamined:12442 cursorExhausted:1 numYields:97 nreturned:0 reslen:106 locks:{ Global: { acquireCount: { r: 196 } }, Database: { acquireCount: { r: 98 } }, Collection: { acquireCount: { r: 98 } } } protocol:op_msg 119ms

Very rarely I will see:

2018-04-23T00:43:02.292+0100 I COMMAND  [conn2465] command timeline_shared_production.events command: getMore { getMore: 158119919901, collection: "events", $db: "timeline_shared_production" } originatingCommand: { find: "events", filter: { zone_id: ObjectId('59131b073d1f925da537a418'), starts_at: { $lte: new Date(1523576421564) }, flagged: false, cloud: true }, sort: { starts_at: -1 }, $db: "timeline_shared_production" } planSummary: IXSCAN { zone_id: 1, starts_at: -1, ends_at: -1, complete: -1 } cursorid:158119919901 keysExamined:11938 docsExamined:11938 cursorExhausted:1 numYields:96 nreturned:2203 reslen:955111 locks:{ Global: { acquireCount: { r: 194 } }, Database: { acquireCount: { r: 97 } }, Collection: { acquireCount: { r: 97 } } } protocol:op_msg 194ms

Any ideas?

Marius De Wit

unread,
Apr 25, 2018, 3:50:09 AM4/25/18
to mongod...@googlegroups.com
Hi just a couple of additional things that may help:

1) Im thinking the long index will only be used for queries where more than one record is found with a starts_at <= new Date(...). In the situations where only one or zero such records exist, a shorter index may be used.

Regarding that:  I was wondering if your starts_at ends_at filter isnt the wrong way around? In your filter I see the following:

starts_at: { $lte: new Date(1523985883000) }, ends_at: { $gte: new Date(1523985943000) } 
Should starts_at not be be measured with $gte and ends_at with $lte?

At the moment you are measuring where (x <= start_date) AND (x >= end_date). 
I would think that the range you want to check should be: ( start_date <= x <= end_date)

2) I just realised now that since you are doing a sort, maybe best to change the index so that the sort can also make use of it - preventing an in memory sort:

Eg rather than using  { zone_id: 1, starts_at: -1, ends_at: 1, complete: -1 }
Try moving starts_at as the first index element:

{starts_at: 1, ends_at: 1, zone_id: 1, complete: 1}. (direction of indexes does not matter as you are only sorting on one field). Only project these fields.

The query optimiser should now first use the index for sorting.

Hope this helps!

--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
 
For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+unsubscribe@googlegroups.com.
To post to this group, send email to mongod...@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/04be3ad5-2df7-4eff-9a53-793bea98ba01%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Kevin Adistambha

unread,
May 23, 2018, 2:20:19 AM5/23/18
to mongodb-user

Hi Roman

It’s been a while since you posted this question. Have you managed to get the performance you require?

What I noticed from the two log outputs you posted are this:

In the first output:

keysExamined:12442 docsExamined:12442 cursorExhausted:1 numYields:97 nreturned:0

So the query examined 12,442 index keys, and returned no output. This suggests that the index does not really match the query. Ideally, you should see keysExamined and nreturned as the same number, and docsExamined matching that number, or zero (if a covered query). In this output, MongoDB did the work of examining many thousands of index keys and documents for nothing.

In the second output:

keysExamined:11938 docsExamined:11938 cursorExhausted:1 numYields:96 nreturned:2203

This is slightly better, because it returns 2,203 documents after examining 11,938 of them, but still not great. However you may not be able to avoid this, depending on the distribution of your data. The query you’re doing also has two diverging ranges, which limits its selectivity and thus its performance.

I tried replicating your query by inserting 100,000 random documents using this mgeneratejs template:

{
  "zone_id": {"$choose": {"from": [123, 456, 789]}},
  "starts_at": {"$date": {"min": "2017-01-01", "max": "2017-12-31"}},
  "ends_at": {"$date": {"min": "2018-01-01", "max": "2018-12-31"}}
}

By following the “equality-sort-range” index field ordering as discussed in Optimizing MongoDB Compound Indexes, I created an index of:

db.test.createIndex({"zone_id":1, "starts_at":1, "ends_at":1})

And executing the query:

db.test.explain('executionStats').find(
    {"zone_id":123, "starts_at": {"$lte": new Date("2017-01-31")}, "ends_at": {"$gte": new Date("2018-12-01")}},
    {"_id":0, "zone_id":1, "starts_at":1, "ends_at":1})
    .sort({"starts_at": -1})

I managed to get the query to do:

"executionStats": {
    "executionSuccess": true,
    "nReturned": 218,
    "executionTimeMillis": 5,
    "totalKeysExamined": 2755,
    "totalDocsExamined": 0,

Note that this is a covered query, as evidenced by totalDocsExamined: 0 metric. This means that MongoDB answers this query purely from the index. This should speed things up since MongoDB doesn’t have to load the actual documents from disk. The small ratio of nReturned vs. totalKeysexamined is currently not avoidable due to the divergent nature of the query.

Regarding your indexes, I believe you have too many similar-looking indexes, which slows down the query somewhat because MongoDB must do the additional work of choosing the proper one to answer your query. Minimizing the number of similar-looking indexes and ensure that the remaining ones are highly optimized would be beneficial as well.

If you’re still having issues with this, could you post:

  • Your MongoDB version
  • The explain('executionStats') output of your query
  • Some example documents

Best regards
Kevin

Reply all
Reply to author
Forward
0 new messages