Mongodb problem with sorting & indexes

625 views
Skip to first unread message

Manuel Tarantini

unread,
Nov 2, 2016, 11:32:05 AM11/2/16
to mongodb-user
Hello, 

After reading the offical documentation about indexes, sort, intersection sort, i'm a little bit lost in how everything work.
The problem I have is that a do some query that need to be sorted, and a single large collection (~4 millions of documents)

Lets say I have a document with 6 fields. 
The query I want to achieve is the following : db.mycollection.find({a:"OK", b:true, c:"provider", d:true, e:{ $gte:ISODate("2016-10-28T12:00:00Z"),$lt:ISODate("2016-10-28T12:00:10Z")} }).sort({ f: 1 })

Now I have created 2 indexes:
db.transactionEdr.createIndex({a: 1, b: 1, c: 1, d: 1, e:1 }, {background: true})
db.transactionEdr.createIndex({f: 1}, {background: true})

With this setup, I seams that I really don't use the indexes at their fullest, only the once on f is used, and I take a very long time to execute the query. I've read that cardinal & range are a problem with combine with a sort.

What should I change, or rather which index will be best. 

Thanks for your support.
 

Kevin Adistambha

unread,
Nov 9, 2016, 12:45:56 AM11/9/16
to mongodb-user

Hi

With this setup, I seams that I really don’t use the indexes at their fullest, only the once on f is used, and I take a very long time to execute the query. I’ve read that cardinal & range are a problem with combine with a sort.

What should I change, or rather which index will be best.

If you require sorting, it is advisable to have the sort criteria in your index, since MongoDB has a limit of 32 MB for an in-memory sort stage of a query (see Use Indexes to Sort Query Results). Exceeding this limit will make the query fail to produce any result.

The best tool to determine index effectiveness is using query explain() output. In short, you want to avoid stages named “SORT_KEY_GENERATOR” (which means an in-memory sort), and “COLLSCAN” (which means a collection scan). Another good indicator for index effectiveness is the numbers nReturned, totalKeysExamined, and totalDocsExamined in the explain('executionStats') output. It’s best if the nReturned number corresponds closely to the totalKeysExamined andtotalDocsExamined numbers.

For more details regarding this subject, please see: http://stackoverflow.com/a/36420196/5619724, and also my post illustrating compound index: https://groups.google.com/d/msg/mongodb-user/LhcipsTCpqw/lP-gyBz8BgAJ. Another good source to understand indexing is Optimizing MongoDB Compound Indexes. Please note that although the article was written in 2012 and uses outdated terms, the content is still relevant.

With your example query, it’s difficult to tell what index would be best without knowing your use case in detail, the cardinality of the fields, other queries you perform, and other things specific to your deployment. However, I did some cursory observation using some random data:

> db.test.findOne()
{
  "_id": ObjectId("58228641e5a536965c7f5d01"),
  "a": "Alpha",
  "b": true,
  "c": "Alpha",
  "d": false,
  "e": ISODate("2016-03-23T23:45:45Z"),
  "f": 395366
}

and found:

  • {a: 1, b: 1, c: 1, d: 1, e:1 } contains the “SORT_KEY_GENERATOR” stage, due to the field f not included in the index.
  • {f: 1} is only used for sorting and did not help with the rest of the query.
  • {a: 1, b: 1, c: 1, d: 1, f: 1, e: 1} works best due to the lack of “SORT_KEY_GENERATOR” and “COLLSCAN” stages.

Please note that the observation I did was using some random data. It’s best if you can examine the explain() result yourself to determine the best index for your use case.

Best regards,
Kevin

Reply all
Reply to author
Forward
0 new messages