Mongodb $in operator performance impact when input array sorted or unsorted

703 views
Skip to first unread message

anoop...@flochat.in

unread,
Dec 27, 2017, 4:52:26 AM12/27/17
to mongodb-user

I have a collection with 50 million+ documents of unique mobile numbers in my mongodb. There is find query on indexed field mobile using $in operator. Input array size range may vary from 1 to 5000. As per mongodb doc indexing using B+Tree structure. Complexity of this is Nlog(M), where M is number of indexed key and N is size of input array(unsorted). now the question is If I am sorting the input array N (sorted) and then searching in db, is there any option to optimise number of comparisons by sorting input array?

Number of comparisons for each element in input array-

  1st element=>(1*log(M))
  2nd element=>(1*log(M - IndexOf last searched element))
  3rd element=>(1*log(M - IndexOf last searched element))
  .....
  Nth element=>(1*log(M - IndexOf last searched element))

Or

Internally mongodb does like this only?
Or

Minimising number of comparison as above does not impact on searching time?

Or

It may create overhead for search which will increase search time?

Kevin Adistambha

unread,
Dec 29, 2017, 1:31:39 AM12/29/17
to mongodb-user

Hi

I believe the $in query is parsed as an ordered set. It’s evident by looking at the explain() output:

> db.test.explain().find({a:{$in:[3,2,1,2,3]}})
{
  "queryPlanner": {
    "plannerVersion": 1,
    "namespace": "test.test",
    "indexFilterSet": false,
    "parsedQuery": {
      "a": {
        "$in": [
          1,
          2,
          3
        ]
      }
    },

...

Note that the parsed query section shows that the array [3,2,1,2,3] is parsed as [1,2,3].

Having said that, I don’t think query parsing will take a significant amount of time as to be detrimental to the query time, since: 1) this is only done once per query, and 2) there are other parts of the query that have more effect toward performance. It’s much more important to have a sensible $in content (e.g. not having thousands of elements in it) and to have proper indexes in place. That is, if you don’t have the proper indexes, it is much more expensive to fetch documents from disk vs. parsing the query.

Please see Create Indexes to Support Your Queries for more information.

Best regards
Kevin

Reply all
Reply to author
Forward
0 new messages