efficient querying of "exists"

95 views

Skip to first unread message

Daniel Doyle

unread,

Oct 11, 2018, 11:17:12 AM10/11/18

to mongodb-user

Hi all,

I have a database with documents such as

{

_id: ...........

fieldA: "foo",

fieldB: "bar"

{

_id: ...........

fieldA: "foo2"

}

The set of fields is consistent, but some documents don't have some fields because they aren't applicable, or they are set to null. An index exists on every field. I am having a really hard time trying to find an efficient way of executing some use cases, however.

One of the use cases is "show me the first 10 results for fieldB". In order to do that, I have to do something like "{fieldB: {$exists: true}}" in addition to whatever else I am doing. When looking at this through the explain, this is doing a really big scan through the index. The performance difference between specifying exists and without it is enormous. Doing "{fieldB: {$ne: null}}" seems to result in similar behavior, probably because negation isn't an efficient index loop.

Is there a more effective way to do this sort of operation? It's the difference between "returns instantly" and hangs for so long people think the application has died.

Kevin Adistambha

unread,

Oct 22, 2018, 11:19:19 PM10/22/18

to mongodb-user

Hi Daniel,

For the case where fieldB doesn’t exist, you can use a partial index. From the linked page:

Partial indexes only index the documents in a collection that meet a specified filter expression. By indexing a subset of the documents in a collection, partial indexes have lower storage requirements and reduced performance costs for index creation and maintenance.

The section titled Comparison with the sparse Index described your use case. For example:

> db.test.find()
{ "_id": 0, "a": 1, "b": 1 }
{ "_id": 1, "a": 2 }

You can create a partial index with a specification to only index documents where the field b exists:

> db.test.createIndex({b:1}, {partialFilterExpression: {b: {$exists:true}}})

By default, partial index won’t be used if it will create an incomplete result set. On the flip side, the query planner can use the partial index if it detects that the query will return a complete result set. To ensure the index is used, you can force MongoDB to use it by using hint():

> db.test.find()
{ "_id": 0, "a": 1, "b": 1 }
{ "_id": 1, "a": 2 }

> db.test.find().hint('b_1')
{ "_id": 0, "a": 1, "b": 1 }

Note that by hint()ing, the 2nd query above only returns the document which contains the field b. In other words, using the partial index implies having a query with {$exists: true} in this case.

Having said that, there are certain restrictions in creating a partial index. Namely, you cannot create a partial filter expression with a {$ne: null} option. In your case, you might be able to work around this requirement if you know the desired range of b values, such as:

> db.test.find({b:{$gte:1}}).hint('b_1')
{ "_id": 0, "a": 1, "b": 1 }

> db.test.find({b:{$gte:2}}).hint('b_1')
(no result)