COUNT_SCAN vs IXSCAN when query includes null

1,650 views
Skip to first unread message

nonagon

unread,
Dec 28, 2016, 2:44:59 AM12/28/16
to mongodb-user
I'm working with mongo 3.2.11 and I have a collection containing ~40 million documents. I'd like to count the number of documents where a certain field has certain values. I found that some queries are covered and some are not:

These queries show COUNT_SCAN in the output of explain():
db.collection.find({myfield:100}).count()
db.collection.find({myfield:{$gt:10}}).count()
db.collection.find({myfield:{$lt:10}}).count()

These queries show IXSCAN -> FETCH in the output of explain():
db.collection.find({myfield:null}).count()
db.collection.find({myfield:{$ne:null}}).count()
db.collection.find({myfield:{$exists:true}}).count()
db.collection.find({myfield:{$type:10}}).count()

My understanding is that COUNT_SCAN means the query is covered and documents are not examined (i.e. only the index is examined to fulfill the query). From the above I can guess that anything implying an existence check cannot be covered. I'm wondering:

1) Is it true that indices can't cover count operations involving an existence check? Is this specific to count (i.e. if I don't count and I use a suitable projection can I cover the above queries)?
2) If so is this documented anywhere? I'd like to understand the rules here since the COUNT_SCAN operations take far less than a second and the IXSCAN->FETCH operations take more than a minute to run.
3) In general is the output of explain() documented in more detail anywhere? I'd like to confirm my understand of COUNT_SCAN vs IXSCAN etc.

Regards,
Jeremy

Attila Tozser

unread,
Dec 28, 2016, 8:37:35 AM12/28/16
to mongod...@googlegroups.com
Hi Jeremy, 

Creating a sparse index on the field seems to work on the exists query, thought it does not help on the others:

> db.coll.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"name" : "_id_",
"ns" : "test.coll"
},
{
"v" : 1,
"key" : {
"a" : 1
},
"name" : "a_1",
"ns" : "test.coll",
"sparse" : true
}
]


db.runCommand({explain:{count:'coll',query:{a:{$exists:true}}}})

"executionStats" : {
"totalKeysExamined" : 2001,
"totalDocsExamined" : 0,
"executionStages" : {
"stage" : "COUNT",
"nReturned" : 0,
"executionTimeMillisEstimate" : 0,
"works" : 2001,
"advanced" : 0,
"needTime" : 2000,
"needYield" : 0,
"saveState" : 15,
"restoreState" : 15,
"isEOF" : 1,
"invalidates" : 0,
"nCounted" : 2000,
"nSkipped" : 0,
"inputStage" : {
"stage" : "COUNT_SCAN",
"nReturned" : 2000,
"executionTimeMillisEstimate" : 0,


On the other hand, 

both :
db.runCommand({explain:{count:'coll',query:{a:{$ne:null}}}})
db.runCommand({explain:{count:'coll',query:{a:null}}})

will be collscan. (that you can work around with a compound index a:1,b:1, you may have such on other fields)


Best, 
Attila

--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
 
For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+unsubscribe@googlegroups.com.
To post to this group, send email to mongod...@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/d46893c7-8796-461c-acad-9b94987d8836%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

nonagon

unread,
Dec 29, 2016, 10:51:06 PM12/29/16
to mongodb-user
Hi Attila,

Thanks for the tip, I didn't know that a sparse index would be better on the $exists query. Out of curiosity I tried with a partial index (e.g. with $exists: true as the filter) but couldn't get any of the other queries to use COUNT_SCAN.

Do you know if these behaviors are documented anywhere? It seems like a bug to me that mongo needs to fetch all of the documents to count with the "$eq: null" and "$ne: null" queries.

Regards,
Jeremy
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user...@googlegroups.com.

Kevin Adistambha

unread,
Jan 3, 2017, 1:13:32 AM1/3/17
to mongodb-user

Hi Jeremy

Do you know if these behaviors are documented anywhere? It seems like a bug to me that mongo needs to fetch all of the documents to count with the “$eq: null” and “$ne: null” queries.

Part of the reason is because they’re different queries, and null values are treated differently from exact values:

> db.test.find()
{ "_id": ObjectId("586b3557af05082123fcb2f8"), "a": 1 }
{ "_id": ObjectId("586b3559af05082123fcb2f9"), "b": 1 }
{ "_id": ObjectId("586b360daf05082123fcb2fa"), "a": null, "c": 1 }

> db.test.find({a:null})
{ "_id": ObjectId("586b3559af05082123fcb2f9"), "b": 1 }
{ "_id": ObjectId("586b360daf05082123fcb2fa"), "a": null, "c": 1 }

> db.test.find({a:{$ne:null}})
{ "_id": ObjectId("586b3557af05082123fcb2f8"), "a": 1 }

{a:null} will match documents where {a:null} and also documents where a doesn’t exist, while {a:{$ne:null}} will match documents where field a exists and its value is not null. For more information, please see: https://docs.mongodb.com/v3.2/tutorial/query-for-null-fields/

Best regards,
Kevin

nonagon

unread,
Jan 6, 2017, 4:54:43 PM1/6/17
to mongodb-user
Hi Kevin,

Yes, I do understand how null works in queries. I don't see why mongo would be need to fetch documents from disk in order to count documents matching those queries, however.

Do you know whether there is anything inherent in the way mongo stores indices that requires it to fetch documents in order to count with an $eq:null query? In that case I'd like to understand the limitation, since it was surprising to me that these queries were so slow (again in my production environment they take something like 2 minutes to complete while a covered query executes in just a few milliseconds).

The alternative (that I can see) is that mongo could optimize these queries but this hasn't been implemented yet. In that case I'd definitely keep my eye on the relevant feature request or bug since it will make a huge difference in many cases in my production environment.

Regards,
Jeremy

Attila Tozser

unread,
Jan 6, 2017, 6:31:36 PM1/6/17
to mongod...@googlegroups.com
Hi Jeremy, 

In the meantime i found this ticket https://jira.mongodb.org/browse/SERVER-18653

Which initiated the feature request: https://jira.mongodb.org/browse/SERVER-18861

In open unresolved state. 

Best, 
Attila

--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
 
For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user+unsubscribe@googlegroups.com.

To post to this group, send email to mongod...@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.

Jeremy Todd

unread,
Jan 7, 2017, 11:21:23 PM1/7/17
to mongod...@googlegroups.com
Aha that's a great find, that is indeed exactly my issue (and exactly my frustration too!). Thanks for finding it, I'll keep an eye on the feature request.

Regards,
Jeremy


You received this message because you are subscribed to a topic in the Google Groups "mongodb-user" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/mongodb-user/O-fMpLuzulY/unsubscribe.
To unsubscribe from this group and all its topics, send an email to mongodb-user+unsubscribe@googlegroups.com.

To post to this group, send email to mongod...@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.

Astro

unread,
Jun 2, 2017, 11:28:29 AM6/2/17
to mongodb-user
Hi Jeremy,

Did you guys able to optimize such kind of calls which includes a match with null?

Any help will be appreciated. 

Jeremy Todd

unread,
Jun 2, 2017, 2:16:41 PM6/2/17
to mongod...@googlegroups.com
I did indeed find that queries like this:

db.test.find({a:{$gt:0}}).count()

will use COUNT_SCAN and they will match only documents where a both exists and is an integer greater than 0.

This was useful for some (but not all) of my use cases. In general if you have a lower bound (or just use the data type's min and $gte) and the field only has a single data type, this can work. I haven't tested it for all data types though.

I tried to be clever and use constructs such as {$not: {$lt: 0}} to match multiple data types (because I think mismatched data types always return FALSE no matter what the comparison operation is). Unfortunately those types of queries did not show COUNT_SCAN in the output of explain().

Jeremy


--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
 
For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
---
You received this message because you are subscribed to a topic in the Google Groups "mongodb-user" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/mongodb-user/O-fMpLuzulY/unsubscribe.
To unsubscribe from this group and all its topics, send an email to mongodb-user+unsubscribe@googlegroups.com.
To post to this group, send email to mongod...@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.

Monika Shah

unread,
Apr 9, 2018, 1:00:03 PM4/9/18
to mongodb-user
I have read explanation given in ticket.
Somehow , I do not agree or understand explanation given about Rassi.

As per Rassi,
  1. The document {} generates the index key {"": null} for the index with key pattern {"a.b": 1}.
  2. The document {a: []} also generates the index key {"": null} for the index with key pattern {"a.b": 1}.
  3. The document {} matches the query {"a.b": null}.
  4. The document {a: []} does not match the query {"a.b": null}.

Therefore, documents need to be fetched.


I do have two counter arguments.

i).    If we agree with this argument, Why should IXSCAN required at all ? It can be only document scan.

ii)    considering same argument, for query {a:123} also should fetch documents.

To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user...@googlegroups.com.

To post to this group, send email to mongod...@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.

Monika Shah

unread,
Apr 9, 2018, 1:11:44 PM4/9/18
to mongodb-user
To verify my some doubts similar to this, I would like to verify my query plans.
For example
Count_scan(key:a, bounds[MinKey,null),(null,MaxKey]) + count

OR
IXintersection({Key:a, bounds[a1,a2]},{key:b, bounds[b1,b2]}) + count


How can I ?


On Friday, June 2, 2017 at 11:46:41 PM UTC+5:30, nonagon wrote:
I did indeed find that queries like this:

db.test.find({a:{$gt:0}}).count()

will use COUNT_SCAN and they will match only documents where a both exists and is an integer greater than 0.

This was useful for some (but not all) of my use cases. In general if you have a lower bound (or just use the data type's min and $gte) and the field only has a single data type, this can work. I haven't tested it for all data types though.

I tried to be clever and use constructs such as {$not: {$lt: 0}} to match multiple data types (because I think mismatched data types always return FALSE no matter what the comparison operation is). Unfortunately those types of queries did not show COUNT_SCAN in the output of explain().

Jeremy

On Fri, Jun 2, 2017 at 11:28 AM, Astro <andhar...@gmail.com> wrote:
Hi Jeremy,

Did you guys able to optimize such kind of calls which includes a match with null?

Any help will be appreciated. 

--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
 
For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
---
You received this message because you are subscribed to a topic in the Google Groups "mongodb-user" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/mongodb-user/O-fMpLuzulY/unsubscribe.
To unsubscribe from this group and all its topics, send an email to mongodb-user...@googlegroups.com.

To post to this group, send email to mongod...@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.

Monika Shah

unread,
Apr 9, 2018, 1:18:12 PM4/9/18
to mongodb-user
Still, I could not received answer of question : difference between IXSCAN and COUNT_SCAN.
I have seen many query , where IXSCAN is used , but docexamined are 0.
For example,
   .find({a:{$ne:v}}.count()     use IXSCAN + count stage, but docexamined are 0.
But,
   .find({a:{$ne:null}}.count()     use IXSCAN + count stage, but docexamined are same as key examined.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user...@googlegroups.com.

Kevin Adistambha

unread,
May 6, 2018, 9:56:21 PM5/6/18
to mongodb-user

Hi Monika

.find({a:{$ne:v}}.count() use IXSCAN + count stage, but docexamined are 0.

.find({a:{$ne:null}}.count() use IXSCAN + count stage, but docexamined are same as key examined.

Generally, having zero docs examined means that the query can be answered by using the index. This is as opposed to having a non-zero docs examined value, which means that the query cannot answer the query without loading the document into memory, and verify that the document satisfies the conditions of the query.

Having a zero docs examined is the goal of having a covered query, means that the query can be answered using only the index. This can typically be achieved by using a well-defined boundaries on the query, along with the use of projections to ensure that the output fields are aligned with the index.

In terms of your two examples, the first example shows a defined boundaries: documents having field a not equal to some value.

In contrast, the second example is less defined: documents having field a containing null, or documents with no field a at all. Since in MongoDB both cases are recorded the same way in the index, MongoDB must reconfirm the exact situation by inspecting the document.

I believe the relevant ticket for you would be SERVER-12869 which is the ticket that suggests to index null values and missing values differently. Please watch/upvote/comment on the ticket if you find it relevant.

Best regards
Kevin

Monika Shah

unread,
May 12, 2018, 11:29:40 AM5/12/18
to mongodb-user
I think both cases are same.

1. Like find({a:{$ne:null}}.count(), In case of .find({a:{$ne:v}}.count() , it need to search document with no field a

Please correct me, wherever required

Monika Shah

unread,
May 12, 2018, 11:37:43 AM5/12/18
to mongodb-user
Hi Kevin,

I am thankful to you for answering all my arguments on processing.

I would be very grateful , if you can help me to solve my own query processing plan
  for e.g. Stage1: IXSCAN (f1:[minKey,maxKey])
parallel   Stage 1: IXSCAN (f2:[MinKey,v), (v,MaxKey])
             Stage 2: AND_SORT (index intersection)
             Stage 3: countty

I want to check some parameter at each stage for verifying my optimizing algorithm


It would be great help of one in community version  to check new approaches. And one resulting good can be approached to team

On Monday, May 7, 2018 at 7:26:21 AM UTC+5:30, Kevin Adistambha wrote:
Reply all
Reply to author
Forward
0 new messages