poor find-performance with { $type : 10 }


Anders Jonsson

Sep 1, 2011, 5:42:00 AM
to mongodb-user
This is a continuation from http://groups.google.com/group/mongodb-user/browse_thread/thread/f34298f42220a979

I started a new thread to avoid confusion since the discussion in the
old one shifted along the way.

The short version:
I had problems with {$type : 10} in version 1.8.2, so I upgraded to
2.0 and rebuilt the indexes. After that, {$type : 10} queries work,
but they are horribly slow (for some reason they scan the entire
collection).
If I create a new collection and insert data into it, { $type : 10 }
queries work as expected (meaning that they are really fast).
I've tried dropping the index completely and creating a new one, and
I've also tried creating an index on just "Flags"; it makes no
difference. The field "Flags" is normally saved as an array. This
issue came up when some of the documents had Flags set to null by
mistake.

Can anyone think of a reason why an index would work on one collection
but not on another?

My next step is to copy the collection and see if this query will
work on the new one. I'd like to get to the bottom of this issue,
since it seems there is a bug somewhere in the database.


Here are the results of a query with explain() that I ran. The result
(0 documents) is to be expected, so that's not an error.

> db.Visits.find({Flags : {$type : 10}}).limit(1).explain()

{
"cursor" : "BtreeCursor Flags_1_CompanyId_1_AccountId_1",
"nscanned" : 13320222,
"nscannedObjects" : 13320222,
"n" : 0,
"millis" : 16402010,
"nYields" : 88032,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
"indexBounds" : {
"Flags" : [
[
null,
null
]
],
"CompanyId" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"AccountId" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
}
}

Nat

Sep 1, 2011, 6:32:07 AM
to mongod...@googlegroups.com
Can you attach the plan from the new collection as well? I think it
is slow because your index has quite high cardinality and it has to
scan through all of the entries.

Anders Jonsson

Sep 1, 2011, 8:03:56 AM
to mongodb-user
Is that a reasonable explanation, when all other queries against this
index ("Flags" : null, "Flags" : 1, etc.) are extremely fast? There
are about 5 different values of Flags in the entire collection, so
the cardinality of Flags is very low.
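
For what it's worth, a quick way to double-check that claim from the
shell; distinct() unwinds the arrays, so I'd expect only a handful of
values to come back here:

> // list the distinct values stored under Flags; should be ~5
> db.Visits.distinct("Flags")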

I tried making yet another new collection and managed to recreate this
issue with the following code.

> for (var i = 1; i <= 1000000; i++) db.test.save({x : i+1000000})
> for (var i = 1; i <= 100; i++) db.test.save({x : i+1000000, Flags : [1]})
> for (var i = 1; i <= 100; i++) db.test.save({x : i+1000000, Flags : [1,3]})
> for (var i = 1; i <= 100; i++) db.test.save({x : i+1000000, Flags : [1,2,3]})
> for (var i = 1; i <= 100; i++) db.test.save({x : i+1000000, Flags : [2,3]})
> for (var i = 1; i <= 100; i++) db.test.save({x : i+1000000, Flags : [2]})
> for (var i = 1; i <= 100; i++) db.test.save({x : i+1000000, Flags : [3]})
> for (var i = 1; i <= 100; i++) db.test.save({x : i+1000000, Flags : [1,2]})
> db.test.ensureIndex({ Flags : 1})
> db.test.find({ Flags : { $type : 10}}).limit(1).explain()
{
    "cursor" : "BtreeCursor Flags_1",
    "nscanned" : 1000000,
    "nscannedObjects" : 1000000,
    "n" : 0,
    "millis" : 1820,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : {
        "Flags" : [
            [
                null,
                null
            ]
        ]
    }
}
> db.test.find({ Flags : null}).limit(1).explain()
{
    "cursor" : "BtreeCursor Flags_1",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : {
        "Flags" : [
            [
                null,
                null
            ]
        ]
    }
}
> db.test.find({ Flags : 1}).limit(1).explain()
{
    "cursor" : "BtreeCursor Flags_1",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 6,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : {
        "Flags" : [
            [
                1,
                1
            ]
        ]
    }
}

It seems like { $type : 10} scans all documents that don't have the
field and tries to check those for null. Is that the way it's
supposed to be?
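
To spell out what I mean by null versus not set, here is how the
three related predicates should behave on this test collection, if
I understand the matching semantics correctly:

> db.test.count({Flags : {$type : 10}})        // 0: no document stores an explicit null
> db.test.count({Flags : null})                // 1000000: missing fields match null too
> db.test.count({Flags : {$exists : false}})   // 1000000: missing fields only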

I'm not sure what the difference is between this collection and the
one I made earlier. That example didn't have more than three values
of Flags (not set, null and [1]). That may be it.


Another example (the only difference is that I also insert 100
records with Flags set to null). Remember to drop "test" between
these examples.

for (var i = 1; i <= 1000000; i++) db.test.save({x : i+1000000})
for (var i = 1; i <= 100; i++) db.test.save({x : i+1000000, Flags : [1]})
for (var i = 1; i <= 100; i++) db.test.save({x : i+1000000, Flags : [1,3]})
for (var i = 1; i <= 100; i++) db.test.save({x : i+1000000, Flags : [1,2,3]})
for (var i = 1; i <= 100; i++) db.test.save({x : i+1000000, Flags : [2,3]})
for (var i = 1; i <= 100; i++) db.test.save({x : i+1000000, Flags : [2]})
for (var i = 1; i <= 100; i++) db.test.save({x : i+1000000, Flags : [3]})
for (var i = 1; i <= 100; i++) db.test.save({x : i+1000000, Flags : [1,2]})
for (var i = 1; i <= 100; i++) db.test.save({x : i+1000000, Flags : null})
db.test.ensureIndex({ Flags : 1})
db.test.find({ Flags : { $type : 10}}).limit(1).explain()
{
    "cursor" : "BtreeCursor Flags_1",
    "nscanned" : 787654,
    "nscannedObjects" : 787654,
    "n" : 1,
    "millis" : 1513,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : {
        "Flags" : [
            [
                null,
                null
            ]
        ]
    }
}

787654 seems like a very random number of documents to scan. And for
some reason the presence of those 100 documents with Flags : null
changes nscanned from 1M to ~800k.

I'm sure there's a reason for this behaviour but I'm struggling to
understand it.

/Anders



Anders Jonsson

Sep 1, 2011, 8:24:51 AM
to mongodb-user
When I thought about it, I guessed that the "nscanned" in the second
example is caused by limit(1). The query probably happened upon a
document with Flags : null at position 787654, so ignore that example.
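
One way to test that theory would be to drop the limit(1) so the
cursor can't stop early; nscanned should then climb back up to the
full count of null-or-missing index entries:

> // no limit(1): the whole [null, null] index bound has to be walked
> db.test.find({Flags : {$type : 10}}).explain()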

It seems like the index doesn't distinguish between null and not set,
so MongoDB has to scan all documents where the field is null or not
set, even when querying for { $type : 10 }. Can anyone confirm this?

/Anders

Nat

Sep 1, 2011, 8:45:04 AM
to mongod...@googlegroups.com
- There is actually a subtle difference if you use a sparse index, where missing fields are not stored (see the sketch below).
- When you use a $type query, even though it can leverage an index, it cannot be satisfied by the index alone (it is not a covered query).
- If you search specifically for null, or for another value, it can simply use binary-search-like logic to find the exact match.
- When you insert data in order, your btree index will likely be in order and quite compact. That's probably why your new collection is much faster.
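
For example, on your test collection a sparse index should leave the
million missing-Flags documents out of the index entirely (documents
with an explicit null still get an entry, since the field exists
there), so a [null, null] scan would have far fewer entries to walk,
assuming the planner picks the sparse index for this query:

> db.test.dropIndex({Flags : 1})
> // sparse: documents without a Flags field get no index entry at all
> db.test.ensureIndex({Flags : 1}, {sparse : true})
> db.test.find({Flags : {$type : 10}}).limit(1).explain()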

Anders Jonsson

Sep 1, 2011, 9:18:36 AM
to mongodb-user
My new collection is much faster, yes, but it's still too slow to be
useful. This is not really a problem for us, since the find for
{ $type : 10} was, as I've mentioned multiple times, a one-time
thing. I'm just trying to find out the reason for this behaviour in
the database, because I don't like surprises, and the poor
performance here really surprised me; I'm very pleased with MongoDB's
performance in general.

A sparse index would work for this test case, but the index in my real
collection has a compound key, so it can't be sparse.
Covered indexes can't be used for arrays, right?

I can't really tell from your answer whether this behaviour is the
way it's supposed to be or not. Is this working as intended? If it
is, I can live with that. I just saw some odd behaviour and wanted to
point it out.

/Anders

Anders Jonsson

Sep 1, 2011, 9:37:28 AM
to mongodb-user
One last thing: if this behaviour is expected, maybe it should be
documented at http://www.mongodb.org/display/DOCS/Querying+and+nulls

/Anders

Kyle Banker

Sep 1, 2011, 12:28:38 PM
to mongod...@googlegroups.com
If you query for {flags: {$type: 10}}, then you're asking for every document with a 'flags' key whose value is of BSON type 10 (null).

When you insert 1M documents without the flags key and then index the flags key, by default the value of flags for every document
must be placed into the index. If the flags key doesn't exist for a document, then a null value is inserted into the index for that document. That's
why you're seeing an nscanned value of 1M, and that's why the query is slow.
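
A tiny throwaway demo of that (the collection name is just made up
for illustration):

> db.nulldemo.drop()
> db.nulldemo.save({a : 1})             // no Flags key at all
> db.nulldemo.save({Flags : null})      // explicit null
> db.nulldemo.ensureIndex({Flags : 1})
> // both documents sit under the same null index key, so nscanned is 2
> db.nulldemo.find({Flags : null}).explain().nscanned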

No, you can't use a covered index to return an indexed array.

Does this answer your question?

Anders Jonsson

Sep 1, 2011, 3:23:33 PM
to mongodb-user
Loud and clear :)

Thank you for the answer

/Anders

Nat

Sep 2, 2011, 4:56:23 AM
to mongod...@googlegroups.com
Read more about covered indexes here. It mentions explicitly that as soon as an index becomes a multikey index, covered-index behaviour is disabled:
http://www.mongodb.org/display/DOCS/Retrieving+a+Subset+of+Fields#RetrievingaSubsetofFields-CoveredIndexes
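
If it helps, a quick way to see the difference on the test collection
from earlier (adding a plain, non-multikey index on x for comparison):

> db.test.ensureIndex({x : 1})
> // plain index, projection limited to indexed fields: can be covered
> db.test.find({x : 1500000}, {x : 1, _id : 0}).explain().indexOnly    // true
> // multikey index on the array field: covering is disabled
> db.test.find({Flags : 1}, {Flags : 1, _id : 0}).explain().indexOnly  // false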