Question on index performance (nscannedObjects)


Milan Gornik

Dec 6, 2011, 11:19:15 AM
to mongodb...@googlegroups.com
Hi!

The typical query we run is like this:
db.Collection.find({VALUE: {$in: ["value1", "value2",
"value3"]}}).limit(25).sort({TIMESTAMP: -1});

We have the index {TIMESTAMP:-1, VALUE:1} in place. We were satisfied
with its performance even though it has to scan a large part of the
index (it walks the entries in TIMESTAMP order) – TIMESTAMP is our sort
criterion, so it worked well enough. We did notice a high nscanned
value when running explain() on our queries – practically matching the
whole dataset size. On the other hand, nscannedObjects was always
exactly the result count limit (set with the call to limit()). My
understanding was that the server reads everything it needs from the
index itself and only loads documents from disk once it is sure they
belong in the result set. Recently we have been getting far worse
performance for the same queries, and when we run explain() we can see
that nscannedObjects is much higher. Our total index size is still
smaller than the total memory on the server (it's roughly half the RAM
size), so I was wondering what can lead to this behavior and whether we
can make sure the whole index really is in memory.
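
For reference, this is roughly how we compare index size to available
RAM from the mongo shell (collection name as in the query above):

db.Collection.totalIndexSize()  // bytes used by all indexes on this collection
db.stats().indexSize            // total index size for the whole database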

Here is an example of explain() command output for some typical queries:
1) When searching for values which occur fairly frequently in our database:
{
"cursor" : "BtreeCursor TIMESTAMP_-1_VALUE_1 multi",
"nscanned" : 209,
"nscannedObjects" : 29,
"n" : 25,
"millis" : 1354,
"nYields" : 13,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
… }
2) When searching for values which occur less frequently:
{
"cursor" : "BtreeCursor TIMESTAMP_-1_VALUE_1 multi",
"nscanned" : 2613,
"nscannedObjects" : 135,
"n" : 25,
"millis" : 3840,
"nYields" : 70,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
… }
3) When searching for values which don't appear in our database:
{
"cursor" : "BtreeCursor TIMESTAMP_-1_VALUE_1 multi",
"nscanned" : 350382,
"nscannedObjects" : 16684,
"n" : 0,
"millis" : 369540,
"nYields" : 7254,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
… }

I wrote down the results when I initially tested this (with Mongo 1.8),
and for the last case nscannedObjects was always 0 when we searched for
a value that never occurs in the database. Could this be caused by
migrating to Mongo 2.0?

Thanks!
Milan Gornik

aaron

Dec 6, 2011, 5:29:14 PM
to mongodb-csharp
Hi Milan,

There are a couple of factors which have combined to make the behavior
with respect to nscannedObjects different for you in 2.0. One of the
side effects of the fix for SERVER-3448 is that we are returning some
more results to the matcher than we did previously, in order to avoid
unbounded scans in the indexing system. In addition, we are stricter
about loading documents from disk when dealing with multikey indexes.
This was part of the work for SERVER-958. Unfortunately there are now
some cases where we do not need to load documents from disk when they
are present in a multikey index, but we are doing it anyway. This is
the case with the query you are running. Improving the performance of
this is SERVER-3103, which hasn't been implemented yet.

One thing to look into is whether your index really needs to be
multikey. Do you have any documents where VALUE is an array?
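
A one-off check along these lines would answer that (a $where scan, so
it touches documents and is only meant as a diagnostic; collection and
field names taken from your example):

db.Collection.findOne({$where: "this.VALUE instanceof Array"})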

Another thing you might consider is adding an index on {VALUE:1}. The
index you have, {TIMESTAMP:-1, VALUE:1}, will perform well if the values
you are searching for are very frequent, but the index {VALUE:1} will
perform well if the values are infrequent.
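
Something along these lines, using hint() to force each plan while you
compare the explain() output (collection and values taken from your
example):

db.Collection.ensureIndex({VALUE: 1})
db.Collection.find({VALUE: {$in: ["value1", "value2", "value3"]}})
             .sort({TIMESTAMP: -1}).limit(25)
             .hint({VALUE: 1}).explain()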

You might also want to keep an eye on SERVER-3310, which is for doing
queries exactly like the one you are interested in.

Thanks,
Aaron

Milan Gornik

Dec 7, 2011, 5:20:45 AM
to mongodb-csharp
Hi Aaron,

Thanks for the reply, and especially for the details you provided. I am
thinking about our options now, and it seems most reasonable for us to
revert to Mongo 1.8. This will require downgrading instances, but we had
no problems combining 1.8 and 2.0 when we upgraded, so I suppose it
should work the other way around too.

Our VALUE field is an array field, used to provide text search. I tried
running the query with just one operand given to the $in operator and it
runs fine, so it really is the number of $in operands, combined with the
fact that VALUE is indexed as an array (multikey), that complicates things.
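
For reference, the single-operand form that still runs fine for us
(placeholder value):

db.Collection.find({VALUE: {$in: ["value1"]}})
             .limit(25).sort({TIMESTAMP: -1}).explain()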

Kind regards,
Milan Gornik

Chris Nagele

Dec 7, 2011, 3:49:26 PM
to mongodb-csharp
Hi Aaron,

Thanks for the detailed description.

Is there anything we can do besides downgrading to 1.8.3? We've already
upgraded all of our indexes, and doing it all over again on a cluster
with 400GB of data will be a pain.

Chris

aaron

Dec 12, 2011, 6:13:26 PM
to mongodb-csharp
Hi Chris,

Unfortunately there isn't a way to make this work without changing your
schema or indexes, or building new indexes. So if the need to rebuild
indexes is your only concern, you may be just as well off switching back
to 1.8.3 and keeping the schema that works for you.

Aaron
