how to find index size (query benchmarks)

Valentin Kuznetsov

Sep 17, 2010, 12:17:46 PM

to mongodb-user

Hi,
I'm running benchmark tests and have found that queries degrade by a
significant factor. I'm not sure I can claim this, since all I see is
that queries take longer, so I'm looking for advice.
The setup I'm testing is the following. I inserted a certain number of
records, each 280 bytes in size, and measured the average response
time for a random query.
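A minimal sketch of the timing loop described above, assuming each query is wrapped in a Python callable. `mean_latency_ms` and its `run_query` argument are hypothetical names, not pymongo API; the function picks random patterns, times each call with `time.perf_counter`, and returns the mean in milliseconds.

```python
import random
import time
from typing import Callable, Sequence


def mean_latency_ms(run_query: Callable[[str], object],
                    patterns: Sequence[str],
                    trials: int = 100,
                    seed: int = 0) -> float:
    """Average wall-clock time (ms) of run_query over randomly chosen patterns.

    run_query is a hypothetical wrapper around the real query, e.g. one that
    calls list(db.merge.find({"block.name": re.compile(p)})).
    """
    rng = random.Random(seed)  # fixed seed so runs pick the same patterns
    total = 0.0
    for _ in range(trials):
        pattern = rng.choice(patterns)
        start = time.perf_counter()
        run_query(pattern)
        total += time.perf_counter() - start
    return 1000.0 * total / trials
```

If `run_query` wraps a pymongo query, it should iterate the cursor (for example with `list(...)`), because `find()` returns a lazy cursor and the query is not sent until results are requested.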

1. 500K records: the query response time was so small that I didn't
bother to run explain on it :)

2. 5M records: the query response time becomes noticeable; here is the
explain plan:
db.merge.find({'block.name':pat}).explain()
{u'allPlans': [{u'cursor': u'BtreeCursor block.name_-1 multi',
                u'indexBounds': {u'block.name': [[<_sre.sre_pattern>, <_sre.sre_pattern>],
                                                 [{}, u'']]}}],
 u'cursor': u'BtreeCursor block.name_-1 multi',
 u'indexBounds': {u'block.name': [[<_sre.sre_pattern>, <_sre.sre_pattern>],
                                  [{}, u'']]},
 u'millis': 5109,
 u'n': 80506,
 u'nscanned': 4488492,
 u'nscannedObjects': 80506,
 u'oldPlan': {u'cursor': u'BtreeCursor block.name_-1 multi',
              u'indexBounds': {u'block.name': [[<_sre.sre_pattern>, <_sre.sre_pattern>],
                                               [{}, u'']]}}}

3. 50M records: the query look-up time jumps by a factor of 10; here is
the explain plan:
{u'allPlans': [{u'cursor': u'BtreeCursor block.name_-1 multi',
                u'indexBounds': {u'block.name': [[<_sre.SRE_Pattern object at 0x3a2f250>,
                                                  <_sre.SRE_Pattern object at 0x3a2f250>],
                                                 [{}, u'']]}}],
 u'cursor': u'BtreeCursor block.name_-1 multi',
 u'indexBounds': {u'block.name': [[<_sre.SRE_Pattern object at 0x3a2f250>,
                                   <_sre.SRE_Pattern object at 0x3a2f250>],
                                  [{}, u'']]},
 u'millis': 49771,
 u'n': 865522,
 u'nscanned': 44988492,
 u'nscannedObjects': 865522,
 u'oldPlan': {u'cursor': u'BtreeCursor block.name_-1 multi',
              u'indexBounds': {u'block.name': [[<_sre.SRE_Pattern object at 0x3a2f250>,
                                                <_sre.SRE_Pattern object at 0x3a2f250>],
                                               [{}, u'']]}}}
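As a quick check on the two explain outputs above, dividing `millis` by `nscanned` gives roughly the same cost per scanned index key in both runs (a little over 1 µs), which is what a scan that grows linearly with the collection would produce. The figures in the sketch are copied from those outputs; nothing else is assumed.

```python
# Figures copied from the two explain() outputs (5M and 50M records).
runs = {
    "5M": {"millis": 5109, "nscanned": 4488492},
    "50M": {"millis": 49771, "nscanned": 44988492},
}

# Time per scanned index key, in microseconds.
for name, r in runs.items():
    us_per_key = 1000.0 * r["millis"] / r["nscanned"]
    print(f"{name}: {us_per_key:.2f} us per scanned key")

# If the scan is linear, both ratios should be close to each other.
scan_ratio = runs["50M"]["nscanned"] / runs["5M"]["nscanned"]
time_ratio = runs["50M"]["millis"] / runs["5M"]["millis"]
print(f"nscanned grew {scan_ratio:.1f}x, millis grew {time_ratio:.1f}x")
```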

What I observe is that query look-up time scales with the number of
records. What I don't know is whether the numbers I get are reasonable.
I also want to know whether my index fits into RAM, so that performance
doesn't degrade in my tests. My machine runs Linux with 8 cores and
16GB of RAM.
Thanks,
Valentin.
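On the question in the subject line: the server reports index sizes through the `collStats` command, whose result includes `totalIndexSize` and a per-index `indexSizes` map, both in bytes. In pymongo the result can be fetched with `db.command("collstats", "merge")`, and in the mongo shell `db.merge.totalIndexSize()` returns the total. The helper below is a hypothetical sketch that takes that result dictionary and compares it with available RAM; the `headroom` fraction is an assumption, since documents, the OS and other processes also need memory.

```python
from typing import Any, Mapping


def index_fits_in_ram(stats: Mapping[str, Any],
                      ram_bytes: int,
                      headroom: float = 0.5) -> bool:
    """Rough check: does the total index size fit in a fraction of RAM?

    `stats` is the dict returned by the collStats command, e.g.
    db.command("collstats", "merge") in pymongo. `headroom` (assumed 0.5)
    leaves the rest of RAM for documents, the OS and other processes.
    """
    total = stats["totalIndexSize"]
    for name, size in sorted(stats.get("indexSizes", {}).items()):
        print(f"{name}: {size / 2**20:.1f} MiB")
    print(f"total: {total / 2**30:.2f} GiB of {ram_bytes / 2**30:.0f} GiB RAM")
    return total <= headroom * ram_bytes
```

Keep in mind that a query which walks the whole index also fetches the matching documents, so the working set includes data pages as well as index pages; at 280 bytes per record, 50M records are already about 14 GB of document data before any storage overhead.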

Eliot Horowitz


Sep 17, 2010, 1:35:18 PM

to mongod...@googlegroups.com

explain() counts the time to find all of the matching records, in this
case 865522.
Also, do you have lots of the same values for block.name in the same
object? That's what it seems like, but it's hard to tell.

Can you run the explain from the shell and paste that output?


Valentin Kuznetsov


Sep 17, 2010, 3:04:57 PM

to mongodb-user

Yes, I generated identical records that differ only in block.name. Is
that bad? Here is the explain output from the mongo shell; I hope I got
it right. For completeness, here is the Python query:

import re
pat = re.compile("/W.*")  # matches values containing "/W"; the shell literal /W.*/ is the pattern "W.*"
db.merge.find({"block.name": pat}).explain()

and here is how I queried in the mongo shell:

> db.merge.find({"block.name": /W.*/}).explain();
{
"cursor" : "BtreeCursor block.name_-1 multi",
"nscanned" : 44988492,
"nscannedObjects" : 4433964,
"n" : 4433964,
"millis" : 478437,
"indexBounds" : {
"block.name" : [
[
/W.*/,
/W.*/
],
[
{

},
""
]
]
}
}

> db.merge.find({block.name : /W.*/}).explain();
SyntaxError: missing : after property id

Eliot Horowitz


Sep 17, 2010, 3:06:46 PM

to mongod...@googlegroups.com

OK, the problem is that a regex without a leading ^ can't use the index
to narrow the search. So it has to scan every entry in the index, which
increases linearly with the number of objects.
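To illustrate the difference, here is a simplified model (not MongoDB's actual implementation) of how a leading `^` turns a regex into a bounded index range. The index is modelled as a sorted Python list of toy keys; a pattern anchored with `^` and a literal prefix only needs the keys in [prefix, prefix with its last character incremented), located with `bisect`, while an unanchored pattern must test every key. `literal_prefix` and `scan` are hypothetical helper names.

```python
import bisect
import re
from typing import List, Optional, Tuple

SPECIAL = set(".^$*+?{}[]()|\\")


def literal_prefix(pattern: str) -> Optional[str]:
    """Literal prefix of a ^-anchored regex, or None if there is no usable prefix."""
    if not pattern.startswith("^") or "|" in pattern:
        return None  # unanchored (or alternation): no single key range
    prefix = []
    for ch in pattern[1:]:
        if ch in SPECIAL:
            if ch in "*?{" and prefix:
                prefix.pop()  # preceding char is optional, e.g. "^ab*"
            break
        prefix.append(ch)
    return "".join(prefix) or None


def scan(sorted_keys: List[str], pattern: str) -> Tuple[int, int]:
    """Return (keys_scanned, keys_matched) for a regex over a sorted index."""
    regex = re.compile(pattern)
    prefix = literal_prefix(pattern)
    if prefix is None:
        candidates = sorted_keys  # full index scan
    else:
        # Every key starting with `prefix` lies in [prefix, upper).
        upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
        lo = bisect.bisect_left(sorted_keys, prefix)
        hi = bisect.bisect_left(sorted_keys, upper)
        candidates = sorted_keys[lo:hi]
    matched = sum(1 for key in candidates if regex.search(key))
    return len(candidates), matched


keys = sorted(["/Wjet/a", "/Wjet/b", "/Zmumu/a", "/QCD/a", "/TTbar/W"])
print(scan(keys, "W.*"))  # unanchored: all 5 keys scanned -> (5, 3)
print(scan(keys, "^/W"))  # anchored: only the "/W" range -> (2, 2)
```

In pymongo terms this corresponds to querying with something like `re.compile("^/W")` rather than `re.compile("/W.*")`; the trailing `.*` does not change which values match, while the leading `^` is what allows a bounded range.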
