Re: Slow query on collection

90 views
Skip to first unread message

Alexander

unread,
Sep 20, 2012, 5:56:05 AM9/20/12
to mongod...@googlegroups.com
Update. The indexes were pasted wrongly:
[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "ns" : "pinvault.pins",
                "name" : "_id_"
        },
        {
                "v" : 1,
                "key" : {
                        "serial" : 1,
                        "productid" : 1
                },
                "unique" : true,
                "ns" : "pinvault.pins",
                "name" : "serial_1_productid_1"
        },
        {
                "v" : 1,
                "key" : {
                        "tan" : 1
                },
                "unique" : true,
                "ns" : "pinvault.pins",
                "name" : "tan_1",
                "sparse" : true
        },
        {
                "v" : 1,
                "key" : {
                        "validthru" : 1
                },
                "ns" : "pinvault.pins",
                "name" : "validthru_1"
        },
        {
                "v" : 1,
                "key" : {
                        "validfrom" : 1
                },
                "ns" : "pinvault.pins",
                "name" : "validfrom_1"
        }
]


Index usage:

       "keysPerIndex" : {
                "pinvault.pins.$_id_" : 200000,
                "pinvault.pins.$serial_1_productid_1" : 200000,
                "pinvault.pins.$tan_1" : 3294,
                "pinvault.pins.$validthru_1" : 200000,
                "pinvault.pins.$validfrom_1" : 200000

        }

Am Donnerstag, 20. September 2012 07:53:51 UTC schrieb Alexander Sahler:
Hi all.

I've encountered a very strange behavior on mongodb.
I'm using version 2.2.0 on debian with the java driver.


I have two queries on the same dataset, one of which is blazingly fast, the other one (same syntax) is very slow.
The intention is, to select a secret pin from a collection (pins) and mark it as reserved for a distinct transaction (tan) in an atomic operation.

When logging the queries withe the profiler, one can see that the only difference in the query is the field 'poolid' and the trace number of the request, both given from outside. Please have a look on the timeLockedMicros attribute. The indexes seem to be very selective, so nscanned is low (actually 1).

{ "ts" : ISODate("2012-09-20T07:20:49.334Z"), "op" : "command", "ns" : "pinvault.$cmd", "command" : { "findandmodify" : "pins", "query" : { "productid" : 3868, "poolid" : 888, "state" : "n", "valid-from" : { "$lte" : ISODate("2012-09-20T07:21:30.423Z") }, "valid-thru" : { "$gt" : ISODate("2012-09-20T07:21:30.423Z") } }, "update" : { "$set" : { "state" : "r", "tan" : "BADFACE91888", "reserved" : ISODate("2012-09-20T07:21:30.423Z") } }, "new" : true }, "updateobj" : { "$set" : { "state" : "r", "tan" : "BADFACE91888", "reserved" : ISODate("2012-09-20T07:21:30.423Z") } }, "ntoreturn" : 1, "nscanned" : 1, "moved" : true, "nmoved" : 1, "nupdated" : 1, "keyUpdates" : 0, "numYield" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(0), "w" : NumberLong(383076) }, "timeAcquiringMicros" : { "r" : NumberLong(0), "w" : NumberLong(15) } }, "responseLength" : 332, "millis" : 383, "client" : "172.21.3.3", "user" : "" }


{ "ts" : ISODate("2012-09-20T07:26:15.816Z"), "op" : "command", "ns" : "pinvault.$cmd", "command" : { "findandmodify" : "pins", "query" : { "productid" : 3868, "poolid" : 999, "state" : "n", "valid-from" : { "$lte" : ISODate("2012-09-20T07:26:57.290Z") }, "valid-thru" : { "$gt" : ISODate("2012-09-20T07:26:57.290Z") } }, "update" : { "$set" : { "state" : "r", "tan" : "BADFACE92999", "reserved" : ISODate("2012-09-20T07:26:57.290Z") } }, "new" : true }, "updateobj" : { "$set" : { "state" : "r", "tan" : "BADFACE92999", "reserved" : ISODate("2012-09-20T07:26:57.290Z") } }, "ntoreturn" : 1, "nscanned" : 1, "moved" : true, "nmoved" : 1, "nupdated" : 1, "keyUpdates" : 0, "numYield" : 0, "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(0), "w" : NumberLong(1102) }, "timeAcquiringMicros" : { "r" : NumberLong(0), "w" : NumberLong(12) } }, "responseLength" : 314, "millis" : 1, "client" : "172.21.3.3", "user" : "" }

Here is the configuration of the database indexes:
[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "ns" : "pinvault.pins",
                "name" : "_id_"
        },
        {
                "v" : 1,
                "key" : {
                        "serial" : 1,
                        "productid" : 1
                },
                "unique" : true,
                "ns" : "pinvault.pins",
                "name" : "serial_1_productid_1"
        },
        {
                "v" : 1,
                "key" : {
                        "valid-from" : 1
                },
                "ns" : "pinvault.pins",
                "name" : "valid-from_1"
        },
        {
                "v" : 1,
                "key" : {
                        "tan" : 1
                },
                "unique" : true,
                "ns" : "pinvault.pins",
                "name" : "tan_1",
                "sparse" : true
        },
        {
                "v" : 1,
                "key" : {
                        "valid_thru" : 1
                },
                "ns" : "pinvault.pins",
                "name" : "valid_thru_1"
        }
]

Index stats:
"keysPerIndex" : {
                "pinvault.pins.$_id_" : 209999,
                "pinvault.pins.$serial_1_productid_1" : 209999,
                "pinvault.pins.$valid-from_1" : 209999,
                "pinvault.pins.$tan_1" : 57400,
                "pinvault.pins.$valid_thru_1" : 209999
        },

Data distribution:
[
        {
                "productid" : 3868,
                "state" : "n",
                "poolid" : 999,
                "csum" : 53392
        },
        {
                "productid" : 3868,
                "state" : "r",
                "poolid" : 999,
                "csum" : 56607
        },
        {
                "productid" : 3868,
                "state" : "r",
                "poolid" : 888,
                "csum" : 793
        },
        {
                "productid" : 3868,
                "state" : "n",
                "poolid" : 888,
                "csum" : 99207
        }
]


What else do you need?

gregor

unread,
Sep 20, 2012, 6:34:39 AM9/20/12
to mongod...@googlegroups.com
Can you run an find/explain from the mongo shell with the two queries?

Alexander

unread,
Sep 20, 2012, 8:16:52 AM9/20/12
to mongod...@googlegroups.com
Hi Gregor.

Thanks for your quick answer. I took the query part from the findandmodify command and did a plan explanation. It yields the following (for both the poolid 888 and poolid 999). But find() returns the whole set, so I did a limit(1). The times don't differ no matter if I set limit to 1 or not.

db.pins.find({       
     query: { productid: 3868, poolid: 999, state: "n",             
        validfrom: { $lte: new Date(1348142051761) },             
        validthru: { $gt: new Date(1348142051761) } },         
    sort: { validthru: 1 }
}).limit(1).explain()
{
        "cursor" : "BasicCursor",
        "isMultiKey" : false,
        "n" : 0,
        "nscannedObjects" : 200000,
        "nscanned" : 200000,
        "nscannedObjectsAllPlans" : 200000,
        "nscannedAllPlans" : 200000,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 148,
        "indexBounds" : {

        },
        "server" : "cassi1:27017"

gregor

unread,
Sep 24, 2012, 5:04:47 AM9/24/12
to mongod...@googlegroups.com
For some reason this query didn't use an index - can you list indexes with
db.system.indexes.find()

Alexander

unread,
Sep 24, 2012, 6:11:39 AM9/24/12
to mongod...@googlegroups.com
Hello Gregor.

Here you go:

rs1:PRIMARY> db.system.indexes.find()
{ "v" : 1, "key" : { "_id" : 1 }, "ns" : "pinvault.pins", "name" : "_id_" }
{ "v" : 1, "key" : { "serial" : 1, "productid" : 1 }, "unique" : true, "ns" : "pinvault.pins", "name" : "serial_1_productid_1" }
{ "v" : 1, "key" : { "tan" : 1 }, "unique" : true, "ns" : "pinvault.pins", "name" : "tan_1", "sparse" : true }
{ "v" : 1, "key" : { "validthru" : 1 }, "ns" : "pinvault.pins", "name" : "validthru_1" }
{ "v" : 1, "key" : { "validfrom" : 1 }, "ns" : "pinvault.pins", "name" : "validfrom_1" }
{ "v" : 1, "key" : { "productid" : 1, "poolid" : 1, "state" : 1 }, "ns" : "pinvault.pins", "name" : "productid_1_poolid_1_state_1" }

Best regards, Alexander.

gregor

unread,
Sep 24, 2012, 7:06:55 AM9/24/12
to mongod...@googlegroups.com
Is this product_id_1_poolid_1_state_1 index new? For some reason the query is not selecting any index and is using basic cursor. Are you using a sharded cluster or a replica set? Can you try without the sort clause and see if it uses an index? Or maybe try hint 
to hint this index and paste the results. 

Alexander

unread,
Sep 24, 2012, 7:14:09 AM9/24/12
to mongod...@googlegroups.com
What do you mean by 'is this index new? I'm using a two-server replica set (rs1) without sharding.

I tried a hint query already: As this seems to work, the nscannedObjects is still high:

rs1:PRIMARY> db.pins.find({      query: { productid: 3868, poolid: 999, state: "n",                       validfrom: { $lte: new Date(1348142051761) },                       validthru: { $gt: new Date(1348142051761) } },               sort: { validthru: 1 }  }).hint("productid_1_poolid_1_state_1").limit(1).explain()
{
"cursor" : "BtreeCursor productid_1_poolid_1_state_1",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 300000,
"nscanned" : 300000,
"nscannedObjectsAllPlans" : 300000,
"nscannedAllPlans" : 300000,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 461,
"indexBounds" : {
"productid" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"poolid" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"state" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "cassi1:27017"
}

And without the 'sort':

rs1:PRIMARY> db.pins.find({      query: { productid: 3868, poolid: 999, state: "n",                       validfrom: { $lte: new Date(1348142051761) },                       validthru: { $gt: new Date(1348142051761) } } }).hint("productid_1_poolid_1_state_1").limit(1).explain()
{
"cursor" : "BtreeCursor productid_1_poolid_1_state_1",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 300000,
"nscanned" : 300000,
"nscannedObjectsAllPlans" : 300000,
"nscannedAllPlans" : 300000,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 443,
"indexBounds" : {
"productid" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"poolid" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
],
"state" : [
[
{
"$minElement" : 1
},
{
"$maxElement" : 1
}
]
]
},
"server" : "cassi1:27017"
}


Alexander

unread,
Sep 24, 2012, 7:15:07 AM9/24/12
to mongod...@googlegroups.com
Without sort + hint:

db.pins.find({      query: { productid: 3868, poolid: 999, state: "n",                       validfrom: { $lte: new Date(1348142051761) },                       validthru: { $gt: new Date(1348142051761) } } }).limit(1).explain()
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 300000,
"nscanned" : 300000,
"nscannedObjectsAllPlans" : 300000,
"nscannedAllPlans" : 300000,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 227,

Alexander

unread,
Sep 24, 2012, 8:23:11 AM9/24/12
to mongod...@googlegroups.com
The main problem seems to be the sort operation. If I remove it, I can do up to 160 updates per second!

But the challenge is, to get the data out of the database, that's within a certain validity date. If pins are taken in natural order it may happen, that pins with a later validity date are preferred about others, that will just end their validity in the near future.

gregor

unread,
Sep 24, 2012, 9:40:57 AM9/24/12
to mongod...@googlegroups.com
The number of documents returned by your find is always 0
"n": 0
Can you paste an example of a document in that collection?

Alexander

unread,
Sep 24, 2012, 11:09:53 AM9/24/12
to mongod...@googlegroups.com
Oops you're right. I imported the database again and the validfrom timestamp was bigger then. So I corrected the query:

db.pins.find({ query: { productid: 3868, poolid: 999, state: "n",
                        "validfrom" : { "$lte" : ISODate("2012-09-22T10:47:13.961Z")} ,          
                        "validthru" : { "$gt" : ISODate("2012-09-22T10:47:13.961Z")} },
               sort: { validthru: 1 }  }).limit(1)
Gives:
{ "_id" : ObjectId("505ae02a30c7d7555f2e3251"), "serial" : "5911058", "poolid" : 999, "productid" : 3868, "encpin" : "93FCFD8A66695F44CBF01D2D0F341FDB22B41F501C352F72", "validthru" : ISODate("2099-12-31T23:59:59Z"), "validfrom" : ISODate("2012-09-20T09:21:28.344Z"), "imported" : ISODate("2012-09-20T09:21:28.344Z"), "state" : "n" }

The same query with explain gives (!):
rs1:PRIMARY> db.pins.find({      query: { productid: 3868, poolid: 999, state: "n",                       "validfrom" : { "$lte" : ISODate("2012-09-22T10:47:13.961Z")} ,          "validthru" : { "$gt" : ISODate("2012-09-22T10:47:13.961Z")} },      sort: { validthru: 1 }  }).limit(1).explain()
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 0,
"nscannedObjects" : 300000,
"nscanned" : 300000,
"nscannedObjectsAllPlans" : 300000,
"nscannedAllPlans" : 300000,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 226,
Message has been deleted

Alexander

unread,
Sep 25, 2012, 10:44:31 AM9/25/12
to mongod...@googlegroups.com
The answer lies in the query syntax:

Querying with find( { query:{ ...} } ) has to be written as find(  { $query : {...} } ). But that does not work either.
Only with find( { <document> }).sort(...).limit(1) it works:

> db.pins.find({ poolid: 999, productid: 3868, state: "n",              
        validfrom : { "$lte" : ISODate("2012-09-22T10:47:13.961Z")} , 
        validthru : { "$gt" : ISODate("2012-09-22T10:47:13.961Z")}     
}).sort( { validthru: 1 } ).limit(1).explain()
{
"cursor" : "BtreeCursor poolid_1_productid_1_state_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 96191,
"nscanned" : 96191,
"nscannedObjectsAllPlans" : 577149,
"nscannedAllPlans" : 577149,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 2787,
"indexBounds" : {
"poolid" : [
[
888,
888
]
],
"productid" : [
[
3868,
3868
]
],
"state" : [
[
"n",
"n"
]
]
},
"server" : "cassi1:27017"
}

But the query takes very long due to the sort. The problem is, I need the sort to get the document that expires next. When now executing the query, I don't get the pin with the smallest 'validthru' value anyways:

rs1:PRIMARY> db.pins.find({ poolid: 999, productid: 3868, state: "n",              
...         validfrom : { "$lte" : ISODate("2012-09-22T10:47:13.961Z")} , 
...         validthru : { "$gt" : ISODate("2012-09-22T10:47:13.961Z")}     
... }).sort( { validthru: 1 } ).limit(1)
{ "_id" : ObjectId("505ae02a30c7d7555f2e3251"), "serial" : "5911058", "poolid" : 999, "productid" : 3868, "encpin" : "93FCFD8A66695F44CBF01D2D0F341FDB22B41F501C352F72", "validthru" : ISODate("2099-12-31T23:59:59Z"), "validfrom" : ISODate("2012-09-20T09:21:28.344Z"), "imported" : ISODate("2012-09-20T09:21:28.344Z"), "state" : "n" }

According to the data, there are pins that expire earlier:
> db.pins.group(
...     { query: {poolid: 999, state: "n"},
...       key: { productid: true, validthru : true, poolid: true, state : true },
...       reduce: function(obj,prev) { prev.csum++ },
...       initial: { csum: 0 }
...     }
... )
[
{
"productid" : 3868,
"validthru" : ISODate("2099-12-31T23:59:59Z"),
"poolid" : 999,
"state" : "n",
"csum" : 170376
},
{
"productid" : 3868,
"validthru" : ISODate("2029-12-31T23:59:59Z"),
"poolid" : 999,
"state" : "n",
"csum" : 99900
}
]




Rob Moore

unread,
Sep 25, 2012, 9:47:17 PM9/25/12
to mongod...@googlegroups.com

Can you extend this index to include the validthru field?


{ "v" : 1, "key" : { "productid" : 1, "poolid" : 1, "state" : 1 }, "ns" : "pinvault.pins", "name" : "productid_1_poolid_1_state_1" }

e.g.,

db.pins.createIndex( { "productid" : 1, "poolid" : 1, "state" : 1, "validthru" : 1 }  )

That will allow the sort to use the index in addition to the query.

You don't need the old version of the index once you have the new one.

Rob.

Alexander

unread,
Sep 26, 2012, 5:20:42 AM9/26/12
to mongod...@googlegroups.com
Hi Rob.

In the meantime I had already created an index containing validfrom and validthru as well as productid, poolid and state with no improvement.
The interesting thing is, that if I change the dates in the query, it's blazing fast:

 db.pins.find({ poolid: 999, productid: 3868, state: "n",         validfrom : { "$lte" : ISODate("2012-09-25T10:47:13.961Z")} ,          validthru : { "$gt" : ISODate("2012-09-25T10:47:13.961Z")}      }).hint("poolid_1_productid_1_state_1_validthru_1").sort({validthru:1}).limit(1).explain()
{
"cursor" : "BtreeCursor poolid_1_productid_1_state_1_validthru_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 1,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"poolid" : [
[
999,
999
]
],
"productid" : [
[
3868,
3868
]
],
"state" : [
[
"n",
"n"
]
],
"validthru" : [
[
ISODate("2012-09-25T10:47:13.961Z"),
ISODate("292278995-01--2147483647T07:12:56.808Z")
]
]
},
"server" : "cassi1:27017"
}

If I use another date parameter (2012-09-22T10:47:13.961Z), It's slow and scans a lot of. data.

db.pins.find({ poolid: 999, productid: 3868, state: "n",         validfrom : { "$lte" : ISODate("2012-09-22T10:47:13.961Z")} ,          validthru : { "$gt" : ISODate("2012-09-22T10:47:13.961Z")}      }).hint("poolid_1_productid_1_state_1_validthru_1").sort({validthru:1}).limit(1).explain()
{
"cursor" : "BtreeCursor poolid_1_productid_1_state_1_validthru_1",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 99901,
"nscanned" : 99901,
"nscannedObjectsAllPlans" : 99901,
"nscannedAllPlans" : 99901,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 416,
"indexBounds" : {
"poolid" : [
[
999,
999
]
],
"productid" : [
[
3868,
3868
]
],
"state" : [
[
"n",
"n"
]
],
"validthru" : [
[
ISODate("2012-09-22T10:47:13.961Z"),
ISODate("292278995-01--2147483647T07:12:56.808Z")
]
]
},
"server" : "cassi1:27017"
}

So I examined the distribution of dates/times in the data returned by the query:
First time for dates 2012-09-25:
db.pins.group(
    { query: { poolid: 999, productid: 3868, state: "n",
            validfrom : { "$lte" : ISODate("2012-09-25T10:47:13.961Z")},
            validthru : { "$gt" : ISODate("2012-09-25T10:47:13.961Z")} },
      key: { productid: true, validthru : true, validfrom: true, poolid: true, state : true },
      reduce: function(obj,prev) { prev.csum++ },
      initial: { csum: 0 }
    }
)
[
{
"productid" : 3868,
"validthru" : ISODate("2029-12-31T23:59:59Z"),
"validfrom" : ISODate("2012-09-25T07:09:59.642Z"),
"poolid" : 999,
"state" : "n",
"csum" : 99900
},
{
"productid" : 3868,
"validthru" : ISODate("2099-12-31T23:59:59Z"),
"validfrom" : ISODate("2012-09-20T09:21:28.344Z"),
"poolid" : 999,
"state" : "n",
"csum" : 72574
},
{
"productid" : 3868,
"validthru" : ISODate("2099-12-31T23:59:59Z"),
"validfrom" : ISODate("2012-09-24T10:24:51.553Z"),
"poolid" : 999,
"state" : "n",
"csum" : 97802
}
]

And then for 2012-09-22 (!):
rs1:PRIMARY> db.pins.group(     { query: { poolid: 999, productid: 3868, state: "n",             validfrom : { "$lte" : ISODate("2012-09-22T10:47:13.961Z")},             validthru : { "$gt" : ISODate("2012-09-22T10:47:13.961Z")} },       key: { productid: true, validthru : true, validfrom: true, poolid: true, state : true },       reduce: function(obj,prev) { prev.csum++ },       initial: { csum: 0 }     } )
[
{
"productid" : 3868,
"validthru" : ISODate("2099-12-31T23:59:59Z"),
"validfrom" : ISODate("2012-09-20T09:21:28.344Z"),
"poolid" : 999,
"state" : "n",
"csum" : 72574
}
]

Obviously, the pins imported on 25.09 show not up as they are 'not yet valid'. These are exactly 99900 which is nscannedObjects-1 from the slow query! So it seems that all of these are scanned by the query.

So I created the index like this: 
db.pins.ensureIndex(
    { poolid: 1, productid: 1,  state: 1, validthru: 1, validfrom: 1}
).

And now both queries are done in no time!




Reply all
Reply to author
Forward
0 new messages