Sort with hashed index leads to full table scan

107 views
Skip to first unread message

Tecbot

unread,
Feb 26, 2013, 12:02:08 PM2/26/13
to mongod...@googlegroups.com
Hi,

I have a issue with the new hashed index type.

The entries in my collection has a field "_id" and "user_id".
I created a hashed index on user_id.
I sharded the collection with a hashed shard key on user_id.
If I execute queries to find something with the shard key all works fine.
If I execute queries with a sort by _id, the mongos router finds the correct shard BUT the shard self scans all objects and ignores the shard key on sorting time but returns the correct result on the end.
Can anyone explain me this behaviour? The only solution I have found was to create also a default index on user_id.

Thanks & Regards
Thomas

Asya Kamsky

unread,
Feb 26, 2013, 3:14:34 PM2/26/13
to mongod...@googlegroups.com
Did you mean to say sorted by '_id' or sorted by 'user_id' ?   If you meant to say user_id then the following is my answer:

Because the indexed value is hashed, the shard key index cannot be used for sorting (basically what it would sort would be the hashed value which does not correspond to the real ordering of the shard key).

Your solution of having a second normal index on this value is okay but if your user_id is monotonically increasing (causing you to need to hash it to allow effective sharding) is it possible that sorting by _id would give you at least approximately the same ordering?

Asya
P.S. if you really meant that you wanted it sorted by _id then I don't understand why _id index wouldn't be used by the sort unless you also query by multiple other fields?  Providing the full query and its explain plan would probably clarify things.

Tecbot

unread,
Feb 26, 2013, 3:40:23 PM2/26/13
to mongod...@googlegroups.com
I mean sorted by '_id'.

Here are my queries which works correct: (user_id is in this case "u.$id")

db.entries.find({"u.$id":ObjectId("5124aa20dfee8fb462469370")})
db.entries.find({"u.$id":ObjectId("5124aa20dfee8fb462469370", "n": true)}) // works also correctly (no index on "n")

{
        "clusteredType" : "ParallelSort",
        "shards" : {
                "rs-2/10.2.0.201:10012,10.2.0.202:10012,10.2.0.203:10012" : [
                        {
                                "cursor" : "BtreeCursor u.$id_hashed",
                                "isMultiKey" : false,
                                "n" : 0,
                                "nscannedObjects" : 1,
                                "nscanned" : 1,
                                "nscannedObjectsAllPlans" : 1,
                                "nscannedAllPlans" : 1,
                                "scanAndOrder" : false,
                                "indexOnly" : false,
                                "nYields" : 0,
                                "nChunkSkips" : 0,
                                "millis" : 0,
                                "indexBounds" : {
                                        "u.$id" : [
                                                [
                                                        NumberLong("-3850829528314550651"),
                                                        NumberLong("-3850829528314550651")
                                                ]
                                        ]
                                },
                                "server" : "db2:10012"
                        }
                ]
        },
        "cursor" : "BtreeCursor u.$id_hashed",
        "n" : 0,
        "nChunkSkips" : 0,
        "nYields" : 0,
        "nscanned" : 1,
        "nscannedAllPlans" : 1,
        "nscannedObjects" : 1,
        "nscannedObjectsAllPlans" : 1,
        "millisShardTotal" : 0,
        "millisShardAvg" : 0,
        "numQueries" : 1,
        "numShards" : 1,
        "indexBounds" : {
                "u.$id" : [
                        [
                                NumberLong("-3850829528314550651"),
                                NumberLong("-3850829528314550651")
                        ]
                ]
        },
        "millis" : 1
}

But if I execute this query:

db.entries.find({"u.$id":ObjectId("5124aa20dfee8fb462469370")}).sort({_id:1})

Its scans the full table and don't use the hashed index:

{
        "clusteredType" : "ParallelSort",
        "shards" : {
                "rs-2/10.2.0.201:10012,10.2.0.202:10012,10.2.0.203:10012" : [
                        {
                                "cursor" : "BtreeCursor _id_",
                                "isMultiKey" : false,
                                "n" : 1,
                                "nscannedObjects" : 50050655,
                                "nscanned" : 50050655,
                                "nscannedObjectsAllPlans" : 100101310,
                                "nscannedAllPlans" : 100101310,
                                "scanAndOrder" : false,
                                "indexOnly" : false,
                                "nYields" : 5,
                                "nChunkSkips" : 0,
                                "millis" : 213172,
                                "indexBounds" : {
                                        "_id" : [
                                                [
                                                        {
                                                                "$minElement" : 1
                                                        },
                                                        {
                                                                "$maxElement" : 1
                                                        }
                                                ]
                                        ]
                                },
                                "server" : "db2:10012"
                        }
                ]
        },
        "cursor" : "BtreeCursor _id_",
        "n" : 1,
        "nChunkSkips" : 0,
        "nYields" : 5,
        "nscanned" : 50050655,
        "nscannedAllPlans" : 100101310,
        "nscannedObjects" : 50050655,
        "nscannedObjectsAllPlans" : 100101310,
        "millisShardTotal" : 213172,
        "millisShardAvg" : 213172,
        "numQueries" : 1,
        "numShards" : 1,
        "indexBounds" : {
                "_id" : [
                        [
                                {
                                        "$minElement" : 1
                                },
                                {
                                        "$maxElement" : 1
                                }
                        ]
                ]
        },
        "millis" : 213173
}

Thomas

Asya Kamsky

unread,
Feb 26, 2013, 6:03:08 PM2/26/13
to mongod...@googlegroups.com
Is user_id unique?

If so why sort on _id?

Could you try to use hint to see if that will get the behavior you expect?  

Asya

Tecbot

unread,
Feb 26, 2013, 6:18:52 PM2/26/13
to mongod...@googlegroups.com
No, user_id is not unique.
This collection has many entries for a user and we sort on _id DESC to get the latest entries first.

With hint, it works like expected. But IMHO it should be work without too, or not?

Thanks
Thomas

Asya Kamsky

unread,
Feb 27, 2013, 7:05:02 AM2/27/13
to mongod...@googlegroups.com
I have reproduced this behavior, it happens with any sort variable (does not have to be _id) but only with hashed indexes, not with regular indexes.  Sharding doesn't have to be involved.

It may be a manifestation of a known - I see you already filed a ticket?  I added my notes there.  > db.idx.find({x:5}).sort({y:1}).hint({x:'hashed'}).explain(true)

Asya
Reply all
Reply to author
Forward
0 new messages