How to resolve the $or sort not using index problem

164 views
Skip to first unread message

Hanson Lu

unread,
Sep 7, 2012, 4:43:01 AM9/7/12
to mongod...@googlegroups.com
My app has a collection like this 

  sender:'1000', 
  receiver:'9999',
  type:'text',
  content:' this is content' ,
  sentDate:ISODate("2011-10-12T14:54:02.069Z)
}

The collection records number > 50M. 

There is a requirement to query records between sender 'X' and receiver 'Y' in short time. 

First, i create index, {sender:1, receiver:1, sentDate:-1}, use query 

  db.msgs.find({$or:[{sender:'1000',receiver:'9999'}, { sender:'9999', receiver:'1000'}]}).sort(sentDate:-1}, 

but this query will not use the above index, it do scan full table.  The problem is known as 

Next, I change the structure of the collection like this 

  sender:'1000', 
  receiver:'9999',
  type:'text',
  both:['1000','9999']
  content:' this is content' ,
  sentDate:ISODate("2011-10-12T14:54:02.069Z)
}

add a new field named 'both' to store 'sender' and 'receiver' , and use query 

db.msgs.find({both:{$all:['1000','9999']}).sort(sendDate:-1},.

The query do hit index, but if  sender '1000'  has large number of records(>100000), 

the query become very slow, i saw lots page fault from mongostat.

It seems MongoDB will load all documents for indexed for {both:'1000'},  and compare whether the 'both' contents is exactly['1000','9999'].
is it right? 

I have no idea how to resolve the problem. Can you give me some suggestion.

Regards.

Hanson Lu

unread,
Sep 7, 2012, 4:54:56 AM9/7/12
to mongod...@googlegroups.com
I forgot to mention that i have created the index {both:1,sentDate:-1}  for the query db.msgs.find({both:{$all:['1000','9999']}).sort(sendDate:-1},.

Kay

unread,
Sep 7, 2012, 5:46:40 PM9/7/12
to mongod...@googlegroups.com
Hi Hanson -- 

We have a ticket to implement index intersection which I believe will provide a solution in the future. 

One possible suggestion -- instead of the both field being an array, if you make it an embedded document
so that you would have

{
sender:'1000', 
receiver:'9999',
type:'text',
both: { s: '1000', r: '9999'},
content:' this is content' ,
sentDate:ISODate("2011-10-12T14:54:02.069Z)
}

and
 
{
sender:'9999', 
receiver:'1000',
type:'text',
both: { s: '9999', r: '1000'},
content:' this is content' ,
sentDate:ISODate("2011-10-12T15:54:02.069Z)
}

then, you would create the index 

db.msgs.ensureIndex({both: 1, sendDate:-1})

and your query would use the $in operator rather than the $all

db.msgs.find( { both: { $in: [ { s:'1000', r:'9999' }, {s:'9999', r:'1000'} ] } } ).sort({sendDate:-1})

--

And just a summary of what was happening with the your earlier queries (it already seems like you have a good grasp on the situation):

1)  Regarding your initial query with $or and the compound-key index on {sender:1, receiver:1, sentDate:-1} 

2)  Regarding your second query with $and and the array structure
  • creating an index on an array field creates an index on each element of the array (http://www.mongodb.org/display/DOCS/Multikeys)
  • an $all looks up just the first element in the index (which is the first element in the both array)

Hope this helps.

Kay

Hanson Lu

unread,
Sep 8, 2012, 2:18:12 AM9/8/12
to mongod...@googlegroups.com
Many Thanks, Kay. It is a good suggestion.

My app also has another requirement that query records of someone(sender is 'X' or receiver is 'X')

so when  both is  array, i can query like 

 db.msgs.find({both:'1000}').sort({sendDate:-1}

if both is embedded document,  it seems that it can not support the query.

Sam Millman

unread,
Sep 8, 2012, 7:46:44 AM9/8/12
to mongod...@googlegroups.com
It sounds like you would just modify the query above to just do an $or on the r and s field rather than a compound $or on groups of the r and s field within the subdocument.

--
You received this message because you are subscribed to the Google
Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com
To unsubscribe from this group, send email to
mongodb-user...@googlegroups.com
See also the IRC channel -- freenode.net#mongodb

Sam Millman

unread,
Sep 8, 2012, 7:51:25 AM9/8/12
to mongod...@googlegroups.com
Sorry I mean you could replace that with two $ins, I think that will use index.

Sam Millman

unread,
Sep 8, 2012, 10:04:58 AM9/8/12
to mongod...@googlegroups.com
I just realised that won't work either. I did have a plan but after more thought I realised it wouldnt do the trick.

Probably your best bet, only if the query is slow, is to store two forms of it. One in that form that Kay talked about and another in the form you had originally.

Though I doubt your query will be slow so if you put a index on the query itself the sort would probably be instantaneous unless your hoping to pull lots of records.

Kay

unread,
Sep 8, 2012, 11:09:43 AM9/8/12
to mongod...@googlegroups.com
Oops -- 
just realized I typed "sendDate" instead of "sentDate" in both the index creation and find statements.

The correct statements would be:

db.msgs.ensureIndex({both: 1, sentDate:-1})
db.msgs.find( { both: { $in: [ { s:'1000', r:'9999' }, {s:'9999', r:'1000'} ] } } ).sort({sentDate:-1})

Kay

unread,
Sep 8, 2012, 1:25:50 PM9/8/12
to mongod...@googlegroups.com
Hi Hanson -- 

It does seem like it's hard to get away from the $or and sort. 

I agree with Sam's solution in that within your document, you would also need your original solution of an array structure.  

I'll keep tinkering with it and if I run across something, I'll let you know.

Best regards,

Kay 

Hanson Lu

unread,
Sep 9, 2012, 5:19:47 AM9/9/12
to mongod...@googlegroups.com
It seems that is a way to resolve my issue, though index size will be very large.

Thanks,  Sammaye  and Kay.

Hanson Lu

unread,
Sep 10, 2012, 3:21:32 AM9/10/12
to mongod...@googlegroups.com
Hi Kay
  I tested the query with  the index you suggest. I found a problem that it will scan all documents of sender 'X' and receiver 'Y'.
  db.msgs.find( { both: { $in: [ { s:'1000', r:'9999' }, {s:'9999', r:'1000'} ] } } ).limit(5).sort({sentDate:-1}).explain()
  {
        "cursor" : "BtreeCursor both_1_type_1_sentDate_-1 multi",
        "nscanned" : 28,
        "nscannedObjects" : 26,
        "n" : 5,
        "scanAndOrder" : true,
        "millis" : 0,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "isMultiKey" : false,
        "indexOnly" : false,
 }

I think if the result set is little, the solution is acceptable.

Kay

unread,
Sep 10, 2012, 11:30:35 PM9/10/12
to mongod...@googlegroups.com
Hi Hanson -- 
Thanks so much for the update.  I was actually quite curious to see how the query would work for you.

If I do come across something that might work better, I'll definitely keep you posted.

Regards,

Kay

Asya Kamsky

unread,
Sep 11, 2012, 12:08:32 AM9/11/12
to mongod...@googlegroups.com
You might check out this comment (if you are using 2.2, which I think you must be):


This explains why the nscanned may be higher than you would be expecting.

Hanson Lu

unread,
Sep 11, 2012, 1:05:20 AM9/11/12
to mongod...@googlegroups.com
I'am using version 2.0.2 

The problem is that  I think nscannedObjects  should be equal to 5 that i limited in the query.

Asya Kamsky

unread,
Sep 11, 2012, 2:23:42 AM9/11/12
to mongod...@googlegroups.com
The fix for SERVER-5063 would improve things by using the index to sort but even then I can't see how nscanned can be less than 10.
You have two values which are being read from the index - limiting each of them to 5 is doable but not 5 (you should see 5 if there was not an "$in" or "$or" expression.

Asya

Hanson Lu

unread,
Sep 11, 2012, 3:00:20 AM9/11/12
to mongod...@googlegroups.com
Hi Asya

 I verified with 2.2.0 just now. The problem is fixed , and nscanobject is equal to the limit count(5).  
 {
        "isMultiKey" : false,
        "n" : 5,
        "nscannedObjects" : 5,
        "nscanned" : 6,
        "nscannedObjectsAllPlans" : 10,
        "nscannedAllPlans" : 11,
        "scanAndOrder" : true,
        "indexOnly" : false,
 }

Thanks!
Regards

Hanson Lu

unread,
Sep 11, 2012, 3:31:00 AM9/11/12
to mongod...@googlegroups.com
Update..
The query int above i typing miss a field  'type' . The correct query is 
  db.msgs.find( { both: { $in: [ { s:'1000', r:'9999' }, {s:'9999', r:'1000'} ] } , type:'text'} ).limit(5).sort({sentDate:-1})

if use the query without 'type' in version 2.2, it will scan all matched document in  query with 'limit'.
Reply all
Reply to author
Forward
0 new messages