Received: by 10.66.76.130 with SMTP id k2mr3479009paw.16.1347344634715; Mon, 10 Sep 2012 23:23:54 -0700 (PDT) X-BeenThere: mongodb-user@googlegroups.com Received: by 10.68.195.39 with SMTP id ib7ls549004pbc.9.gmail; Mon, 10 Sep 2012 23:23:43 -0700 (PDT) Received: by 10.68.233.130 with SMTP id tw2mr963796pbc.12.1347344623263; Mon, 10 Sep 2012 23:23:43 -0700 (PDT) Date: Mon, 10 Sep 2012 23:23:42 -0700 (PDT) From: Asya Kamsky To: mongodb-user@googlegroups.com Message-Id: In-Reply-To: <9c5d805a-56d5-4bc4-acf0-d9b81abb836d@googlegroups.com> References: <7a94600f-8ca6-4fa6-8e13-6c88b3030aad@googlegroups.com> <59f260fd-1116-4894-a39d-dde98a0baea2@googlegroups.com> <6ea29df3-ba70-4292-825b-d615ea641968@googlegroups.com> <1ea6e074-a0cc-4426-a0ef-d907f8605fb2@googlegroups.com> <6e66baf4-8656-42ce-95f3-96f89e7e75d9@googlegroups.com> <9c5d805a-56d5-4bc4-acf0-d9b81abb836d@googlegroups.com> Subject: Re: How to resolve the $or sort not using index problem MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_Part_911_4909429.1347344622810" ------=_Part_911_4909429.1347344622810 Content-Type: multipart/alternative; boundary="----=_Part_912_2430730.1347344622810" ------=_Part_912_2430730.1347344622810 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit The fix for SERVER-5063 would improve things by using the index to sort but even then I can't see how nscanned can be less than 10. You have two values which are being read from the index - limiting each of them to 5 is doable but not 5 (you should see 5 if there was not an "$in" or "$or" expression. Asya On Tuesday, September 11, 2012 2:05:20 AM UTC-3, Hanson Lu wrote: > > I'am using version 2.0.2 > > The problem is that I think *nscannedObjects should be equal to 5 that > i limited in the query.* > * > * > On Tuesday, September 11, 2012 12:08:32 PM UTC+8, Asya Kamsky wrote: > >> You might check out this comment (if you are using 2.2, which I think you >> must be): >> >> >> https://jira.mongodb.org/browse/SERVER-5063?focusedCommentId=138239&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-138239 >> >> This explains why the nscanned may be higher than you would be expecting. >> >> On Monday, September 10, 2012 4:21:32 AM UTC-3, Hanson Lu wrote: >>> >>> Hi Kay >>> I tested the query with the index you suggest. I found a problem that >>> it will scan all documents of sender 'X' and receiver 'Y'. >>> db.msgs.find( { both: { $in: [ { s:'1000', r:'9999' }, {s:'9999', >>> r:'1000'} ] } } ).limit(5).sort({sentDate:-1}).explain() >>> { >>> "cursor" : "BtreeCursor both_1_type_1_sentDate_-1 multi", >>> * "nscanned" : 28,* >>> * "nscannedObjects" : 26*, >>> * "n" : 5,* >>> "scanAndOrder" : true, >>> "millis" : 0, >>> "nYields" : 0, >>> "nChunkSkips" : 0, >>> "isMultiKey" : false, >>> "indexOnly" : false, >>> } >>> >>> I think if the result set is little, the solution is acceptable. >>> >>> >>> On Saturday, September 8, 2012 11:09:43 PM UTC+8, Kay wrote: >>> >>>> Oops -- >>>> just realized I typed "sen*d*Date" instead of "sen*t*Date" in both the >>>> index creation and find statements. >>>> >>>> The correct statements would be: >>>> >>>> *db.msgs.ensureIndex({both: 1, sentDate:-1})* >>>> *db.msgs.find( { both: { $in: [ { s:'1000', r:'9999' }, {s:'9999', >>>> r:'1000'} ] } } ).sort({sentDate:-1})* >>>> * >>>> * >>>> On Friday, September 7, 2012 5:46:40 PM UTC-4, Kay wrote: >>>>> >>>>> Hi Hanson -- >>>>> >>>>> We have a ticket to implement index intersection which I believe will >>>>> provide a solution in the future. >>>>> https://jira.mongodb.org/browse/SERVER-3071 >>>>> >>>>> One possible suggestion -- instead of the *both* field being an >>>>> array, if you make it an embedded document >>>>> so that you would have >>>>> >>>>> { >>>>> sender:'1000', >>>>> receiver:'9999', >>>>> type:'text', >>>>> *both: { s: '1000', r: '9999'},* >>>>> content:' this is content' , >>>>> sentDate:ISODate("2011-10-12T14:54:02.069Z) >>>>> } >>>>> >>>>> and >>>>> >>>>> { >>>>> sender:'9999', >>>>> receiver:'1000', >>>>> type:'text', >>>>> *both: { s: '9999', r: '1000'},* >>>>> content:' this is content' , >>>>> sentDate:ISODate("2011-10-12T15:54:02.069Z) >>>>> } >>>>> >>>>> then, you would create the index >>>>> >>>>> *db.msgs.ensureIndex({both: 1, sendDate:-1})* >>>>> >>>>> and your query would use the *$in *operator rather than the *$all* >>>>> >>>>> *db.msgs.find( { both: { $in: [ { s:'1000', r:'9999' }, {s:'9999', >>>>> r:'1000'} ] } } ).sort({sendDate:-1})* >>>>> >>>>> -- >>>>> >>>>> And just a summary of what was happening with the your earlier queries >>>>> (it already seems like you have a good grasp on the situation): >>>>> >>>>> 1) Regarding your initial query with *$or* and the compound-key >>>>> index on *{sender:1, receiver:1, sentDate:-1}* >>>>> >>>>> - *$or* does not use a compound-key index*. *Instead,* $or** r*uns >>>>> multiple index plans in parallel. >>>>> - >>>>> https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/ncFY7kqBJDs >>>>> * >>>>> *presents a good discussion of indexes with *$o**r * >>>>> >>>>> >>>>> 2) Regarding your second query with *$and* and the array structure >>>>> >>>>> - creating an index on an array field creates an index on each >>>>> element of the array ( >>>>> http://www.mongodb.org/display/DOCS/Multikeys) >>>>> - an *$all* looks up just the first element in the index (which is >>>>> the first element in the* both* array) >>>>> >>>>> >>>>> Hope this helps. >>>>> >>>>> Kay >>>>> >>>>> On Friday, September 7, 2012 4:54:56 AM UTC-4, Hanson Lu wrote: >>>>>> >>>>>> I forgot to mention that i have created the index >>>>>> {both:1,sentDate:-1} for the >>>>>> query db.msgs.find({both:{$all:['1000','9999']}).sort(sendDate:-1},. >>>>>> >>>>>> On Friday, September 7, 2012 4:43:01 PM UTC+8, Hanson Lu wrote: >>>>>>> >>>>>>> My app has a collection like this >>>>>>> { >>>>>>> sender:'1000', >>>>>>> receiver:'9999', >>>>>>> type:'text', >>>>>>> content:' this is content' , >>>>>>> sentDate:ISODate("2011-10-12T14:54:02.069Z) >>>>>>> } >>>>>>> >>>>>>> The collection records number > 50M. >>>>>>> >>>>>>> There is a requirement to query records *between *sender 'X' and >>>>>>> receiver 'Y' in short time. >>>>>>> >>>>>>> First, i create index, {sender:1, receiver:1, sentDate:-1}, use >>>>>>> query >>>>>>> >>>>>>> db.msgs.find({$or:[{sender:'1000',receiver:'9999'}, { >>>>>>> sender:'9999', receiver:'1000'}]}).sort(sentDate:-1}, >>>>>>> >>>>>>> but this query will not use the above index, it do scan full table. >>>>>>> The problem is known as >>>>>>> https://jira.mongodb.org/browse/SERVER-1205 >>>>>>> >>>>>>> Next, I change the structure of the collection like this >>>>>>> { >>>>>>> sender:'1000', >>>>>>> receiver:'9999', >>>>>>> type:'text', >>>>>>> both:['1000','9999'] >>>>>>> content:' this is content' , >>>>>>> sentDate:ISODate("2011-10-12T14:54:02.069Z) >>>>>>> } >>>>>>> >>>>>>> add a new field named 'both' to store 'sender' and 'receiver' , and >>>>>>> use query >>>>>>> >>>>>>> db.msgs.find({both:{$all:['1000','9999']}).sort(sendDate:-1},. >>>>>>> >>>>>>> The query do hit index, but if sender '1000' has large number of >>>>>>> records(>100000), >>>>>>> >>>>>>> the query become very slow, i saw lots page fault from mongostat. >>>>>>> >>>>>>> It seems MongoDB will load all documents for indexed for >>>>>>> {both:'1000'}, and compare whether the 'both' contents is >>>>>>> exactly['1000','9999']. >>>>>>> is it right? >>>>>>> >>>>>>> I have no idea how to resolve the problem. Can you give me some >>>>>>> suggestion. >>>>>>> >>>>>>> Regards. >>>>>>> >>>>>>> ------=_Part_912_2430730.1347344622810 Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: quoted-printable The fix for SERVER-5063 would improve things by using the index to sort but= even then I can't see how nscanned can be less than 10.
You have two v= alues which are being read from the index - limiting each of them to 5 is d= oable but not 5 (you should see 5 if there was not an "$in" or "$or" expres= sion.

Asya

On Tuesday, September 11, 2012 2:05:20= AM UTC-3, Hanson Lu wrote:
I'a= m using version 2.0.2 

The problem is that &nb= sp;I think nscannedObjects  should be equal to 5 that i limited in = the query.

On Tuesday, September 11, 20= 12 12:08:32 PM UTC+8, Asya Kamsky wrote:
You might check out this comment (if you are using= 2.2, which I think you must be):


This explains why the nscanned may be higher than you would be exp= ecting.

On Monday, September 10, 2012 4:21:32 AM UTC-3, Hanson Lu wr= ote:
Hi Kay
  I tested= the query with  the index you suggest. I found a problem that it will= scan all documents of sender 'X' and receiver 'Y'.
  d= b.msgs.find( { both: { = $in: [ { s:'1000', r:'9999' }, {s:'9999', r:'1000'} ] } } ).limit(5).sort({= sentDate:-1}).explain()
  {
        "cursor" : "BtreeCursor both_1_typ= e_1_sentDate_-1 multi",
        "nsca= nned" : 28,
        "nscannedObje= cts" : 26,
        "n" : 5,<= /b>
        "scanAndOrder" : true,
        "millis" : 0,
 &nb= sp;      "nYields" : 0,
      =  "nChunkSkips" : 0,
        "isMul= tiKey" : false,
        "indexOnly" : fa= lse,
 }

I think if the re= sult set is little, the solution is acceptable.

On Saturday, September 8, 2012 11:09:43 PM UTC+8, Kay wrote:
Oops -- 
just real= ized I typed "sendDate" instead of "sentDate" in both the ind= ex creation and find statements.

The correct state= ments would be:

db.msgs.ensureIndex({both: 1, sentDate:-1})
db.msgs.find( { both: { $in:= [ { s:'1000', r:'9999' }, {s:'9999', r:'1000'} ] } } ).sort({sentDate:-1})=


= On Friday, September 7, 2012 5:46:40 PM UTC-4, Kay wrote:
Hi Hanson -- 

We have a ticket to implement index intersection which I believe will pro= vide a solution in the future. 

One possible suggest= ion -- instead of the both field being an array, if you make it= an embedded document
so that you would have

=
{
sender:'1000', 
receiver:'9999',
type:'text',
both: { s: '1000', r: '9999'},
content:' this is content' ,
sentDate:ISODate("2011-10-12T14:54:02.069Z)
}=

and
 
{
sender:'9999', 
receiver:'1000',
type:'text',
both: { s: '9999', r: '1000'},
content:' this is content' ,
sentDate:ISODate("2011-10-12T15:54:02.069Z= )
}

then, you would create the index 

=
db.msgs.ensureIndex({both: 1,= sendDate:-1})

and your query would use= the $in operator rather than the $all

db.msgs.find( { both: { $in: [ = { s:'1000', r:'9999' }, {s:'9999', r:'1000'} ] } } ).sort({sendDate:-1})

--

And just a su= mmary of what was happening with the your earlier queries (it already seems= like you have a good grasp on the situation):

1)  Regarding your initial query with $or and the comp= ound-key index on {sender:1, receiver:1, sentDate:-1} 

2)  Regarding your second query with $and and th= e array structure
  • creating an index on an array field cre= ates an index on each element of the array (http://www.mongodb.org/d= isplay/DOCS/Multikeys)
  • an $all looks up just the = first element in the index (which is the first element in the both<= /i> array)

Hope this helps.

Kay

On Friday, September 7, 2012 4:54:56 AM = UTC-4, Hanson Lu wrote:
I forgot to = mention that i have created the index {both:1,sentDate:-1}  for the qu= ery db.msgs.find({both:{$all:['1000','9999']}).sort(sendDate= :-1},.

On Friday, September 7, 2012 4:43:01 PM UTC+8, Hanson Lu wrot= e:
My app has a collection like this=  

  sender:'1000', <= br>
  receiver:'9999',
  type:'= text',
  content:' this is content' ,
  sentDate:ISODate("2011-10-12T14:54:02.069Z)
}

The collection records number = > 50M. 

There is a requirement t= o query records between sender 'X' and receiver 'Y' in short time.&n= bsp;

First, i create index, {sender:1, receiver:1,= sentDate:-1}, use query 

  db.msgs= .find({$or:[{sender:'1000',receiver:'9999'}, { sender:'9999', receiver= :'1000'}]}).sort(sentDate:-1}, 

but this= query will not use the above index, it do scan full table.  The probl= em is known as 

Next, I change the structure of th= e collection like this 

  sender:'1000', 
  receiver:'9999',
  type:'text',
  both:['1000','9999']
  content:' this is c= ontent' ,
 &= nbsp;sentDate:ISODate("2011-10-12= T14:54:02.069Z)
}

add a new field named 'both' to store 's= ender' and 'receiver' , and use query 

db.msg= s.find({both:{$all:['1000','9999']}).sort(sendDate:-1},.

The query do hit index, but if  sender '1000' &n= bsp;has large number of records(>100000), 

the query become very slow, i saw lots page fault from mongostat.

It seems MongoDB will load all documents for indexed = for {both:'1000'},  and compare whether the 'both' contents is exactly= ['1000','9999'].
is it right? 

= I have no idea how to resolve the problem. Can you give me some sugges= tion.

Regards.

------=_Part_912_2430730.1347344622810-- ------=_Part_911_4909429.1347344622810--