Indexing with $exists query

1,404 views
Skip to first unread message

confused

unread,
Oct 31, 2011, 8:13:35 PM10/31/11
to mongodb-user
Hello:

We are trying to improve the db performance for $exists query, like
db.table.find({"annot.a" : {"$exists": true}}), because each query
takes more than 1 second. We have created an index on annot.a.
However, when we do an explain() on the query, it was using
"BasicCursor". Is there any way that we can force MongoDB to use the
index that we created on annot.a instead of the BasicCursor?

Thanks a lot for your help in advance!

confused

unread,
Oct 31, 2011, 8:41:32 PM10/31/11
to mongodb-user
We are using MongoDB 2.0.

Daniel W

unread,
Nov 1, 2011, 11:29:51 AM11/1/11
to mongodb-user
According to this link, it was scheduled for inclusion in 1.9 :http://
www.quora.com/Can-MongoDB-index-for-the-exists-query
but in the 2.0 'what's new', it's not there:
http://www.mongodb.org/display/DOCS/2.0+Release+Notes#2.0ReleaseNotes-What%27sNew
...where by 'it' I mean $exists queries using indexes.

Bernie Hackett

unread,
Nov 1, 2011, 1:55:52 PM11/1/11
to mongodb-user
$exists: false will use an index but $exists: true will not.

On Oct 31, 5:13 pm, confused <tiffany...@gmail.com> wrote:

Bernie Hackett

unread,
Nov 1, 2011, 2:12:31 PM11/1/11
to mongodb-user
You may get better performance by setting unused fields to null and
doing {<field name>: {'$ne': null}}. That will use an index.

Tiffany

unread,
Nov 1, 2011, 7:07:36 PM11/1/11
to mongod...@googlegroups.com
I have found a way to force MongoDB to use the index:
      db.table_name.find({field_name: {$exists: true}}).hint("field_name_index_name")
I created index field_name_index_name as
      db.table_name.ensureIndex({field_name: 1},{sparse: true})
When I run explain(), indeed it had used the index.
 
However, there is another problem.  I have totally 269702 entries in the table, and 7 queries.  There is  index for each query.   
 
Here are some of the numbers:
 
Query1: (obj scanned)8              (returned) 8,          (old time, ms)1005,  (new time, ms)0
Query2: (obj scanned)508134     (returned)268188   (old time, ms)1034,  (new time, ms)1906
Query3: (obj scanned)1431084   (returned)269566   (old time, ms)978,    (new time, ms)4695
Query4: (obj scanned)2              (returned)2            (old time, ms)984,    (new time, ms)0
Query5: (obj scanned)189699     (returned)172957   (old time, ms)1032,  (new time, ms)793
Query6: (obj scanned)36111       (returned)34848     (old time, ms)989,    (new time, ms)198
Query7: (obj scanned)394           (returned)294        (old time, ms)984,    (new time, ms)3
 
Queries 1, 4, 6, and 7, the performances have improved greatly.  For Query 5, it was good, but for Queries 2 and 3, it was getting worse.  I could not understand, there are only 269702 entries in the table, how can query 2 have 508134 scanned objects, and query 3 have 1431084 scanned objects.  There are more scanned objects than the number of entries in the table.
 
Could someone help me with this problem, please!  Thank you very much!   
--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.


aaron

unread,
Nov 1, 2011, 7:46:18 PM11/1/11
to mongodb-user
Hi Tiffany,

I filed SERVER-4187 to select a sparse index automatically when you do
a $exists:true query. For now, the easiest thing is to hint your
sparse index and you can even leave out the $exists:true query
parameter if you want (as long as you have the hint) since sparse
indexes only contain entries if the field exists.

Not sure what's going on with your various slow queries without more
info. You can get a higher nscanned than the number of documents if
you have multikey arrays with multiple values indexed per document.
If you send the actual slow query and some example documents we can
take a closer look.

One thing to keep in mind is that if your query matches most of your
documents it can be more performant to do an unindexed scan than to do
an indexed scan, since with indexing there is a level of indirection
involved and it can be slower per document returned (particularly if
you have a lot of multikey entries).

Thanks,
Aaron

aaron

unread,
Nov 1, 2011, 8:35:15 PM11/1/11
to mongodb-user
Hi Tiffany,

I filed SERVER-4187 to select a sparse index automatically when you do
a $exists:true query. For now, the easiest thing is to hint your
sparse index and you can even leave out the $exists:true query
parameter if you want (as long as you have the hint) since sparse
indexes only contain entries if the field exists.

Not sure what's going on with your various slow queries without more
info. You can get a higher nscanned than the number of documents if
you have multikey arrays with multiple values indexed per document.
If you send the actual slow query and some example documents we can
take a closer look.

One thing to keep in mind is that if your query matches most of your
documents it can be more performant to do an unindexed scan than to do
an indexed scan, since with indexing there is a level of indirection
involved and it can be slower per document returned (particularly if
you have a lot of multikey entries).

Thanks,
Aaron

On Nov 1, 4:07 pm, Tiffany <tiffany...@gmail.com> wrote:

Tiffany

unread,
Nov 1, 2011, 8:48:41 PM11/1/11
to mongod...@googlegroups.com
Hello arron:
 
Thanks a lot for your response! 
 
I have modified my query and removed the condition inside find.  This is what I have now:
    db.table_name.find().hint("field_name_index_name")
 
Here are the performances:
 
Query1: (obj scanned) 8             
            (returned) 8,         
            (old time, ms)1005,                                      
            (new time1, ms) 0      
            (new time2, ms) 0
 
Query2: (obj scanned)508134    
             (returned) 268188  
             (old time, ms) 1034, 
             (new time1, ms) 1906
             (new time2, ms) 916

Query3: (obj scanned) 1431084  
            (returned) 269566  
            (old time, ms) 978,   
            (new time1, ms) 4695
            (new time2, ms) 2292

Query4: (obj scanned) 2             
            (returned) 2           
            (old time, ms) 984,   
            (new time1, ms) 0
            (new time2, ms) 0

Query5: (obj scanned) 189699    
            (returned) 172957  
            (old time, ms) 1032, 
            (new time1, ms) 793
            (new time2, ms) 371

Query6: (obj scanned) 36111      
             (returned) 34848    
             (old time, ms) 989,   
             (new time1, ms) 198
             (new time2, ms) 95

Query7: (obj scanned) 394          
             (returned) 294       
             (old time, ms) 984,   
             (new time1, ms) 3
             (new time2, ms) 1

Comparing new time1 with new time2 for each query, the performance was improved by 1/2.  This is fantastic!
 
I would look more into why there are more scanned objects than the number of entries in the table.  I might append more info later.
 
Thank you so much!

Tiffany

unread,
Nov 2, 2011, 1:43:33 PM11/2/11
to mongod...@googlegroups.com
Hello:
 
I have looked into why some query executions would scan more objects than the number of entries in the table.  I have used the key annot.fieldX in our indexes and queries:
     db.table_name.ensureIndex({annot.field1: 1},{sparse: true})
 
This is what the data looks like in the table
{......"annot": {"field1": [{.....},{......}...]}......}
 
So each indexed field is a list.  If MongoDB creates an index for each member in the list, indeed it would scan the same object multiple times. 

I just want to count how many entries in the table that has annot.field1, is there anyway I do not need to go through each member of list annot.field1?

Thank you so much for your help in advance!

Tiffany

unread,
Nov 2, 2011, 1:51:03 PM11/2/11
to mongod...@googlegroups.com
Hello:

hint() does not work for Java driver 2.6.3, is that so?

Thanks for your help in advance!

Bernie Hackett

unread,
Nov 2, 2011, 5:13:33 PM11/2/11
to mongodb-user
Hint should work fine in Java 2.6.3:

http://api.mongodb.org/java/2.6.3/com/mongodb/DBCursor.html#hint(com.mongodb.DBObject)

On Nov 2, 10:51 am, Tiffany <tiffany...@gmail.com> wrote:
> Hello:
>
> hint() does not work for Java driver 2.6.3, is that so?
>
> Thanks for your help in advance!
>
>
>
>
>
>
>
> On Wed, Nov 2, 2011 at 10:43 AM, Tiffany <tiffany...@gmail.com> wrote:
> > Hello:
>
> > I have looked into why some query executions would scan more objects than
> > the number of entries in the table.  I have used the key annot.fieldX in
> > our indexes and queries:
> >      db.table_name.ensureIndex({annot.field1: 1},{sparse: true})
>
> > This is what the data looks like in the table
> > {......"annot": {"field1": [{.....},{......}...]}......}
>
> > So each indexed field is a list.  If MongoDB creates an index for each
> > member in the list, indeed it would scan the same object multiple times.
>
> > I just want to count how many entries in the table that has annot.field1,
> > is there anyway I do not need to go through each member of list
> > annot.field1?
>
> > Thank you so much for your help in advance!
>
Reply all
Reply to author
Forward
0 new messages