Do $or statements in the query parameter of MapReduce use the index?

Nathan Hoad

Sep 8, 2011, 9:09:07 PM
to mongodb-user
I have map and reduce functions defined as follows:

map = function() {
    emit(this.username, { total_data: this.sent + this.received });
}

reduce = function (key, values) {
    var result = {total_data: 0};
    values.forEach(function (value) {
        result.total_data += value.total_data;
    });
    return result;
}

And I'm running the following query:

db.collection.mapReduce(m, r, {
    out: 'myoutput',
    query: {
        time: {'$gte': start, '$lte': end},
        '$or': [{username: '1743'}, {username: '23'}]
    },
    sort: {username: -1}
});

If I run the query without the $or, and do two separate queries, one
per username, then the queries are reasonably fast (they finish in
anywhere between 1 second and 2-3 minutes). But if I run it with the
$or clause, it takes 45-50 minutes. I'm not very experienced with
MongoDB, so I'm unsure whether there's anything I can do to debug
this, or to figure out if or why it's not using the index.
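
One way to investigate index use, sketched here under assumptions, is to run the same filter through find() and look at the explain() output. The helper name buildOrQuery is hypothetical and not from the thread; the collection name and the start/end values are taken from the post above and assumed to exist in the shell session.

```javascript
// Hypothetical helper: rebuilds the filter that was passed as `query`
// to mapReduce, so the same filter can be inspected on its own.
function buildOrQuery(start, end, usernames) {
  return {
    time: { $gte: start, $lte: end },
    // One { username: ... } clause per requested user, as in the post.
    $or: usernames.map(function (u) { return { username: u }; })
  };
}

// Placeholder dates for illustration only.
var exampleQuery = buildOrQuery(
  new Date('2011-08-12T08:40:00Z'),
  new Date('2011-08-12T12:40:00Z'),
  ['1743', '23']
);

// In the mongo shell (assumes db.collection exists):
//   db.collection.find(exampleQuery).sort({ username: -1 }).explain();
```

The explain() output should indicate whether an index scan or a full collection scan was chosen; the exact field names in that output differ between server versions, so they are not shown here.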

Kyle Banker

Sep 9, 2011, 11:03:15 AM
to mongodb-user
I don't believe that a difference of 45 minutes could possibly be
explained by whether $or is using the index or not.

For the record, the $or does use an index, but you should actually
express this differently:

query: { time: {'$gte': start, '$lte': end}, username: { $in: ['1743', '23'] } }

I also suggest that you run both the $or query and the individual
queries separately to ensure that they return the same total number
of documents.
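
As a rough illustration of why the two forms should select the same documents, here is a minimal in-memory sketch in plain JavaScript. It does not touch MongoDB; the sample documents and the helpers matchesOr and matchesIn are made up for the example and only mimic equality matching on a single field.

```javascript
// Made-up sample documents shaped like the ones in the post.
var docs = [
  { username: '1743', sent: 10, received: 5 },
  { username: '23', sent: 7, received: 3 },
  { username: '99', sent: 1, received: 1 },
  { username: '23', sent: 2, received: 8 }
];

// Mimics $or over simple equality clauses: a document matches if
// every field of at least one clause equals the document's field.
function matchesOr(doc, clauses) {
  return clauses.some(function (clause) {
    return Object.keys(clause).every(function (k) { return doc[k] === clause[k]; });
  });
}

// Mimics $in on one field: the field's value is in the list.
function matchesIn(doc, field, values) {
  return values.indexOf(doc[field]) !== -1;
}

var viaOr = docs.filter(function (d) {
  return matchesOr(d, [{ username: '1743' }, { username: '23' }]);
});
var viaIn = docs.filter(function (d) {
  return matchesIn(d, 'username', ['1743', '23']);
});

// Both filters select the same three documents.
console.log(viaOr.length, viaIn.length);
```

Against a real collection, the equivalent check would be to compare the document counts returned by the $or filter, the $in filter, and the sum of the per-username filters.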

Nathan Hoad

Sep 12, 2011, 7:19:46 PM
to mongodb-user
To be clear, that was for a collection with 200 million records. I
checked the disk utilization with iostat during the query, and it was
a constant 98%, but performing the individual queries had very little
utilization. This is what gave me the impression that the index wasn't
being used.

I've tried using $in and it's much more manageable, taking exactly the
amount of time it should. Thanks!

However, I've compared the individual results to $in as you suggested,
and the output is different.

Individual:
{ "_id" : "1743", "value" : { "total_data" : 58016271 } }
{ "_id" : "23", "value" : { "total_data" : 103653535 } }

Using $in:
{ "_id" : "1743", "value" : { "total_data" : 58016271 } }
{ "_id" : "23", "value" : { "total_data" : 103653535 } }

The output of the mapReduce call indicates that exactly the same number
of records were read as input for both types of queries, yet the
amounts are incorrect. What could cause this? Obviously my reduce has
a bug, but it's so simple that I have no idea what it could be.
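
A quick way to sanity-check a reduce function outside the database is to confirm that re-reducing a partial result gives the same total as reducing everything at once, since MongoDB may call reduce more than once for the same key. The sketch below reuses the reduce function from the first post; the input values are made up for illustration.

```javascript
// The reduce function from the original post.
function reduce(key, values) {
  var result = { total_data: 0 };
  values.forEach(function (value) {
    result.total_data += value.total_data;
  });
  return result;
}

// Made-up emitted values for a single key.
var values = [{ total_data: 15 }, { total_data: 10 }, { total_data: 7 }];

// Reduce everything in one call.
var allAtOnce = reduce('23', values);

// Reduce the first two values, then feed that partial result back in
// together with the remaining value, as a re-reduce step would.
var partial = reduce('23', values.slice(0, 2));
var reReduced = reduce('23', [partial, values[2]]);

// Both paths should give the same total.
console.log(allAtOnce.total_data === reReduced.total_data);
```

Because the function returns a value of the same shape as the values it consumes and only sums them, this check passes, which suggests the reduce logic itself is unlikely to explain differing totals.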

Nathan Hoad

Sep 12, 2011, 7:29:27 PM
to mongodb-user
Wait, I realised those two runs are giving me the same data just now.

However, an earlier query didn't:

> db.entire_database.mapReduce(m, r, {out: 'myoutput', query: { username: { $in: ['1743', '23']}, time : { $lte: ISODate('2011-08-12 12:40:00'), $gte: ISODate('2011-08-12 08:40:00') }}})
{
    "result" : "myoutput",
    "timeMillis" : 37373,
    "counts" : {
        "input" : 12808,
        "emit" : 12808,
        "output" : 2
    },
    "ok" : 1,
}
> db.myoutput.find()
{ "_id" : "1743", "value" : { "total_data" : 80362302 } }
{ "_id" : "23", "value" : { "total_data" : 103627486 } }

Which is where the confusion started. This has been performed in the
same session, with the same map and reduce functions, and I can
guarantee that no data modification took place between the queries.

Kyle Banker

Sep 21, 2011, 4:05:03 PM
to mongodb-user
Are you still seeing a problem with this? If so, can you provide a
reproducible script to help us replicate it?

Nathan Hoad

Oct 3, 2011, 8:07:55 PM
to mongodb-user
Hmm, no, it appears I'm not having the problem anymore. Strange, but I'm
fine with that, I suppose.

Thanks anyway!