Missing MapReduce docs


Ben McCann

Jun 16, 2012, 4:19:30 PM
to mongod...@googlegroups.com
Hi,

The MapReduce docs describe the counts object in the result as:
  counts : {
       input :  <number of objects scanned>,
       emit  : <number of times emit was called>,
       output : <number of items in output collection>
  } ,

However, my counts object looks like:
"counts" : {
"input" : 561244,
"emit" : 23126721,
"reduce" : 1622679,
"output" : 21210800
},

What's the reduce number that's displayed here? Is it the number of times that reduce is called? Why is reduce different from output?

Thanks,
Ben

Ben McCann

Jun 16, 2012, 5:20:46 PM
to mongod...@googlegroups.com
To clarify, here is my reduce function:
  reduce : function(key, values) {
    var result = {};
    result[key] = values.length;
    return result;
  },

This should be 1:1 from emit to output, shouldn't it?

Ben McCann

Jun 17, 2012, 3:07:59 PM
to mongod...@googlegroups.com
Ah, thank you! I did not realize that reduce would be called more than once for the same key. I guess I could have figured that out by reading the docs more closely. I'm pretty sure that when I was at Google they would call reduce only once per key, so I just assumed it worked the same way for MongoDB. The values passed to my reduce function were ints, so I should have been outputting ints as well, since the data type needs to be the same because reduce may be called on its own output.
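
A minimal sketch of what that fix might look like, assuming the map function (not shown in this thread) emits the integer 1 for each occurrence, e.g. `emit(key, 1)`. The key `"k"` and the sample arrays are made up for illustration. Because this version returns the same type it receives, feeding a partial result back in gives the same total as a single pass:

```javascript
// Assumption: map emits the integer 1 per occurrence, so every value
// passed to reduce is an int, whether it is a fresh emit or an
// earlier reduce result.
function reduce(key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total; // same type as the emitted values
}

// All five emits reduced in one call.
var onePass = reduce("k", [1, 1, 1, 1, 1]);

// Three emits reduced first, then the partial result reduced again
// together with the two remaining emits.
var partial = reduce("k", [1, 1, 1]);
var twoPass = reduce("k", [partial, 1, 1]);

console.log(onePass, twoPass); // prints "5 5"
```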

Thanks!

-Ben


On Sun, Jun 17, 2012 at 8:34 AM, Joshua Marsh <jos...@themarshians.com> wrote:
I believe the reduce number is the number of times the reduce function was called, as you suggest. It is different because reduce is called with groups of emits. The map/reduce algorithm may not get all of the emits for a particular key before it calls the reduce function. It may call reduce on a batch of 20 it finds early in processing. Near the end of the processing, it may find another two. In this case, it's going to call reduce on the previous result for the 20 together with the two new values.

Your reduce algorithm determines what the output will be. Normally, this is smaller. I'm guessing that what you think your algorithm is doing isn't actually right. It looks like you are trying to count something and store it in an object. The problem is that you are overwriting the previous values. In the scenario above, you might expect a count of 22, but your result object would be { key: 3 }. If you aren't getting the results you are expecting, can you provide a sample of relevant data, your map function, and what you are trying to do?
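
The 20-then-2 scenario can be reproduced outside MongoDB with a short sketch. The reduce function is the one posted earlier in the thread; the key `"k"`, the emitted value 1, and the batching are assumptions made only for illustration, since the real map function and batch sizes are not shown:

```javascript
// The reduce function from earlier in the thread, unchanged.
function reduce(key, values) {
  var result = {};
  result[key] = values.length;
  return result;
}

// Hypothetical first batch: 20 emits for the same key.
var firstBatch = [];
for (var i = 0; i < 20; i++) {
  firstBatch.push(1);
}
var partial = reduce("k", firstBatch); // { k: 20 }

// Two more emits turn up later, so reduce is called again with the
// previous result plus the two new values: three array elements.
var finalResult = reduce("k", [partial, 1, 1]);

console.log(JSON.stringify(finalResult)); // prints {"k":3}
```

The second call only sees an array of length 3, so the count of 20 from the first call is lost.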