Google Groups

Re: Reduce don't work on documents with unique key in MapReduce?


Jenna deBoisblanc May 11, 2012 12:08 PM
Posted in group: mongodb-user
Hi Eason,
I apologize for misunderstanding your question. The reduce function
is not called when only a single document was emitted for a
particular key, which explains the difference between the two
documents produced by Reduce function 2.
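
The skip can be illustrated with a small local simulation in plain JavaScript. This is a sketch under stated assumptions, not MongoDB's implementation: the helper `runMapReduce`, the `groups` table, the stand-in `emit`, and the names `mapFn`/`reduceFn2` are hypothetical, and the grouping only imitates how the server hands values to reduce. With Reduce function 2 and your sample data, "eason" comes back as a bare string while "wang" keeps the emitted {platform, date} document.

```javascript
// Eason's three sample documents (only the fields that the map function reads).
var docs = [
  { name: "eason", date: 20120511, platform: "ubuntu" },
  { name: "wang", date: 20120511, platform: "xp" },
  { name: "eason", date: 20120512, platform: "redhat" }
];

// Hypothetical table of emitted values per key; reset by runMapReduce.
var groups = {};

// Stand-in for the emit() that MongoDB provides inside a map function.
function emit(key, value) {
  if (!groups[key]) groups[key] = [];
  groups[key].push(value);
}

// The map function from the thread (renamed so it does not shadow the built-in Map).
function mapFn() {
  emit(this.name, { platform: this.platform, date: this.date });
}

// Reduce function 2 from the thread: returns a bare string instead of a document.
function reduceFn2(key, values) {
  var date = 0;
  var platform;
  for (var i = 0; i < values.length; i++) {
    if (date < values[i].date) {
      date = values[i].date;
      platform = values[i].platform;
    }
  }
  return platform;
}

// Hypothetical local simulation of the grouping step, not MongoDB's code.
function runMapReduce(input, map, reduce) {
  groups = {};
  input.forEach(function (doc) {
    map.call(doc); // map reads the current document through `this`
  });
  var out = {};
  Object.keys(groups).forEach(function (key) {
    var values = groups[key];
    // A key with exactly one emitted value never reaches reduce.
    out[key] = values.length === 1 ? values[0] : reduce(key, values);
  });
  return out;
}

var result = runMapReduce(docs, mapFn, reduceFn2);
console.log(JSON.stringify(result));
// {"eason":"redhat","wang":{"platform":"xp","date":20120511}}
```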

Function 2 works in this instance because reduce runs only once for
the key "eason." With more data, however, reduce is not guaranteed to
receive every value for a particular key in a single call; it may be
called again with its own earlier output mixed in with new values. For
that reason, it's a good habit to give the result of your reduce
function the same structure as the emitted values, so that it can be
passed back into reduce more than once.
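
One way to check whether a reduce function survives being re-run is to feed an earlier result back in together with values that arrive later. The sketch below assumes a hypothetical fourth document for "eason" (platform "fedora", date 20120510) and a hypothetical helper `rereduce`; how MongoDB actually batches values is decided by the server, so this only imitates one possible split.

```javascript
// Reduce function 1 from the thread: returns the same {date, platform} shape that map emits.
function reduceFn1(key, values) {
  var date = 0;
  var platform;
  for (var i = 0; i < values.length; i++) {
    if (date < values[i].date) {
      date = values[i].date;
      platform = values[i].platform;
    }
  }
  return { date: date, platform: platform };
}

// Reduce function 2 from the thread: returns only the platform string.
function reduceFn2(key, values) {
  var date = 0;
  var platform;
  for (var i = 0; i < values.length; i++) {
    if (date < values[i].date) {
      date = values[i].date;
      platform = values[i].platform;
    }
  }
  return platform;
}

// Hypothetical helper: reduce one batch, then reduce that partial result
// together with values that arrive later (a "re-reduce").
function rereduce(reduce, key, firstBatch, laterValues) {
  var partial = reduce(key, firstBatch);
  return reduce(key, [partial].concat(laterValues));
}

var firstBatch = [
  { platform: "ubuntu", date: 20120511 },
  { platform: "redhat", date: 20120512 }
];
// Hypothetical extra document for "eason", older than the "redhat" one.
var laterValues = [{ platform: "fedora", date: 20120510 }];

console.log(rereduce(reduceFn1, "eason", firstBatch, laterValues).platform); // redhat
console.log(rereduce(reduceFn2, "eason", firstBatch, laterValues));          // fedora
```

Function 2 gives the wrong answer on the second pass because its partial result is the string "redhat": `"redhat".date` is `undefined`, the comparison `0 < undefined` is false, and the newer entry is silently dropped in favour of the older "fedora" one.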

If you're only interested in the platform field, consider adding a
finalize function to your map/reduce job. For more information on this
subject, please see http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-FinalizeFunction.
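
A minimal sketch of that approach, assuming Reduce function 1 stays as it is: finalize is applied to the final value of every key, including keys for which reduce was skipped, and because Reduce function 1 returns the same {date, platform} shape that map emits, one finalize function handles both cases. The names `finalizeFn`, `users` and `latest_platform` are hypothetical.

```javascript
// Finalize receives each key's final value: the reduce result, or the single
// emitted value when reduce was skipped. Both share the {platform, date} shape
// as long as reduce returns the same shape that map emits.
function finalizeFn(key, value) {
  return value.platform; // drop the date only at the very end
}

// In the mongo shell the function would be passed as an option, e.g.
// (collection and output names are hypothetical):
//   db.users.mapReduce(mapFn, reduceFn1, { out: "latest_platform", finalize: finalizeFn });

// Final values as they would look for the two keys in this thread.
var reducedEason = { date: 20120512, platform: "redhat" }; // returned by Reduce function 1
var singleWang = { platform: "xp", date: 20120511 };        // emitted value, reduce skipped

console.log(finalizeFn("eason", reducedEason)); // redhat
console.log(finalizeFn("wang", singleWang));    // xp
```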

On May 10, 11:07 pm, eason wang <buffon...@gmail.com> wrote:
> Hi Jenna,
>
> Thanks for your reply and for the corrections to my Reduce function,
> which did indeed have some bugs when I wrote that post.
> Briefly, what I need is each user name together with the platform
> from that user's latest record.
> Here is a tested example:
> The data:
> {
>   "_id" : ObjectId("4fac78d8a1681d11dc93b498"),
>   "name" : "eason",
>   "date" : 20120511,
>   "platform" : "ubuntu"
> }
>
> {
>   "_id" : ObjectId("4fac78f8a1681d11dc93b49a"),
>   "name" : "wang",
>   "date" : 20120511,
>   "platform" : "xp"
> }
>
> {
>   "_id" : ObjectId("4fac78eba1681d11dc93b499"),
>   "name" : "eason",
>   "date" : 20120512,
>   "platform" : "redhat"
> }
>
> Map function:
> function Map() {
>     emit(this.name, {"platform": this.platform, "date": this.date});
> }
>
> Reduce function 1:
> function Reduce(key, number) {
>     var date = 0;
>     var platform;
>     for (var i in number) {
>         if (date < number[i].date) {
>             date = number[i].date;
>             platform = number[i].platform;
>         }
>     }
>     return {"date": date, "platform": platform};
> }
>
> After this MapReduce, the result is as expected:
> {
>   "_id" : "eason",
>   "value" : {
>     "date" : 20120512.0,
>     "platform" : "redhat"
>   }
> }
>
> {
>   "_id" : "wang",
>   "value" : {
>     "platform" : "xp",
>     "date" : 20120511.0
>   }
> }
>
> /********************************************/
> Reduce function 2:
> function Reduce(key, number) {
>     var date = 0;
>     var platform;
>     for (var i in number) {
>         if (date < number[i].date) {
>             date = number[i].date;
>             platform = number[i].platform;
>         }
>     }
>     return platform;
> }
>
> In this case, I want to change the structure of the "value" in the
> MapReduce result, since I don't care about the "date". Strangely, the
> result is:
> {
>   "_id" : "eason",
>   "value" : "redhat"
> }
>
> {
>   "_id" : "wang",
>   "value" : {
>     "platform" : "xp",
>     "date" : 20120511.0
>   }
> }
>
> It seems that the "eason" document comes out the way I want, while
> the "wang" document (unique key) does not. So must the rule "the value
> that the reduce function returns must match the structure of the map
> function's emitted value" always be followed? And how can the
> inconsistent output of Reduce function 2 be explained (it doesn't work
> for the unique-key document, but seems to work otherwise)? My current
> guess is that in the MapReduce mechanism, a key that has only one
> document after Map bypasses the Reduce step. Is that reasonable?
> Thanks!
>
> Regards,
> Eason Wang
>
> On May 11, 1:29 am, Jenna <jenna.deboisbl...@10gen.com> wrote:
> > The reduce function will be called for documents with a unique key, in
> > this case "name." The important thing to remember about the reduce
> > function is that it may be invoked more than once for the same key.
> > For that reason, the value that the reduce function returns must match
> > the structure of the map function's emitted value.
>
> > Your reduce function currently does not return the same result as your
> > map function. It should return a document in the form, {platform: x,
> > date: y}.
>
> > In addition, "platform=number[i].key" does not work because "key" is
> > not emitted in your map function as a value, and so it will not be
> > stored in the "number" array.  So you could edit your reduce function
> > as follows:
>
> > function Reduce(key, number) {
> >      var result = {platform: null, date: 0};
> >      for (var i = 0; i < number.length; i++) {
> >         if (result.date < number[i].date) {
> >            result.date = number[i].date;
> >            result.platform = number[i].platform;
> >         }
> >      }
> >      return result;
> > }
>
> > To help address your other question about modifying the "value," could
> > you provide an example of the way in which you would like to modify
> > your data? This may be possible in map-reduce, but it's hard to
> > provide a specific solution without knowing your desired output.
>
> > On May 10, 7:07 am, eason wang <buffon...@gmail.com> wrote:
>
> > > Hi,
>
> > > For example, I wrote the map and reduce functions as follows:
>
> > > function Map() {
> > >     emit(this.name, {"platform": this.platform, "date": this.time});
> > > }
>
> > > function Reduce(key, number) {
> > >     var platform; var date = 0;
> > >     for (var i in number) {
> > >         if (date < number[i].date) {
> > >             date = number[i].date; platform = number[i].key;
> > >         }
> > >     }
> > >     return platform;
> > > }
>
> > > In the reduce function, I want to change the structure of the
> > > "value" produced by map, but I suspect that the reduce function is
> > > not called for documents whose "name" is unique.
> > > Is that right? And how can I reshape such documents?
> > > Thanks!