Re: Reduce don't work on documents with unique key in MapReduce?


eason wang May 15, 2012 11:39 PM
Posted in group: mongodb-user
The data model is user logs: for example, platform and user settings
are collected into our MongoDB, and this kind of information can add up
to a large volume. Sometimes we want to compute statistics over the
database to analyze current trends, so map/reduce is used in many cases.
It now seems that the single-threaded JS engine behind map/reduce has
become our bottleneck during computation.
Thank you very much for the information.

Regards,
Eason Wang

On May 15, 5:39 AM, Jenna <jenna.deboisbl...@10gen.com> wrote:
> The best-suited/fastest aggregation process really depends upon your
> data and what you're trying to do.  Can you give us more information
> about your data input/output?
>
> Mongo-hadoop may prove to be faster than map/reduce.  The new
> aggregation operators, which are currently available in mongodb 2.1.0
> (unstable release), may also improve aggregation speed (more info can
> be found here: http://docs.mongodb.org/manual/reference/aggregation/?highlight=map%2...).
> Sharding the input is another possible way to make map/reduce faster,
> but again, it all depends upon your data model.
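As a rough illustration of the aggregation operators mentioned above, the "latest platform per user" question from this thread could be phrased as a pipeline along these lines. This is only a sketch, assuming documents with the name, date and platform fields shown elsewhere in the thread; the variable name pipeline and the collection name logs are hypothetical.

```javascript
// Sketch of an aggregation pipeline: sort by date, then keep the last
// (that is, the latest) platform seen for each user name.
var pipeline = [
  { $sort: { date: 1 } },
  { $group: { _id: "$name", platform: { $last: "$platform" } } }
];

// In the mongo shell this would be passed to the collection, for example:
//   db.logs.aggregate(pipeline)
// where "logs" is a hypothetical collection name.
```

Whether this is actually faster than map/reduce depends on the data, as noted above.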
>
> On May 12, 6:23 am, easonwang <buffon...@gmail.com> wrote:
>
> > Aha! Thank you very much. With your explanation and the links you
> > gave, I now have a much deeper understanding of the way MapReduce
> > works in MongoDB. The misunderstanding was caused by my poor English,
> > haha. By the way, if MapReduce under the JS engine is not fast
> > enough, do you have any suggestions for speeding it up, such as
> > sharding, mongo-hadoop, or something else?
>
> > Regards,
>
> > Eason Wang
>
> > On May 12, 3:08 AM, Jenna <jenna.deboisbl...@10gen.com> wrote:
>
> > > Hi Eason,
> > > I apologize for misunderstanding your question. The reduce function
> > > is not called when only a single document is emitted for a
> > > particular key, which explains the discrepancy between the two
> > > documents produced by reduce function 2.
>
> > > Function 2 works in this instance because reduce is run only once
> > > for the key "eason." With more data, however, the reduce function is
> > > not guaranteed to process every value for a particular key in a
> > > single call. For that reason, it's a good idea to get in the habit
> > > of structuring the return value of your reduce function so that it
> > > can be passed back into reduce as an input value.
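To make the point about repeated reduce calls concrete, here is a small plain-JavaScript sketch (not MongoDB code) that feeds a partial result back into reduce. The helper reduceInTwoPasses, the function names, and the third "xp" record for "eason" are assumptions made up for illustration; the two reduce bodies mirror reduce functions 1 and 2 from this thread.

```javascript
// Reduce function 1 from the thread: returns the same shape that map emits.
function reduceSameShape(key, values) {
  var latest = { date: 0, platform: undefined };
  values.forEach(function (v) {
    if (latest.date < v.date) {
      latest = { date: v.date, platform: v.platform };
    }
  });
  return latest;
}

// Reduce function 2 from the thread: returns only the platform string.
function reducePlatformOnly(key, values) {
  return reduceSameShape(key, values).platform;
}

// Hypothetical helper: reduce the first batch, then pass that partial
// result back into reduce together with the remaining values.
function reduceInTwoPasses(reduceFn, key, values, batchSize) {
  var partial = reduceFn(key, values.slice(0, batchSize));
  return reduceFn(key, [partial].concat(values.slice(batchSize)));
}

// Emitted values for one key; the third entry is invented for this example.
var emitted = [
  { platform: "ubuntu", date: 20120511 },
  { platform: "redhat", date: 20120512 },
  { platform: "xp", date: 20120510 }
];

var safe = reduceInTwoPasses(reduceSameShape, "eason", emitted, 2);
var broken = reduceInTwoPasses(reducePlatformOnly, "eason", emitted, 2);

console.log(safe.platform); // "redhat": the latest record survives both passes
console.log(broken);        // "xp": the date was lost after the first pass
```

With the shape-preserving version the partial result still carries its date, so the second pass can compare it correctly. The platform-only version hands a bare string back into reduce, and the comparison silently picks the wrong record.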
>
> > > If you're only interested in the platform field, consider including a
> > > finalize function in map/reduce. For more information on this subject,
> > > please see http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-FinalizeFunction.
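The finalize suggestion can be sketched with a simplified in-memory stand-in for the map/reduce/finalize flow. This is an illustration in plain JavaScript, not MongoDB's implementation: runMapReduce, buckets and the local emit are hypothetical names, and the only behaviors modeled are the ones described in this thread (reduce is skipped for a key with a single emitted value, and finalize runs once per key afterwards).

```javascript
var buckets = {};

// Stand-in for the emit() that the server provides to map functions.
function emit(key, value) {
  if (!Object.prototype.hasOwnProperty.call(buckets, key)) {
    buckets[key] = [];
  }
  buckets[key].push(value);
}

// Hypothetical driver: map every document, reduce per key, then finalize.
function runMapReduce(docs, mapFn, reduceFn, finalizeFn) {
  buckets = {};
  docs.forEach(function (doc) { mapFn.call(doc); });
  return Object.keys(buckets).sort().map(function (key) {
    var values = buckets[key];
    // A key with a single emitted value skips reduce entirely.
    var reduced = values.length === 1 ? values[0] : reduceFn(key, values);
    return { _id: key, value: finalizeFn(key, reduced) };
  });
}

// The three documents from the thread (without _id).
var docs = [
  { name: "eason", date: 20120511, platform: "ubuntu" },
  { name: "wang", date: 20120511, platform: "xp" },
  { name: "eason", date: 20120512, platform: "redhat" }
];

function map() {
  emit(this.name, { platform: this.platform, date: this.date });
}

// Keeps the emitted shape so it is safe to call more than once per key.
function reduce(key, values) {
  var latest = { date: 0, platform: undefined };
  values.forEach(function (v) {
    if (latest.date < v.date) {
      latest = { date: v.date, platform: v.platform };
    }
  });
  return latest;
}

// Runs for every key, including keys that skipped reduce.
function finalize(key, reducedValue) {
  return reducedValue.platform;
}

var results = runMapReduce(docs, map, reduce, finalize);
console.log(JSON.stringify(results));
```

Because finalize is applied to every key, both "eason" and "wang" end up with a plain platform string as their value. In the mongo shell, the same map, reduce and finalize functions would be passed as db.logs.mapReduce(map, reduce, { out: { inline: 1 }, finalize: finalize }), where logs is a hypothetical collection name.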
>
> > > On May 10, 11:07 pm, easonwang <buffon...@gmail.com> wrote:
>
> > > > Hi Jenna,
>
> > > > Thanks for your reply and for the correction to my Reduce function,
> > > > which indeed had some bugs when I wrote this post.
> > > > Briefly, the requirement is to get each user name together with the
> > > > platform from that user's latest record.
> > > > Here is a tested example:
> > > > The data:
> > > > {
> > > >   "_id" : ObjectId("4fac78d8a1681d11dc93b498"),
> > > >   "name" : "eason",
> > > >   "date" : 20120511,
> > > >   "platform" : "ubuntu"}
>
> > > > {
> > > >   "_id" : ObjectId("4fac78f8a1681d11dc93b49a"),
> > > >   "name" : "wang",
> > > >   "date" : 20120511,
> > > >   "platform" : "xp"}
>
> > > > {
> > > >   "_id" : ObjectId("4fac78eba1681d11dc93b499"),
> > > >   "name" : "eason",
> > > >   "date" : 20120512,
> > > >   "platform" : "redhat"}
>
> > > > Map function:
> > > > function Map() {
> > > >   emit(this.name, {"platform":this.platform,"date":this.date});
> > > > }
>
> > > > Reduce function1:
> > > > function Reduce(key, number) {
> > > >      var date=0;
> > > >      var platform;
> > > >      for(var i in number){
> > > >         if(date<number[i].date){
> > > >            date=number[i].date; platform=number[i].platform;
> > > >         }
> > > >      }
> > > >      return {"date":date,"platform":platform};
> > > > }
>
> > > > After this MapReduce, the result is as expected:
> > > > {
> > > >   "_id" : "eason",
> > > >   "value" : {
> > > >     "date" : 20120512.0,
> > > >     "platform" : "redhat"
> > > >   }}
>
> > > > {
> > > >   "_id" : "wang",
> > > >   "value" : {
> > > >     "platform" : "xp",
> > > >     "date" : 20120511.0
> > > >   }}
>
> > > > /********************************************/
> > > > Reduce function2:
> > > > function Reduce(key, number) {
> > > >      var date=0;
> > > >      var platform;
> > > >      for(var i in number){
> > > >         if(date<number[i].date){
> > > >            date=number[i].date; platform=number[i].platform;
> > > >         }
> > > >      }
> > > >      return platform;
> > > > }
>
> > > > In this case, I want to change the structure of the "value" in the
> > > > MapReduce result, since I am not interested in the "date". Strangely,
> > > > the result is:
> > > > {
> > > >   "_id" : "eason",
> > > >   "value" : "redhat"}
>
> > > > {
> > > >   "_id" : "wang",
> > > >   "value" : {
> > > >     "platform" : "xp",
> > > >     "date" : 20120511.0
> > > >   }}
>
> > > > It seems that the "eason" documents produce the output I wanted,
> > > > while the "wang" document (unique key) does not. So must the quotation
> > > > "the value that the reduce function returns must match the structure
> > > > of the map function's emitted value" always be followed? How can the
> > > > inconsistent output of Reduce function2 be explained (it does not work
> > > > for the unique-key document, but otherwise seems to work)? My current
> > > > assumption is that, in the MapReduce mechanism, a key that has only
> > > > one document after Map bypasses the Reduce step. Is that reasonable?
> > > > Thanks!
>
> > > > Regards,
> > > > Eason Wang
>
> > > > On May 11, 1:29 AM, Jenna <jenna.deboisbl...@10gen.com> wrote:
>
> > > > > The reduce function will be called for documents with a unique key, in
> > > > > this case "name." The important thing to remember about the reduce
> > > > > function is that it may be invoked more than once for the same key.
> > > > > For that reason, the value that the reduce function returns must match
> > > > > the structure of the map function's emitted value.
>
> > > > > Your reduce function currently does not return a value with the
> > > > > same structure as the value your map function emits. It should
> > > > > return a document in the form {platform: x, date: y}.
>
> > > > > In addition, "platform=number[i].key" does not work because "key" is
> > > > > not emitted in your map function as a value, and so it will not be
> > > > > stored in the "number" array.  So you could edit your reduce function
> > > > > as follows:
>
> > > > > function Reduce(key, number) {
> > > > >      // Start from the emitted shape and keep the latest entry seen.
> > > > >      var result = {platform: 0, date: 0};
> > > > >      for (var i in number) {
> > > > >         if (result.date < number[i].date) {
> > > > >            result.date = number[i].date;
> > > > >            result.platform = number[i].platform;
> > > > >         }
> > > > >      }
> > > > >      return result;
> > > > > }
>
> > > > > To help address your other question about modifying the "value," could
> > > > > you provide an example of the way in which you would like to modify
> > > > > your data? This may be possible in map-reduce, but it's hard to
> > > > > provide a specific solution without knowing your desired output.
>
> > > > > On May 10, 7:07 am, easonwang <buffon...@gmail.com> wrote:
>
> > > > > > Hi,
>
> > > > > > For example, I wrote the map and reduce functions as follows:
>
> > > > > > function Map() {
> > > > > > emit(this.name, {"platform":this.platform,"date":this.time});
>
> > > > > > }
>
> > > > > > function Reduce(key, number) {
> > > > > >         var platform; var date=0 ;
> > > > > >         for(var i in number)
> > > > > >         {if(date<number[i].date)
> > > > > > {date=number[i].date;platform=number[i].key;}   }
> > > > > >         return platform;
>
> > > > > > }
>
> > > > > > In the reduce function, I want to change the structure of the "value"
> > > > > > produced by map, but I guess that the reduce function is not called
> > > > > > for documents whose "name" is unique.
> > > > > > Is that right? And how can I reshape such documents?
> > > > > > Thanks!