How to run MapReduce on a secondary and output to another database


Tom Vo

May 13, 2012, 10:59:45 PM
to mongodb-user
Dear all,
We set up a replica set for our system and ran into a performance
problem when using map/reduce to output data. Can we run map/reduce
on a secondary and write the output to another database? We ask
because the warning "too much data for in memory map/reduce" appears
when we use the option "inline: 1".

Thanks and Best Regards.
======================================================
Diagram Mongodb
Version 2.0.4 - 64 bit
Memory (RAM) : 32.0GB
OS: Windows Server 2008 R2 Standard

Scott Hernandez

May 13, 2012, 11:08:34 PM
to mongod...@googlegroups.com
No, secondaries cannot save data: that would let them hold data the
primary doesn't have, which would break replica set consistency. That
is why the only mapReduce you can run on secondaries is an inline one
(where no permanent data is saved).

You should run your map/reduce on the primary.
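
As a rough sketch of what that could look like (not a tested recipe), the "out" option can name a collection instead of "inline: 1", and to my understanding it also accepts a "db" field selecting the target database. The helper name buildOutputCommand and the names "reports" and "test_totals" are hypothetical, and the map and reduce functions assume documents shaped like { _id: { Name: ... }, value: { Number: ... } }.

```javascript
// Sketch only: builds a mapReduce command that writes its result to a
// collection in another database instead of returning it inline.
// "reports" and "test_totals" are hypothetical names.
function buildOutputCommand(collection, outDb, outCollection) {
  return {
    mapReduce: collection,
    map: function () {
      emit(this._id.Name, { Number: this.value.Number });
    },
    reduce: function (key, values) {
      // Sum the Number field of every value emitted for this key.
      var total = 0;
      values.forEach(function (val) {
        total += val.Number;
      });
      return { Number: total };
    },
    // "replace" overwrites the target collection; "db" selects the database.
    out: { replace: outCollection, db: outDb }
  };
}

var cmd = buildOutputCommand("test", "reports", "test_totals");
// In the mongo shell, connected to the primary:
// db.runCommand(cmd);
```

Because the result is stored in a collection rather than returned in the reply, it is not subject to the size limit on inline results.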

Tom Vo

May 14, 2012, 12:52:32 AM
to mongodb-user
Thanks for your reply.
Please help me fix the error message "too much data for in memory
map/reduce" that appears when we run map/reduce.
Could you help us tune the script below?
db.runCommand({
    mapReduce: "test",  // the collection name must be a string
    map: function () {
        emit(this._id.Name, { "Number": this.value.Number });
    },
    reduce: function (key, values) {
        // Sum the Number field of every value emitted for this key.
        var total = 0;
        values.forEach(function (val) {
            total += val.Number;
        });
        return { "Number": total };
    },
    // d_i must be defined before this runs (upper bound for _id.Times).
    query: { "_id.Times": { $lte: d_i } },
    out: { inline: 1 },
    verbose: true,
    jsMode: false
});

Best Regards

Scott Hernandez

May 14, 2012, 1:25:32 AM
to mongod...@googlegroups.com
It means you are *returning* too much data. Can you break up the
map/reduce into multiple batches based on a query to only select part
of your collection by _id.Name?

Tom Vo

May 14, 2012, 1:50:55 AM
to mongod...@googlegroups.com
Dear sir,
How do we break it up into multiple batches? The map/reduce is
equivalent to this query:

SET d_i = 10
SELECT _id.Name, SUM(value.Number) AS Number
FROM test
WHERE _id.Times <= d_i
GROUP BY _id.Name

Best regards.
--

Thanks and Best Regards
-------------------------------------------------------------------------------------------------------------
Vo Tan Hau (TOM)
Senior Database Administrator
NEXCEL SOLUTIONS LTD
SMS Tower,Lot 40, Quang Trung Software City, District 12, HCMC, Vietnam.
Tel:+84-8-37154278 - Fax:+84-8-37154279 www.nexcel.vn
Messenger
(Skype+Yahoo+Live): Vohau2002


Scott Hernandez

May 14, 2012, 8:40:09 AM
to mongod...@googlegroups.com
Using the primary is the easiest approach, and best supported.

To break up the query for map/reduce into many batches you would have
to know the range of values for _id.Name so you could filter on them,
one range at a time, to produce results small enough for a series of
inline map/reduce commands.
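
A minimal sketch of that batching idea, assuming the boundary values are already known from the data: the helper below only builds the query filters, one per range of _id.Name, each also keeping the original _id.Times condition. The helper name buildBatchQueries and the boundaries "h" and "p" are hypothetical.

```javascript
// Sketch only: splits one large inline mapReduce into several smaller ones,
// each restricted to a range of _id.Name values.
// The boundary values are hypothetical; they must come from knowledge of the data.
function buildBatchQueries(boundaries, maxTimes) {
  var queries = [];
  // N boundaries give N + 1 ranges: below the first boundary, between each
  // adjacent pair (lower bound inclusive, upper exclusive), and from the last one up.
  for (var i = 0; i <= boundaries.length; i++) {
    var query = { "_id.Times": { $lte: maxTimes } };
    var nameRange = {};
    if (i > 0) nameRange.$gte = boundaries[i - 1];
    if (i < boundaries.length) nameRange.$lt = boundaries[i];
    // With no boundaries at all there is a single, unrestricted batch.
    if (boundaries.length > 0) query["_id.Name"] = nameRange;
    queries.push(query);
  }
  return queries;
}

var queries = buildBatchQueries(["h", "p"], 10);
// In the mongo shell, each filter would be passed as the "query" option of
// one inline mapReduce command, for example:
// queries.forEach(function (q) {
//   var res = db.runCommand({ mapReduce: "test", map: mapFn, reduce: reduceFn,
//                             query: q, out: { inline: 1 } });
//   // use res.results here
// });
```

Since every name falls into exactly one range, the per-batch results do not overlap and can simply be concatenated. Here mapFn and reduceFn stand for the map and reduce functions already in use.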

Tom Vo

May 14, 2012, 11:15:03 AM
to mongod...@googlegroups.com
Dear all,
This problem has been fixed.
Thanks for your support.

Mark Hansen

Sep 29, 2012, 8:06:09 PM
to mongod...@googlegroups.com
I have a similar issue (see https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/29Ee_p33pRA). However, using the primary is not an option for us. The primary is dedicated to handling large data loading tasks. We cannot do the map-reduce inline because the result sets are large, and the distribution of data values is unknown, so we cannot easily break up the queries.