How to run map/reduce on a secondary and output to another database

Tom Vo

May 13, 2012, 10:59:45 PM

to mongodb-user

Dear all,
We set up a replica set for our system, and we ran into a performance
problem when using map/reduce to output data. Can we run map/reduce on
a secondary and write the output to another database? We ask because
the warning "too much data for in memory map/reduce" appears when we
use the option "inline:1".

Thanks and Best Regards.
======================================================
MongoDB environment:
Version 2.0.4 - 64 bit
Memory (RAM) : 32.0GB
OS: Windows Server 2008 R2 Standard

Scott Hernandez

May 13, 2012, 11:08:34 PM

to mongod...@googlegroups.com

No, secondaries cannot save data. That would let them hold data the
primary doesn't have, which would break replica set consistency. That
is one reason you can only run an inline mapReduce (where no permanent
data is saved) on secondaries.

You should run your map/reduce on the primary.
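
A minimal sketch of what running on the primary with a collection output might look like, assuming the `replace` and `db` fields of the `out` document. The output collection `name_totals`, the database `reports`, and the helper name `buildOutputCommand` are hypothetical placeholders, not names from this thread. The helper only builds the command document, so it can be inspected before it is passed to `db.runCommand` in a shell connected to the primary.

```javascript
// Builds a mapReduce command that writes its result to a collection
// instead of returning it inline. All names passed in are illustrative.
function buildOutputCommand(collection, outCollection, outDb, maxTimes) {
    return {
        mapreduce: collection,
        map: function () {
            // Runs on the server, where `this` is the current document.
            emit(this._id.Name, { Number: this.value.Number });
        },
        reduce: function (key, values) {
            var total = 0;
            values.forEach(function (val) {
                total += val.Number;
            });
            return { Number: total };
        },
        query: { "_id.Times": { $lte: maxTimes } },
        // "replace" overwrites outCollection; "db" selects the target database.
        out: { replace: outCollection, db: outDb }
    };
}

var cmd = buildOutputCommand("test", "name_totals", "reports", 10);
// In a mongo shell connected to the primary: db.runCommand(cmd);
```

Because the result is stored in a collection rather than returned in the reply, the size limit behind the "too much data for in memory map/reduce" warning no longer applies to the output.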

Tom Vo

May 14, 2012, 12:52:32 AM

to mongodb-user

Thanks for your reply.
Please help me fix the error message "too much data for in memory
map/reduce" that appears when we run map/reduce.
Could you help us tune the script below?
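// d_i is the upper bound for _id.Times; it must be defined before this runs.
db.runCommand({
    mapreduce: "test",  // the collection name must be a string
    map: function Map() {
        emit(this._id.Name, { "Number": this.value.Number });
    },
    reduce: function Reduce(key, values) {
        // Sum the Number field over all values emitted for this key.
        var total = 0;
        values.forEach(function (val) {
            total += val.Number;
        });
        return { "Number": total };
    },
    query: { "_id.Times": { $lte: d_i } },
    out: { inline: 1 },
    verbose: true,
    jsMode: false
});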

Best Regards

Scott Hernandez

May 14, 2012, 1:25:32 AM

to mongod...@googlegroups.com

It means you are *returning* too much data. Can you break up the
map/reduce into multiple batches based on a query to only select part
of your collection by _id.Name?

Tom Vo

May 14, 2012, 1:50:55 AM

to mongod...@googlegroups.com

Dear sir,

How do we break it up into multiple batches? It is equivalent to this query:

SET d_i = 10

SELECT _id.Name, SUM(value.Number) AS Number
FROM test
WHERE _id.Times <= d_i
GROUP BY _id.Name
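
To make the intended result concrete, here is a small in-memory sketch in plain JavaScript of what the map/reduce script computes: filter on `_id.Times`, group by `_id.Name`, and sum `value.Number`. The sample documents and the helper name `sumByName` are made up for illustration and only mirror the field names used in this thread; this is not how the server executes map/reduce.

```javascript
// Filters documents by _id.Times, then sums value.Number per _id.Name.
function sumByName(docs, maxTimes) {
    var totals = {};
    docs.forEach(function (doc) {
        if (doc._id.Times <= maxTimes) {
            var name = doc._id.Name;
            totals[name] = (totals[name] || 0) + doc.value.Number;
        }
    });
    return totals;
}

// Hypothetical sample data, not taken from the real collection.
var sampleDocs = [
    { _id: { Name: "a", Times: 1 }, value: { Number: 2 } },
    { _id: { Name: "a", Times: 5 }, value: { Number: 3 } },
    { _id: { Name: "b", Times: 9 }, value: { Number: 4 } },
    { _id: { Name: "b", Times: 12 }, value: { Number: 7 } }
];

var totals = sumByName(sampleDocs, 10);
// totals is { a: 5, b: 4 }; the document with Times 12 is filtered out.
```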

Best regards.

--

Thanks and Best Regards
-------------------------------------------------------------------------------------------------------------
Vo Tan Hau (TOM)
Senior Database Administrator
NEXCEL SOLUTIONS LTD
SMS Tower,Lot 40, Quang Trung Software City, District 12, HCMC, Vietnam.
Tel:+84-8-37154278 - Fax:+84-8-37154279 www.nexcel.vn
Messenger (Skype+Yahoo+Live): Vohau2002

Scott Hernandez

May 14, 2012, 8:40:09 AM

to mongod...@googlegroups.com

Using the primary is the easiest approach, and best supported.

To break up the query for map/reduce into many batches you would have
to know the range of values for _id.Name so you could filter on them,
one range at a time, to produce results small enough for a series of
inline map/reduce commands.
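
The batching described above can be sketched as follows, assuming the split points for `_id.Name` are already known. The boundary values and the helper name `buildBatchCommands` are hypothetical, and the example assumes names start with a lowercase ASCII letter. Each batch adds a half-open range on `_id.Name` to the original query, so every key falls into exactly one batch and the inline results can simply be concatenated.

```javascript
// Map and reduce functions as used earlier in this thread.
var mapFn = function () {
    emit(this._id.Name, { Number: this.value.Number });
};
var reduceFn = function (key, values) {
    var total = 0;
    values.forEach(function (val) {
        total += val.Number;
    });
    return { Number: total };
};

// Builds one inline mapReduce command per [boundaries[i], boundaries[i+1]) range.
// The first and last boundaries must enclose every _id.Name in the collection.
function buildBatchCommands(collection, boundaries, maxTimes) {
    var commands = [];
    for (var i = 0; i + 1 < boundaries.length; i++) {
        commands.push({
            mapreduce: collection,
            map: mapFn,
            reduce: reduceFn,
            query: {
                "_id.Times": { $lte: maxTimes },
                "_id.Name": { $gte: boundaries[i], $lt: boundaries[i + 1] }
            },
            out: { inline: 1 }
        });
    }
    return commands;
}

// Hypothetical split points; "{" is the character right after "z" in ASCII.
var batches = buildBatchCommands("test", ["a", "h", "p", "{"], 10);
// In a mongo shell:
// batches.forEach(function (c) { var res = db.runCommand(c); /* use res.results */ });
```

If one batch still returns too much data, its range can be split further by adding more boundary values.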

Tom Vo

May 14, 2012, 11:15:03 AM

to mongod...@googlegroups.com

Dear all,
This problem has been fixed.
Thanks for your support.

Mark Hansen

Sep 29, 2012, 8:06:09 PM

to mongod...@googlegroups.com

I have a similar issue (see https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/29Ee_p33pRA). However, using the primary is not an option for us. The primary is dedicated to handling large data loading tasks. We cannot do the map-reduce inline because the result sets are large and the distribution of data values is unknown, so we cannot break up the queries very easily.
