MapReduce


tom

Jan 2, 2012, 1:00:37 AM
to luam...@googlegroups.com
mapreduce doesn't work?

res,err = db:mapreduce(ns, jsmapfunc, jsreducefunc[, query[, output]])

tested with a simple count.
it always returns an empty table.

    local n = "test.country"
    local m = "function() {emit(this.p_type, {cname: this.name});}"
    local r = "function(key, values) {var sum = 0;values.forEach(function(doc) {sum += doc.namey;});return {summe: sum};}"
    local q = "{ query : {}, out: {inline: 1}}"

    local  res,err = db:mapreduce(n, m, r, q)

remark: out: {inline: 1} should return the results directly in the result variable instead of writing them to a collection.


Evan

Jan 2, 2012, 12:52:57 PM
to luamongo
Thanks for the report, Tom. Can you paste a small test dataset?
JSON data is fine.

-Evan

tombo...@gmail.com

Jan 3, 2012, 5:37:13 PM
to luam...@googlegroups.com

COUNTING EXAMPLE VIA SHELL:

> db.things.insert( { _id : 1, tags : ['dog', 'cat'] } );
> db.things.insert( { _id : 2, tags : ['cat'] } );
> db.things.insert( { _id : 3, tags : ['mouse', 'cat', 'dog'] } );
> db.things.insert( { _id : 4, tags : []  } );

> m = function(){this.tags.forEach(function(z){emit( z , { count : 1 } );});};
> r = function( key , values ){ var total = 0; for ( var i=0; i<values.length; i++ ) total += values[i].count; return { count : total };};
> res = db.things.mapReduce(m, r, { out : {inline : 1} } );

delivers:

{
    "results" : [
        {
            "_id" : "cat",
            "value" : {
                "count" : 3
            }
        },
        {
            "_id" : "dog",
            "value" : {
                "count" : 2
            }
        },
        {
            "_id" : "mouse",
            "value" : {
                "count" : 1
            }
        }
    ],
    "timeMillis" : 0,
    "counts" : {
        "input" : 4,
        "emit" : 6,
        "reduce" : 2,
        "output" : 3
    },
    "ok" : 1,
}
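
For comparison, the counting logic above can be emulated in plain Lua without a server, which makes it easy to see what the binding should eventually hand back. This is only a sketch: the names map, reduce and run are hypothetical helpers written for this illustration and are not part of luamongo or MongoDB.

```lua
-- Local emulation of the tag-counting map/reduce; no MongoDB involved.
local things = {
  { _id = 1, tags = { "dog", "cat" } },
  { _id = 2, tags = { "cat" } },
  { _id = 3, tags = { "mouse", "cat", "dog" } },
  { _id = 4, tags = {} },
}

-- map: emit { count = 1 } once per tag of a document
local function map(doc, emit)
  for _, tag in ipairs(doc.tags) do
    emit(tag, { count = 1 })
  end
end

-- reduce: sum the counts emitted for one key
local function reduce(key, values)
  local total = 0
  for _, v in ipairs(values) do
    total = total + v.count
  end
  return { count = total }
end

-- hypothetical driver: group emitted values by key, then reduce each group.
-- MongoDB skips reduce for keys with a single value; calling it for every
-- key, as done here, gives the same totals for this reducer.
local function run(docs)
  local groups, emitted = {}, 0
  for _, doc in ipairs(docs) do
    map(doc, function(key, value)
      emitted = emitted + 1
      groups[key] = groups[key] or {}
      table.insert(groups[key], value)
    end)
  end
  local out = {}
  for key, values in pairs(groups) do
    out[key] = reduce(key, values)
  end
  return out, emitted
end

local counts, emitted = run(things)
print(counts.cat.count, counts.dog.count, counts.mouse.count, emitted)
```

The printed values are 3, 2, 1 and 6, matching the "results" and the "emit" counter in the shell output.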


LUAMONGO

local n = "test.things"
local m = "function(){this.tags.forEach(function(z){emit( z , { count : 1 } ) ; }) ; } ;"
local r = "function( key , values ){ var total = 0 ; for ( var i=0 ; i<values.length; i++ ) total += values[i].count ; return { count : total } ; } ;"
local q = {query = {}} --, out = {inline = 1}} 
local o =  "example"

local  res,err = mdb:mapreduce(n, m, r, q, o)

this code creates a valid return table but doesn't process the records.

ok 1
counts table: 0x7f97e0
timeMillis 1
result example


* res,err = db:mapreduce(ns, jsmapfunc, jsreducefunc, query, output)

the output collection (o) is a separate function parameter ( -> const char *output = luaL_optstring(L, 6, ""); ),
so the out param within (q) is ignored. mongo provides some nice output options (replace/merge/reduce) and
{inline : 1}, which builds the result in RAM and returns it directly as a results array.
don't know if the current binding is able to use these options.

since mongo doesn't allow multithreaded MR, the biggest advantage of MR is gone, but
at this stage MR can very easily create a useful persistent aggregate collection server-side, named by (o),
so it's a nice-to-have. a full-featured MR could be emulated with coroutines in lua or via mongo-hadoop ->
https://github.com/mongodb/mongo-hadoop

remark: out : {inline : 1} should return RESULTS -> table, and out : 'cname' should return RESULT -> string (name of the collection saved to the current db)

see -> http://www.mongodb.org/display/DOCS/MapReduce   -> Output options
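
The two reply shapes in the remark above could be told apart on the Lua side roughly as follows. This is a sketch under assumptions: describe is a hypothetical helper, and the two tables are hand-written mocks shaped like the replies shown in this thread, not live server output.

```lua
-- Hypothetical helper: classify a mapreduce reply by its shape.
local function describe(res)
  if type(res.results) == "table" then
    -- inline output: the reduced rows come back in the reply itself
    return "inline", #res.results
  elseif type(res.result) == "string" then
    -- named output: only the collection name comes back
    return "collection", res.result
  end
  return "unknown"
end

-- Mock replies (assumed shapes, taken from the examples in this thread).
local inline_res = { ok = 1, results = { { _id = "cat", value = { count = 3 } } } }
local coll_res = { ok = 1, result = "example" }

print(describe(inline_res))
print(describe(coll_res))
```

With a named collection the caller would still have to query that collection to read the rows; with inline output they are already in the table.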

tom

tombo...@gmail.com

Jan 5, 2012, 5:10:48 PM
to luam...@googlegroups.com

change:
----------------------------------------------------------------------


const char *output = luaL_optstring(L, 6, "");
res = dbclient->mapreduce(ns, jsmapfunc, jsreducefunc, query, output);
----------------------------------------------------------------------
to:
----------------------------------------------------------------------
BSONObj res;
if (!lua_isnoneornil(L, 6)) {
    const char *output = luaL_optstring(L, 6, "");
    res = dbclient->mapreduce(ns, jsmapfunc, jsreducefunc, query, output);
} else {
    res = dbclient->mapreduce(ns, jsmapfunc, jsreducefunc, query);
}
----------------------------------------------------------------------
will enable inline results.

o = nil -> inline


res,err = mdb:mapreduce(n, m, r, q, o)

print (res.results[1]._id) --> cat
print (res.results[1].value.count) --> 3
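
To walk over every row rather than only the first, a loop along these lines should work once inline results are returned. It is a sketch that uses a hand-written mock of res (shape assumed from the shell output earlier in the thread), so it runs without a server.

```lua
-- Mock of the inline reply; with the patched binding this table would
-- come from mdb:mapreduce(n, m, r, q) instead.
local res = {
  results = {
    { _id = "cat", value = { count = 3 } },
    { _id = "dog", value = { count = 2 } },
    { _id = "mouse", value = { count = 1 } },
  },
  ok = 1,
}

-- Collect "tag=count" pairs in the order the rows were returned.
local lines = {}
for _, row in ipairs(res.results) do
  lines[#lines + 1] = string.format("%s=%d", row._id, row.value.count)
end
print(table.concat(lines, ", "))
```

For the mock data this prints cat=3, dog=2, mouse=1.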

