mapreduce vs find

Jamal

Jan 30, 2011, 2:21:51 PM
to mongodb-user
Hello,

I don't see any difference between pulling the data out and processing it
in Ruby, and using mapreduce.

                 user     system      total        real
ruby:       12.910000   0.280000  13.190000 ( 14.179250)
mapreduce:   0.000000   0.000000   0.000000 ( 14.005307)

---------------------------------------------------------------------------------------------------------------------------------------------------
def execute_by_ruby
  # Fetch every matching document and collect it on the client side.
  result = Log.where(:path => "http://www.anything.com/velkommen.aspx")

  array = []
  result.each do |row|
    array << row
  end

  array.length
end
-----------------------------------------------------------------------------------------------------------------------------------------------------
db.logs.mapReduce(map, reduce, {query: {path: "http://www.anything.com/velkommen.aspx"}, out: "logs"})
{
    "result" : "logs",
    "timeMillis" : 13410,
    "counts" : {
        "input" : 112814,
        "emit" : 112814,
        "output" : 60672
    },
    "ok" : 1
}

-----------------------------------------------------------------------------------------------------------------------------------------------------

Is it true that they are equally fast, even though I have an index on path?

I'm trying to optimize my app, but it's hard to do since there are not
many options.

Should I go with sharding? Would it make any big difference?

Thanks for any answer.
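
Editor's note: the timing tables in this thread have the user/system/total/real columns printed by Ruby's standard Benchmark module, so it is assumed here that Benchmark produced them. The sketch below uses stand-in workloads (no MongoDB calls; `cpu_bound_work` and `waiting_on_server` are hypothetical names) to show why the mapreduce row can report 0.000000 CPU time yet 14 seconds real: user and system only count CPU used by the Ruby process itself, while real is wall-clock time, so time spent waiting on the server shows up only in the last column.

```ruby
require 'benchmark'

# Stand-in for client-side work: burns CPU inside the Ruby process.
def cpu_bound_work
  sum = 0
  200_000.times { |i| sum += i * i }
  sum
end

# Stand-in for waiting on a server (hypothetical; no MongoDB involved).
def waiting_on_server
  sleep 0.2
end

cpu  = Benchmark.measure { cpu_bound_work }
wait = Benchmark.measure { waiting_on_server }

# Print both measurements in the same layout as the tables in this thread.
puts Benchmark::CAPTION
puts "ruby:      #{cpu}"
puts "mapreduce: #{wait}"
```

When comparing the two approaches, the real column is therefore the one to look at.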

Adrien Mogenet

Jan 30, 2011, 3:09:04 PM
to mongodb-user
MapReduce becomes useful in a sharded environment.
Actually, "mapreduce" itself is not fast. I would even say that it's almost
the slowest thing you can do, at least with the current implementation
in MongoDB.
However, it scales extremely well.
i.e. your map/reduce job will not take longer on 1000 shards;
your find() will (assuming it has to scan each shard).
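
Editor's note: a rough illustration of the scaling argument, in plain Ruby with in-memory arrays standing in for shards (no MongoDB involved; `map_doc`, `reduce_counts`, `map_reduce_shard`, and `merge_partials` are hypothetical names). Each shard maps and reduces its own documents, and the partial results are then reduced once more to combine them. This only works because the reduce step returns values of the same shape it receives, so it can be re-applied to its own output.

```ruby
# Emit one (key, value) pair per log document, keyed by cookie.
def map_doc(doc)
  [doc[:cookie], { count: 1 }]
end

# Sum the counts of several values for the same key. The result has the
# same shape as the inputs, so it can be fed back into reduce_counts.
def reduce_counts(_key, values)
  { count: values.sum { |v| v[:count] } }
end

# Apply reduce_counts to every {key => [values]} group.
def reduce_groups(grouped)
  grouped.map { |key, values| [key, reduce_counts(key, values)] }.to_h
end

# Map + reduce over the documents held by a single shard.
def map_reduce_shard(docs)
  grouped = Hash.new { |h, k| h[k] = [] }
  docs.each do |doc|
    key, value = map_doc(doc)
    grouped[key] << value
  end
  reduce_groups(grouped)
end

# Combine per-shard partial results by reducing them once more.
def merge_partials(partials)
  grouped = Hash.new { |h, k| h[k] = [] }
  partials.each do |partial|
    partial.each { |key, value| grouped[key] << value }
  end
  reduce_groups(grouped)
end

# Two hypothetical shards holding a handful of log documents.
shards = [
  [{ cookie: "DNHDC" }, { cookie: "CtJB5" }],
  [{ cookie: "CtJB5" }, { cookie: "DMQsy" }, { cookie: "CtJB5" }]
]
p merge_partials(shards.map { |docs| map_reduce_shard(docs) })
```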

Jamal

Jan 30, 2011, 5:21:36 PM
to mongodb-user
I just tried to dump the same query from the console using mongodump.
It took less than 1 sec to finish.

How can it fetch the data out so fast?

Is the Ruby driver slow?
Is 112814 documents in 13410 ms good compared to others (Redis or MySQL)?

I'm using it for logging Apache stuff.

Alvin Richards

Jan 30, 2011, 5:33:28 PM
to mongodb-user
You should check the query plan. You do this by adding explain() to
the query, e.g.

db.logs.find().explain()

This will confirm whether you are using the intended indexes or not.

What are the logical differences between your MapReduce and your
find/mongodump?

-Alvin

Jamal

Jan 31, 2011, 3:44:51 AM
to mongodb-user
Well, the find().explain() is fast; it takes about 80 ms.

What takes ages is getting the data out of the mongo server, around
13 sec, and it's only one field that I extract from the server.

(mongodump takes less than 1 sec to extract the data to a file,
but it blocks all other queries to the mongo server.)

When I use mapReduce to do the calculation on the server, it takes
the same number of seconds as when I extract the data to my local machine
and do the calculation without MongoDB.

I'm only fetching one field.
111814 documents
{_id etc., cookie: "aj#as2"}
etc.

Where should I go from here?

{
    "cursor" : "BtreeCursor path_1_date_1_cookie_1",
    "nscanned" : 112814,
    "nscannedObjects" : 112814,
    "n" : 112814,
    "millis" : 132,
    "indexBounds" : {
        "path" : [
            [
                "http://www.anything.com/index.aspx",
                "http://www.anything.com/index.aspx"
            ]
        ],
        "date" : [
            [
                {
                    "$minElement" : 1
                },
                {
                    "$maxElement" : 1
                }
            ]
        ],
        "cookie" : [
            [
                {
                    "$minElement" : 1
                },
                {
                    "$maxElement" : 1
                }
            ]
        ]
    }
}

Thanks again for your answer.
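
Editor's note: one way to read the explain() output above, sketched in plain Ruby on a hand-copied hash (`summarize_explain` is a hypothetical helper, not part of any driver). A cursor name starting with BtreeCursor indicates an index was used, and nscanned equal to n means no index entries were examined beyond the ones that matched. nscannedObjects equal to n suggests every matching document was still loaded, which would fit the observation that the query plan is fine and the time goes elsewhere.

```ruby
# Summarize the fields of an explain() result that matter for index usage.
def summarize_explain(plan)
  {
    uses_index: plan["cursor"].start_with?("BtreeCursor"),
    # Index entries examined per returned document; 1.0 is ideal.
    scan_ratio: plan["nscanned"].to_f / plan["n"],
    # True when at least as many documents were loaded as were returned.
    loads_documents: plan["nscannedObjects"] >= plan["n"]
  }
end

# Values copied from the explain() output posted in this message.
plan = {
  "cursor" => "BtreeCursor path_1_date_1_cookie_1",
  "nscanned" => 112814,
  "nscannedObjects" => 112814,
  "n" => 112814,
  "millis" => 132
}
p summarize_explain(plan)
```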

Nat

Jan 31, 2011, 3:49:48 AM
to mongod...@googlegroups.com
- What do your schema and indexes look like?
- Can you also post the output of db.printCollectionStats()?

Jamal

Jan 31, 2011, 4:03:10 AM
to mongodb-user
Schema:

> db.logs.find()
{ "_id" : ObjectId("4d2f584de844dc012e07c229"), "cookie" : "DNG8Y", "path" : "http://www.anything.com/default.aspx", "date" : 20101211 }
{ "_id" : ObjectId("4d2f584de844dc012e07c234"), "cookie" : "DUQksQ", "path" : "http://www.anything.com/da/vores-rejsemaal/europa/grkenland/pages/default.aspx", "date" : 20101211 }
{ "_id" : ObjectId("4d2f584de844dc012e07c235"), "cookie" : "DIgGy", "path" : "http://www.anything.com/DA/tilbud-og-rabatter/ugens-kampagner/Pages/sol-i-januar.aspx", "date" : 20101211 }
{ "_id" : ObjectId("4d2f584de844dc012e07c237"), "cookie" : "DNGlD", "path" : "http://www.anything.com/DA/vores-rejsemaal/Europa/Spanien/Fuerteventura/LasPlayitas/Accomodations/Pages/default.aspx", "date" : 20101211 }
{ "_id" : ObjectId("4d2f584de844dc012e07c238"), "cookie" : "DNHBz", "path" : "http://www.anything.com/da/hvordan-vil-du-rejse/pages/hvordan-vil-du-rejse.aspx", "date" : 20101211 }
{ "_id" : ObjectId("4d2f584de844dc012e07c24c"), "cookie" : "D4lgUA", "path" : "http://www.anything.com/da/vores-rejsemaal/europa/grkenland/pages/default.aspx", "date" : 20101211 }
{ "_id" : ObjectId("4d2f584de844dc012e07c24f"), "cookie" : "CtJB5", "path" : "http://www.anything.com/da/tilbud-og-rabatter/last-minute/pages/last-minute.aspx#", "date" : 20101211 }
{ "_id" : ObjectId("4d2f584de844dc012e07c251"), "cookie" : "CqJd", "path" : "http://www.anything.com/DA/tilbud-og-rabatter/ugens-kampagner/Pages/sol-i-januar.aspx", "date" : 20101211 }
{ "_id" : ObjectId("4d2f584de844dc012e07c252"), "cookie" : "CtJB5", "path" : "http://www.anything.com/da/tilbud-og-rabatter/ugens-kampagner/pages/ugens-kampagner.aspx", "date" : 20101211 }
{ "_id" : ObjectId("4d2f584de844dc012e07c255"), "cookie" : "DNG-_", "path" : "http://www.anything.com/da/tilbud-og-rabatter/last-minute/pages/last-minute.aspx", "date" : 20101211 }
{ "_id" : ObjectId("4d2f584de844dc012e07c257"), "cookie" : "DNHDC", "path" : "http://www.anything.com/da/velkommen/pages/velkommen.aspx", "date" : 20101211 }
{ "_id" : ObjectId("4d2f584de844dc012e07c258"), "cookie" : "DNHDD", "path" : "http://www.anything.com/da/velkommen/pages/velkommen.aspx", "date" : 20101211 }
{ "_id" : ObjectId("4d2f584de844dc012e07c259"), "cookie" : "DUuyoA", "path" : "http://www.anything.com/DA/vores-rejsemaal/Europa/Grkenland/Kreta/Pages/default.aspx", "date" : 20101211 }
{ "_id" : ObjectId("4d2f584de844dc012e07c25a"), "cookie" : "DMQsy", "path" : "http://www.anything.com/da/velkommen/pages/velkommen.aspx", "date" : 20101211 }
{ "_id" : ObjectId("4d2f584de844dc012e07c25b"), "cookie" : "DNHDH", "path" : "http://www.anything.com/da/om-apollo/kontakt-os/pages/kontakt-os.aspx", "date" : 20101211 }
{ "_id" : ObjectId("4d2f584de844dc012e07c25d"), "cookie" : "DNHDM", "path" : "http://www.anything.com/da/vores-rejsemaal/europa/pages/default.aspx", "date" : 20101211 }
{ "_id" : ObjectId("4d2f584de844dc012e07c25e"), "cookie" : "DNHDT", "path" : "http://www.anything.com/da/velkommen/pages/velkommen.aspx", "date" : 20101211 }
{ "_id" : ObjectId("4d2f584de844dc012e07c261"), "cookie" : "DNHDW", "path" : "http://www.anything.com/da/velkommen/pages/velkommen.aspx", "date" : 20101211 }
{ "_id" : ObjectId("4d2f584de844dc012e07c263"), "cookie" : "DNG_e", "path" : "http://www.playitas.info/da/sport/ugeprogram-sportsaktiviteter", "date" : 20101211 }
{ "_id" : ObjectId("4d2f584de844dc012e07c266"), "cookie" : "DINX6", "path" : "http://www.anything.com/DA/om-apollo/kontakt-os/Pages/kontakt-os.aspx", "date" : 20101211 }



Indexes:

db.logs.getIndexes()
[
    {
        "name" : "_id_",
        "ns" : "apollo.logs",
        "key" : {
            "_id" : 1
        }
    },
    {
        "_id" : ObjectId("4d4124681f6799352c96399b"),
        "ns" : "apollo.logs",
        "key" : {
            "path" : 1,
            "date" : 1,
            "cookie" : 1
        },
        "name" : "path_1_date_1_cookie_1"
    }
]

db.printCollectionStats()

logs
{
    "ns" : "apollo.logs",
    "count" : 1587354,
    "size" : 234930628,
    "avgObjSize" : 148.00140863348693,
    "storageSize" : 289510656,
    "numExtents" : 16,
    "nindexes" : 2,
    "lastExtentSize" : 55183872,
    "paddingFactor" : 1,
    "flags" : 0,
    "totalIndexSize" : 275800064,
    "indexSizes" : {
        "_id_" : 59392000,
        "path_1_date_1_cookie_1" : 216408064
    },
    "ok" : 1
}
---
system.indexes
{
    "ns" : "apollo.system.indexes",
    "count" : 2,
    "size" : 192,
    "avgObjSize" : 96,
    "storageSize" : 75264,
    "numExtents" : 3,
    "nindexes" : 0,
    "lastExtentSize" : 57344,
    "paddingFactor" : 1,
    "flags" : 0,
    "totalIndexSize" : 0,
    "indexSizes" : {

    },
    "ok" : 1
}



I can only think of these explanations:

* the Ruby driver is slow
* mongo is slow to extract data
* I need to use sharding

Nat

Jan 31, 2011, 4:10:51 AM
to mongodb-user
The problem could be that there is no exactly matching index for MR to
use. Can you try creating an index on "path" only and see whether MR
runs faster?

Jamal

Jan 31, 2011, 4:29:49 AM
to mongodb-user
                 user     system      total        real
ruby:        7.820000   0.130000   7.950000 (  8.361832)
mapreduce:   0.000000   0.000000   0.000000 ( 13.975395)

Mapreduce:
Log.collection.map_reduce(
  "function() { emit(this.cookie, {count: 1, valid: 0}); }",
  "function(key, values) { var sum = 0; values.forEach(function(doc) { sum += doc.count; }); return {count: sum, valid: false}; };",
  {:query => {:path => "http://www.anything.com/da/velkommen/pages/velkommen.aspx"}, :out => "test"})

Ruby:
result = Log.only(:cookie).where(:path => "http://www.anything.com/da/velkommen/pages/velkommen.aspx")
p "Result: " + result.count.to_s
result.map { |r| }  # iterate over every document so that it is actually fetched


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

What I'm trying to accomplish here is to count how many users exist
that have path="value".

But I cannot use group because there is a limitation on it.

So I cannot find any other solution than extracting the data and
counting the unique cookies that match the condition.
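
Editor's note: the client-side approach described here, counting unique cookies after fetching the matching rows, can be sketched in plain Ruby. The rows below are a hand-made sample in the shape of the documents shown earlier; in the real app they would come from the query, which is not reproduced here. `count_unique_cookies` is a hypothetical helper.

```ruby
require 'set'

# Count distinct cookie values while streaming over the rows once,
# without keeping the rows themselves in memory.
def count_unique_cookies(rows)
  seen = Set.new
  rows.each { |row| seen << row["cookie"] }
  seen.size
end

# Sample rows; only the cookie field is needed for this count.
rows = [
  { "cookie" => "DNHDC" },
  { "cookie" => "DNHDD" },
  { "cookie" => "DMQsy" },
  { "cookie" => "DNHDC" }
]
puts count_unique_cookies(rows)
```

The cost of this approach is dominated by transferring every matching row to the client, which is the part the timings in this message measure.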

Nat

Jan 31, 2011, 8:47:57 AM
to mongodb-user
If so, it means the mapreduce is already using the right query plan. It
took 14 seconds because of JavaScript overhead.

Jamal

Jan 31, 2011, 9:00:03 AM
to mongodb-user
Can I do anything now, or should I move on to Redis for performance?

* group is limited to 10,000 keys, so it can only be used for small amounts
of data
* mapreduce is very slow (maybe in the future it will be faster)
* extracting data from the mongo server is also slow (112,000 documents in 8
sec)

The funny part is that I'm trying to accomplish a simple task and
mongo cannot do it.

group({keys : {cookie:1}, query: {path: "http://anything.com/index"}})

Count unique cookies based on path value.

It should not be a challenge for 1.5 million documents.

Eliot Horowitz

Jan 31, 2011, 9:22:46 AM
to mongod...@googlegroups.com
You should use distinct.
It will be much faster, and it is harder to hit the limits.
The limits are also much higher in 1.7.5.

Jamal

Jan 31, 2011, 9:35:45 AM
to mongodb-user
I already tried distinct; it took ages.

I'm also running 1.7.5:

** NOTE: This is a development version (1.7.5) of MongoDB.
** Not recommended for production.

Eliot Horowitz

Jan 31, 2011, 9:37:05 AM
to mongod...@googlegroups.com
Did you run distinct with the query?
It certainly should be a lot faster than map/reduce.
If it's not, then you're either missing something or you've found a bug.

Jamal

Jan 31, 2011, 9:46:08 AM
to mongodb-user
It took ages, I don't know why (more than 30 sec).

It's finished now; I'm using the mongo console.

db.logs.distinct("cookie", {path: "http://www.anything.com/da/velkommen/pages/velkommen.aspx"})
[
"#26",
"#300",
"-28z",
"-8WM",
"-EKj",
"-HRh",
"-IyG",
"-KX9",
"-KiH",
"-NKa",
"-NfU",
"-Ofq",
"-RyN",
"-TGl",
"-U46",
"-UPt",
"-Vw6",
"-WIy",
"-Wjj",
"-XL5",
"-XvE",
"-Z8Q",
"-aCG",
"-aGK",
"-ahj",
"-c60",
"-cEJ",
"-cjn",
"-h-O",

I cannot see any bug?

Eliot Horowitz

Jan 31, 2011, 9:49:23 AM
to mongod...@googlegroups.com
Can you do:

db.runCommand( { distinct : "logs" , key : "cookie" , query : { path : "http://www.anything.com/da/velkommen/pages/velkommen.aspx" } } )

There is more debugging information in that output.

Jamal

Jan 31, 2011, 9:54:23 AM
to mongodb-user
I cannot see any debugging information.

I waited and waited, and in the end...

db.runCommand( { distinct : "logs" , key : "cookie" , query : { path:
"http://www.anything.com/da/velkommen/pages/velkommen.aspx" }});
{
"values" : [
"#26",
"#300",
"-28z",
"-8WM",
"-EKj",
"-HRh",

.....

"yq-Y"
],
"ok" : 1

Eliot Horowitz

Jan 31, 2011, 9:56:06 AM
to mongod...@googlegroups.com
Are you sure the server is 1.7.5?
The stats output was added in the 1.7 series.

db.version() will tell you.

Jamal

Jan 31, 2011, 10:03:54 AM
to mongodb-user
I'm sorry, I was starting the old console and it connected to the old
mongo server

> db.version()
1.6.5

Now I'm using the new console and connecting to the new port...

> db.version()
1.7.5

but it still takes ages.

db.runCommand( { distinct : "logs" , key : "cookie" , query : { path:
"http://www.anything.com/da/velkommen/pages/velkommen.aspx" }});
{
"values" : [
"#26",
"#300",
"-28z",
.....
],
"stats" : {
"n" : 112814,
"nscanned" : 112814,
"nscannedObjects" : 112814

Eliot Horowitz

Jan 31, 2011, 10:06:40 AM
to mongod...@googlegroups.com
How many distinct values are there?

Jamal

Jan 31, 2011, 10:12:20 AM
to mongodb-user
60672 distinct values out of 112814 matching documents.

Eliot Horowitz

Jan 31, 2011, 10:16:38 AM
to mongod...@googlegroups.com
Ok, one last thing.
Can you copy and paste this whole thing into the shell:
----
print( Date() );
res = db.logs.runCommand( "distinct" , { key : "cookie" , query : {
path:"http://www.anything.com/da/velkommen/pages/velkommen.aspx" } } )
printjson( res.stats )
print( Date() )
-----

Jamal

Jan 31, 2011, 10:17:06 AM
to mongodb-user
Mon Jan 31 16:09:09 [conn1] query apollo.$cmd ntoreturn:1 command: { distinct: "logs", key: "cookie", query: { path: "http://www.anything.com/da/velkommen/pages/velkommen.aspx" } } reslen:1025342 425ms

Jamal

Jan 31, 2011, 10:22:53 AM
to mongodb-user
print( Date() );
Mon Jan 31 2011 16:20:41 GMT+0100 (CET)
> res = db.logs.runCommand( "distinct" , { key : "cookie" , query : { path:"http://www.anything.com/da/velkommen/pages/velkommen.aspx" } } )
...
"DU3AGg",
"DWnzr"
],
"stats" : {
"n" : 112814,
"nscanned" : 112814,
"nscannedObjects" : 112814
},
"ok" : 1
}
> printjson( res.stats )
{ "n" : 112814, "nscanned" : 112814, "nscannedObjects" : 112814 }
> print( Date() )
Mon Jan 31 2011 16:22:17 GMT+0100 (CET)

Eliot Horowitz

Jan 31, 2011, 10:23:45 AM
to mongod...@googlegroups.com
So it looks like you've run into a mongo shell performance issue, not a
server issue.
It took 425ms to run the distinct, and 30 seconds for the JavaScript shell
to display it.
What language are you going to be running this in?
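
Editor's note: the distinction drawn here, between the time to compute a result and the time to render it, can be measured separately. A minimal sketch in plain Ruby, with a generated list standing in for the distinct values (no MongoDB or mongo shell involved; `timed` is a hypothetical helper, and the absolute numbers it prints say nothing about the shell itself).

```ruby
require 'json'
require 'stringio'

# Run the block and return [result, elapsed_seconds] using a monotonic clock.
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

# Stand-in for the server-side work: build a list of distinct values.
values, compute_s = timed { (1..60_000).map { |i| "cookie#{i}" }.uniq }

# Stand-in for the shell printing the result: pretty-print into a buffer.
out = StringIO.new
_, render_s = timed { out.puts JSON.pretty_generate("values" => values) }

puts "compute: #{(compute_s * 1000).round} ms"
puts "render:  #{(render_s * 1000).round} ms"
```

When only the number of values matters, as in this thread, taking the size of the result avoids the rendering step entirely.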

Jamal

Jan 31, 2011, 10:29:28 AM
to mongodb-user
I will use one of the Ruby drivers.

I have no idea how I can fetch the data out of the mongo server fast.

If the mongo console is slow and the Ruby driver is slow,
what should I do?

Eliot Horowitz

Jan 31, 2011, 10:32:05 AM
to mongod...@googlegroups.com
Why do you think the ruby driver is slow?
Can you run distinct from the ruby driver?
You're not going to be outputting the results to the console, right?

Try putting this on one line:

print( Date() ); res = db.logs.runCommand( "distinct" , { key : "cookie" , query : { path:"http://www.anything.com/da/velkommen/pages/velkommen.aspx" } } ); printjson( res.stats ); print( Date() );

Jamal

Jan 31, 2011, 10:37:50 AM
to mongodb-user
I will use the Ruby driver?

Eliot Horowitz

Jan 31, 2011, 10:39:09 AM
to mongod...@googlegroups.com
Can you run the same distinct from ruby?

Jamal

Jan 31, 2011, 10:43:09 AM
to mongodb-user
I just want to know how many of my users visited that URL.

I cannot make this happen fast without extracting the data out :S

group is limited
map_reduce is slow
distinct may help.

But when I go further and want to know how many times every user has
visited that URL, what can I do? :D

Then I'm back with a simple problem that mongo cannot solve.

Eliot Horowitz

Jan 31, 2011, 10:45:00 AM
to mongod...@googlegroups.com
You may have had the same issue with group(), i.e. the actual slow
part was the shell displaying data.
Can you do the same trick with group?
You can look in the log for the actual group computation time.

Jamal

Jan 31, 2011, 10:52:28 AM
to mongodb-user
I cannot even use group because of the 10,000-key limit you put on it.

Ruby:
distinct     0.500000   0.060000   0.560000 (  0.993592)
Log.where(:path => "http://www.anything.com/da/velkommen/pages/velkommen.aspx").distinct('cookie').count()

This fixed the problem of counting unique cookies.

But what if I want to count how many entries there are for every
user?

Then this takes me in the mapreduce direction, which is slower than
extracting the data and doing the calculation myself.

"documents: 112814"
ruby         9.920000   0.210000  10.130000 ( 10.849003)
mapreduce    0.000000   0.000000   0.000000 ( 10.403757)
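
Editor's note: for the per-user question raised here, the client-side calculation being compared against mapreduce amounts to a single pass with a hash of counters. A sketch in plain Ruby on hand-made sample rows (`visits_per_cookie` is a hypothetical helper; the real rows would come from the query shown above, which is not reproduced here).

```ruby
# Count how many rows each cookie appears in, in a single pass.
def visits_per_cookie(rows)
  counts = Hash.new(0)
  rows.each { |row| counts[row["cookie"]] += 1 }
  counts
end

# Sample rows in the shape of the log documents (cookie field only).
rows = [
  { "cookie" => "CtJB5" },
  { "cookie" => "DNHDC" },
  { "cookie" => "CtJB5" },
  { "cookie" => "DMQsy" }
]

counts = visits_per_cookie(rows)
p counts          # visits per cookie
puts counts.size  # number of unique cookies
```

The same hash answers both questions in this thread: its size is the number of unique cookies, and its values are the per-cookie visit counts.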