mongomapper overhead

kuniklo

unread,

Feb 3, 2010, 6:18:44 PM2/3/10

to MongoMapper

I've just started experimenting with MongoMapper and, while the
features seem very appealing, it seems to add a *lot* of overhead to
the queries I want to run. To demonstrate I wrote a little benchmark
script to test the same query run directly via the ruby lib and then
via MM:

#!/usr/bin/env
ruby

require 'mongo'
require 'mongo_mapper'

con = Mongo::Connection.new("mongo").db("cs-development")
hosts = con.collection "hosts"

MongoMapper.connection = Mongo::Connection.new('mongo')
MongoMapper.database = "cs-development"

start = Time.now
count = hosts.find({"os_name" => "Linux"}).collect { |i| i
['os_name'] }.size
finish = Time.now
puts "it took #{finish - start} seconds to find #{count} records"

class Host
include MongoMapper::Document
end

start = Time.now
count = Host.all({"os_name" => "Linux"}).collect { |i| i
['os_name'] }.size
finish = Time.now
puts "it took #{finish - start} seconds to find #{count} records"

The output I get is this:
it took 0.812456168 seconds to find 2872 records
it took 64.732907105 seconds to find 2872 records

So MM here seems to be slower by almost two orders of magnitude. The
records themselves are fairly simple hashes with no date fields.

Am I missing something obvious?

Matt Powell

unread,

Feb 3, 2010, 7:55:56 PM2/3/10

to mongo...@googlegroups.com

Well, for a start, this is young software. John has been very open about the fact that very little has been done in the way of speed optimisation (John, punch me if I'm speaking out of turn here).

Having said that, the great news is that it's open source, so if you have performance concerns or feature requests, I daresay you have at least one sleeve you could be rolling up ;)

Something to check: is mongo_ext installed? Is your MM setup taking advantage of it? There was a message about this in this group the other day.

M

> --
> You received this message because you are subscribed to the Google
> Groups "MongoMapper" group.
> For more options, visit this group at
> http://groups.google.com/group/mongomapper?hl=en?hl=en

Miles Egan

unread,

Feb 3, 2010, 10:46:49 PM2/3/10

to mongo...@googlegroups.com

I'm happy to pitch in. Just wondering if I'm doing something wrong or
this is what I should expect at this stage of optimization.

mongo_ext is definitely installed but I'll double-check that MM is using it.

On Wed, Feb 3, 2010 at 4:55 PM, Matt Powell <faux...@gmail.com> wrote:
> Well, for a start, this is young software. John has been very open about the fact that very little has been done in the way of speed optimisation (John, punch me if I'm speaking out of turn here).
>
> Having said that, the great news is that it's open source, so if you have performance concerns or feature requests, I daresay you have at least one sleeve you could be rolling up ;)
>
> Something to check: is mongo_ext installed? Is your MM setup taking advantage of it? There was a message about this in this group the other day.

--
miles
http://kuniklo.bandcamp.com

Joe Van Overberghe

unread,

Feb 3, 2010, 8:16:00 PM2/3/10

to mongo...@googlegroups.com

This may help:

http://www.mongodb.org/display/DOCS/Ruby+Tutorial

John Nunemaker

unread,

Feb 4, 2010, 12:29:38 AM2/4/10

to mongo...@googlegroups.com

I have no clue what kind of data you have in there. Also, it seems like a pretty naive benchmark. Definitely there is overhead using MM instead of the pure ruby driver. There is overhead using AR instead of the mysql gem. You can't really compare them as it is all about what you get.

I have not focused on performance. That said, the master branch is probably 2-3 times faster than 0.6.10 (the last release) as I applied a great patch from a helping hand. Also, we are currently using MM in production on an app and it is plenty fast and getting better.

Miles Egan

unread,

Feb 4, 2010, 1:37:56 AM2/4/10

to mongo...@googlegroups.com

It certainly is a simple benchmark, although I don't know that I'd
call it naive. I have an in-house administration tool I'm currently
considering migrating from Postgres to Mongo and it performs exactly
these kinds of queries and the difference in response time of <1
second vs 60 seconds is definitely a dealbreaker.

The data are pretty simple. A typical record looks like this:
{ "_id" : { "$oid" : "4b69f7505f202e6a12001529" },
"last_ping_date" : "20100203", "last_auto_update" : "20081031143048",
"os_version" : "Fedora Core 9", "model" : "HP xw8600 Workstation",
"cpu_type" : "Intel(R) Xeon(R) CPU E5440 @ 2.83GHz (8)",
"ram" : "8007", "mac_address" : "00:22:64:AD:A8:CD", "kickstart_args"
: [ "root:sda", "data:sdb", "snap:sdc", "scratch:yes", "arch:x86_64",
"console:X", "swap:12GB", "rootsize:160GB", "keepdata:yes" ],
"room_number" : "S157", "os_bitness" : "64", "owner" : "miles",
"disposition" : "repair", "os_name" : "Linux", "last_known_ip" :
"138.72.129.115", "description" : "miles' linux workstation",
"department" : "Systems", "likely_serial_number" : "2UA8120MVH",
"graphics_adapter" : "Quadro FX 3700", "vendor" : "Hewlett-Packard",
"dhcp_group" : "winwds", "graphics_driver_version" : "2.177.13-1",
"backups_disabled" : false, "machine_subclass" : "workstation",
"os_kernel_version" : "2.6.26.3-29.fc9.x86_64", "last_backup_date" :
"20080923", "swap_size" : "11444", "cpu_speed" : "2833",
"likely_owner" : "miles", "serial_number" : "2UA8120MVH",
"last_console_login" : "miles :0.0 Fri Oct 31 14:26", "machine_class"
: "UNIX", "ip_type" : "dynamic", "names" : [ "groovy", "m" ], "tags" :
[] }

I'll double-check tomorrow that MM is in fact using the mongo_ext lib
and if so see if I can get a profile trace to see if anything in
particular sticks out.

On Wed, Feb 3, 2010 at 9:29 PM, John Nunemaker <nune...@gmail.com> wrote:
> I have no clue what kind of data you have in there. Also, it seems like a
> pretty naive benchmark. Definitely there is overhead using MM instead of the
> pure ruby driver. There is overhead using AR instead of the mysql gem. You
> can't really compare them as it is all about what you get.
> I have not focused on performance. That said, the master branch is probably
> 2-3 times faster than 0.6.10 (the last release) as I applied a great patch
> from a helping hand. Also, we are currently using MM in production on an app
> and it is plenty fast and getting better.

--
miles
http://kuniklo.bandcamp.com

John Nunemaker

unread,

Feb 4, 2010, 9:42:32 AM2/4/10

to mongo...@googlegroups.com

MM just requires mongo which auto requires mongo_ext so if it is working for the mongo gem it is working for MM.

What I mean by naive is that it doesn't seem like a valid use case. When will you query everything in the database all at once? If you really are going to query through all of them, master has find_each which would be far more efficient. It grabs stuff in batches and goes through them rather than pulling down the entire dataset at once.

Also, you are testing both of them in the same script one right after another. Maybe that affects things. Why not use the ruby benchmark library? Did something else on your machine start working right when MM performed its test? There are a ton of factors when benchmarking (I don't claim to know them all). All I'm saying is a single run, through some dataset, testing only one query seems naive.

--
miles
http://kuniklo.bandcamp.com

--

Miles Egan

unread,

Feb 4, 2010, 11:02:07 AM2/4/10

to mongo...@googlegroups.com

This is actually a real-life use case extracted from an application
we've had in production for years with an ldap backend. The only
difference is that the real production code outputs the results to an
html table or json array.

I ran the test again with profiling enabled. Here's the result for the
queries using the ruby bindings directly:
Thread ID: 12055792
Total: 1.220000

%self total self wait child calls name
14.75 1.16 0.18 0.00 0.98 6
Mongo::Connection#read_documents
11.48 0.14 0.14 0.00 0.00 11577 String#unpack
11.48 0.23 0.14 0.00 0.09 2876
<Module::CBson>#deserialize
9.02 0.11 0.11 0.00 0.00 2882 Array#fast_pack
7.38 0.12 0.09 0.00 0.03 5814 ByteBuffer#put_array
6.56 0.08 0.08 0.00 0.00 103318 Hash#[]=
4.92 0.48 0.06 0.00 0.42 2876
<Class::BSON_C>#deserialize
4.92 0.11 0.06 0.00 0.05 5764
Mongo::Connection#receive_message_on_socket
2.46 0.03 0.03 0.00 0.00 2884 ByteBuffer#to_a
2.46 0.03 0.03 0.00 0.00 5814 Array#[]=
2.46 0.03 0.03 0.00 0.00 17484
ActiveSupport::CoreExtensions::String::OutputSafety#html_safe?
1.64 0.13 0.02 0.00 0.11 2882 ByteBuffer#to_s
1.64 0.02 0.02 0.00 0.00 5764 BasicSocket#recv
1.64 1.19 0.02 0.00 1.17 2876 Mongo::Cursor#more?
1.64 0.02 0.02 0.00 0.00 5778 ByteBuffer#initialize

and here's the result for the MM queries:
Thread ID: 10691828
Total: 110.100000

%self total self wait child calls name
51.48 57.32 56.68 0.00 0.64 100440 Module#module_eval
3.71 21.42 4.09 0.00 17.33 103315
MongoMapper::Dirty#write_attribute
3.04 81.12 3.35 0.00 77.77 100440
MongoMapper::EmbeddedDocument::ClassMethods#key
2.43 4.48 2.68 0.00 1.80 200880
MongoMapper::EmbeddedDocument::ClassMethods#accessors_module
2.06 6.38 2.27 0.00 4.11 109065
MongoMapper::EmbeddedDocument::InstanceMethods#read_attribute
1.94 4.24 2.14 0.00 2.10 315695
MongoMapper::EmbeddedDocument::InstanceMethods#_keys
1.50 4.96 1.65 0.00 3.31 100440
MongoMapper::Key#initialize
1.49 2.38 1.64 0.00 0.74 100440
MongoMapper::EmbeddedDocument::ClassMethods#create_validations_for
1.34 84.42 1.47 0.00 82.95 100440
MongoMapper::EmbeddedDocument::InstanceMethods#ensure_key_exists
1.24 1.37 1.37 0.00 0.00 416135 Object#metaclass
1.23 4.76 1.35 0.00 3.41 103315
MongoMapper::Dirty#value_changed?
1.15 64.56 1.27 0.00 63.29 100440
MongoMapper::EmbeddedDocument::ClassMethods#create_accessors_for
1.04 3.11 1.14 0.00 1.97 103315
MongoMapper::EmbeddedDocument::InstanceMethods#write_attribute
1.00 1.10 1.10 0.00 0.00 200880 Kernel#method
0.94 1.04 1.04 0.00 0.00 103315 NilClass#duplicable?

The only call I see to module_eval anywhere in the code is in the
EmbeddedDocument module, in the code that's creating accessor
functions for each of the keys in the loaded records. If I run that
with some debugging enabled it appears that this function is creating
an accessor function for each attribute of each loaded record, which
is, unsurprisingly, slow.

I'm not sure what an easy optimization might be here though. Maybe you
could hook method_missing instead of explicitly constructing each
function given that most of these accessor functions in typical usage
won't ever be called?

On Thu, Feb 4, 2010 at 6:42 AM, John Nunemaker <nune...@gmail.com> wrote:
> What I mean by naive is that it doesn't seem like a valid use case. When
> will you query everything in the database all at once? If you really are
> going to query through all of them, master has find_each which would be far
> more efficient. It grabs stuff in batches and goes through them rather than
> pulling down the entire dataset at once.

--
miles
http://kuniklo.bandcamp.com

Kurt Mackey

unread,

Feb 4, 2010, 11:25:25 AM2/4/10

to mongo...@googlegroups.com

Returning that many documents that are composed completely of dynamic keys seems like a really fringe thing to be doing with MongoMapper.

Try creating an actual document with all (or most) of your keys defined ahead of time and see how that helps.

miles
http://kuniklo.bandcamp.com

Miles Egan

unread,

Feb 4, 2010, 11:40:43 AM2/4/10

to mongo...@googlegroups.com

Good suggestion. If I pre-define my keys I get this new profile:
Thread ID: 12897500
Total: 21.750000

%self total self wait child calls name

4.97 2.50 1.08 0.00 1.42 103315
MongoMapper::Key#initialize
4.87 8.80 1.06 0.00 7.74 103315
MongoMapper::Dirty#write_attribute
4.14 0.90 0.90 0.00 0.00 419010 Object#metaclass
4.09 8.99 0.89 0.00 8.10 103315
MongoMapper::EmbeddedDocument::ClassMethods#key
4.05 1.28 0.88 0.00 0.40 103315
MongoMapper::EmbeddedDocument::ClassMethods#create_validations_for
3.95 2.09 0.86 0.00 1.23 315695
MongoMapper::EmbeddedDocument::InstanceMethods#_keys
3.82 1.19 0.83 0.00 0.36 209690
HashWithIndifferentAccess#convert_value
3.59 2.24 0.78 0.00 1.46 419012
MongoMapper::EmbeddedDocument::ClassMethods#keys
3.17 2.34 0.69 0.00 1.65 103315
MongoMapper::Dirty#value_changed?
2.90 2.59 0.63 0.00 1.96 109065
MongoMapper::EmbeddedDocument::InstanceMethods#read_attribute
2.76 1.92 0.60 0.00 1.32 103315
MongoMapper::EmbeddedDocument::InstanceMethods#write_attribute
2.57 0.81 0.56 0.00 0.25 109065 MongoMapper::Key#get
2.48 0.72 0.54 0.00 0.18 209691
HashWithIndifferentAccess#convert_key
2.39 0.52 0.52 0.00 0.00 625680 Hash#default

So the overhead of dynamically constructing the keys seems to be gone.
This is still a factor of about 20 slower than the plain ruby driver,
but now there's not a single bottleneck that really sticks out.

On Thu, Feb 4, 2010 at 8:25 AM, Kurt Mackey <ku...@mubble.net> wrote:
> Returning that many documents that are composed completely of dynamic keys
> seems like a really fringe thing to be doing with MongoMapper.
> Try creating an actual document with all (or most) of your keys defined
> ahead of time and see how that helps.

--
miles
http://kuniklo.bandcamp.com

Josh Szmajda

unread,

Feb 4, 2010, 11:50:51 AM2/4/10

to mongo...@googlegroups.com

I think examining performance is a great thing, but you have to remember we're not using ruby because it's the fastest way to do something under the hood, we're using it because it's fast for us to create useful systems. The same concept applies here to MM vs. straight Mongo---MM will make your job easier and enable you to write useful systems more quickly than using straight Mongo all the time.

But again, I'm all for making MM better under the hood. Saving on machine costs is always a good thing ;)

-Josh

John Nunemaker

unread,

Feb 4, 2010, 11:54:13 AM2/4/10

to mongo...@googlegroups.com

Try the tests against master. Master has some tweaks that make it better than 0.6.10.

Miles Egan

unread,

Feb 4, 2010, 11:56:21 AM2/4/10

to mongo...@googlegroups.com

Of course. In order to make these kinds of decisions we have to
understand the costs and benefits, right?

In this particular application, an overhead of 20x is more than I can
afford to pay for all the nice features MM offers. However, if I can
shave that down to 2-3x by either using MM in a more efficient way or
by optimizing something in MM than I'd love to be able to use it.

On Thu, Feb 4, 2010 at 8:50 AM, Josh Szmajda <jos...@gmail.com> wrote:
> I think examining performance is a great thing, but you have to remember
> we're not using ruby because it's the fastest way to do something under the
> hood, we're using it because it's fast for us to create useful systems. The
> same concept applies here to MM vs. straight Mongo---MM will make your job
> easier and enable you to write useful systems more quickly than using
> straight Mongo all the time.
>
> But again, I'm all for making MM better under the hood. Saving on machine
> costs is always a good thing ;)

--
miles
http://kuniklo.bandcamp.com

Miles Egan

unread,

Feb 4, 2010, 12:10:24 PM2/4/10

to mongo...@googlegroups.com

Here's the profile with a fresh git:

Thread ID: 15054060
Total: 9.750000

%self total self wait child calls name

9.33 7.26 0.91 0.00 6.35 103315
MongoMapper::Plugins::Dirty::InstanceMethods#write_key
8.41 1.28 0.82 0.00 0.46 312820
MongoMapper::Plugins::Keys::InstanceMethods#keys
6.36 1.72 0.62 0.00 1.10 103315
MongoMapper::Plugins::Keys::InstanceMethods#write_key
5.74 1.76 0.56 0.00 1.20 103315
MongoMapper::Plugins::Dirty::InstanceMethods#value_changed?
4.51 8.11 0.44 0.00 7.67 2877 Hash#each_pair
4.41 1.91 0.43 0.00 1.48 106190
MongoMapper::Plugins::Keys::InstanceMethods#read_key
4.00 0.76 0.39 0.00 0.37 106190
MongoMapper::Plugins::Keys::Key#get
3.69 0.36 0.36 0.00 0.00 312854
MongoMapper::Plugins::Keys::ClassMethods#keys
3.18 0.31 0.31 0.00 0.00 284149 NilClass#nil?
3.08 0.60 0.30 0.00 0.30 103315
MongoMapper::Plugins::Keys::Key#number?
3.08 1.20 0.30 0.00 0.90 6
Mongo::Connection#read_documents

So HEAD appears to be about ~2x faster than 0.6.10, which is a pretty
nice improvement. We're at about an overhead of 10x vs the plain ruby
driver here.

It looks to me as if a lot of this is incurred by the machinery around
attribute assignment. Maybe there's a way to bypass some of this on
the initial load of a doc? Does the whole attribute assignment stack
need to be invoked even when loading a fresh object?

On Thu, Feb 4, 2010 at 8:54 AM, John Nunemaker <nune...@gmail.com> wrote:
> Try the tests against master. Master has some tweaks that make it better
> than 0.6.10.

--
miles
http://kuniklo.bandcamp.com

Kurt Mackey

unread,

Feb 4, 2010, 12:16:30 PM2/4/10

to mongo...@googlegroups.com

If I were doing the work your app is doing, I'd use MM for everything, then switch to the driver directly when you need to do stuff that doesn't perform as well as it needs to.

At some point I think it'll be worth digging in and making MM as fast as is feasible, but it seems way early to be doing that (to me).

miles
http://kuniklo.bandcamp.com

Miles Egan

unread,

Feb 4, 2010, 5:23:58 PM2/4/10

to mongo...@googlegroups.com

Maybe so. I agree that it's too early to start doing any serious
optimization work on MM but I don't think that means that speed should
be completely ignored.

That said, I compared ActiveRecord vs raw postgres queries on the same
dataset and in my limited testing it seems that AR is slower by a
factor of ~8x so at least in this case AR and the MM (@HEAD) are
pretty close provided the MM schema is defined.

On Thu, Feb 4, 2010 at 9:16 AM, Kurt Mackey <ku...@mubble.net> wrote:
> If I were doing the work your app is doing, I'd use MM for everything, then
> switch to the driver directly when you need to do stuff that doesn't perform
> as well as it needs to.
> At some point I think it'll be worth digging in and making MM as fast as is
> feasible, but it seems way early to be doing that (to me).

--
miles
http://kuniklo.bandcamp.com

Matthew Gibbons

unread,

Feb 5, 2010, 7:11:32 AM2/5/10

to mongo...@googlegroups.com

Keep in mind that both MM and AR are doing a lot of work for you that
the driver doesn't, so whilst the driver is a lot faster at raw
access, the results are not directly usable or even desirable in most
cases.

One thing that I have found using Mongodb (with either MongoMapper or
Mongoid) is that the data we have in an SQL database does not
necessarily map to the data stored in Mongodb when migrated, i.e. we
have needed to re-think our data layout, and in doing so, have
overcome some of the speed issues that we found.

Miles Egan

unread,

Feb 5, 2010, 11:38:53 AM2/5/10

to mongo...@googlegroups.com

On Fri, Feb 5, 2010 at 4:11 AM, Matthew Gibbons <mhgi...@gmail.com> wrote:
> Keep in mind that both MM and AR are doing a lot of work for you that
> the driver doesn't, so whilst the driver is a lot faster at raw
> access, the results are not directly usable or even desirable in most
> cases.

Of course.

> One thing that I have found using Mongodb (with either MongoMapper or
> Mongoid) is that the data we have in an SQL database does not
> necessarily map to the data stored in Mongodb when migrated, i.e. we
> have needed to re-think our data layout, and in doing so, have
> overcome some of the speed issues that we found.

That's a big part of the appeal of Mongo for me. A lot of the schema
I'm working with require a lot of joins and one-to-many relationships
that I think I could eliminate in a more document-centric datastore.

--
miles
http://kuniklo.bandcamp.com

Reply all

Reply to author

Forward