MongoMapper and document size

Justin Dossey

May 3, 2012, 6:27:26 PM
to mongo...@googlegroups.com
Hi all,

After more poking around in the MM source code, I wonder whether there
is room to improve the efficiency of document storage in MongoDB.

The short answer is that if a field has not been changed from null,
zero, or the empty string, it should not be written to the database.
This saves memory on the DB server by not storing the key name and by
not wasting space on the null or zero value.
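To put rough numbers on this: BSON stores every element as a one-byte type tag, then the key name as a NUL-terminated string, then the value. A null carries no value bytes and a 32-bit integer carries four, so even an "empty" field pays for its full key name in every document. The helper names below are hypothetical; this is a back-of-the-envelope sketch of per-field cost following the BSON layout, not a call into any driver.

```ruby
# Rough per-element BSON size, following the BSON layout:
#   type byte (1) + key name bytes + NUL terminator (1) + value bytes.
# Hypothetical helpers for illustration; they do not use the bson gem.

def bson_null_element_size(key)
  # A null value stores no payload, so only the type tag and key remain.
  1 + key.bytesize + 1
end

def bson_int32_element_size(key)
  # A 32-bit integer stores a 4-byte payload.
  1 + key.bytesize + 1 + 4
end

def bson_string_element_size(key, value)
  # A string stores an int32 length prefix, the bytes, and a NUL terminator.
  1 + key.bytesize + 1 + 4 + value.bytesize + 1
end

puts bson_null_element_size("candidate_1_name")   # prints 18
puts bson_int32_element_size("candidate_1_votes") # prints 23
```

By this count, an untouched candidate_1 name/vote pair costs 41 bytes (18 + 23) in every document that stores it.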

Saving an explicit "null" to MongoDB is also bad because you can't
apply atomic modifiers like $inc or $push to keys that are set to
null.

Let's make a hypothetical model: you're storing election results by
precinct for a whole country in MongoDB. There are several dozen
candidates running in this election. Your boneheaded programmer
decided to set up the model as follows:

class PrecinctVotes
  include MongoMapper::Document
  key :precinct_id, Integer, :required => true
  key :candidate_1_name, String
  key :candidate_1_votes, Integer, :default => 0
  # ... the same pair of keys repeats for candidates 2 through 99 ...
  key :candidate_100_name, String
  key :candidate_100_votes, Integer, :default => 0
end

It is a terrible model, but it's Election Day and the boss won't let
you rewrite the vote counting system from scratch before polls open.

Not every election will have 100 candidates, of course; this one has
only 48. Name/vote pairs for unused candidate slots are simply never
set in the codebase.

What I'd consider good (mongo shell output) for a hypothetical precinct
with a single vote for one candidate:
{_id: ObjectId('4d41eeb33678b716c62ff8b1'), precinct_id: 1234,
candidate_34_name: "Joe Blow", candidate_34_votes: 1}

The above document is compact in the sense that data which has not
been changed from default is not included, and null values are not
written to Mongo.

What MM actually produces for me in the above case:
{_id: ObjectId('4d41eeb33678b716c62ff8b1'), precinct_id: 1234,
candidate_1_name: null, candidate_1_votes: 0, ..., candidate_34_name:
"Joe Blow", candidate_34_votes: 1, ..., candidate_100_name: null,
candidate_100_votes: 0}

The above document is much larger, and most of the data is useless.

I wrote my own to_mongo to fix this kind of problem:
def to_mongo
  # Drop keys whose values are blank (nil, "", etc.) or zero before saving.
  # Note: ActiveSupport's blank? also treats false, [] and {} as blank.
  attributes.reject { |k, v| v.blank? || v == 0 }
end
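The same filtering can be tried outside Rails. The sketch below is plain Ruby without ActiveSupport, so it spells out the "null, zero, or the empty string" rule by hand; `compact_attributes` is a hypothetical name. One difference worth knowing: ActiveSupport's `blank?` also treats `false`, whitespace-only strings, and empty arrays or hashes as blank, so a `blank?`-based `to_mongo` drops those too, while this sketch keeps them.

```ruby
# Hypothetical helper: drop keys whose values are nil, the empty string,
# or numeric zero. Unlike ActiveSupport's blank?, this keeps false,
# whitespace-only strings, and empty arrays or hashes.
def compact_attributes(attrs)
  attrs.reject do |_key, value|
    value.nil? || value == "" || (value.is_a?(Numeric) && value.zero?)
  end
end

doc = {
  "_id" => "4d41eeb33678b716c62ff8b1",
  "precinct_id" => 1234,
  "candidate_1_name" => nil,
  "candidate_1_votes" => 0,
  "candidate_34_name" => "Joe Blow",
  "candidate_34_votes" => 1
}

p compact_attributes(doc)
```

Only `_id`, `precinct_id`, and the candidate_34 pair survive, which matches the compact document shown earlier.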

For the documents produced by the same line of code in my app (with my
real model, not the contrived one above), the data size is 43 bytes
with my to_mongo vs. 153 bytes with the default. Multiply the 110-byte
difference by thirty million documents and you get 3.3 GB of memory
savings for a single collection.
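The arithmetic behind the 3.3 GB figure, using only the numbers quoted in the message. The result is in decimal gigabytes (about 3.07 GiB in binary units), and it counts only the difference in document size.

```ruby
# Back-of-the-envelope check of the savings quoted above.
compact_size = 43          # bytes per document with the custom to_mongo
default_size = 153         # bytes per document with the default to_mongo
documents    = 30_000_000  # "thirty million" documents

saved_bytes = (default_size - compact_size) * documents
puts saved_bytes                    # prints 3300000000
puts saved_bytes / 1_000_000_000.0  # prints 3.3 (decimal GB)
```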

I'd love to have a "compact" option for MongoMapper that doesn't save
useless data to MongoDB. What do y'all think?

Justin Dossey

John Nunemaker

May 4, 2012, 5:37:38 PM
to mongo...@googlegroups.com
Doesn't the latest MM ignore keys with null values? I could have sworn that made it in.

Also, for something like what you described, I would use a candidates hash (key :candidates, Hash), which would make the problem go away.
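With the candidates held in a single Hash key, a precinct document only contains the candidates who actually received votes, and a vote can become an atomic `$inc` on a dotted path (MongoDB's `$inc` creates the field if it does not exist yet). The plain-Ruby sketch below models the document as a hash and builds the corresponding update document. `record_vote` and `vote_update` are hypothetical helpers, and keying candidates by slot number rather than by name is an assumption, chosen because a dot inside a name would be read as a path separator in an update like this.

```ruby
# Hypothetical document shape: one Hash key holding only the candidates
# who have received votes, keyed by slot number as a string.
precinct = { "precinct_id" => 1234, "candidates" => {} }

# Apply a vote to the in-memory hash (what the saved document would hold).
def record_vote(precinct, slot, name)
  entry = (precinct["candidates"][slot.to_s] ||= { "name" => name, "votes" => 0 })
  entry["votes"] += 1
  precinct
end

# Build the equivalent atomic update: set the name, increment the count.
# The two paths are siblings, so $set and $inc do not conflict.
def vote_update(slot, name)
  {
    "$set" => { "candidates.#{slot}.name" => name },
    "$inc" => { "candidates.#{slot}.votes" => 1 }
  }
end

record_vote(precinct, 34, "Joe Blow")
p precinct
p vote_update(34, "Joe Blow")
```

Unused slots never appear in the document at all, so there is nothing null for `$inc` to trip over.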


Justin Dossey

--
You received this message because you are subscribed to the Google
Groups "MongoMapper" group.
For more options, visit this group at
http://groups.google.com/group/mongomapper?hl=en?hl=en

Justin Dossey

May 4, 2012, 5:44:55 PM
to mongo...@googlegroups.com
Darn it-- I am on MM 0.8.6 because I'm stuck with Rails 2 for the time being.
You're right that the hash is the correct design, of course.

Justin Dossey

John Nunemaker

May 4, 2012, 5:48:33 PM
to mongo...@googlegroups.com
I think there is a branch for the latest 0.8; maybe a pull request against that and a bug-fix release? Just a thought.

Darren Schnare

May 5, 2012, 10:36:27 AM
to mongo...@googlegroups.com
Speaking of key optimization, I recently read an article by Tilo Sloboda about key length and how to optimize it to reduce storage overhead. Could his work be incorporated into MM somehow? I think this would be a great feature to have.
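Because BSON repeats every key name inside every document, one common way to shrink documents is to store short field names in the database and map them back to readable names in the application. The sketch below is a generic illustration of that idea in plain Ruby, not a reconstruction of the article's approach or any MongoMapper feature; the mapping table and helper names are hypothetical.

```ruby
# Hypothetical mapping from readable attribute names to short stored keys.
SHORT_KEYS = {
  "precinct_id" => "p",
  "candidates"  => "c"
}.freeze
LONG_KEYS = SHORT_KEYS.invert.freeze

# Rename keys on the way into MongoDB; unknown keys pass through unchanged.
def abbreviate_keys(doc)
  doc.each_with_object({}) { |(k, v), out| out[SHORT_KEYS.fetch(k, k)] = v }
end

# Restore readable names on the way back out.
def expand_keys(doc)
  doc.each_with_object({}) { |(k, v), out| out[LONG_KEYS.fetch(k, k)] = v }
end

stored = abbreviate_keys({ "_id" => 1, "precinct_id" => 1234, "candidates" => {} })
p stored
p expand_keys(stored)
```

This only renames top-level keys; nested hashes such as the candidates hash would need the same treatment recursively, and any queries or atomic updates would have to use the short names.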