Justin Dossey
May 3, 2012, 6:27:26 PM
to mongo...@googlegroups.com
Hi all,
After more poking around in the MongoMapper (MM) source code, I wonder
whether there is some room to improve the efficiency of document
storage in MongoDB.
The short version: if a field has not been changed from null, zero, or
the empty string, it should not be written to the database. This saves
memory on the DB server because neither the key name nor the null or
zero value has to be stored.
Saving an explicit "null" to MongoDB is also bad because you can't
apply atomic modifiers like $inc or $push to keys that are set to
null.
Let's make a hypothetical model: you're storing election results by
precinct for a whole country in MongoDB. There are several dozen
candidates running in this election. Your boneheaded programmer
decided to set up the model as follows:
class PrecinctVotes
  include MongoMapper::Document
  key :precinct_id, Integer, :required => true
  key :candidate_1_name, String
  key :candidate_1_votes, Integer, :default => 0
  # ... candidate_2 through candidate_99 follow the same pattern ...
  key :candidate_100_name, String
  key :candidate_100_votes, Integer, :default => 0
end
It is a terrible model, but it's Election Day and the boss won't let
you rewrite the vote counting system from scratch before polls open.
Not every election will have 100 candidates, of course-- this one only
has 48 of them. Name/Vote pairs for inactive candidate slots are
simply never set in the codebase.
Good output (mongo shell) for a hypothetical precinct with only one
vote for one candidate:
{_id: ObjectId('4d41eeb33678b716c62ff8b1'), precinct_id: 1234,
candidate_34_name: "Joe Blow", candidate_34_votes: 1}
The above document is compact in the sense that data which has not
been changed from default is not included, and null values are not
written to Mongo.
What MM actually produces for me in the above case:
{_id: ObjectId('4d41eeb33678b716c62ff8b1'), precinct_id: 1234,
candidate_1_name: null, candidate_1_votes: 0, ..., candidate_34_name:
"Joe Blow", candidate_34_votes: 1, ..., candidate_100_name: null,
candidate_100_votes: 0}
The above document is much larger, and most of the data is useless.
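To put rough numbers on the difference, here is a minimal sketch that estimates BSON size from the element layout in the BSON spec (one type byte, the key as a NUL-terminated string, then the value). The helper names and the :object_id placeholder are made up for illustration, and I'm assuming the vote counts are stored as 32-bit integers; real sizes depend on how the driver serializes each type.

```ruby
# Rough BSON size estimate. Only the few types used in the example
# documents are handled; :object_id is a placeholder for a 12-byte ObjectId.
def bson_value_size(value)
  case value
  when nil        then 0                       # null has no payload
  when :object_id then 12                      # ObjectId is 12 bytes
  when Integer    then 4                       # assumes the value fits in int32
  when String     then 4 + value.bytesize + 1  # int32 length + bytes + NUL
  else raise ArgumentError, "unhandled type: #{value.class}"
  end
end

def bson_document_size(doc)
  elements = doc.sum do |key, value|
    # type byte + key bytes + NUL terminator + value payload
    1 + key.to_s.bytesize + 1 + bson_value_size(value)
  end
  4 + elements + 1 # int32 total length + elements + trailing NUL
end

compact = {
  "_id" => :object_id,
  "precinct_id" => 1234,
  "candidate_34_name" => "Joe Blow",
  "candidate_34_votes" => 1
}

# Build the 100-slot document the way the default serialization writes it.
full = { "_id" => :object_id, "precinct_id" => 1234 }
(1..100).each do |n|
  full["candidate_#{n}_name"]  = (n == 34 ? "Joe Blow" : nil)
  full["candidate_#{n}_votes"] = (n == 34 ? 1 : 0)
end

puts "compact: #{bson_document_size(compact)} bytes"
puts "full:    #{bson_document_size(full)} bytes"
```

Note that a null still costs the type byte plus the whole key name, so in a model like this one most of the waste is in the key names rather than the values.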
I wrote my own to_mongo to fix this kind of problem:
def to_mongo
  # Drop keys whose values are blank (nil, false, "", empty collections)
  # or zero, so they are never written to MongoDB.
  attributes.reject { |_key, value| value.blank? || value == 0 }
end
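For anyone who wants to try the filtering logic without MongoMapper or ActiveSupport loaded, here is a standalone sketch on a plain Hash. blankish? and compact_attributes are hypothetical names, and blankish? is only an approximation of ActiveSupport's blank?.

```ruby
# Approximation of ActiveSupport's blank? so this runs in plain Ruby.
def blankish?(value)
  return true if value.nil? || value == false
  return value.strip.empty? if value.is_a?(String)
  value.respond_to?(:empty?) && value.empty?
end

# Return a copy of attrs without blank or zero values.
def compact_attributes(attrs)
  attrs.reject { |_key, value| blankish?(value) || value == 0 }
end

attrs = {
  "precinct_id"        => 1234,
  "candidate_1_name"   => nil,
  "candidate_1_votes"  => 0,
  "candidate_34_name"  => "Joe Blow",
  "candidate_34_votes" => 1
}

# Only precinct_id and the candidate_34 pair survive.
p compact_attributes(attrs)
```

One caveat: this also drops false and any zero that was set on purpose, so after a reload an unset key and a deliberate zero look the same. That is fine for vote counts with a default of 0, but it may not be what you want for every model.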
The data size for objects produced by the same line of code (with my
object, not the dumb one above) is 43 bytes with my to_mongo, vs. 153
bytes with the default. That is 110 bytes saved per document. Multiply
by thirty million and you have 3.3 GB of memory savings for a single
collection.
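The arithmetic behind that figure, as a quick sketch (savings_bytes is just an illustrative helper; the byte counts and document count are the ones quoted above):

```ruby
# Bytes saved across a collection when each document shrinks
# from default_bytes to compact_bytes.
def savings_bytes(default_bytes, compact_bytes, documents)
  (default_bytes - compact_bytes) * documents
end

saved = savings_bytes(153, 43, 30_000_000)
puts saved                      # total bytes saved
puts (saved / 1e9).round(2)     # decimal gigabytes
puts (saved / 2.0**30).round(2) # binary gibibytes, for comparison
```

The 3.3 figure is in decimal gigabytes; in binary units it comes out a bit lower.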
I'd love to have a "compact" option for MongoMapper that doesn't save
useless data to MongoDB. What do y'all think?
Justin Dossey