Scalable and efficient document versioning?


Voltron

Mar 11, 2011, 12:24:13 PM
to mongodb-user
I have to implement a scalable and efficient schema for document
versioning. At the moment, I intend to simply store older document
versions, with timestamps, in a document container. Are there other
schemas out there that would work well?

Thanks

Gates

Mar 11, 2011, 6:45:16 PM
to mongodb-user
There are several common strategies for document versioning. The
strategy you select will depend on the trade-offs you want to make.

Please note that MongoDB does not support triggers, so a few of these
methods will require your application to perform multiple writes.

===
Strategy 1: embed history
===
In theory, you can embed the history of a document inside of the
document itself. This can even be done atomically.

> db.docs.save( { _id : 1, text : "Original Text" } )
> var doc = db.docs.findOne()
> db.docs.update( {_id: doc._id}, { $set : { text : 'New Text' }, $push : { hist : doc.text } } )
> db.docs.find()
{ "_id" : 1, "hist" : [ "Original Text" ], "text" : "New Text" }

To apply the update and get the resulting document back in a single atomic operation, you can use findAndModify:
> db.docs.findAndModify( {query:{_id:1}, update: { $set : { text : 'New Text' }, $push : { hist : "Original Text" } }, new : true} )
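To make the embedded-history shape concrete, here is a minimal in-memory sketch (plain JavaScript objects, no database). It assumes each history entry is a `{ ts, text }` object rather than the bare string pushed above, since the question asks for timestamps; `updateWithHistory` is a hypothetical helper that mimics the effect of `$set` plus `$push`.

```javascript
// In-memory model of Strategy 1 (embedded history); no database is involved.
// A plain object stands in for a MongoDB document. updateWithHistory is a
// hypothetical helper mimicking { $set: changes, $push: { hist: snapshot } }.
// The { ts, ...oldFields } shape of each history entry is an assumption.
function updateWithHistory(doc, changes, ts) {
  // Snapshot the current values of the fields that are about to change.
  const previous = { ts };
  for (const key of Object.keys(changes)) {
    previous[key] = doc[key];
  }
  // Return a new document: changed fields overwritten, snapshot appended.
  return {
    ...doc,
    ...changes,
    hist: [...(doc.hist || []), previous],
  };
}

let doc = { _id: 1, text: "Original Text" };
doc = updateWithHistory(doc, { text: "New Text" }, 1299864253);
doc = updateWithHistory(doc, { text: "Newer Text" }, 1299864300);
console.log(JSON.stringify(doc));
// {"_id":1,"text":"Newer Text","hist":[{"ts":1299864253,"text":"Original Text"},{"ts":1299864300,"text":"New Text"}]}
```

In the shell, the same shape could be written as `db.docs.update({ _id: doc._id, text: doc.text }, { $set: { text: "New Text" }, $push: { hist: { ts: new Date(), text: doc.text } } })`. Including `text: doc.text` in the query means the update matches nothing if another client changed the document after it was read, which is one way to avoid silently losing a concurrent edit.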

===
Strategy 2: write history to separate collection
===

> db.docs.save( { _id : 1, text : "Original Text" } )
> var doc = db.docs.findOne()
> db.docs_hist.insert ( { orig_id : doc._id, ts : Math.round((new Date()).getTime() / 1000), data : doc } )
> db.docs.update( {_id:doc._id}, { $set : { text : 'New Text' } } )

Here you'll see that I do two writes: one to the history collection and
one to the master collection.
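The two-write pattern can be modeled without a database to show why the order matters. In this sketch, a `Map` and an array stand in for `docs` and `docs_hist`, and `archiveThenUpdate` is a hypothetical helper. Writing the history entry first means a failure between the two writes leaves, at worst, an extra history entry, rather than a lost previous version.

```javascript
// In-memory model of Strategy 2's write path; no database is involved.
// `docs` and `docsHist` stand in for the docs and docs_hist collections;
// archiveThenUpdate is a hypothetical helper name.
const docs = new Map();
const docsHist = [];

function archiveThenUpdate(id, changes, ts) {
  const current = docs.get(id);
  if (current === undefined) throw new Error(`no document with _id ${id}`);
  // Write 1: copy the current version (shallow copy) into the history.
  docsHist.push({ orig_id: id, ts, data: { ...current } });
  // Write 2: apply the change to the master document.
  docs.set(id, { ...current, ...changes });
}

docs.set(1, { _id: 1, text: "Original Text" });
archiveThenUpdate(1, { text: "New Text" }, 1299864253);
console.log(docs.get(1).text);      // New Text
console.log(docsHist[0].data.text); // Original Text
```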

For fast history lookups, index the history collection on the original ID and timestamp, then query by the original ID:
> db.docs_hist.ensureIndex( { orig_id : 1, ts : 1 })
> db.docs_hist.find( { orig_id : 1 } ).sort( { ts : -1 } )
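Reading that history back needs one subtlety. Each entry records the version that was *replaced* at time `ts`, so the version that was live at time t is the oldest entry with `ts > t`, or the current document if no such entry exists. This in-memory sketch uses hypothetical sample data and helper names (`historyOf`, `versionAt`). `historyOf` mirrors the indexed `find(...).sort({ ts: -1 })` query.

```javascript
// In-memory model of reading Strategy 2's history; no database is involved.
// Entries use the { orig_id, ts, data } shape from the shell example; the
// sample values and helper names are hypothetical.
const docsHist = [
  { orig_id: 1, ts: 100, data: { _id: 1, text: "v1" } },
  { orig_id: 1, ts: 200, data: { _id: 1, text: "v2" } },
  { orig_id: 2, ts: 150, data: { _id: 2, text: "other doc" } },
];
const currentDoc = { _id: 1, text: "v3" }; // what db.docs holds now

// All archived versions of one document, newest first; this mirrors
// db.docs_hist.find({ orig_id: id }).sort({ ts: -1 }).
function historyOf(id) {
  return docsHist
    .filter((h) => h.orig_id === id)
    .sort((a, b) => b.ts - a.ts);
}

// An entry { ts, data } records that `data` was replaced at time `ts`,
// so the version live at time t is the oldest entry with ts > t; if there
// is none, the current document was already live at t.
function versionAt(id, t, current) {
  const later = historyOf(id).filter((h) => h.ts > t);
  return later.length > 0 ? later[later.length - 1].data : current;
}

console.log(historyOf(1).map((h) => h.data.text).join(", ")); // v2, v1
console.log(versionAt(1, 150, currentDoc).text); // v2
console.log(versionAt(1, 250, currentDoc).text); // v3
```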

-----
Both strategies can be made more space-efficient by storing only the diffs between versions instead of full copies.
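A minimal sketch of the diff variant, assuming flat documents where only top-level fields are compared (nested objects and arrays are compared by reference, so a changed one is stored whole). It uses *reverse* diffs: the latest version stays complete, and each diff holds what is needed to step one version back. `reverseDiff` and `applyReverse` are hypothetical helpers, not MongoDB APIs.

```javascript
// Reverse diffs for flat documents (top-level fields only).
// reverseDiff/applyReverse are hypothetical helpers, not MongoDB APIs.
function reverseDiff(oldDoc, newDoc) {
  const set = {};   // fields whose old value must be restored
  const unset = []; // fields that did not exist in the old version
  for (const key of Object.keys(oldDoc)) {
    if (oldDoc[key] !== newDoc[key]) set[key] = oldDoc[key];
  }
  for (const key of Object.keys(newDoc)) {
    if (!(key in oldDoc)) unset.push(key);
  }
  return { set, unset };
}

// Rebuild the previous version from the newer one plus its reverse diff.
function applyReverse(newDoc, { set, unset }) {
  const oldDoc = { ...newDoc, ...set };
  for (const key of unset) delete oldDoc[key];
  return oldDoc;
}

const v1 = { _id: 1, title: "Notes", text: "Original Text" };
const v2 = { _id: 1, title: "Notes", text: "New Text", tags: ["draft"] };
const d = reverseDiff(v1, v2);
console.log(JSON.stringify(d)); // {"set":{"text":"Original Text"},"unset":["tags"]}
console.log(JSON.stringify(applyReverse(v2, d)) === JSON.stringify(v1)); // true
```

Keeping the newest version whole means ordinary reads need no reconstruction; only history lookups pay the cost of replaying diffs backward.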

There are several small variants here, but they tend to fall under one
of these two themes: embed or separate collection.

Embedding:
+ atomic change (especially with findAndModify)
- can result in large documents that may exceed the 16MB document size limit
- probably have to adjust queries (e.g. exclude hist with a field
projection) to avoid returning the full history when it's not needed

Separate collection:
+ easier to write queries
- not atomic: it needs two operations, and MongoDB doesn't have
transactions
- more storage space (an extra index on the history collection)
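One common way to bound the size of an embedded history is to keep only the last N versions. Later MongoDB releases support this server-side with `$push` plus the `$each` and `$slice` modifiers. This in-memory sketch models that behavior; `pushCapped` and the limit `MAX_HIST` are hypothetical choices for illustration.

```javascript
// In-memory model of a capped embedded history; no database is involved.
// pushCapped mimics { $push: { hist: { $each: [entry], $slice: -maxEntries } } },
// which keeps only the last maxEntries elements. MAX_HIST is an assumed limit.
const MAX_HIST = 3;

function pushCapped(hist, entry, maxEntries) {
  if (maxEntries <= 0) return []; // a $slice of 0 leaves an empty array
  return [...hist, entry].slice(-maxEntries);
}

let hist = [];
for (let v = 1; v <= 5; v++) {
  hist = pushCapped(hist, `version ${v}`, MAX_HIST);
}
console.log(hist.join(", ")); // version 3, version 4, version 5
```

The trade-off is that versions older than the cap are gone for good, so this only fits cases where recent history is enough.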

Hopefully that gets you working in the right direction.

- Gates

Voltron

Mar 14, 2011, 4:43:09 AM
to mongodb-user
Great! Thanks, Gates.

Bhaskar Pathak

Feb 15, 2016, 2:35:11 PM
to mongodb-user, nhy...@googlemail.com
Thanks for the solution.
These two are good approaches for maintaining history,
but coming from a relational DB background, my instinct is to use triggers, and triggers are not supported in MongoDB.

Is there any way to achieve trigger functionality in MongoDB?

Stephen Steneker

Feb 15, 2016, 4:17:33 PM
to mongodb-user, nhy...@googlemail.com
Hi Bhaskar,

I'm not sure whether your question is related to triggers, document versioning, or both, but please start a new discussion thread with details relevant to your actual environment and use case.

Your version of MongoDB and the type of deployment (standalone, replica set, or sharded cluster) would be helpful.

Thanks,
Stephen