Avoiding data corruption with MongoDB's in-place updates design


Deepak Kumar

Jul 8, 2014, 8:30:32 AM
to mongod...@googlegroups.com
Hi,

I am planning to use MongoDB for one of my projects. I've been reading about MongoDB as well as some of the other document-store NoSQL databases available, especially CouchDB. I see a ton of posts on the net claiming that CouchDB, with its MVCC-based design, is safer for data storage than MongoDB, and I am trying to understand how MongoDB users can achieve the same data safety/durability without any corruption, partially written data, or data loss.

From  http://www.quora.com/How-does-MongoDB-compare-to-CouchDB-What-are-the-advantages-and-disadvantages-of-each/answer/Riyad-Kalla "Mongo does in-place updates to its data on-disk, CouchDB employs a Copy-on-Write/Append-only design almost to a fault. It makes it insanely resilient at the expense of fsync'ing everything and constantly recopying portions of the data file on document mutations (B+ path updates). Because of this core design Couch will always (by design) in a single-server capacity, be safer than Mongo. Because of Mongo's in-place edits there is always the chance you could corrupt old data."

From http://www.chrisallnutt.com/2011/12/08/why-i-choose-couchdb-over-mongodb/ "In short, if your server crashes, someone trips over the powercord, etc… MongoDB is more than likely going to lose data, while CouchDB will wake back up, see what it was last doing and will lose only the write that was happening at the time of failure  The difference is that Couch will still have the older copy whereas Mongo *might* have partly overwritten it. "

Could somebody please comment on how a MongoDB user can avoid such data corruption and cases of partially written data, either in the journal or in the data files? Does MongoDB have any checksum or other internal validation to catch corruption errors? Can these issues happen even when using a journaled, acknowledged write concern?

I am sure the MongoDB developer team has considered these issues -- could you please elaborate on whether and how they have been addressed, or whether they are still present?

Thanks,

Deepak


Deepak Kumar

Jul 8, 2014, 9:15:58 AM
to mongod...@googlegroups.com
From one more CouchDB post (http://wiki.apache.org/couchdb/Technical%20Overview): "Document updates (add, edit, delete) are all or nothing, either succeeding entirely or failing completely. The database never contains partially saved or edited documents."

Could you please explain what guarantees MongoDB provides to avoid these issues?

Thanks,
Deepak

John Esmet

Jul 8, 2014, 10:54:32 AM
to mongod...@googlegroups.com
Those comments are almost certainly wrong.

I didn't write the logging code, but I believe it works as follows:
- Log a bytestring that represents the in-place update, including where on disk it goes and how many bytes it covers.
- Apply the in-place update through the memory-mapped file (at any point while bytes 0..N are being written, and possibly flushed, you could lose power and crash).
- Eventually, the periodic data file sync fsyncs the data file. Once that succeeds, truncate the log.
- On recovery, re-apply all updates still in the log. These operations are idempotent, so no data is lost.
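
In rough pseudocode, that pattern looks something like this (a minimal sketch only -- the journal record layout, file names, and checkpointing here are invented for illustration and are not MongoDB's actual on-disk format):

import os
import struct

JOURNAL = "journal.log"
DATAFILE = "data.bin"

# For the demo, assume the data file already exists.
if not os.path.exists(DATAFILE):
    open(DATAFILE, "wb").close()

def log_update(offset, payload):
    """Append (offset, length, payload) to the journal and fsync it."""
    with open(JOURNAL, "ab") as j:
        j.write(struct.pack("<QI", offset, len(payload)) + payload)
        j.flush()
        os.fsync(j.fileno())

def apply_update(offset, payload):
    """Apply the in-place update to the data file (a crash can leave this torn)."""
    with open(DATAFILE, "r+b") as d:
        d.seek(offset)
        d.write(payload)

def checkpoint():
    """Once the data file is durably synced, the journal can be truncated."""
    with open(DATAFILE, "r+b") as d:
        os.fsync(d.fileno())
    open(JOURNAL, "wb").close()  # truncate

def recover():
    """Re-apply every journaled update; replay is idempotent, so nothing is lost."""
    if not os.path.exists(JOURNAL):
        return
    with open(JOURNAL, "rb") as j:
        while True:
            header = j.read(12)
            if len(header) < 12:
                break  # incomplete trailing record: ignore it
            offset, length = struct.unpack("<QI", header)
            payload = j.read(length)
            if len(payload) < length:
                break  # torn write at the tail of the journal
            apply_update(offset, payload)
    checkpoint()

The key property is that a record is only replayed if it made it into the journal completely, and replaying it twice writes the same bytes to the same place, which is why recovery is safe.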

Asya Kamsky

Jul 8, 2014, 4:36:18 PM
to mongodb-user
You should probably ignore all answers based on versions from 2011 and 2012. The client defaults were different then, and journaling may not have been on by default in earlier versions. Even back then, what was said in one of those posts was not quite correct (you can check the comments for some corrections).

Since 2.0, MongoDB employs a journal by default to guarantee that all writes are all or nothing.  You can read more about it here and you can even check the code.

To answer your question about how a MongoDB user can avoid such data corruption, the answer is simple:

               do NOT disable journaling.

Asya
P.S. and use replication - the most important thing about data IMHO is whether it's available to the application, not whether it exists on disk somewhere...
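
For example, with a reasonably recent PyMongo the application can also ask for journaled (and majority-replicated) acknowledgement explicitly instead of relying on driver defaults -- the host names, database, and collection below are placeholders:

from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0")

# j=True: acknowledge only after the write has reached the journal on the primary.
# w="majority": ...and has been replicated to a majority of the replica set.
orders = client.mydb.get_collection(
    "orders", write_concern=WriteConcern(w="majority", j=True))
orders.insert_one({"sku": "abc-123", "qty": 1})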




Abhi

Jul 9, 2014, 1:59:32 PM
to mongod...@googlegroups.com
This looks like an interesting comparison between CouchDB and MongoDB, setting performance aside. This post http://www.chrisallnutt.com/2011/12/08/why-i-choose-couchdb-over-mongodb/ also compares them along similar lines.

I am also curious to know how MongoDB handles/avoids partially written data files or journal files. What happens when journaling is enabled and mongod crashes while writing to the journal file (so only a partial update is written)? When mongod comes back online, how does it handle this partially written update in the journal file?

Thanks
Abhi

MARK CALLAGHAN

Jul 9, 2014, 2:48:33 PM
to mongod...@googlegroups.com
AFAIK MongoDB doesn't protect against partial page writes. InnoDB has the doublewrite buffer for that. Protection isn't free, so some workloads might not want to pay the cost -- it doubles the rate of bytes written to database files, but does not double the seeks from random page writes.

After reading the comments in the URL below, I think a few people at 10gen/MongoDB are more optimistic about storage failures than I am. The question is whether the full post-image of the page is needed to recover from some partial page write failures. I think it is, and the MongoDB journal doesn't have it.
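
To make the torn-page problem concrete, here is a generic illustration (not InnoDB's or MongoDB's actual page format -- the page size, checksum placement, and contents are invented): a crash in the middle of a page write can leave a mix of old and new bytes on disk, and a per-page checksum can detect that, but repairing it requires a full copy of the page, which is exactly what the doublewrite buffer provides.

import zlib

PAGE_SIZE = 4096
CHECKSUM_BYTES = 4

def encode_page(payload):
    """Pad the payload to a page and prepend a CRC32 of the page body."""
    body = payload.ljust(PAGE_SIZE - CHECKSUM_BYTES, b"\x00")
    return zlib.crc32(body).to_bytes(CHECKSUM_BYTES, "little") + body

def page_is_torn(page):
    """True if the stored checksum does not match the page body."""
    stored = int.from_bytes(page[:CHECKSUM_BYTES], "little")
    return stored != zlib.crc32(page[CHECKSUM_BYTES:])

old = encode_page(b"A" * 4000)   # the old version of the page
new = encode_page(b"B" * 4000)   # the update being written when power fails
torn = new[:2048] + old[2048:]   # first half new, second half still old

assert not page_is_torn(new)
assert page_is_torn(torn)        # detectable, but not repairable without
                                 # a second intact copy of the page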






--
Mark Callaghan
mdca...@gmail.com

Deepak Kumar

Jul 10, 2014, 6:20:45 AM
to mongod...@googlegroups.com
This is interesting but concerning as well. Could the MongoDB developers/experts comment on Mark's previous comments here? Specifically, is it possible that the journal may contain partially written data after an abrupt mongod crash? That could potentially corrupt the data and leave it in an inconsistent state. How can a user detect and recover from such a situation? Does MongoDB have some kind of internal validation, such as checksums, to ensure that the data is not corrupted, incomplete, or partial?

Thanks,
Deepak

Asya Kamsky

Jul 14, 2014, 4:43:21 AM
to mongodb-user
Abhi,

You are quoting an article from 2011 again (and one that you already mentioned), and it's (a) somewhat wrong and (b) way out of date. See the comments on the article - even back then it wasn't quite as accurate as it might have been.

As far as what happens with the journal, see my next response as well as the link that Mark provided which explains how the journal works:


Asya




Asya Kamsky

Jul 14, 2014, 4:47:14 AM
to mongodb-user
The journal will only apply what are called commit groups - groups of writes that guarantee that if all of them are applied, then the data files are consistent (i.e. all writes are complete - data files, indexes, oplog, etc.).

When MongoDB crashes and then comes back up, it replays the journal entries in these groups. If it comes to a group that's not complete (i.e. it was in the process of writing that commit group when it crashed), it won't apply any of that group. Since the journal writes are always flushed before the data file writes, I don't see how partial journal writes or data file writes could leave the data files inconsistent.

Deepak asks:
> Specifically, is it possible that the journal may contain partially written data after an abrupt mongod crash?
> That could potentially corrupt the data and leave it in an inconsistent state.

The journal can only have incomplete/partial data written in the last commit group - if the group does not have a full header/footer/checksum then it won't be applied to the data files on restart/recovery.
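
A toy version of that recovery rule (the group layout here -- a length and a checksum followed by the body -- is invented; the real journal format is different) would look like:

import struct
import zlib

def write_group(body):
    """Prefix a commit group body with its length and CRC32."""
    return struct.pack("<II", len(body), zlib.crc32(body)) + body

def read_commit_groups(journal):
    """Yield the bodies of complete, checksum-valid commit groups, in order."""
    pos = 0
    while pos + 8 <= len(journal):
        length, stored_crc = struct.unpack_from("<II", journal, pos)
        body = journal[pos + 8 : pos + 8 + length]
        if len(body) < length or zlib.crc32(body) != stored_crc:
            break  # the last group was torn by the crash: never apply it
        yield body
        pos += 8 + length

# Two complete groups followed by a torn one: only the first two get replayed.
journal = (write_group(b"group-1 ops") + write_group(b"group-2 ops")
           + write_group(b"group-3 ops")[:-4])
assert list(read_commit_groups(journal)) == [b"group-1 ops", b"group-2 ops"]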

Now, Mark's point that storage systems "fail" in ways that corrupt data is absolutely accurate - but the journal is not meant to protect against that. In fact, only having multiple copies of your data, and the ability to switch to another node and resync the one with disk corruption, can help here. But it's true that currently in MongoDB one of the shortcomings is that file corruption will generally only be found "by chance" - i.e. until you try to read some malformed BSON or follow a pointer that leads nowhere, you wouldn't know (at least from mongod) that something in your files got corrupted.

Asya



s.molinari

Jul 14, 2014, 7:36:16 AM
to mongod...@googlegroups.com
Ehem.....

If, in the rare case MongoDB should crash and then comes back up, it replays the journal entries in these groups.......

..........I think that sounds much better. ;-) I am betting on it being true, too! :-D

Scott

Asya Kamsky

Jul 14, 2014, 3:15:28 PM
to mongodb-user
Scott,

Whether crashes are infrequent isn't really relevant, since DBAs are basically paid to worry about the least likely and most catastrophic events.  

If one is ready for the worst-case scenario, then one will be in the best position to deal with it if it happens - and more importantly, to even notice/detect that it happened.

I've seen too many people just assume that everything is magically fine (regardless of what they do, or what the application or the hardware does) and then get really upset when they learn that everything *isn't* magically fine. That's why I'd rather make sure that people who are worried about the worst thing that can happen understand the implications of the various choices they are making when deploying their application and DB.

I once did a consult with a customer who was running with journaling turned off - no particular reason; their application wasn't write-heavy and they were quite over-provisioned, but someone for some reason decided to run it that way. When I pointed out the danger of it, they informed me that this server had been running continuously (had not even been restarted) for over two years... as if that guarantees something for the future :(

Asya, seen too many bad things happen to good data centers.




s.molinari

Jul 15, 2014, 12:22:44 AM
to mongod...@googlegroups.com
Absolutely Asya. Though, your post sounded to me like it is a normal event for Mongo to crash. That just can't be the case. Can MongoDB crash? Certainly. Should it crash? Certainly not. Should one be prepared for a crash? Certainly. Should one constantly worry it will happen? Certainly not. :-) 



Scott

Asya Kamsky

Jul 15, 2014, 12:54:27 AM
to mongodb-user
Well, what else are you going to worry about???

Asya





s.molinari

Jul 15, 2014, 1:32:10 AM
to mongod...@googlegroups.com
Too many other things in life. LOL! Honestly though, my worries would be about knowing for sure that our disaster recovery is going to work as planned. Shit is going to happen no matter what, although with any professional system it should be very, very rare. If I worried about shit happening constantly, I'd probably end up a paranoid schizophrenic. ;) So my motto is: don't worry about shit happening, because it most likely will at some point. Worry more about being prepared for it. :D

Scott

Deepak Kumar

Jul 16, 2014, 6:08:47 AM
to mongod...@googlegroups.com
Thanks for this explanation, Asya. Very helpful.

You wrote - "But it's true that currently in MongoDB one of the shortcomings is that file corruption will generally only be found "by chance" - i.e. until you try to read some malformed BSON or follow a pointer that leads nowhere, you wouldn't know (at least from mongod) that something in your files got corrupted."

Is there any plan to improve this, perhaps by adding some checksum validation? If there is already a JIRA request for this, could you please point me to it?

Thanks,
Deepak

MARK CALLAGHAN

Jul 16, 2014, 10:57:05 AM
to mongod...@googlegroups.com
See https://jira.mongodb.org/browse/SERVER-1558

I think that with the storage engine API the solution might be to use a storage engine that already provides the feature.








--
Mark Callaghan
mdca...@gmail.com

Asya Kamsky

Jul 16, 2014, 7:28:18 PM
to mongodb-user

As Mark points out, with the new storage API there will be different storage engine options. IMHO, at that point the priority of the various storage tickets will be re-evaluated in the context of the available options and ... the relative popularity of the different storage engines.

Asya
