Log file in blocks


Lucas Lersch

Apr 8, 2016, 10:51:35 AM4/8/16
to leveldb
Hi,

this is probably a basic question, but the documentation says: "The log file contents are a sequence of 32KB blocks.  The only exception is that the tail of the file may contain a partial block". Why exactly is it organized as 32KB blocks? In other words, why is the block organization useful? Can't I just append log entries in the following format?

entry :=
    checksum: uint32   // crc32c of type and data[]; little-endian
    sequence: fixed64
    count: fixed32
    data: record[count]

record := kTypeValue varstring varstring | kTypeDeletion varstring

varstring :=
    len: varint32
    data: uint8[len]

Best regards.

Robert Escriva

Apr 8, 2016, 10:53:23 AM4/8/16
to lev...@googlegroups.com
The block format means that corruption early in a file does not damage
the entire file. You can simply seek forward 32KB at a time until you
find a valid place to resume parsing.

-Robert

Lucas Lersch

Apr 8, 2016, 11:09:31 AM4/8/16
to lev...@googlegroups.com
Thanks for the answer, I get it. But suppose you have a system failure and need to rebuild from the log file: if there is corruption early in the file and you just seek forward to the next block, you lose all the updates in the first block. In other words, why is corruption in the log file not treated as something critical? Why can you just ignore it and keep going?




--
Lucas Lersch

Dhruba Borthakur

Apr 9, 2016, 4:05:36 AM4/9/16
to lev...@googlegroups.com
This exact problem caused us some pain earlier, too. We enhanced leveldb's default behaviour to be more flexible here:


There were some use cases that were fine with the default leveldb recovery mode (which skips over corruption in the transaction log), but other use cases needed the database open to fail if there is even a single corruption in the transaction log.

enum class WALRecoveryMode : char {
  // Original levelDB recovery
  // We tolerate incomplete record in trailing data on all logs
  // Use case: this is legacy behavior (default)
  kTolerateCorruptedTailRecords = 0x00,
  // Recover from clean shutdown
  // We don't expect to find any corruption in the WAL
  // Use case: ideal for unit tests and rare applications that
  // require a high consistency guarantee
  kAbsoluteConsistency = 0x01,
  // Recover to point-in-time consistency
  // We stop WAL playback on discovering a WAL inconsistency
  // Use case: ideal for systems that have a disk controller cache, like
  // hard disks or SSDs without a supercapacitor, that store related data
  kPointInTimeRecovery = 0x02,
  // Recovery after a disaster
  // We ignore any corruption in the WAL and try to salvage as much data
  // as possible
  // Use case: ideal for a last-ditch effort to recover data, or systems
  // that operate with low-grade unrelated data
  kSkipAnyCorruptedRecords = 0x03,
};

Subscribe to my posts at http://www.facebook.com/dhruba

Lucas Lersch

Apr 11, 2016, 9:00:16 AM4/11/16
to lev...@googlegroups.com
Thanks, that was very enlightening. I am taking a look at both the leveldb and rocksdb source code; unfortunately, I do not have a Facebook account to participate in the rocksdb discussion group. Anyway, it is cool to see that you guys are still active and improving the code :)

MARK CALLAGHAN

Apr 11, 2016, 9:23:05 AM4/11/16
to lev...@googlegroups.com
We are happy to discuss RocksDB via email at https://groups.google.com/forum/#!forum/rocksdb
Mark Callaghan
mdca...@gmail.com