I just checked with the team; they want to keep the source private until they’ve implemented the remaining features (probably June) but it’s OK to talk about it.
ForestDB is a persistent key-value storage library, roughly similar in features to Tokyo Cabinet or Berkeley DB. It’s a key-to-value map where the keys and values are binary blobs. You can get and set values and enumerate the keys in (lexicographic) order. It also has a parallel by-sequence index that records the order in which values were updated.
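To make that interface concrete, here is a toy in-memory model in Python. It is only a sketch of the behavior described above, not ForestDB's API: the class and method names (`ToyStore`, `set`, `get`, `keys`, `changes_since`) are made up for illustration, and dropping a key's old sequence entry when it is updated is my assumption.

```python
import bisect


class ToyStore:
    """In-memory toy model: a key-to-value map plus a by-sequence index.

    All names here are hypothetical; this is not ForestDB's API.
    """

    def __init__(self):
        self._values = {}       # key -> (value, sequence number)
        self._sorted_keys = []  # keys kept in lexicographic (bytewise) order
        self._by_seq = {}       # sequence number -> key, in update order
        self._last_seq = 0

    def set(self, key: bytes, value: bytes) -> int:
        """Store a value and return the sequence number assigned to the update."""
        self._last_seq += 1
        old = self._values.get(key)
        if old is None:
            bisect.insort(self._sorted_keys, key)
        else:
            # Assumption: an update replaces the key's older sequence entry.
            del self._by_seq[old[1]]
        self._values[key] = (value, self._last_seq)
        self._by_seq[self._last_seq] = key
        return self._last_seq

    def get(self, key: bytes):
        entry = self._values.get(key)
        return None if entry is None else entry[0]

    def keys(self):
        """Enumerate keys in lexicographic order."""
        return list(self._sorted_keys)

    def changes_since(self, seq: int = 0):
        """Return (sequence, key) pairs for updates made after `seq`, oldest first."""
        return [(s, k) for s, k in self._by_seq.items() if s > seq]
```

The by-sequence index is what lets a caller ask "what changed since sequence N?" without scanning every key.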
The data format is, at a high level, somewhat like CouchDB’s (or Couchbase Server’s). It’s strictly append-only: new values are written at the end of the file, any modified b-tree nodes are also written to the end, and then finally a header [sic] is added to mark a commit. This type of file is less space-efficient than a traditional database, but it is extremely robust and supports very fast writes. It also has the nice property that writing to the file doesn’t interrupt readers; in fact a reader can keep using its own snapshot of the database as long as it wants to.
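The append-only idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not ForestDB's or CouchDB's actual file format: a `bytearray` stands in for the file, a linear scan stands in for the b-tree, and the record layout and all names (`AppendOnlyLog`, `Snapshot`, `RECORD`, `COMMIT`) are invented for the example.

```python
import struct

HEADER = "<BII"                      # record kind, key length, value length
HEADER_SIZE = struct.calcsize(HEADER)
RECORD, COMMIT = 0, 1                # hypothetical record kinds


class AppendOnlyLog:
    """Toy append-only store; nothing already written is ever modified."""

    def __init__(self):
        self.data = bytearray()      # stands in for the file on disk
        self.committed_end = 0       # offset just past the latest commit marker

    def put(self, key: bytes, value: bytes) -> None:
        # New values always go at the end of the "file".
        self.data += struct.pack(HEADER, RECORD, len(key), len(value)) + key + value

    def commit(self) -> None:
        # A trailing marker makes everything before it part of the database.
        self.data += struct.pack(HEADER, COMMIT, 0, 0)
        self.committed_end = len(self.data)

    def snapshot(self) -> "Snapshot":
        return Snapshot(self.data, self.committed_end)


class Snapshot:
    """A reader's view: only the bytes up to the commit it was opened at."""

    def __init__(self, data: bytearray, end: int):
        self._data = data
        self._end = end

    def get(self, key: bytes):
        found = None
        pos = 0
        while pos < self._end:       # linear scan; a real store would use an index
            kind, klen, vlen = struct.unpack_from(HEADER, self._data, pos)
            pos += HEADER_SIZE
            if kind == RECORD:
                if bytes(self._data[pos:pos + klen]) == key:
                    found = bytes(self._data[pos + klen:pos + klen + vlen])
                pos += klen + vlen
        return found                 # the latest committed value wins
```

Because a writer only ever appends, a snapshot opened at an earlier commit keeps seeing exactly the same bytes no matter what is written afterwards, and anything after the last commit marker is simply ignored by readers.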
(If you want to look at this sort of thing in more detail today, the closest equivalent is the CouchStore library that Couchbase Server uses. It’s a re-implementation of CouchDB’s storage manager, ported from Erlang to C for speed and then modified further from there.)
What’s new in ForestDB is a clever data structure that’s a combination of a b-tree and a trie. This makes the tree nodes a lot more compact and speeds up lookups. It’s also careful about how it aligns its data with the filesystem’s page size, which supposedly makes it more efficient on SSDs, and it’s got an efficient in-memory page cache.
I’ve been working part-time for a month or so on moving Couchbase Lite/iOS on top of ForestDB. I’ve got it running and passing the unit tests. Some basic benchmarks show it running 50% to 300% faster on various tasks (the biggest speedup is in view queries). I’m working on adding more concurrency using GCD, which should speed it up even more. While I’m doing this, I’m also changing the way documents are stored: instead of every revision being a separate database row, the entire document is stored together, with the revisions encoded inside it. That gives better locality of reference and simplifies the code too. (Also, the new document/revision storage is written in C so it can be reused on Android.)
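To illustrate the storage change, here is a rough Python sketch contrasting the two layouts. The JSON encoding, the field names (`rev`, `body`) and the function names are all assumptions made up for the example; the real encoding is different and is written in C.

```python
import json


def rows_per_revision(doc_id, revisions):
    """Old layout: one row per revision, keyed by (document ID, revision ID)."""
    return {(doc_id, rev["rev"]): json.dumps(rev["body"]) for rev in revisions}


def row_per_document(doc_id, revisions):
    """New layout: one row per document, with every revision encoded in its value."""
    return {doc_id: json.dumps(revisions)}


def load_revision(rows, doc_id, rev_id):
    """Fetch one revision from the new layout with a single key lookup."""
    for rev in json.loads(rows[doc_id]):
        if rev["rev"] == rev_id:
            return rev["body"]
    return None
```

With the second layout, everything about a document arrives in one read, which is where the locality-of-reference benefit comes from.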
This is still an investigation, so no promises and no timelines. But it’s looking very good so far.
—Jens