Cap'n Proto and Key/Value Stores

emile.co...@gmail.com

Aug 1, 2014, 9:56:37 PM
to capn...@googlegroups.com
Is there any reason why Cap'n Proto couldn't be used to encode/decode blobs that are stored in key/value stores, such as LevelDB or Berkeley DB?

I was thinking of implementing my app's Data Access Layer as its own process using Cap'n Proto RPCs for the interface. Since all DB access is only done within this DAL process, I avoid DB concurrency problems. The encoding used for passing objects via RPC would be the very same object encoding I'd use for the key/value store. There'd be no need to serialize the objects a second time!

Would FlatArrayMessageReader and messageToFlatArray be the best tools for this kind of job?

Cheers,
Emile Cormier

Kenton Varda

Aug 1, 2014, 10:13:29 PM
to emile.co...@gmail.com, capnproto
On Fri, Aug 1, 2014 at 6:56 PM, <emile.co...@gmail.com> wrote:
Is there any reason why Cap'n Proto couldn't be used to encode/decode blobs that are stored in key/value stores, such as LevelDB or Berkeley DB?

That should work great!
 
Would FlatArrayMessageReader and messageToFlatArray be the best tools for this kind of job?

messageToFlatArray() will require an extra copy, since message memory is allocated in multiple segments which would then have to be concatenated. If your DB API requires a flat array then there may be no way around this, but if it can take an array of arrays, you may want to use MessageBuilder::getSegmentsForOutput() directly (and SegmentArrayMessageReader when reading back).
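
To illustrate where that extra copy comes from, here is a minimal sketch of concatenating segments into one flat buffer. It assumes the standard Cap'n Proto framing (a segment table followed by the segments) and a little-endian host. `Segment` and `flattenSegments` are hypothetical names for illustration, not library API; in real code messageToFlatArray() does this work for you.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for one message segment: a run of 8-byte words.
using Segment = std::vector<std::uint64_t>;

// Concatenates segments into one flat word buffer, prefixed by a segment
// table: (count - 1) as a 32-bit value, then each segment's size in words
// as a 32-bit value, zero-padded to a whole word.
std::vector<std::uint64_t> flattenSegments(const std::vector<Segment>& segments) {
    assert(!segments.empty());

    // The table holds 1 + N 32-bit values, rounded up to whole 8-byte words.
    const std::size_t headerWords = (segments.size() + 2) / 2;
    std::size_t totalWords = headerWords;
    for (const Segment& s : segments) totalWords += s.size();

    std::vector<std::uint64_t> flat(totalWords, 0);

    // Build the segment table and place it at the front of the buffer.
    std::vector<std::uint32_t> table;
    table.push_back(static_cast<std::uint32_t>(segments.size() - 1));
    for (const Segment& s : segments) {
        table.push_back(static_cast<std::uint32_t>(s.size()));
    }
    std::memcpy(flat.data(), table.data(), table.size() * sizeof(std::uint32_t));

    // This loop is the extra copy: every segment is copied into the flat buffer.
    std::size_t offset = headerWords;
    for (const Segment& s : segments) {
        if (!s.empty()) {
            std::memcpy(flat.data() + offset, s.data(), s.size() * sizeof(std::uint64_t));
        }
        offset += s.size();
    }
    return flat;
}
```

A store that accepts an array of buffers could be handed the segments directly and skip the copy loop entirely, which is the point of using getSegmentsForOutput().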

I have been tempted to develop a simple Cap'n-Proto-based key-value store which just maps directly to the filesystem. It feels silly to have these databases reinventing everything that the kernel VFS already does just fine.

-Kenton

David Yu

Aug 2, 2014, 12:25:30 AM
to Kenton Varda, emile.co...@gmail.com, capnproto
On Sat, Aug 2, 2014 at 10:13 AM, Kenton Varda <ken...@sandstorm.io> wrote:
On Fri, Aug 1, 2014 at 6:56 PM, <emile.co...@gmail.com> wrote:
Is there any reason why Cap'n Proto couldn't be used to encode/decode blobs that are stored in key/value stores, such as LevelDB or Berkeley DB?

That should work great!
 
Would FlatArrayMessageReader and messageToFlatArray be the best tools for this kind of job?

messageToFlatArray() will require an extra copy, since message memory is allocated in multiple segments which would then have to be concatenated. If your DB API requires a flat array then there may be no way around this,
Both of the DBs mentioned require flat arrays (probably all datastores do). I've been doing the same with protobuf: random access for non-bytestring fields by using fixed* types, grouping them together, and generating accessor code for them.
but if it can take an array of arrays, you may want to use MessageBuilder::getSegmentsForOutput() directly (and SegmentArrayMessageReader when reading back).

I have been tempted to develop a simple Cap'n-Proto-based key-value store which just maps directly to the filesystem.
Or you can simply use Symas LMDB for zero-copy reads (mmap-ed storage).
It feels silly to have these databases reinventing everything that the kernel VFS already does just fine.

-Kenton

--
When the cat is away, the mouse is alone.
- David Yu

emile.co...@gmail.com

Aug 2, 2014, 11:38:57 AM
to capn...@googlegroups.com, emile.co...@gmail.com
On Friday, August 1, 2014 11:13:29 PM UTC-3, Kenton Varda wrote:
messageToFlatArray() will require an extra copy, since message memory is allocated in multiple segments which would then have to be concatenated. If your DB API requires a flat array then there may be no way around this, but if it can take an array of arrays, you may want to use MessageBuilder::getSegmentsForOutput() directly (and SegmentArrayMessageReader when reading back).

I could perhaps store the segments under different "sub keys", but that seems rather messy. I'm not concerned about performance, as the key/value store would only be used to persist app settings that are occasionally changed.

I have been tempted to develop a simple Cap'n-Proto-based key-value store which just maps directly to the filesystem. It feels silly to have these databases reinventing everything that the kernel VFS already does just fine.

This remark made me consider using the filesystem as a simple key/value store. But the filesystem approach has no transactions or batch writes, so the DB could become inconsistent in the event of a crash or power failure.

You might find this post amusing. This Unix & Linux StackExchange answer also suggested using the filesystem as a key/value store. I have not found more information regarding the use of the filesystem as a key/value store (I've only searched casually, though).

Kenton Varda

Aug 2, 2014, 7:13:59 PM
to emile.co...@gmail.com, capnproto
On Sat, Aug 2, 2014 at 8:38 AM, <emile.co...@gmail.com> wrote:
I could perhaps store the segments under different "sub keys", but that seems rather messy. I'm not concerned about performance, as the key/value store would only be used to persist app settings that are occasionally changed.

Then messageToFlatArray and FlatArrayMessageReader are what you want.
 
This remark made me consider using the filesystem as a simple key/value store. But the filesystem approach has no transactions or batch writes, so the DB could become inconsistent in the event of a crash or power failure.

To solve this you just need some sort of a journal or file naming scheme that allows you to tell, after reboot, what was going on at the time of the failure, and either complete or roll back the transaction. (Hmm, I wonder if it is possible to hook into the filesystem's own journal...)
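
A minimal sketch of one such file naming scheme, for single-key writes only: write to a temporary name, then rename into place, and treat leftover temporary files as uncommitted writes after a restart. The names `putValue`, `getValue`, `recoverStore` and the ".tmp" suffix are assumptions made for illustration. The sketch also assumes keys are valid file names that do not themselves end in ".tmp", and it does not fsync, so it only protects against partial writes, not against data still sitting in the page cache at power loss.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <optional>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical naming scheme: a value lives in "<dir>/<key>", and an
// in-progress write lives in "<dir>/<key>.tmp" until it is renamed into place.
void putValue(const fs::path& dir, const std::string& key, const std::string& value) {
    const fs::path finalPath = dir / key;
    const fs::path tmpPath = dir / (key + ".tmp");
    {
        std::ofstream out(tmpPath, std::ios::binary | std::ios::trunc);
        out.write(value.data(), static_cast<std::streamsize>(value.size()));
        out.flush();
        assert(out.good());
    }  // The stream is closed here, before the rename.

    // On POSIX, rename() replaces the destination atomically, so a reader
    // sees either the old value or the new one, never a partial write.
    fs::rename(tmpPath, finalPath);
}

// Returns the stored value, or nullopt if the key has never been committed.
std::optional<std::string> getValue(const fs::path& dir, const std::string& key) {
    std::ifstream in(dir / key, std::ios::binary);
    if (!in) return std::nullopt;
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Run after a restart: any leftover ".tmp" file is a write that never
// committed, so rolling back just means deleting it. Returns the number
// of files removed.
std::size_t recoverStore(const fs::path& dir) {
    std::vector<fs::path> stale;
    for (const auto& entry : fs::directory_iterator(dir)) {
        if (entry.path().extension() == ".tmp") stale.push_back(entry.path());
    }
    for (const auto& p : stale) fs::remove(p);
    return stale.size();
}
```

Updating several keys as one transaction would need more than this, for example a journal file listing the keys involved so that recovery can complete or undo the whole batch.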

I would think that the biggest problem with using the filesystem is that most filesystems allocate space in 4k chunks, so small values may waste a lot of space. You could come up with schemes to deal with that, of course. Hmm, I wonder if it would make sense to encode small values into the targets of symlinks -- you should be able to get at least PATH_MAX * 7 bits in there, and PATH_MAX on Linux is 4096, so that's 3584 bytes, which is close enough that you can just switch to files for larger values.
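
The arithmetic and the symlink idea above can be sketched as follows. This is an illustration under stated assumptions, not a tested design: `putSmallValue` and `getSmallValue` are hypothetical names, the value is stored as raw bytes and must be non-empty and free of NUL bytes (a real scheme would need an encoding for arbitrary binary data, such as the 7 bits per byte suggested above), and replacing an existing link is not atomic here.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// The arithmetic from the post: 7 usable bits per byte of link target.
constexpr std::size_t kPathMax = 4096;              // PATH_MAX on Linux
constexpr std::size_t kUsableBits = kPathMax * 7;   // 28672 bits
constexpr std::size_t kUsableBytes = kUsableBits / 8;
static_assert(kUsableBytes == 3584, "matches the figure in the post");

// Stores a small value as the target of a symlink named after the key.
void putSmallValue(const fs::path& dir, const std::string& key, const std::string& value) {
    assert(!value.empty() && value.find('\0') == std::string::npos);
    const fs::path link = dir / key;
    fs::remove(link);  // create_symlink fails if the link already exists
    fs::create_symlink(fs::path(value), link);  // the target need not exist
}

// Reads the value back by reading the link target, without following it.
std::string getSmallValue(const fs::path& dir, const std::string& key) {
    return fs::read_symlink(dir / key).string();
}
```

For example, `putSmallValue(dir, "theme", "dark")` creates a dangling symlink `dir/theme` whose target is the string `dark`.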

All that said, I'm no DB expert, so take what I say with a grain of salt. :)

-Kenton

emile.co...@gmail.com

Aug 3, 2014, 10:18:49 AM
to capn...@googlegroups.com, emile.co...@gmail.com
On Saturday, August 2, 2014 8:13:59 PM UTC-3, Kenton Varda wrote:

To solve this you just need some sort of a journal or file naming scheme that allows you to tell, after reboot, what was going on at the time of the failure, and either complete or roll back the transaction. (Hmm, I wonder if it is possible to hook into the filesystem's own journal...)

I would think that the biggest problem with using the filesystem is that most filesystems allocate space in 4k chunks, so small values may waste a lot of space. You could come up with schemes to deal with that, of course. Hmm, I wonder if it would make sense to encode small values into the targets of symlinks -- you should be able to get at least PATH_MAX * 7 bits in there, and PATH_MAX on Linux is 4096, so that's 3584 bytes, which is close enough that you can just switch to files for larger values.

All that said, I'm no DB expert, so take what I say with a grain of salt. :)


I think I'd rather just use a ready-made embeddable key/value store, where they've already wrestled with all those details. Same as how I'd rather use a ready-made IPC/RPC communications library (such as Cap'n Proto) than roll my own! :-)

Kenton Varda

Aug 4, 2014, 5:40:04 PM
to emile.co...@gmail.com, capnproto
Totally understandable.

Maybe eventually I'll end up coding up my idea and then you'll be able to use it. :)

Though I've discovered the symlink trick for small files is only good for up to 60 bytes. After that the symlink allocates a whole 4k block on disk. Oh well.

-Kenton
