[RavenDB] Document encryption

Michael Davis

May 24, 2010, 6:08:55 PM
to rav...@googlegroups.com
I've been looking to see if it might be possible to add a way to encrypt documents before they are stored on disk in the database (primarily in the context of an application that would use Raven's embedded mode).

So far, I think I've found the places in DocumentStorageActions where extension points would need to be added to allow the data to be encrypted before writing or decrypted after reading.  At this point, I'd like to get a discussion going to see if anyone else has any interest in such a feature or suggestions on how to implement it.

From what I've seen in the code so far, it looks like it should be possible to create a new plugin/trigger type that might have an interface something like this:

void OnEncode(string key, JObject document, JObject metadata, TransactionInformation transactionInformation, ref byte[] encodedDocument);
void OnDecode(string key, JObject metadata, TransactionInformation transactionInformation, ref byte[] decodedDocument);

The encodedDocument parameter of OnEncode() would initially contain the UTF-8-encoded JSON for the document, but each plugin could replace the data it receives with its own encoded/encrypted version.  Likewise, OnDecode() would reverse the process, outputting the UTF-8 encoded JSON in the decodedDocument parameter.
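To make the proposed OnEncode()/OnDecode() contract more concrete, here is a minimal sketch in Java (standard JDK only) of what a single encrypting codec might do with the UTF-8 JSON bytes. The class `AesGcmDocumentCodec` and its method names are hypothetical, not part of Raven. The sketch assumes AES in GCM mode with a fresh random 12-byte nonce per write, stored in front of the ciphertext. It also passes the document key as associated data, so an encrypted blob copied onto a different document fails to decrypt.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical codec: encrypts the UTF-8 JSON bytes of a document with AES-GCM.
// Stored layout: [12-byte random nonce][ciphertext + 16-byte authentication tag].
final class AesGcmDocumentCodec {
    private static final int NONCE_BYTES = 12;
    private static final int TAG_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();

    private final SecretKey key;

    AesGcmDocumentCodec(SecretKey key) {
        this.key = key;
    }

    // Counterpart of OnEncode: plaintext JSON bytes in, encrypted bytes out.
    byte[] encode(String documentKey, byte[] utf8Json) throws GeneralSecurityException {
        byte[] nonce = new byte[NONCE_BYTES];
        RANDOM.nextBytes(nonce);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, nonce));
        // Bind the ciphertext to its document key (authenticated, not encrypted).
        cipher.updateAAD(documentKey.getBytes(StandardCharsets.UTF_8));
        byte[] sealed = cipher.doFinal(utf8Json);
        byte[] out = new byte[NONCE_BYTES + sealed.length];
        System.arraycopy(nonce, 0, out, 0, NONCE_BYTES);
        System.arraycopy(sealed, 0, out, NONCE_BYTES, sealed.length);
        return out;
    }

    // Counterpart of OnDecode: throws AEADBadTagException if the bytes were altered
    // or belong to a different document key.
    byte[] decode(String documentKey, byte[] stored) throws GeneralSecurityException {
        byte[] nonce = Arrays.copyOfRange(stored, 0, NONCE_BYTES);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, nonce));
        cipher.updateAAD(documentKey.getBytes(StandardCharsets.UTF_8));
        return cipher.doFinal(stored, NONCE_BYTES, stored.length - NONCE_BYTES);
    }
}
```

GCM is used here because it detects tampering as well as hiding the content. The one hard rule is that a nonce must never repeat under the same key, which is why every write draws a new random one.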

Presumably, there would need to be something stored in the metadata during encoding to list which encoders actually ran and in what order so that the decoders can be run in reverse to get the original data back.  Alternatively, we might only allow one encoder plugin to run on a document, in which case it could actually handle the UTF-8 encoding/decoding itself along with the encryption.  That might require some sort of configuration and/or metadata to tell Raven which encoder to use, though.
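The first option, recording which encoders ran so that the decoders can be applied in reverse, might look like the following Java sketch. `CodecChain`, its `Codec` record, and the `Codecs-Applied` metadata field are all hypothetical names. A plain `Map` stands in for the document metadata, and for simplicity every registered codec runs on every document.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical encoder chain: codecs run in registration order, their names are recorded
// in the metadata, and decoding looks those names up and undoes them in reverse order.
final class CodecChain {
    // Assumed metadata field name; Raven does not define it.
    static final String APPLIED_CODECS = "Codecs-Applied";

    record Codec(String name, UnaryOperator<byte[]> encode, UnaryOperator<byte[]> decode) {}

    private final Map<String, Codec> byName = new HashMap<>();
    private final List<Codec> order = new ArrayList<>();

    CodecChain register(Codec codec) {
        byName.put(codec.name(), codec);
        order.add(codec);
        return this;
    }

    byte[] encode(byte[] utf8Json, Map<String, Object> metadata) {
        List<String> applied = new ArrayList<>();
        byte[] data = utf8Json;
        for (Codec codec : order) {
            data = codec.encode().apply(data);
            applied.add(codec.name());
        }
        metadata.put(APPLIED_CODECS, applied);
        return data;
    }

    byte[] decode(byte[] stored, Map<String, Object> metadata) {
        @SuppressWarnings("unchecked")
        List<String> applied = new ArrayList<>(
                (List<String>) metadata.getOrDefault(APPLIED_CODECS, List.of()));
        Collections.reverse(applied); // last encoder applied is the first one undone
        byte[] data = stored;
        for (String name : applied) {
            Codec codec = byName.get(name);
            if (codec == null) {
                throw new IllegalStateException("No decoder registered for " + name);
            }
            data = codec.decode().apply(data);
        }
        return data;
    }
}
```

Storing the names, rather than relying on the current registration order, means a document written before a codec was added or reordered can still be decoded, as long as every named decoder is still installed.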

Here are some other things I've been thinking about with regard to a potential encryption feature/plugin:

- How would Raven get access to the encryption key(s)?  I'm open to suggestions as to what the best way to handle this might be.

- Encryption should probably be an opt-in feature.  Presumably this would be handled by defining a metadata field to indicate that a document should be encrypted.  If so, what should this metadata field contain?  Would it just be a simple flag, or could the metadata specify additional details (e.g., which encryption algorithm to use, information about the key used to encrypt it, etc.)?

- While thinking about the metadata, I was wondering how to specify such metadata for a document using the high-level client interface.  One way would be to put an attribute on a class that you wanted to encrypt.  Looking through the code, I don't see any built-in support for this, but it does look like the IDocumentSession.OnEntityConverted event would allow you to write your own code to generate metadata from attributes.  Still, it seems like this might be a common enough thing that it would be useful to provide a built-in feature for it, possibly by including an abstract base class for an attribute that the DocumentSession will check for when storing an entity and use it to generate metadata.  If anyone thinks this is a good idea (independent of the encryption feature), I'll gladly look into doing it and submitting a patch.

- I could also see a use for field-level encryption.  This could probably be done without any major changes to Raven by customizing the entity-to-JSON conversion with the support already provided by Json.NET, though doing it this way would make it impossible to have a useful index on those fields, I'd assume.

- I'm also curious about the possibility of encrypting the Lucene indexes, since encrypting the documents in the database doesn't help much if you can still see sensitive information in an index.  I noticed that there's a branch that appears to add support for storing the indexes in ESENT along with the data, so unless that's been abandoned, it might offer a good starting point.  Otherwise, the only thing I've seen so far that might help with this is a patch for the Java version of Lucene to allow AES encryption: https://issues.apache.org/jira/browse/LUCENE-2228
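For the opt-in question, one option is metadata that names a key instead of a bare flag, which also covers the "information about the key used" case. The Java sketch below assumes hypothetical `Encryption-Key-Id` and `Encryption-Algorithm` metadata fields and a key store supplied by the host application as a simple map. None of these names exist in Raven.

```java
import java.util.Map;
import java.util.Optional;
import javax.crypto.SecretKey;

// Hypothetical opt-in check: a document is encrypted only if its metadata names a key.
// The field names below are assumptions for illustration.
final class EncryptionOptIn {
    static final String KEY_ID = "Encryption-Key-Id";
    static final String ALGORITHM = "Encryption-Algorithm";
    static final String DEFAULT_ALGORITHM = "AES/GCM/NoPadding";

    // Returns empty when the document did not opt in; throws when it asks for an
    // algorithm or key that is not available, instead of storing it unencrypted.
    static Optional<SecretKey> keyFor(Map<String, String> metadata, Map<String, SecretKey> keyStore) {
        String keyId = metadata.get(KEY_ID);
        if (keyId == null) {
            return Optional.empty();
        }
        String algorithm = metadata.getOrDefault(ALGORITHM, DEFAULT_ALGORITHM);
        if (!algorithm.equals(DEFAULT_ALGORITHM)) {
            throw new IllegalArgumentException("Unsupported algorithm: " + algorithm);
        }
        SecretKey key = keyStore.get(keyId);
        if (key == null) {
            throw new IllegalStateException("No key available with id " + keyId);
        }
        return Optional.of(key);
    }
}
```

The check fails closed: a document that asked for encryption is never quietly written in plaintext because its key is missing.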

If anyone has any suggestions or comments on any of this, please speak up.

Thanks,
Michael Davis

John Davidson

May 24, 2010, 6:53:49 PM
to rav...@googlegroups.com
How would indexing work? Indexing doesn't happen until after the document is stored, but if the document is encrypted the index will not work, so it will not be possible to retrieve the document except by id.

John Davidson

Andrew Siemer

May 24, 2010, 6:16:59 PM
to rav...@googlegroups.com, rav...@googlegroups.com
I am certainly for having native encryption support, or at least a hook for encrypting the data on the way in and out. I like the idea of encrypting the entire document, but encrypting individual fields of the document would be useful too.

Andrew Siemer

May 24, 2010, 8:46:01 PM
to rav...@googlegroups.com, rav...@googlegroups.com
Encrypting a credit card number would be important. Perhaps a "select all from documents where ccnum equals encrypt(ccnum)" would work, though that would only work for exact matches.
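Comparing against `encrypt(ccnum)` only works if the encryption is deterministic; schemes with a random nonce (AES-GCM, for example) produce a different ciphertext on every write. A common alternative is to store a keyed hash (HMAC) of the number next to the encrypted value and index that instead. The Java sketch below uses a hypothetical `BlindIndex` class. It assumes the HMAC key is separate from the encryption key, and because card numbers have little entropy, that key must stay as secret as the encryption key or the tokens can be brute-forced.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.HexFormat;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical "blind index": a keyed, deterministic hash of a sensitive field.
// The same input always yields the same token, so an index on the token supports
// exact-match lookups without storing the plaintext; range or prefix queries are not possible.
final class BlindIndex {
    private final SecretKeySpec macKey;

    BlindIndex(byte[] secret) {
        this.macKey = new SecretKeySpec(secret, "HmacSHA256");
    }

    String token(String value) throws GeneralSecurityException {
        // Normalize so "4111 1111 1111 1111" and "4111-1111-1111-1111" produce the same token.
        String normalized = value.replaceAll("[\\s-]", "");
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(macKey);
        byte[] digest = mac.doFinal(normalized.getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(digest);
    }
}
```

A lookup then computes `token(input)` and compares it with the indexed token, which still gives exact matches only.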

Michael Davis

May 24, 2010, 8:57:24 PM
to rav...@googlegroups.com
If it can be done the way I described, then the encryption/decryption code would plug in to the system at a very low level (just above the ESENT storage).  The indexer operates on the documents at a higher level (on the JSON representation), so the documents would be decrypted as part of the retrieval process when the indexer asks for them.
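The layering described here can be illustrated with a toy in-memory store in Java. `EncodedStore` is hypothetical and only stands in for the ESENT-backed storage: `put` encodes before writing, `get` decodes after reading, and a naive `keysContaining` search built on `get` finds text that never appears in clear in the stored bytes. The encoder and decoder are passed in as functions, so any codec pair could be plugged in.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical layering sketch (not Raven's classes): the low-level store keeps only
// encoded bytes, and every read passes through the decoder, so anything built on
// top of get(), such as an indexer, works with plaintext JSON.
final class EncodedStore {
    private final Map<String, byte[]> disk = new LinkedHashMap<>(); // stands in for ESENT
    private final BiFunction<String, byte[], byte[]> encode;
    private final BiFunction<String, byte[], byte[]> decode;

    EncodedStore(BiFunction<String, byte[], byte[]> encode, BiFunction<String, byte[], byte[]> decode) {
        this.encode = encode;
        this.decode = decode;
    }

    void put(String key, String json) {
        disk.put(key, encode.apply(key, json.getBytes(StandardCharsets.UTF_8)));
    }

    String get(String key) {
        byte[] raw = disk.get(key);
        return raw == null ? null : new String(decode.apply(key, raw), StandardCharsets.UTF_8);
    }

    // What someone reading the data file directly would see.
    byte[] rawBytes(String key) {
        return disk.get(key);
    }

    // Toy "indexer": scans decoded documents, in insertion order.
    List<String> keysContaining(String text) {
        List<String> hits = new ArrayList<>();
        for (String key : disk.keySet()) {
            if (get(key).contains(text)) {
                hits.add(key);
            }
        }
        return hits;
    }
}
```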

This does mean that the database probably has to have access to the encryption keys at any time, of course, and I'm still not sure what the best way to provide that access is (or even if it'd be the same for both the server and embedded modes).

Michael Davis

Nathan Palmer

unread,
May 24, 2010, 9:05:28 PM
to rav...@googlegroups.com
There are some situations in which you wouldn't want even the server to
be able to read the encrypted data; only the user would have access to
it, through some type of public/private key authentication system. In
stricter PCI compliance scenarios, you must have key management and key
rotation in place in order to store credit card numbers on your server.
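The rotation part of that requirement can be sketched in Java by tagging each ciphertext with the version of the key that produced it. `KeyRing` is a hypothetical class that keeps its AES-GCM keys in memory. It does not address where the keys are stored or who may use them, which is the harder part of key management.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical versioned key ring: every ciphertext starts with the version of the key
// that produced it, so older data stays readable after rotation.
final class KeyRing {
    private static final SecureRandom RANDOM = new SecureRandom();
    private final Map<Integer, SecretKey> keys = new HashMap<>();
    private int current = 0;

    KeyRing() throws GeneralSecurityException {
        rotate();
    }

    // Generates a new AES-256 key and makes it the key used for new encryptions.
    int rotate() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(256);
        keys.put(++current, generator.generateKey());
        return current;
    }

    // Layout: [4-byte key version][12-byte nonce][ciphertext + tag].
    byte[] encrypt(byte[] plaintext) throws GeneralSecurityException {
        byte[] nonce = new byte[12];
        RANDOM.nextBytes(nonce);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, keys.get(current), new GCMParameterSpec(128, nonce));
        byte[] sealed = cipher.doFinal(plaintext);
        return ByteBuffer.allocate(4 + 12 + sealed.length).putInt(current).put(nonce).put(sealed).array();
    }

    byte[] decrypt(byte[] stored) throws GeneralSecurityException {
        ByteBuffer buffer = ByteBuffer.wrap(stored);
        int version = buffer.getInt();
        SecretKey key = keys.get(version);
        if (key == null) {
            throw new GeneralSecurityException("Unknown or retired key version " + version);
        }
        byte[] nonce = new byte[12];
        buffer.get(nonce);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, nonce));
        return cipher.doFinal(stored, buffer.position(), buffer.remaining());
    }

    static int versionOf(byte[] stored) {
        return ByteBuffer.wrap(stored).getInt();
    }

    // Re-encrypts a stored value under the current key.
    byte[] reencrypt(byte[] stored) throws GeneralSecurityException {
        return encrypt(decrypt(stored));
    }

    // Drops an old key version; the current version is never removed.
    void retire(int version) {
        if (version != current) {
            keys.remove(version);
        }
    }
}
```

Because the version travels with the data, rotation does not have to re-encrypt everything at once: a background job can call `reencrypt` on older documents and retire a key version only once nothing references it.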

Nathan Palmer

Ayende Rahien

May 25, 2010, 7:52:15 AM
to rav...@googlegroups.com
I am not sure how useful that is.
I follow your logic, and that will certainly store the data securely on disk, but it would result in the server having to know the encryption/decryption keys.
This only helps against people trying to get at the data without having the server decrypt it for them.
In most systems, it isn't that the data should be stored on disk encrypted, it is that even the DB shouldn't be able to decrypt it without help.
Nevertheless, I can certainly see this being a useful extension point. Look for the AbstractDocumentCodec class in the next push.

Ayende Rahien

May 25, 2010, 7:52:28 AM
to rav...@googlegroups.com
Agreed.

Michael Davis

May 25, 2010, 8:27:29 AM
to rav...@googlegroups.com
I'm aware of the limitations of this kind of encryption, but I think it can still be useful in an embedded scenario like the one I'm most interested in right now, where there isn't a server running and theft of a laptop is a bigger concern than someone hacking into one.
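For the laptop-theft case, one way to give an embedded database its key without storing that key on disk is to derive it from a passphrase the user types at startup. The Java sketch below uses the JDK's PBKDF2 implementation. The `PassphraseKey` class, the 16-byte salt, and the iteration count are assumptions to tune, not something proposed in this thread.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: derives a 256-bit AES key from a passphrase entered at startup,
// so no key material has to be written to the laptop's disk. The salt is not secret
// and would be stored next to the database files.
final class PassphraseKey {
    static final int DEFAULT_ITERATIONS = 600_000; // assumed work factor; tune for your hardware

    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    static SecretKey derive(char[] passphrase, byte[] salt, int iterations) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(passphrase, salt, iterations, 256);
        try {
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            byte[] raw = factory.generateSecret(spec).getEncoded();
            return new SecretKeySpec(raw, "AES");
        } finally {
            spec.clearPassword(); // wipes the spec's internal copy; the caller still owns its char[]
        }
    }
}
```

With this approach, starting the application on a stolen laptop still requires the passphrase. It offers no protection while the database is open and the derived key is in memory.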

Thanks for adding the extension point to enable this.

Ayende Rahien

May 25, 2010, 8:35:32 AM
to rav...@googlegroups.com
It is on GitHub now.
The problem is that the thief could just start the server and get the info out.
That is assuming that he has some skills, of course :-)