I've been looking into whether it would be possible to encrypt documents before they are stored on disk in the database (primarily for an application that uses Raven's embedded mode).
So far, I think I've found the places in DocumentStorageActions where extension points would need to be added so that the data can be encrypted before writing and decrypted after reading. At this point, I'd like to get a discussion going to see if anyone else has any interest in such a feature or suggestions on how to implement it.
From what I've seen in the code so far, it looks like it should be possible to create a new plugin/trigger type that might have an interface something like this:
void OnEncode(string key, JObject document, JObject metadata, TransactionInformation transactionInformation, ref byte[] encodedDocument);
void OnDecode(string key, JObject metadata, TransactionInformation transactionInformation, ref byte[] decodedDocument);
The encodedDocument parameter of OnEncode() would initially contain the UTF-8-encoded JSON for the document, and each plugin could replace the data it receives with its own encoded/encrypted version. Likewise, OnDecode() would reverse the process: decodedDocument would initially contain the bytes read from storage, and each plugin would replace them with its decoded version, so that the last plugin outputs the UTF-8-encoded JSON.
Presumably, there would need to be something stored in the metadata during encoding to list which encoders actually ran and in what order so that the decoders can be run in reverse to get the original data back. Alternatively, we might only allow one encoder plugin to run on a document, in which case it could actually handle the UTF-8 encoding/decoding itself along with the encryption. That might require some sort of configuration and/or metadata to tell Raven which encoder to use, though.
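To make the multiple-encoder idea a bit more concrete, here is a rough sketch of how the "record the order, then decode in reverse" bookkeeping might look. Everything in it is an assumption for illustration: IDocumentCodec is a simplified stand-in for the trigger interface above (no JObject or TransactionInformation), the metadata is modeled as a plain string dictionary, the "Raven-Document-Codecs" field name is made up, and PrefixCodec is a toy that only prepends a marker byte (it is not encryption).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical, simplified stand-in for the proposed plugin/trigger type.
public interface IDocumentCodec
{
    string Name { get; }
    byte[] Encode(string key, byte[] data);
    byte[] Decode(string key, byte[] data);
}

// Toy codec for illustration only: prepends a marker byte on encode and
// checks/strips it on decode. A real plugin would encrypt here instead.
public sealed class PrefixCodec : IDocumentCodec
{
    private readonly byte marker;

    public PrefixCodec(string name, byte marker)
    {
        Name = name;
        this.marker = marker;
    }

    public string Name { get; }

    public byte[] Encode(string key, byte[] data)
    {
        return new[] { marker }.Concat(data).ToArray();
    }

    public byte[] Decode(string key, byte[] data)
    {
        if (data.Length == 0 || data[0] != marker)
            throw new InvalidOperationException("Codec '" + Name + "' cannot decode " + key);
        return data.Skip(1).ToArray();
    }
}

public sealed class CodecPipeline
{
    // Hypothetical metadata field listing the codecs that ran, in order.
    public const string MetadataField = "Raven-Document-Codecs";

    private readonly List<IDocumentCodec> codecs;
    private readonly Dictionary<string, IDocumentCodec> byName;

    public CodecPipeline(IEnumerable<IDocumentCodec> codecs)
    {
        this.codecs = codecs.ToList();
        byName = this.codecs.ToDictionary(c => c.Name);
    }

    // Starts from the UTF-8 JSON, runs every codec in registration order,
    // and records that order in the metadata.
    public byte[] Encode(string key, string json, IDictionary<string, string> metadata)
    {
        byte[] data = Encoding.UTF8.GetBytes(json);
        foreach (var codec in codecs)
            data = codec.Encode(key, data);
        metadata[MetadataField] = string.Join(",", codecs.Select(c => c.Name));
        return data;
    }

    // Reads the recorded order and undoes the codecs last-to-first.
    public string Decode(string key, byte[] stored, IDictionary<string, string> metadata)
    {
        byte[] data = stored;
        string applied;
        if (metadata.TryGetValue(MetadataField, out applied) && applied.Length > 0)
        {
            string[] names = applied.Split(',');
            for (int i = names.Length - 1; i >= 0; i--)
                data = byName[names[i]].Decode(key, data);
        }
        return Encoding.UTF8.GetString(data);
    }
}
```

With two PrefixCodec instances registered, Encode() followed by Decode() on the same metadata should return the original JSON, while decoding in the wrong order would throw. The single-encoder alternative would shrink this down to one name in the metadata and no loop.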
Here are some other things I've been thinking about with regard to a potential encryption feature/plugin:
- How would Raven get access to the encryption key(s)? I'm open to suggestions as to what the best way to handle this might be.
- Encryption should probably be an opt-in feature. Presumably this would be handled by defining a metadata field to indicate that a document should be encrypted. If so, what should this metadata field contain? Would it just be a simple flag, or could the metadata specify additional details (e.g., which encryption algorithm to use, information about the key used to encrypt it, etc.)?
- While thinking about the metadata, I was wondering how to specify such metadata for a document using the high-level client interface. One way would be to put an attribute on a class that you wanted to encrypt. Looking through the code, I don't see any built-in support for this, but it does look like the IDocumentSession.OnEntityConverted event would allow you to write your own code to generate metadata from attributes. Still, this seems like a common enough need that a built-in feature would be useful, possibly in the form of an abstract base class for attributes that the DocumentSession would look for when storing an entity and use to generate metadata. If anyone thinks this is a good idea (independent of the encryption feature), I'll gladly look into doing it and submitting a patch.
- I could also see a use for field-level encryption. This could probably be done without any major changes to Raven by customizing the entity-to-JSON conversion with the support already provided by Json.NET, though I'd assume that doing it this way would make it impossible to have a useful index on those fields.
- I'm also curious about the possibility of encrypting the Lucene indexes, since encrypting the documents in the database doesn't help much if you can still see sensitive information in an index. I noticed that there's a branch that appears to be for providing support for storing the indexes in ESENT along with the data, so unless that's been abandoned, that might offer a good starting point. Otherwise, the only thing I've seen so far that might help with this is a patch for the Java version of Lucene to allow AES encryption:
https://issues.apache.org/jira/browse/LUCENE-2228
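As a rough illustration of the attribute idea from the list above, something along these lines is what I have in mind. All of the names here are hypothetical (DocumentMetadataAttribute, EncryptedAttribute, the "Raven-Encrypt" fields, the PatientRecord class), the metadata is modeled as a plain string dictionary rather than a JObject, and I haven't shown how it would be wired into the session; presumably a handler for the OnEntityConverted event, or the DocumentSession itself, would call Populate().

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstract base class: any attribute deriving from it gets a
// chance to contribute metadata when an entity is stored.
[AttributeUsage(AttributeTargets.Class, Inherited = true)]
public abstract class DocumentMetadataAttribute : Attribute
{
    public abstract void Apply(IDictionary<string, string> metadata);
}

// Hypothetical opt-in marker for encryption. The field names and the
// default algorithm value are assumptions, not anything Raven defines.
public sealed class EncryptedAttribute : DocumentMetadataAttribute
{
    public EncryptedAttribute()
    {
        Algorithm = "AES";
    }

    public string Algorithm { get; set; }

    public override void Apply(IDictionary<string, string> metadata)
    {
        metadata["Raven-Encrypt"] = "true";
        metadata["Raven-Encrypt-Algorithm"] = Algorithm;
    }
}

public static class MetadataFromAttributes
{
    // Finds every DocumentMetadataAttribute on the entity's class (including
    // inherited ones) and lets each one write into the metadata.
    public static void Populate(object entity, IDictionary<string, string> metadata)
    {
        object[] attributes = entity.GetType()
            .GetCustomAttributes(typeof(DocumentMetadataAttribute), true);
        foreach (DocumentMetadataAttribute attribute in attributes)
            attribute.Apply(metadata);
    }
}

// Example entity that opts in to encryption.
[Encrypted(Algorithm = "AES-256")]
public class PatientRecord
{
    public string Name { get; set; }
}
```

Calling MetadataFromAttributes.Populate(new PatientRecord(), metadata) would then leave the two "Raven-Encrypt" fields in the metadata, which an encoder plugin could check to decide whether (and how) to encrypt the document.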
If anyone has any suggestions or comments on any of this, please speak up.
Thanks,
Michael Davis