Best mode to handle encrypted files with random access

lasl...@gmail.com

Oct 18, 2015, 12:20:59 PM
to Crypto++ Users
Hi all:

   I have never worked in cryptography before. Excuse me if the question is very long, but I'm very lost.

   We have a new project in my company related to cryptography, and I would appreciate your advice.

   We have many customers who are going to save many encrypted files on our servers. The files are currently unencrypted. We'll encrypt the files and, from then on, read and change their content without decrypting them completely.
   Each file can have its own password, but we believe that most customers will always use the same password.

   I'm thinking about putting a header on the file, as is done in https://www.aescrypt.com/aes_file_format.html. This way we can have a new Initialization Vector (if necessary) for each compressed file, plus the hash of the password.
   In the header we will also keep pointers to various sections of the document, where we have to read and write some encrypted information on the fly. For this reason I need semi-random access.

   We are quite sure that we must use AES to encrypt the files. We're not sure whether to use 128-bit or 256-bit keys. Is there really much difference in security between them (we need to consider speed)?

   And the most important question: what is the most secure mode, even assuming that an attacker could download all the files from the server?


Thank you in advance!

Jean-Pierre Münch

Oct 18, 2015, 3:06:59 PM
to cryptop...@googlegroups.com
On 18.10.2015 at 18:20, lasl...@gmail.com wrote:
   I'm thinking about putting a header to the file, as is done in https://www.aescrypt.com/aes_file_format.html. In this way we can have a new Initialization Vector ( if necessary ) for each compressed file and the hash of the password.
The header format actually looks surprisingly good. However, I'd suggest using only 12-byte IVs (AES-GCM only needs a 12-byte IV) and replacing the header's "HMAC" with an AES-GCM tag. I'd also suggest authenticating the whole unencrypted header (extensions, version, ...) with the header's tag (GCM can handle associated data).
   In the header we will also keep pointers to various sections of the document, where we have to read and write some encrypted information on the fly. For this reason I need semi-random access.
This is actually really tricky. Just to make sure: you need to be able to securely read and write at any place in the stored file?
This will force you to re-authenticate the whole file (which can be done quite fast via this trick[1]) or to pull off some other trick. In your scenario I'd suggest dividing the file into several subsections. Each subsection will have its own IV, which is just the IV of the previous subsection incremented by one (this is safe with AES-GCM). At the end of each subsection there will be an authentication tag for that subsection. This gives you a linear increase in size, but a linear decrease in computation time, as you only need to consider one specific subsection when re-authenticating or verifying.
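
The subsection layout described above can be sketched as plain offset arithmetic. This is only an illustration under assumptions that are not from the thread: the names `ChunkLayout`, `ChunkPosition` and `locate` are hypothetical, the header is assumed to have a fixed size, and every chunk is assumed to carry the same amount of data followed by its tag.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical on-disk layout (an assumption for illustration):
// [header][chunk 0 data][tag 0][chunk 1 data][tag 1]...
struct ChunkLayout {
    std::uint64_t headerSize;    // bytes of unencrypted header
    std::uint64_t dataPerChunk;  // plaintext bytes stored in each chunk
    std::uint64_t tagSize;       // authentication tag bytes after each chunk
};

struct ChunkPosition {
    std::uint64_t index;          // which chunk holds the requested byte
    std::uint64_t offsetInChunk;  // position of the byte inside that chunk's data
    std::uint64_t fileOffset;     // where that chunk's ciphertext starts on disk
};

// Maps an offset in the plaintext to the chunk that has to be
// read, verified and decrypted to reach it.
inline ChunkPosition locate(const ChunkLayout& layout, std::uint64_t plainOffset)
{
    assert(layout.dataPerChunk > 0);
    ChunkPosition pos;
    pos.index = plainOffset / layout.dataPerChunk;
    pos.offsetInChunk = plainOffset % layout.dataPerChunk;
    pos.fileOffset = layout.headerSize
                   + pos.index * (layout.dataPerChunk + layout.tagSize);
    return pos;
}
```

With a 64-byte header, 4080 data bytes and a 16-byte tag per chunk, plaintext offset 10000 falls into chunk 2, so only that one 4096-byte unit has to be processed. The chunk index is also the amount to add to the first IV to obtain that chunk's IV.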


   We are quite sure that we must use AES to encrypt the files. We're not sure whether to use 128-bit or 256-bit keys. Is there really much difference in security between them (we need to consider speed)?
You need to consider two points here. a) Marketing: AES-256 with a 2^256 key space will sound much more impressive than AES-128 with a "tiny" 2^128 key space. b) Security period: if you want to keep the data confidential for a very long time (decades?), then you want to use AES-256. Otherwise AES-128 is just fine (and NSA-approved for secret information) and will be a bit faster.


   And the most important question is what may be the most secure mode, even assuming that an attacker could download all the files from the server?
AES-GCM is the mode of choice in modern cryptography. It is considered good practice to use AES-GCM, and it will usually be significantly faster than most other authenticated constructions (like AES-CTR+HMAC or AES-CBC+HMAC). Furthermore, you may be able to exploit the underlying construction to go faster with multithreading via [1].

I hope this answers your questions. If there are any left, you may ask here or over at Crypto.SE[2], although you should not make the question too broad there.

BR

JPM

[1]: https://crypto.stackexchange.com/a/27468/23623
[2]: https://crypto.stackexchange.com/



lasl...@gmail.com

Oct 19, 2015, 1:05:56 PM
to Crypto++ Users
Hi Jean Pierre:

   Thank you very much for your answer!

   First of all: yes, I need read and write at any place in the stored file. I know this is a problem.

   I was reading some doc about AES-GCM, and the way you propose to work, and have some doubts/observations:
  • I understand that using 256-bit AES in GCM mode with a TAG size of 96 bytes (the shortest recommended) will give a resulting file 37.5% larger than the original (not counting the header).
Some of the files we will encrypt are about 1 GB or even 2 GB, so we are speaking about 375 MB for tags. Is this correct?
  •  You say "In your scenario I'd suggest dividing the file into several subsections": where can I find info about how to handle the subsection concept for AES-GCM?
By the way, what do you think about using XTS as shown in the link? I see this option is not implemented in Crypto++...

  • If I don't want the TAG overhead, I have two options, AES-CTR+HMAC or AES-CBC+HMAC:
    • Pros:
      • File size will change only with header info.
      • Need to recompute the HMAC every time the file is changed.  A test to perform: how long does it take to compute the HMAC of a 1 GB file?
    • Cons:
      • Slower encryption/decryption

Thank you very much for your help!

Jeffrey Walton

Oct 19, 2015, 1:15:18 PM
to Crypto++ Users

  •  You say In your scenario I'd suggest dividing the file into several subsections: Where can I find the info about how to handle the subsection concept  for AES-GCM?
This is sometimes called "blocking" or "chunking".
By the way, what do you think about using XTS as shown in the link? I see this option is not implemented in Crypto++...

XTS mode lacks an authentication tag.

XTS is on the roadmap; see https://www.cryptopp.com/wiki/Roadmap .

Jeff

Jean-Pierre Münch

Oct 19, 2015, 2:27:37 PM
to cryptop...@googlegroups.com
On 19.10.2015 at 19:05, lasl...@gmail.com wrote:
  • I understand that using 256bit AES, the GCM mode, and size of TAG = 96 bytes (the shortest recommended) will get a resulting file, 37.5% larger  than the original (regardless of the header).
I think you're getting something wrong here. The longest standard tag for GCM is 16 bytes (128 bits) and the shortest standard tag is 12 bytes (96 bits).

Some of the files we will encrypt are about 1 GB or even 2GB, so we are speaking about 375MB for tags. This is correct?
I don't know how you calculated those 375MB. Depending on how finely you chunk, you can have a 16-byte tag overhead (for the whole file, with only one chunk) or more if you want smaller chunks. A reasonable choice would be sector-size alignment, meaning Data||Tag would be as large as one sector of your filesystem. This would result in an increase of 3.2% (512-byte sectors) to 0.1% (16,384-byte sectors). Or you could just use some arbitrary unit like one megabyte per chunk, resulting in a 0.0015% increase in the storage needed.
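
The percentages above follow from dividing the tag size by the data that fits next to it in one unit. A small sketch of that arithmetic, assuming a 16-byte tag and a unit that holds Data||Tag together (the function name `tagOverheadPercent` is hypothetical):

```cpp
#include <cassert>
#include <cstddef>

// Percent of extra storage when every unit of unitSize bytes
// holds (unitSize - tagSize) bytes of data plus one tag.
inline double tagOverheadPercent(std::size_t unitSize, std::size_t tagSize = 16)
{
    assert(unitSize > tagSize);
    return 100.0 * static_cast<double>(tagSize)
                 / static_cast<double>(unitSize - tagSize);
}
```

For 512-byte units this gives 16/496, about 3.2%; for 16,384-byte units 16/16368, about 0.1%; and for units of one mebibyte about 0.0015%.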

  •  You say In your scenario I'd suggest dividing the file into several subsections: Where can I find the info about how to handle the subsection concept  for AES-GCM?
Chunking and blocking (see Jeffrey's answer) are standard concepts. What I basically suggest is to divide the file into Header||Chunk1||Chunk2||Chunk3... where each chunk has its own tag and is fully independent of the others (besides the somewhat dependent IV).

And yes, you can't seek with AES-GCM in Crypto++: seeking would skip parts of the enciphered data, and the authentication would then fail.

By the way, what do you think about using XTS as shown in the link? I see this option is not implemented in Crypto++...
See Jeffrey's answer.


  • If I want not to have TAG overhead I have two options AES-CTR+HMAC or AES-CBC+HMAC:
CTR would be the better option here as it can be parallelized (supports seek() in Crypto++) and has less severe IV requirements.

    • Pros:
      • File size will change only with header info.
You can also get a constant-size tag overhead with AES-GCM by using only one chunk.

      • Need to change HMAC every time file is changed.  A test to perform: how long it takes to compute HMAC of a 1GB file?
With SHA-256, you'd need roughly 20 cycles/byte on such long messages with a C implementation on an Intel Core 2 Duo [1], meaning you may get something like 10 cycles/byte with ASM code on a modern CPU. HMAC is not significantly slower than plain hashing. For a 1 GB file this would mean about 10^10 cycles or, equivalently, 5 seconds on a 2 GHz CPU. Reading the whole file is likely to take longer.
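
The estimate above is simply bytes times cycles per byte, divided by the clock rate. A tiny sketch for trying other file sizes or CPUs; the 10 cycles/byte figure is the rough guess from the text, not a measurement, and the function name `estimateSeconds` is hypothetical:

```cpp
#include <cassert>

// Rough time to hash a message, given a throughput in cycles per byte.
inline double estimateSeconds(double bytes, double cyclesPerByte, double clockHz)
{
    assert(clockHz > 0.0);
    return bytes * cyclesPerByte / clockHz;
}
```

For example, `estimateSeconds(1e9, 10.0, 2e9)` reproduces the 5 seconds quoted for a 1 GB file on a 2 GHz CPU.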

BR

JPM

[1]: https://www.schneier.com/skein1.3.pdf

lasl...@gmail.com

Oct 20, 2015, 1:11:33 PM
to Crypto++ Users
Hi:

        I obviously misunderstood how the TAG works. I've been playing around with code and now it is clear to me. Thank you for your help.

       About the concept of "blocking" or "chunking": my question was more about sample code that, starting from an IV for the first block, shows how to create each following IV by incrementing the previous one.

       I'm reading the documentation and testing, and thank you very much for your help.

Thank you very much!

Jean-Pierre Münch

Oct 20, 2015, 5:11:10 PM
to cryptop...@googlegroups.com
On 20.10.2015 at 19:11, lasl...@gmail.com wrote:
       About the concept of  "blocking" or "chunking":  my question was more focused to some sample code in which, starting an IV for the first block, shows me how to create the following IV incremented by one the previous IV.
If your problem is only the change of the IV for the next chunk, there's always "IncrementCounterByOne(byte*, unsigned int)" in Crypto++. You pass in your old IV, as in IncrementCounterByOne(IV, 12), and can then feed IV directly into the next call to GCM.

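The first IV may be all-0 or some random value (AutoSeededRandomPool, AutoSeededX917RNG). Keep in mind that GCM requires every IV to be unique under a given key, so an all-0 start is only appropriate when each file has its own key.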
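
For readers without Crypto++ at hand, the effect of incrementing a 12-byte IV can be sketched with the standard library alone. This is a stand-in that treats the buffer as a big-endian counter, which is what the CTR code in Crypto++ does; the name `incrementBigEndian` is hypothetical and is not a Crypto++ function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Treats buf[0..size-1] as one big-endian integer and adds one to it,
// wrapping around to all-zero on overflow.
inline void incrementBigEndian(std::uint8_t* buf, std::size_t size)
{
    assert(buf != nullptr);
    for (std::size_t i = size; i-- > 0; ) {
        if (++buf[i] != 0)  // this byte did not wrap, so there is no carry
            break;
    }
}
```

Calling it once per chunk gives chunk n the IV "first IV + n", so the IV of any chunk can be recomputed from the header alone.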

BR

JPM

lasl...@gmail.com

Oct 21, 2015, 2:05:27 AM
to Crypto++ Users

OK, I see.
But what if I want to get the IV of the 123rd block? Do I have to do it 123 times, or is there an IncrementCounterByX(byte*, unsigned int, unsigned int) function?

Jeffrey Walton

Oct 21, 2015, 3:45:20 AM
to Crypto++ Users


On Wednesday, October 21, 2015 at 2:05:27 AM UTC-4, lasl...@gmail.com wrote:

OK, I see.
But what if I want to get the IV of the 123rd block? Do I have to do it 123 times, or is there an IncrementCounterByX(byte*, unsigned int, unsigned int) function?

You need to find the i-th block because your block cipher operates on blocks. So you need a floor function; maybe something like:

    size_t block = SaturatingSubtract(X, T::BLOCKSIZE - 1) / T::BLOCKSIZE;

Saturating arithmetic clamps at lower and upper values, so you don't have to worry about underflow or overflow. Plus, good compilers will produce fast code based on bit operations.

With the block in hand, you can add seek to it, and then add it to your counter for a decryption operation.

But for GCM mode, I don't believe it's seekable. See https://groups.google.com/d/msg/cryptopp-users/UHlnZ8r-0Gc/yctuSeSn57gJ.

Jeff

Jean-Pierre Münch

Oct 21, 2015, 8:15:21 AM
to cryptop...@googlegroups.com
On 21.10.2015 at 08:05, lasl...@gmail.com wrote:

OK, I see.
But what if I want to get the IV of the 123rd block? Do I have to do it 123 times, or is there an IncrementCounterByX(byte*, unsigned int, unsigned int) function?

The simplest solution is to just loop IncrementCounterByOne().
The more advanced solution would be to use the function CTR uses for this purpose (see below). You'd replace m_register[] with your input IV and get your output in m_counterArray[] (in practice you'd use two pointers here).

BR

JPM

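void CTR_ModePolicy::SeekToIteration(lword iterationCount)
{
    // Computes m_counterArray = m_register + iterationCount, byte by byte,
    // starting at the least significant (last) byte of the big-endian counter.
    int carry=0;
    for (int i=BlockSize()-1; i>=0; i--)
    {
        unsigned int sum = m_register[i] + byte(iterationCount) + carry;
        m_counterArray[i] = (byte) sum;
        carry = sum >> 8;
        iterationCount >>= 8;
    }
}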

lasl...@gmail.com

Oct 21, 2015, 4:50:47 PM
to Crypto++ Users
This is great!
 
  I have finally created my function based on the CTR_ModePolicy::SeekToIteration function. Here it is if anyone wants to use it:

void SeekToIteration(unsigned long long iterationCount, char * str_original_IV, int i_size_IV, char * str_new_IV )
{
    int carry=0;
    for (int i=i_size_IV-1; i>=0; i--)
    {
        unsigned int sum = str_original_IV[i] + byte(iterationCount) + carry;
        str_new_IV[i] = (byte) sum;
        carry = sum >> 8;
        iterationCount >>= 8;
    }
}

   I have a question regarding the chunk or block size into which I have to divide the whole file. Is there any recommendation about it? Is 1 KB valid, or is 4 KB better? Or 1 MB? Or is there no limit and no value that is better than another?

  Thank you very much to everyone who helped!

Jean-Pierre Münch

Oct 21, 2015, 5:10:34 PM
to cryptop...@googlegroups.com
On 21.10.2015 at 22:50, lasl...@gmail.com wrote:
void SeekToIteration(unsigned long long iterationCount, char * str_original_IV, int i_size_IV, char * str_new_IV )
{
    int carry=0;
    for (int i=i_size_IV-1; i>=0; i--)
    {
        unsigned int sum = str_original_IV[i] + byte(iterationCount) + carry;
        str_new_IV[i] = (byte) sum;
        carry = sum >> 8;
        iterationCount >>= 8;
    }
}

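I'd suggest using byte* instead of char*, and maybe size_t for the size of the IV. With char*, which may be signed, an IV byte of 0x80 or above turns negative in the sum and corrupts the carry. The rest is fine.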
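
A version with those suggestions applied could look like the sketch below. It uses `std::uint8_t` from the standard library in place of the Crypto++ `byte` type so that it compiles on its own, and the name `addToIv` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Computes out = base + count, treating both buffers as big-endian integers
// of the given size. base and out may point to the same buffer.
inline void addToIv(const std::uint8_t* base, std::size_t size,
                    std::uint64_t count, std::uint8_t* out)
{
    assert(base != nullptr && out != nullptr);
    unsigned int carry = 0;
    for (std::size_t i = size; i-- > 0; ) {
        // Unsigned bytes keep the sum in 0..511, so the carry is always 0 or 1.
        unsigned int sum = base[i] + static_cast<unsigned int>(count & 0xFF) + carry;
        out[i] = static_cast<std::uint8_t>(sum);
        carry = sum >> 8;
        count >>= 8;
    }
}
```

Calling `addToIv(firstIv, 12, 123, chunkIv)` yields the same IV as incrementing the first IV 123 times, in a single pass over the 12 bytes.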


   I have a question regarding the chunk or block size into which I have to divide the whole file. Is there any recommendation about it? Is 1 KB valid, or is 4 KB better? Or 1 MB? Or is there no limit and no value that is better than another?
I'd say there's a lower limit. You really don't want to go below the sector size of your file system (and you want to align the chunks with the sectors of your file system).

The actual choice is really up to you and the needs of your application. If you usually read chunks of a few megabytes, one tag per MB is fine. If you could basically read any tiny bit of the file, choose the sector size. And of course I can't judge how much size increase you can afford. The size increase will be smaller with larger blocks, at the cost of processing more data per access; if you only ever read large blocks, that is just fine.

From a security standpoint there's an upper bound. You don't want to use the same IV/key pair for more than 2^39-256 bits (or 2^36-32 bytes which is roughly 64GB), but I guess you won't hit that limit...
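
The bound above can be written down as a constant and checked when a chunk size is chosen. A minimal sketch; the names `kGcmMaxBytesPerIv` and `chunkFitsGcmLimit` are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Largest amount of data, in bytes, to process under one key/IV pair
// with GCM: 2^36 - 32 bytes, which equals 2^39 - 256 bits.
constexpr std::uint64_t kGcmMaxBytesPerIv = (std::uint64_t(1) << 36) - 32;

inline bool chunkFitsGcmLimit(std::uint64_t chunkBytes)
{
    assert(chunkBytes > 0);  // an empty chunk would make no sense here
    return chunkBytes <= kGcmMaxBytesPerIv;
}
```

Any sector-sized or megabyte-sized chunk is many orders of magnitude below this limit, so the check mainly guards against the single-chunk layout being used for very large files.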

BR

JPM

lasl...@gmail.com

Oct 22, 2015, 12:30:47 PM
to Crypto++ Users
Thank you very much!
I think I have all I need.