Building Authenticated Encryption from CommonCrypto

Jeffrey Goldberg

May 31, 2012, 4:40:53 PM

Apple's open source CommonCrypto library does not at present offer any
authenticated encryption modes. It does, however, provide AES in CBC
mode, but not (yet) Counter Mode. HMAC is available with SHA1, MD5,
SHA{256,384,512,224}.

What passes for documentation of CommonCrypto are the header files in

http://www.opensource.apple.com/source/CommonCrypto/CommonCrypto-55010/CommonCrypto/

For reasons too tedious to go into, I would prefer to just use
CommonCrypto instead of OpenSSL. So I am hoping that someone has already
built an Encrypt-then-MAC implementation on top of CommonCrypto.

Failing that, I'd like some pointers for what I would need to watch out
for if I wanted to develop my own. What I do know is that a
decryption/authentication failure should not leak where the failure
occurred, and so that even if the MAC verification fails, I'd need to do
the decryption anyway.

If it turns out that implementing this correctly is beyond my colleagues
and me, we will probably just have to use OpenSSL. But I'd like to know
what is involved before making that decision.

Cheers,

-j

--
Jeffrey Goldberg http://goldmark.org/jeff/
I rarely read HTML or poorly quoting posts
Reply-To address is valid

kg

May 31, 2012, 5:25:15 PM

Jeffrey Goldberg <jeffre...@goldmark.org> wrote:
>Apple's open source CommonCrypto library does not at present offer any
>authenticated encryption modes. It does, however, provide AES in CBC
>mode, but not (yet) Counter Mode. HMAC is available with SHA1, MD5,
>SHA{256,384,512,224}.
>
>What passes for documentation of CommonCrypto are the header files in
>
>http://www.opensource.apple.com/source/CommonCrypto/CommonCrypto-55010/CommonCrypto/

Seems straight-forward, but then I am clueless about implementation.

>Failing that, I'd like some pointers for what I would need to watch out
>for if I wanted to develop my own.

Straight-forward encrypt-then-mac is really straight-forward.

You should have two keys, one for AES-CBC and one for HMAC-SHAwhatever.

If you can't have two keys, you need to expand one key into two
keys. Given what's available, I'd probably just use AES-ECB with
fixed input blocks.

Once you have your keys, you encrypt first with AES-CBC.

Here you need to supply an unpredictable iv (random is good),
and probably you want to use PKCS7 padding. I couldn't see a
random source in the library, but (with the proviso that I am
clueless) I think arc4random(3) should be suitably failsafe.

Then you HMAC that ciphertext (make sure the iv is included!) and attach
the MAC tag to the ciphertext.

When you decrypt, you detach the putative MAC tag from the ciphertext,
recompute the MAC tag and compare. If different, stop. Otherwise,
decrypt using AES-CBC.
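
The tag handling described above can be sketched as follows. This is a minimal illustration in Python rather than CommonCrypto, and it assumes the AES-CBC step happens elsewhere: the functions only attach and check the tag. The names `attach_tag` and `check_and_strip_tag`, the untruncated 32-byte tag, and the choice of HMAC-SHA256 are assumptions for the example, not anything specified in the thread.

```python
import hashlib
import hmac
from typing import Tuple

TAG_LEN = 32  # full HMAC-SHA256 output; an assumption for this sketch
IV_LEN = 16   # AES block size


def attach_tag(mac_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Return iv || ciphertext || HMAC-SHA256(mac_key, iv || ciphertext)."""
    body = iv + ciphertext
    tag = hmac.new(mac_key, body, hashlib.sha256).digest()
    return body + tag


def check_and_strip_tag(mac_key: bytes, message: bytes) -> Tuple[bytes, bytes]:
    """Verify the tag and return (iv, ciphertext), or raise ValueError.

    Nothing is decrypted here; the caller runs AES-CBC only on success.
    """
    if len(message) < IV_LEN + TAG_LEN:
        raise ValueError("authentication failed")
    body, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("authentication failed")
    return body[:IV_LEN], body[IV_LEN:]
```

Because the IV is inside the MACed body, changing it invalidates the tag just like changing the ciphertext would.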

If decryption fails, send a defect report by e-mail directly to
the developers, because they have f***ed up.

> What I do know is that a
>decryption/authentication failure should not leak where the failure
>occurred, and so that even if the MAC verification fails, I'd need to do
>the decryption anyway.

In straight encrypt-then-mac, I don't see how this can be a problem.
If the MAC is decent, it should be hard for an adversary to produce a
ciphertext that will pass the MAC check and still fail decryption.

--
kg

Jeffrey Goldberg

May 31, 2012, 6:46:27 PM

On 12-05-31 4:25 PM, kg wrote:

> Straight-forward encrypt-then-mac is really straight-forward.

Great. That is what I was hoping to hear.

> You should have two keys, one for AES-CBC and one for HMAC-SHAwhatever.

We know that. We've got that covered. We will just have a 64 byte key,
the first 32 will be for AES CBC (256bits) and the remainder for the
HMAC. (Yes, I know that 256-bit keys are overkill, but we'd rather do it
this way than having to answer a zillion questions about why we are only
using 128-bit keys).
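
As a small illustration of that key layout, a Python sketch might look like the following. The function name `split_key` and the use of `os.urandom` as the key source are assumptions for the example; the thread only says that the first 32 bytes go to AES-CBC and the remainder to the HMAC.

```python
import os
from typing import Tuple

AES_KEY_LEN = 32  # AES-256
MAC_KEY_LEN = 32  # the remaining bytes, used as the HMAC key


def split_key(combined: bytes) -> Tuple[bytes, bytes]:
    """Split a 64-byte key into (encryption key, MAC key)."""
    if len(combined) != AES_KEY_LEN + MAC_KEY_LEN:
        raise ValueError("expected a 64-byte combined key")
    return combined[:AES_KEY_LEN], combined[AES_KEY_LEN:]


combined = os.urandom(64)  # OS CSPRNG, standing in for the real key source
k_enc, k_mac = split_key(combined)
```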

> Once you have your keys, you encrypt first with AES-CBC.
>
> Here you need to supply an unpredictable iv (random is good),
> and probably you want to use PKCS7 padding.

And if I understand correctly, by using authenticated encryption we are
defending against attacks that use the padding. Although integrity can
be a concern for our application, the push to move to AE is so that we
don't have to worry about things like those padding or other CCA attacks.

> I couldn't see a random source in the library,

It's there somewhere. We have cryptographically appropriate random numbers.

> Then you HMAC that ciphertext (make sure the iv is included!) and attach
> the MAC tag to the ciphertext.

Yep.

> When you decrypt, you detach the putative MAC tag from the ciphertext,
> recompute the MAC tag and compare. If different, stop. Otherwise,
> decrypt using AES-CBC.

> If decryption fails, send a defect report by e-mail directly to
> the developers, because they have f***ed up.

Well, I was thinking in terms of a padding error as a decryption failure.

>> What I do know is that a
>> decryption/authentication failure should not leak where the failure
>> occurred, and so that even if the MAC verification fails, I'd need to do
>> the decryption anyway.
>
> In straight encrypt-then-mac, I don't see how this can be a problem.

You are right. I was thinking of the various tricks needed if I were to
try some MAC-then-Encrypt scheme. Those aren't needed for Encrypt-then-MAC.

Again, thanks!

kg

Jun 1, 2012, 3:58:57 AM

Jeffrey Goldberg <jeffre...@goldmark.org> wrote:
>Well, I was thinking in terms of a padding error as a decryption failure.

There have been real attacks exploiting padding schemes. But if you use
encrypt-then-mac, a padding error in a ciphertext that passed the mac
verification really points to implementation mistakes, not attacks.

--
kg

rossum

Jun 1, 2012, 8:13:59 AM

On Thu, 31 May 2012 15:40:53 -0500, Jeffrey Goldberg
<nob...@goldmark.org> wrote:

>It does, however, provide AES in CBC mode, but not (yet) Counter Mode.

It is simple enough to build Counter Mode if you have access to ECB
mode. All the other modes are built round ECB at some point.
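
A rough Python sketch of that construction: counter mode only ever calls the block cipher in the forward direction on successive counter blocks and XORs the result into the data. The 8-byte nonce plus 8-byte big-endian counter layout is an assumption for the example, and `toy_block` is a hash-based stand-in used only so the sketch runs without an AES library; in practice `encrypt_block` would be single-block AES-ECB encryption.

```python
import hashlib
from typing import Callable

BLOCK = 16  # AES block size in bytes


def ctr_xor(encrypt_block: Callable[[bytes], bytes], nonce: bytes, data: bytes) -> bytes:
    """XOR data with the keystream E(nonce || counter), counter = 0, 1, 2, ..."""
    if len(nonce) != 8:
        raise ValueError("this sketch uses an 8-byte nonce and an 8-byte counter")
    out = bytearray()
    for offset in range(0, len(data), BLOCK):
        counter_block = nonce + (offset // BLOCK).to_bytes(8, "big")
        keystream = encrypt_block(counter_block)
        chunk = data[offset:offset + BLOCK]
        out.extend(a ^ b for a, b in zip(chunk, keystream))
    return bytes(out)


def toy_block(key: bytes) -> Callable[[bytes], bytes]:
    """Stand-in for one-block AES-ECB encryption. NOT a real cipher."""
    return lambda block: hashlib.sha256(key + block).digest()[:BLOCK]
```

Encryption and decryption are the same call, and a nonce must never be reused under the same key.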

rossum

Jeffrey Goldberg

Jun 8, 2012, 11:14:31 AM

Thanks. I understand that.

I think that you may have read too much into what I meant when I talked
about what to do "when decryption fails". I am talking about
implementation errors, but they can be my own (for example, expecting a
different padding scheme than the one used to encrypt the data). I'm
just talking about cautious programming and not something that is more
directly part of the security of the scheme.

Cheers,

kg

Jun 8, 2012, 11:21:33 AM

Jeffrey Goldberg <jeffre...@goldmark.org> wrote:
>I am talking about
>implementation errors, but they can be my own (for example, expecting a
>different padding scheme than the one used to encrypt the data). I'm
>just talking about cautious programming and not something that is more
>directly part of the security of the scheme.

Such mistakes could give the adversary unexpected oracles.

Sometimes, you can design defensively by including information about the
cryptosystem in use when doing key derivation (this is fairly cheap). If
you mess up and encrypt with one system (say CBC + HMAC) and decrypt
with something else (CTR + HMAC), the keys you use won't match and the
ciphertext will be rejected.

I'm not saying this makes sense, but ...

--
kg

Jeffrey Goldberg

Jun 8, 2012, 12:46:50 PM

On 12-06-08 10:21 AM, kg wrote:
> Jeffrey Goldberg <jeffre...@goldmark.org> wrote:
>> I am talking about
>> implementation errors, but they can be my own (for example, expecting a
>> different padding scheme than the one used to encrypt the data). I'm
>> just talking about cautious programming and not something that is more
>> directly part of the security of the scheme.
>
> Such mistakes could give the adversary unexpected oracles.

Yep. That's why I said "directly". I do want to avoid such mistakes.

>

> Sometimes, you can design defensively by including information about the
> cryptosystem in use when doing key derivation (this is fairly cheap). If
> you mess up and encrypt with one system (say CBC + HMAC) and decrypt
> with something else (CTR + HMAC), the keys you use won't match and the
> ciphertext will be rejected.

That's interesting. But I'd like to keep things nice and modular. I'm
not sure that I want to build something like that into key derivation.

At the moment I'm trying to work out a format that is roughly

header || ciphertext || authenticated-plaintext || tag

The header will contain

fixed-label || version || length-of-cdata || length-of-adata ||
length-of-tag

The ciphertext includes its IV.

Each distinct version will have specifications of algorithms, modes, key
sizes, etc for the cdata and MAC. The version will just be a number, but
it will allow us to roll out modifications and alternatives as needed.

the tag will be the (truncated) MAC(K_m, cdata || adata).

Given the libraries we are more or less stuck with (along with a public
commitment we made to move to 256-bit AES keys), the details for the
version look like

Encryption/Decryption

alg: AES
block: 16 bytes
mode: CBC
padding: PKCS7
keysize: 256 bits

MAC

HMAC-SHA256
K_e will be some 256-bit hash of a 128-bit random key (I need to
research this more)
Truncated to 16 bytes

These will all be stored in a MySQL database, indexed with a UUID (an
arbitrary unique ID). The adata may contain a copy of that UUID.

The reason that I'm regretting the 256 bit AES keys is that we have a
unique random key for each item which is encrypted with a master key.

Peter Fairbrother

Jun 8, 2012, 10:03:23 PM

Jeffrey Goldberg wrote:

>
> At the moment I'm trying to work out a format that is roughly
>
> header || ciphertext || authenticated-plaintext || tag
>
> The header will contain
>
> fixed-label || version || length-of-cdata || length-of-adata ||
> length-of-tag

Does every message contain both ciphertext and authenticated-plaintext?
Is there any relationship between them?

> The ciphertext includes its IV.
>
> Each distinct version will have specifications of algorithms, modes, key
> sizes, etc for the cdata and MAC. The version will just be a number, but
> it will allow us to roll out modifications and alternatives as needed.

You may find it's better to include the actual cipher suite in use here,
rather than just a version number - how will an older version know what
suite a newer version uses?

> the tag will be the (truncated) MAC(K_m, cdata || adata).

Should the header be in there too?

> Given the libraries we are more or less stuck with (along with a public
> commitment we made to move to 256-bit AES keys),

Personally, I don't like 256-bit-key 128-bit-block AES. The key schedule
sucks so bad it's probably weaker than 128-bit-key AES.

Rijndael 256/256 on the other hand ..

-- Peter Fairbrother

Jeffrey Goldberg

Jun 8, 2012, 11:44:59 PM

On 12-06-08 9:03 PM, Peter Fairbrother wrote:

>> At the moment I'm trying to work out a format that is roughly
>>
>> header || ciphertext || authenticated-plaintext || tag
>>
>> The header will contain
>>
>> fixed-label || version || length-of-cdata || length-of-adata ||
>> length-of-tag
>
> Does every message contain both ciphertext and authenticated-plaintext?
> Is there any relationship between them?

I don't actually see an immediate need for the adata, but I thought it
might be good to design this in the most general form. But it recently
occurred to me that I could use it for a copy of the unique item id
(UUID) that we will be using as the index for these in the database (and
elsewhere).

>> Each distinct version will have specifications of algorithms, modes, key
>> sizes, etc for the cdata and MAC. The version will just be a number, but
>> it will allow us to roll out modifications and alternatives as needed.

> You may find it's better to include the actual cipher suite in use here,
> rather than just a version number - how will an older version know what
> suite a newer version uses?

Internally we have some back and forth about exactly this. One school of
thought is that older versions will simply have to refuse to process
those records when that happens. Another school of thought is to make
the "version number" a bit field that actually encodes these sorts of
choices.

>> the tag will be the (truncated) MAC(K_m, cdata || adata).
>
> Should the header be in there too?

I was wondering that. But there are two reasons that left me inclined to
not do it that way.

(1) Because the header includes information about the length of the tag
and, via the version number, the MAC parameters, I think that it would
require a lot of extra care to do that correctly.
(2) I cribbed much of the design from CCM.

However, now that you mention it, because the header contains
information about how the data are to be decrypted and verified, the
"version" information does need to be authenticated. I think that we
will just have to repeat that in the adata.

So thank you for asking this question. I realize now that CCM can get
away with not authenticating its header because nothing in the header
could change the choice of decryption or verification mechanism.

>> Given the libraries we are more or less stuck with (along with a public
>> commitment we made to move to 256-bit AES keys),
>
> Personally, I don't like 256-bit-key 128-bit-block AES. The key schedule
> sucks so bad it's probably weaker than 128-bit-key AES.

Given the libraries we are more or less stuck with ...

Though this is why I want to build in flexibility with the "version".

Anyway, thanks! Particularly for asking me to be explicit about why I
wasn't authenticating the header. Only when I started to answer that did
I see my error.

Jeffrey Goldberg

Jun 9, 2012, 2:46:08 AM

On 12-06-08 10:44 PM, Jeffrey Goldberg wrote:
> On 12-06-08 9:03 PM, Peter Fairbrother wrote:

>> Should the header be in there too?

> I was wondering that. [...]

> However, now that you mention it, because the header contains
> information about how the data are to be decrypted and verified, the
> "version" information does need to be authenticated. I think that we
> will just have to repeat that in the adata.

That won't work either. I do need an integrity check on the header.

Even if I were to just put the version information in the adata, an
attacker could still manipulate the header specification of the lengths
of the parts to force some bits of the adata to be treated during
decryption as being part of the cdata.

Although I don't see how that could be exploited in practice, it still
breaks the point of authenticated encryption.
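
The boundary problem can be shown in a few lines of Python. This is only an illustration with made-up byte strings: if the tag covers `cdata || adata` but not the lengths, an attacker who edits only the lengths gets a different split that still verifies. Putting the lengths under the MAC is one possible fix, shown here as an assumption for the example rather than as the design settled on in the thread.

```python
import hashlib
import hmac


def tag_without_lengths(mac_key: bytes, cdata: bytes, adata: bytes) -> bytes:
    """MAC(K_m, cdata || adata): the split point is not authenticated."""
    return hmac.new(mac_key, cdata + adata, hashlib.sha256).digest()


def tag_with_lengths(mac_key: bytes, cdata: bytes, adata: bytes) -> bytes:
    """Same MAC, but with both lengths (4 bytes each, big-endian) included."""
    lengths = len(cdata).to_bytes(4, "big") + len(adata).to_bytes(4, "big")
    return hmac.new(mac_key, lengths + cdata + adata, hashlib.sha256).digest()


mac_key = b"\x01" * 32
cdata, adata = b"C" * 32, b"A" * 32

# The attacker rewrites only the header lengths, so 16 bytes of adata
# are now treated as ciphertext. The concatenation is unchanged.
forged_cdata, forged_adata = cdata + adata[:16], adata[16:]

forgery_accepted_without_lengths = hmac.compare_digest(
    tag_without_lengths(mac_key, cdata, adata),
    tag_without_lengths(mac_key, forged_cdata, forged_adata),
)
forgery_accepted_with_lengths = hmac.compare_digest(
    tag_with_lengths(mac_key, cdata, adata),
    tag_with_lengths(mac_key, forged_cdata, forged_adata),
)
```

Here `forgery_accepted_without_lengths` is true because both calls MAC the same byte string, while `forgery_accepted_with_lengths` is false.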

kg

Jun 9, 2012, 10:13:02 AM

Jeffrey Goldberg <jeffre...@goldmark.org> wrote:
>Even if I were to just put the version information in the adata, an
>attacker could still manipulate the header specification of the lengths
>of the parts to force some bits of the adata to be treated during
>decryption as being part of the cdata.
>

>Although I don't see how that could be exploited in practice, [...]

If the adversary can control parts of the adata, you could end up giving
him a decryption oracle, especially if you are using CBC mode. (With
your layout of data, it is slightly more difficult with CTR mode. A
different layout, though, and you could also attack CTR mode.)

--
kg

Jeffrey Goldberg

Jun 9, 2012, 12:48:55 PM

On 12-06-09 9:13 AM, kg wrote:
> Jeffrey Goldberg <jeffre...@goldmark.org> wrote:
>> Even if I were to just put the version information in the adata, an
>> attacker could still manipulate the header specification of the lengths
>> of the parts to force some bits of the adata to be treated during
>> decryption as being part of the cdata.
>>
>> Although I don't see how that could be exploited in practice, [...]
>
> If the adversary can control parts of the adata,

The adata is included in the integrity check...But yes. I get it. If the
adversary can get me to include things of their choosing in the adata,
that would be a way in.

This is why the expression "I don't see how this could be exploited in
practice" is behind so many security blunders. Just because *I* can't
see it hardly means that others don't have better vision.

This is why I'm trying to rely on the security proofs from Bellare and
Namprempre, 2000. Full 2007 paper here:
http://cseweb.ucsd.edu/~mihir/papers/oem.html

If the adversary can get me to decrypt stuff beyond the intended cdata,
then I no longer have INT-CTXT.

For someone like me with no real training in this stuff, it took time
for me to work through the theorems (and there are still some bits that
I don't quite follow), but it's really cool.

kg

Jun 9, 2012, 3:39:50 PM

Jeffrey Goldberg <jeffre...@goldmark.org> wrote:
>On 12-06-08 10:21 AM, kg wrote:
>> Jeffrey Goldberg <jeffre...@goldmark.org> wrote:
>>> I am talking about
>>> implementation errors, but they can be my own (for example, expecting a
>>> different padding scheme than the one used to encrypt the data). I'm
>>> just talking about cautious programming and not something that is more
>>> directly part of the security of the scheme.
>>
>> Such mistakes could give the adversary unexpected oracles.
>
>Yep. That's why I said "directly". I do want to avoid such mistakes.
>>
>
>> Sometimes, you can design defensively by including information about the
>> cryptosystem in use when doing key derivation (this is fairly cheap). If
>> you mess up and encrypt with one system (say CBC + HMAC) and decrypt
>> with something else (CTR + HMAC), the keys you use won't match and the
>> ciphertext will be rejected.
>
>That's interesting. But I'd like to keep things nice and modular. I'm
>not sure that I want to build something like that into key derivation.
>
>At the moment I'm trying to work out a format that is roughly
>
> header || ciphertext || authenticated-plaintext || tag

Ah. Encrypt-then-MAC is straight-forward. Adding authenticated-only data
is often tricky, as you have found out.

Basically, to encrypt authenticated-only data adata and confidential
data cdata with encryption key k1 and MAC key k2, it should be safe to
do something like:

c0 = Enc(k1, cdata)
tag = MAC(k2, length(adata) || adata || cdata)
c = length(adata) || adata || c0 || tag

The MAC tag ensures that the decryptor knows the string

length(adata) || adata || c0

and it is exactly as constructed. A sane encoding of length(adata) allows
the decryptor to correctly split the string into adata and c0 pieces,
after which c0 can safely be decrypted.
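
A sketch of that layout in Python, assuming the tag is computed over the string `length(adata) || adata || c0` that the decryptor receives, and that `c0` (IV included) has already been produced by the encryption step. The 4-byte big-endian length encoding, the untruncated HMAC-SHA256 tag, and the names `seal` and `open_sealed` are assumptions for the example.

```python
import hashlib
import hmac
import struct
from typing import Tuple

TAG_LEN = 32  # untruncated HMAC-SHA256; an assumption for this sketch


def seal(mac_key: bytes, adata: bytes, c0: bytes) -> bytes:
    """Build length(adata) || adata || c0 || tag; the tag covers all that precedes it."""
    body = struct.pack(">I", len(adata)) + adata + c0
    return body + hmac.new(mac_key, body, hashlib.sha256).digest()


def open_sealed(mac_key: bytes, message: bytes) -> Tuple[bytes, bytes]:
    """Verify first, then split into (adata, c0). Raises ValueError on failure."""
    if len(message) < 4 + TAG_LEN:
        raise ValueError("authentication failed")
    body, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    (adata_len,) = struct.unpack(">I", body[:4])
    if adata_len > len(body) - 4:
        raise ValueError("malformed message")
    return body[4:4 + adata_len], body[4 + adata_len:]
```

The length field is parsed only after the tag has been checked, so the split point cannot be moved without failing verification.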

>The header will contain
>
> fixed-label || version || length-of-cdata || length-of-adata ||
>length-of-tag

I'd put version information etc. into adata. If you cannot guarantee that
your keys are only used properly, I'd adapt key generation. I've tried a
few tricks such as putting version information into the tag generation,
but I can always find ways to screw up.

E.g.: One could replace MAC by

MAC'(k2, txt) = MAC(k2, version || txt) ,

but if one version is a prefix of another version, it is possible to
mess up.

Using

MAC'(k2, txt) = MAC(KDF(k2, version), txt)

seems more difficult to break for most reasonable KDF functions.
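
A small Python demonstration of the prefix problem and of the key-derivation variant. The version strings and texts are invented for the example, and the KDF used here (HMAC-SHA256 keyed with k2 over a labelled version string) is an assumption, standing in for whatever KDF would really be chosen.

```python
import hashlib
import hmac


def mac_version_prefix(k2: bytes, version: bytes, txt: bytes) -> bytes:
    """MAC'(k2, txt) = MAC(k2, version || txt)."""
    return hmac.new(k2, version + txt, hashlib.sha256).digest()


def mac_version_kdf(k2: bytes, version: bytes, txt: bytes) -> bytes:
    """MAC'(k2, txt) = MAC(KDF(k2, version), txt), with an HMAC-based KDF."""
    derived = hmac.new(k2, b"mac-key-for-version:" + version, hashlib.sha256).digest()
    return hmac.new(derived, txt, hashlib.sha256).digest()


k2 = b"\x02" * 32

# Version "1" with text "0abc" and version "10" with text "abc"
# both feed the bytes "10abc" to the MAC, so the tags are identical.
prefix_tags_collide = (
    mac_version_prefix(k2, b"1", b"0abc") == mac_version_prefix(k2, b"10", b"abc")
)

# With a per-version key the two versions use different MAC keys.
kdf_tags_collide = (
    mac_version_kdf(k2, b"1", b"0abc") == mac_version_kdf(k2, b"10", b"abc")
)
```

A fixed-width version field would also remove this particular collision.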

>The reason that I'm regretting the 256 bit AES keys is that we have a
>unique random key for each item which is encrypted with a master key.

This is an engineering question (which I am not good at, presumably you
are): You have a database of ciphertexts ci = Enc(master-key, ki). Would
it work if you had a database of salts si and defined ki = KDF(master-key,
salt || other-data)? (If your ciphertexts are integrity-protected,
so should the salts be.)

From a security point of view, these two solutions seem more or less
equivalent.
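
For illustration only, the salt-based alternative could look like this in Python. The KDF is assumed to be a single HMAC-SHA256 call keyed with the master key, and the 2-byte length prefix on the salt is an addition made here so that `salt || other-data` parses unambiguously; neither detail comes from the thread.

```python
import hashlib
import hmac
import os


def derive_item_key(master_key: bytes, salt: bytes, other_data: bytes = b"") -> bytes:
    """k_i = HMAC-SHA256(master_key, len(salt) || salt || other_data)."""
    if len(salt) > 0xFFFF:
        raise ValueError("salt too long for a 2-byte length prefix")
    message = len(salt).to_bytes(2, "big") + salt + other_data
    return hmac.new(master_key, message, hashlib.sha256).digest()


master_key = os.urandom(32)
salt = os.urandom(16)  # stored in the database in place of Enc(master-key, k_i)
item_key = derive_item_key(master_key, salt, b"item-uuid")
```

The database then holds a 16-byte salt per item instead of an encrypted 256-bit key, and the item key is recomputed on demand.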

--
kg

kg

Jun 9, 2012, 3:46:04 PM

Jeffrey Goldberg <jeffre...@goldmark.org> wrote:
>This is why I'm trying to rely on the security proofs from Bellare and
>Namprempre, 2000. Full 2007 paper here:
>http://cseweb.ucsd.edu/~mihir/papers/oem.html

Note that this paper only analyzes authenticated encryption, not
authenticated encryption with associated data, which is what you want.

It is a nice paper, though.

--
kg

Jeffrey Goldberg

Jun 12, 2012, 5:54:46 PM

On 12-06-09 2:39 PM, kg wrote:

> Ah. Encrypt-then-MAC is straight-forward. Adding authenticated-only data
> is often tricky, as you have found out.

Yep.

> Basically, to encrypt authenticated-only data adata and confidential
> data cdata with encryption key k1 and MAC key k2, it should be safe to
> do something like:
>
> c0 = Enc(k1, cdata)
> tag = MAC(k2, length(adata) || adata || cdata)

I think you meant

tag = MAC(k2, length(adata) || adata || c0)

> c = length(adata) || adata || c0 || tag
>
> The MAC tag ensures that the decryptor knows the string
>
> length(adata) || adata || c0

> I'd put version information etc. into adata. If you cannot guarantee that
> your keys are only used properly, I'd adapt key generation.

This is the conclusion that I've come to. In particular if the version
information is malleable and can be used to determine how the data is to
be parsed and verified, then I lose all of the nice security results.

Basically the version must be included in the adata and the integrity
check must not depend on the version. (Or if the integrity check does
depend on the version, then there must never be a pair of versions that
could ever "accept" the same data.)

> I've tried a
> few tricks such as putting version information into the tag generation,
> but I can always find ways to screw up.
>
> E.g.: One could replace MAC by
>
> MAC'(k2, txt) = MAC(k2, version || txt) ,
>
> but if one version is a prefix of another version, it is possible to
> mess up.
>
> Using
>
> MAC'(k2, txt) = MAC(KDF(k2, version), txt)

I think it is simpler to just include the header in the adata, and just
make sure that the interpretation of the header never varies with
different versions, and also make sure that the specification of the MAC
verification is independent of the version (or anything else in the header).
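
One way to picture that rule is the Python sketch below: the record is verified with a single fixed MAC that covers the version byte, and the version is read only after verification succeeds. The one-byte version field, the `SUPPORTED` set, and the function names are hypothetical, chosen only to keep the example short.

```python
import hashlib
import hmac
from typing import Tuple

TAG_LEN = 32     # untruncated HMAC-SHA256, the same for every version
SUPPORTED = {1}  # hypothetical set of versions this build understands


def make_record(mac_key: bytes, version: int, payload: bytes) -> bytes:
    """version (1 byte) || payload || tag, with the tag covering the version byte."""
    signed = bytes([version]) + payload
    return signed + hmac.new(mac_key, signed, hashlib.sha256).digest()


def open_record(mac_key: bytes, record: bytes) -> Tuple[int, bytes]:
    """Verify with the one fixed MAC, and only then look at the version."""
    if len(record) < 1 + TAG_LEN:
        raise ValueError("authentication failed")
    signed, tag = record[:-TAG_LEN], record[-TAG_LEN:]
    expected = hmac.new(mac_key, signed, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    version = signed[0]
    if version not in SUPPORTED:
        raise ValueError("unsupported version")
    return version, signed[1:]
```

Nothing in the record tells the verifier how to verify it, so a tampered version byte is rejected before it can influence parsing or decryption.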

My attempts to build in flexibility were the real problem. If something
says "this is how you verify me" then we are asking for trouble.

Thank you and everyone else for your terrific help in getting me to
think through this process.