Keyczar version 2 design

Shawn Willden

Jul 19, 2013, 3:39:58 PM
to keyczar...@googlegroups.com
There are a number of problems with the current Keyczar message formats, including:
  • Reliance on a single, fixed hash algorithm... exacerbated by the fact that the hash algorithm in question (SHA1) is deprecated.
  • Excessive message size, caused by:
    • Use of full-length hash values (20 byte) in HMACs
    • Use of block-length IVs for CBC mode encryption
    • Use of text encoding
  • Security weakness in signed session encryption, since elements are signed after encryption, which enables an attacker to strip the signatures and sign the elements himself (origin spoofing).
  • Inflexibility in padding and mode selection. These are tied to the key type, which has led Keymaster (the Google-internal library on which Keyczar was based, which has the same problem) to proliferate key types in a combinatorial explosion.
I'm sure there are others that I'm missing, some because I haven't noticed them, and others because I've forgotten them.

I'd like to open the floor for suggestions and comments on what we should do to address all of these. Some of them will require new message formats, which points out another weakness, which is that Keyczar's message formats are rather ad-hoc, tied to the key type, and not at all self-describing.

On that latter point, in some discussions I've had with other Googlers, we're in agreement that we'd really like to have a single, structured message format that is adaptable to whatever we need to put into it, but doesn't presume that the sender must already know exactly what it's receiving. Two main candidates have emerged:
  1. ASN.1 BER (or perhaps PER). The advantages are that ASN.1 is standardized, very widely-used, has good libraries in basically all languages and has both binary and textual representations (the text representation is XER, XML Encoding Rules). The disadvantage is that ASN.1 is complex and hard to work with.
  2. Protobuf. The advantages are that protobuf has good libraries in all common languages, both binary and textual representations (the text representation is basically JSON) and it's pretty simple to work with; easy to extend and modify over time, etc. The disadvantage is that it's not a standard.
The pros and cons above are my perceptions, of course. Others may disagree and I'd love to hear alternative views. On balance, I'm leaning towards protobuf.

Another point of discussion is around key formats. I like JSON for key data, but some of my colleagues see value in using the same basic infrastructure for both message and key formats, and I have some sympathy with that point of view. If we were to use ASN.1 BER or protobuf for keys, we could still use the associated text representation as the on-disk format.

Comments?

Thanks.

--
Shawn Willden | Software Engineer | swil...@google.com | 720-924-6645

Jay Tuley

Jul 20, 2013, 1:39:41 PM
to keyczar...@googlegroups.com
On Fri, Jul 19, 2013 at 2:39 PM, Shawn Willden <swil...@google.com> wrote:
There are a number of problems with the current Keyczar message formats, including:
  • Reliance on a single, fixed hash algorithm... exacerbated by the fact that the hash algorithm in question (SHA1) is deprecated.
  • Excessive message size, caused by:
    • Use of full-length hash values (20 byte) in HMACs
    • Use of block-length IVs for CBC mode encryption
    • Use of text encoding
In regards to message size vs conservative security, it should be easier for the end user to do the conservative security.
  • Security weakness in signed session encryption, since elements are signed after encryption, which enables an attacker to strip the signatures and sign the elements himself (origin spoofing).
If an attacker signed it himself, the recipient's pre-exchanged (outside of keyczar) public-key keyset wouldn't validate that signature. What am I missing?
  • Inflexibility in padding and mode selection. These are tied to the key type, which has led Keymaster (the Google-internal library on which Keyczar was based, which has the same problem) to proliferate key types in a combinatorial explosion.
I think there is still a happy medium between designing a new set of flexible KeyTypes and, when the need arises, making new KeyTypes to keep clear, 100% backwards compatibility. And on that note, the other design change that is strongly needed is that Keysets need to be task-oriented rather than keytype-oriented, so that old (or just different-algorithm) KeyTypes can be rotated out.
 
I'm sure there are others that I'm missing, some because I haven't noticed them, and others because I've forgotten them.

I'd like to open the floor for suggestions and comments on what we should do to address all of these. Some of them will require new message formats, which points out another weakness, which is that Keyczar's message formats are rather ad-hoc, tied to the key type, and not at all self-describing.

On that latter point, in some discussions I've had with other Googlers, we're in agreement that we'd really like to have a single, structured message format that is adaptable to whatever we need to put into it, but doesn't presume that the sender must already know exactly what it's receiving. Two main candidates have emerged:
  1. ASN.1 BER (or perhaps PER). The advantages are that ASN.1 is standardized, very widely-used, has good libraries in basically all languages and has both binary and textual representations (the text representation is XER, XML Encoding Rules). The disadvantage is that ASN.1 is complex and hard to work with.
  2. Protobuf. The advantages are that protobuf has good libraries in all common languages, both binary and textual representations (the text representation is basically JSON) and it's pretty simple to work with; easy to extend and modify over time, etc. The disadvantage is that it's not a standard.
The pros and cons above are my perceptions, of course. Others may disagree and I'd love to hear alternative views. On balance, I'm leaning towards protobuf.

I think protobufs would make implementing the payload format correctly easier, which is a very strong plus. If you count Python 2 and 3 separately, there are 6 implementations of Keyczar that work together now (pending patch reviews), maybe 7 if Evan releases his JS one; being easy to implement right and interoperable is important.

Ultimately, when we talk about self-describing, we mean meta-formats; the keyczar message formats are still going to be ad hoc, it's just that the payload would be encoded with some light structure. Ubiquity of the encoding is really more important than standardization, since it's not as if we would be using standard message formats that interoperate with things that know nothing about keyczar, and Protobuf has been ported to enough languages (several times in many cases, too) that I don't think ubiquity is an issue with it.
 

Another point of discussion is around key formats. I like JSON for key data, but some of my colleagues see value in using the same basic infrastructure for both message and key formats, and I have some sympathy with that point of view. If we were to use ASN.1 BER or protobuf for keys, we could still use the associated text representation as the on-disk format.

I like having a human-readable/editable representation for the key files. Whether it's JSON, text proto, or even XER (XML, gross), so be it; the differences are small details. The more it can be easily modified by hand or with standard tools, the better. Being easy to massage random key data gives unsupported flexibility!
 

Comments?

Thanks.

--
Shawn Willden | Software Engineer | swil...@google.com | 720-924-6645


Shawn Willden

Jul 22, 2013, 8:38:49 AM
to keyczar...@googlegroups.com
On Sat, Jul 20, 2013 at 11:39 AM, Jay Tuley <j...@tuley.name> wrote:
On Fri, Jul 19, 2013 at 2:39 PM, Shawn Willden <swil...@google.com> wrote:
There are a number of problems with the current Keyczar message formats, including:
  • Reliance on a single, fixed hash algorithm... exacerbated by the fact that the hash algorithm in question (SHA1) is deprecated.
  • Excessive message size, caused by:
    • Use of full-length hash values (20 byte) in HMACs
    • Use of block-length IVs for CBC mode encryption
    • Use of text encoding
In regards to message size vs conservative security, it should be easier for the end user to do the conservative security.

I'm not sure what you mean here, can you elaborate? 
  • Security weakness in signed session encryption, since elements are signed after encryption, which enables an attacker to strip the signatures and sign the elements himself (origin spoofing).
If an attacker signed it himself, the recipient's pre-exchanged (outside of keyczar) public-key keyset wouldn't validate that signature. What am I missing?

If the recipient regularly received messages from both the attacker and the actual origin of the message, he'd have keys for both. It's not an attack that's generally useful, but there are some circumstances in which it could be damaging. It's the reason most systems sign before encrypting rather than after. With sign-then-encrypt, there's a similar but even less useful attack where the legitimate recipient can decrypt and re-encrypt to a different target, then pass the message along, making the new recipient think that he received it from the origin.

A v2 protocol should address both of these by signing before encrypting, and by including the encryption key ID in the signed payload.
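Roughly, as a sketch of that ordering (Python only, not a concrete v2 proposal; an HMAC stands in for the sender's signature, AES-GCM from the `cryptography` package stands in for the message encryption, and the framing is made up):

    from cryptography.hazmat.primitives.ciphers.aead import AESGCM
    import hashlib, hmac, os, struct

    def sign_then_encrypt(signing_key, enc_key, enc_key_id, plaintext):
        # Bind the recipient's encryption key ID into the signed payload, so a
        # decrypt-and-re-encrypt to a different target is detectable.
        payload = struct.pack(">I", len(enc_key_id)) + enc_key_id + plaintext
        sig = hmac.new(signing_key, payload, hashlib.sha256).digest()
        nonce = os.urandom(12)
        return nonce + AESGCM(enc_key).encrypt(nonce, sig + payload, None)

The receiver would decrypt, verify the signature, and then check that the key ID inside the signed payload matches the key it actually used to decrypt.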
  • Inflexibility in padding and mode selection. These are tied to the key type, which has led Keymaster (the Google-internal library on which Keyczar was based, which has the same problem) to proliferate key types in a combinatorial explosion.
I think there is still a happy medium between designing a new set of flexible KeyTypes and, when the need arises, making new KeyTypes to keep clear, 100% backwards compatibility. And on that note, the other design change that is strongly needed is that Keysets need to be task-oriented rather than keytype-oriented, so that old (or just different-algorithm) KeyTypes can be rotated out.

Agreed. Purpose-oriented keysets, with more flexible keys, are the way to go... and I also agree that we need to maintain backward compatibility.
I'm sure there are others that I'm missing, some because I haven't noticed them, and others because I've forgotten them.

I'd like to open the floor for suggestions and comments on what we should do to address all of these. Some of them will require new message formats, which points out another weakness, which is that Keyczar's message formats are rather ad-hoc, tied to the key type, and not at all self-describing.

On that latter point, in some discussions I've had with other Googlers, we're in agreement that we'd really like to have a single, structured message format that is adaptable to whatever we need to put into it, but doesn't presume that the sender must already know exactly what it's receiving. Two main candidates have emerged:
  1. ASN.1 BER (or perhaps PER). The advantages are that ASN.1 is standardized, very widely-used, has good libraries in basically all languages and has both binary and textual representations (the text representation is XER, XML Encoding Rules). The disadvantage is that ASN.1 is complex and hard to work with.
  2. Protobuf. The advantages are that protobuf has good libraries in all common languages, both binary and textual representations (the text representation is basically JSON) and it's pretty simple to work with; easy to extend and modify over time, etc. The disadvantage is that it's not a standard.
The pros and cons above are my perceptions, of course. Others may disagree and I'd love to hear alternative views. On balance, I'm leaning towards protobuf.

I think protobufs would make implementing the payload format correctly easier, which is a very strong plus. If you count Python 2 and 3 separately, there are 6 implementations of Keyczar that work together now (pending patch reviews), maybe 7 if Evan releases his JS one; being easy to implement right and interoperable is important.

Ultimately, when we talk about self-describing, we mean meta-formats; the keyczar message formats are still going to be ad hoc, it's just that the payload would be encoded with some light structure. Ubiquity of the encoding is really more important than standardization, since it's not as if we would be using standard message formats that interoperate with things that know nothing about keyczar, and Protobuf has been ported to enough languages (several times in many cases, too) that I don't think ubiquity is an issue with it.

This is my thinking as well.
 
Another point of discussion is around key formats. I like JSON for key data, but some of my colleagues see value in using the same basic infrastructure for both message and key formats, and I have some sympathy with that point of view. If we were to use ASN.1 BER or protobuf for keys, we could still use the associated text representation as the on-disk format.

I like having a human-readable/editable representation for the key files. Whether it's JSON, text proto, or even XER (XML, gross), so be it; the differences are small details. The more it can be easily modified by hand or with standard tools, the better. Being easy to massage random key data gives unsupported flexibility!

Agreed. When you find yourself manually hacking key files you're doing something ugly... but sometimes it's necessary, and very convenient to be able to do.

Jay Tuley

Jul 22, 2013, 12:19:06 PM
to keyczar...@googlegroups.com
On Mon, Jul 22, 2013 at 7:38 AM, Shawn Willden <swil...@google.com> wrote:
On Sat, Jul 20, 2013 at 11:39 AM, Jay Tuley <j...@tuley.name> wrote:
On Fri, Jul 19, 2013 at 2:39 PM, Shawn Willden <swil...@google.com> wrote:
There are a number of problems with the current Keyczar message formats, including:
  • Reliance on a single, fixed hash algorithm... exacerbated by the fact that the hash algorithm in question (SHA1) is deprecated.
  • Excessive message size, caused by:
    • Use of full-length hash values (20 byte) in HMACs
    • Use of block-length IVs for CBC mode encryption
    • Use of text encoding
In regards to message size vs conservative security, it should be easier for the end user to do the conservative security.

I'm not sure what you mean here, can you elaborate? 

Shorter MACs and IVs might have enough entropy for most usages when you are rotating your key properly, but full length probably isn't too excessive, message-size-wise, for most usages either, and being conservative by default will help in the generalist-programmer case. I am just expressing the opinion that we shouldn't shorten for all AES usages, and that IV and MAC length should be tied to serialized properties on the new AES KeyType, whatever form that takes.

And I didn't mean to include text encoding; I just gloss over it in my head, since only signed session encryption uses a text-encoded payload, and only for the session material.
  • Security weakness in signed session encryption, since elements are signed after encryption, which enables an attacker to strip the signatures and sign the elements himself (origin spoofing).
If an attacker signed it himself, the recipient's pre-exchanged (outside of keyczar) public-key keyset wouldn't validate that signature. What am I missing?

If the recipient regularly received messages from both the attacker and the actual origin of the message, he'd have keys for both. It's not an attack that's generally useful, but there are some circumstances in which it could be damaging. It's the reason most systems sign before encrypting rather than after.

Ah, I see: the weakness is that, in the case where both A and B can send data to C, if A is compromised, then A can replay B's transmission to C as coming from A.
 
With sign-then-encrypt, there's a similar but even less useful attack where the legitimate recipient can decrypt and re-encrypt to a different target, then pass the message along, making the new recipient think that he received it from the origin.

A v2 protocol should address both of these by signing before encrypting, and by including the encryption key ID in the signed payload.

If A can send to B and C, and B is compromised, B can replay A's transmissions to B as if they were from A to C. So if we include the encrypting public key hash in the signed payload, the actual destination is authenticated.

Thanks, makes sense.

The 0.7 construction encrypt->mac->sign did intuitively look wasteful for including the mac. But sign->encrypt->mac does look so much better.



Devin Lundberg

Jul 25, 2013, 6:02:24 AM
to keyczar...@googlegroups.com
Once we define a core set of features that we want across all versions, one thing I think would be useful to see in keyczar is support for extensions or plugins, to make it easy to add additional features to current versions. This can clearly be done already, but I think it would be useful to distinguish between core features and other code that may be useful but is not currently in the core feature set.

For example, in terms of operations, encrypting and signing might be core features, but things like timeout sign might not.

I think separating the features in this way would make the creation and maintenance of future implementations easier, and also the creation of new features.

Ben Laurie

Jul 25, 2013, 6:09:30 AM
to Keyczar Discuss
On 19 July 2013 20:39, Shawn Willden <swil...@google.com> wrote:
There are a number of problems with the current Keyczar message formats, including:
  • Reliance on a single, fixed hash algorithm... exacerbated by the fact that the hash algorithm in question (SHA1) is deprecated.
  • Excessive message size, caused by:
    • Use of full-length hash values (20 byte) in HMACs
    • Use of block-length IVs for CBC mode encryption
What's the alternative? 
    • Use of text encoding
  • Security weakness in signed session encryption, since elements are signed after encryption, which enables an attacker to strip the signatures and sign the elements himself (origin spoofing).
  • Inflexibility in padding and mode selection. These are tied to the key type, which has led Keymaster (the Google-internal library on which Keyczar was based, which has the same problem) to proliferate key types in a combinatorial explosion.
I'm sure there are others that I'm missing, some because I haven't noticed them, and others because I've forgotten them.

I'd like to open the floor for suggestions and comments on what we should do to address all of these. Some of them will require new message formats, which points out another weakness, which is that Keyczar's message formats are rather ad-hoc, tied to the key type, and not at all self-describing.

On that latter point, in some discussions I've had with other Googlers, we're in agreement that we'd really like to have a single, structured message format that is adaptable to whatever we need to put into it, but doesn't presume that the sender must already know exactly what it's receiving. Two main candidates have emerged:
  1. ASN.1 BER (or perhaps PER). The advantages are that ASN.1 is standardized, very widely-used, has good libraries in basically all languages and has both binary and textual representations (the text representation is XER, XML Encoding Rules). The disadvantage is that ASN.1 is complex and hard to work with.
  2. Protobuf. The advantages are that protobuf has good libraries in all common languages, both binary and textual representations (the text representation is basically JSON) and it's pretty simple to work with; easy to extend and modify over time, etc. The disadvantage is that it's not a standard.
The pros and cons above are my perceptions, of course. Others may disagree and I'd love to hear alternative views. On balance, I'm leaning towards protobuf.

I'm a little puzzled by this - if the receiver (I assume you mean that, rather than sender) doesn't know what it's receiving, how does it process it? Sure, protobuf/ASN.1 are self-describing (to some extent) but that doesn't particularly help once you've unpacked the data.
 

Another point of discussion is around key formats. I like JSON for key data, but some of my colleagues see value in using the same basic infrastructure for both message and key formats, and I have some sympathy with that point of view. If we were to use ASN.1 BER or protobuf for keys, we could still use the associated text representation as the on-disk format.

Comments?

Thanks.

--
Shawn Willden | Software Engineer | swil...@google.com | 720-924-6645


Shawn Willden

Jul 25, 2013, 6:21:07 AM
to keyczar...@googlegroups.com
On Thu, Jul 25, 2013 at 12:09 PM, Ben Laurie <be...@google.com> wrote:
On 19 July 2013 20:39, Shawn Willden <swil...@google.com> wrote:
  • Excessive message size, caused by:
    • Use of full-length hash values (20 byte) in HMACs
    • Use of block-length IVs for CBC mode encryption
What's the alternative? 

For HMAC values, truncation. HMAC length should be an attribute of the HMAC key, I think. I'd set a reasonable lower bound on it (say, 64 bits).
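Roughly (a Python sketch, not the current Keyczar API; the 8-byte minimum is just the hypothetical lower bound mentioned above):

    import hashlib, hmac

    MIN_TAG_BYTES = 8  # hypothetical lower bound: 64 bits

    def truncated_hmac(key, message, tag_len):
        if tag_len < MIN_TAG_BYTES:
            raise ValueError("tag length below the configured minimum")
        return hmac.new(key, message, hashlib.sha256).digest()[:tag_len]

    def verify(key, message, tag):
        # Recompute at the received tag's length and compare in constant time.
        return hmac.compare_digest(truncated_hmac(key, message, len(tag)), tag)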

For IVs, a smaller bit string can be used and then padded to get to the full block size. The length of IV needed depends on the amount of message randomization desired. Again, I'd probably set a reasonable lower bound, although if you happen to know that for your application every initial block is unique you really could use an all-zeros IV. 

On that latter point, in some discussions I've had with other Googlers, we're in agreement that we'd really like to have a single, structured message format that is adaptable to whatever we need to put into it, but doesn't presume that the sender must already know exactly what it's receiving. Two main candidates have emerged:
  1. ASN.1 BER (or perhaps PER). The advantages are that ASN.1 is standardized, very widely-used, has good libraries in basically all languages and has both binary and textual representations (the text representation is XER, XML Encoding Rules). The disadvantage is that ASN.1 is complex and hard to work with.
  2. Protobuf. The advantages are that protobuf has good libraries in all common languages, both binary and textual representations (the text representation is basically JSON) and it's pretty simple to work with; easy to extend and modify over time, etc. The disadvantage is that it's not a standard.
The pros and cons above are my perceptions, of course. Others may disagree and I'd love to hear alternative views. On balance, I'm leaning towards protobuf.

I'm a little puzzled by this - if the receiver (I assume you mean that, rather than sender) doesn't know what it's receiving, how does it process it? Sure, protobuf/ASN.1 are self-describing (to some extent) but that doesn't particularly help once you've unpacked the data.

Er, yes, I meant "receiver" :-)

I'd like the receiver to be able to look at the message and be able to distinguish between encrypted data, a signature, signed session data, signed session fields, etc. by examining headers and structure. It's likely that the receiver couldn't do anything more useful than issue a good diagnostic if, say, a Crypter was handed a signed session blob, but at least it could issue a good diagnostic. Also, it would be nice just to have a framework for describing message structures in a more formal and convenient way than diagrams and English text which lay out the meanings of sequences of bytes.
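As a crude illustration of the idea, independent of whether the real envelope ends up being protobuf or ASN.1 (a Python sketch with a made-up header layout, not a schema proposal):

    # A type tag and key hash up front let a receiver at least tell what it was
    # handed and emit a useful diagnostic, e.g. "a signed-session blob was
    # passed to a Crypter", instead of failing on opaque bytes.
    MSG_ENCRYPTED, MSG_SIGNATURE, MSG_SIGNED_SESSION = 1, 2, 3

    def classify(message):
        msg_type, key_hash, body = message[0], message[1:5], message[5:]
        if msg_type not in (MSG_ENCRYPTED, MSG_SIGNATURE, MSG_SIGNED_SESSION):
            raise ValueError("unrecognized Keyczar message type %d" % msg_type)
        return msg_type, key_hash, body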

Ben Laurie

Jul 25, 2013, 6:45:28 AM
to Keyczar Discuss
Right - I've been musing about this in conjunction with my toying with inherently safe (but flexible) APIs - the thing that worries me about self-description of this type is what attacks it might open you up to, by meddling with the description.

Perhaps the description could be the one thing with fixed protections? But that seems unfortunate.


--
Shawn Willden | Software Engineer | swil...@google.com | 720-924-6645


Daniel Bleichenbacher

Jul 25, 2013, 7:23:36 AM
to keyczar...@googlegroups.com
On Thu, Jul 25, 2013 at 12:45 PM, Ben Laurie <be...@google.com> wrote:



On 25 July 2013 11:21, Shawn Willden <swil...@google.com> wrote:
On Thu, Jul 25, 2013 at 12:09 PM, Ben Laurie <be...@google.com> wrote:
On 19 July 2013 20:39, Shawn Willden <swil...@google.com> wrote:
  • Excessive message size, caused by:
    • Use of full-length hash values (20 byte) in HMACs
    • Use of block-length IVs for CBC mode encryption
What's the alternative? 

For HMAC values, truncation. HMAC length should be an attribute of the HMAC key, I think. I'd set a reasonable lower bound on it (say, 64 bits).


Because of hash collisions, HMACs longer than half of the digest length do not add much security. Hence truncating the MACs certainly makes a lot of sense.
The library itself isn't really the best place to set reasonable lower bounds. There is always some project that wants to use low security margins (e.g. the shortest "approved"
HMAC length which is 32-bit or other key sizes such as 1024-bit RSA keys). There are a lot of crypto libraries that still support 512-bit RSA keys, probably because 
backwards compatibility is important.
The solution I've added to keymaster is a method for each key class that describes key sizes and other properties. This method makes it possible, for example, to verify that a key set contains no keys with a low security level, which means that each project can decide for itself when to require that all keys achieve, say, a 112-bit security level (according to, say, NIST).
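As a rough sketch of what such a check could look like (Python with made-up names; the table carries only a few illustrative NIST SP 800-57 comparable-strength entries):

    # Approximate NIST SP 800-57 comparable strengths, in bits.
    NIST_BITS = {("RSA", 1024): 80, ("RSA", 2048): 112, ("RSA", 3072): 128,
                 ("AES", 128): 128, ("AES", 256): 256}

    def security_bits(algorithm, key_size):
        return NIST_BITS.get((algorithm, key_size), 0)

    def check_keyset(keys, required_bits=112):
        # keys: iterable of (algorithm, key_size) pairs describing a keyset.
        weak = [k for k in keys if security_bits(*k) < required_bits]
        if weak:
            raise ValueError("keys below %d-bit security: %r" % (required_bits, weak))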
 
For IVs, a smaller bit string can be used and then padded to get to the full block size. The length of IV needed depends on the amount of message randomization desired. Again, I'd probably set a reasonable lower bound, although if you happen to know that for your application every initial block is unique you really could use an all-zeros IV. 

I don't think this works. CBC really needs an IV that is not predictable.
However, there is some market for deterministic encryption and there the  SIV encryption mode (RFC 5297) is certainly an option.


On that latter point, in some discussions I've had with other Googlers, we're in agreement that we'd really like to have a single, structured message format that is adaptable to whatever we need to put into it, but doesn't presume that the sender must already know exactly what it's receiving. Two main candidates have emerged:
  1. ASN.1 BER (or perhaps PER). The advantages are that ASN.1 is standardized, very widely-used, has good libraries in basically all languages and has both binary and textual representations (the text representation is XER, XML Encoding Rules). The disadvantage is that ASN.1 is complex and hard to work with.
  2. Protobuf. The advantages are that protobuf has good libraries in all common languages, both binary and textual representations (the text representation is basically JSON) and it's pretty simple to work with; easy to extend and modify over time, etc. The disadvantage is that it's not a standard.
The pros and cons above are my perceptions, of course. Others may disagree and I'd love to hear alternative views. On balance, I'm leaning towards protobuf.

I'm a little puzzled by this - if the receiver (I assume you mean that, rather than sender) doesn't know what it's receiving, how does it process it? Sure, protobuf/ASN.1 are self-describing (to some extent) but that doesn't particularly help once you've unpacked the data.

Er, yes, I meant "receiver" :-)

I'd like the receiver to be able to look at the message and be able to distinguish between encrypted data, a signature, signed session data, signed session fields, etc. by examining headers and structure. It's likely that the receiver couldn't do anything more useful than issue a good diagnostic if, say, a Crypter was handed a signed session blob, but at least it could issue a good diagnostic. Also, it would be nice just to have a framework for describing message structures in a more formal and convenient way than diagrams and English text which lay out the meanings of sequences of bytes.

Right - I've been musing about this in conjunction with my toying with inherently safe (but flexible) APIs - the thing that worries me about self-description of this type is what attacks it might open you up to, by meddling with the description.

Perhaps the description could be the one thing with fixed protections? But that seems unfortunate.


I think there is an important topic that hasn't been mentioned yet: Readability.
Quite frequently when I look at code, I can't decide whether the code is secure, because
the security properties depend on the key type. E.g. If the code uses a crypter
object it is unclear if the keys are public keys (with no authentication), an authenticated
encryption mode or even some special purpose key type with no security. 

Jay Tuley

Jul 25, 2013, 9:27:54 AM
to keyczar...@googlegroups.com
On Thu, Jul 25, 2013 at 5:45 AM, Ben Laurie <be...@google.com> wrote:

On 25 July 2013 11:21, Shawn Willden <swil...@google.com> wrote:
On Thu, Jul 25, 2013 at 12:09 PM, Ben Laurie <be...@google.com> wrote:
On 19 July 2013 20:39, Shawn Willden <swil...@google.com> wrote:
  • Excessive message size, caused by:
    • Use of full-length hash values (20 byte) in HMACs
    • Use of block-length IVs for CBC mode encryption
What's the alternative? 

For HMAC values, truncation. HMAC length should be an attribute of the HMAC key, I think. I'd set a reasonable lower bound on it (say, 64 bits).

For IVs, a smaller bit string can be used and then padded to get to the full block size. The length of IV needed depends on the amount of message randomization desired. Again, I'd probably set a reasonable lower bound, although if you happen to know that for your application every initial block is unique you really could use an all-zeros IV. 

On that latter point, in some discussions I've had with other Googlers, we're in agreement that we'd really like to have a single, structured message format that is adaptable to whatever we need to put into it, but doesn't presume that the sender must already know exactly what it's receiving. Two main candidates have emerged:
  1. ASN.1 BER (or perhaps PER). The advantages are that ASN.1 is standardized, very widely-used, has good libraries in basically all languages and has both binary and textual representations (the text representation is XER, XML Encoding Rules). The disadvantage is that ASN.1 is complex and hard to work with.
  2. Protobuf. The advantages are that protobuf has good libraries in all common languages, both binary and textual representations (the text representation is basically JSON) and it's pretty simple to work with; easy to extend and modify over time, etc. The disadvantage is that it's not a standard.
The pros and cons above are my perceptions, of course. Others may disagree and I'd love to hear alternative views. On balance, I'm leaning towards protobuf.

I'm a little puzzled by this - if the receiver (I assume you mean that, rather than sender) doesn't know what it's receiving, how does it process it? Sure, protobuf/ASN.1 are self-describing (to some extent) but that doesn't particularly help once you've unpacked the data.

Er, yes, I meant "receiver" :-)

I'd like the receiver to be able to look at the message and be able to distinguish between encrypted data, a signature, signed session data, signed session fields, etc. by examining headers and structure. It's likely that the receiver couldn't do anything more useful than issue a good diagnostic if, say, a Crypter was handed a signed session blob, but at least it could issue a good diagnostic. Also, it would be nice just to have a framework for describing message structures in a more formal and convenient way than diagrams and English text which lay out the meanings of sequences of bytes.

Right - I've been musing about this in conjunction with my toying with inherently safe (but flexible) APIs - the thing that worries me about self-description of this type is what attacks it might open you up to, by meddling with the description.

Perhaps the description could be the one thing with fixed protections? But that seems unfortunate.

Anything parsed before the message authentication is validated would certainly increase the attack surface. But I think you could use protobuf for the internal payload and still have a MAC or signature checked first.
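E.g. along these lines (a Python sketch; the fixed outer framing and names are made up):

    import hashlib, hmac

    TAG_LEN = 16  # truncated HMAC-SHA256 tag in the hypothetical outer frame

    def open_frame(mac_key, frame):
        # Outer frame is just tag || payload; the MAC is checked over the raw
        # bytes before any structured (e.g. protobuf) parsing of the payload.
        tag, payload = frame[:TAG_LEN], frame[TAG_LEN:]
        expected = hmac.new(mac_key, payload, hashlib.sha256).digest()[:TAG_LEN]
        if not hmac.compare_digest(tag, expected):
            raise ValueError("MAC check failed; payload not parsed")
        return payload  # now safe to hand to the protobuf parser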

Jay Tuley

Jul 25, 2013, 10:05:52 AM
to keyczar...@googlegroups.com
On Thu, Jul 25, 2013 at 6:23 AM, Daniel Bleichenbacher <blei...@google.com> wrote:



On Thu, Jul 25, 2013 at 12:45 PM, Ben Laurie <be...@google.com> wrote:




Because of hash collisions, HMACs longer than half of the digest length do not add much security. Hence truncating the MACs certainly makes a lot of sense.
The library itself isn't really the best place to set reasonable lower bounds. There is always some project that wants to use low security margins (e.g. the shortest "approved"
HMAC length which is 32-bit or other key sizes such as 1024-bit RSA keys). There are a lot of crypto libraries that still support 512-bit RSA keys, probably because 
backwards compatibility is important.
The solution I've added to keymaster is a method for each key class that describes key sizes and other properties. This method makes it possible, for example, to verify that a key set contains no keys with a low security level, which means that each project can decide for itself when to require that all keys achieve, say, a 112-bit security level (according to, say, NIST).
 
For IVs, a smaller bit string can be used and then padded to get to the full block size. The length of IV needed depends on the amount of message randomization desired. Again, I'd probably set a reasonable lower bound, although if you happen to know that for your application every initial block is unique you really could use an all-zeros IV. 

I don't think this works. CBC really needs an IV that is not predictable.
However, there is some market for deterministic encryption and there the  SIV encryption mode (RFC 5297) is certainly an option. 

Yeah, padding sounds bad; a PRF to expand it, maybe, but rather than doing something weird, a different AES mode that is designed for variable-length IVs would be better. Deterministic encryption sounds tedious to offer an API for correctly. But I really think worrying about less than 16 bytes is silly unless you really need to worry about less than 16 bytes. As a side note, ciphertext stealing for CBC could save less than 16 bytes too.
 
Right - I've been musing about this in conjunction with my toying with inherently safe (but flexible) APIs - the thing that worries me about self-description of this type is what attacks it might open you up to, by meddling with the description.

Perhaps the description could be the one thing with fixed protections? But that seems unfortunate.


I think there is an important topic that hasn't been mentioned yet: Readability.
Quite frequently when I look at code, I can't decide whether the code is secure, because
the security properties depend on the key type. E.g. If the code uses a crypter
object it is unclear if the keys are public keys (with no authentication), an authenticated
encryption mode or even some special purpose key type with no security. 

So you could set up your security requirements in code, and they could be checked at runtime, similar to what you described above for keymaster? Or a more radical design change?

Maybe init Crypters with an optional Security enum - High, Standard, Deprecated - defaulting to Standard. However, your code's security would be relative to how up to date keyczar is. Not sure how to distinguish between asymmetric and authenticated encryption without making a separate PublicEncrypter/PrivateCrypter type; that might be a reasonable design choice, as symmetric and asymmetric encryption aren't really interchangeable.



Daniel Bleichenbacher

Jul 25, 2013, 12:02:15 PM
to keyczar...@googlegroups.com
On Thu, Jul 25, 2013 at 4:05 PM, Jay Tuley <j...@tuley.name> wrote:



On Thu, Jul 25, 2013 at 6:23 AM, Daniel Bleichenbacher <blei...@google.com> wrote:



On Thu, Jul 25, 2013 at 12:45 PM, Ben Laurie <be...@google.com> wrote:




Because of hash collisions, HMACs longer than half of the digest length do not add much security. Hence truncating the MACs certainly makes a lot of sense.
The library itself isn't really the best place to set reasonable lower bounds. There is always some project that wants to use low security margins (e.g. the shortest "approved"
HMAC length which is 32-bit or other key sizes such as 1024-bit RSA keys). There are a lot of crypto libraries that still support 512-bit RSA keys, probably because 
backwards compatibility is important.
The solution I've added to keymaster is a method for each key class that describes key sizes and other properties. This method makes it possible, for example, to verify that a key set contains no keys with a low security level, which means that each project can decide for itself when to require that all keys achieve, say, a 112-bit security level (according to, say, NIST).
 
For IVs, a smaller bit string can be used and then padded to get to the full block size. The length of IV needed depends on the amount of message randomization desired. Again, I'd probably set a reasonable lower bound, although if you happen to know that for your application every initial block is unique you really could use an all-zeros IV. 

I don't think this works. CBC really needs an IV that is not predictable.
However, there is some market for deterministic encryption and there the  SIV encryption mode (RFC 5297) is certainly an option. 

Yeah, padding sounds bad; a PRF to expand it, maybe, but rather than doing something weird, a different AES mode that is designed for variable-length IVs would be better. Deterministic encryption sounds tedious to offer an API for correctly. But I really think worrying about less than 16 bytes is silly unless you really need to worry about less than 16 bytes. As a side note, ciphertext stealing for CBC could save less than 16 bytes too.

I don't think deterministic encryption necessarily leads to a more tedious API.
For the non-deterministic encryption, one option is to basically use the AEAD (authenticated encryption with additional authenticated data) interface.
E.g. the encryption takes a plaintext and some additional data D that is also authenticated, but not included in the ciphertext.
Decryption takes the ciphertext and D.
Equivalently, the interface for the deterministic encryption modes would also take a plaintext and an additional data block D and return a ciphertext that is a deterministic result of the key, plaintext and D. Decryption again would require the ciphertext and D.
The nice property of SIV is that reusing the same D multiple times does not significantly reduce the security.
To ensure that the API does not become tedious, it might be possible to restrict this to encryption modes with equal properties, i.e. where reusing D multiple times only leaks whether the plaintexts are equal, but nothing else.
I think this is typically called tweakable encryption (though the distinctions between tweak, nonce, salt, and IV are not standardized and differ a lot between authors).
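In code, the randomized variant of that interface could look roughly like this (a Python sketch using AES-GCM from the `cryptography` package as a stand-in; an SIV mode would give the deterministic counterpart with the same shape):

    from cryptography.hazmat.primitives.ciphers.aead import AESGCM
    import os

    def aead_encrypt(key, plaintext, additional_data):
        # additional_data (D) is authenticated but not carried in the ciphertext.
        nonce = os.urandom(12)
        return nonce + AESGCM(key).encrypt(nonce, plaintext, additional_data)

    def aead_decrypt(key, blob, additional_data):
        # Decryption needs the same D; a mismatch makes the tag check fail.
        nonce, ciphertext = blob[:12], blob[12:]
        return AESGCM(key).decrypt(nonce, ciphertext, additional_data)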

 
Right - I've been musing about this in conjunction with my toying with inherently safe (but flexible) APIs - the thing that worries me about self-description of this type is what attacks it might open you up to, by meddling with the description.

Perhaps the description could be the one thing with fixed protections? But that seems unfortunate.


I think there is an important topic that hasn't been mentioned yet: Readability.
Quite frequently when I look at code, I can't decide whether the code is secure, because
the security properties depend on the key type. E.g. If the code uses a crypter
object it is unclear if the keys are public keys (with no authentication), an authenticated
encryption mode or even some special purpose key type with no security. 

So you could set up your security requirements in code, and they could be checked at runtime, similar to what you described above for keymaster? Or a more radical design change?


I think using this in the key management tools would be right. Warning users when they want to add new keys to a key set is probably more acceptable than
suddenly refusing to use old keys.
 
Maybe init Crypters with an optional Security enum - High, Standard, Deprecated - defaulting to Standard. However, your code's security would be relative to how up to date keyczar is. Not sure how to distinguish between asymmetric and authenticated encryption without making a separate PublicEncrypter/PrivateCrypter type; that might be a reasonable design choice, as symmetric and asymmetric encryption aren't really interchangeable.


Having different data types for symmetric and asymmetric crypters has the nice side effect that the code using them becomes easier to understand.
I'm using bit levels (i.e. 80, 112, 128, following NIST recommendations). Then the library doesn't have to decide for how long 112 bits is acceptable.
Rather, the app developer might check whether the keys are big enough and switch.

It is indeed a problem that the library itself can't aggressively push higher security levels. But having at least a tool that allows a user to get alerts for outdated key sizes would be helpful.

Devin Lundberg

Jul 25, 2013, 12:28:38 PM
to keyczar...@googlegroups.com
Exposing something like bit levels of security could be a good strategy. While users may not understand the bit levels in isolation, NIST also has estimated dates until which crypto at each bit level should remain usable. This would help users who may not know the difference between algorithms and key sizes actually understand what they are getting out of their encryption, which is one of the goals of keyczar. We could even make an algorithm chooser based on how long the crypto is going to be used or how often the users wish to rotate keys.
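As a tiny illustration of such a chooser (a Python sketch with a hypothetical name; the cut-off follows the NIST SP 800-57 guidance that 112-bit security is regarded as acceptable through 2030, with 128-bit beyond that):

    def required_security_bits(protect_until_year):
        # Map "how long must this data stay protected" to a NIST-style bit level.
        if protect_until_year <= 2030:
            return 112
        return 128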

Jay Tuley

Jul 25, 2013, 1:20:02 PM
to keyczar...@googlegroups.com
Internally to keyczar, NIST bit levels make 100% sense design-wise, but I'm hesitant about exposing that externally to the generalist programmer; that 2048-bit RSA < 128-bit security is not obvious to the layman, and it's not as if NIST bits-of-security estimates don't change over time, so even with the bit levels you aren't guaranteed to know what you'll get. But I can see both sides of this "standard view vs. nonstandard abstraction" or "confusing expert interface vs. easy-to-use presets".

 
It is indeed a problem, that the library itself can't aggressively push higher security levels. But having at least a tool that allows a user to get alerts for
outdated key sizes would be helpful.

There's a thought, the keyczartool could check online for more aggressive definitions for security constraints.
 
 
Exposing something like bit levels of security could be a good strategy. While users may not understand the bit levels in isolation, NIST also has estimated dates until which crypto at each bit level should remain usable. This would help users who may not know the difference between algorithms and key sizes actually understand what they are getting out of their encryption, which is one of the goals of keyczar. We could even make an algorithm chooser based on how long the crypto is going to be used or how often the users wish to rotate keys.

I think trying to semantically guide developers more with the tool makes a lot of sense.

Indeed, no help is given to the developer about when to rotate keys. The only metadata we keep that could even give a hint is a version number; otherwise they need to track it on their own outside of keyczar. This is an area that could use a lot of improvement.

Daniel Bleichenbacher

Jul 26, 2013, 6:57:12 AM
to keyczar...@googlegroups.com
Well, I'm sorry, but if I see a requirement for standard security or high security then I don't know what is required, and hence I assume that a layman doesn't know either, since it is not defined and the key size depends on opinions. If I see a 128-bit NIST level as a requirement, then I can go to the corresponding document and translate that.

Also, it doesn't have to be based on NIST recommendations. E.g. my implementation separates the key classes from the key size checker class. I currently only have a 
class that implements what NIST recommends, but it would be easy to make another instance that uses the Korean Crypto standards or whatever Germany's BSI publishes.
All a class that implements a key type has to do is pass the algorithms, key sizes and usage to the checker (e.g. for hashes one has to distinguish between using them for signatures, HMACs or pseudorandom functions).

Translating everything into bit levels does not solve every single security problem, but it should make it possible to find the most serious security problems (512-bit RSA keys, SHA-1 for signatures, etc.).
 
It is indeed a problem, that the library itself can't aggressively push higher security levels. But having at least a tool that allows a user to get alerts for
outdated key sizes would be helpful.

There's a thought, the keyczartool could check online for more aggressive definitions for security constraints.
 
 
Exposing something like bit levels of security could be a good strategy. While users may not understand the bit levels in isolation, NIST also has estimated dates until which crypto at each bit level should remain usable. This would help users who may not know the difference between algorithms and key sizes actually understand what they are getting out of their encryption, which is one of the goals of keyczar. We could even make an algorithm chooser based on how long the crypto is going to be used or how often the users wish to rotate keys.

I think trying to semantically guide developers more with the tool makes a lot of sense.

Indeed, no help is given to the developer about when to rotate keys. The only metadata we keep that could even give a hint is a version number; otherwise they need to track it on their own outside of keyczar. This is an area that could use a lot of improvement.


Jay Tuley

Jul 26, 2013, 9:13:54 AM
to keyczar...@googlegroups.com
Sure you can, but you are by no means a layman; requiring the layman to read NIST documents to understand the security choices they are making is probably too much, and they just won't do it. This is a classic expert vs. novice design issue. But I think this is doable from both sides: keyczartool checkkeys [--minstrength=<nistbitlevel>] could always return a report explaining the existing keys and their NIST levels, along with a plain description of key length and ciphertext security over time, and return an error code on anything below an optional minimum strength. The report could include a link to the exact version of the NIST document that the security levels came from, too. Experts happy, novices get some education rather than being presumed to know or required to research.


Also, it doesn't have to be based on NIST recommendations. E.g. my implementation separates the key classes from the key size checker class. I currently only have a 
class that implements what NIST recommends, but it would be easy to make another instance that uses the Korean Crypto standards or whatever Germany's BSI publishes.

Maybe a simple constraint checker based on a json or protobuf-text definition file so that we can keep all the implementations up to date easily.

Daniel Bleichenbacher

Jul 26, 2013, 12:30:02 PM
to keyczar...@googlegroups.com
Well,
despite calling me a clueless layman you just basically redesigned the tool I've already implemented. You only made one mistake. I'm not mixing the command line tool
for manipulating keys with the command line tool for checking them. The motivation for this is that in the former case the tool is performing sensitive operations and in the
latter case it is not. In fact, no output of the tool may contain key material, so that no keys get leaked when someone uses the tool for debugging and posts the results.

Shawn Willden

Jul 26, 2013, 12:32:41 PM
to keyczar...@googlegroups.com

On Fri, Jul 26, 2013 at 6:30 PM, Daniel Bleichenbacher <blei...@google.com> wrote:
despite calling me a clueless layman

Actually Jay said you are not a layman, just with a somewhat idiomatic expression :-)

Jay Tuley

Jul 26, 2013, 1:27:33 PM
to keyczar...@googlegroups.com
On Fri, Jul 26, 2013 at 11:30 AM, Daniel Bleichenbacher <blei...@google.com> wrote:
Well,
despite calling me a clueless layman

As Shawn clarified, my intended meaning was that you are 100% expert and 0% layman :D.
 
you just basically redesigned the tool I've already implemented. You only made one mistake. I'm not mixing the command line tool
for manipulating keys with the command line tool for checking them. The motivation for this is that in the former case the tool is performing sensitive operations and in the
latter case it is not. In fact, no output of the tool may contain key material, so that no keys get leaked when someone uses the tool for debugging and posts the results.

With the current design of keyczar, to query key attributes you have to pull the entire key material into memory. The same goes for encrypted keysets; they additionally have to be decrypted as well. Should this be a design consideration for v2?

Shawn Willden

Jul 26, 2013, 2:13:21 PM
to keyczar...@googlegroups.com
That's a good point. IMO, it is a design consideration. The encrypted element of a key should contain only the key material, plus perhaps a hash of the attributes if we want to ensure that such "secured" keys can't be misused by tweaking attributes. That would allow a key-checking utility to examine the attributes without access to the key material.
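Something like this layout (a Python sketch; the names are made up and `encrypt` stands for whatever mechanism protects the keyset):

    import hashlib, json

    def build_key_file(attributes, raw_key, encrypt):
        # Attributes stay in the clear so a checking tool can inspect them
        # without the keyset-encryption key; their hash rides inside the
        # encrypted element so tweaked attributes are caught at decrypt time.
        attr_bytes = json.dumps(attributes, sort_keys=True).encode("utf-8")
        attr_hash = hashlib.sha256(attr_bytes).digest()
        return {"attributes": attributes,
                "encrypted": encrypt(attr_hash + raw_key).hex()}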

Daniel Bleichenbacher

Jul 30, 2013, 8:35:31 AM
to keyczar...@googlegroups.com
On Fri, Jul 26, 2013 at 8:13 PM, Shawn Willden <swil...@google.com> wrote:
On Fri, Jul 26, 2013 at 7:27 PM, Jay Tuley <j...@tuley.name> wrote:
On Fri, Jul 26, 2013 at 11:30 AM, Daniel Bleichenbacher <blei...@google.com> wrote:
Well,
despite calling me a clueless layman

As Shawn clarified, my intended meaning was that you are 100% expert and 0% layman :D.

I'm sorry about that. I completely misread the mail.
 
 
you just basically redesigned the tool I've already implemented. You only made one mistake. I'm not mixing the command line tool
for manipulating keys with the command line tool for checking them. The motivation for this is that in the former case the tool is performing sensitive operations and in the
latter case it is not. In fact, no output of the tool may contain key material, so that no keys get leaked when someone uses the tool for debugging and posts the results.

With the current design of keyczar, to query key attributes you have to pull the entire key material into memory. The same goes for encrypted keysets; they additionally have to be decrypted as well. Should this be a design consideration for v2?

That's a good point. IMO, it is a design consideration. The encrypted element of a key should contain only the key material, plus perhaps a hash of the attributes if we want to ensure that such "secured" keys can't be misused by tweaking attributes. That would allow a key-checking utility to examine the attributes without access to the key material.


I agree too. This is an issue that needs attention. I'm not really aware of how keyczar encrypts key files, thus I might misrepresent some stuff.
One observation is that developers want to use alternative interfaces to access files. If they use abstract interfaces to their resources, then it is difficult to tell whether the obtained file is authentic or possibly went through an insecure connection. Even though key files should have an integrity check, there is currently no way to tell whether this is the right key (or maybe some old deprecated key). And of course, as already pointed out, metadata must be included in the integrity checks too.
  
--
Shawn Willden | Software Engineer | swil...@google.com | 720-924-6645
