There are a number of problems with the current Keyczar message formats, including:
- Reliance on a single, fixed hash algorithm... exacerbated by the fact that the hash algorithm in question (SHA1) is deprecated.
- Excessive message size, caused by:
- Use of full-length hash values (20 bytes) in HMACs
- Use of block-length IVs for CBC mode encryption
- Use of text encoding
- Security weakness in signed session encryption, since elements are signed after encryption, which enables an attacker to strip the signatures and sign the elements himself (origin spoofing).
- Inflexibility in padding and mode selection. These are tied to the key type, which has led Keymaster (the Google-internal library on which Keyczar was based, which has the same problem) to proliferate key types in a combinatorial explosion.
I'm sure there are others that I'm missing, some because I haven't noticed them, and others because I've forgotten them.
I'd like to open the floor for suggestions and comments on what we should do to address all of these. Some of them will require new message formats, which points out another weakness: Keyczar's message formats are rather ad-hoc, tied to the key type, and not at all self-describing.

On that latter point, in some discussions I've had with other Googlers, we're in agreement that we'd really like to have a single, structured message format that is adaptable to whatever we need to put into it, but doesn't presume that the sender must already know exactly what it's receiving. Two main candidates have emerged:
- ASN.1 BER (or perhaps PER). The advantages are that ASN.1 is standardized, very widely used, has good libraries in basically all languages, and has both binary and textual representations (the text representation is XER, XML Encoding Rules). The disadvantage is that ASN.1 is complex and hard to work with.
- Protobuf. The advantages are that protobuf has good libraries in all common languages, has both binary and textual representations (the text representation is basically JSON), and is pretty simple to work with: easy to extend and modify over time, etc. The disadvantage is that it's not a standard.
The pros and cons above are my perceptions, of course. Others may disagree and I'd love to hear alternative views. On balance, I'm leaning towards protobuf.
Another point of discussion is around key formats. I like JSON for key data, but some of my colleagues see value in using the same basic infrastructure for both message and key formats, and I have some sympathy with that point of view. If we were to use ASN.1 BER or protobuf for keys, we could still use the associated text representation as the on-disk format.
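For concreteness, today's on-disk key data is JSON along roughly these lines (sketched here as Python literals; the field names are from memory, so treat them as approximate rather than authoritative):

# A keyset directory holds a "meta" file plus one file per key version.
# Approximate shape only; see the real library for the exact fields.
meta = {
    "name": "example",
    "purpose": "DECRYPT_AND_ENCRYPT",
    "type": "AES",
    "encrypted": False,
    "versions": [
        {"versionNumber": 1, "status": "PRIMARY", "exportable": False},
    ],
}

key_version_1 = {
    "mode": "CBC",
    "size": 128,
    "aesKeyString": "<base64 key bytes>",
    "hmacKey": {"size": 256, "hmacKeyString": "<base64 key bytes>"},
}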
On Fri, Jul 19, 2013 at 2:39 PM, Shawn Willden <swil...@google.com> wrote:
There are a number of problems with the current Keyczar message formats, including:
- Reliance on a single, fixed hash algorithm... exacerbated by the fact that the hash algorithm in question (SHA1) is deprecated.
- Excessive message size, caused by:
- Use of full-length hash values (20 bytes) in HMACs
- Use of block-length IVs for CBC mode encryption
- Use of text encoding
Regarding message size vs. conservative security, it should be easier for the end user to do the conservative security.
- Security weakness in signed session encryption, since elements are signed after encryption, which enables an attacker to strip the signatures and sign the elements himself (origin spoofing).
If an attacker signed it himself, the recipient's pre-exchanged (outside of keyczar) public-key keyset wouldn't validate that signature. What am I missing?
- Inflexibility in padding and mode selection. These are tied to the key type, which has led Keymaster (the Google-internal library on which Keyczar was based, which has the same problem) to proliferate key types in a combinatorial explosion.
I think there is still a happy medium between designing a new set of flexible KeyTypes and, when the need arises, making new KeyTypes for clear 100% backwards compatibility. And on that note, the other design change that is strongly needed is that keysets become task-oriented vs keytype-oriented, to be able to rotate out old (or just different-algorithm) KeyTypes.
I'm sure there are others that I'm missing, some because I haven't noticed them, and others because I've forgotten them.
I'd like to open the floor for suggestions and comments on what we should do to address all of these. Some of them will require new message formats, which points out another weakness: Keyczar's message formats are rather ad-hoc, tied to the key type, and not at all self-describing.

On that latter point, in some discussions I've had with other Googlers, we're in agreement that we'd really like to have a single, structured message format that is adaptable to whatever we need to put into it, but doesn't presume that the sender must already know exactly what it's receiving. Two main candidates have emerged:
- ASN.1 BER (or perhaps PER). The advantages are that ASN.1 is standardized, very widely used, has good libraries in basically all languages, and has both binary and textual representations (the text representation is XER, XML Encoding Rules). The disadvantage is that ASN.1 is complex and hard to work with.
- Protobuf. The advantages are that protobuf has good libraries in all common languages, has both binary and textual representations (the text representation is basically JSON), and is pretty simple to work with: easy to extend and modify over time, etc. The disadvantage is that it's not a standard.
The pros and cons above are my perceptions, of course. Others may disagree and I'd love to hear alternative views. On balance, I'm leaning towards protobuf.
I think protobufs would make implementing the payload format correctly easier, which is a very strong plus. If you count Python 2 and 3 separately, there are 6 implementations of Keyczar that work together now (pending patch reviews), maybe 7 if Evan releases his JS one; easy to implement right and interoperable is important.

Ultimately, when we are talking about self-describing, we mean meta formats; the keyczar message formats are still going to be ad-hoc, it's just that the payload would be encoded with some light structure. Ubiquity of the encoding is really more important than standardization, as it's not like we would be using standard message formats that would interoperate with things that know nothing about keyczar, and protobuf has been ported to enough languages (several times over, in many cases) that I don't think ubiquity is an issue with it.
Another point of discussion is around key formats. I like JSON for key data, but some of my colleagues see value in using the same basic infrastructure for both message and key formats, and I have some sympathy with that point of view. If we were to use ASN.1 BER or protobuf for keys, we could still use the associated text representation as the on-disk format.

I like having a human-readable/editable representation for the key files; if it's JSON or text proto, or even XER (XML, gross), so be it, the differences are small details. The more it can be easily modified by hand or with standard tools, the better. Being able to easily massage random key data gives unsupported flexibility!
On Sat, Jul 20, 2013 at 11:39 AM, Jay Tuley <j...@tuley.name> wrote:
On Fri, Jul 19, 2013 at 2:39 PM, Shawn Willden <swil...@google.com> wrote:
There are a number of problems with the current Keyczar message formats, including:
- Reliance on a single, fixed hash algorithm... exacerbated by the fact that the hash algorithm in question (SHA1) is deprecated.
- Excessive message size, caused by:
- Use of full-length hash values (20 bytes) in HMACs
- Use of block-length IVs for CBC mode encryption
- Use of text encoding
Regarding message size vs. conservative security, it should be easier for the end user to do the conservative security.

I'm not sure what you mean here, can you elaborate?
- Security weakness in signed session encryption, since elements are signed after encryption, which enables an attacker to strip the signatures and sign the elements himself (origin spoofing).
If an attacker signed it himself, the recipient's pre-exchanged (outside of keyczar) public-key keyset wouldn't validate that signature. What am I missing?

If the recipient regularly received messages from both the attacker and the actual origin of the message, he'd have keys for both. It's not an attack that's generally useful, but there are some circumstances in which it could be damaging. It's the reason most systems sign before encrypting rather than after.
With sign-then-encrypt, there's a similar but even less useful attack where the legitimate recipient can decrypt and re-encrypt to a different target, then pass the message along, making the new recipient think that he received it from the origin.

A v2 protocol should address both of these by signing before encrypting, and by including the encryption key ID in the signed payload.
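A rough sketch of that v2 idea, with HMAC standing in for the real public-key signature and the session encryption left abstract (all names here are illustrative, not an actual API):

import hashlib
import hmac
import struct

def pack_signed_session(plaintext: bytes, signing_key: bytes,
                        recipient_key_id: bytes, encrypt) -> bytes:
    """Sign-then-encrypt with the recipient's key ID bound into the
    signed bytes. Stripping and re-signing (origin spoofing) now
    requires breaking the encryption first, and decrypt-and-re-encrypt
    forwarding is detectable because the signed key ID won't match the
    key the message actually arrived under."""
    signed_bytes = recipient_key_id + plaintext
    sig = hmac.new(signing_key, signed_bytes, hashlib.sha256).digest()
    body = struct.pack(">H", len(sig)) + sig + plaintext
    return encrypt(body)  # abstract session-encryption primitive

Verification reverses the steps: decrypt, split off the signature, and check it over recipient_key_id + plaintext, where recipient_key_id comes from the key the message was actually decrypted with.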
On 19 July 2013 20:39, Shawn Willden <swil...@google.com> wrote:
- Excessive message size, caused by:
- Use of full-length hash values (20 bytes) in HMACs
- Use of block-length IVs for CBC mode encryption
What's the alternative?
On that latter point, in some discussions I've had with other Googlers, we're in agreement that we'd really like to have a single, structured message format that is adaptable to whatever we need to put into it, but doesn't presume that the sender must already know exactly what it's receiving. Two main candidates have emerged:
- ASN.1 BER (or perhaps PER). The advantages are that ASN.1 is standardized, very widely used, has good libraries in basically all languages, and has both binary and textual representations (the text representation is XER, XML Encoding Rules). The disadvantage is that ASN.1 is complex and hard to work with.
- Protobuf. The advantages are that protobuf has good libraries in all common languages, has both binary and textual representations (the text representation is basically JSON), and is pretty simple to work with: easy to extend and modify over time, etc. The disadvantage is that it's not a standard.
The pros and cons above are my perceptions, of course. Others may disagree and I'd love to hear alternative views. On balance, I'm leaning towards protobuf.
I'm a little puzzled by this - if the receiver (I assume you mean that, rather than sender) doesn't know what it's receiving, how does it process it? Sure, protobuf/ASN.1 are self-describing (to some extent) but that doesn't particularly help once you've unpacked the data.
On 25 July 2013 11:21, Shawn Willden <swil...@google.com> wrote:
On Thu, Jul 25, 2013 at 12:09 PM, Ben Laurie <be...@google.com> wrote:
- Excessive message size, caused by:
- Use of full-length hash values (20 bytes) in HMACs
- Use of block-length IVs for CBC mode encryption
What's the alternative?

For HMAC values, truncation. HMAC length should be an attribute of the HMAC key, I think. I'd set a reasonable lower bound on it (say, 64 bits).
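In code the idea is just this (a sketch, not the actual API; the 8-byte floor is the 64-bit lower bound suggested above):

import hashlib
import hmac

MIN_TAG_BYTES = 8  # the 64-bit lower bound

def truncated_tag(key: bytes, message: bytes, tag_len: int) -> bytes:
    """Compute a full HMAC, then keep only the first tag_len bytes."""
    if tag_len < MIN_TAG_BYTES:
        raise ValueError("tag length below the minimum")
    return hmac.new(key, message, hashlib.sha256).digest()[:tag_len]

def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the truncated tag and compare in constant time."""
    expected = truncated_tag(key, message, len(tag))
    return hmac.compare_digest(expected, tag)

The tag length would travel as an attribute of the HMAC key, so both sides truncate identically.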
For IVs, a smaller bit string can be used and then padded to get to the full block size. The length of IV needed depends on the amount of message randomization desired. Again, I'd probably set a reasonable lower bound, although if you happen to know that for your application every initial block is unique you really could use an all-zeros IV.
On that latter point, in some discussions I've had with other Googlers, we're in agreement that we'd really like to have a single, structured message format that is adaptable to whatever we need to put into it, but doesn't presume that the sender must already know exactly what it's receiving. Two main candidates have emerged:
- ASN.1 BER (or perhaps PER). The advantages are that ASN.1 is standardized, very widely used, has good libraries in basically all languages, and has both binary and textual representations (the text representation is XER, XML Encoding Rules). The disadvantage is that ASN.1 is complex and hard to work with.
- Protobuf. The advantages are that protobuf has good libraries in all common languages, has both binary and textual representations (the text representation is basically JSON), and is pretty simple to work with: easy to extend and modify over time, etc. The disadvantage is that it's not a standard.
The pros and cons above are my perceptions, of course. Others may disagree and I'd love to hear alternative views. On balance, I'm leaning towards protobuf.
I'm a little puzzled by this - if the receiver (I assume you mean that, rather than sender) doesn't know what it's receiving, how does it process it? Sure, protobuf/ASN.1 are self-describing (to some extent) but that doesn't particularly help once you've unpacked the data.

Er, yes, I meant "receiver" :-)

I'd like the receiver to be able to look at the message and distinguish between encrypted data, a signature, signed session data, signed session fields, etc. by examining headers and structure. It's likely that the receiver couldn't do anything more useful than issue a good diagnostic if, say, a Crypter was handed a signed session blob, but at least it could issue a good diagnostic. Also, it would be nice just to have a framework for describing message structures in a more formal and convenient way than diagrams and English text laying out the meanings of sequences of bytes.

Right - I've been musing about this in conjunction with my toying with inherently safe (but flexible) APIs - the thing that worries me about self-description of this type is what attacks it might open you up to, by meddling with the description. Perhaps the description could be the one thing with fixed protections? But that seems unfortunate.
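To make the diagnostic idea above concrete, a sketch with hypothetical type tags (nothing here is the actual format):

# Hypothetical header tags for a self-describing v2 format. The point
# is only that a Crypter handed the wrong kind of blob can say what it
# actually received instead of failing with a generic decrypt error.
MSG_TYPES = {
    1: "encrypted data",
    2: "detached signature",
    3: "signed session material",
    4: "signed session fields",
}

class WrongMessageType(Exception):
    pass

def expect(header_tag: int, wanted: int) -> None:
    """Raise a useful diagnostic instead of a generic failure."""
    if header_tag != wanted:
        got = MSG_TYPES.get(header_tag, "unknown message type")
        raise WrongMessageType(
            "expected %s, but this blob is %s" % (MSG_TYPES[wanted], got))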
Because of hash collisions, HMACs longer than half of the digest length do not add much security. Hence truncating the MACs certainly makes a lot of sense.

The library itself isn't really the best place to set reasonable lower bounds. There is always some project that wants to use low security margins (e.g. the shortest "approved" HMAC length, which is 32-bit, or other key sizes such as 1024-bit RSA keys). There are a lot of crypto libraries that still support 512-bit RSA keys, probably because backwards compatibility is important.

The solution I've added to keymaster is a method for each key class that describes key sizes and other properties. This method allows, for example, verifying that a key set contains no keys with a low security level, which means that each project can decide for itself when to require that all keys achieve, say, a 112-bit security level (according to, say, NIST).

For IVs, a smaller bit string can be used and then padded to get to the full block size. The length of IV needed depends on the amount of message randomization desired. Again, I'd probably set a reasonable lower bound, although if you happen to know that for your application every initial block is unique you really could use an all-zeros IV.

I don't think this works. CBC really needs an IV that is not predictable. However, there is some market for deterministic encryption, and there the SIV encryption mode (RFC 5297) is certainly an option.
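A sketch of the shape of that per-key-class description method (names hypothetical; the RSA bit estimates follow the usual NIST SP 800-57 figures):

RSA_SECURITY_BITS = {1024: 80, 2048: 112, 3072: 128}

class AesKey:
    def __init__(self, size: int) -> None:
        self.size = size

    def security_bits(self) -> int:
        return self.size  # a symmetric key's size is its security level

class RsaKey:
    def __init__(self, size: int) -> None:
        self.size = size

    def security_bits(self) -> int:
        return RSA_SECURITY_BITS.get(self.size, 0)

def weakest(keyset) -> int:
    """Each project enforces its own floor, e.g. weakest(ks) >= 112."""
    return min(key.security_bits() for key in keyset)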
Right - I've been musing about this in conjunction with my toying with inherently safe (but flexible) APIs - the thing that worries me about self-description of this type is what attacks it might open you up to, by meddling with the description. Perhaps the description could be the one thing with fixed protections? But that seems unfortunate.

I think there is an important topic that hasn't been mentioned yet: readability. Quite frequently when I look at code, I can't decide whether the code is secure, because the security properties depend on the key type. E.g. if the code uses a Crypter object, it is unclear if the keys are public keys (with no authentication), an authenticated encryption mode, or even some special-purpose key type with no security.
On Thu, Jul 25, 2013 at 6:23 AM, Daniel Bleichenbacher <blei...@google.com> wrote:
I don't think this works. CBC really needs an IV that is not predictable. However, there is some market for deterministic encryption, and there the SIV encryption mode (RFC 5297) is certainly an option.

Yeah, padding sounds bad; a PRF to expand it, maybe, but rather than doing something weird, a different AES mode that is designed for variable-length IVs would be better. Deterministic encryption sounds tedious for offering an API to use it correctly. But I really think worrying about less than 16 bytes is silly unless you really need to worry about less than 16 bytes. As a side note, ciphertext stealing for CBC could also save up to 16 bytes.
I think there is an important topic that hasn't been mentioned yet: readability. Quite frequently when I look at code, I can't decide whether the code is secure, because the security properties depend on the key type. E.g. if the code uses a Crypter object, it is unclear if the keys are public keys (with no authentication), an authenticated encryption mode, or even some special-purpose key type with no security.

So if you could set up your security requirements in code, and they could be checked at runtime, similar to what you described above for keymaster? Or a more radical design change?
Maybe init Crypters with an optional Security enum - High, Standard, Deprecated - defaulting to Standard. However, your code's security would be relative to how up to date keyczar is. Not sure how to distinguish between asymmetric and authenticated without making a separate PublicEncrypter/PrivateCrypter type; that might be a reasonable design choice, as symmetric and asymmetric encryption aren't really interchangeable.
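Just sketching the shape of that suggestion (a hypothetical API, not keyczar's actual constructor):

from enum import Enum

class Security(Enum):
    HIGH = 3
    STANDARD = 2
    DEPRECATED = 1

class Crypter:
    def __init__(self, keyset, security: Security = Security.STANDARD):
        # A real implementation would refuse a keyset whose weakest key
        # falls below the requested level, per the checker idea above.
        self.keyset = keyset
        self.security = security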
It is indeed a problem that the library itself can't aggressively push higher security levels. But having at least a tool that allows a user to get alerts for outdated key sizes would be helpful.
Exposing something like bit levels of security could be a good strategy. While users may not understand the bit levels in isolation, NIST also publishes estimated dates through which the crypto at each bit level should remain usable. This would help users who may not know the difference between algorithms and key sizes actually understand what they are getting out of their encryption, which is one of the goals of keyczar. We could even make an algorithm chooser based on how long the crypto needs to stay secure or how often the users wish to rotate keys.
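For example (the dates are my reading of NIST SP 800-57, so double-check them before relying on this):

# Security bits -> last year NIST considers that strength acceptable:
# 80-bit through 2010, 112-bit through 2030, 128-bit beyond 2030.
SECURITY_LIFETIMES = [
    (80, 2010),
    (112, 2030),
    (128, 9999),  # "acceptable beyond 2030"
]

def bits_needed(protect_until_year: int) -> int:
    """Pick the smallest bit level expected to hold until that year."""
    for bits, last_year in SECURITY_LIFETIMES:
        if protect_until_year <= last_year:
            return bits
    raise ValueError("no listed bit level covers that year")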
It is indeed a problem that the library itself can't aggressively push higher security levels. But having at least a tool that allows a user to get alerts for outdated key sizes would be helpful.

There's a thought: the KeyczarTool could check online for more aggressive definitions of the security constraints.

Exposing something like bit levels of security could be a good strategy. While users may not understand the bit levels in isolation, NIST also publishes estimated dates through which the crypto at each bit level should remain usable. This would help users who may not know the difference between algorithms and key sizes actually understand what they are getting out of their encryption, which is one of the goals of keyczar. We could even make an algorithm chooser based on how long the crypto needs to stay secure or how often the users wish to rotate keys.

I think trying to semantically guide developers more with the tool makes a lot of sense. Indeed, no help is given to the developer related to when to rotate keys. The only metadata we keep that could even give a hint is a version number; otherwise they need to track it on their own outside of keyczar. This is an area that could use a lot of improvement.
Also, it doesn't have to be based on NIST recommendations. E.g. my implementation separates the key classes from the key size checker class. I currently only have a class that implements what NIST recommends, but it would be easy to make another instance that uses the Korean crypto standards or whatever Germany's BSI publishes.
Well, despite calling me a clueless layman, you just basically redesigned the tool I've already implemented. You only made one mistake: I'm not mixing the command line tool for manipulating keys with the command line tool for checking them. The motivation for this is that in the former case the tool is performing sensitive operations and in the latter case it is not. In fact, no output of the tool may contain key material, so that no keys get leaked when someone is using the tool for debugging and posts the results.
On Fri, Jul 26, 2013 at 7:27 PM, Jay Tuley <j...@tuley.name> wrote:
On Fri, Jul 26, 2013 at 11:30 AM, Daniel Bleichenbacher <blei...@google.com> wrote:
Well, despite calling me a clueless layman

As Shawn clarified, my intended meaning was that you are 100% expert and 0% layman :D.
In fact, no output of the tool may contain key material, so that no keys get leaked when someone is using the tool for debugging and posts the results.

With the current design of keyczar, to query key attributes you have to pull the entire key material into memory. The same goes for encrypted keysets; they additionally have to be decrypted as well. Should this be a design consideration for v2?

That's a good point. IMO, it is a design consideration. The encrypted element of a key should contain only the key material, plus perhaps a hash of the attributes if we want to ensure that such "secured" keys can't be misused by tweaking attributes. That would allow a key checking utility to examine the attributes without access to the key material.
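A sketch of that layout (names hypothetical; encrypt stands in for whatever the real keyset encryption turns out to be):

import hashlib
import json

def pack_key(attributes: dict, key_material: bytes, encrypt) -> dict:
    """Keep attributes in the clear so a checking tool can audit them
    without decrypting anything; put a hash of the attributes inside
    the encrypted blob so tweaking the cleartext copy is detectable."""
    attr_bytes = json.dumps(attributes, sort_keys=True).encode()
    attr_hash = hashlib.sha256(attr_bytes).digest()
    return {
        "attributes": attributes,
        "encryptedKey": encrypt(attr_hash + key_material),
    }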