Questions about the spec

Matthew Green

Dec 30, 2019, 2:17:51 PM
to age-dev
Hi everyone,
Over the summer Filippo asked me if I would be willing to write up a more formal specification for age, with an eye to submitting it as an Internet Draft. I proceeded to do nothing much for a few months and just got started over the holiday break.

As a result I have a long list of (mostly very boring) questions and comments, which Filippo asked me to send to the list. Most are very simple and dumb. The big ones are down at the bottom.

Here are the simple questions:

Q: What is the upper bound on the size of age plaintexts?

This should be dependent on the AE scheme. I've picked 2^64 bytes as an arbitrary number because it's big and feels standard. This is mostly a bound derived from AES-based schemes in the FIPS literature so technically it doesn't apply to ChaCha, but it seems convenient anyway.

NB This limit can be raised in future versions, it's only for v1.

Q: Is each recipient line (I’m calling this a "recipient subsection" in the draft) terminated with CR or LF or both?

I've been dictating that the line ends with \n (newline character, ASCII 10). If you want this to be more flexible, please let me know.

Q: What is the upper bound on the number of recipient lines?

Currently I'm placing an arbitrary bound of 1024 "recipient" lines per file. These can mix all types of ALGORITHM_ID, including scrypt and public key schemes. Please feel free to adjust this, but I think there *should* be a limit.

NB This limit can be raised in future versions, it's only for v1.

Q: Can scrypt/ssh/X25519 recipient lines be mixed together in the same header?

Nothing in the spec forbids this, so I'm assuming the answer is "yes". Specifically, I can have an age file that is encrypted using both X25519 and scrypt. I realize an actual implementation might not support encrypting files with both passwords and public keys, but I don't (yet) see any rules against it in the standard.

Q: What algorithms are REQUIRED and what are OPTIONAL in the v1 standard?

In most RFCs like this, clients are REQUIRED to support some set of algorithms. Does that set include the full set (ssh-rsa, ssh-ed25519, scrypt, X25519) as well as the symmetric encryption scheme? Or only some subset of them?

Q: Can the symmetric encryption support an empty final chunk?

I'm certain that the answer is yes, but I want to call it out explicitly -- since some streaming encryption implementations will not know that they've reached EOF until after a full chunk has been encrypted. This will result in an empty final chunk of encrypted data.

Q: What is the max scrypt cost parameter?

What is the upper/lower bound on scrypt cost parameter? Feel free to specify ridiculous outer limits, but there should be some.

===
More general and possibly spec-breaking questions here:
===

ASCII armor

I think we agree that ASCII armor sucks. What's worse is that it seems like such an unnecessary hack with age, since the header is already "ASCII formatted". It would be nice if the payload (symmetric portion) of an age file could be encoded using some standard (fast) encoding scheme that isn't ASCII armor.

To do this, I would propose adding an additional (human readable) field to the header that gives options on how the payload is to be formatted. This could specify an encoding format, or just give "no encoding, raw octets" as the default option.

NB This could wait until version 2.

scrypt recipient line/subsection

I wanted to come up with a general way of parsing recipient lines, so that different algorithms don't require different parsing. This is very much a "spec writer's problem", and to illustrate it, here is what I've written about the format of recipient lines in the header. Note that it is true for all algorithms except for scrypt:

Each subsection begins with the identifier string "->", followed by an algorithm identifier
string (as defined in {supported-algorithms}), and followed by Base-64 encoded {{RFC4648}}
cipher material. The structure of a section is as follows:

-> ALGORITHM_IDENTIFIER <Base64-encoded cipher material>\n

All four of the PKE algorithms fit within this description, but scrypt recipient lines also have a numeric cost parameter and thus a slightly different textual format. I realize this is a very stupid thing to be driving me nuts, but it's driving me nuts that this one ALGORITHM_ID has to be parsed slightly differently.

My proposed solutions are both terrible. One is to put the iteration count into the Base64 blob (ugh, and spec breaking) and the other is to add an optional numeric parameter to every single recipient algorithm, e.g.:

-> ALGORITHM_IDENTIFIER <Base64-encoded cipher material> <optional numeric param or params>\n

I think I will propose the second approach. Do you have any thoughts?

Versioning and algorithm agility

Ok, so this is the big one. age has a "version number" in the first line of the header. I am trying to get a handle on what this version means. There are two ways I can think about this:

Way 1: An age version (e.g., v1, v2 in the header line) refers only to the symmetric encryption algorithm that is used to encrypt the payloads. So v1 would use ChaCha20-Poly1305, and v2 might use AES-GCM. All remaining aspects of age (e.g., header formats, supported public key algorithms, etc.) remain fixed by the specification.

Comment: Obviously I don't think Way 1 is a very good idea at all. Why have an upgrade mechanism that can't support real protocol upgrades?

Way 2: An age version refers to every aspect of the age standard. This means version 2 can hypothetically change the way every aspect of age works, ranging from supported PKE algorithms to constants to header formats to scrypt bounds.

Comment: This makes way more sense, since it gives generic protocol upgrade capability. However it raises a problem.

Ok. So — imagine you want to add support for FIPS algorithms in v2. The point now is that v2 could support either (a) both ChaCha and AES, or (b) only AES. It sounds like the direction you’re pushing for is (b), since there’s no agility mechanism to support (a).

My counterproposal would be that version 2 should just add an agility field for the cipher and support two different symmetric ciphers. This could also complement the armor encoding field mentioned above.

Thoughts?

Matt Green

Filippo Valsorda

Dec 31, 2019, 7:27:15 AM
to Matthew Green, age-dev
2019-12-30 20:17 GMT+01:00 Matthew Green <matthe...@gmail.com>:
> Hi everyone,
> Over the summer Filippo asked me if I would be willing to write up a
> more formal specification for age, with an eye to submitting it as an
> Internet Draft. I proceeded to do nothing much for a few months and
> just got started over the holiday break.
>
> As a result I have a long list of (mostly very boring) questions and
> comments, which Filippo asked me to send to the list. Most are very
> simple and dumb. The big ones are down at the bottom.

Hi Matt,

Thank you so much for offering to work on this, and for the questions
below! I think these are great discussions to have and they help the
format mature. I'm particularly happy to discuss on the mailing list
to let Ben, Jack, and the community chime in and help me see where I'm
wrong.

> Here are the simple questions:
>
> Q: What is the upper bound on the size of age plaintexts?
>
> This should be dependent on the AE scheme. I've picked 2^64 bytes as an
> arbitrary number because it's big and feels standard. This is mostly a
> bound derived from AES-based schemes in the FIPS literature so
> technically it doesn't apply to ChaCha, but it seems convenient anyway.
>
> NB This limit can be raised in future versions, it's only for v1.

Do we need one? RFC 8446, Section 5.5 suggests we are safe for at least
2^64 chunks, and a chunk is 2^16 bytes. 2^80 bytes is a yobibyte (TIL),
and high enough to be meaningless.
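
The arithmetic behind that bound can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the constant names are made up for illustration, and the two inputs are the figures quoted in the message above.

```python
# Back-of-the-envelope check of the payload bound discussed above.
CHUNK_SIZE = 2**16   # bytes per chunk, as stated in the message
MAX_CHUNKS = 2**64   # chunk count from the RFC 8446, Section 5.5 reading

max_payload_bytes = CHUNK_SIZE * MAX_CHUNKS
YOBIBYTE = 1024**8   # 1 YiB

print(max_payload_bytes == 2**80)     # True
print(max_payload_bytes == YOBIBYTE)  # True
```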

> Q: Is each recipient line (I’m calling this a "recipient subsection" in
> the draft) terminated with CR or LF or both?
>
> I've been dictating that the line ends with \n (newline character,
> ASCII 10). If you want this to be more flexible, please let me know.

LF exclusively. The format is NOT malleable (and if it is, it's a bug).

> Q: What is the upper bound on the number of recipient lines?
>
> Currently I'm placing an arbitrary bound of 1024 "recipient" lines per
> file. These can mix all types of ALGORITHM_ID, including scrypt and
> public key schemes. Please feel free to adjust this, but I think there
> *should* be a limit.
>
> NB This limit can be raised in future versions, it's only for v1.

Since there is no public key tag for privacy reasons, recipients have to
do trial decryption, which can get pretty expensive pretty quickly, in
particular with multiple identities configured.

My implementation has a cap of 20 recipients, which is probably too low.
I think we should RECOMMEND that implementations and/or applications
pick a reasonable limit, but we can't really pick a universal one.
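
As a rough illustration of why the cap matters, here is a minimal sketch of a configurable limit and of the worst-case trial decryption count. The function and constant names are hypothetical and do not come from any age implementation; only the default of 20 is taken from the message above.

```python
DEFAULT_MAX_RECIPIENTS = 20  # the cap mentioned above; hypothetical constant name


def check_recipient_count(recipients, max_recipients=DEFAULT_MAX_RECIPIENTS):
    """Reject headers with more recipient stanzas than the local limit."""
    if len(recipients) > max_recipients:
        raise ValueError(
            f"{len(recipients)} recipients exceeds the limit of {max_recipients}"
        )
    return len(recipients)


def worst_case_unwrap_attempts(num_recipients, num_identities):
    """Upper bound on trial decryptions when stanzas carry no key tag."""
    return num_recipients * num_identities


print(worst_case_unwrap_attempts(20, 3))  # 60
```

Because every identity may have to be tried against every stanza, the cost grows with the product of the two counts, which is why an application-chosen limit seems more sensible than a universal one.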

> Q: Can scrypt/ssh/X25519 recipient lines be mixed together in the same header?
>
> Nothing in the spec forbids this, so I'm assuming the answer is "yes".
> Specifically, I can have an age file that is encrypted using both
> X25519 and scrypt. I realize an actual implementation might not support
> encrypting files with both passwords and public keys, but I don't (yet)
> see any rules against it in the standard.

No, there's an exception: scrypt SHOULD be the only recipient, as
the passphrase provides implicit authentication.

> Note that if an scrypt recipient is present it SHOULD be the only
> recipient: every recipient can tamper with the message, but with
> passwords there might be a stronger expectation of authentication.
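
A header check for this rule could look like the sketch below. It assumes the recipient types have already been parsed into a list of strings; the function name is hypothetical. Since the rule is a SHOULD, the function only reports a violation and leaves the decision to the caller.

```python
def follows_scrypt_rule(recipient_types):
    """Return True if an scrypt recipient, when present, is the only one."""
    if "scrypt" in recipient_types:
        return len(recipient_types) == 1
    return True


print(follows_scrypt_rule(["scrypt"]))                 # True
print(follows_scrypt_rule(["X25519", "ssh-ed25519"]))  # True
print(follows_scrypt_rule(["scrypt", "X25519"]))       # False
```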

> Q: What algorithms are REQUIRED and what are OPTIONAL in the v1 standard?
>
> In most RFCs like this, clients are REQUIRED to support some set of
> algorithms. Does that set include the full set (ssh-rsa, ssh-ed25519,
> scrypt, X25519) as well as the symmetric encryption scheme? Or only
> some subset of them?

X25519 and scrypt are REQUIRED. The rest is OPTIONAL.

> Q: Can the symmetric encryption support an empty final chunk?
>
> I'm certain that the answer is yes, but I want to call it out
> explicitly -- since some streaming encryption implementations will not
> know that they've reached EOF until after a full chunk has been
> encrypted. This will result in an empty final chunk of encrypted data.

My implementation will actually never generate an empty final chunk,
preferring to hold on to a full buffer until it knows more is coming,
but it clearly has to be allowed for empty files, and I decided against
special-casing that. So yes, it's allowed.

This makes the payload malleable, but if you have the file key you can
also regenerate the nonce, so it seems inconsequential.
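
The "hold on to a full buffer" behaviour can be sketched as follows. This is not the Go implementation: it only shows the chunk splitting, with no encryption, and the helper names are invented for the example.

```python
import io

CHUNK_SIZE = 2**16  # chunk size mentioned earlier in this message


def _read_full(stream, n):
    """Read up to n bytes, looping over short reads until EOF."""
    buf = bytearray()
    while len(buf) < n:
        part = stream.read(n - len(buf))
        if not part:
            break
        buf += part
    return bytes(buf)


def split_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield (chunk, is_final) pairs.

    A chunk is held back until the next read shows whether more data
    follows, so an empty final chunk appears only for empty input.
    """
    held = _read_full(stream, chunk_size)
    while True:
        following = _read_full(stream, chunk_size)
        if not following:
            yield held, True
            return
        yield held, False
        held = following


print(list(split_chunks(io.BytesIO(b""), chunk_size=4)))      # [(b'', True)]
print(list(split_chunks(io.BytesIO(b"abcd"), chunk_size=4)))  # [(b'abcd', True)]
```

An encoder that flushes each full chunk immediately would instead emit `(b'abcd', False)` followed by `(b'', True)` for the second input, which is the case the question asks about.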

> Q: What is the max scrypt cost parameter?
>
> What is the upper/lower bound on scrypt cost parameter? Feel free to
> specify ridiculous outer limits, but there should be some.

On the decryption side, I have 18 as max by default, and no minimum.
(Are there scenarios where the attacker can gain something by sending a
file with an artificially low work factor?)

We should RECOMMEND that implementations make the decryption max
configurable, and set a reasonable default. I don't see a need for a
min, nor for any limits on the encryption side.
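
A decryption-side check along these lines might look like the sketch below. It assumes the header carries the work factor as log2(N), which is how the default of 18 above reads; the names are hypothetical.

```python
DEFAULT_MAX_LOG_N = 18  # decryption-side default mentioned above


def accept_work_factor(log_n, max_log_n=DEFAULT_MAX_LOG_N):
    """Return the scrypt N for an acceptable header value, else raise ValueError."""
    if not isinstance(log_n, int) or log_n < 1:
        # Not a policy minimum: scrypt itself requires N > 1 (RFC 7914).
        raise ValueError("work factor must be a positive integer")
    if log_n > max_log_n:
        raise ValueError(f"work factor {log_n} exceeds local maximum {max_log_n}")
    return 2 ** log_n


print(accept_work_factor(18))                # 262144
print(accept_work_factor(22, max_log_n=22))  # 4194304
```

Keeping the maximum as a parameter matches the recommendation that it be configurable with a reasonable default.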

> ===
> More general and possibly spec-breaking questions here:
> ===
>
> ASCII armor
>
> I think we agree that ASCII armor sucks. What's worse is that it seems
> like such an unnecessary hack with age, since the header is already
> "ASCII formatted". It would be nice if the payload (symmetric portion)
> of an age file could be encoded using some standard (fast) encoding
> scheme that isn't ASCII armor.

Heh, that's exactly how I designed it the first time around, including
spending a whole Sunday implementing it. You can still find it in the
git history.

It turned out to be a mistake. Users, myself included, would sometimes
cut parts of the headers because there was no clear BEGIN line, and the
URL in the first line breaks on contact with some Markdown parsers.
It also caused us to worry about armor-related concerns in the header
design, and introduce armor logic in the header parser, which doesn't
make sense.

PEM sucks, but it's boring, fast (you can do base64 at linespeed),
ubiquitous, familiar, and does the job. I resisted it for 2 weeks,
complained about it here on the mailing list, and then surrendered to
it. age files are binary, not malleable, and have only one encoding.
Then you can wrap them in PEM if you need them to be 7-bit safe.
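
To make the wrapper idea concrete, here is a minimal sketch of PEM-style armor around an opaque binary file. The label and the 64-column width are assumptions chosen for the example, not values fixed anywhere in this thread. The decoder re-encodes its result and compares, so that only one armored form is accepted.

```python
import base64

LABEL = "AGE ENCRYPTED FILE"  # assumed label; the thread does not fix one
WIDTH = 64                    # assumed line width for the Base64 body


def armor(data, label=LABEL):
    """Wrap binary data in a PEM-style envelope with LF line endings."""
    b64 = base64.b64encode(data).decode("ascii")
    body = [b64[i:i + WIDTH] for i in range(0, len(b64), WIDTH)]
    lines = [f"-----BEGIN {label}-----", *body, f"-----END {label}-----"]
    return "\n".join(lines) + "\n"


def dearmor(text, label=LABEL):
    """Decode an envelope, rejecting anything armor() would not have produced."""
    lines = text.split("\n")
    if len(lines) < 3 or lines[-1] != "":
        raise ValueError("armor must end with a single LF")
    if lines[0] != f"-----BEGIN {label}-----" or lines[-2] != f"-----END {label}-----":
        raise ValueError("missing BEGIN/END line")
    data = base64.b64decode("".join(lines[1:-2]), validate=True)
    if armor(data, label) != text:  # exactly one valid encoding
        raise ValueError("non-canonical armor")
    return data


payload = bytes(range(100))  # stand-in for a binary age file
assert dearmor(armor(payload)) == payload
```

The binary file stays the canonical object; the armor is a reversible layer on top of it and never reaches the header parser.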

> To do this, I would propose adding an additional (human readable) field
> to the header that gives options on how the payload is to be formatted.
> This could specify an encoding format, or just give "no encoding, raw
> octets" as the default option.

This is the kind of flexibility I don't want to let proliferate.

Choosing an armor format is hard, but different armor formats don't
actually do anything meaningfully different for the user. That's the
indicator of a choice that doesn't matter, and that we should make for
the user.

Supporting multiple would only fragment the ecosystem.
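
> scrypt recipient line/subsection
> [...]
> I think I will propose the second approach. Do you have any thoughts?

Note that ssh-ed25519 already has 2 arguments. Future or custom ones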
might have good reasons to have different structures too.

How about "ALGORITHM_IDENTIFIER (which I call recipient type by the way)
is followed by 0 or more space separated textual arguments, a newline,
and a base64 body"?
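
A parser for that proposed shape could be sketched like this. It is an illustration of the proposal only: a single-line body and standard padded Base64 are simplifying assumptions, and the recipient type in the example is made up.

```python
import base64


def parse_recipient(stanza):
    """Parse '-> TYPE [ARG ...]' LF BASE64_BODY LF into (type, args, body)."""
    if "\r" in stanza:
        raise ValueError("lines must end with LF only")
    header, sep, rest = stanza.partition("\n")
    if not sep or not rest.endswith("\n") or "\n" in rest[:-1]:
        raise ValueError("expected exactly two LF-terminated lines")
    fields = header.split(" ")
    if len(fields) < 2 or fields[0] != "->" or "" in fields:
        raise ValueError("malformed recipient line")
    recipient_type, args = fields[1], fields[2:]
    body = base64.b64decode(rest[:-1], validate=True)
    return recipient_type, args, body


print(parse_recipient("-> example-type arg1 arg2\nYm9keQ==\n"))
# ('example-type', ['arg1', 'arg2'], b'body')
```

With this shape the parser never needs to know which recipient type it is looking at: the arguments stay opaque strings until the type-specific code interprets them.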
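
> Versioning and algorithm agility
> [...]
> Way 2: An age version refers to every aspect of the age standard.

It's definitely Way 2.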

Here's how I feel about versioning and agility: users should never be
faced with an algorithm choice. Ever.

Users should be faced with use case choices. Do you want a public key?
Do you want a passphrase? Do you neeeeed FIPS? Do you want to reuse an
SSH key? Do you want to use AWS KMS? Do you need 7-bit ASCII? Then *we*
pick the best algorithms for their use case.

Then it's a matter of figuring out if we can do with a single joint,
or if we need some orthogonality. You might want to encrypt to both an
X25519 and an SSH key, so they don't make sense as separate versions.
Enter recipients. Armor makes sense as a binary choice for all files, so
it's a wrapper.

Note that recipients are not sterile algorithm choices, like the AEAD
would be, they are the core interface of age: they wrap a file key and
can be backed by a number of user-perceivably different systems. In the
Go implementation they are already an interface, and I'm thinking of
making them pluggable in the CLI too. The spec should make it clear that
applications are allowed to define their own recipient types (maybe with
a "custom-" prefix?). Of course applications shouldn't be allowed to
define their own AEADs.
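
One way to picture pluggable recipients is a small registry that accepts application-defined types only under the prefix floated above. Everything here is hypothetical, including the class and method names; it does not mirror the Go interface, and the wrap function in the example is a placeholder rather than cryptography.

```python
CUSTOM_PREFIX = "custom-"  # the prefix floated above, not a settled rule


class RecipientRegistry:
    """Maps recipient type names to callables that wrap a file key."""

    def __init__(self):
        self._wrappers = {}

    def register_custom(self, recipient_type, wrap):
        if not recipient_type.startswith(CUSTOM_PREFIX):
            raise ValueError(f"custom types must start with {CUSTOM_PREFIX!r}")
        if recipient_type in self._wrappers:
            raise ValueError(f"{recipient_type!r} is already registered")
        self._wrappers[recipient_type] = wrap

    def wrap(self, recipient_type, file_key):
        return self._wrappers[recipient_type](file_key)


registry = RecipientRegistry()
# Placeholder wrapper: a real one would encrypt the file key to some backend.
registry.register_custom("custom-example", lambda key: b"wrapped:" + key)
print(registry.wrap("custom-example", b"k"))  # b'wrapped:k'
```

Note what is absent: there is no hook for choosing the AEAD, which stays fixed by the format version.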

If we make a FIPS version we can call it v1F (or v2, whatever), and it
will ONLY support P-256 keys and AES. Now the user has a _semantic_
choice: FIPS or not FIPS? We make all the other choices for them based
on that. It has the benefit of not letting anyone do P-256+ChaCha or
X25519+AES, which make no sense.

As I type this I am extremely tempted to make the scrypt recipient
into a separate major version, since it can't be combined with
the other recipients, and it's not FIPS compliant anyway (?).
[i_have_altered_the_deal.gif]