This version reflects comments received during IETF Last Call.
The changes from the -01 version are a discussion of OpenPGP's
stripping of trailing whitespace before calculating the signature,
mention of Unicode Annex 14, and some text clean-ups/clarifications.
--
Randall Gellens
Opinions are personal; facts are suspect; I speak for myself only
-------------- Randomly-selected tag: ---------------
If I like it I say it's mine. If I don't I say it's a fake.
--Pablo Picasso
> An updated draft which is intended to replace RFC 2646 has been sent
> in; because of the crush of last-minute submissions, there may be a
> delay before the announcement appears. During this time it is
> available at
> <ftp://ftp.pensive.org/Public/Randy/draft-gellens-format-bis-02.txt>.
>
> This version reflects comments received during IETF Last Call.
>
> The changes from the -01 version are a discussion of OpenPGP's
> stripping of trailing whitespace before calculating the signature,
> mention of Unicode Annex 14, and some text clean-ups/clarifications.
Thanks for adding the OpenPGP discussion. Given the subtleness of the
issue, I believe the document should not only mention it, but also
give normative advice on how the combination of OpenPGP and
format=flowed is to be implemented. Otherwise implementors will
ignore the problem, as they do today.
When I look at how to properly implement both OpenPGP and
format=flowed, I can't come to any other conclusion than that security
is more important than maintaining soft paragraph breaks. That means
a client should not flow OpenPGP-signed data when it presents the
outcome as something that OpenPGP guarantees is what the sender sent.
If the client flows such a message, someone in transit can modify the
rendering of the message without being detected by OpenPGP.
Repeating the text from RFC 2440, saying that PGP/MIME aka RFC 3156
SHOULD be used in messaging applications, may be sufficient. Perhaps
promote it to MUST within the scope of flowed messages.
Thanks,
Simon
In particular, what is a "line"? Is it:
- zero or more characters from the canonical form of a body part,
beginning either at the start of the body part
or immediately following a CRLF, and ending with a CRLF?
- zero or more characters from the encoded form of a body part,
beginning either at the start of the body part
or immediately following a CRLF, and ending with a CRLF
whether or not it is preceded by a SP?
- zero or more characters from the canonical form of a body part,
beginning either at the start of the body part
or immediately following a CRLF, and ending with a CRLF
that isn't preceded by a SP?
In a charset that isn't compatible with ASCII, are the characters
">", SP, CR, LF treated specially using the values of those characters
from that charset, or are the octet values 0x3E, 0x20, 0x0D, 0x0A,
treated specially? Does the answer depend on the format in which
the message is stored? (e.g. if the message is stored in a file
on a system whose native charset is ASCII compatible, line endings
in the storage format might still be a combination of CR and/or LF,
but they will have no significance for the canonical form of the
text at all, since that will be UTF-16, EBCDIC, whatever.)
Here's a stab at defining this more succinctly and precisely.
(or perhaps, it's an indication of how much I misunderstood the
draft...)
If the format= parameter is set to "fixed" or the parameter is unspecified,
text/plain is to be interpreted per RFC 2046.
If the format= parameter is set to "flowed", text/plain is to be interpreted
per RFC 2046, with the following exceptions:
1. The sequence SP CR LF from the canonical form of the body part is to
be treated as follows:
a. if the delsp= parameter is set to "yes", the sequence SP CR LF is to
be ignored when displaying, printing, or otherwise presenting the body part.
b. if the delsp= parameter is set to "no", or the delsp= parameter is
unspecified, the sequence SP CR LF is to be treated as SP when displaying,
printing, or otherwise presenting the body part.
c. regardless of the value of the delsp= parameter, if the format=
parameter has a value of "flowed" the sequence SP CR LF is not treated
as a "line break". (this changes the rule in section 4.1.1 of RFC 2046
which states that CR and LF are forbidden outside of line breaks)
2. The sequence CR LF from the canonical form of the body part, when
immediately preceded by SP, is interpreted as a line break.
3. A "line" consists of zero or more characters which start immediately at
the beginning of the canonical form of the body part, or immediately following
a line break.
4. "Lines" in body parts for which format=flowed MAY be "wrapped" as necessary
to fit the width of the display or output medium, by ceasing the output of
characters along one horizontal row of the output device or medium, and
continuing the output of subsequent characters along the next horizontal row
of the output device or medium. Such wrapping SHOULD, when possible, be done
when a character sequence that is to be interpreted as SP is detected (either
a SP character, or, if delsp=no or is unspecified, the sequence SP CR LF).
5. One or more ">" characters at the start of a line are taken as an indicator
that the text on that line are a quotation. The greater number of ">"
characters, the greater the "depth" of the quotation.
6. User agents MAY display or present quotations using leading ">" characters
or in any other manner which is suitable for the output device or medium. If
">" characters are used to indicate quotations for display or presentation,
the number of ">" characters displayed SHOULD equal the number of ">"
characters at the beginning of the line in the canonical form, if the display
reasonably permits this. If some other means is used to indicate quotations
in the display or output medium, different levels of quotations SHOULD be
displayed or presented differently, so they can be distinguished by the
recipient.
7. Since the ">" notation applies to the entire "line" (as defined in #3 above),
when a quotation line is "wrapped", the entire line SHOULD be presented as if
it were a single quotation (and all at the same level of depth), even if the
line is "wrapped" for display or presentation purposes.
8. The vertical spacing between output display rows SHOULD be the same between
rows of characters within a "wrapped" line as between separate lines.
9. In all of the rules in this section, the characters CR LF SP and ">"
have code values as defined by the charset parameter, even if those values
do not correspond to those in ASCII.
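As an illustration only, rules 1 to 3 above can be sketched in a few lines of Python. This is a minimal sketch, not a full implementation: it reads rule 2 as applying to a CR LF that is *not* preceded by SP (the only reading consistent with rule 1c), it ignores quoting and space stuffing entirely, and the function name `present` is hypothetical.

```python
def present(canonical: str, delsp: bool = False) -> list[str]:
    """Split canonical text into presentation "lines" (rule 3).

    SP CR LF is dropped when delsp=yes (rule 1a) and treated as a
    single SP otherwise (rule 1b); it is never a line break (rule 1c).
    Any remaining CR LF is a line break (rule 2).
    """
    joined = canonical.replace(" \r\n", "" if delsp else " ")
    lines = joined.split("\r\n")
    # A body that ends in CR LF yields one empty trailing element; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


if __name__ == "__main__":
    print(present("one \r\ntwo\r\nthree\r\n"))
    print(present("日本 \r\n語\r\n", delsp=True))
```

Each returned element is a unit that a display may wrap freely (rule 4); quote depth (rules 5 to 7) would have to be handled before the join.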
> c. regardless of the value of the delsp= parameter, if the format=
> parameter has a value of "flowed" the sequence SP CR LF is not treated
> as a "line break". (this changes the rule in section 4.1.1 of RFC 2046
> which states that CR and LF are forbidden outside of line breaks)
>
> 2. The sequence CR LF from the canonical form of the body part, when
> NOT immediately preceded by SP, is interpreted as a line break.
^^^ add this here
(I hate it when I do that...)
> 5. One or more ">" characters at the start of a line are taken as an indicator
> that the text on that line are a quotation.
^^ should be "is"
> 7. Since the ">" notation applies to the entire "line" (as defined in #3 above),
> when a quotation line is "wrapped", the entire line SHOULD be presented as if
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ delete this part (it's redundant)
> it were a single quotation (and all at the same level of depth), even if the
> line is "wrapped" for display or presentation purposes.
Keith
> if the format= parameter has a value of "flowed" the sequence SP CR LF
> is not treated as a "line break".
>
> 3. A "line" consists of zero or more characters which start
> immediately at the beginning of the canonical form of the body part,
> or immediately following a line break.
>
> 4. "Lines" in body parts for which format=flowed MAY be "wrapped" as
> necessary to fit the width of the display or output medium,
Your description misses one subtlety: while one or more flowed lines and
a following fixed line constitute a unit that invites re-wrapping, a
fixed line not preceded by a flowed line does not invite re-wrapping.
Also, I think "paragraph" is a more intuitive term for the concept
you're calling "line". It's useful to let "line" continue to mean what
it always has meant: a sequence of characters terminated by CR LF.
You raise a very good point about the charset. The grammar in section 7
allows only ASCII characters, even though the delsp parameter is being
introduced for the express purpose of better supporting non-ASCII
scripts. I think that can easily be fixed by redoing the non-sp
production:
non-sp = <any character except NUL, CR, LF, SP>
There are some other issues with the grammar. First, I don't see the
reason for distinguishing between unquoted lines and quoted lines.
Every line has a quote depth, which might happen to be zero, but lines
with zero quote depth are not treated specially. The grammar currently
says:
flowed-line = flow-qt / flow-unqt
flow-qt = quote [stuffing] *text-char 1*SP CRLF
flow-unqt = [stuffing] *text-char 1*SP CRLF
quote = 1*">"
Why not simply:
flowed-line = quote [stuffing] *text-char 1*SP CRLF
quote = *">"
In any case, the grammar for flowed-line is ambiguous: the spaces before
the last space could match either 1*SP or *text-char. It's only the
last space that's special; the others are arbitrary text that ought to
match *text-char. So the production should be:
flowed-line = quote [stuffing] *text-char SP CRLF
(Similarly, section 5.2 makes the rule sound more complex than it really
is when it says "If the line ends in one or more spaces, the line is
flowed." The rule is really just "If the line ends in a space, the line
is flowed.")
Even after that last adjustment, the production is still ambiguous. A
space at the beginning of a line could match [stuffing] or *text-char.
A line that does not begin with a space could still match stuffing,
because stuffing is defined as [SP]. The decoder needs to be able to
distinguish between stuffed lines and unstuffed lines, because it's
supposed to display them differently. Also, an initial greater-than
sign could match quote or *text-char. The decoder needs to determine
the quote depth unambiguously. This will do the trick:
flowed-line = quote (stuffing stuffed / unstuffed) SP CRLF
stuffed = *text-char
unstuffed = non-sp-quote *text-char
quote = *">"
stuffing = SP
non-sp-quote = <any character except NUL, CR, LF, SP, ">">
I think that is unambiguous. The productions for fixed-line have
similar issues.
The grammar gives a name to the space at the beginning of a line because
it is in some sense not part of the regular text. Now that the delsp
parameter is introduced, there should be a name for the space at the end
of a line, because it's not part of the regular text when delsp=yes.
And there should be a name for the regular text itself.
Here's a stab at an unambiguous grammar with a name for every noteworthy
syntactic unit:
flowed-body = * ( paragraph / fixed-line )
paragraph = 1*flowed-line fixed-line
flowed-line = quote (stuffing stuffed-flowed / unstuffed-flowed) soft CRLF
fixed-line = quote (stuffing stuffed-fixed / unstuffed-fixed) CRLF
stuffed-flowed = *text-char
unstuffed-flowed = non-sp-quote *text-char
stuffed-fixed = [*text-char non-sp]
unstuffed-fixed = non-sp-quote [*text-char non-sp]
quote = *">"
stuffing = SP
soft = SP
non-sp-quote = <any character except NUL, CR, LF, SP, ">">
non-sp = non-sp-quote / ">"
text-char = non-sp / SP
There is an additional rule that is impossible to express in the
grammar: a flowed line must have the same quote depth as the next line.
A flowed line that breaks this rule (has a quote depth different from
the next line) is to be interpreted as if it were a fixed line.
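To check that the flowed-line and fixed-line productions above really are unambiguous, they can be transcribed into regular expressions. The following is a rough sketch under stated assumptions: the names `classify`, `FLOWED` and `FIXED` are hypothetical, lines are passed in without their CRLF, and neither the quote-depth rule nor the Usenet signature line is handled. Note that, as I read the productions, a completely empty line (or a line of only ">" characters) matches neither alternative, so the sketch returns None for it.

```python
import re

TEXT = r"[^\x00\r\n]"     # text-char    = non-sp / SP
NON_SP = r"[^\x00\r\n ]"  # non-sp       = any character except NUL, CR, LF, SP
NSQ = r"[^\x00\r\n >]"    # non-sp-quote = non-sp except ">"

# flowed-line = quote (stuffing stuffed-flowed / unstuffed-flowed) soft
FLOWED = re.compile(
    rf"(?P<quote>>*)(?: (?P<stuffed>{TEXT}*)|(?P<unstuffed>{NSQ}{TEXT}*)) "
)
# fixed-line = quote (stuffing stuffed-fixed / unstuffed-fixed)
FIXED = re.compile(
    rf"(?P<quote>>*)(?: (?P<stuffed>(?:{TEXT}*{NON_SP})?)"
    rf"|(?P<unstuffed>{NSQ}(?:{TEXT}*{NON_SP})?))"
)


def classify(line: str):
    """Return (flow_type, quote_depth, was_stuffed, content), or None."""
    for flow_type, pattern in (("flowed", FLOWED), ("fixed", FIXED)):
        m = pattern.fullmatch(line)
        if m:
            was_stuffed = m.group("stuffed") is not None
            content = m.group("stuffed") if was_stuffed else m.group("unstuffed")
            return flow_type, len(m.group("quote")), was_stuffed, content
    return None


if __name__ == "__main__":
    for sample in ("plain", "text ", ">> quoted text ", " >not a quote", ""):
        print(repr(sample), classify(sample))
```

Because a flowed line must end in SP and fixed content must be empty or end in a non-space, no line can match both patterns.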
AMC
I would rather treat a "flowed line" as a single entity than as
multiple entities. But you're right in that a "fixed line" and "flowed line"
should be presented differently, to the extent possible.
I also left out space stuffing.
> Also, I think "paragraph" is a more intuitive term for the concept
> you're calling "line".
I avoided using the term "paragraph" because that has semantics other
than for presentation.
Personally, I don't think the grammar helps much. Most readers will
pay less attention to the grammar than the text. It's more important
to clean up the text than the grammar. But if the text were cleaned
up, cleaning up the grammar would also be useful.
e.g.
text-body = *( fixed-line / flowed-line )
( fixed-line / flowed-line / fragment )
fixed-line = [quote] *text-char CRLF
flowed-line = [stuffing] [quote] 1*( *text-char SP CRLF )
*text-char text-char-no-sp CRLF
fragment = [quote] *text-char
...
I don't like using the term "soft" for the SP CR LF sequence in a flowed
line because it's too easily confused with Q-P soft line breaks,
even though they have nothing to do with one another.
Also, I'm concerned that people will implement format=flowed differently
for objects depending on the content-transfer-encoding, when the
content-transfer-encoding should be irrelevant.
> There is an additional rule that is impossible to express in the
> grammar: a flowed line must have the same quote depth as the next line.
> A flowed line that breaks this rule (has a quote depth different from
> the next line) is to be interpreted as if it were a fixed line.
ah yes, another thing that I missed.
Keith
> I have to say that I find both RFC 2646 and this draft fairly opaque and
> somewhat ambiguous.
> In particular, what is a "line"? is it:
> - zero or more characters from the canonical form of a body part,
> beginning either at the start of the body part
> or immediately following a CRLF, and ending with a CRLF?
> - zero or more characters from the encoded form of a body part,
> beginning either at the start of the body part
> or immediately following a CRLF, and ending with a CRLF
> whether or not it is preceded by a SP?
> - zero or more characters from the canonical form of a body part,
> beginning either at the start of the body part
> or immediately following a CRLF, and ending with a CRLF
> that isn't preceded by a SP?
I'll leave this for Randy to answer.
> In a charset that isn't compatible with ASCII, are the characters
> ">", SP, CR, LF treated specially using the values of those characters
> from that charset, or are the octet values 0x3E, 0x20, 0x0D, 0x0A,
> treated specially?
If a charset isn't compatible with ASCII insofar as CR and LF are concerned it
cannot be used with the text top-level type. (RFC 2046 section 4.1.1.) So this
is a vacuous concern.
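For readers who want to see what "compatible insofar as CR and LF are concerned" means at the octet level, here is a small sketch. The helper name `crlf_is_ascii` is hypothetical, and the test it applies (CR LF must encode to exactly the octets 0x0D 0x0A) is my reading of the RFC 2046 requirement, not text from it.

```python
def crlf_is_ascii(charset: str) -> bool:
    """True if the charset encodes CR LF as the two octets 0x0D 0x0A."""
    return "\r\n".encode(charset) == b"\r\n"


if __name__ == "__main__":
    sample = "> quoted \r\n"
    for charset in ("us-ascii", "utf-8", "iso-8859-1", "utf-16-le"):
        octets = sample.encode(charset)
        print(f"{charset:12} ascii-crlf={crlf_is_ascii(charset)}  {octets.hex(' ')}")
```

In UTF-16 every character occupies at least two octets, so the octet pair 0x0D 0x0A never appears for a line break, which is why such an encoding falls outside the text top-level type.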
As for ">" and SP, the usual way this is handled is for the specification
to work with those characters regardless of how they are represented.
(You won't find many charsets that muck with space, but there are some
that are modal and depending on mode may use ">" for some other purpose.)
I suppose you could define a charset without a ">" character, or even
one without space, but as a practical concern this seems a bit farfetched.
> does the answer depend on the format in which
> the message is stored? (e.g. if the message is stored in a file
> on a system whose native charset is ASCII compatible, line endings
> in the storage format might still be a combination of CR and/or LF,
> but they will have no significance for the canonical form of the
> text at all, since that will be UTF-16, EBCDIC, whatever.)
This was decided eons ago. We define things in terms of the canonical form of
the data, and in the canonical form of MIME text line breaks are CRLF and CRLF
is always represented the same way. If a different representation is used on
some system (EBCDIC, counted records, whatever) it is for that system to figure
out how to adapt the specification accordingly. Trying to account for all the
vagaries of local storage is a sure path to madness.
I guess I have no objection to this reformulation, but OTOH it really isn't
necessary to restate that we're dealing with the canonical form of the data.
Ned
> I avoided using the term "paragraph" because that has semantics other
> than for presentation.
Okay, but I think we need to keep the term "line" referring to its
intuitive and customary sense, so that we can say things like "each
line may begin with one or more quote characters" and "a space at the
beginning or end of a line (after quoting has been removed) acts as a
flag".
I think it would be confusing, therefore, to use a term like "flowed
line" to refer to something that spans multiple lines. I think the
current meaning of "flowed line" (a line that ends in a space) is more
intuitive.
If "paragraph" needs to be avoided, perhaps a new term could be
invented, like "snake" or "flow-group".
> I don't like using the term "soft" for the SP CR LF sequence in a
> flowed line because it's too easily confused with Q-P soft line
> breaks, even though they have nothing to do with one another.
I was using "soft" to refer just to the SP, not the CR LF. I was
calling the single spaces at the beginning and end of the line
"stuffing" and "soft", respectively. Perhaps better names would be
"stuff" and "flow".
> I'm concerned that people will implement format=flowed differently
> for objects depending on the content-transfer-encoding, when the
> content-transfer-encoding should be irrelevant.
That ought to follow from the MIME architecture (format=flowed is a
parameter of the content-type, therefore the content-transfer-encoding
is indeed irrelevant), but it wouldn't hurt to mention it explicitly.
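A short sketch of that layering, for illustration: undo the content-transfer-encoding first, and only then look at the text/plain lines. The helper `decode_cte` is hypothetical; it uses only the standard `quopri` and `base64` modules, and the sample payloads are made up for the example.

```python
import base64
import quopri


def decode_cte(payload: bytes, cte: str) -> bytes:
    """Undo the content-transfer-encoding, yielding the text/plain octets."""
    cte = cte.lower()
    if cte == "quoted-printable":
        return quopri.decodestring(payload)
    if cte == "base64":
        return base64.b64decode(payload)
    return payload  # 7bit, 8bit, binary: nothing to undo


if __name__ == "__main__":
    canonical = b"A flowed \r\nparagraph.\r\n"
    as_qp = b"A flowed=20\r\nparagraph.\r\n"  # trailing SP protected as =20
    as_b64 = base64.b64encode(canonical)

    for cte, payload in (("7bit", canonical),
                         ("quoted-printable", as_qp),
                         ("base64", as_b64)):
        # All three print the same octets; format=flowed sees no difference.
        print(cte, decode_cte(payload, cte))
```

Whatever format=flowed processing follows should operate on the result of `decode_cte`, never on the encoded payload.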
> It's more important to clean up the text than the grammar.
Try this: Forget everything you've read in the draft and just read this
excerpt (45 lines):
If the first character of a line is a quote mark (">"), the line is
considered to be quoted (see section 5.5). Logically, all quote
marks are counted and deleted, resulting in a line with a non-zero
quote depth, and content. (The agent is of course free to display
the content with quote marks or excerpt bars or anything else.)
Logically, this test for quoted lines is done before any other tests
(that is, before checking for space-stuffed and flowed).
If the first character of a line is a space, the line has been
space-stuffed (see section 5.4). Logically, this leading space is
deleted before examining the line further (that is, before checking
for flowed).
If the line ends in one or more spaces, the line is flowed.
Otherwise it is fixed.
If the line is flowed and DelSp is "yes", the trailing space
immediately prior to the line's CRLF is logically deleted. If the
DelSp parameter is "no" (or not specified, or set to an unrecognized
value), the trailing space is not deleted.
Any remaining trailing spaces are part of the line's content, but
the CRLF of a soft line break is not.
A series of one or more flowed lines followed by one fixed line is
considered a paragraph, and MAY be flowed (wrapped and unwrapped) as
appropriate on display and in the construction of new messages (see
section 5.5).
A line consisting of one or more spaces (after deleting a stuffed
space) is considered a flowed line.
An empty line (just a CRLF) is a fixed line.
There is a convention in Usenet news of using "-- " as the separator
line between the body and the signature of a message. When
generating a Format=Flowed message containing a Usenet-style
separator before the signature, the separator line is sent as-is.
This is a special case; an (optionally quoted) line consisting of
DASH DASH SP is not considered flowed.
Whenever two adjacent lines have different quote depths, senders
should ensure that the earlier line is fixed (does not end in
a space), and receivers should treat the earlier line as fixed
regardless of whether it ends with a space.
That's just sections 5.2 and 5.3 and one paragraph from 5.5. I find
that if I focus only on this 45-line excerpt and ignore the rest, I
completely understand format=flowed and delsp=yes. Do you agree (except
for the content-transfer-encoding concern)?
Perhaps it might help to move this material earlier. Imagine swapping
sections 5.1 (Generating format=flowed) and 5.2 (Interpreting
format=flowed). I find it much easier to understand a format first from
a decoder's point of view ("this is what can appear, and this is what
it means"). Then, after I understand the format, I can more easily
understand rules/recommendations aimed at encoders.
I think it might also help to include all the essentials of the decoding
algorithm in the interpreting section. Right now, two details are
omitted and sprung on the reader in later sections: the Usenet sig
exception and the changing-quote-depth exception. The interpreting
section could include them concisely and provide forward references,
just like it already does for quoting and space-stuffing.
It's easier to follow an explanation if you have a preview of where
it's heading. Consider an interpreting section that began with such a
preview and then gave all the steps:
An interpreter of format=flowed text processes the text line by
line. ("Line" refers to lines of the text/plain data, because
format=flowed is a parameter of text/plain. "Line" does not refer
to lines of any quoted-printable, base64, or other encoding of the
text/plain data.) Each line can have characters removed from the
beginning and/or end, and each line is tagged with a quote-depth (a
non-negative integer) and a flow-type ("fixed" or "flowed"). The
tags are used to group lines into paragraphs/snakes/flow-groups that
can be re-wrapped for display or construction of other messages.
For each line (in order), the following steps are applied (in
order):
1. All quote marks (">") are removed from the beginning of the line
and counted; the count becomes the quote-depth of the line. [other
remarks] [forward reference to quoting section]
2. If the line is not the first line, and if its quote-depth differs
from the quote-depth of the previous line, then the previous line
is expected to have a flow-type of "fixed". In properly generated
text, that will be true; if the previous line's flow-type is
"flowed" then the text was generated improperly. In that case,
reset the flow-type of the previous line to "fixed", and re-do
step 7 for that line. [forward reference to quoting section]
[[ Suggestion: Perhaps, instead of overriding the
flow-type, the line should be left as flowed, but the
paragraph/snake/flow-group is nevertheless terminated, resulting
in an improper paragraph/snake/flow-group that does not end
with a fixed line. This would yield different treatment for a
single flowed line followed by a change in quote-depth. The
existing rules change it to a single fixed line, which would not
be re-wrapped. But clearly it was intended to be re-wrappable.
The suggested new rule would allow it to be re-wrapped. The
existing draft contains the sentence "the change in quote depth
ends the paragraph", which is inconsistent with the rest of the
existing changing-quote-depth rule, because when a single flowed
line is changed to fixed, there is no paragraph. But that
sentence would be consistent with the suggested new rule. ]]
3. If the line is "-- " (dash dash space) then set the flow-type
to "fixed" and go on to the next line (do not proceed to step 4).
[forward reference to Usenet sig section]
[[ Suggestion: Maybe a sig line should also terminate a
paragraph/snake/flow-group, same as a change in quote-depth.
Otherwise a sig line could get re-wrapped so that it no longer
appears on a line by itself. ]]
4. If the line begins with a space, the space is removed. [other
remarks] [forward reference to space-stuffing section]
5. If the line ends with a space then set the flow-type to "flowed",
otherwise set it to "fixed".
6. If the line is flowed (that is, its flow-type is "flowed") and
delsp is "yes" then remove the space at the end of the line.
7. If the line is flowed then it is part of a growing
paragraph/snake/flow-group. If the line is fixed and is preceded
by a flowed line, then the fixed line is the last line of the
paragraph/snake/flow-group, which MAY be wrapped and unwrapped as
appropriate for display and construction of new messages. If the
line is fixed and is not preceded by a flowed line, then it is not
part of a paragraph/snake/flow-group. [forward reference] [Unicode
reference].
[[ Suggestion: The current draft says nothing about what
a decoder should do if the very last line is flowed. I
suggest that end-of-input be yet another thing (along with
quote-depth changes and sig lines) that can improperly terminate
a paragraph/snake/flow-group. ]]
Note that multiple consecutive spaces have no special significance.
Only the single spaces at the beginning and end of a line have a
special meaning; any others are simply part of the line's content.
Steps 4 and 6 delete at most one space each.
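To make the seven steps concrete, here is a minimal Python sketch of them. It is an illustration under stated assumptions, not a reference decoder: the names `Line`, `tag_lines` and `flow_groups` are hypothetical, the input is the already-decoded text/plain data with CRLF line endings, and of the bracketed suggestions only the end-of-input one is adopted (a trailing flowed line simply closes the last group).

```python
from dataclasses import dataclass


@dataclass
class Line:
    depth: int  # quote-depth (step 1)
    flow: str   # "fixed" or "flowed" (step 5)
    text: str   # content after quote marks and flag spaces are removed


def tag_lines(body: str, delsp: bool = False) -> list[Line]:
    """Apply steps 1 to 6 to every line of the body."""
    raw_lines = body.split("\r\n")
    if raw_lines and raw_lines[-1] == "":
        raw_lines.pop()  # the body ended with CRLF
    out: list[Line] = []
    for raw in raw_lines:
        text = raw.lstrip(">")                  # step 1: strip and count quotes
        depth = len(raw) - len(text)
        if out and out[-1].depth != depth:      # step 2: depth change forces
            out[-1].flow = "fixed"              #         previous line to fixed
        if text == "-- ":                       # step 3: Usenet sig separator
            out.append(Line(depth, "fixed", text))
            continue
        if text.startswith(" "):                # step 4: remove stuffing space
            text = text[1:]
        flow = "flowed" if text.endswith(" ") else "fixed"   # step 5
        if flow == "flowed" and delsp:          # step 6: delsp=yes drops the flag
            text = text[:-1]
        out.append(Line(depth, flow, text))
    return out


def flow_groups(lines: list[Line]) -> list[tuple[int, bool, str]]:
    """Step 7: return (quote_depth, rewrappable, text) for each group."""
    groups, current = [], []
    for line in lines:
        current.append(line)
        if line.flow == "fixed":
            groups.append(current)
            current = []
    if current:  # input ended on a flowed line
        groups.append(current)
    return [(g[0].depth, len(g) > 1, "".join(l.text for l in g)) for g in groups]


if __name__ == "__main__":
    body = (">> Quoted text that \r\n>> continues here.\r\n"
            "Unquoted \r\nparagraph.\r\n-- \r\nSig\r\n")
    for group in flow_groups(tag_lines(body)):
        print(group)
```

One detail the sketch leaves open: when step 2 retroactively marks a line as fixed under delsp=yes, the space already removed in step 6 is not restored.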
Would an early section along those lines be at all helpful?
By the way, I wonder what purpose section 5.7 serves. It looks like
just a very verbose way of saying "A line is quoted iff it begins with a
quote indicator. A line is flowed iff it ends with a space." I don't
see the point of the first statement, because every quote depth needs
to be displayed differently--I see nothing special about the boundary
between zero and non-zero. The second statement is imprecise because it
neglects the Usenet sig exception. The author might want to consider
simply removing section 5.7.
AMC
> Thanks for adding the OpenPGP discussion. Given the subtleness of the
> issue, I believe the document should not only mention it, but also
> give normative advice on how the combination of OpenPGP and
> format=flowed is to be implemented. Otherwise implementors will
> ignore the problem, as they do today.
>
> When I look at how to properly implement both OpenPGP and
> format=flowed, I can't come to any other conclusion than that security
> is more important than maintaining soft paragraph breaks. That means
> a client should not flow OpenPGP-signed data when it presents the
> outcome as something that OpenPGP guarantees is what the sender sent.
> If the client flows such a message, someone in transit can modify the
> rendering of the message without being detected by OpenPGP.
>
> Repeating the text from RFC 2440, saying that PGP/MIME aka RFC 3156
> SHOULD be used in messaging applications, may be sufficient. Perhaps
> promote it to MUST within the scope of flowed messages.
The current text says to use quoted-printable to protect the trailing
spaces so that the signature is calculated on the on-the-wire format:
5.6. Digital Signatures and Encryption
If a message is digitally signed or encrypted it is important that
cryptographic processing use the on-the-wire Format=Flowed format.
That is, during generation the message SHOULD be prepared for
transmission, including addition of soft line breaks,
space-stuffing, and [Quoted-Printable] encoding (to protect soft
line breaks) before being digitally signed or encrypted; similarly,
on receipt the message SHOULD have the signature verified or be
decrypted before [Quoted-Printable] decoding and removal of stuffed
spaces, soft line breaks and quote marks, and reflowing.
Note that [OpenPGP] specifies (in section 7.1) that "any trailing
whitespace (spaces, and tabs, 0x09) at the end of any line is
ignored when the cleartext signature is calculated."
Thus it would be possible to add, in transit, a format=flowed header
to a regular, format=fixed vanilla PGP (not PGP/MIME) signed message
and add arbitrary trailing space characters without this addition
being detected. This would change the rendering of the article by a
client which supported format=flowed.
In thinking about this some more, I'm not sure that the extra text on
OpenPGP is really needed, since if the text above is followed it
shouldn't be an issue.
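The attack described in the quoted paragraph can be illustrated with a toy model. This is only a sketch: a SHA-256 digest over the text with trailing spaces and tabs removed stands in for the real OpenPGP cleartext signature (which also involves dash-escaping and an actual key), and the name `cleartext_digest` and the sample message are hypothetical.

```python
import hashlib


def cleartext_digest(text: str) -> str:
    """Digest of the text with trailing spaces/tabs dropped from each line.

    Stand-in for the hash an OpenPGP cleartext signature is computed over.
    """
    canonical = "\r\n".join(line.rstrip(" \t") for line in text.split("\r\n"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    original = "Items:\r\n1. apples\r\n2. pears\r\n"
    # Someone in transit appends a space to each line and adds format=flowed.
    tampered = "Items: \r\n1. apples \r\n2. pears\r\n"

    print(cleartext_digest(original) == cleartext_digest(tampered))
```

The two digests are identical, yet a format=flowed reader would present the tampered text as the single paragraph "Items: 1. apples 2. pears" instead of three separate lines.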
--
Randall Gellens
The problem is that the above procedure is flawed, so it is not always
possible to use it in the real world. There are several problems with
that text; I believe the two major issues are:
1) It leads to invalid MIME messages on the wire. After following the
above procedure, what is sent is a message marked with CTE qp but
only the contents of the inline PGP message actually follow the QP
rules. The PGP armor itself does not follow the QP rules. More
precisely, the base64 '=' character that is always part of the
final CRC value would be an unescaped '=' character. So transit or
recipient systems might reject it due to invalid QP, and IMAP
servers might refuse to search within the message, or refuse it
altogether because it isn't valid MIME.
This problem could be solved by performing QP encoding on the PGP
armor itself as well, after signing the QP encoded message body.
The text above does not suggest that (there is even a SHOULD
arguing otherwise), and existing clients do not appear to perform
QP on the PGP armor (at least not those I'm familiar with).
Furthermore, because existing clients do not QP encode the PGP
armor, I doubt existing clients would understand they have to first
QP decode the PGP armor, and then pass the resulting QP decoded PGP
armor plus QP encoded body to the PGP implementation. Still,
adding such a clarification would be a way forward.
2) Non-ASCII text within the armor (e.g., 'Comments:') does exist; some
translated products even add such strings by default. If the
charset used for the armor isn't the same as that of the message body,
it is not even possible to QP encode the armor. I don't see a solution
to this problem. It has been argued in the past that the PGP armor
cannot contain non-ASCII, but I believe it remains to be clarified
since it is frequent in the wild.
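Problem 1 can be demonstrated with a strict syntax check on individual lines. This is a hedged sketch: `is_strict_qp_line` is a hypothetical helper implementing a strict reading of the QP rules (an '=' must be followed by two uppercase hex digits or end the line as a soft line break), and the checksum line shown is an illustrative placeholder, not a real CRC. Python's own `quopri` module is lenient about stray '=' characters, which is why a separate check is written out here.

```python
import re

# '=' must introduce two uppercase hex digits, or end the line (soft break).
_ESCAPE = re.compile(r"=(?:[0-9A-F]{2}|$)")


def is_strict_qp_line(line: str) -> bool:
    """Check one line (without its CRLF) for well-formed '=' usage."""
    pos = line.find("=")
    while pos != -1:
        if not _ESCAPE.match(line, pos):
            return False
        pos = line.find("=", pos + 1)
    return True


if __name__ == "__main__":
    body_line = "caf=E9 au lait=20"  # QP-encoded signed text
    crc_line = "=njUN"               # illustrative armor checksum line
    print(is_strict_qp_line(body_line))
    print(is_strict_qp_line(crc_line))
```

A checksum that happened to begin with two hex digits would pass this check but would then be corrupted by QP decoding, so passing the syntax check does not make the armor safe either.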
Saying that cryptographic processing is something that should be done,
on the generating side, after all 822 and MIME processing, and on the
receiving side, before all 822 and MIME processing, does not work. It
leads to the above problems.
Before we lose sight of the big picture, I'd like to repeat that none
of these issues apply if PGP/MIME is used, which is what RFC 2440 says
SHOULD be used. (Well, at least, the problems do not _necessarily_
apply; PGP/MIME can be implemented incorrectly in ways that lead to
similar problems.) Unless it can be demonstrated that implementing PGP and
format=flowed is possible without breaking the MIME standard, I still
maintain that qualifying the SHOULD to a MUST, within the scope of
format=flowed messages, is better.
Thanks,
Simon
> In <iluy8uj...@latte.josefsson.org> Simon Josefsson <simon+i...@josefsson.org> writes:
>
>>The problem is that the above procedure is flawed, so it is not always
>>possible to use it in the real world. There are several problems with
>>that text, I believe the two major issues are:
>
>>1) It leads to invalid MIME messages on the wire. After following the
>> above procedure, what is sent is a message marked with CTE qp but
>> only the contents of the inline PGP message actually follow the QP
>> rules. The PGP armor itself does not follow the QP rules.
>
> No, I don't think that is right. There are basically two ways of signing
> PGP messages:
>
> 1. The "usual" way. You construct a text A, pass it through a PGP signing
> engine to get text B (it will do nasty things like changing every initial
> "---" on your lines to "- --"). Text B will contain the usual PGP
> wrappers, your text, and the PGP sig.
>
> You then email Text B, and if you can persuade your mailer to encode it as
> 7bit, then any trailing spaces (which were not included in the PGP hash,
> but are still present in your text) will be protected against munging en
> route. Note that it is your original text that is signed, not the QP
> version.
I agree with you that it isn't right, but it is what the draft says.
That's my point. The text was quoted in my reply, but here is the
text again:
5.6. Digital Signatures and Encryption
If a message is digitally signed or encrypted it is important that
cryptographic processing use the on-the-wire Format=Flowed format.
That is, during generation the message SHOULD be prepared for
transmission, including addition of soft line breaks,
space-stuffing, and [Quoted-Printable] encoding (to protect soft
line breaks) before being digitally signed or encrypted; similarly,
on receipt the message SHOULD have the signature verified or be
decrypted before [Quoted-Printable] decoding and removal of stuffed
spaces, soft line breaks and quote marks, and reflowing.
The text says that MUAs SHOULD do things differently from what you
describe.
FWIW, your approach has the security problem that started this thread;
it makes it possible for someone in transit to add trailing SPC to the
PGP message, and add a format=flowed tag to the headers, without being
detected.
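To make the attack concrete, here is a minimal sketch in Python. It is
an illustration under stated assumptions, not a PGP implementation: a
bare SHA-256 digest stands in for the signature, canonical_cleartext()
only approximates OpenPGP's cleartext canonicalization (trailing spaces
and tabs stripped, CRLF line endings), and render_flowed() is a
simplified reflow that ignores quoting and space-stuffing. All function
names are hypothetical.

```python
import hashlib


def canonical_cleartext(text: str) -> bytes:
    """Approximate the text a cleartext signature is computed over:
    trailing spaces and tabs are dropped, line endings become CRLF."""
    lines = [line.rstrip(" \t") for line in text.split("\n")]
    return "\r\n".join(lines).encode("utf-8")


def signed_digest(text: str) -> str:
    # Stand-in for the signature; only the hash input matters here.
    return hashlib.sha256(canonical_cleartext(text)).hexdigest()


def render_flowed(text: str) -> list:
    """Simplified format=flowed reflow: a line ending in a space is
    joined with the line that follows it."""
    paragraphs, current = [], ""
    for line in text.split("\n"):
        current += line
        if not line.endswith(" "):
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)
    return paragraphs


original = "Pay Alice 100\nnot Bob"
tampered = "Pay Alice 100 \nnot Bob"  # one trailing SPC added in transit

# The hash input is identical, so the signature still verifies...
print(signed_digest(original) == signed_digest(tampered))
# ...but a flowing client now renders one paragraph instead of two lines.
print(render_flowed(original))
print(render_flowed(tampered))
```

The added space changes nothing that is hashed, yet once a
format=flowed tag is also added to the headers, the two lines are
displayed as a single paragraph.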
I haven't seen a way to support both inline PGP and format=flowed
without creating one problem or another.
Btw, it is not clear if the paragraph applies to PGP/MIME too. Because
it uses generic terminology, I assume it applies to all cryptographic
formats. RFC 3156 already discusses these issues, repeating them again
with normative text might just be confusing. And my claim is that,
for inline PGP, the normative text doesn't work.
I'd love to be proved wrong on this, since currently I'm stuck in my
implementation of inline PGP and format=flowed, because of my
perceived problems.
I think Gellens is right that there are more modes than what
you describe. There are at least 4 incompatible ways to do inline
PGP: QP before signing, QP after signing (but don't touch PGP armor),
QP after signing (including PGP armor), and no QP at all. I have seen
all of them in the wild. Incidentally, my experience is that the last
one works best, although it breaks the most specifications. The
format=flowed draft says only one of them SHOULD be used even though it
violates MIME.
> 2. Use RFC 3156 (PGP/MIME).
Right. This is what I believe should be used instead.
> Just for the hell of it, I shall sign this message both ways (so the outer
> signature will actually sign the inner one).
It wasn't a good example, since you didn't use QP. For pure 7-bit
messages, there are no ambiguities at all, as far as I know. Well,
except that you might prefer to use QP to protect leading '-' from
dash escaping ('- -'), otherwise RFC 1991 implementations cannot
understand the message. This is also not written down anywhere, I
believe.
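A small Python sketch of the escaping in question, for illustration
only. dash_escape() follows the OpenPGP cleartext rule that a line
starting with '-' is prefixed with '- '. qp_protect_leading_dash() is a
hypothetical helper showing just the one QP substitution mentioned
above; a complete Quoted-Printable encoder would also have to encode
'=' and other characters.

```python
def dash_escape(text: str) -> str:
    """Prefix every line that starts with '-' with '- ', as cleartext
    signing does."""
    return "\n".join(
        "- " + line if line.startswith("-") else line
        for line in text.split("\n")
    )


def qp_protect_leading_dash(text: str) -> str:
    """Hypothetical helper: write a leading '-' as the QP escape '=2D'
    so the signing step never sees a line that starts with a dash."""
    return "\n".join(
        "=2D" + line[1:] if line.startswith("-") else line
        for line in text.split("\n")
    )


body = "Options:\n--verbose\nplain line"

print(dash_escape(body))                           # second line is altered
print(dash_escape(qp_protect_leading_dash(body)))  # nothing left to escape
```

With the dash QP-encoded first, the signed text passes through dash
escaping unchanged, which is the property the paragraph above relies on.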
Regards,
Simon
--On Friday, November 14, 2003 22:46 +0100 Simon Josefsson
<simon+i...@josefsson.org> wrote:
| I haven't seen a way to support both inline PGP and format=flowed
| without creating one problem or another.
That's been my experience and in fact I deliberately turn off format=flowed
when generating inline PGP signed messages. It remains on for PGP/MIME
messages (which is the default for new users).
| I'd love to be proved wrong on this, since currently I'm stuck in my
| implementation of inline PGP and format=flowed, because of my
| perceived problems.
|
| I thing Gellens is right in that the there are more modes than what
| you describe. There are at least 4 incompatible ways to do inline
| PGP; QP before, QP after sign (but don't touch PGP armor), QP after
| sign (including PGP armor), don't QP at all. I have seen all of them
| in the wild. Incidentally, my experiences is that the last one works
| best, although it breaks the most specifications. The format=flowed
| draft says only one of the SHOULD be used even though it violate MIME.
I think you need to look at this problem from the recipient's end to
actually decide what it is the sender needs to do. In particular it's
important that the signature verifies for both types of recipients: those
that are format=flowed aware and those that aren't. The latter type
effectively determines the order of processing for flowing, PGP signing and
CTE:
A non-format=flowed aware client, when processing a received format=flowed
message, will first do CTE decoding and then display the message.
Verification of the signature will then be done on the unflowed text (i.e.
with trailing spaces and CRLFs etc).
Thus a format=flowed aware client that generates a message would first wrap
the text doing flow, then do the PGP signature, and then do CTE encoding.
That is the only way to work with non-aware clients.
That means a format=flowed aware client, when processing a received
format=flowed message, will first do CTE decoding. At that point it will
need to check for inline PGP and verify, then do flowing. Alternatively, it
will do CTE decode, flowing and display to the user. But then if the user
wants to verify, it will have to go back to the CTE decoded but not flowed
version.
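The ordering described above can be sketched in Python as follows. This
is only an outline under loud assumptions: stand_in_sign() is a plain
digest, not a PGP signature (real cleartext signing ignores trailing
whitespace, which this stand-in does not model), flow_wrap() is a
deliberately naive wrapper, and all function names are hypothetical.
The standard quopri module plays the role of the CTE step.

```python
import hashlib
import quopri


def flow_wrap(paragraph: str, width: int = 72) -> str:
    """Naive format=flowed wrapping: break at spaces and leave a
    trailing space (soft line break) on every line except the last."""
    lines, current = [], ""
    for word in paragraph.split(" "):
        if current and len(current) + len(word) + 1 > width:
            lines.append(current + " ")
            current = word
        else:
            current = word if not current else current + " " + word
    lines.append(current)
    return "\n".join(lines)


def stand_in_sign(text: str) -> str:
    # NOT a PGP signature: a digest over the text exactly as given.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def generate(paragraph: str):
    flowed = flow_wrap(paragraph)                        # 1. flow
    signature = stand_in_sign(flowed)                    # 2. sign
    wire = quopri.encodestring(flowed.encode("utf-8"))   # 3. CTE encode
    return wire, signature


def receive(wire: bytes, signature: str):
    decoded = quopri.decodestring(wire).decode("utf-8")  # 1. CTE decode
    verified = stand_in_sign(decoded) == signature       # 2. verify
    return decoded, verified                             # 3. reflow afterwards


paragraph = " ".join(["word"] * 40)
wire, signature = generate(paragraph)
decoded, verified = receive(wire, signature)
print(verified)
```

A receiver that is not format=flowed aware simply stops after step 2
and displays the decoded text with its trailing spaces.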
All-in-all this is a mess. Frankly I think it's better to state that
format=flowed SHOULD NOT be used with inline PGP signatures, and that
PGP/MIME SHOULD be used instead.
--
Cyrus Daboo
> Hi Simon,
>
> --On Friday, November 14, 2003 22:46 +0100 Simon Josefsson
> <simon+i...@josefsson.org> wrote:
>
> | I haven't seen a way to support both inline PGP and format=flowed
> | without creating one problem or another.
>
> That's been my experience and in fact I deliberately turn off
> format=flowed when generating inline PGP signed messages. It remains
> on for PGP/MIME messages (which is the default for new users).
I'm coming to the same conclusion myself, but it would be nice if
people implementing this aren't forced to handle the same problems.
One way to achieve that goal is to include text in the document.
> | I'd love to be proved wrong on this, since currently I'm stuck in my
> | implementation of inline PGP and format=flowed, because of my
> | perceived problems.
> |
> | I thing Gellens is right in that the there are more modes than what
> | you describe. There are at least 4 incompatible ways to do inline
> | PGP; QP before, QP after sign (but don't touch PGP armor), QP after
> | sign (including PGP armor), don't QP at all. I have seen all of them
> | in the wild. Incidentally, my experiences is that the last one works
> | best, although it breaks the most specifications. The format=flowed
> | draft says only one of the SHOULD be used even though it violate MIME.
>
> I think you need to look at this problem from the recipients end to
> actually decide what it is the sender needs to do. In particular its
> important that the signature verify for both types of recipients:
> those that are format=flowed aware and those that aren't. The later
> type effectively determines the order of processing for flowing, PGP
> signing and CTE:
>
> A non-format=flowed aware client, when processing a received
> format=flowed message, will first do CTE decoding and then display
> the message. Verification of the signature will then be done on the
> unflowed text (i.e. with trailing spaces and CRLFs etc).
Following the current format=flowed specification, the client would
encounter errors during QP decode, because the '=' within the base64
encoded PGP armor is not QP encoded.
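As a quick illustration, the following Python snippet runs the tail of
an armored block through the standard quopri decoder. Note the
assumption: quopri is a lenient decoder, so it does not report an
error; a stricter decoder might reject the input instead. Either way
the armor does not survive QP decoding intact.

```python
import quopri

# Tail of an ASCII-armored block: the last base64 line ends in '='
# padding and the checksum line starts with '='.
armor_tail = b"AQA=\n=t64i\n-----END PGP MESSAGE-----\n"

decoded = quopri.decodestring(armor_tail)

# The '=' at the end of the padding line is read as a QP soft line
# break, so the padding and the line break are silently dropped.
print(decoded)
print(decoded == armor_tail)
```

The decoded bytes differ from what was sent, so the armor can no longer
be parsed as the sender produced it.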
> Thus a format=flowed aware client that generates a message would first
> wrap the text doing flow, then do the PGP signature, and then do CTE
> encoding. That is the only way to work with non-aware clients.
It is not the only way that would work, here's another that I believe
is better: ('PGP aware' means aware of inline PGP, and 'format=flowed
aware' implies 'MIME-aware')
Generate: Do CTE encoding of message, sign, and CTE encode PGP armor.
Receive (non-MIME, non-PGP aware): display.
Receive (MIME-aware, non-PGP aware): CTE decode, display.
Receive (format=flowed aware, non-PGP aware): CTE decode,
format=flowed process, display.
Receive (non-MIME aware, PGP aware): cannot detect valid PGP message,
display.
Receive (MIME-aware, PGP aware): CTE decode PGP armor, verify, CTE
decode body, display.
Receive (format=flowed aware, PGP aware): CTE decode PGP armor,
verify, CTE decode body, format=flowed process, display.
Comparing your and my approaches: yours is vulnerable to the
man-in-the-middle attack inserting trailing SPC and adding
format=flowed header, thus altering the rendering of the message. The
PGP signature would not detect the added trailing SPC for you. Both
approaches break for non-MIME aware, but PGP aware, clients. Your
approach cannot interoperate with RFC 1991 implementations, but mine can
(by QP encoding '^-' in the body). Neither of our approaches can
handle non-ASCII characters within the PGP armor encoded in a charset
that isn't compatible with the body charset.
Using UTF-8 and 8-bit (no QP) is another option. The advantage is
that it is possible to pipe the incoming RFC 2822 article directly into
the OpenPGP implementation, so it would work for non-MIME aware, but
PGP aware, clients. It also supports non-ASCII in PGP armor. The
(only?) disadvantage is that trailing whitespace may be lost in
transit, invalidating the signature. But it still works much better
in practice than any of the QP variants.
Regards,
Simon
--On Saturday, November 15, 2003 2:42 +0100 Simon Josefsson
<j...@extundo.com> wrote:
| Receive (MIME-aware, PGP aware): CTE decode PGP armor, verify, CTE
| body, display.
Sorry - but this is a non-starter - you are expecting existing deployed
clients to magically change their behaviour to cope with format=flowed
inline signed messages. The only way to ensure format=flowed inline signed
messages work with existing clients is to use the procedure I outlined (or
just not use format=flowed). Yes it is vulnerable to a man-in-the-middle
attack but that is true for anything that does not also sign the message
headers. A man-in-the-middle that changes text/plain to text/html will
result in pretty much the same display 'corruption' without the need to
even change the body content, so it's not a problem specific to
format=flowed.
--
Cyrus Daboo
> Hi Simon,
>
> --On Saturday, November 15, 2003 2:42 +0100 Simon Josefsson
> <j...@extundo.com> wrote:
>
> | Receive (MIME-aware, PGP aware): CTE decode PGP armor, verify, CTE
> | body, display.
>
> Sorry - but this is a non-starter - you are expecting existing
> deployed clients to magically change their behaviour to cope with
> format=flowed inline signed messages.
No, I expect clients to follow specifications, which I think is about
all we can hope for. If deployed clients follow section 4.6 of RFC
2646 they already treat inline format=flowed differently. They would
also be incompatible with both our proposals. I don't believe many
clients follow the recommendation in the RFC though. When fixing the
RFC text, we might as well evaluate all options available. If all
deployed clients implemented format=flowed inline PGP in the same way,
and that way worked, then of course we should change to that method.
But my understanding is that this isn't the case.
> The only way to ensure format=flowed inline signed messages work
> with existing clients is to use the procedure I outlined
It would not work with clients that follow RFC 2646. It would not
work with any client that computes the OpenPGP signature over the QP
encoded ("wire") data. Are there no such clients? I believe there
are gateways and some plugins that fall in the latter category. But
if I'm mistaken, it may be worthwhile to add a description of your
proposal to 2646bis, to document earlier practices. Of course, the
recommended solution should still be PGP/MIME.
Regards,
Simon
>I agree with you that it isn't right, but it is what the draft says.
>That's my point. The text was quoted in my replace, but here is the
>text again:
>5.6. Digital Signatures and Encryption
> If a message is digitally signed or encrypted it is important that
> cryptographic processing use the on-the-wire Format=Flowed format.
> That is, during generation the message SHOULD be prepared for
> transmission, including addition of soft line breaks,
> space-stuffing, and [Quoted-Printable] encoding (to protect soft
> line breaks) before being digitally signed or encrypted; similarly,
> on receipt the message SHOULD have the signature verified or be
> decrypted before [Quoted-Printable] decoding and removal of stuffed
> spaces, soft line breaks and quote marks, and reflowing.
>The text say that MUAs SHOLD do things differently from what you
>describe.
Yes. It is clear that the text in the present draft is plain wrong; but I
had been reading it as indicating the use of PGP/MIME, and if you add a
few words about PGP/MIME to it, then it makes sense again.
>FWIW, your approach has the security problem that started this thread;
>it makes it possible for someone to in transit add trailing SPC to the
>PGP message, and add a format=flowed tag to the headers, without being
>detected.
I don't think that is all that important. There is not much you can do by
way of malicious alteration to a message if all you can do is to change
where the line breaks are perceived to be. Yes, it would be better to
ensure all the trailing spaces get included in the signature, and hence
PGP/MIME is strongly to be recommended. But people are undoubtedly going
to try to use inline PGP whatever we say, in which case a warning of its
limitations is all that is needed in the document.
>I haven't seen a way to support both inline PGP and format=flowed
>without creating one problem or another.
Well one possibility is to turn off textmode before signing (but people on
Unix systems will then have to edit in explicit CRs first). The bad news
is that all PGP systems turn off clearsigning when textmode is off.
Here is a signed text with some trailing whitespace in it (and it will be
proof against systems that munge WS too).
-----BEGIN PGP MESSAGE-----
Version: PGPfreeware 5.0i for non-commercial use
MessageID: sztuKj80+CHXhMNFdIs+RvEXUiwG4BTc
owHrZKhmZmWw37Frydq4V746oh9XMjJ6izMbtL5bvEb++h6OIyULuNRL1MrOOrwS
dNftFQiODP9xWi0ojHsdu0lqlMsywx/HPc7e6H0R+H1u4Nrpr6/tz1q/q4i3+VYS
u7Ao3ydftZCAgKDFMR4cv87+cfdXuCI/7ZKDstbnF6EiPDXaG5essU1iLS5Jycxj
AIKQjFSFnMy8VIXk/LySxMy8YoXUisTkkpxKBVOFkqLETKBcukJxQWJyarECCPBy
AQA=
=t64i
-----END PGP MESSAGE-----
But it is not much use to those who are not pgp-aware :-( .
>> 2. Use RFC 3156 (PGP/MIME).
>Right. This is what I believe should be used instead.
Indeed so.
--
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133 Web: http://www.cs.man.ac.uk/~chl
Email: c...@clerew.man.ac.uk Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5
>A non-format=flowed aware client, when processing a received format=flowed
>message, will first do CTE decoding and then display the message.
>Verification of the signature will then be done on the unflowed text (i.e.
>with trailing spaces and CRLFs etc).
No, that will not do, because mail clients, generally speaking, do not do
PGP verification (whether inline or PGP/MIME). What usually happens is
that the user takes the signed text displayed on his screen, and verifies
it by a separate process. This usually starts by copying it all to the
Clipboard, and then passing the Clipboard to some verifier.
Which raises another issue. Mail clients which support format=flowed
SHOULD put the original (unflowed) version of the text onto the Clipboard.
IOW, you should be able to take the displayed text, put it into the
clipboard (or drag'n'drop it), and plonk it into another (format=flowed
aware) window, and it should automagically appear reflowed to suit the new
window.
>Thus a format=flowed aware client that generates a message would first wrap
>the text doing flow, then do the PGP signature, and then do CTE encoding.
>That is the only way to work with non-aware clients.
Yes, except that it should still work, modulo malicious changes to the
line endings, even if the CTE stage is omitted.
--On Monday, November 17, 2003 12:24 +0000 Charles Lindsey
<c...@clerew.man.ac.uk> wrote:
|> A non-format=flowed aware client, when processing a received
|> format=flowed message, will first do CTE decoding and then display the
|> message. Verification of the signature will then be done on the
|> unflowed text (i.e. with trailing spaces and CRLFs etc).
|
| No, that will not do, because mail clients, generally speaking, do not do
| PGP verification (whether inline or PGP/MIME). What usually happens is
| that the user takes the signed text displayed on his screen, and verifies
| it by a separate process. This usually starts by copying it all to the
| Clipboard, and then passing the Clipboard to some verifier.
OK - so that just goes to prove that there is no way to construct an
inline-signed format=flowed message such that it will be verifiable by
both format=flowed aware and non-aware clients using the clipboard method
unless the aware client copies the non-flowed text to the clipboard - which
I think is wrong. All this means is that we have to say format=flowed MUST
NOT be used with inline-signed messages, otherwise there is no guarantee
that an arbitrary recipient will be able to verify it.
| Which raises another issue. Mail clients which support format=flowed
| SHOULD put the original (unflowed) version of the text onto the Clipboard.
| IOW, you should be able to take the displayed text, put it into the
| clipboard (or dran'n'drop it), and plonk it into another (format=flowed
| aware) window, and it should automagically appear reflowed to suit the new
| window.
I think this is wrong. First off where are you going to find a
format=flowed aware 'window' that is not in an email client? I want to be
able to copy the flowed text from a message and paste it into a text editor
or some other application and have it appear flowed.
--
Cyrus Daboo
>OK - so that just goes to prove that there is no way to construct an
>inline-signed format=flowed messages such that it will be verifiable by
>both format=flowed aware and non-aware clients using the clipboard method
>unless the aware client copies the non-flowed text to the clipboard - which
>I think is wrong.
I think we are agreed that inline PGP signing will be understood by the
non-aware clients. What is now clear is that an aware client MUST have
some mechanism to make the un-re-flowed (i.e. on the wire) text available
to the user if he asks to see it. That might be done by making the
Clipboard take the un-re-flowed version, or it might be done some other
way.
> All this means is that we have to say format=flowed MUST
>NOT be used with inline-signed messages, otherwise there is no guarantee
>that an arbitrary recipient will be able to verify it.
But I still think that MUST is too strong. It is going to happen whether
you like it or not, so you have to warn of the dangers and give the aware
clients a mechanism to get around it.
>| Which raises another issue. Mail clients which support format=flowed
>| SHOULD put the original (unflowed) version of the text onto the Clipboard.
>| IOW, you should be able to take the displayed text, put it into the
>| clipboard (or dran'n'drop it), and plonk it into another (format=flowed
>| aware) window, and it should automagically appear reflowed to suit the new
>| window.
>I think this is wrong. First off where are you going to find a
>format=flowed aware 'window' that is not in an email client? I want to be
>able to copy the flowed text from a message and paste it into a text editor
>or some other application and have it appear flowed.
If you plonk it into a window in an email client (say as part of
generating a new outgoing message) then it can be shown reflowed (but when
sent on the wire it will be as its 'true' self). I think if you plonk it
into a window that is not aware (e.g. into some editor) then it will show
up un-re-flowed (which is still a valid representation of the text). So
with my suggestion, ordinary unaware windows need do nothing with the
Clipboard that they did not do before.
But if copying the text of a flowed message to the Clipboard is going to
take it as it had been reflowed in the donor window, then you have to ask
exactly how it is supposed to appear. Is it just a text with lines of a
length as determined by the donor window and no trailing WS, or is it
still in a format=flowed form with trailing WS to show where it might be
reflowed yet again? Or is it just one long line of text for each
paragraph, in which case what about any quote marks in it?
I think these questions need to be addressed, though whether the standard
goes to the length of codifying the behaviours is another matter.
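To pin down the three candidate Clipboard representations, here is a
rough Python sketch. It is illustrative only: the function names are
hypothetical, quote marks and space-stuffing are ignored (which is
exactly one of the open questions), and textwrap stands in for whatever
wrapping the donor window performs.

```python
import textwrap


def paragraphs_from_flowed(lines):
    """Join flowed lines: a trailing space marks a soft line break."""
    paragraphs, current = [], ""
    for line in lines:
        current += line
        if not line.endswith(" "):
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current.rstrip(" "))
    return paragraphs


def clipboard_wrapped(lines, window_width):
    """Option 1: lines as wrapped in the donor window, no trailing WS."""
    out = []
    for paragraph in paragraphs_from_flowed(lines):
        out.extend(textwrap.wrap(paragraph, window_width))
    return out


def clipboard_flowed(lines):
    """Option 2: still format=flowed, trailing spaces kept."""
    return list(lines)


def clipboard_long_lines(lines):
    """Option 3: one long line per paragraph."""
    return paragraphs_from_flowed(lines)


wire_lines = ["The quick brown fox ", "jumps over the lazy dog"]
print(clipboard_wrapped(wire_lines, 20))
print(clipboard_flowed(wire_lines))
print(clipboard_long_lines(wire_lines))
```

Only the second form still lets a clipboard-based verifier see the text
as it was sent.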
I just tried it using Opera (which is the only client I have which offers
format=flowed). When doing Copy/Paste from the compose window, it showed up
as one long line, but after it had been posted and read (it was a local news
article rather than an email), it behaved like a new format=flowed object.
I.e. at the point where it wrapped in the viewing window, the pasted
version showed a line end with a trailing SP (but it was longer than 76
characters, though). Is that the behaviour we typically expect?
BTW, it didn't do the right thing when pasted into a new compose window :-(.
--
>> All this means is that we have to say format=flowed MUST
>>NOT be used with inline-signed messages, otherwise there is no guarantee
>>that an arbitrary recipient will be able to verify it.
>
> But I still think that MUST is too strong. It is going to happen whether
> you like it or not, so you have to warn of the dangers and give the aware
> clients a mechanism to get around it.
What mechanism? All mechanisms that have been proposed so far are
flawed in one way or another, as far as I can tell. Suggesting one of
them doesn't do much good. Although I lean towards MUST NOT, a SHOULD
NOT would work. The main point is that the document doesn't say
implementations SHOULD use a known broken mechanism.
> I just tried it using Opera (which is the only client I have which offers
> format-flowed). When doing Copy/Paste from the compose window, it showed up
> as one long line, but after it had been posted and read (it was a local news
> article rather than an email), it behaved like a new format=flowed object.
> I.e. at the point where it wrapped in the viewing window, the pasted
> version showed a line end with a trailing SP (but it was longer than 76
> characters, though). Is that the behaviour we typically expect?
No, Opera's format=flowed handling is broken. Look at the raw
messages and you will probably find that each format=flowed
"paragraph" is just one long (>76 octets) line with a terminating SPC.
Perhaps it has been fixed in later releases.
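For contrast, a generator that behaves as expected would emit several
short lines per paragraph, each but the last ending in a space. A
minimal Python sketch, with a hypothetical function name and the
76-character figure from above taken as the limit (an assumption; the
specification's own limits should be checked):

```python
def flowed_lines(paragraph: str, limit: int = 76) -> list:
    """Wrap one paragraph into format=flowed lines: every line but the
    last ends in a single space (a soft line break), and no line,
    counting that space, exceeds `limit` characters when words permit."""
    lines, current = [], ""
    for word in paragraph.split():
        candidate = word if not current else current + " " + word
        if current and len(candidate) + 1 > limit:
            lines.append(current + " ")
            current = word
        else:
            current = candidate
    lines.append(current)
    # Space-stuff lines that start with a space, '>' or "From ".
    return [" " + l if l.startswith((" ", ">", "From ")) else l
            for l in lines]


paragraph = " ".join(["word"] * 40)
for line in flowed_lines(paragraph):
    print(repr(line))
```

A raw message in which a whole paragraph arrives as one line of more
than 76 octets with a trailing SPC is what this wrapping avoids.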
>"Charles Lindsey" <c...@clerew.man.ac.uk> writes:
>> But I still think that MUST is too strong. It is going to happen whether
>> you like it or not, so you have to warn of the dangers and give the aware
>> clients a mechanism to get around it.
>What mechanism? All mechanisms that has been proposed so far are
>flawed in one way or another, as far as I can tell.
The essential mechanism you need is an ability for the user to ask to see
the text as sent (maybe after CTE decoding, but before flowing). If he
can at least see that, then he can arrange to check inline PGP signatures.
Without that, he is stuck.
> Suggesting one of
>them doesn't do much good. Although I lean towards MUST NOT, a SHOULD
>NOT would work.
I could live with SHOULD.
> The main point is that the document don't say
>implementations SHOULD use a known broken mechanism.
Indeed.
>No, Opera's format=flowed handling is broken. Look at the raw
>messages and you will probably find that each format=flowed
>"paragraph" is just one long (>76 octets) line with a terminating SPC.
>Perhaps it has been fixed in later releases.
The latest Opera versions (anything later than version 7) use their m2
mail client, which is a complete rewrite. It seems to make a reasonable
attempt at format=flowed, but I have not used it sufficiently to have
spotted all the niggles. As I pointed out yesterday, its compose window
seems not to do the right things always, but its display windows seem OK.