Changes from -02:
* Added mention of quoting to Abstract and Introduction.
* Deleted line analysis table.
* Added MUST NOT for OpenPGP and SHOULD for OpenPGP-MIME.
* Replaced ABNF rules to remove ambiguity
* Added note that c-t-e is irrelevant to flowed text processing
* Added text indicating that end of data terminates a paragraph
* Moved sig-sep out of fixed-line ABNF
* Changed some SHOULDs to MUSTs (space-stuffing, quoted paragraphs)
* Added note to ABNF that space and ">" are encoded according to charset
* Mentioned exceptions in section on interpreting
* Moved section on interpreting before section on generating.
* Reworded non-normative "should"s.
One thing I'm not totally sure about is the encoding of space and
">". I added a note to the ABNF section that says that these
characters are encoded according to the charset, but as I recall,
during the discussions on how f=f can work with non-Western
languages/charsets, especially Chinese, Japanese and Korean, it was
mentioned that ASCII space is sometimes used in some of these
languages. So perhaps the statement needs to include the possibility
that space might be encoded in ASCII as well?
The updated text is available at
<ftp://ftp.pensive.org/Public/Randy/draft-gellens-format-bis-03.txt>
--
Randall Gellens
Opinions are personal; facts are suspect; I speak for myself only
-------------- Randomly-selected tag: ---------------
You have the right to remain silent. Anything you say will be
misquoted, then used against you.
> One thing I'm not totally sure about is the encoding of space and
> ">". I added a note to the ABNF section that says that these
> characters are encoded according to the charset, but as I recall,
> during the discussions on how f=f can work with non-Western
> languages/charsets, especially Chinese, Japanese and Korean, it was
> mentioned that ASCII space is sometimes used in some of these
> languages. So perhaps the statement needs to include the possibility
> that space might be encoded in ASCII as well?
I guess the question is what to do when there is more than one way to
represent the space character and/or ">", as can certainly happen
when using iso-2022 code-switching.
This is one of the reasons why charsets are defined as a mapping from octets to
characters, not the other way around: It lets us talk about the processing of
various characters without having to worry about whether there is one way to
represent them or fifty. It follows that simply scanning the content for 0x20
and 0x3E is unacceptable; the transformation from octets to characters must be
performed first, then the resulting sequence of *characters* can be checked for
space, greater than, etc.
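
A minimal Python sketch of the decode-first rule described above. The helper name count_quote_chars and the sample octets are illustrative assumptions, not taken from the draft. The sample uses the iso-2022-jp charset, where the octet 0x3E can also occur inside a two-octet character.

```python
def count_quote_chars(octets: bytes, charset: str) -> int:
    """Count ">" characters after applying the charset (hypothetical helper)."""
    text = octets.decode(charset)   # octets -> characters comes first
    return text.count(">")          # only then look for the character

# One ASCII ">" followed by a single two-octet JIS X 0208 character whose
# octets both happen to be 0x3E. ESC $ B selects JIS X 0208 and ESC ( B
# returns to ASCII.
raw = b">" + b"\x1b$B\x3e\x3e\x1b(B"

naive = raw.count(b"\x3e")                     # scans octets: wrong approach
correct = count_quote_chars(raw, "iso2022_jp") # scans characters
```

Here the naive octet scan sees three 0x3E values, while the character-level check sees the single real ">".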
We got this wrong in text/richtext but we fixed it in text/enriched, although
the language in RFC 1896 isn't as clean as I would like. Let's please not
repeat this whole argument yet again. As long as the document makes it clear
we're dealing with the characters that result from the application of the
charset to the sequence of octets we should be good to go.
Ned
P.S. I note in passing that while iso-2022 does allow for very general things,
including the ability to have the same character bound to multiple different
octet values at the same time and for common characters like space and greater
than to be bound to less than obvious octet values, in practice real charsets
defined in terms of iso-2022 tend to restrict themselves to a very small subset
of iso-2022's capabilities. The result is that while you cannot assume that,
say, 0x3E is always a greater than, when greater than does appear it will be
represented as 0x3E. I haven't found this characteristic to be particularly
helpful when coding support for this stuff, but it sure helps a lot when
inspecting input iso-2022-based text manually.
> As long as the document makes it clear
> we're dealing with the characters that result from the application of the
> charset to the sequence of octets we should be good to go.
Here is the sentence I added in -03 (in the ABNF section):
Note that the SP (space) and ">" characters are encoded according
to the charset parameter.
Do you think this is sufficient?
--
Randall Gellens
Opinions are personal; facts are suspect; I speak for myself only
-------------- Randomly-selected tag: ---------------
The idea that Bill Gates has appeared like a knight in shining armor
to lead all customers out of a mire of technological chaos neatly
ignores the fact that it was he who, by peddling second-rate technology,
led them into it in the first place. --Douglas Adams
> I've created a -03 based on comments received in the past few days.
Doh, I just noticed that a message that I sent to the list on Nov-09 had
the wrong From: address and therefore was not permitted to go to the
list (presumably it is still awaiting moderator approval). I'll include
that message here. Please remember that when it says "current draft",
it means draft -02.
---begin-old-message---
Date: Mon, 10 Nov 2003 07:46:05 +0000
To: IETF RFC-822 list <ietf...@imc.org>
Subject: Re: Format=Flowed/RFC 2646 Bis (-02)
Message-ID: <20031110074...@nicemice.net>
References: <200311091913....@crowley.qualcomm.com>
In-Reply-To: <200311091913....@crowley.qualcomm.com>
ra...@qualcomm.com wrote:
> it would have been even better had this activity occurred sometime
> earlier in the past few years, since most of the text in question
> hasn't changed in some time.
Indeed. Somehow I had never heard of format=flowed until quite
recently.
> I also don't see the problem with calling a group of lines intended to
> be re-flowed a "paragraph".
"Paragraph" seems like a nice intuitive term to me. In TeX, for
example, a "paragraph" is the one and only construct that flows (has
line breaks inserted automatically). Keith hinted that "paragraph" is too
loaded a term; maybe he'd like to explain that in more detail.
> Adam also suggests mentioning in the section on interpreting f=f the
> exceptions for Usenet signatures and changing-quote-depth, which
> seem like good ideas to me.
Two of the steps I listed need to be swapped. Step 3 checks for the
sig line, and step 4 unstuffs. I wrote them in that order because
section 5.3 says "an (optionally quoted) line consisting of DASH DASH
SP is not considered flowed." But now I notice that the grammar
says sig-sep = [quote [stuffing]] "--" SP CRLF. To be consistent,
section 5.3 should say "(optionally quoted, optionally stuffed)", and
the interpreting section should not check for a sig line until after
unstuffing.
(Or do you want to resolve the inconsistency the other way, by changing
the grammar?)
> Adam also had a number of concerns over ambiguity in the grammar, with
> suggestions for improvement. I generally like the replacement text,
> except for the removal of the distinction between quoted and unquoted
> lines. I thought it was helpful to identify a non-quoted line on
> its own, and not just as a line with a quote-depth of zero. This
> is based on a perceived need to treat the two somewhat differently,
> in particular, quoted lines need extra handling, state, and display
> semantics.
I would think it would be easier for an implementor to write a single
handler & data structure for all paragraphs of all quote depths, rather
than make quote-depth-zero a special case. Of course it's possible to
expand the grammar to make the distinction, but I think it makes the
grammar appear more complex than it really is. (And I just noticed that
while the grammar in the current draft distinguishes between quoted
flowed and unquoted flowed, it does not distinguish between quoted fixed
and unquoted fixed.)
By the way, my proposed grammar forgot to handle the Usenet sig
exception. Here's a fixed version that also incorporates my three
suggestions regarding improperly terminated paragraphs:
flowed-body = * ( paragraph / fixed-line / sig-line )
paragraph = 1*flowed-line fixed-line
; That is the grammar for proper paragraphs, which
; always end with a fixed line. Improper paragraphs
; are instead terminated by a change in quote-depth,
; end of input, or a sig-line (which is not included
; in the paragraph).
sig-line = quote [stuff] "--" SP CRLF
fixed-line = quote (stuff stuffed-fixed / unstuffed-fixed) CRLF
flowed-line = quote (stuff stuffed-flowed / unstuffed-flowed) flow CRLF
stuffed-fixed = [*text-char non-sp]
; Does not end with SP.
unstuffed-fixed = non-sp-quote [*text-char non-sp]
; Does not begin with SP or ">", does not end with SP.
stuffed-flowed = [non-dash *text-char] /
"-" [non-dash *text-char / "-" 1*text-char]
; Is not "--".
unstuffed-flowed = non-sp-quote-dash *text-char /
"-" [non-dash *text-char / "-" 1*text-char]
; Not empty, not "--", does not begin with SP or ">".
quote = *">"
stuff = SP
flow = SP
non-sp-quote-dash = <any character except NUL, CR, LF, SP, ">", "-">
non-sp-quote = <any character except NUL, CR, LF, SP, ">">
non-sp = <any character except NUL, CR, LF, SP>
text-char = <any character except NUL, CR, LF>
non-dash = <any character except NUL, CR, LF, "-">
One of my suggested grammar tweaks was that a sig line should not get
sucked into a paragraph, even if it is preceded by a flowed line,
because then it could get re-wrapped and no longer appear at the start
of a line (and therefore cease to be a sig line). I notice now that
this suggestion amounts to having a third type of line. A sig line is
neither fixed nor flowed, because fixed and flowed lines can be inside
paragraphs, while sig lines can never be inside paragraphs.
Someone should double-check that grammar, especially the rules for
[un]stuffed-{fixed,flowed}.
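
One way to do that double-checking is to transcribe the rules into regular expressions and probe them with sample lines. The following is a rough Python sketch under that assumption; the constant names and the kinds() helper are made up for illustration, and the regexes are a hand translation of the grammar above rather than anything from the draft.

```python
import re

# Character classes from the grammar above.
T = r"[^\x00\r\n]"         # text-char
NS = r"[^\x00\r\n ]"       # non-sp
NSQ = r"[^\x00\r\n >]"     # non-sp-quote
NSQD = r"[^\x00\r\n >\-]"  # non-sp-quote-dash
ND = r"[^\x00\r\n\-]"      # non-dash

# Shared second alternative of the two flowed rules: starts with "-", is not "--".
DASHY = rf"-(?:{ND}{T}*|-{T}+)?"
STUFFED_FIXED = rf"(?:{T}*{NS})?"
UNSTUFFED_FIXED = rf"{NSQ}(?:{T}*{NS})?"
STUFFED_FLOWED = rf"(?:(?:{ND}{T}*)?|{DASHY})"
UNSTUFFED_FLOWED = rf"(?:{NSQD}{T}*|{DASHY})"

SIG = re.compile(r">* ?-- \r\n")
FIXED = re.compile(rf">*(?: {STUFFED_FIXED}|{UNSTUFFED_FIXED})\r\n")
FLOWED = re.compile(rf">*(?: {STUFFED_FLOWED}|{UNSTUFFED_FLOWED}) \r\n")

def kinds(line: str) -> list[str]:
    """Return every line production that matches; more than one means ambiguity."""
    found = []
    for name, rx in (("sig-line", SIG), ("fixed-line", FIXED), ("flowed-line", FLOWED)):
        if rx.fullmatch(line):
            found.append(name)
    return found
```

With this transcription, "-- " followed by CRLF matches only sig-line, which is the property the grammar is trying to guarantee. It checks single lines only; it says nothing about paragraph-level ambiguity.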
AMC
---end-old-message---
Sorry you didn't get a chance to see that before revising the draft.
Henceforth, when I say "current draft", I mean draft -03.
> * Added mention of quoting to Abstract and Introduction.
> * Deleted line analysis table.
> * Added note that c-t-e is irrelevant to flowed text processing
> * Added text indicating that end of data terminates a paragraph
> * Moved sig-sep out of fixed-line ABNF
> * Mentioned exceptions in section on interpreting
> * Moved section on interpreting before section on generating.
> * Reworded non-normative "should"s.
All good.
> * Changed some SHOULDs to MUSTs (space-stuffing, quoted paragraphs)
I haven't given the distinction much thought for this protocol.
> * Added MUST NOT for OpenPGP and SHOULD for OpenPGP-MIME.
I'll defer to Simon and Cyrus, who seem to have that issue covered.
> * Added note to ABNF that space and ">" are encoded according to charset
But what about the decoding side? I think Ned has the right idea--the
clarification could say that the grammar is in terms of characters,
and therefore an encoder using the grammar to generate sequences
of characters would then need to transform the characters to bytes
according to the charset, and a decoder using the grammar to parse
sequences of characters would first need to transform the bytes to
characters according to the charset.
> * Replaced ABNF rules to remove ambiguity
One ambiguity still remains. Consider these two lines:
--
foo
The first ends with a space, and the second does not. We would like
this to parse as a sig-sep and a fixed-line, but according to the
current grammar it can also parse as a paragraph, because "-- " matches
the flowed-line production.
It is possible to eliminate that ambiguity; see the grammar in the old
message above.
Another comment regarding the grammar: It is nice for a grammar to give
names to the meaningful syntactic constructs. For example, we'd like
a name for the quote-marks (and we have one), we'd like names for the
special spaces that act as flags (and we have them), and we'd like a
name for the actual content of the line without the quotes and flags,
but the grammar in the draft doesn't give us that. Consider for example
stuffed-flowed. In the draft, this means a line that *was* flowed and
*is* stuffed (it includes the stuff space but not the flow space). In
the grammar in the old message above, stuffed-flowed means a line that
*was* flowed and *was* stuffed (it includes neither the stuff space nor
the flow space, only the actual content).
Section 5.1 says:
If the line ends in a space, the line is flowed. Otherwise it is
fixed. The exception to this rule is a signature separator line,
described in Section 5.3. Such lines end in a space but are not
flowed.
That leaves the following question unanswered: Are separator lines
fixed, or are they a third type of line?
According to the grammar in section 7, signature separator lines do not
match fixed-line. That seems to suggest that they are a third type of
line, which is the view that seems most intuitive to me. That could be
clarified by changing the last sentence of the quoted paragraph to "Such
lines end in a space but are neither flowed nor fixed."
Another way to view the situation is that sig-lines are fixed, and
paragraphs end with non-sig-sep fixed lines. The grammar would then be:
flowed-body = *( paragraph / fixed-line )
fixed-line = sig-sep / non-sig-sep-fixed-line
paragraph = 1*flowed-line non-sig-sep-fixed-line
But that looks unnaturally convoluted to me. I prefer the existing
grammar:
flowed-body = *( paragraph / fixed-line / sig-sep )
paragraph = 1*flowed-line fixed-line
Section 5.3 says:
This is a special case; an (optionally quoted) line consisting of
DASH DASH SP is not considered flowed.
Sections 5.1 (interpreting) and 7 (grammar) both indicate that a
signature line can be quoted and/or stuffed. It is confusing for 5.3
to mention "optionally quoted" without also mentioning "optionally
stuffed". Also, if in section 5.1 "not flowed" is changed to "neither
flowed nor fixed", the same change ought to be made here.
Section 5.3 goes on to say:
Generating agents MUST NOT end a paragraph with such a signature
line, since doing so would indicate that the separator line is part
of the paragraph.
It would not indicate that the separator line is part of the paragraph,
it would indicate that the body is malformed (according to the grammar
and according to section 5.1); the receiver would not believe that the
separator is part of the paragraph (according to 5.1). Perhaps the
intention is something like this:
When placing soft line breaks in a paragraph, generating agents MUST
NOT place them in a way that causes any line of the paragraph to
be a signature separator line, because paragraphs cannot contain
signature separator lines (see sections 5.1 and 7).
Section 5.4 says:
Space-stuffing adds a single space to the start of any line which
needs protection when the message is generated. On reception, if
the first character of a line is a space, it is logically deleted.
This occurs after the test for a quoted line, and before the test
for a flowed line.
It's not only after testing for a quoted line, but more importantly
after stripping the quoting. And it's not only before the test for
a flowed line, but also before the test for a separator line. Maybe
change the last sentence to:
This occurs after deleting quote marks, and before testing for
fixed, flowed, and separator lines.
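
The ordering proposed here can be sketched in Python as follows. This is only an illustration of the suggestion in this message (quote marks first, then unstuffing, then the line-type tests), not the draft's procedure. The function name classify is hypothetical, the input is assumed to be already decoded to characters with CRLF removed, and dropping the flow space from the returned content is a choice made for the sketch.

```python
def classify(line: str) -> tuple[int, str, str]:
    """Return (quote_depth, content, kind) for one line without its CRLF."""
    # 1. Count and delete quote marks.
    depth = len(line) - len(line.lstrip(">"))
    rest = line[depth:]
    # 2. Unstuff: a leading space is logically deleted.
    if rest.startswith(" "):
        rest = rest[1:]
    # 3. Only now test for the separator, then flowed, then fixed.
    if rest == "-- ":
        return depth, rest, "separator"
    if rest.endswith(" "):
        return depth, rest[:-1], "flowed"   # flow space dropped from content
    return depth, rest, "fixed"
```

Note that under this ordering a stuffed, unquoted " -- " is still reported as a separator, which is exactly the point on which the grammar and section 5.3 need to agree.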
Section 5.5 says:
When generating quoted flowed lines, an agent needs to pay attention
to changes in quote depth. A sequence of quoted lines of the same
quote depth immediately followed by lines of a different quote
depth MUST be encoded so that lines of the same quote depth are a
paragraph, with the last line generated as fixed and prior lines
generated as flowed.
That seems to be a much stronger requirement than you intend. Within a
single quote depth, there might be multiple paragraphs, non-paragraph
fixed-lines, and separator lines. But the sentence quoted above seems
to say that because all of this text is a bunch of "lines at the same
quote depth", it must be encoded as "a paragraph", with the last line
fixed and all other lines flowed. Perhaps the intention is something
like this:
When generating quoted flowed lines, an agent needs to pay attention
to changes in quote depth. All lines of a paragraph MUST be
unquoted, or else they MUST all be quoted and have the same quote
depth. Therefore, whenever there is a change in quote depth, or a
change from quoted to unquoted, or change from unquoted to quoted,
the line immediately preceding the change MUST NOT be a flowed
line.
The wording could be simplified if an unquoted line were simply a line
with a quote depth of zero:
When generating quoted flowed lines, an agent needs to pay attention
to changes in quote depth. All lines of a paragraph MUST have the
same quote depth. Therefore, whenever there is a change in quote
depth, the line immediately preceding the change MUST NOT be a
flowed line.
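
For a generating agent, the rule could be enforced with a pass like the Python sketch below. It assumes the simplified view that every line has a quote depth, possibly zero. The function names are hypothetical, and removing trailing spaces is just one possible way to make a line non-flowed; a real generator would more likely avoid producing the flowed line in the first place.

```python
def quote_depth(line: str) -> int:
    """Number of leading ">" characters (zero for an unquoted line)."""
    return len(line) - len(line.lstrip(">"))

def fix_depth_boundaries(lines: list[str]) -> list[str]:
    """Make sure no flowed line immediately precedes a change in quote depth."""
    out = list(lines)
    for i in range(len(out) - 1):
        if quote_depth(out[i]) == quote_depth(out[i + 1]):
            continue
        body = out[i][quote_depth(out[i]):]
        # Separator lines end in a space but are not flowed; leave them alone.
        if body.endswith(" ") and body not in ("-- ", " -- "):
            out[i] = out[i].rstrip(" ")
    return out
```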
Section 5.5 goes on to say:
If a receiving agent wishes to reformat flowed quoted lines (joining
and/or wrapping them) on display or when generating new messages,
the lines SHOULD be de-quoted, reformatted, and then re-quoted. To
de-quote, the number of close angle brackets in the quote indicator
at the start of each line is counted. Consecutive lines with the
same quote depth are considered one paragraph and are reformatted
together. To re-quote after reformatting, a quote indicator
containing the same number of close angle brackets originally
present are prefixed to each line.
I think one sentence there is inaccurate: "Consecutive lines with the
same quote depth are considered one paragraph and are reformatted
together." Consecutive lines with the same quote depth could be one
paragraph or several paragraphs or non-paragraph fixed lines (in which
case no reformatting is requested) or separator lines. I think that
sentence can simply be removed. Reformatting is covered elsewhere; this
section is about quoting.
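
The re-quoting step for a single paragraph, once it has been de-quoted and joined, might look like this in Python. It is a sketch under assumptions: the function name requote and the default width are made up, the text is assumed to be one already-joined paragraph, and it ignores the "From " stuffing case and the need to avoid producing a line that looks like a signature separator.

```python
import textwrap

def requote(text: str, depth: int, width: int = 72) -> list[str]:
    """Wrap one de-quoted paragraph and put the quote marks back on each line."""
    contents = textwrap.wrap(text, width=width)
    out = []
    for i, content in enumerate(contents):
        # Space-stuff content that would otherwise be read as quoting or stuffing.
        stuff = " " if content.startswith((" ", ">")) else ""
        # Every line but the last is flowed, so it ends with a space.
        flow = " " if i < len(contents) - 1 else ""
        out.append(">" * depth + stuff + content + flow)
    return out
```

For example, requote("aaa bbb ccc", 1, width=7) produces a flowed line ">aaa bbb " followed by the fixed line ">ccc".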
The next two paragraphs are inconsistent with section 5.1:
On reception, if a change in quote depth occurs on a flowed line,
this is an improperly formatted message. The receiver SHOULD handle
this error by using the 'quote-depth-wins' rule, which is to ignore
the flowed indicator and treat the line as fixed. That is, the
change in quote depth ends the paragraph.
In other words, whenever two adjacent lines have different quote
depths, senders MUST ensure that the earlier line is fixed (does
not end in a space), and receivers SHOULD treat the earlier line as
fixed regardless of whether it ends with a space.
According to section 5.1, the paragraph ends with the flowed line; it is
possible therefore to have an improperly terminated paragraph consisting
of a single flowed line, and such a paragraph would be reformatted. If
the flowed indicator is ignored and the line is treated as fixed, then
we have a single fixed line, which is not a paragraph at all and would
not be reformatted. Also, it is possible for the line before the change
in quote depth to be a separator line, which is arguably not fixed (see
the discussion above). The inconsistency could be resolved like so:
...the 'quote-depth-wins rule', which is to consider the paragraph
to end with the flowed line immediately preceding the change in
quote depth.
In other words, whenever two adjacent lines have different quote
depths, senders MUST ensure that the earlier line is not flowed
(does not end in a space), and receivers finding a flowed line there
SHOULD treat it as the last line of a paragraph.
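
On the receiving side, this reading of quote-depth-wins can be sketched in Python as a grouping pass. It assumes each line has already been reduced to a (depth, content, kind) triple, with kind one of "flowed", "fixed", or "separator"; that representation and the name group_lines are assumptions of the sketch, not something the draft defines.

```python
def group_lines(lines):
    """Group (depth, content, kind) triples into paragraphs and lone lines.

    Returns a list of (what, depth, contents), where what is "paragraph",
    "fixed", or "separator".
    """
    out = []
    cur = None  # the open paragraph as (depth, [contents]), if any
    for depth, content, kind in lines:
        # A change in quote depth or a separator ends an open paragraph
        # at its last flowed line (an improperly terminated paragraph).
        if cur is not None and (depth != cur[0] or kind == "separator"):
            out.append(("paragraph", cur[0], cur[1]))
            cur = None
        if kind == "flowed":
            if cur is None:
                cur = (depth, [])
            cur[1].append(content)
        elif kind == "fixed":
            if cur is not None:
                cur[1].append(content)   # proper end of a paragraph
                out.append(("paragraph", cur[0], cur[1]))
                cur = None
            else:
                out.append(("fixed", depth, [content]))
        else:
            out.append(("separator", depth, [content]))
    if cur is not None:                  # end of input also ends a paragraph
        out.append(("paragraph", cur[0], cur[1]))
    return out
```

A single flowed line followed by a change in depth comes out as a one-line paragraph rather than as a fixed line, which is the difference between the two readings discussed above.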
Here we have more instances of the phrase "change in quote depth". If
we keep the current view that unquoted lines have no quote depth and
quoted lines have non-zero quote depth, then we really ought to be
saying "change in quote depth, or change from quoted to unquoted, or
change from unquoted to quoted". If we adopt the view that all lines
have a quote depth, which can be zero, then the simple phrase "change in
quote depth" will mean what we want it to mean.
AMC
> > As long as the document makes it clear
> > we're dealing with the characters that result from the application of the
> > charset to the sequence of octets we should be good to go.
> Here is the sentence I added in -03 (in the ABNF section):
> Note that the SP (space) and ">" characters are encoded according
> to the charset parameter.
> Do you think this is sufficient?
Yes I do.
Ned
>I've created a -03 based on comments received in the past few days.
>Changes from -02:
>* Added MUST NOT for OpenPGP and SHOULD for OpenPGP-MIME.
I think that MUST NOT is too severe. People are going to do it whether you
like it or not. The best you should do is to warn of the dangers (which
are not actually all that bad - just that your trailing spaces may get
munged by a malicious man in the middle) and deprecate it in favour of
PGP/MIME.
--
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133 Web: http://www.cs.man.ac.uk/~chl
Email: c...@clerew.man.ac.uk Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5
The word "paragraph" describes the semantics of the block of text; it
implies, for instance, that the block of text contains one or more
sentences (some say two or more) on a single topic that are spoken by
a single speaker. For instance, it would not be appropriate for a
text-to-whatever translator to map a f=f "paragraph" into a "paragraph"
in some semantic markup language.
But if "paragraph" is the closest term you can find, use it, and just
point out that it's not quite the same as a "real" paragraph.
> Two of the steps I listed need to be swapped. Step 3 checks for the
> sig line, and step 4 unstuffs. I wrote them in that order because
> section 5.3 says "an (optionally quoted) line consisting of DASH DASH
> SP is not considered flowed." But now I notice that the grammar
> says sig-sep = [quote [stuffing]] "--" SP CRLF. To be consistent,
> section 5.3 should say "(optionally quoted, optionally stuffed)", and
> the interpreting section should not check for a sig line until after
> unstuffing.
>
> (Or do you want to resolve the inconsistency the other way, by changing
> the grammar?)
The -03 version fixed the grammar to treat signature separator lines
as a third type of line. I just created an -04 which changes "an
(optionally quoted) line consisting of DASH DASH SP" to "an
(optionally quoted or quoted and stuffed) line consisting of DASH
DASH SP".
Note that the idea was to allow signature separator lines to be
quoted, but not stuffed unless also quoted. That way, stuffing can
be used to guard against a line being confused with a signature
separator.
>> Adam also had a number of concerns over ambiguity in the grammar, with
>> suggestions for improvement. I generally like the replacement text,
>> except for the removal of the distinction between quoted and unquoted
>> lines. I thought it was helpful to identify a non-quoted line on
>> its own, and not just as a line with a quote-depth of zero. This
>> is based on a perceived need to treat the two somewhat differently,
>> in particular, quoted lines need extra handling, state, and display
>> semantics.
>
> I would think it would be easier for an implementor to write a single
> handler & data structure for all paragraphs of all quote depths, rather
> than make quote-depth-zero a special case.
Making the distinction in the grammar still allows a client to
implement either way.
> Of course it's possible to
> expand the grammar to make the distinction, but I think it makes the
> grammar appear more complex than it really is.
This is a good point. The grammar in the -03 and -04 versions is
based on the suggested replacement you provided, plus fixes to treat
signature lines as a third type, and to distinguish between quoted
and unquoted. I agree that this latter change does make the grammar
larger and more complex.
> (And I just noticed that
> while the grammar in the current draft distinguishes between quoted
> flowed and unquoted flowed, it does not distinguish between quoted fixed
> and unquoted fixed.)
This was fixed in -03.
> By the way, my proposed grammar forgot to handle the Usenet sig
> exception.
I fixed this in -03.
That definition for sig-line allows it to be stuffed but not quoted,
which we have been prohibiting.
The suggested definitions for flowed lines attempt to eliminate the
ambiguity between a flowed line and a signature separator but I don't
think they allow for all cases of flowed lines. I think we'd need
something like:
non-sp-quote-dash *text-char /
"-" non-dash *text-char /
"--" non-space *text-char /
"--" 2*text-char
But this wouldn't work since the definition of flowed-line includes
the flow (space) at the end. I don't see any way to resolve the
ambiguity without really convoluting the ABNF. I think it may be OK
to just note the potential ambiguity, especially since a check for
signature line can be made before checking for a flowed line. I've
added a note in the ABNF section of -04 about it.
> One of my suggested grammar tweaks was that a sig line should not get
> sucked into a paragraph, even if it is preceded by a flowed line,
> because then it could get re-wrapped and no longer appear at the start
> of a line (and therefore cease to be a sig line). I notice now that
> this suggestion amounts to having a third type of line. A sig line is
> neither fixed nor flowed, because fixed and flowed lines can be inside
> paragraphs, while sig lines can never be inside paragraphs.
I agree, and I believe this was fixed in -03.
> Another comment regarding the grammar: It is nice for a grammar to give
> names to the meaningful syntactic constructs. For example, we'd like
> a name for the quote-marks (and we have one), we'd like names for the
> special spaces that act as flags (and we have them), and we'd like a
> name for the actual content of the line without the quotes and flags,
> but the grammar in the draft doesn't give us that. Consider for example
> stuffed-flowed. In the draft, this means a line that *was* flowed and
> *is* stuffed (it includes the stuff space but not the flow space). In
> the grammar in the old message above, stuffed-flowed means a line that
> *was* flowed and *was* stuffed (it includes neither the stuff space nor
> the flow space, only the actual content).
This is a nice feature, and I believe I've achieved it in -04 (at the
expense of some extra parentheses).
> Section 5.1 says:
>
> If the line ends in a space, the line is flowed. Otherwise it is
> fixed. The exception to this rule is a signature separator line,
> described in Section 5.3. Such lines end in a space but are not
> flowed.
>
> That leaves the following question unanswered: Are separator lines
> fixed, or are they a third type of line?
>
> According to the grammar in section 7, signature separator lines do not
> match fixed-line. That seems to suggest that they are a third type of
> line, which is the view that seems most intuitive to me. That could be
> clarified by changing the last sentence of the quoted paragraph to "Such
> lines end in a space but are neither flowed nor fixed."
I agree and made this change in -04.
> Section 5.3 says:
>
> This is a special case; an (optionally quoted) line consisting of
> DASH DASH SP is not considered flowed.
>
> Sections 5.1 (interpreting) and 7 (grammar) both indicate that a
> signature line can be quoted and/or stuffed. It is confusing for 5.3
> to mention "optionally quoted" without also mentioning "optionally
> stuffed". Also, if in section 5.1 "not flowed" is changed to "neither
> flowed nor fixed", the same change ought to be made here.
Thanks; I believe this has been cleaned up in -04 so that it is now
clear from all references in the text as well as the grammar that
signature lines can be quoted or quoted and stuffed but they can't be
stuffed without being quoted.
> Section 5.3 goes on to say:
>
> Generating agents MUST NOT end a paragraph with such a signature
> line, since doing so would indicate that the separator line is part
> of the paragraph.
>
> It would not indicate that the separator line is part of the paragraph,
> it would indicate that the body is malformed (according to the grammar
> and according to section 5.1); the receiver would not believe that the
> separator is part of the paragraph (according to 5.1).
That jumped out at me as I was making another change. I deleted the
second clause, so now it just says "Generating agents MUST NOT end a
paragraph with such a signature line".
> Perhaps the intention is something like this:
>
> When placing soft line breaks in a paragraph, generating agents MUST
> NOT place them in a way that causes any line of the paragraph to
> be a signature separator line, because paragraphs cannot contain
> signature separator lines (see sections 5.1 and 7).
I'm not sure if that was the original intent or not, but I liked the
text you suggest and so added it to the section on generating f=f
(with references to the section on signature lines and on the abnf).
> Section 5.4 says:
>
> Space-stuffing adds a single space to the start of any line which
> needs protection when the message is generated. On reception, if
> the first character of a line is a space, it is logically deleted.
> This occurs after the test for a quoted line, and before the test
> for a flowed line.
>
> It's not only after testing for a quoted line, but more importantly
> after stripping the quoting.
But the test for quoted line is what deletes the quote marks. I
changed the text in -04 to say "This occurs
after the test for a quoted line (which logically counts and deletes
any quote marks)".
> And it's not only before the test for
> a flowed line, but also before the test for a separator line.
I think the test for a signature line has to happen both before the
test for a quoted line and also after deleting quote marks and
stuffing.
> Section 5.5 says:
>
> When generating quoted flowed lines, an agent needs to pay attention
> to changes in quote depth. A sequence of quoted lines of the same
> quote depth immediately followed by lines of a different quote
> depth MUST be encoded so that lines of the same quote depth are a
> paragraph, with the last line generated as fixed and prior lines
> generated as flowed.
>
> That seems to be a much stronger requirement than you intend. Within a
> single quote depth, there might be multiple paragraphs, non-paragraph
> fixed-lines, and separator lines. But the sentence quoted above seems
> to say that because all of this text is a bunch of "lines at the same
> quote depth", it must be encoded as "a paragraph", with the last line
> fixed and all other lines flowed. Perhaps the intention is something
> like this:
>
> When generating quoted flowed lines, an agent needs to pay attention
> to changes in quote depth. All lines of a paragraph MUST be
> unquoted, or else they MUST all be quoted and have the same quote
> depth. Therefore, whenever there is a change in quote depth, or a
> change from quoted to unquoted, or change from unquoted to quoted,
> the line immediately preceding the change MUST NOT be a flowed
> line.
Indeed. Thanks for catching this.
> Section 5.5 goes on to say:
>
> If a receiving agent wishes to reformat flowed quoted lines (joining
> and/or wrapping them) on display or when generating new messages,
> the lines SHOULD be de-quoted, reformatted, and then re-quoted. To
> de-quote, the number of close angle brackets in the quote indicator
> at the start of each line is counted. Consecutive lines with the
> same quote depth are considered one paragraph and are reformatted
> together. To re-quote after reformatting, a quote indicator
> containing the same number of close angle brackets originally
> present are prefixed to each line.
>
> I think one sentence there is inaccurate: "Consecutive lines with the
> same quote depth are considered one paragraph and are reformatted
> together." Consecutive lines with the same quote depth could be one
> paragraph or several paragraphs or non-paragraph fixed lines (in which
> case no reformatting is requested) or separator lines. I think that
> sentence can simply be removed. Reformatting is covered elsewhere; this
> section is about quoting.
Another good catch. The sentence is deleted in -04.
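As a rough sketch of the de-quote, reformat, re-quote sequence described in the quoted Section 5.5 text, assuming the caller has already isolated a single paragraph (all lines at one quote depth, last line fixed). The function names and the use of Python's textwrap module are illustrative choices, not anything the draft specifies, and the sketch does no space-stuffing, so it assumes no rewrapped line begins with a space or ">".

```python
import textwrap


def dequote(line):
    """Return (quote depth, text after the quote indicator)."""
    depth = len(line) - len(line.lstrip(">"))
    return depth, line[depth:]


def requote(depth, text):
    """Prefix the original number of '>' characters."""
    return ">" * depth + text


def rewrap_quoted_paragraph(lines, width):
    """De-quote one paragraph, rewrap it to `width`, and re-quote it."""
    parts = [dequote(line) for line in lines]
    depths = {depth for depth, _ in parts}
    if len(depths) != 1:
        raise ValueError("a paragraph must have a single quote depth")
    depth = depths.pop()
    # Flowed lines keep their trailing space, so plain joining rebuilds the text.
    paragraph = "".join(text for _, text in parts)
    wrapped = textwrap.wrap(paragraph, width)
    # Every line but the last is flowed again (trailing space); the last is fixed.
    flowed = [w + " " for w in wrapped[:-1]] + wrapped[-1:]
    return [requote(depth, text) for text in flowed]


original = [">>The quick brown ", ">>fox jumps over ", ">>the lazy dog."]
for line in rewrap_quoted_paragraph(original, 20):
    print(repr(line))
```

Here the width applies to the de-quoted text only; a real agent would also account for the length of the quote indicator.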
I agree; thanks again.
> Here we have more instances of the phrase "change in quote depth". If
> we keep the current view that unquoted lines have no quote depth and
> quoted lines have non-zero quote depth, then we really ought to be
> saying "change in quote depth, or change from quoted to unquoted, or
> change from unquoted to quoted". If we adopt the view that all lines
> have a quote depth, which can be zero, then the simple phrase "change in
> quote depth" will mean what we want it to mean.
I think "change in quote depth" can include a change between quoted
and unquoted. Even though I have retained the ABNF distinction
between quoted and unquoted, I don't think the text has to be too
rigid about it.
--
Randall Gellens
Opinions are personal; facts are suspect; I speak for myself only
-------------- Randomly-selected tag: ---------------
Nothing astonishes men so much as common sense and plain dealing.
--Ralph Waldo Emerson.
> Note that the idea was to allow signature separator lines to be
> quoted, but not stuffed unless also quoted. That way, stuffing can
> be used to guard against a line being confused with a signature
> separator.
I failed to notice that subtlety. Hmmm, are you sure that's what you
want? That would be the only place where quoting and stuffing are not
orthogonal. It looks like it might be asking for trouble. Consider
this format=flowed message body:
Foo 
 -- 
bar.
The first two lines end with a space, and the third does not. According
to your definition of separator lines, this is a single paragraph,
because the middle line is a flowed line, not a separator line. But
what happens if I try to quote that paragraph by inserting a quote
indicator before each line:
>Foo
> --
>bar.
According to your definition of separator lines, this is now an invalid
format=flowed message body, because it contains a flowed line followed
by a separator line.
There would be no such gotcha if quoting and stuffing were kept
independent. The rule could be that separator lines are still separator
lines after being stuffed, or that separator lines cease to be separator
lines when they are stuffed, as long as it is the same for both quoted
and unquoted lines.
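The gotcha can be made concrete with a small Python sketch. It assumes the rule under discussion is "dash-dash-space is always a separator; space-dash-dash-space is a separator only when the line is quoted"; the function name is hypothetical and lines are shown without their CRLF.

```python
def classify_quote_dependent(line):
    """Classify a line under the assumed rule that a stuffed separator
    counts as a separator only when the line is also quoted."""
    depth = len(line) - len(line.lstrip(">"))
    rest = line[depth:]
    if rest == "-- " or (depth > 0 and rest == " -- "):
        return "sig-sep"
    return "flowed" if rest.endswith(" ") else "fixed"


body = ["Foo ", " -- ", "bar."]
quoted = [">" + line for line in body]

print([classify_quote_dependent(line) for line in body])
print([classify_quote_dependent(line) for line in quoted])
```

Unquoted, the middle line is an ordinary flowed line; after a ">" is prefixed to every line, the same text is classified as a separator that directly follows a flowed line.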
> The suggested definitions for flowed lines attempt to eliminate the
> ambiguity between a flowed line and a signature separator but I don't
> think they allow for all cases of flowed lines.
I wouldn't be surprised if I made a mistake in that tricky part of the
grammar, but can you produce a concrete example of a line that ought to
be flowed and doesn't match my suggested flowed-line production? That
would be the surest way to expose the problem.
It's also possible that the grammar is correct, but structured in a
confusing way. If that's the case, I welcome suggestions for less
confusing ways of expressing the same syntax.
For convenience, here's another copy of the relevant parts of the
grammar that I suggested:
flowed-line      = quote (stuff stuffed-flowed / unstuffed-flowed) flow CRLF
stuffed-flowed   = [non-dash *text-char] /
                   "-" [non-dash *text-char / "-" 1*text-char]
                   ; Is not "--".
unstuffed-flowed = non-sp-quote-dash *text-char /
                   "-" [non-dash *text-char / "-" 1*text-char]
                   ; Not empty, not "--", does not begin with SP or ">".
quote            = *">"
stuff            = SP
flow             = SP
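One way to hunt for a counterexample is to transcribe the rules above into a regular expression and try candidate lines against it. The Python sketch below is such a transcription, with assumptions: the excerpt does not define text-char, non-dash, or non-sp-quote-dash, so they are taken here to mean "any character except CR and LF", minus "-" for non-dash, and minus SP, ">" and "-" for non-sp-quote-dash. The Python names are hypothetical.

```python
import re

# Assumed character classes (not defined in the excerpt above).
TEXT_CHAR = r"[^\r\n]"
NON_DASH = r"[^\r\n-]"
NON_SP_QUOTE_DASH = r"[^\r\n >-]"

# "-" [non-dash *text-char / "-" 1*text-char]
DASH_TAIL = rf"-(?:{NON_DASH}{TEXT_CHAR}*|-{TEXT_CHAR}+)?"
STUFFED_FLOWED = rf"(?:(?:{NON_DASH}{TEXT_CHAR}*)?|{DASH_TAIL})"
UNSTUFFED_FLOWED = rf"(?:{NON_SP_QUOTE_DASH}{TEXT_CHAR}*|{DASH_TAIL})"

# flowed-line = quote (stuff stuffed-flowed / unstuffed-flowed) flow CRLF
FLOWED_LINE = re.compile(rf">*(?: {STUFFED_FLOWED}|{UNSTUFFED_FLOWED}) \r\n")


def is_flowed_line(line):
    """True if the CRLF-terminated line matches the flowed-line rule."""
    return FLOWED_LINE.fullmatch(line) is not None


for candidate in ["Foo \r\n", "-- \r\n", " -- \r\n", ">-- \r\n", "- \r\n", "--- \r\n"]:
    print(repr(candidate), is_flowed_line(candidate))
```

Under these assumptions, dash-dash-space is rejected whether or not it is quoted or stuffed, while a lone dash or three dashes followed by the flow space are accepted.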
I haven't yet looked at the -04 draft. I'll hold off making any more
comments until I have.
AMC
> > Note that the idea was to allow signature separator lines to be
> > quoted, but not stuffed unless also quoted.
>
> I failed to notice that subtlety. Hmmm, are you sure that's what you
> want? That would be the only place where quoting and stuffing are not
> orthogonal.
True, but then signature separator lines are a weird anomaly anyway.
The RFC 2646 ABNF rules are clear that signature separator lines
can't be stuffed unless they are also quoted, so I just carried that
forward in the update.
> It looks like it might be asking for trouble. Consider
> this format=flowed message body:
>
> Foo
> --
> bar.
>
> The first two lines end with a space, and the third does not. According
> to your definition of separator lines, this is a single paragraph,
> because the middle line is a flowed line, not a separator line. But
> what happens if I try to quote that paragraph by inserting a quote
> indicator before each line:
>
>>Foo
>> --
>>bar.
>
> According to your definition of separator lines, this is now an invalid
> format=flowed message body, because it contains a flowed line followed
> by a separator line.
True, but it would be rather foolish for a client to generate such a
thing. Why would it choose to wrap immediately before and after a
SPACE DASH DASH SPACE sequence?
> > The suggested definitions for flowed lines attempt to eliminate the
>> ambiguity between a flowed line and a signature separator but I don't
>> think they allow for all cases of flowed lines.
>
> I wouldn't be surprised if I made a mistake in that tricky part of the
> grammar, but can you produce a concrete example of a line that ought to
> be flowed and doesn't match my suggested flowed-line production? That
> would be the surest way to expose the problem.
>
> It's also possible that the grammar is correct, but structured in a
> confusing way. If that's the case, I welcome suggestions for less
> confusing ways of expressing the same syntax.
>
> For convenience, here's another copy of the relevant parts of the
> grammar that I suggested:
>
> flowed-line = quote (stuff stuffed-flowed / unstuffed-flowed) flow CRLF
> stuffed-flowed = [non-dash *text-char] /
> "-" [non-dash *text-char / "-" 1*text-char]
> ; Is not "--".
> unstuffed-flowed = non-sp-quote-dash *text-char /
> "-" [non-dash *text-char / "-" 1*text-char]
> ; Not empty, not "--", does not begin with SP or ">".
> quote = *">"
> stuff = SP
> flow = SP
I think it's a good idea to use parentheses to explicitly group the
ABNF constructs, to avoid confusion. RFC 2234 recommends this as
well.
So, taking your suggested 'unstuffed-flowed' to be
unstuffed-flowed = ( non-sp-quote-dash *text-char ) /
( "-" [non-dash *text-char ) /
( "-" 1*text-char] )
I think DASH DASH would match the third alternative for this
'unstuffed-flowed', since the first DASH matches the "-" and the
second DASH matches text-char, and only one text-char is required.
So a line of DASH DASH SPACE would match 'flowed-line'.
--
Randall Gellens
Opinions are personal; facts are suspect; I speak for myself only
-------------- Randomly-selected tag: ---------------
The well-bred contradict other people. The wise contradict themselves.
--Oscar Wilde
> The grammar in the -03 and -04 versions is based on the suggested
> replacement you provided, plus fixes...to distinguish between quoted
> and unquoted. I agree that this latter change does make the grammar
> larger and more complex.
Perhaps there is a middle ground. Here is an excerpt from the -04
grammar:
flowed-line      = ( flowed-line-qt / flowed-line-unqt ) flow CRLF
flowed-line-qt   = quote ( ( stuffing stuffed-flowed ) /
                   unstuffed-flowed )
flowed-line-unqt = ( stuffing stuffed-flowed ) / unstuffed-flowed
fixed-line       = fixed-line-qt / fixed-line-unqt
fixed-line-qt    = quote ( ( stuffing stuffed-fixed ) /
                   unstuffed-fixed ) CRLF
fixed-line-unqt  = ( stuffed-fixed / unstuffed-fixed ) CRLF
quote            = 1*">"
Here is what I had proposed:
fixed-line  = quote ( (stuff stuffed-fixed) /
              unstuffed-fixed ) CRLF
flowed-line = quote ( (stuff stuffed-flowed) /
              unstuffed-flowed ) flow CRLF
quote       = *">"
Here is a tweak to that proposal:
fixed-line  = [quote] ( (stuff stuffed-fixed) /
              unstuffed-fixed ) CRLF
flowed-line = [quote] ( (stuff stuffed-flowed) /
              unstuffed-flowed ) flow CRLF
quote       = 1*">"
My original proposed grammar presented the view that the quoting is
always present, but might have a depth of zero. The tweaked grammar
presents a view more in line with the text of the draft: the quoting can
be present or not present, and if it is present it has a non-zero depth.
Unlike the -04 draft, the tweaked proposal doesn't give explicit names
to the quoted/unquoted versions of the lines, it just shows "[quote]"
in brackets, implying that the line is either quoted or not. Forgoing
the explicit names shaves four rules off the grammar. Just something to
consider.
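The two views accept exactly the same lines and differ only in how an unquoted line is described. A small Python sketch of the difference, using regular expressions as a stand-in for the ABNF (the names are hypothetical):

```python
import re

ALWAYS_PRESENT = re.compile(r"(>*)")     # quote = *">"   (depth may be zero)
OPTIONAL_QUOTE = re.compile(r"(>+)?")    # [quote], quote = 1*">"


def depth_always_present(line):
    """Quoting is always there; an unquoted line has an empty quote."""
    return len(ALWAYS_PRESENT.match(line).group(1))


def depth_optional(line):
    """Quoting is present or absent; absent is reported as depth 0."""
    quote = OPTIONAL_QUOTE.match(line).group(1)
    return 0 if quote is None else len(quote)


for line in ["abc", ">abc", ">>> abc"]:
    print(repr(line), depth_always_present(line), depth_optional(line))
```

For an unquoted line the first pattern captures an empty string and the second captures nothing at all, which mirrors "depth of zero" versus "not present".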
> I believe this has been cleaned up in -04 so that it is now clear from
> all references in the text as well as the grammar that signature lines
> can be quoted or quoted and stuffed but they can't be stuffed without
> being quoted.
Section 5.1 is the first introduction to the syntax, and it says:
Logically, this test for quoted lines is done before any other tests
(that is, before checking for space-stuffed and flowed).
Logically, this leading space is deleted before examining the line
further (that is, before checking for flowed).
If the line ends in a space, the line is flowed. Otherwise it is
fixed. The exception to this rule is a signature separator line,
described in Section 5.3. Such lines end in a space but are neither
flowed nor fixed.
It seems clear from reading 5.1 that the test for signature separator
lines happens along with the test for flowed lines, after the quoting
and space-stuffing have been stripped off. But that inference is
inconsistent with the grammar. I was hoping that section 5.1 would
give the complete procedure for analyzing the structure of a flowed
body. Later sections tell what to do with that structure & why & how,
but I was hoping that 5.1 would contain all the same information as
the grammar (in plain English procedural form rather than a formal
declarative form).
Section 5.3 in -04 says:
A receiving agent needs to test for a signature line both before the
test for a quoted line (see Section 5.5) and also after logically
counting and deleting quote marks and stuffing (see Section 5.4)
from a quoted line.
If that's true, then I'd like to see those two tests in section 5.1
along with the other tests. But I'm skeptical of testing for a
signature line after deleting the stuffing. I don't see how that's
useful, because after the stuffing is deleted, there is no memory of it
(unlike the quote indicators, which are remembered in the quote depth).
If the line is dash-dash-space at this point, it might be a signature
line, but it might not be (if it had been stuffed and not quoted).
To decode the line syntax indicated in the -04 grammar, I think the
actual decoding steps are:
1. Count & strip quote indicators.
2. Check for signature separator:
      dash-dash-space is a separator (always),
      space-dash-dash-space is a separator if quote depth is nonzero,
3. Unstuff: delete leading space if present.
4. Check for trailing space to determine flowed/fixed (unless the
   line has already been classified as a signature separator).
Another way to achieve the same effect is:
1. Count & strip quote indicators.
2. Unstuff: delete leading space if present, but remember that the line
   was stuffed.
3. If the line was quoted or neither-quoted-nor-stuffed, check for
   signature separator: dash-dash-space.
4. Check for trailing space to determine flowed/fixed (unless the
   line has already been classified as a signature separator).
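The two procedures above can be sketched in Python and compared by brute force. This is only a sketch of the steps as listed: the function names are hypothetical, lines are taken without their CRLF, and the comparison covers every string of up to five characters over a small alphabet rather than all possible lines.

```python
from itertools import product


def split_quote(line):
    """Step 1 of both procedures: count and strip quote indicators."""
    depth = len(line) - len(line.lstrip(">"))
    return depth, line[depth:]


def decode_check_first(line):
    """First procedure: test for the separator before unstuffing."""
    depth, rest = split_quote(line)
    if rest == "-- " or (depth > 0 and rest == " -- "):
        return ("sig-sep", depth)
    if rest.startswith(" "):          # unstuff
        rest = rest[1:]
    return ("flowed" if rest.endswith(" ") else "fixed", depth)


def decode_unstuff_first(line):
    """Second procedure: unstuff first, but remember whether the line was stuffed."""
    depth, rest = split_quote(line)
    stuffed = rest.startswith(" ")
    if stuffed:
        rest = rest[1:]
    if (depth > 0 or not stuffed) and rest == "-- ":
        return ("sig-sep", depth)
    return ("flowed" if rest.endswith(" ") else "fixed", depth)


def all_lines(alphabet="> -x", max_len=5):
    """Every string over `alphabet` with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


disagreements = [
    line for line in all_lines()
    if decode_check_first(line) != decode_unstuff_first(line)
]
print(disagreements)
```

Both functions classify space-dash-dash-space as a separator only when the quote depth is nonzero, which is the asymmetry discussed next.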
I still think it's very counterintuitive that I can use space-stuffing
to hide a signature line in unquoted text, but I can't do the same in
quoted text. I think the grammar and parsing would be simpler and more
intuitive if the sequence were either:
1. Count & strip quote indicators.
2. Unstuff: delete leading space if present.
3. Check for signature separator: dash-dash-space.
4. Check for trailing space to determine flowed/fixed (unless the
   line has already been classified as a signature separator).
or:
1. Count & strip quote indicators.
2. Check for signature separator: dash-dash-space.
3. Unstuff: delete leading space if present.
4. Check for trailing space to determine flowed/fixed (unless the
   line has already been classified as a signature separator).
In other words, stuffing either hides signature separators or it
doesn't, regardless of quoting.
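A Python sketch of the two simpler sequences, with a flag selecting which one is used. The function and parameter names are hypothetical and lines are taken without their CRLF. The loop at the end checks the property being argued for: prefixing ">" to a line never changes how the rest of the line is classified.

```python
def decode(line, sep_check_before_unstuff):
    """Classify a line as 'sig-sep', 'flowed', or 'fixed'.

    sep_check_before_unstuff=False: unstuff, then test for the separator
    (stuffing does not hide a separator).
    sep_check_before_unstuff=True: test for the separator, then unstuff
    (stuffing hides a separator).
    """
    depth = len(line) - len(line.lstrip(">"))
    rest = line[depth:]
    if sep_check_before_unstuff and rest == "-- ":
        return "sig-sep"
    if rest.startswith(" "):          # unstuff
        rest = rest[1:]
    if not sep_check_before_unstuff and rest == "-- ":
        return "sig-sep"
    return "flowed" if rest.endswith(" ") else "fixed"


samples = ["Foo ", " -- ", "-- ", "bar."]
for flag in (False, True):
    for line in samples:
        # Quoting the line must not change its classification.
        assert decode(line, flag) == decode(">" + line, flag)
    print(flag, [decode(line, flag) for line in samples])
```

With either setting the stuffed and unstuffed separators behave the same way at every quote depth; the two settings differ only in whether a stuffed dash-dash-space is a separator at all.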
In any case, I'd like to see section 5.1 include all the steps for
decoding a line, whatever they are.
> > can you produce a concrete example of a line that ought to be flowed
> > and doesn't match my suggested flowed-line production?
> >
> > flowed-line = quote (stuff stuffed-flowed / unstuffed-flowed) flow
> > CRLF
> > stuffed-flowed = [non-dash *text-char] /
> > "-" [non-dash *text-char / "-" 1*text-char]
> > ; Is not "--".
> > unstuffed-flowed = non-sp-quote-dash *text-char /
> > "-" [non-dash *text-char / "-" 1*text-char]
> > ; Not empty, not "--", does not begin with SP or ">".
> > quote = *">"
> > stuff = SP
> > flow = SP
>
> I think it's a good idea to use parentheses to explicitly group the
> ABNF constructs, to avoid confusion.
>
> So, taking your suggested 'unstuffed-flowed' to be
>
> unstuffed-flowed = ( non-sp-quote-dash *text-char ) /
> ( "-" [non-dash *text-char ) /
> ( "-" 1*text-char] )
Notice the position of your parentheses relative to my square brackets.
Perhaps I should have put spaces around the brackets:
stuffed-flowed   = [ non-dash *text-char ] /
                   "-" [ (non-dash *text-char) / ("-" 1*text-char) ]
                   ; Is not "--".
unstuffed-flowed = (non-sp-quote-dash *text-char) /
                   "-" [ (non-dash *text-char) / ("-" 1*text-char) ]
                   ; Not empty, not "--", does not begin with SP or ">".
Maybe it would be clearer in a more verbose form without the brackets
and without any nesting:
stuffed-flowed   = "" / "-" /
                   (non-dash *text-char) /
                   ("-" non-dash *text-char) /
                   ("--" 1*text-char)
                   ; Is not "--".
unstuffed-flowed = "-" /
                   (non-sp-quote-dash *text-char) /
                   ("-" non-dash *text-char) /
                   ("--" 1*text-char)
                   ; Not empty, not "--", does not begin with SP or ">".
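The flat form translates almost directly into regular expressions, which makes the dash-dash case easy to test. The Python sketch below assumes text-char means any character except CR and LF, with non-dash and non-sp-quote-dash as the obvious subsets; those definitions and all Python names are assumptions, not part of the grammar above. For contrast it also includes one possible rendering of the regrouped reading quoted earlier, in which "-" 1*text-char is a top-level alternative.

```python
import re

# Assumed character classes (not defined in the excerpt above).
TC = r"[^\r\n]"          # text-char
ND = r"[^\r\n-]"         # non-dash
NSQD = r"[^\r\n >-]"     # non-sp-quote-dash

# Flat alternatives, in the order they are written above.
STUFFED_FLOWED = re.compile(rf"(?:)|-|{ND}{TC}*|-{ND}{TC}*|--{TC}+")
UNSTUFFED_FLOWED = re.compile(rf"-|{NSQD}{TC}*|-{ND}{TC}*|--{TC}+")

# The regrouped reading: "-" 1*text-char stands alone as an alternative.
REGROUPED = re.compile(rf"{NSQD}{TC}*|-(?:{ND}{TC}*)?|-{TC}+")


def matches(pattern, text):
    """True if the whole of `text` matches `pattern`."""
    return pattern.fullmatch(text) is not None


for text in ["", "-", "--", "---", "-x", "foo"]:
    print(repr(text),
          matches(STUFFED_FLOWED, text),
          matches(UNSTUFFED_FLOWED, text),
          matches(REGROUPED, text))
```

These patterns cover only the part of the line between the optional stuffing and the flow space. Under the stated assumptions, "--" is rejected by both flat rules and accepted only by the regrouped reading.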
Remember that these rules were written under the assumption that
signature separators can be quoted and/or stuffed in any combination.
These rules would need to be adjusted to reflect draft -04 sig-sep
syntax, or to reflect a syntax that never allows stuffed sig-sep lines.
But I have no doubt that we can write an unambiguous grammar for any of
these syntaxes.
AMC
> 1. Count & strip quote indicators.
> 2. Unstuff: delete leading space if present.
> 3. Check for signature separator: dash-dash-space.
> 4. Check for trailing space to determine flowed/fixed (unless the
> line has already been classified as a signature separator).
Hmmm! I have not followed all the discussion above, but there is one
property that should hold regardless:
A. Some clients will be flowed-aware, and will display the message
accordingly.
B. Some clients will not be flowed-aware, and will display the message as
received.
C. A message either has a signature line, or it doesn't have a signature
line.
D. The answer to question C (does the user see a signature line on his
display) MUST be the same whether viewed by an A-like client or a B-like
client.
Does what you propose have that property? The bit I have quoted above
suggests to me that it does not.
> C. A message either has a signature line, or it doesn't have a
> signature line.
>
> D. The answer to question C (does the user see a signature line on
> his display) MUST be the same whether viewed by an A-like client or a
> B-like client.
Perhaps the draft is not clear enough about the distinction between
"signature separator for the purpose of rewrapping" versus "signature
separator for the purpose of displaying a sig". The draft is talking
about the former, and has nothing to say about the latter. For example:
---begin-example---
>The quick brown fox
>jumps over the lazy
>dog.
>--
>Joe Smith
How now, brown cow?
--
Jane Cooper
---end-example---
The first "signature separator" is classified as a signature separator, rather
than a flowed line, in order to prevent it from being rewrapped like so:
>The quick brown fox jumps
>over the lazy dog.
>-- Joe Smith
But its classification as a signature separator for this purpose doesn't
mean that "Joe Smith" will be displayed as a sig. A message body has
only one sig, and in this case it's "Jane Cooper".
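The rewrapping hazard can be shown with a short Python sketch that joins flowed lines into paragraphs, with and without special handling of dash-dash-space. It works on lines that have already been de-quoted and are all at the same quote depth; the function name and the flag are hypothetical.

```python
def join_flowed(bodies, honor_sig_sep=True):
    """Join de-quoted lines into paragraphs.

    A line ending in a space flows into the next line. When honor_sig_sep
    is true, dash-dash-space is kept as a line of its own instead.
    """
    paragraphs = []
    current = ""
    for body in bodies:
        if honor_sig_sep and body == "-- ":
            if current:
                paragraphs.append(current)
                current = ""
            paragraphs.append(body)
        elif body.endswith(" "):
            current += body
        else:
            paragraphs.append(current + body)
            current = ""
    if current:
        paragraphs.append(current)
    return paragraphs


quoted_part = ["The quick brown fox ", "jumps over the lazy ", "dog.", "-- ", "Joe Smith"]
print(join_flowed(quoted_part, honor_sig_sep=True))
print(join_flowed(quoted_part, honor_sig_sep=False))
```

With the separator honored, "Joe Smith" stays on its own line; without it, the separator is treated as a flowed line and merges with the name. Neither result says anything about which text a client displays as the sig.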
AMC
>But its classification as a signature separator for this purpose doesn't
>mean that "Joe Smith" will be displayed as a sig. A message body has
>only one sig, and in this case it's "Jane Cooper".
OK, if that is always the effect of the definitions, then I am happy.