> Hi Bruce,
>
> As I said on the ietf-822 list, I was getting started on updates to
> RFC 2822 to move it along to draft and was looking at:
>
> <http://users.erols.com/blilly/mparse/rfc2822grammar_simplified.txt>
>
> I notice that the ABNF in there has a few things that are non-2822
> such as encoded words. Do you have a copy of the ABNF which is purely
> the 2822 replacement that we can post to the ietf-2822 list?
>
> pr
I have an older version which does not have the encoded-word grammar
(full text below).
The rationale for adding encoded-word grammar is:
a) 822 (as amended by RFC 2047 section 5 and as further amended by RFC
   2231) had it, though spread out over three documents
b) it is necessary for MIME-conforming implementations
c) the encoded-word rules are rather complex -- I believe that the
   grammar in the current document (URI above) covers everything except
   the rule prohibiting encoded-words in Received fields. In particular,
   the rules regarding adjacent linear whitespace are quite complex.
From an implementor's perspective, I'd like to see all of the relevant
base grammar (i.e. base field and supporting grammar) in a single
document; indeed, one of the benefits of 2822 is that it consolidated
most of the "... amends RFC 822" piecemeal details into a single document
(obviously, the 2047/2231 amendments somehow didn't make it into 2822).
I don't believe there is any harm in including the encoded-word grammar
as encoded-words appear in the higher-level constructs as alternatives to
ccontent, word, and utext.
Following is the text of the modified grammar w/o encoded-word grammar,
interspersed with some notes:
rfc2822grammar_simplified.txt version 0.13 2001/08/08 16:02:35
excerpted from RFC 2822 and modified by Bruce Lilly
NO-WS-CTL = %d1-8 / ; US-ASCII control characters
%d11 / ; that do not include the
%d12 / ; carriage return, line feed,
%d14-31 / ; and white space characters
%d127
text = %d1-9 / ; Characters excluding CR and LF
%d11 / %d12 / %d14-127 / obs-text
specials = "(" / ")" / ; Special characters used in
"<" / ">" / ; other parts of the syntax
"[" / "]" / ":" / ";" / "@" / "\" / "," / "." /
DQUOTE
quoted-pair = ("\" text)
[N.B. had redundant obs-qp alternative]
FWS = ([*WSP CRLF] 1*WSP) / ; Folding white space
obs-FWS
ctext = NO-WS-CTL / ; Non white space controls
%d33-39 / ; The rest of the US-ASCII
%d42-91 / ; characters not including "(",
%d93-126 ; ")", or "\"
[N.B. RFC 822 ASCII NUL not permitted, even with obs- rules]
ccontent = ctext / quoted-pair / comment
comment = "(" *([FWS] ccontent) [FWS] ")"
CFWS = *([FWS] comment) (([FWS] comment) / FWS)
atext = ALPHA / DIGIT / ; Any character except controls,
"!" / "#" / ; SP, and specials.
"$" / "%" / ; Used for atoms
"&" / "'" / "*" / "+" / "-" / "/" / "=" / "?" /
"^" / "_" / "`" / "{" / "|" / "}" / "~"
atom = 1*atext [CFWS]
dot-atom = dot-atom-text [CFWS]
dot-atom-text = 1*atext *("." 1*atext)
qtext = NO-WS-CTL / ; Non white space controls
%d33 / ; The rest of the US-ASCII
%d35-91 / ; characters not including "\"
%d93-126 ; or the quote character
[N.B. RFC 822 ASCII NUL not permitted, even with obs- rules]
qcontent = qtext / quoted-pair
quoted-string = DQUOTE [FWS] *(qcontent [FWS]) DQUOTE [CFWS]
word = atom / quoted-string
phrase = 1*word / obs-phrase
utext = NO-WS-CTL / ; Non white space controls
%d33-126 / ; The rest of US-ASCII
obs-utext
unstructured = *(utext [FWS])
date-time = ([ day-name "," [FWS]] date FWS time [CFWS]) /
obs-date-time
day-name = "Mon" / "Tue" / "Wed" / "Thu" / "Fri" / "Sat" /
"Sun"
date = day FWS month-name FWS year
year = 4*DIGIT
month-name = "Jan" / "Feb" / "Mar" / "Apr" / "May" / "Jun" /
"Jul" / "Aug" / "Sep" / "Oct" / "Nov" / "Dec"
day = 1*2DIGIT
time = time-of-day FWS zone
time-of-day = hour ":" minute [ ":" second ]
hour = 2DIGIT
minute = 2DIGIT
second = 2DIGIT
zone = ( "+" / "-" ) 4DIGIT
[N.B. no CFWS between +- and 4DIGIT]
address = mailbox / group
mailbox = name-addr / addr-spec
name-addr = [display-name] angle-addr
angle-addr = ("<" [CFWS] addr-spec ">" [CFWS]) / obs-angle-addr
group = display-name ":" [CFWS] [mailbox-list] ";" [CFWS]
display-name = phrase
mailbox-list = (mailbox *("," [CFWS] mailbox)) / obs-mbox-list
address-list = (address *("," [CFWS] address)) / obs-addr-list
addr-spec = local-part "@" [CFWS] domain
local-part = dot-atom / quoted-string / obs-local-part
domain = dot-atom / domain-literal / obs-domain
domain-literal = "[" [FWS] *(dcontent [FWS]) "]" [CFWS]
dcontent = dtext / quoted-pair
dtext = NO-WS-CTL / ; Non white space controls
%d33-90 / ; The rest of the US-ASCII
%d94-126 ; characters not including "[",
; "]", or "\"
[N.B. RFC 822 ASCII NUL not permitted, even with obs- rules]
message = (fields / obs-fields) [CRLF body]
body = *(*998text CRLF) *998text
fields = *(trace *(resent-date / resent-from /
resent-sender / resent-to / resent-cc / resent-bcc / resent-msg-id))
*(orig-date / from / sender / reply-to / to / cc
/ bcc / message-id / in-reply-to / references / subject / comments /
keywords / optional-field)
orig-date = "Date:" [FWS] date-time CRLF
from = "From:" [CFWS] mailbox-list CRLF
sender = "Sender:" [CFWS] mailbox CRLF
reply-to = "Reply-To:" [CFWS] address-list CRLF
to = "To:" [CFWS] address-list CRLF
cc = "Cc:" [CFWS] address-list CRLF
bcc = "Bcc:" [CFWS] [address-list] CRLF
message-id = "Message-ID:" [CFWS] msg-id CRLF
in-reply-to = "In-Reply-To:" [CFWS] 1*msg-id CRLF
references = "References:" [CFWS] 1*msg-id CRLF
msg-id = ( "<" id-left "@" id-right ">" [CFWS]) / obs-msg-id
id-left = dot-atom-text / no-fold-quote
id-right = dot-atom-text / no-fold-literal
no-fold-quote = DQUOTE *(qtext / quoted-pair) DQUOTE
no-fold-literal = "[" *(dtext / quoted-pair) "]"
subject = "Subject:" [FWS] [("cmsg" / "Re: ") [FWS]]
unstructured CRLF
[ RFC 1036 sect. 2.2.6 "cmsg" Subject hack, sect. 2.1.4 "Re: " ]
comments = "Comments:" [FWS] unstructured CRLF
keywords = "Keywords:" [CFWS] phrase *("," [CFWS] phrase) CRLF
resent-date = "Resent-Date:" [FWS] date-time CRLF
resent-from = "Resent-From:" [CFWS] mailbox-list CRLF
resent-sender = "Resent-Sender:" [CFWS] mailbox CRLF
resent-to = "Resent-To:" [CFWS] address-list CRLF
resent-cc = "Resent-Cc:" [CFWS] address-list CRLF
resent-bcc = "Resent-Bcc:" [CFWS] [address-list] CRLF
resent-msg-id = "Resent-Message-ID:" [CFWS] msg-id CRLF
trace = [return] 1*received
return = "Return-Path:" [CFWS] path CRLF
path = ("<" [CFWS] [addr-spec] ">" [CFWS]) / obs-path
received = "Received:" [CFWS] name-val-list ";" [FWS]
date-time CRLF
name-val-list = [*(name-val-pair CFWS) name-val-pair]
[N.B. 2822 specification does not provide for mandatory CFWS at end of
list (as opposed to RFC 821 (required <SP>) and 2821)
[name-val-pair CFWS *(name-val-pair CFWS)]
]
name-val-pair = item-name CFWS item-value
item-name = ALPHA *(["-"] (ALPHA / DIGIT))
item-value = 1*angle-addr / addr-spec / atom / domain / msg-id
optional-field = field-name ":" [FWS] unstructured CRLF
field-name = 1*ftext
ftext = %d33-57 / ; Any character except
%d59-126 ; controls, SP, and
; ":".
obs-qp = "\" (%d0-127)
[N.B. unnecessary]
obs-text = %d0-127
[N.B. original 2822 specification was as obs-utext in this file, which
permitted multiple characters]
obs-char = %d0-9 / %d11 / ; %d0-127 except CR and
%d12 / %d14-127 ; LF
obs-utext = *LF *CR *(obs-char *LF *CR)
[N.B. was obs-text]
obs-phrase = word *(word / ("." [CFWS]))
obs-phrase-list = phrase / (1*([phrase] "," [CFWS]) [phrase])
obs-FWS = 1*WSP *(CRLF 1*WSP)
obs-date-time = [ day-name [CFWS] "," [CFWS]] obs-date [CFWS]
FWS [CFWS] obs-time [CFWS]
[N.B. obs- rule does not provide for adjacent date and time permitted by
RFC 822]
obs-date = day CFWS month-name CFWS obs-year
[N.B. obs- rule does not permit (e.g.) 1Jan2001 which was permissible
under RFC 822]
obs-year = 2*DIGIT
obs-time = obs-time-of-day CFWS (zone / obs-zone)
[N.B. obs- rule does not permit adjacent time and zone, which was
permissible under RFC 822]
obs-time-of-day = hour [CFWS] ":" [CFWS] minute [CFWS] ":"
[[CFWS] second]
obs-zone = "UT" / "GMT" / ; Universal Time
; North American UT
; offsets
"EST" / "EDT" / ; Eastern: - 5/ - 4
"CST" / "CDT" / ; Central: - 6/ - 5
"MST" / "MDT" / ; Mountain: - 7/ - 6
"PST" / "PDT" / ; Pacific: - 8/ - 7
%d65-73 / ; Military zones - "A"
%d75-90 / ; through "I" and "K"
%d97-105 / ; through "Z", both
%d107-122 ; upper and lower case
obs-angle-addr = "<" [CFWS] [obs-route] addr-spec ">" [CFWS]
obs-route = obs-domain-list ":" [CFWS]
obs-domain-list = "@" [CFWS] domain *(1*("," [CFWS]) "@" [CFWS]
domain)
obs-local-part = word *("." [CFWS] word)
obs-domain = atom *("." [CFWS] atom)
obs-mbox-list = 1*([mailbox] "," [CFWS]) [mailbox]
obs-addr-list = 1*([address] "," [CFWS]) [address]
obs-fields = *(obs-return / obs-received / obs-orig-date /
obs-from / obs-sender / obs-reply-to / obs-to / obs-cc / obs-bcc /
obs-message-id / obs-in-reply-to / obs-references / obs-subject /
obs-comments / obs-keywords / obs-resent-date / obs-resent-from /
obs-resent-send / obs-resent-rply / obs-resent-to / obs-resent-cc /
obs-resent-bcc / obs-resent-mid / obs-optional)
obs-orig-date = "Date" *WSP ":" [CFWS] date-time CRLF
obs-from = "From" *WSP ":" [CFWS] mailbox-list CRLF
obs-sender = "Sender" *WSP ":" [CFWS] mailbox CRLF
obs-reply-to = "Reply-To" *WSP ":" [CFWS] address-list CRLF
obs-to = "To" *WSP ":" [CFWS] address-list CRLF
obs-cc = "Cc" *WSP ":" [CFWS] address-list CRLF
obs-bcc = "Bcc" *WSP ":" [CFWS] [address-list] CRLF
obs-message-id = "Message-ID" *WSP ":" [CFWS] msg-id CRLF
obs-in-reply-to = "In-Reply-To" *WSP ":" [CFWS] *(phrase / msg-id)
CRLF
obs-references = "References" *WSP ":" [CFWS] *(phrase / msg-id) CRLF
obs-msg-id = "<" [CFWS] addr-spec ">" [CFWS]
obs-subject = "Subject" *WSP ":" [FWS] [("cmsg" / "Re:")
[FWS]] unstructured CRLF
[ RFC 1036 sect. 2.2.6 "cmsg" hack, 2.1.4 "Re:" (w/ or w/o space) ]
obs-comments = "Comments" *WSP ":" [FWS] unstructured CRLF
obs-keywords = "Keywords" *WSP ":" [CFWS] obs-phrase-list CRLF
obs-resent-from = "Resent-From" *WSP ":" [CFWS] mailbox-list CRLF
obs-resent-send = "Resent-Sender" *WSP ":" [CFWS] mailbox CRLF
obs-resent-date = "Resent-Date" *WSP ":" [CFWS] date-time CRLF
obs-resent-to = "Resent-To" *WSP ":" [CFWS] address-list CRLF
obs-resent-cc = "Resent-Cc" *WSP ":" [CFWS] address-list CRLF
obs-resent-bcc = "Resent-Bcc" *WSP ":" [CFWS] [address-list] CRLF
obs-resent-mid = "Resent-Message-ID" *WSP ":" [CFWS] msg-id CRLF
obs-resent-rply = "Resent-Reply-To" *WSP ":" [CFWS] address-list CRLF
obs-return = "Return-Path" *WSP ":" [CFWS] path CRLF
obs-received = "Received" *WSP ":" [CFWS] name-val-list [ ";"
[CFWS] obs-date-time ] CRLF
[N.B. RFC 822 required date-time stamp]
[N.B. reference online version of 2822 specification does not permit WSP
before colon if date-time stamp is used; RFC 822 permitted (nay, required)
"Received" *WSP ":" [CFWS] name-val-list ";" [CFWS]
obs-date-time CRLF
]
obs-path = obs-angle-addr
obs-optional = field-name *WSP ":" [FWS] unstructured CRLF
--------------------------------------------------------------------------------
Notes not part of modified grammar:
For LR(1) parser compatibility, lexical tokens are grouped such that
trailing WS, FWS, or CFWS is associated with its preceding lexical token.
Therefore, no lexical token handled by the higher-level parser grammar
rules has any ambiguity associated with optional WS, FWS, or CFWS. So,
where this revised grammar has:
obs-mbox-list = 1*([mailbox] "," [CFWS]) [mailbox]
that is handled by the implementation as:
obs-mbox-list = 1*([mailbox] ("," [CFWS])) [mailbox]
Additional rules such as:
start = (":" [FWS]) / obs-start
obs-start = *WSP ":" [FWS]
cstart = (":" [CFWS]) / obs-cstart
obs-cstart = *WSP ":" [CFWS]
dstart = start / obs-cstart
can be used to reduce the number of rules, e.g.:
orig-date = "Date" dstart date-time CRLF
(eliminating obs-orig-date (also applies to resent-date))
subject = "Subject" start ["cmsg" [FWS]] unstructured CRLF
(eliminating obs-subject (start also applies to comments and
optional-field))
from = "From" cstart mailbox-list CRLF
(eliminating obs-from (cstart applies to remaining header fields))
etc., allowing all of the obs- header fields to be eliminated, and
obs-fields to be simplified.
And adding:
resent = "Resent-"
allows:
resent-from = resent from
etc., allowing the resent- fields to be simplified and ensuring that the
definitions remain in sync between base and resent- versions.
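As a rough illustration of the start / obs-start idea above, the two
alternatives can be collapsed into one pattern: optional WSP, a colon, then
optional FWS. The Python sketch below is an assumption-laden approximation,
not part of the grammar file: FWS is reduced to its non-obsolete form, and the
names FWS, START and split_field are hypothetical.

```python
import re

# Approximation of FWS = ([*WSP CRLF] 1*WSP); obs-FWS is deliberately ignored.
FWS = r"(?:[ \t]*\r\n)?[ \t]+"

# start     = (":" [FWS]) / obs-start
# obs-start = *WSP ":" [FWS]
# Both alternatives reduce to: optional WSP, colon, optional FWS.
START = re.compile(r"[ \t]*:(?:" + FWS + r")?")

# field-name = 1*ftext, ftext = %d33-57 / %d59-126
FIELD_NAME = re.compile(r"[!-9;-~]+")

def split_field(line):
    """Return (field-name, rest-of-line), or None if the line has no valid start."""
    name = FIELD_NAME.match(line)
    if name is None:
        return None
    start = START.match(line, name.end())
    if start is None:
        return None
    return name.group(0), line[start.end():]

print(split_field("Subject: hello"))   # regular form
print(split_field("Subject : hello"))  # obsolete form, WSP before the colon
```

With a single pattern like this, a parser would not need separate obs- rules
for each header field just to tolerate white space before the colon.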
> a) 822 (as amended by RFC 2047 section 5 and as further amended by RFC
> 2231) had it,
> though spread out over three documents
Correction:
2047 and 2231 are also amended by the errata page, so that's a total of
4 documents.
>I have an older version which does not have the encoded-word grammar
>(full text below).
Thanks for sending it.
>From an implementor's perspective, I'd like to see all of the
>relevant base grammar (i.e.
>base field and supporting grammar) in a single document; indeed, one
>of the benefits of
>2822 is that it consolidated most of the "... amends RFC 822"
>piecemeal details into a
>single document (obviously, the 2047/2231 amendments somehow didn't
>make it into
>2822).
One of the rules we lived under during DRUMS (the WG that produced
2822) was that we would not include anything from MIME so that this
document could (if it needed to) make it to full Standard before MIME
did. (You can't make normative reference to standards lower on the
standards track.) Once both the base mail format and 2047/2231 make
it to full Standard, I think it would be wise to combine them into a
single document.
>I don't believe there is any harm in including the encoded-word grammar as
>encoded-words appear in the higher-level constructs as alternatives
>to ccontent, word,
>and utext.
Except that there would have to be explanation of these terms
referring to 2047/2231. And this would surely be more than a simple
syntactic change to 2822. I think this is out of the question.
pr
--
Pete Resnick <http://www.qualcomm.com/~presnick/>
QUALCOMM Incorporated - Direct phone: (858)651-4478, Fax: (858)651-1102
I've now gone through Bruce Lilly's suggested changes to the RFC 2822
grammar.
Generally, changes mostly seem to be removal of FWS and CFWS to the
left of other tokens. For example, in "From: A b <c...@d.ef>" 2822
permits the space after b to be matched by either the trailing [CFWS]
in the Atom rule or by leading one in angle-addr. Bruce changes the
grammar such that there aren't leading CFWS/FWS invocations, and adds
other CFWS/FWS invocations in other places to compensate.
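The ambiguity described above can be made concrete with a small counting
sketch. It assumes, for illustration only, that a run of white space sits
between two tokens and that each adjacent optional CFWS slot may absorb any
prefix or suffix of that run; the function name splits is hypothetical.

```python
def splits(ws_run, slots):
    """List every way to distribute ws_run over `slots` adjacent optional slots."""
    if slots == 1:
        return [(ws_run,)]
    out = []
    for i in range(len(ws_run) + 1):
        # The first slot takes ws_run[:i]; the remaining slots share the rest.
        for rest in splits(ws_run[i:], slots - 1):
            out.append((ws_run[:i],) + rest)
    return out

# Trailing [CFWS] of the atom plus leading [CFWS] of angle-addr: two slots.
print(len(splits(" ", 2)))  # 2 derivations for a single space
# Trailing [CFWS] only, as in the revised grammar: one slot.
print(len(splits(" ", 1)))  # 1 derivation
```

Removing the leading slot leaves exactly one derivation, which is the property
the revised grammar is after.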
In general I like his changes. (I didn't expect to. 2822 is not some
draft-blah-blah-00 whose syntax may be changed lightly.) There are
some details, though. As always.
1. Bruce eliminates obs-qp, even though it can match a few pairs
quoted-pair cannot, such as "\" DEL.
2. I think I found a case in which Bruce permits CFWS where it wasn't
before, although I cannot decipher my very cramped margin notes. It
was something like "To: blah: (cfws) a...@b.org;".
3. The date syntax is reordered, and four-digit years seem to have
disappeared from the obsolete syntax. This looks like a good idea:
obs-year = 2DIGIT / 4DIGIT
4. obs-time-of-day looks wrong.
obs-time-of-day = hour [CFWS] ":" [CFWS] minute [CFWS] ":" [[CFWS]
second]
Better:
obs-time-of-day = hour [CFWS] ":" [CFWS] minute [CFWS] [":" [CFWS] second]
5. obs-received merits discussion on its own. RFC 2822 says
obs-received = "Received" *WSP ":" name-val-list CRLF
which Bruce changes to
obs-received = "Received" *WSP ":" [CFWS] name-val-list [ ";" [CFWS] obs-date-time ] CRLF
An incompatible change, but perhaps correct.
6. I don't like Bruce's changes to subject. They mix in RFC 1036
syntax, which IMO does not belong in 2822.
7. There may be a few cases where FWS in 2822 is replaced by CFWS in
Bruce's grammar. It's a little hard to tell. For example, in Bruce's
grammar there always is CFWS following field-name ":", I'm not sure
the same holds for 2822. It may.
8. obs-domain-list is changed in a way I don't understand, from
obs-domain-list = "@" domain *(*(CFWS / "," ) [CFWS] "@" domain)
to
obs-domain-list = "@" [CFWS] domain *(1*("," [CFWS]) "@" [CFWS] domain)
I may have missed something; I did this on the airplane and finished
typing it down after arriving 20 hours late. I do not like the Mumbai
airport.
Arnt
>Pete Resnick wrote:
>> I notice that the ABNF in there has a few things that are non-2822
>> such as encoded words. Do you have a copy of the ABNF which is purely
>> the 2822 replacement that we can post to the ietf-2822 list?
>I have an older version which does not have the encoded-word grammar
>(full text below).
>Following is the text of the modified grammar w/o encoded-word grammar,
>interspersed
>with some notes:
>
>rfc2822grammar_simplified.txt version 0.13 2001/08/08 16:02:35
>excerpted from RFC 2822 and modified by Bruce Lilly
>
>
>quoted-pair = ("\" text)
>[N.B. had redundant obs-qp alternative]
I think not. The obs version allows \NUL, \CR and \LF, which the regular
version does not.
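The claim can be checked mechanically by comparing octet sets. The sketch
below takes only the non-obsolete alternatives of 'text' as quoted in the
grammar (that is, leaving obs-text out), which is an assumption about what
"the regular version" means here; the variable names are hypothetical.

```python
# Non-obsolete alternatives of text: %d1-9 / %d11 / %d12 / %d14-127
text = set(range(1, 10)) | {11, 12} | set(range(14, 128))

# Second octet permitted by obs-qp = "\" (%d0-127)
obs_qp = set(range(0, 128))

# Octets that only the obs- form can quote.
only_obs = sorted(obs_qp - text)
print(only_obs)  # [0, 10, 13], i.e. NUL, LF and CR
```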
>[N.B. RFC 822 ASCII NUL not permitted, even with obs- rules]
Bruce gives many examples of differences from RFC 822. I will leave it to
others to comment on the rights and wrongs, but some of them certainly
look like bugs in RFC 2822 to me.
>atom = 1*atext [CFWS]
And here start the changes to remove ambiguities from the grammar.
Essentially, we want to get rid of cases where the grammar can produce two
CFWSs side by side (because it makes it harder for the human reader to
puzzle it out, and it makes it harder to derive automatic parsers, whether
LR(1) or otherwise).
When I first encountered this syntax, I convinced myself that it would be
more trouble than it was worth, and hugely increase the size of the
grammar, to remove these cases. Now Bruce has unconvinced me.
Essentially, he arranges that all the relevant constructs can be followed
by a CFWS (usually optional) but never preceded by one. So he has removed
around a dozen occurrences of [CFWS] from the syntax, and inserted around
30 new occurrences. AFAICS his revisions all work, and are worthwhile.
I have a few niggles, and there are still some other ambiguities which he has
not touched, which I will come to later.
>date-time = ([ day-name "," [FWS]] date FWS time [CFWS]) /
>obs-date-time
All the obs- versions of month, day, etc are gone, and are replaced by a
single rule for obs-date-time. On the face of it, this is a good move, but
it leads to problems later on, as we shall see.
>zone = ( "+" / "-" ) 4DIGIT
>[N.B. no CFWS between +- and 4DIGIT]
Indeed. Are you saying that such was allowed in RFC 822?
>message = (fields / obs-fields) [CRLF body]
But this rule leads to horrendous ambiguities, with no prospect of
avoiding them in less than 50 pages of syntax :-( . But I shall defer
discussion of that till later, because there are other issues with it.
>subject = "Subject:" [FWS] [("cmsg" / "Re: ") [FWS]]
>unstructured CRLF
>[ RFC 1036 sect. 2.2.6 "cmsg" Subject hack, sect. 2.1.4 "Re: " ]
AAARRRRRGGGGGGGGGHHHHHHHHHHH!
Please, no "Re: " or "cmsg" in the syntax. Quite apart from the introduced
ambiguity (the 'unstructured' could begin with those things anyway), the
"Re: " convention is better described by verbiage in the semantics. We
have removed it from the syntax of Usefor (and Bruce was one of the people
who urged that). It remains as a semantic convention with wording with
much the same effect as the wording currently in RFC 2822 (and even that
is not cast in concrete yet). As for "cmsg", that convention is no longer
implemented anywhere AFAIK; Usefor says it MUST NOT be used for any
semantic effect, though it still recommends putting it there for old
time's sake when a Control header is present (though I am now dubious
about even that).
>path = ("<" [CFWS] [addr-spec] ">" [CFWS]) / obs-path
I think it would be better to say
path = angle-addr / "<" [CFWS] ">" [CFWS]
That way you avoid the need for obs-path (obs-angle-addr takes care of
it).
Now we come to the obs- syntax, where there are still many ambiguities. As
things stand, sometimes the obs- syntax allows something that is already
in the regular syntax (that is ambiguous). OTOH, sometimes it does not,
and sometimes it allows only a part of what is in the regular syntax, all
of which can be very confusing to the reader who tries to work out exactly
how the obs- syntax differs from the regular.
It would be much better for each bit of obs- syntax to produce only the
extra bits which are not already in the regular stuff, and as I shall show
this is quite easily done.
>obs-qp = "\" (%d0-127)
>[N.B. unnecessary]
But if you do keep it, all it needs is:
obs-qp = "\" ( NUL / LF / CR )
>obs-text = %d0-127
should be:
obs-text = NUL / LF / CR
>obs-char = %d0-9 / %d11 / ; %d0-127 except CR and
> %d12 / %d14-127 ; LF
>
It would be clearer to say:
obs-char = utext / NUL / WSP
>obs-utext = *LF *CR *(obs-char *LF *CR)
>[N.B. was obs-text]
No, that does not work because it allows CRLF not followed by WSP in the
middle of an 'unstructured'. I think the only way out of that is to
rewrite the rule for 'unstructured':
unstructured = *(utext [FWS]) obs-ltext
obs-utext = (1*LF *CR / 1*CR) obs-char / NUL
obs-ltext = *LF *CR
>obs-phrase = word *(word / ("." [CFWS]))
OK, but that is not "obsolete". It is intended as an extension to be
allowed sometime in the future on a "MUST accept, SHOULD NOT generate yet"
basis. So please can we rename it as 'extended-phrase' (which is what I
have currently put in Usefor).
>obs-phrase-list = phrase / (1*([phrase] "," [CFWS]) [phrase])
To be truly "obs-", that needs to contain at least one occurrence of two
","s with no phrase between them. A syntax to achieve this would be:
obs-phrase-list = phrase *("," [CFWS] phrase)
2*("," [CFWS])
*(1*("," [CFWS]) phrase)
>obs-FWS = 1*WSP *(CRLF 1*WSP)
should be:
obs-FWS = 2*(*WSP CRLF 1*WSP)
>
>obs-date-time = [ day-name [CFWS] "," [CFWS]] obs-date [CFWS]
>FWS [CFWS] obs-time [CFWS]
Essentially, if all the CFWS are, in fact, FWS, then it is regular;
otherwise, it is obs-. But it would be a pain to write it all as one rule
that way. However, if you were to restore the separate obs-month, obs-day,
etc as in RFC 2822, I think it could be done quite easily.
>obs-angle-addr = "<" [CFWS] [obs-route] addr-spec ">" [CFWS]
should be:
obs-angle-addr = "<" [CFWS] obs-route addr-spec ">" [CFWS]
>obs-local-part = word *("." [CFWS] word)
That is a tricky one. Essentially, a phrase can consist of a collection of
atoms and quoted-strings separated by ("." [CFWS]). To be considered
"obs", it has to contain, somewhere, either a genuine CFWS, or an
(atom "." quoted-string), or a (quoted-string "." atom), or a
(quoted-string "." quoted-string). If it does not have one of those
somewhere, it is regular. Here is the syntax to do it:
obs-local-part = *(word ".") word "." CFWS word *("." [CFWS] word) /
*(atom ".") atom "." quoted-string *("." word) /
*(quoted-string ".") quoted-string "." atom *("." word) /
1*(quoted-string ".") quoted-string
>obs-domain = atom *("." [CFWS] atom)
should be:
obs-domain = dot-atom 1*("." CFWS dot-atom)
>obs-mbox-list = 1*([mailbox] "," [CFWS]) [mailbox]
Needs same treatment as obs-phrase-list.
>obs-addr-list = 1*([address] "," [CFWS]) [address]
Needs same treatment as obs-phrase-list.
>obs-fields = *(obs-return / obs-received / obs-orig-date /
>obs-from / obs-sender / obs-reply-to / obs-to / obs-cc / obs-bcc /
>obs-message-id / obs-in-reply-to / obs-references / obs-subject /
>obs-comments / obs-keywords / obs-resent-date / obs-resent-from /
>obs-resent-send / obs-resent-rply / obs-resent-to / obs-resent-cc /
>obs-resent-bcc / obs-resent-mid / obs-optional)
>
>obs-orig-date = "Date" *WSP ":" [CFWS] date-time CRLF
If you retain the ambiguous syntax for 'message' then this rule, and the
following ones like it, are fine. But if you do away with that syntax (see
below), then they would need to be changed to
obs-orig-date = "Date" 1*WSP ":" [CFWS] date-time CRLF
in order to disambiguate them from their regular counterparts. Also, some
of them which include explicit obs- syntax would need some attention.
>obs-subject = "Subject" *WSP ":" [FWS] [("cmsg" / "Re:")
>[FWS]] unstructured CRLF
>[ RFC 1036 sect. 2.2.6 "cmsg" hack, 2.1.4 "Re:" (w/ or w/o space) ]
But, please no, "Re: " or "cmsg".
>obs-path = obs-angle-addr
Not needed if my suggestion above for 'path' is adopted.
Now back to the message grammar:
>message = (fields / obs-fields) [CRLF body]
That is trying to kill two birds with one stone:
1) to force some order into the regular headers;
2) to allow WSP before the ":", and also a few extra obs- features
The problem is that everything that turns up in fields also turns up in
obs-fields. Hence it is ambiguous (and I doubt it is LR(1) either). And it
would be totally unfixable by writing further syntax (except as a
theoretical possibility). So the alternatives are
a) Leave it ambiguous, and admit that it is so, or
b) Enforce the ordering of the headers by verbiage, rather than
syntactic means.
However, before doing either of those, please reconsider whether the
ordering you are trying to enforce is too rigid. Currently, RFC2822
requires:
1. Return-Path
2. 1*Received
3. *Resent-xxx
4. Other headers
Yes, it is a good idea that tracing headers be added at the top, so you
can tell the order in which the message passed through various agents, but
there are some useful cases which have been excluded, for example:
Received: from D by E
Received: from C by D
Resent-To: bar@E
Resent-From: foo@C
Received: from B by C
Received: from A by B
IOW, why forbid keeping a record of how it travelled from its origin to
the place where it was resent?
Here is another example (a real one this time, which some readers of
uk.net.news.management may recognize):
Received: from lon-mail-1.gradwell.net (localhost [127.0.0.1])
by clerew.man.ac.uk (8.11.7+Sun/8.11.7) with ESMTP id i05HCjF01021
for <c...@clerew.man.ac.uk>; Mon, 5 Jan 2004 17:12:45 GMT
Delivered-To: postmaster@A
Received: (qmail 81124 invoked by uid 800); 5 Jan 2004 12:54:22 -0000
Delivered-To: forward...@clerew.man.ac.uk
X-Gradwell-SpamScore: ssss
X-Gradwell-SpamScore: ssss
X-Gradwell-Mailfilter: Spam detected by SpamAssassin with 4.0 hits
(3 required)
X-Gradwell-Mailfilter: SpamAssassin hits were PRIORITY_NO_NAME
RCVD_IN_DYNABLOCK RCVD_IN_SORBS X_PRIORITY_HIGH
X-Envelope-To: c...@clerew.man.ac.uk
X-Forwarding-To: c...@clerew.man.ac.uk
Received: (qmail 80864 invoked from network); 5 Jan 2004 12:54:02 -0000
Received: from newred.gradwell.net (193.111.200.20)
by lon-mail-1.gradwell.net with SMTP; 5 Jan 2004 12:54:02 -0000
Received: (qmail 12659 invoked by uid 1148); 5 Jan 2004 12:54:01 -0000
Mailing-List: contact committ...@usenet.org.uk; run by ezmlm
Reply-To: comm...@usenet.org.uk
List-Post: <mailto:comm...@usenet.org.uk>
List-Help: <mailto:committ...@usenet.org.uk>
Delivered-To: mailing list comm...@usenet.org.uk
Received: (qmail 12622 invoked from network); 5 Jan 2004 12:54:00 -0000
Received: from lon-mail-2.gradwell.net (193.111.201.126)
by newred.gradwell.net with SMTP; 5 Jan 2004 12:54:00 -0000
Received: (qmail 71512 invoked by uid 800); 5 Jan 2004 12:54:00 -0000
Delivered-To: forwarding...@usenet.org.uk
X-Gradwell-SpamScore: ssss
X-Gradwell-Mailfilter: Not Spam, SpamAssassin hits of 4.0 (5 required)
Received: (qmail 71352 invoked from network); 5 Jan 2004 12:53:49 -0000
Received: from host217-42-124-162.range217-42.btcentralplus.com
(HELO smtp-relay.vlaad.co.uk) (217.42.124.162) by lon-mail-2.gradwell.net
with SMTP; 5 Jan 2004 12:53:49 -0000
Received: from gst-group.co.uk (localhost [127.0.0.1]) by
smtp-relay.vlaad.co.uk with SMTP (Mailtraq/2.4.0.1534) id SMTPE9B4633C;
Mon, 05 Jan 2004 12:53:22 -0000
Mime-Version: 1.0
.....
Now there are all sorts of perfectly genuine "tracing headers" in there,
all added in transit, and all useful. Some of them are X-headers (so you
need the concept of an "X-tracing-header"). Some, like "Delivered-To"
probably should have been X-headers. Some of them, like the "List.*" ones
are properly defined by an RFC, just not by RFC 2822. And that "Reply-To"
in the middle was added by the mailing list expander, as its position
indicates.
It is non-conformant with RFC 2822 but, IMO, it ought not to be.
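The non-conformance can be seen by reducing the 'fields' rule quoted earlier
to a pattern over field classes. This is a simplified sketch under stated
assumptions: any name starting with "Resent-" is treated as a resent field,
everything that is not Return-Path, Received or Resent-* is lumped together as
"other", and per-field occurrence limits are ignored. The names FIELDS,
classify and conforms are hypothetical.

```python
import re

# fields = *(trace *(resent-...)) *(orig-date / from / ... / optional-field)
# trace  = [return] 1*received
# P = Return-Path, T = Received, R = Resent-*, O = any other field
FIELDS = re.compile(r"(?:P?T+R*)*O*")

def classify(name):
    """Map a field name to its one-letter class."""
    n = name.lower()
    if n == "return-path":
        return "P"
    if n == "received":
        return "T"
    if n.startswith("resent-"):
        return "R"
    return "O"

def conforms(names):
    """True if the sequence of field names fits the simplified 'fields' rule."""
    return FIELDS.fullmatch("".join(classify(n) for n in names)) is not None

# First three fields of the real example above.
print(conforms(["Received", "Delivered-To", "Received"]))                   # False
print(conforms(["Return-Path", "Received", "Resent-To", "From", "Subject"]))  # True
```

Once any field outside the trace and resent groups has appeared, the rule
leaves no way to accept a later Received field, which is why a Delivered-To
line between two Received lines falls outside the syntax.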
So what RFC 2822bis really needs is some careful discussion of what
tracing headers are and how they are to be added, and not a rigid syntax.
--
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133 Web: http://www.cs.man.ac.uk/~chl
Email: c...@clerew.man.ac.uk Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5
>1. Bruce eliminates obs-qp, even though it can match a few pairs
>quoted-pair cannot, such as "\" DEL.
Yup, I think that's Bruce's mistake.
>2. I think I found a case in which Bruce permits CFWS where it
>wasn't before, although I cannot decipher my very cramped margin
>notes. It was something like "To: blah: (cfws) a...@b.org;".
2822 allows CFWS there.
>3. The date syntax is reordered, and four-digit years seem to have
>disappeared from the obsolete syntax.
Nope. 2*DIGIT means 2 or more digits, which is correct.
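For readers less used to ABNF repetition: "2*DIGIT" is a lower bound with no
upper bound, whereas "2DIGIT" means exactly two. A minimal sketch of the
equivalent regular expressions follows; the names OBS_YEAR and TWO_DIGIT are
hypothetical.

```python
import re

OBS_YEAR = re.compile(r"[0-9]{2,}")   # obs-year = 2*DIGIT (two or more digits)
TWO_DIGIT = re.compile(r"[0-9]{2}")   # 2DIGIT (exactly two digits)

for candidate in ("4", "04", "104", "2004"):
    print(candidate,
          OBS_YEAR.fullmatch(candidate) is not None,
          TWO_DIGIT.fullmatch(candidate) is not None)
```

So four-digit years are still covered by obs-year as written.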
>4. obs-time-of-day looks wrong.
>
> obs-time-of-day = hour [CFWS] ":" [CFWS] minute [CFWS] ":" [[CFWS]
>second]
>
>Better:
>
> obs-time-of-day = hour [CFWS] ":" [CFWS] minute [CFWS] [":" [CFWS] second]
Yup, that looks better.
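The difference between the two obs-time-of-day rules shows up on a time
without seconds. The sketch below assumes CFWS can be approximated by runs of
SP/HTAB (comments are ignored); the names AS_WRITTEN and SUGGESTED are
hypothetical.

```python
import re

WS = r"[ \t]*"   # stand-in for [CFWS]; comments are not modelled
D2 = r"[0-9]{2}"  # hour, minute and second are all 2DIGIT

# hour [CFWS] ":" [CFWS] minute [CFWS] ":" [[CFWS] second]
AS_WRITTEN = re.compile(D2 + WS + ":" + WS + D2 + WS + ":(?:" + WS + D2 + ")?")

# hour [CFWS] ":" [CFWS] minute [CFWS] [":" [CFWS] second]
SUGGESTED = re.compile(D2 + WS + ":" + WS + D2 + WS + "(?::" + WS + D2 + ")?")

for t in ("12:54", "12:54:", "12:54:02"):
    print(repr(t),
          AS_WRITTEN.fullmatch(t) is not None,
          SUGGESTED.fullmatch(t) is not None)
```

Under these assumptions the rule as written rejects "12:54" and accepts the
dangling "12:54:", while the suggested rule does the opposite.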
>5. obs-received merits discussion on its own. RFC 2822 says
>
> obs-received = "Received" *WSP ":" name-val-list CRLF
>
>which Bruce changes to
>
> obs-received = "Received" *WSP ":" [CFWS] name-val-list [ ";"
>[CFWS] obs-date-time ] CRLF
>
>An incompatible change, but perhaps correct.
Yes, I think Bruce fixes a bug in 2822 here.
>6. I don't like Bruce's changes to subject. They mix in RFC 1036
>syntax, which IMO does not belong in 2822.
I agree. These should be removed.
>7. There may be a few cases where FWS in 2822 is replaced by CFWS in
>Bruce's grammar. It's a little hard to tell. For example, in Bruce's
>grammar there always is CFWS following field-name ":", I'm not sure
>the same holds for 2822. It may.
I believe the only place where this is an issue is the Date: field,
and he got that one correct. I see no problems on this one.
>8. obs-domain-list is changed in a way I don't understand, from
>
> obs-domain-list = "@" domain *(*(CFWS / "," ) [CFWS] "@" domain)
>
>to
>
> obs-domain-list = "@" [CFWS] domain *(1*("," [CFWS]) "@" [CFWS] domain)
This was a bug in 2822. It allowed "@examp...@stupid.example",
whereas a comma is required before the second "@".
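A quick way to see the difference between the two obs-domain-list rules is to
approximate them with regular expressions. The sketch assumes 'domain' is
reduced to a plain dot-atom-text of letters, digits and hyphens, and CFWS to
runs of SP/HTAB; the host names and the names OLD and NEW are hypothetical.

```python
import re

DOM = r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*"  # simplified domain
WS = r"[ \t]*"                               # stand-in for [CFWS]

# "@" domain *(*(CFWS / ",") [CFWS] "@" domain)
# Any mix of white space and commas, including none, may separate entries.
OLD = re.compile("@" + DOM + r"(?:[ \t,]*@" + DOM + ")*")

# "@" [CFWS] domain *(1*("," [CFWS]) "@" [CFWS] domain)
# At least one comma is required between entries.
NEW = re.compile("@" + WS + DOM + "(?:(?:," + WS + ")+@" + WS + DOM + ")*")

for route in ("@a.example@b.example", "@a.example,@b.example"):
    print(route, OLD.fullmatch(route) is not None, NEW.fullmatch(route) is not None)
```

Only the old rule accepts two route entries with nothing between them.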
>In <3FF7A5FC...@verizon.net> bli...@verizon.net writes:
>
>>quoted-pair = ("\" text)
>>[N.B. had redundant obs-qp alternative]
>
>I think not. The obs version allows \NUL, \CR and \LF, which the
>regular version does not.
Right.
>>[N.B. RFC 822 ASCII NUL not permitted, even with obs- rules]
>
>Bruce gives many examples of differences from RFC 822. I will leave
>it to others to comment on the rights and wrongs, but some of them
>certainly look like bugs in RFC 2822 to me.
I think Bruce is wrong on this one. I see ASCII NUL in 822's CHAR,
and that appears in ctext (minus a few characters).
>>zone = ( "+" / "-" ) 4DIGIT
>>[N.B. no CFWS between +- and 4DIGIT]
>
>Indeed. Are you saying that such was allowed in RFC 822?
It's not entirely clear, but it probably is.
>>message = (fields / obs-fields) [CRLF body]
>
>But this rule leads to horrendous ambiguities, with no prospect of
>avoiding them in less than 50 pages of syntax :-( . But I shall
>defer discussion of that till later, because there are other issues
>with it.
This is identical to 2822.
>>subject = "Subject:" [FWS] [("cmsg" / "Re: ") [FWS]]
>>unstructured CRLF
>>[ RFC 1036 sect. 2.2.6 "cmsg" Subject hack, sect. 2.1.4 "Re: " ]
>
>Please, no "Re: " or "cmsg" in the syntax.
Agreed.
>>path = ("<" [CFWS] [addr-spec] ">" [CFWS]) / obs-path
>
>I think it would be better to say
>
>path = angle-addr / "<" [CFWS] ">" [CFWS]
>
>That way you avoid the need for obs-path (obs-angle-address takes care of it)
Seems OK to me.
>Now we come to the obs- syntax, where there are still many
>ambiguities. As things stand, sometimes the obs- syntax allows
>something that is already in the regular syntax (that is ambiguous).
>OTOH, sometimes it does not, and sometimes it allows only a part of
>what is in the regular syntax, all of which can be very confusing to
>the reader who tries to work out exactly how the obs- syntax differs
>from the regular.
I disagree completely. I think it's much easier for the reader to
have some complete pieces in the obs- syntax even if there is
redundancy. For example, I think the horrible hoop you have to jump
through below for obs-local-part:
>obs-local-part = *(word ".") word "." CFWS word *("." [CFWS] word) /
> *(atom ".") atom "." quoted-string *("." word) /
> *(quoted-string ".") quoted-string "." atom
>*("." word) /
> 1*(quoted-string ".") quoted-string
is just nuts. I don't think this is the correct approach; I think it
makes the syntax completely unusable to a reader, and I would
strongly object to anything like it.
I'm going to skip all of your examples of this.
>>obs-utext = *LF *CR *(obs-char *LF *CR)
>>[N.B. was obs-text]
>
>No, that does not work because it allows CRLF not followed by WSP in
>the middle of an 'unstructured'.
Yup, because you can have "obs-utext obs-utext" which could be (abc
CR)(LF def).
>I think the only way out of that is to rewrite the rule for 'unstructured':
>
>unstructured = *(utext [FWS]) obs-ltext
>obs-utext = (1*LF *CR / 1*CR) obs-char / NUL
>obs-ltext = *LF *CR
Blech. I'll take a look and see what I can figure out without
resorting to that.
>>obs-phrase = word *(word / ("." [CFWS]))
>
>OK, but that is not "obsolete". It is intended as an extension to be
>allowed sometime in the future on a "MUST accept, SHOULD NOT
>generate yet" basis. So please can we rename it as 'extended-phrase'
>(which is what I have currently put in Usefor).
I am not convinced this is worth it. It's explained perfectly well in the text.
>Currently, RFC2822 requires:
>
>1. Return-Path
>2. 1*Received
>3. *Resent-xxx
>4. Other headers
No, it doesn't. Look at the parens and the repeats. It requires:
*(*(return-path 1*(received)) *(resent-xxx))
followed by other headers.
>Yes, it is a good idea that tracing headers be added at the top, so
>you can tell the order in which the message passed through various
>agents, but there are some useful cases which have been excluded,
>for example:
>
>Received: from D by E
>Received: from C by D
>Resent-To: bar@E
>Resent-From: foo@C
>Received: from B by C
>Received: from A by B
That's legal in 2822.
>Here is another example (a real one this time, which some readers of
>uk.net.news.management may recognize):
>
>Received: from lon-mail-1.gradwell.net (localhost [127.0.0.1])
> by clerew.man.ac.uk (8.11.7+Sun/8.11.7) with ESMTP id i05HCjF01021
> for <c...@clerew.man.ac.uk>; Mon, 5 Jan 2004 17:12:45 GMT
>Delivered-To: postmaster@A
>Received: (qmail 81124 invoked by uid 800); 5 Jan 2004 12:54:22 -0000
[...]
You're right, that's not allowed, and I think that is a bug that
needs to be fixed.
>Now there are all sorts of perfectly genuine "tracing headers" in
>there, all added in transit, and all useful.
So, likely we need optional-field to appear in trace. I think that's
the logical answer.
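The ordering argument above can be checked mechanically. The following is a rough Python sketch, not a parser: it maps each field name to a one-letter class and tests the sequence against the RFC 2822 section 3.6 shape, fields = *(trace *(resent fields)) *(other fields), with trace = [return] 1*received. Treating every "Resent-" name as a resent field is a simplification, and the helper names field_classes, ORDER and order_ok are invented for this illustration.

```python
import re


def field_classes(names):
    """Map field names to classes: P=Return-Path, R=Received, S=Resent-*, O=other."""
    out = []
    for name in names:
        n = name.lower()
        if n == "return-path":
            out.append("P")
        elif n == "received":
            out.append("R")
        elif n.startswith("resent-"):  # simplification: any Resent-* name
            out.append("S")
        else:
            out.append("O")
    return "".join(out)


# *( [return] 1*received *(resent-...) ) followed by all other fields.
ORDER = re.compile(r"(?:P?R+S*)*O*")


def order_ok(names):
    """Return True if the field order fits the 2822 'fields' rule shape."""
    return ORDER.fullmatch(field_classes(names)) is not None


first = ["Received", "Received", "Resent-To", "Resent-From",
         "Received", "Received"]
second = ["Received", "Delivered-To", "Received"]
print(order_ok(first))   # the Resent-* example from the thread
print(order_ok(second))  # the Delivered-To example from the thread
```

The first sequence fits the rule, while the second does not, because Delivered-To is an optional-field and nothing in trace may follow it. Allowing optional-field inside trace, as suggested above, would amount to letting "O" appear inside the repeated group.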
>
> On 1/15/04 at 7:09 PM +0100, Arnt Gulbrandsen wrote:
>
>> 1. Bruce eliminates obs-qp, even though it can match a few pairs
>> quoted-pair cannot, such as "\" DEL.
>
>
> Yup, I think that's Bruce's mistake.
and
> On 1/15/04 at 9:59 PM +0000, Charles Lindsey wrote:
> In <3FF7A5FC...@verizon.net> bli...@verizon.net writes:
>
>> quoted-pair = ("\" text)
>> [N.B. had redundant obs-qp alternative]
>
>
> I think not. The obs version allows \NUL, \CR and \LF, which the
> regular version does not.
> Right.
RFC 2822 gives quoted-pair as:
quoted-pair = ("\" text) / obs-qp
and text as:
text = %d1-9 / ; Characters excluding CR and LF
%d11 /
%d12 /
%d14-127 /
obs-text
and obs-text (N.B. included in text) as:
obs-text = *LF *CR *(obs-char *LF *CR)
with obs-char defined as:
obs-char = %d0-9 / %d11 / ; %d0-127 except CR and
%d12 / %d14-127 ; LF
DEL is %d127, which is explicitly included in text and obs-char.
Indeed, text explicitly includes every US-ASCII character by value
except NUL, CR, and LF, and includes those as well via obs-text
(explicitly permitting a single CR or LF) and obs-char (which includes
NUL). I don't see any single US-ASCII character which quoted-pair
doesn't permit after the backslash, without having to resort to obs-qp.
Maybe that wasn't intended by RFC 2822, but there it is.
As an aside, note that obs-text may consist of multiple octets, so
"\foo" could be considered a quoted-pair ("foo" matches obs-text via
*(obs-char *LF *CR)). I believe that's a problem with 2822.
On the other hand, 822 specifically mentioned the multi-character \CRLF
and gave its semantics, but 2822 doesn't seem to permit that.
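The three claims above can be checked with a small Python sketch that transcribes the quoted 2822 rules into regular expressions. This is only an illustration: the names OBS_CHAR, OBS_TEXT, TEXT and QUOTED_PAIR are made up here, and QUOTED_PAIR deliberately omits the obs-qp alternative so that only the ("\" text) branch is exercised.

```python
import re

# obs-char = %d0-9 / %d11 / %d12 / %d14-127
OBS_CHAR = r"[\x00-\x09\x0b\x0c\x0e-\x7f]"
# obs-text = *LF *CR *(obs-char *LF *CR)
OBS_TEXT = rf"\n*\r*(?:{OBS_CHAR}\n*\r*)*"
# text = %d1-9 / %d11 / %d12 / %d14-127 / obs-text
TEXT_ALTS = rf"(?:[\x01-\x09\x0b\x0c\x0e-\x7f]|{OBS_TEXT})"
TEXT = re.compile(TEXT_ALTS)
# quoted-pair = ("\" text), without the obs-qp alternative
QUOTED_PAIR = re.compile(rf"\\{TEXT_ALTS}")

# Every single US-ASCII character, including NUL, CR and LF, matches text.
all_single_chars = all(TEXT.fullmatch(chr(c)) for c in range(128))
print(all_single_chars)

# "\foo" matches as one quoted-pair, because "foo" matches obs-text.
print(QUOTED_PAIR.fullmatch("\\foo") is not None)

# "\" CR LF does not match as one quoted-pair: obs-text cannot contain CR LF.
print(QUOTED_PAIR.fullmatch("\\\r\n") is not None)
```

As transcribed, all 128 characters match text, the multi-octet "\foo" matches as a single quoted-pair, and backslash followed by CRLF does not.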
>
> On 1/15/04 at 9:59 PM +0000, Charles Lindsey wrote:
>
>> In <3FF7A5FC...@verizon.net> bli...@verizon.net writes:
>
>>> [N.B. RFC 822 ASCII NUL not permitted, even with obs- rules]
>>
>>
>> Bruce gives many examples of differences from RFC 822. I will leave
>> it to others to comment on the rights and wrongs, but some of them
>> certainly look like bugs in RFC 2822 to me.
>
>
> I think Bruce is wrong on this one. I see ASCII NUL in 822's CHAR, and
> that appears in ctext (minus a few characters).
But CHAR doesn't appear in 2822. ASCII NUL is excluded from NO-WS-CTL,
and so cannot be in (2822's) ctext. The point is that some legal 822
messages cannot be parsed even with 2822's obs- rules. The specific
instance of (unescaped) ASCII NUL in a comment is an example of that.
>
> On 1/15/04 at 7:09 PM +0100, Arnt Gulbrandsen wrote:
>
>> 4. obs-time-of-day looks wrong.
>>
>> obs-time-of-day = hour [CFWS] ":" [CFWS] minute [CFWS] ":" [[CFWS]
>> second]
>>
>> Better:
>>
>> obs-time-of-day = hour [CFWS] ":" [CFWS] minute [CFWS] [":" [CFWS]
>> second]
>
>
> Yup, that looks better.
Agreed.
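A quick way to see the difference between the two obs-time-of-day rules is to render both as regular expressions. The Python sketch below is only an approximation: [CFWS] is reduced to optional space/tab, hour, minute and second are each taken as two digits, and the names QUOTED and SUGGESTED are invented for this illustration.

```python
import re

WS = r"[ \t]*"    # stand-in for [CFWS]; comments and folding are ignored
DD = r"[0-9]{2}"  # stand-in for hour / minute / second

# As quoted: hour [CFWS] ":" [CFWS] minute [CFWS] ":" [[CFWS] second]
QUOTED = re.compile(rf"{DD}{WS}:{WS}{DD}{WS}:(?:{WS}{DD})?")

# Suggested: hour [CFWS] ":" [CFWS] minute [CFWS] [":" [CFWS] second]
SUGGESTED = re.compile(rf"{DD}{WS}:{WS}{DD}{WS}(?::{WS}{DD})?")

for sample in ("12:30", "12:30:", "12:30:45"):
    print(sample,
          QUOTED.fullmatch(sample) is not None,
          SUGGESTED.fullmatch(sample) is not None)
```

With these stand-ins, the quoted rule rejects "12:30" and accepts the dangling "12:30:", while the suggested rule does the opposite. Both accept "12:30:45".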
Yes, it was something like that. I remember I thought it was a really
odd case. I'll see if I can find it again when the jetlag's gone.
Anyway, now that you've cleared up the things I misunderstood or didn't
understand, I like Bruce's changes even better. (Thanks, Bruce.)
Arnt
>
> On 1/15/04 at 7:09 PM +0100, Arnt Gulbrandsen wrote:
>
>> 5. obs-received merits discussion on its own. RFC 2822 says
>>
>> obs-received = "Received" *WSP ":" name-val-list CRLF
>>
>> which Bruce changes to
>>
>> obs-received = "Received" *WSP ":" [CFWS] name-val-list [ ";"
>> [CFWS] obs-date-time ] CRLF
>>
>> An incompatible change, but perhaps correct.
>
>
> Yes, I think Bruce fixes a bug in 2822 here.
Received has issues, both w.r.t. 2822 and 2821.
First, a historical overview: Received was first specified in RFC 821,
also known as a "time stamp line". There have been, and still are,
discrepancies between the 821/822 and 2821/2822 definitions of the
field body. Received is one of the trace fields, and as noted in RFC
1123, was initially primarily examined by hand. 1123 permitted adding
information actually useful for tracing (i.e. the peer IP address, as
opposed to the HELO string, which is far too easily forged), but gave
no syntax for doing so [common practice has been to include it in a
comment].
Nowadays, tracing Received fields by hand is far too labor-intensive.
Unfortunately, due to the exponential growth of spam, it is necessary.
We now have the unfortunate situation where the required information
(at least in 2821) includes the easily-forged HELO/EHLO string, and the
useful not-so-easily-forged connection information is relegated to an
optional construct. Worse, in 2821 that optional construct is specified
as some sort of structured comment; indeed, it is indistinguishable
from a comment.
What is really desirable is something that has reliable trace
information in machine-readable form, perhaps with the easily-forged
information relegated to comments. But that's another discussion...
Prior to 2822, a time stamp line (a.k.a. Received field) always
required a time stamp. 2822 permits a time stamp line w/o a time stamp,
which is an oxymoron. I don't personally like that (IMO the time stamp
should be mandatory), but that's what 2822 says, and that's part of
what is included in the syntax above.
Another issue, not included above, is an incompatibility which has
crept in. 821/822 required SP before the semicolon which delimits the
start of the date-time. However, most MTAs incorrectly omit that space
[in some, such as sendmail, that can be easily rectified via a run-time
configuration patch; others, such as qmail, have that error
hard-coded]. 2821 requires CFWS immediately before the semicolon, but
2822 makes it optional. Given 821/822/2821's requirements, I'd be
inclined to revise 2822 to make at least CFWS mandatory before the
semicolon when generating a message, and given past (clearly wrong, but
quite widespread) practice, I'd require being able to parse a Received
field w/o CFWS before the semicolon (via obs- syntax).
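For anyone wanting to survey how common the missing space is, a crude Python check might look like the following. It is a sketch under a loud assumption: the last ";" in the unfolded field body is taken to be the date-time delimiter, which breaks if a trailing comment contains ";". The function name ws_before_date_semicolon is invented for this illustration.

```python
def ws_before_date_semicolon(received_body):
    """Report whether SP or HTAB immediately precedes the last ';'.

    Assumption: the last ';' in the unfolded body delimits the date-time.
    Returns None when the body contains no ';' at all.
    """
    head, sep, _ = received_body.rpartition(";")
    if not sep:
        return None
    return head[-1:] in (" ", "\t")


# The first body is taken from one of the real-world examples in this thread.
print(ws_before_date_semicolon("by web197 with SMTP; 9 Jan 2004 20:47:13 -0000"))
print(ws_before_date_semicolon("by web197 with SMTP ; 9 Jan 2004 20:47:13 -0000"))
```

A real check would need to tokenize comments and quoted strings first; this only shows the shape of the generate-strict, accept-liberal test described above.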
One more remaining incompatibility between 2821 and 2822 lies in the
permitted constructs; 2821 permits a quoted string as an item value
(via 2821's "String") whereas 2822 has no such provision. That shows up
in the "id" component, which has a long history of conflicts between
821/822 (1123 tried to rectify the conflict, but only added more
confusion).
Yet another incompatibility is that 2821 permits a mix of angle-addrs
(a.k.a. Paths) and addr-specs (a.k.a. Mailboxes) in a "for" component,
whereas 2822 permits a single addr-spec or multiple angle-addrs (and no
mixture). It turns out that for rather complicated reasons, the 2821
provision for multiple addr-specs is rather difficult to parse. Perhaps
2821's successor should address that issue; in any event, let's at
least remove the remaining conflicts between the 2821 and 2822
definitions one way or another.
I note also that there exist broken implementations which generate
cruft that cannot be parsed even with 2822's exceptionally liberal
rules. Here are some real-world examples:
Received: from web197.nyc01.cbsig.net ([63.240.56.197])
by mx08.mrf.mail.rcn.net with smtp (Exim 3.35 #7)
id 1Af3Xn-0005vx-00
for bli...@erols.com; Fri, 09 Jan 2004 15:47:47 -0500
Received: (qmail 28572 invoked from network); 9 Jan 2004 20:47:13 -0000
Received: from nychubg02.cbs.com (170.20.9.151)
by web197 with SMTP; 9 Jan 2004 20:47:13 -0000
Received: by nychubg02.cbs.com with Internet Mail Service (5.5.2656.59)
id <ZC7JJYV9>; Fri, 9 Jan 2004 15:41:17 -0500
That's one recent example; among the problems:
- as noted above, the SP/CFWS-before-semicolon issue
- id's other than properly-constructed msg-ids
- RFC 821 does not permit day-of-week in the time stamp
- missing from and/or by components in some cases
- illegal (non-RFC 1700 cruft) in "with" components
- there is no defined "Mail" item-name
Received: from panic.noceast.dws.disney.com (panic.corp.disney.com [153.6.248.200])
by mail.disney.com (Switch-3.1.2/Switch-3.1.0) with ESMTP id h9NCwuN4022589
for <bli...@erols.com>; Thu, 23 Oct 2003 05:58:57 -0700 (PDT)
Received: from sm-flor-xc03.wdw.disney.com (sm-flor-xc03.wdw.disney.com [172.16.177.30]) by panic.noceast.dws.disney.com with ESMTP; Thu, 23 Oct 2003 08:55:03 -0400
Received: from sm-flor-xc01.wdw.disney.com ([172.16.177.21]) by sm-flor-xc03.wdw.disney.com with Microsoft SMTPSVC(5.0.2195.5329);
Thu, 23 Oct 2003 08:59:43 -0400
Received: from SM-NYNY-XC01.nena.wdpr.disney.com ([167.13.137.76]) by sm-flor-xc01.wdw.disney.com with Microsoft SMTPSVC(5.0.2195.5329);
Thu, 23 Oct 2003 08:59:42 -0400
Received: from sm-nyny-xm05.nena.wdpr.disney.com ([167.13.137.80]) by SM-NYNY-XC01.nena.wdpr.disney.com with Microsoft SMTPSVC(5.0.2195.6713);
Thu, 23 Oct 2003 08:59:41 -0400
That's even worse; an additional problem is that parsing fails after the
(illegal) with component on encountering a lone "SMTPSVC". Microsoft was
informed about that bug in Windows 2000 well before SP1; 3 service packs
and as many years later, the bug still hasn't been fixed. It shouldn't
take more than 10 seconds for a competent programmer to modify the
source to a) use a legal value (ESMTP or SMTP) in the with component, or
b) elide the optional with component, or c) put the marketing BS in a
comment.
For the record, I am NOT in favor of extending the syntax to accept such cruft -- I wish
that certain purveyors of brokenware would clean up their acts.
>1. Bruce eliminates obs-qp, even though it can match a few pairs
>quoted-pair cannot, such as "\" DEL.
No, "\" DEL is in fact permitted by the regular syntax. The missed ones
are "\" (NUL / LF / CR).
>2. I think I found a case in which Bruce permits CFWS where it wasn't
>before, although I cannot decipher my very cramped margin notes. It
>was something like "To: blah: (cfws) a...@b.org;".
No, that was always allowed, because a dot-atom can always be preceded by
CFWS. But it is a good example of how hard it can be to check some of
these cases in the present syntax.
>3. The date syntax is reordered, and four-digit years seem to have
>disappeared from the obsolete syntax. This looks like a good idea:
I don't think 4DIGIT was ever in obs-year. Nor even in RFC 822?
>7. There may be a few cases where FWS in 2822 is replaced by CFWS in
>Bruce's grammar. It's a little hard to tell. For example, in Bruce's
>grammar there always is CFWS following field-name ":", I'm not sure
>the same holds for 2822. It may.
Not so. Look at orig-date, for example, where it is explicitly [FWS] that
follows the ":".
>
> On 1/15/04 at 9:59 PM +0000, Charles Lindsey wrote:
>
>>> obs-utext = *LF *CR *(obs-char *LF *CR)
>>> [N.B. was obs-text]
>>
>>
>> No, that does not work because it allows CRLF not followed by WSP in
>> the middle of an 'unstructured'.
>
>
> Yup, because you can have "obs-utext obs-utext" which could be (abc
> CR)(LF def).
>
>> I think the only way out of that is to rewrite the rule for
>> 'unstructured':
>>
>> unstructured = *(utext [FWS]) obs-ltext
>> obs-utext = (1*LF *CR / 1*CR) obs-char / NUL
>> obs-ltext = *LF *CR
>
>
> Blech. I'll take a look and see what I can figure out without
> resorting to that.
I believe that the only place that comes up is in 2822's "unstructured":
unstructured = *([FWS] utext) [FWS]
and could be rewritten to prevent the problem noted. Or a note could be
added stating that any CRLF which is part of obs-utext must be followed
by WS.
Further note that this is obs-syntax, which is not for generation of
new messages.
Also note that ASCII NUL is included in obs-char.
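The "(abc CR)(LF def)" case can be reproduced with a short Python sketch. It transcribes obs-utext from the rule quoted above into a regular expression and shows that two adjacent matches join into a CRLF that is not followed by white space. The names OBS_CHAR, OBS_UTEXT, first, second and joined are made up for this illustration.

```python
import re

# obs-char = %d0-9 / %d11 / %d12 / %d14-127
OBS_CHAR = r"[\x00-\x09\x0b\x0c\x0e-\x7f]"
# obs-utext = *LF *CR *(obs-char *LF *CR)
OBS_UTEXT = re.compile(rf"\n*\r*(?:{OBS_CHAR}\n*\r*)*")

first, second = "abc\r", "\ndef"

# Each piece on its own is a valid obs-utext.
both_match = all(OBS_UTEXT.fullmatch(s) is not None for s in (first, second))

# Side by side they contain CRLF with no following SP or HTAB.
joined = first + second
bare_crlf = re.search(r"\r\n(?![ \t])", joined) is not None

# A single obs-utext cannot produce that sequence by itself.
single_match = OBS_UTEXT.fullmatch(joined) is not None

print(both_match, bare_crlf, single_match)
```

So the problem only arises from repetition of obs-utext inside unstructured, which is why either rewriting unstructured or adding a prose restriction would address it.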
>Pete Resnick wrote:
>
>>
>>On 1/15/04 at 7:09 PM +0100, Arnt Gulbrandsen wrote:
>>
>>>1. Bruce eliminates obs-qp, even though it can match a few pairs
>>>quoted-pair cannot, such as "\" DEL.
>>
>>
>>Yup, I think that's Bruce's mistake.
Of course, as Bruce points out, DEL is in there, so this is no problem.
Yup, you're right, but that wasn't intended by 2822. Or more to the
point, it's not obvious to me that it was intended that bare CR or
bare LF should appear in the obs- version of body, which is the
result.
>As an aside, note that obs-text may consist of multiple octets, so
>"\foo" could be considered a
>quoted-pair ("foo" matches obs-text via *(obs-char *LF *CR)). I
>believe that's a problem with 2822.
I agree that it is a bug.
>On the other hand, 822 specifically mentioned the multi-character
>\CRLF and gave its semantics, but 2822 doesn't seem to permit that.
\CRLF is a quoted pair with CR followed by a bare LF. The current
syntax of 2822 permits that, leaving aside the question of whether or
not it should.
True. Then again, so is the rest of the message format and mail
transport.
If we really want to make mail traceable, we need to do a bit more than
fix Received. As I see it, we need:
- A message hash function that is invariant across the various kinds of
munging that happens in mail transport, but still good enough for
non-repudiation (though it probably won't be good enough to serve as a
general-purpose signature)
- An originator-id separate from From, MAIL FROM/Return-Path, Reply-To
or Sender that uniquely distinguishes the originator of the message
from other message originators. It doesn't have to actually expose the
originator's name, email address, account name, etc. - it could be a
nonce as long as the originating ISP or organization could trace it to
the actual originator within a reasonable time.
- A new header field which associates the message hash, originator-id,
timestamp, and originating ISP or organization, which is signed by that
originating ISP or organization, and which is easily verifiable by
recipients or MTAs
- A way to ensure that messages get tagged with originator-id when they
are injected into the mail systems (e.g. ISPs blocking port 25 and/or
MTAs refusing to accept incoming mail without originator-ids or with
unverifiable originator-ids)
- If you really wanted to, you could augment Received or add a new
trace field that recomputed the hash at each hop (to show if and where
the message was corrupted in transport)
This would give you a way to associate each message with an identifier
for the originator, issued by the originator's ISP or organization.
Then you'd need some way to ask that ISP or organization "is the guy
who sent this message trustworthy?" And they could say "as far as we
know, he doesn't have many abuse reports and he's been with us for
years" or "he just signed up yesterday" or "this is a trial account, we
have no billing information for him" or "we've had several hundred
abuse reports in the last 3 hours". And the receiving MTA or recipient
could delay, filter, or bounce the message accordingly. Of course,
some ISPs would lie, and some would not support the verification
protocol. But they'd get reputations for those decisions, and they'd
be marginalized.
But do we really want traceability? Or to put it another way, do we
really want to put hooks in the mail system that make mass surveillance
(by governments, or perhaps even by large companies or unscrupulous
ISPs) that much easier?
Keith
Yes.
>> 3. The date syntax is reordered, and four-digit years seem to have
>> disappeared from the obsolete syntax. This looks like a good idea:
>
> I don't think 4DIGIT was ever in obs-year. Nor even in RFC 822?
Sorry, Pete is right, 2*DIGIT was on the page and my brain was seeing
2DIGIT. 2*DIGIT is perfectly okay.
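For readers who trip over the same notation: in ABNF, 2DIGIT means exactly two digits and 2*DIGIT means two or more. A tiny Python sketch of the difference follows, with the pattern names invented for this illustration.

```python
import re

EXACTLY_TWO = re.compile(r"[0-9]{2}")   # ABNF 2DIGIT
TWO_OR_MORE = re.compile(r"[0-9]{2,}")  # ABNF 2*DIGIT

for year in ("4", "04", "2004"):
    print(year,
          EXACTLY_TWO.fullmatch(year) is not None,
          TWO_OR_MORE.fullmatch(year) is not None)
```

A four-digit year therefore still fits a rule written as 2*DIGIT.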
>> 7. There may be a few cases where FWS in 2822 is replaced by CFWS in
>> Bruce's grammar. It's a little hard to tell. For example, in Bruce's
>> grammar there always is CFWS following field-name ":", I'm not sure
>> the same holds for 2822. It may.
>
> Not so. Look at orig-date, for example, where it is explicitly [FWS]
> that follows the ":".
Anyway, I think Bruce's stuff is better, because it's easier to read. In
2822 I'd need a long time to be sure where CFWS is permitted (as
opposed to only FWS); in Bruce's grammar it's much easier.
Arnt
On Fri, 16 Jan 2004, Keith Moore wrote:
> The "Received" header is woefully inadequate for spam tracing
True. Then again, so is the rest of the message format and mail
transport.
I don't agree it's "woefully inadequate." I agree that many do not put
all the information they could in a Received header (because it is
optional), but that does not make the Received header itself inadequate.
The syntax could be stricter to make automatic processing easier, but
that does not make it inadequate, either.
What makes spam tracing hard is the fact that spammers lie in the
Received headers, or at least cheat so the wrong information is there.
How would a new syntax fix that?
Or did you have something else in mind?
If we really want to make mail traceable, we need to do a bit more than
fix Received. As I see it, we need:
- A message hash function that is invariant across the various kinds of
munging that happens in mail transport, but still good enough for
non-repudiation (though it probably won't be good enough to serve as a
general-purpose signature)
- A new header field which associates the message hash, originator-id,
timestamp, and originating ISP or organization, which is signed by that
originating ISP or organization, and which is easily verifiable by
recipients or MTAs
Doesn't RFC1847 (security multiparts) already provide both of these, or
at least the framework sans the actual algorithm?
- An originator-id separate from From, MAIL FROM/Return-Path, Reply-To
or Sender that uniquely distinguishes the originator of the message
from other message originators. It doesn't have to actually expose the
originator's name, email address, account name, etc. - it could be a
nonce as long as the originating ISP or organization could trace it to
the actual originator within a reasonable time.
So you want something functionally like the ident protocol built into
the Received header?
- A way to ensure that messages get tagged with originator-id when they
are injected into the mail systems (e.g. ISPs blocking port 25 and/or
MTAs refusing to accept incoming mail without originator-ids or with
unverifiable originator-ids)
I agree in principle, I think. But couldn't a Received already provide
this information if everyone would just do it, i.e., it's optional now?
- If you really wanted to, you could augment Received or add a new
trace field that recomputed the hash at each hop (to show if and where
the message was corrupted in transport)
Why would you forward a message that was discovered to be corrupted?
This would give you a way to associate each message with an identifier
for the originator, issued by the originator's ISP or organization.
Then you'd need some way to ask that ISP or organization "is the guy
who sent this message trustworthy?" And they could say "as far as we
know, he doesn't have many abuse reports and he's been with us for
years" or "he just signed up yesterday" or "this is a trial account, we
have no billing information for him" or "we've had several hundred
abuse reports in the last 3 hours".
So you're looking for an extension to the ident protocol based on the
presence of a "string" inserted in a message by its original point of
submission?
But do we really want traceability? Or to put it another way, do we
really want to put hooks in the mail system that make mass
surveillance (by governments, or perhaps even by large companies or
unscrupulous ISPs) that much easier?
I'm sorry but I just don't see what you have in mind that is worse than
it is today, nor do I see us needing anything that we don't already have
(except perhaps to operationally require it).
Can you please elaborate on the "mass surveillance" you fear?
Jim
This has some general utility, and is also needed for anything along the lines
of the "domain keys" transit validation mechanism.
d/
In the realm of anti-spam, the typical approach to discussion seems to be
first to declare that some portion of the system is woefully inadequate, and
then to propose a new and wonderful solution. The 'then' clause is often
omitted.
So...
When we have some agreement on the information that is needed to facilitate
spam tracing, then we can decide whether it is better to add it to Received or
create a new header.
d/
On Fri, 16 Jan 2004, Dave Crocker wrote:
> > The "Received" header is woefully inadequate for spam tracing
>
> True. Then again, so is the rest of the message format and mail
> transport.
>
> I don't agree it's "woefully inadequate."
When we have some agreement on the information that is needed to
facilitate spam tracing, then we can decide whether it is better to
add it to Received or create a new header.
Absolutely.
And let us not forget some means to validate that the information we do
have or get is accurate and correct. Or is that what Nathaniel and
Keith meant by "traceable?"
Jim
Only if everyone's user agents treated a security multipart containing
a message semantically the same as they would the original message.
With current user agents, having submission servers push messages down
into security multiparts would change the way that messages were
interpreted, processed, and presented by a significant number of
intermediaries and MUAs.
> - An originator-id separate from From, MAIL FROM/Return-Path,
> Reply-To
> or Sender that uniquely distinguishes the originator of the message
> from other message originators. It doesn't have to actually
> expose the
> originator's name, email address, account name, etc. - it could be
> a
> nonce as long as the originating ISP or organization could trace
> it to
> the actual originator within a reasonable time.
>
> So you want something functionally like the ident protocol built into
> the Received header?
There's at least a vague similarity between the ident protocol and one
piece of the package I have in mind, but it might be confusing to make
too many associations between the two.
> - A way to ensure that messages get tagged with originator-id when
> they
> are injected into the mail systems (e.g. ISPs blocking port 25
> and/or
> MTAs refusing to accept incoming mail without originator-ids or
> with
> unverifiable originator-ids)
>
> I agree in principle, I think. But couldn't a Received already provide
> this information if everyone would just do it, i.e., it's optional now?
The two are fairly different. You want to capture the originator-id at
submission time, not at every hop. And you need protocol elements that
Received (not being extensible) cannot convey. To me it makes more
sense to start from whole cloth.
> - If you really wanted to, you could augment Received or add a new
> trace field that recomputed the hash at each hop (to show if and
> where
> the message was corrupted in transport)
>
> Why would you forward a message that was discovered to be corrupted?
Probably because the message might have been corrupted in a way that
makes the hash invalid without actually changing the content of the
message. Something tells me it will be difficult to write a
canonicalization function that accommodates all of the various kinds of
message and header munging that is out there.
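To illustrate why this is hard, here is a deliberately naive Python sketch of a canonicalize-then-hash function. Nothing in it comes from an existing proposal: the canonicalization steps (normalize line endings, strip trailing white space, drop trailing empty lines), the choice of SHA-256, and the name canonical_hash are all assumptions made for this example.

```python
import hashlib


def canonical_hash(message: bytes) -> str:
    """Hash a message after a purely illustrative canonicalization."""
    # Normalize CRLF and bare CR to LF.
    text = message.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    # Strip trailing white space from every line.
    lines = [line.rstrip(b" \t") for line in text.split(b"\n")]
    # Drop trailing empty lines.
    while lines and lines[-1] == b"":
        lines.pop()
    return hashlib.sha256(b"\n".join(lines)).hexdigest()


# Survives line-ending and trailing-space changes...
a = canonical_hash(b"Subject: hi\r\n\r\nbody \r\n")
b = canonical_hash(b"Subject: hi\n\nbody\n")
print(a == b)

# ...but not header refolding, which leaves the content unchanged.
c = canonical_hash(b"Subject: a b\n\nx")
d = canonical_hash(b"Subject: a\n b\n\nx")
print(c == d)
```

The second comparison fails even though refolding a header does not change its meaning, which is exactly the kind of munging a real canonicalization would have to anticipate.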
> This would give you a way to associate each message with an
> identifier
> for the originator, issued by the originator's ISP or organization.
> Then you'd need some way to ask that ISP or organization "is the
> guy
> who sent this message trustworthy?" And they could say "as far as
> we
> know, he doesn't have many abuse reports and he's been with us for
> years" or "he just signed up yesterday" or "this is a trial
> account, we
> have no billing information for him" or "we've had several hundred
> abuse reports in the last 3 hours".
>
> So you're looking for an extension to the ident protocol based on the
> presence of a "string" inserted in a message by its original point of
> submission?
Again, I think an analogy to ident would be confusing. You would not
be asking this ISP to say "who is on the other end of this connection"
- you'd be asking the ISP to tell you some information about the guy
who sent the message with originator-id field "XXXXX". (Though it is
conceivable that a first-hop MTA might want to use something like ident
as a means to obtain originator-ids for messages that don't have them,
that's not what I'm proposing now.)
> But do we really want traceability? Or to put it another way, do
> we
> really want to put hooks in the mail system that make mass
> surveillance (by governments, or perhaps even by large companies or
> unscrupulous ISPs) that much easier?
>
> I'm sorry but I just don't see what you have in mind that is worse than
> it is today,
Given the apparent aspirations of the current occupant of the US White
House, things could get far, far worse than they are now. I'd far
rather have spammers than Big Brother George any day.
> Can you please elaborate on the "mass surveillance" you fear?
Have you read the Patriot Act lately?
Okay, I'll be more specific. If every message has an originator-id
tag that can be traced to the origin, it becomes fairly simple for the
US Government to insist that all US ISPs (and perhaps, all foreign ISPs
peering with US ISPs) give them a list of mappings from tags to more
recognizable identifiers (such as credit card #s), or that the ISPs
generate tags in such a way that the USG can simply decrypt them to
obtain those identifiers. Given the kind of stuff that is already
authorized by existing laws, (and if not authorized, obtained by
coercion of various kinds) this isn't much of a stretch. And everybody
knows that terrorists use email...
Anyway, the reason my proposal allows originator-ids to be ephemeral is
to make it hard for ordinary people to track messages - I believe
anonymous speech is important - and also out of recognition that
spammers can probably get ephemeral accounts anyway. The solution to
ephemeral accounts is to provide a way for recipients to distinguish
these from ones where the ISP really does know who sent the message.
But I really don't know how to make messages more traceable on one hand
and not enable more surveillance on the other.
Keith
I may be missing the point here, but about a year ago I wrote a draft for
a DNS RR to keep track of the physical location of the A or MX record
as a "physical postal address" just for this purpose, with the intention
of being able to track back the location of spam.
With new state laws I thought it was the way to go, and then have MTAs
use this information.
Would this not help? If so I have a copy of the draft I am currently
revising.
Al Costanzo
> Yup, you're right, but that wasn't intended by 2822. Or more to the
> point, it's not obvious to me that it was intended that bare CR or
> bare LF should appear in the obs- version of body, which is the result.
OK. This looks like it might need some discussion and non-trivial work
to fix.
>> As an aside, note that obs-text may consist of multiple octets, so
>> "\foo" could be considered a
>> quoted-pair ("foo" matches obs-text via *(obs-char *LF *CR)). I
>> believe that's a problem with 2822.
>
>
> I agree that it is a bug.
I think the revised grammar corrects that.
>> On the other hand, 822 specifically mentioned the multi-character
>> \CRLF and gave its semantics, but 2822 doesn't seem to permit that.
>
>
> \CRLF is a quoted pair with CR followed by a bare LF. The current
> syntax of 2822 permits that, leaving aside the question of whether or
> not it should.
Well, 2822 provides for parsing as described, but as both \CR and lone
LF are obs- constructs, it does not permit generation. Moreover, the
semantics of \CRLF as given in RFC 822 are quite different from \CR
followed by a lone LF.
If indeed the RFC 822 construct and its semantics are to be deemed
obsolete, there at least ought to be mention of that fact in the
"Differences from Earlier Standards" appendix. OTOH if it's to be
continued, there ought to be provision for it as a single entity in the
grammar.
> On 1/15/04 at 7:09 PM +0100, Arnt Gulbrandsen wrote:
>
>> 6. I don't like Bruce's changes to subject. They mix in RFC 1036
>> syntax, which IMO does not belong in 2822.
>
>
> I agree. These should be removed.
IMO, Subject should be unstructured, period. Unfortunately, RFC 1036
introduced two hacks:
1. "Re: " (which it got wrong!), which *requires* certain actions when
encountered, i.e. it is effectively part of the syntax. See RFC 1036
sections 2.1.4, 2.2.5.
2. "cmsg", which *requires* certain actions when encountered. See RFC
1036 sections 2.2.6 and 3 (including subsections).
This is the same "Subject" field described in RFCs 822 and 2822.
Now, if an RFC 2822 successor were to repudiate those requirements
spelled out in RFC 1036,
that would be fine.
As things stand now, it is necessary to recognize the "re: " and "cmsg"
hacks in order to comply
with RFC 1036, and there is at least one draft nearing RFC status that
would add "Auto: " to the
list of Subject field hacks.
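
To make concrete what "recognizing the hacks" means for library code, here is a
minimal Python sketch. The function name, the return labels, and the decision to
match the prefixes case-sensitively after skipping leading whitespace are all
assumptions made for illustration; neither RFC 1036 nor RFC 2822 defines such an
interface, and "Auto: " is only a proposal in a draft.

```python
# Hypothetical helper: report which Subject-field convention, if any, a
# field body starts with.  Prefix spellings follow the discussion above;
# the exact matching rules (case, leading whitespace) are assumptions.
PREFIXES = (
    ("re", "Re: "),      # RFC 1036 reply convention
    ("cmsg", "cmsg "),   # RFC 1036 control-message convention
    ("auto", "Auto: "),  # proposed in a draft, not yet an RFC
)


def subject_hack(subject_body):
    """Return 're', 'cmsg', 'auto', or None for a plain unstructured Subject."""
    text = subject_body.lstrip(" \t")
    for label, prefix in PREFIXES:
        if text.startswith(prefix):
            return label
    return None
```

A caller that does not care about Usenet could ignore the result entirely and
treat the field body as unstructured text.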
RFC 2822 currently has verbiage (section 3.6.5) which falls short of a
requirement, however it
does nothing to remove the RFC 1036 requirements. The verbiage also:
1. implies that presence of "Re: " indicates that a message is a reply,
which is incorrect (I daily
receive several spam messages having Subject fields beginning with
"Re: " which are not replies).
2. seems wholly unnecessary. It might as well state that when not used
in a reply, Subject MAY
begin with "Re: ", when used in a reply, Subject MAY begin with
"Qwerty%$@&^%#:_",
when not used in a reply, Subject MAY begin with "Qwerty%$@&^%#:_", etc.
If an RFC 2822 successor were to repudiate the RFC 1036 Subject hack
requirements, then
subject = "Subject:" unstructured CRLF
would be accurate (provided no other hacks are introduced in the interim).
>
> On 1/15/04 at 9:59 PM +0000, Charles Lindsey wrote:
>
>>> obs-phrase = word *(word / ("." [CFWS]))
>>
>>
>> OK, but that is not "obsolete". It is intended as an extension to be
>> allowed sometime in the future on a "MUST accept, SHOULD NOT generate
>> yet" basis. So please can we rename it as 'extended-phrase' (which is
>> what I have currently put in Usefor).
>
>
> I am not convinced this is worth it. It's explained perfectly well in
> the text.
I believe that the complaint is not about the explanation, but rather
about the label,
which labels as "obs[olete]" something which has never been legal, and which
might at some point in the future be legalized. I.e., a different, more
accurately
descriptive label is being requested.
>I don't think 4DIGIT was ever in obs-year. Nor even in RFC 822?
>
>
2822 obs-year provides for 2+ DIGITs. RFC 822's error in omitting
4-digit years
was amended by RFC 1123.
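
For reference, the interpretation that RFC 2822 section 4.3 gives for years
shorter than four digits can be sketched in Python as follows. The function name
and the error handling are illustrative assumptions; only the arithmetic comes
from the RFC.

```python
def interpret_year(digits):
    """Map the digit string of a year field to a full year.

    Two-digit years 00-49 are taken as 2000-2049, two-digit years 50-99
    and all three-digit years have 1900 added, and longer years are used
    as written (RFC 2822 section 4.3).
    """
    if not (digits.isascii() and digits.isdigit()) or len(digits) < 2:
        raise ValueError("a year needs two or more ASCII digits")
    year = int(digits)
    if len(digits) == 2:
        return year + 2000 if year <= 49 else year + 1900
    if len(digits) == 3:
        return year + 1900
    return year
```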
[quoting Nathaniel Borenstein]
> The "Received" header is woefully inadequate for spam tracing
I'm not quite ready to abandon it, primarily for practical reasons:
+ it's already widely implemented
+ it's already used for spam tracking
Any alternative will require substantial time to pick up enough momentum
to be useful.
Granted, there are some issues that need to be resolved:
- the machine-readable parts should have reliable information
RFC 821 as amended by 1123 seems to permit this (relevant verbiage having
been copied, warts and all, into 2821), though not specified clearly
enough
to ensure interoperable machine-readable implementations.
- of course, some time will be required to migrate existing implementations,
however, I expect that to happen reasonably quickly, certainly more
quickly
than a ground-up replacement
> If we really want to make mail traceable, we need to do a bit more
> than fix Received. As I see it, we need:
>
> [...]
> But do we really want traceability? Or to put it another way, do we
> really want to put hooks in the mail system that make mass
> surveillance (by governments, or perhaps even by large companies or
> unscrupulous ISPs) that much easier?
What I'm after is a means of automating tracing for abuse complaints.
I don't expect:
a) to be able to use Received field content on-the-fly to reject spam
b) to be able to personally identify the individual responsible -- I'll
leave that
to his ISP's abuse department
c) the legal system to be of any use whatsoever regarding spam; nothing
significant
has changed since Jonathan Swift wrote "Gulliver's Travels" (q.v.;
e.g. see
http://swift.thefreelibrary.com/Gullivers-Travels/4-5 and search for
"perplexed")
in 1726.
And no, I certainly don't want a Big Brother mechanism. I expect ISPs
to behave
responsibly and in the community interest (and for the most part, they
do -- they
aren't any happier about having their resources wasted by spammers than
anybody
else).
Here's a concrete example of a recent spam message's Received fields,
manually traced,
with comments interspersed. Given the appropriate information in
machine-readable
form, there's no reason why this couldn't be automated.
Received: from mr06.mrf.mail.rcn.net (207.172.4.25 [207.172.4.25])
by ms07.mrf.mail.rcn.net (Mirapoint Messaging Server MOS 3.2.2-GA
FastPath)
with ESMTP id CDL85988;
Mon, 12 Jan 2004 06:14:33 -0500 (EST)
That's my ISP's mail servers (I recognize both the domain name and IP
address).
Received: from mx03.mrf.mail.rcn.net (mx03.mrf.mail.rcn.net [207.172.4.52])
by mr06.mrf.mail.rcn.net (Mirapoint Messaging Server MOS 3.3.5-GR)
with ESMTP id BGS07944;
Mon, 12 Jan 2004 06:14:31 -0500 (EST)
Another of my ISP's mail servers; note that mr06.mrf.mail.rcn.net in the by
component matches the from component of the Received field which was added
after this one. Note also that the timestamps are close in time and
flowing in the
correct direction.
Received: from pool-207-68-110-60.alt.east.verizon.net ([207.68.110.60])
by mx03.mrf.mail.rcn.net with smtp (Exim 3.35 #4)
id 1Ag01f-0006Td-00; Mon, 12 Jan 2004 06:14:31 -0500
That records whence my ISP's MX receiver received the message; a Verizon
customer.
Had I only the IP address, a reverse lookup would have told me that.
N.B. the time stamp
remains consistent.
Received: from [132.214.45.221] by
pool-207-68-110-60.alt.east.verizon.net; Tue, 13 Jan 2004 07:15:55 +0600
That one's bogus; at the time of receipt, the timestamp was in the future.
The spam complaint went to ab...@verizon.net.
[BTW, w.r.t. other trace fields inserted by ISPs, you might well see
some in the header
of this message...]
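
A rough Python sketch of how the two manual checks above (timestamps flowing in
the right direction, and each "by" host matching the "from" host of the field
added after it) might be automated. It assumes the field bodies have already been
unfolded and are supplied in header order, most recently added first; the function
names, the five-minute clock-skew tolerance, and the simple regular expressions
are illustrative choices, not anything specified by RFC 2821 or 2822.

```python
import re
from datetime import timedelta
from email.utils import parsedate_to_datetime


def received_time(body):
    """Parse the date-time that follows the last ';' of a Received field body."""
    date_part = body.rsplit(";", 1)[1]
    date_part = re.sub(r"\([^()]*\)", " ", date_part)  # drop comments such as "(EST)"
    return parsedate_to_datetime(" ".join(date_part.split()))


def future_dated_hops(bodies, tolerance=timedelta(minutes=5)):
    """Return indexes of fields stamped later than the field added after them."""
    times = [received_time(b) for b in bodies]
    return [i for i in range(1, len(times)) if times[i] - times[i - 1] > tolerance]


def broken_links(bodies):
    """Return indexes whose 'by' host differs from the next field's 'from' host."""
    def host(keyword, body):
        match = re.search(r"\b%s\s+([^\s;]+)" % keyword, body)
        return match.group(1).lower() if match else None
    return [i for i in range(1, len(bodies))
            if host("by", bodies[i]) != host("from", bodies[i - 1])]


# The four Received fields from the example, unfolded, newest first.
HOPS = [
    "from mr06.mrf.mail.rcn.net (207.172.4.25 [207.172.4.25]) "
    "by ms07.mrf.mail.rcn.net (Mirapoint Messaging Server MOS 3.2.2-GA FastPath) "
    "with ESMTP id CDL85988; Mon, 12 Jan 2004 06:14:33 -0500 (EST)",
    "from mx03.mrf.mail.rcn.net (mx03.mrf.mail.rcn.net [207.172.4.52]) "
    "by mr06.mrf.mail.rcn.net (Mirapoint Messaging Server MOS 3.3.5-GR) "
    "with ESMTP id BGS07944; Mon, 12 Jan 2004 06:14:31 -0500 (EST)",
    "from pool-207-68-110-60.alt.east.verizon.net ([207.68.110.60]) "
    "by mx03.mrf.mail.rcn.net with smtp (Exim 3.35 #4) "
    "id 1Ag01f-0006Td-00; Mon, 12 Jan 2004 06:14:31 -0500",
    "from [132.214.45.221] by pool-207-68-110-60.alt.east.verizon.net; "
    "Tue, 13 Jan 2004 07:15:55 +0600",
]

if __name__ == "__main__":
    print("future-dated:", future_dated_hops(HOPS))
    print("broken links:", broken_links(HOPS))
```

On this data only the last field is flagged as future-dated, and the by/from
chain is unbroken. The sketch does not address whether the "from" names are
themselves trustworthy, which is the harder part of the problem.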
that's true of any new information that might be provided regardless of
whether it uses a new header field or if the new information somehow
gets stuffed into the received field. the point is, the existing
information is unreliable and inadequate.
> What I'm after is a means of automating tracing for abuse complaints.
me too. but I don't see how we can do that without providing
non-repudiation. otherwise it becomes easy to DoS somebody by forging
mail as if it were from them and generating lots of complaints about
it.
nor do I think it's sufficient to get ISPs to terminate spammers. what
we need is a way to find out if a message is spam (or if the sender is
a spammer) after the message is sent, but before it is delivered or
read.
> And no, I certainly don't want a Big Brother mechanism. I expect ISPs
> to behave
> responsibly and in the community interest (and for the most part, they
> do -- they
> aren't any happier about having their resources wasted by spammers
> than anybody
> else).
it's pretty hard to behave in the community interest when Big Brother
is twisting your arm.
Keith
Usenet is not email, and email should not be expected to inherit every
feature (or mistake) of Usenet.
Transfer details are irrelevant to the message format. The format is
essentially the same, and
in many contexts (e.g. via IMAP) a message is just a message -- there is
no way to differentiate
"Usenet" and "email". "Re: " is a mistake of Usenet that RFC 2822 has
picked up and run with.
("cmsg" can for the moment be ignored here, since it is only significant
for Usenet transport
software) From the point of view of library code handling the message
format, the fact that
1036 requires "Re: " (and cmsg) to be recognized effectively makes that
part of the syntax;
library code has no way of knowing whether the calling application is
Usenet-specific software,
an MTA, an email-only MUA, a combined email/news UA, or an IMAP client
(for which there
is no Usenet/email distinction).
>> I'm not quite ready to abandon it, primarily for practical reasons:
>> + it's already widely implemented
>> + it's already used for spam tracking
>>
>> Any alternative will require substantial time to pick up enough momentum
>> to be useful.
>
>
> that's true of any new information that might be provided regardless
> of whether it uses a new header field or if the new information
> somehow gets stuffed into the received field. the point is, the
> existing information is unreliable and inadequate.
Use of HELO/EHLO names is unreliable and inadequate. Use of domain names or
domain literals isn't new information.
>> What I'm after is a means of automating tracing for abuse complaints.
>
>
> me too. but I don't see how we can do that without providing
> non-repudiation. otherwise it becomes easy to DoS somebody by forging
> mail as if it were from them and generating lots of complaints about it.
Tracing back through Received fields, after my ISP's fields are
accounted for,
the host named (if named reliably, i.e. not via HELO/EHLO name) as the
source
is one of:
a) the sender's machine
b) one of the sender's ISPs' machines
c) an open relay
d) a resender, such as a mailing list expander
In case a or b, a complaint to the sender's ISP is appropriate; in case
c the operator
of the open relay is essentially the spammer's accomplice and should be
the recipient
of a complaint. In case d, one can trace back further.
> nor do I think it's sufficient to get ISPs to terminate spammers.
> what we need is a way to find out if a message is spam (or if the
> sender is a spammer) after the message is sent, but before it is
> delivered or read.
Tracing after the fact need not be the only tool used to fight spam;
however it is a
valuable tool and its value could be increased by making it more
amenable to automation.
> it's pretty hard to behave in the community interest when Big Brother
> is twisting your arm.
Yes, if Big Brother is forcing you to behave against the community
interest. I know of
no coercion against ISPs that would prevent an ISP from enforcing its
own terms of
service regarding unsolicited bulk email, do you?
Keith correctly and clearly stated the core problem: automating abuse
tracing without providing non-repudiation. It's an impossible goal if
stated in terms of absolutes, though, so let me restate my own vision
of what the goal should be: providing as much automated abuse tracing
as possible without non-repudiation, or (in less technical terms) to
provide the best possible spam tracing without eliminating anonymous
email.
This is a problem that needs to balance privacy rights with law
enforcement. Historically, the way that open societies have typically
dealt with this kind of issue is via the checks and balances of
distributed control. We can go a long way towards that by using
cryptographic tokens to validate ISP's, but requiring legal procedures
to access ISP's records in the course of a spam investigation.
Thus, getting a million complaints about spam messages that were
cryptographically shown to come through an ISP should be enough to get
a warrant to trace the senders, but one or a few complaints should be
subjected to a much higher standard, to prevent police "fishing trips"
in the name of spam control. Automated traceability is a key to
distinguishing between the two cases.
Of course, the previous paragraph would be more precise if, instead of
"ISP" I had said "privacy-sensitive administrative domain." Any
domain-administering entity could choose to make tracing information
from within its domain completely private, as long as it took
responsibility for working with the authorities when it proved to be
the terminal publicly-ascertainable node to which major spam could be
traced. Such domains would likely include large corporations and other
institutions, including those that choose to run anonymous email
gateways and fight spammers in their own ways (for example with
computationally intensive challenge-response systems).
And yes, I realize that the above discussion completely ignores the
jurisdictional issues, but I doubt that it will take more than a decade
or two to work them out, and we need to think in the long term if we
really want to control spam. -- Nathaniel
PS -- Am I afraid that all of this will help Big Brother? You bet.
That's why I want to design as many checks & balances into the system
as we can come up with. But I think the CANSPAM act has made it very
clear that there are likely to be ever more detailed regulations
governing email systems, and I think it would be a losing battle (and
therefore arguably irresponsible) to oppose any particular proposal
without having a more moderate alternative proposal to endorse. The US
congress passed this law, however ill-informedly, because they
correctly perceived a public demand to fix the problem of spam.
Enabling Big Brother is, quite simply, the easiest, laziest solution to
the problem, and therefore it is precisely what will happen unless we
go to the effort of designing a less objectionable one. -- Nathaniel
On Fri, 16 Jan 2004, Keith Moore wrote:
> Doesn't RFC1847 (security multiparts) already provide both of these, or
> at least the framework sans the actual algorithm?
Only if everyone's user agents treated a security multipart
containing a message semantically the same as they would the
original message.
Sure, so your primary concern is that the use of security multiparts
would be a less backwards compatible change than a change to the
Received syntax or adding a new header.
> - A way to ensure that messages get tagged with originator-id when
> they
> are injected into the mail systems
>
> I agree in principle, I think. But couldn't a Received already provide
> this information if everyone would just do it, i.e., it's optional now?
The two are fairly different. You want to capture the originator-id at
submission time, not at every hop. And you need protocol elements that
Received (not being extensible) cannot convey. To me it makes more
sense to start from whole cloth.
I agree with the principle. As to whether it's in a Received header or
in something different I also agree with Dave Crocker's point that we
really need to examine what we want to capture before we can know for
sure where to put it.
> - If you really wanted to, you could augment Received or add a new
> trace field that recomputed the hash at each hop (to show if and
> where
> the message was corrupted in transport)
>
> Why would you forward a message that was discovered to be corrupted?
Probably because the message might have been corrupted in a way that
makes the hash invalid without actually changing the content of the
message. Something tells me it will be difficult to write a
canonicalization function that accommodates all of the various kinds of
message and header munging that is out there.
I can imagine it might be a local policy preference for all messages to
be examined by a person instead of rejecting it if the hash validation
fails. However, it should be the case that except for human review the
message is rejected (for whatever we decide rejection means). There may
be some edge cases where broken mailers do really obscure things to
messages to break the hash, but either you are doing security right or
you don't do it.
> I'm sorry but I just don't see what you have in mind that is worse than
> it is today,
Given the apparent aspirations of the current occupant of the US White
House, things could get far, far worse than they are now. I'd far
rather have spammers than Big Brother George any day.
> Can you please elaborate on the "mass surveillance" you fear?
Have you read the Patriot Act lately?
Okay, I'll be more specific. If every message has an originator-id
tag that can be traced to the origin, it becomes fairly simple for the
US Government to insist that all US ISPs (and perhaps, all foreign ISPs
peering with US ISPs) give them a list of mappings from tags to more
recognizable identifiers (such as credit card #s), or that the ISPs
generate tags in such a way that the USG can simply decrypt them to
obtain those identifiers. Given the kind of stuff that is already
authorized by existing laws, (and if not authorized, obtained by
coercion of various kinds) this isn't much of a stretch. And everybody
knows that terrorists use email...
Anyway, the reason my proposal allows originator-ids to be ephemeral is
to make it hard for ordinary people to track messages - I believe
anonymous speech is important - and also out of recognition that
spammers can probably get ephemeral accounts anyway.
My point is that we are not that far from this already. To stretch the
example to the extreme, the US Government could make a law that ISPs
have to make sure they can map a From: email address to an actual person
before accepting submission of a message (in the US of course). Why do
we need a special identifier?
And as you suggest, anonymity is achieved by using a throw-away account
with one of the free providers. We just need to know that we can filter
on such origins if we don't "trust" or otherwise want them.
Jim
On Fri, 16 Jan 2004, Al Costanzo wrote:
Date: Fri, 16 Jan 2004 19:42:09 -0500
From: Al Costanzo <a...@akc.com>
To: James M Galvin <galvin+...@eListX.com>, ietf...@imc.org
> What I'm after is a means of automating tracing for abuse complaints.
me too. but I don't see how we can do that without providing
non-repudiation.
It is not clear to me that by "hash" you meant digital signature, but
clearly you need a signature for non-repudiation.
otherwise it becomes easy to DoS somebody by forging mail as if it
were from them and generating lots of complaints about it.
We need to be careful to avoid getting too wrapped up in DoS attacks.
The problem is that it's possible with or without a hash and a
signature.
So, I agree we should understand how any proposal permits or supports
DoS attacks, if it does, and we should certainly avoid any amplification
opportunities, but we're not going to prevent DoS with hash validation.
Jim
I would say that DNS is the perfect location for storing this information,
since unlike mail headers it is more difficult for the casual user to munge
and is usually administered properly by ISPs.
With the information stored in the DNS, IMO, we get a second level of
repudiation to protect us all from spammers and help the US enforce the new
law.
Al
> Keith Moore wrote:
> >
> >> IMO, Subject should be unstructured, period. Unfortunately, RFC 1036
> >> introduced two
> >> hacks:
> >
> >
> > Usenet is not email, and email should not be expected to inherit every
> > feature (or mistake) of Usenet.
> Transfer details are irrelevant to the message format.
But protocols and applications are not.
> The format is essentially the same,
Similarity does not make them equivalent. Nor does it mean, as Keith
points out, that one must cover all aspects of the other.
> and
> in many contexts (e.g. via IMAP) a message is just a message -- there is
> no way to differentiate
> "Usenet" and "email".
Which at most makes it convenient for the grammars to be equivalent. It does
not rise to the level of requiring it.
> "Re: " is a mistake of Usenet that RFC 2822 has
> picked up and run with.
Only to the extent of describing a prose convention. It does not appear in the
ABNF, nor should it IMO.
> ("cmsg" can for the moment be ignored here, since it is only significant
> for Usenet transport
> software) From the point of view of library code handling the message
> format, the fact that
> 1036 requires "Re: " (and cmsg) to be recognized effectively makes that
> part of the syntax;
> library code has no way of knowing whether the calling application is
> Usenet-specific software,
> an MTA, an email-only MUA, a combined email/news UA, or an IMAP client
> (for which there
> is no Usenet/email distinction).
You're making all sorts of assumptions here that just do not hold water: That
it is necessary to handle "Re:" at the same syntactic level, that a library
even needs to concern itself with this detail, that all implementations must
allow for the conflation of netnews material and email material.
The bottom line as far as I'm concerned is that I see no requirement
that "Re:" be handled in the 2822 ABNF. And since it is not a requirement,
it becomes a matter of costs versus benefits. And from what I've seen
so far the costs far exceed the benefits.
Ned
> IMO, Subject should be unstructured, period. Unfortunately, RFC 1036
> introduced two hacks:
[...]
> 2. "cmsg", which *requires* certain actions when encountered. See RFC
> 1036 sections 2.2.6 and 3 (including subsections).
RFC 1036 is not a standard and this is not best practice for Usenet
software. Please do not include this in the grammar.
> Now, if an RFC 2822 successor were to repudiate those requirements
> spelled out in RFC 1036, that would be fine.
RFC 1036 is not a standards track document and therefore should not need
to be explicitly repudiated by another document.
--
Russ Allbery (r...@stanford.edu) <http://www.eyrie.org/~eagle/>
That, and there is a semantic difference between a signed message and a
cryptographically verifiable trace field in a message.
>> - A way to ensure that messages get tagged with originator-id when
>> they
>> are injected into the mail systems
>>
>> I agree in principle, I think. But couldn't a Received already
>> provide
>> this information if everyone would just do it, i.e., it's optional
>> now?
>
> The two are fairly different. You want to capture the
> originator-id at
> submission time, not at every hop. And you need protocol elements
> that
> Received (not being extensible) cannot convey. To me it makes more
> sense to start from whole cloth.
>
> I agree with the principle. As to whether it's in a Received header or
> in something different I also agree with Dave Crocker's point that we
> really need to examine what we want to capture before we can know for
> sure where to put it.
Fine with me. In general I agree with the approach that you decide
what the data model should be before you pick the presentation layer.
(wish more IETF efforts would do the same...)
>> - If you really wanted to, you could augment Received or add a new
>> trace field that recomputed the hash at each hop (to show if and
>> where
>> the message was corrupted in transport)
>>
>> Why would you forward a message that was discovered to be corrupted?
>
> Probably because the message might have been corrupted in a way
> that
> makes the hash invalid without actually changing the content of the
> message. Something tells me it will be difficult to write a
> canonicalization function that accommodates all of the various
> kinds of
> message and header munging that is out there.
>
> I can imagine it might be a local policy preference for all messages to
> be examined by a person instead of rejecting it if the hash validation
> fails. However, it should be the case that except for human review the
> message is rejected (for whatever we decide rejection means). There
> may
> be some edge cases where broken mailers do really obscure things to
> messages to break the hash, but either you are doing security right or
> you don't do it.
Actually I don't see this as a security measure. It doesn't really
protect systems from any kind of attack, and it doesn't authenticate
the message as being authored or witnessed by any publicly known
principal name. It doesn't even have to provide an unimpeachable
assurance of non-repudiation -- it just has to be good enough that it's
infeasible to make large quantities of spam look as if it were
associated with originator-ids other than those controlled by the
spammers.
The US Congress might or might not care, but for a variety of reasons
it would not be appropriate to use From in this way. There is no
defined header field that can be used for this purpose without
conflicting with valid and existing uses of the field. (Sender would
have been the right thing, but it's too widely misused now.)
you have to have something to sign that is derived from the message in
a repeatable fashion. actually it's not the hash function that needs
to be defined (SHA-1 would work fine), rather, it's the
canonicalization function that is applied to a message before computing
the hash.
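
to illustrate the point, here is a toy canonicalization in Python, followed by
SHA-1. everything about it is an assumption made for the sake of the example:
the function name, which fields are covered, and the particular munging it
tolerates (field-name case, folding, runs of whitespace, field order, trailing
whitespace and trailing blank lines in the body). nobody has specified such a
function; a real one would need far more care.

```python
import hashlib
import re


def canonical_digest(header_fields, body):
    """Hash a message after applying a deliberately simple canonical form.

    header_fields is a list of (name, value) pairs; body is a str.
    """
    canon = []
    for name, value in header_fields:
        value = re.sub(r"\r?\n[ \t]+", " ", value)      # undo folding
        value = re.sub(r"[ \t]+", " ", value).strip()   # collapse whitespace runs
        canon.append(name.strip().lower() + ":" + value)
    canon.sort()  # makes the digest independent of field order

    lines = [line.rstrip(" \t") for line in body.splitlines()]
    while lines and lines[-1] == "":
        lines.pop()  # ignore trailing blank lines

    text = "\r\n".join(canon) + "\r\n\r\n" + "\r\n".join(lines) + "\r\n"
    return hashlib.sha1(text.encode("utf-8")).hexdigest()
```

note that sorting the fields throws away the relative order of repeated fields
such as Received, so this form could not be used as-is for anything where that
order matters.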
> otherwise it becomes easy to DoS somebody by forging mail as if it
> were from them and generating lots of complaints about it.
>
> We need to be careful to avoid getting too wrapped up in DoS attacks.
> The problem is that it's possible with or without a hash and a
> signature.
yes, but once the complaint systems are automated then attacks on a
sender using fake reports of abuse from that sender become more
feasible. and chances are the complaints that have originator-id
fields are the ones that will be automated.
one nice thing - if the complaints themselves are required to have
verifiable originator-id fields then attacking a user by sending fake
abuse reports exposes the attacker :)
do we really care where the message was sent from, as opposed to who
sent it?
I could see it being useful in the case of mail that was sent from a
compromised machine (if we could locate the machine very precisely), or
in the case of mail that was sent from a laptop that accessed a random
802.11 network ("drive-by spamming"). But I have a hard time
understanding how DNS could help us get this information, especially in
the latter case.
Also, just as I don't want to expose the actual identity of every
message sender, I wouldn't want to expose the location of every message
sender. (I can see it now - send an e-mail critical of Dubya, and a
G-man knocks on your door within the hour...)
> Moreover, the semantics of \CRLF as given in RFC 822 are quite
> different from \CR followed by a lone LF.
I'm having trouble making sense of \CRLF in RFC 822.
Consider the following example:
From: "foo\
bar" <blah@example>
This could not have been created by starting with a one-line field
and then folding it, because 3.1.1 says that folding may happen
"wherever there may be linear-white-space (NOT simply LWSP-chars)".
Linear-white-space is allowed in qtext, but not in quoted-pair, so there
would be no way for the folding process to wedge a CRLF between the
backslash and the next CHAR.
The only way to parse this field, according to the grammar, is as:
"From" ":" <"> CHAR CHAR CHAR quoted-pair CHAR CHAR CHAR CHAR CHAR <"> ...
From : " f o o \CR LF SP b a r " ...
There is no way to parse the field in a way that involves the
linear-white-space token, and no way to parse it in a way that involves
the CRLF token.
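
The parse above can be reproduced mechanically. The following Python sketch
tokenizes the inside of a quoted-string the way the RFC 822 grammar reads
(quoted-pair = "\" CHAR, everything else a single character). The function name
and the token labels are made up for illustration, and the sketch assumes the
surrounding double quotes have already been removed.

```python
def tokenize_quoted_string(content):
    """Split the inside of a quoted-string into (kind, text) tokens.

    A backslash always consumes exactly one following character as a
    quoted-pair; any other character stands alone.
    """
    tokens = []
    i = 0
    while i < len(content):
        if content[i] == "\\" and i + 1 < len(content):
            tokens.append(("quoted-pair", content[i:i + 2]))
            i += 2
        else:
            tokens.append(("char", content[i]))
            i += 1
    return tokens


# foo, backslash, CR, LF, SP, bar: the contents of the example quoted-string
example = "foo\\\r\n bar"
```

For the example, the fourth token is the quoted-pair backslash-CR and the fifth
is a bare LF, so no CRLF token ever appears.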
Question: Can this field be unfolded?
3.1.1 says "Unfolding is accomplished by regarding CRLF immediately
followed by a LWSP-char as equivalent to the LWSP-char." But the CRLF
token is not present in the parsing of the field, so we cannot unfold
it.
On the other hand, 3.1.2 says "The field-body may be composed of any
ASCII characters, except CR or LF. (While CR and/or LF may be present
in the actual text, they are removed by the action of unfolding the
field.)" Therefore, we ought to be able to remove the CR and the LF by
unfolding the field.
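
For contrast, here is what happens if the 3.1.1 rule is applied purely
lexically, before any grammar-based parsing. This Python sketch is one possible
reading of the rule, not something RFC 822 prescribes; the function name is an
assumption.

```python
import re


def unfold(raw_field):
    """Treat CRLF immediately followed by SP or HTAB as just the SP or HTAB."""
    return re.sub(r"\r\n(?=[ \t])", "", raw_field)


raw = 'From: "foo\\\r\n bar" <blah@example>\r\n'
unfolded = unfold(raw)
```

Applied to the example, the line break disappears but the backslash stays, so it
now quotes the space: the quoted-string becomes foo\ bar, which is not what the
grammar-based parse produces.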
On the other hand, the grammar clearly allows bare LF, implicitly in
qtext/dtext/ctext and explicitly in text, contradicting the previous
statement from 3.1.2.
On the other hand, 3.4.8 says "Each header field may be represented
on exactly one line consisting of the name of the field and its body,
and terminated by a CRLF; this is what the parser sees." Therefore,
we ought to be able to remove the internal line break by unfolding the
field.
On the other hand, 3.4.5 says "the presence of the quoting character
(backslash) explicitly indicates that the CRLF is data to the quoted
string." How can the line break be data to the quoted string if the
parser doesn't even see the line break as stated in 3.4.8?
By the way, 3.4.3 says that \CRLF within a comment "must be followed
by at least one LWSP-char." The same statement is not made regarding
quoted-strings, but could be inferred. But nothing in the grammar
enforces this. According to the grammar, this is a valid field:
From: "foo\
bar" <blah@example> (hi\
there)
This all seems like a big mess that should be deprecated. And indeed,
in RFC 2822, lone CR, lone LF, and \CR are all relegated to obsolete
syntax. I'm thinking that was a good move.
As for the obsolete grammar, parsing \CRLF as \CR followed by LF is
consistent with the 822 grammar, even if it doesn't seem to jibe with
the phrase "quoted CRLF" in the prose.
AMC
> That, and there is a semantic difference between a signed message and
> a cryptographically verifiable trace field in a message.
Could somebody outline a process whereby a single field or group of
fields in a
message could be signed, with the following conditions:
1. the mechanism is robust w.r.t. common types of message munging
(reordered fields,
possible dropping of fields (obviously, let's assume that the field
that is signed isn't
dropped), addition of trailing whitespace, etc.)
2. the mechanism is not subject to replay attacks (e.g. copying the
signed field from one
message to another)
I believe that S/MIME and PGP/MIME signed messages are robust w.r.t.
those criteria, since
the signed message is itself transfer encoded (if necessary) and
encapsulated via MIME; if the
MIME wrapper's header fields are munged, the signed message may still be
valid -- it is
protected from transport issues via the wrapper and encoding. But I
don't see how one can
do the same for a single header field that is not subject to a replay
attack.
>Bruce Lilly <bli...@verizon.net> wrote:
>
>
>
>>Moreover, the semantics of \CRLF as given in RFC 822 are quite
>>different from \CR followed by a lone LF.
>>
>>
>
>I'm having trouble making sense of \CRLF in RFC 822.
>
>Consider the following example:
>
>From: "foo\
> bar" <blah@example>
>
>This could not have been created by starting with a one-line field
>and then folding it, because 3.1.1 says that folding may happen
>"wherever there may be linear-white-space (NOT simply LWSP-chars)".
>Linear-white-space is allowed in qtext, but not in quoted-pair, so there
>would be no way for the folding process to wedge a CRLF between the
>backslash and the next CHAR.
>
>The only way to parse this field, according to the grammar, is as:
>
>"From" ":" <"> CHAR CHAR CHAR quoted-pair CHAR CHAR CHAR CHAR CHAR <"> ...
> From : " f o o \CR LF SP b a r " ...
>
>There is no way to parse the field in a way that involves the
>linear-white-space token, and no way to parse it in a way that involves
>the CRLF token.
>
>Question: Can this field be unfolded?
>
>
I'd interpret that example as a field which is not folded (the
backslash-escaped CRLF
is not line folding because it is escaped) in which the quoted-string
contains a CRLF. That
CRLF is escaped so that it is preserved for the application (an
unescaped CRLF would be
part of line folding, which would not be visible to the application).
Consider as an
alternative example:
From: "Foo Bar" <"foo\
bar"@example>
[ignoring for the moment whether or not a CRLF in a local-part is either
sensible or
advisable]
>This all seems like a big mess that should be deprecated. And indeed,
>in RFC 2822, lone CR, lone LF, and \CR are all relegated to obsolete
>syntax. I'm thinking that was a good move.
>
>
Perhaps it should be deprecated; on the other hand, 2822 provides no
mechanism for
passing CR, LF, or NUL at the application layer, as quoting is not
permitted for
those octets (except via obs- constructs, which may not be used for
message generation).
If backslash-escaped CR and LF (ignoring NUL for the moment) were
permitted, one
could have:
From: "foo\CR\LFbar" <blah@example>
etc., which ought to present no problems; there is no explicit CRLF pair
on the wire, so
folding/unfolding isn't an issue, the application layer still gets the
CR and LF octets when
parsing the quoted string, and it is backwards-compatible (i.e. that was
legal in 822 and
semantics are unchanged).
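
A small Python sketch of that claim, assuming a hypothetical unquote step that
simply drops the backslash of each quoted-pair (the function name is made up
for illustration):

```python
def unquote(content):
    """Remove quoted-pair backslashes from the inside of a quoted-string."""
    out = []
    i = 0
    while i < len(content):
        if content[i] == "\\" and i + 1 < len(content):
            i += 1  # skip the backslash, keep the character it quotes
        out.append(content[i])
        i += 1
    return "".join(out)


# foo \ CR \ LF bar, as the octets would appear on the wire
wire = "foo\\\r\\\nbar"
```

The wire form contains no adjacent CR LF pair, so folding and unfolding never
see it, while the application still receives foo, CR, LF, bar after unquoting.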
But let's clearly document the changes from 822!
>As for the obsolete grammar, parsing \CRLF as \CR followed by LF is
>consistent with the 822 grammar, even if it doesn't seem to jibe with
>the phrase "quoted CRLF" in the prose.
>
Maybe; handling of WSP after \CRLF would seem to be somewhat different
(I agree with
you that 822 isn't quite clear about that)
>RFC 1036 is not a standard and this is not best practice for Usenet
>software. Please do not include this in the grammar.
>
>
RFC 1036's title is "Standard for Interchange of USENET Messages". While it
has no standing as an Internet Standard it certainly claims to be a
standard, and
moreover it is current and is THE RFC addressing message format w.r.t.
Usenet
(RFC 850 having been superseded by 1036, and 1036 not having been amended or
superseded).
I agree that a number of things in 1036 are questionable practice;
Subject field
hacks are one of those questionable practices. But questionable or not,
that's
what 1036 requires.
>RFC 1036 is not a standards track document and therefore should not need
>to be explicitly repudiated by another document.
>
>
As a self-proclaimed standard for message format, and with a number of
discrepancies
with other message format RFCs, 1036 presents a problem for implementors; an
implementor apparently has the following choices:
a) ignore RFC 1036, which means ignoring Usenet, since 1036 is THE RFC
dealing
with Usenet message format
b) take 1036 into account, which imposes structure on the supposedly
unstructured
Subject field
c) pick and choose bits and pieces from the various standards
Option a is certainly a viable choice. Option b leads to the present
discussion.
Option c can lead to incompatibilities and lack of interoperability; it
is the option of
last resort.
Repudiation of the questionable items may provide an out for the hapless
implementor.
Maybe not [old messages still exist, and may need to be parsed in
accordance with old
standards].
One of the lessons to be learned is that it can be extremely difficult
to recover from
poor design decisions.
> RFC 1036's title is "Standard for Interchange of USENET Messages".
> While it has no standing as an Internet Standard it certainly claims to
> be a standard, and moreover it is current and is THE RFC addressing
> message format w.r.t. Usenet (RFC 850 having been superseded by 1036,
> and 1036 not having been amended or superseded).
You're simply wrong about this.
There is no standard document governing the message format of Usenet.
There is one obsolete informational RFC that does not agree with current
practice, and an effort to write a standard which is completely stalled.
Presenting the obsolete informational RFC as a standard because it
contains the word "standard" in the title and no one has yet written
anything better is not doing anyone any favors.
> As a self-proclaimed standard for message format, and with a number of
> discrepancies with other message format RFCs, 1036 presents a problem
> for implementors; an implementor apparently has the following choices:
> a) ignore RFC 1036, which means ignoring Usenet, since 1036 is THE RFC
> dealing with Usenet message format
This is simply nonsensical garbage. The world does not magically stop
working when you don't have an RFC to work from.
> b) take 1036 into account, which imposes structure on the supposedly
> unstructured Subject field
> c) pick and choose bits and pieces from the various standards
It's incomprehensible to me how you can omit the choice that essentially
all Usenet implementors actually take, which is:
d) write code which works with the Usenet messages in the wild and
follows commonly accepted best practice, using RFC 1036 as one of many
guides but not giving it all that much weight
Whether you think this is a viable choice or not, it clearly is because
this is what people actually do. Reality is at odds with your opinions.
> Adam M. Costello wrote:
>
> >Consider the following example:
> >
> >From: "foo\
> >bar" <blah@example>
You have misquoted me (or more likely, your MUA has). The example was:
From: "foo\
 bar" <blah@example>
The space before "bar" is crucial, because the alternative (without the
space) is another equally interesting example:
From: "foo\
bar" <blah@example>
In fact, when you presented your own example, I thought you omitted the
space deliberately:
From: "Foo Bar" <"foo\
bar"@example>
But now I suspect that your MUA ate the space, and what you originally
intended was:
From: "Foo Bar" <"foo\
 bar"@example>
But in any case both are worth considering.
> I'd interpret that example as a field which is not folded (the
> backslash-escaped CRLF is not line folding because it is escaped) in
> which the quoted-string contains a CRLF.
We can certainly find evidence in RFC 822 to support that
interpretation, but we can also find evidence to doubt it. As I
mentioned, 3.4.8 says "Each header field may be represented on exactly
one line consisting of the name of the field and its body, and
terminated by a CRLF; this is what the parser sees." The proposed
interpretation has the parser seeing the field as two lines (because
the parser sees the line break inside the quoted-string). Also, what
could 3.4.5 be talking about when it says "Quoted CRLFs (i.e., a
backslash followed by a CR followed by a LF) are also subject to rules
of folding"?
Finally, is it valid not to have a space after \CRLF? As in:
From: "Foo Bar" <"foo\
bar"@example>
The grammar allows it, and if we go with the proposed interpretation
that \CRLF is not an instance of folding, then what would require a
space after the \CRLF?
There is a related question for the other example:
From: "Foo Bar" <"foo\
 bar"@example>
Is the space part of the local part? 3.4.5 says "Stripping off the
first following LWSP-char is also appropriate when parsing quoted CRLFs"
(the "also" means "similar to the case of unquoted CRLFs"). But why?
Why put the space in only to have it stripped out again? The only
reasonable explanation is that the space is required (this gets back to
the previous question). Only if it's required do you need a rule about
stripping it, so that you can have a quoted-string whose meaning is
fooCRLFbar with no space.
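To make the stripping rule concrete, here is a minimal Python sketch of one possible reading of 3.4.5, in which a backslash followed by CR LF yields a literal CRLF in the value and the first following LWSP-char is dropped. The function name decode_quoted_content is hypothetical, and this is only one of the competing interpretations discussed in this thread, not a settled rule.

```python
def decode_quoted_content(raw: str) -> str:
    """Decode the inside of a quoted-string (without the enclosing quotes).

    Assumed reading of RFC 822 section 3.4.5: "\\" CR LF is a quoted CRLF,
    and the first LWSP-char (space or tab) after it is stripped.
    """
    out = []
    i = 0
    while i < len(raw):
        if raw.startswith("\\\r\n", i):
            out.append("\r\n")          # the quoted CRLF survives in the value
            i += 3
            if i < len(raw) and raw[i] in " \t":
                i += 1                  # strip only the first following LWSP-char
        elif raw[i] == "\\" and i + 1 < len(raw):
            out.append(raw[i + 1])      # ordinary quoted-pair: keep the CHAR
            i += 2
        else:
            out.append(raw[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(repr(decode_quoted_content("foo\\\r\n bar")))
    print(repr(decode_quoted_content("foo\\\r\n  bar")))
```

With this reading, a value of fooCRLFbar needs exactly one space after the quoted CRLF on the wire, and a value with a real leading space needs two, which is why the rule only makes sense if the space is required.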
We now have two arguments that space is required after \CRLF, but that
in turn argues that \ CR LF LWSP-char is indeed some sort of folding.
Maybe it's a fold that cannot be unfolded. Except there's still that
pesky statement in 3.4.8 that "Each header field may be represented on
exactly one line".
I don't expect us to be able to settle this. I think RFC 822 is not
self-consistent on this issue.
> If backslash-escaped CR and LF (ignoring NUL for the moment) were
> permitted, one could have:
>
> From: "foo\CR\LFbar" <blah@example>
>
> etc., which ought to present no problems;
Until it gets converted to the local line-ending conventions. Your
example contains all three possibilities: CR not followed by LF, LF not
preceded by CR, and CRLF (terminating the field). Imagine saving this
message to an mbox file on a Unix machine, where lines are terminated
by LF. How will you do it? Normally CRLF gets translated to LF, but
that's not reversible if the input already contains LF not preceded by
CR.
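The round-trip problem can be shown with a short Python sketch. It assumes the naive conversions commonly used when storing mail on Unix (CRLF to LF on save, LF to CRLF on restore); the names to_local and to_wire are hypothetical.

```python
def to_local(wire: bytes) -> bytes:
    """Naive save: translate the CRLF line terminator to a bare LF."""
    return wire.replace(b"\r\n", b"\n")


def to_wire(local: bytes) -> bytes:
    """Naive restore: translate every LF back to CRLF."""
    return local.replace(b"\n", b"\r\n")


# The example field: backslash CR, backslash LF inside the quoted-string,
# then the CRLF that terminates the field.
original = b'From: "foo\\\r\\\nbar" <blah@example>\r\n'

stored = to_local(original)    # only the terminating CRLF is changed
restored = to_wire(stored)     # the escaped LF now also gains a CR

if __name__ == "__main__":
    print(original)
    print(restored)
    print(restored == original)
```

The restored field differs from the original because the LF that was never preceded by CR cannot be told apart from a converted line terminator.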
Other problems: How would this field display? Could it be cut and
pasted?
I think any sort of control characters in header fields, other than
CRLF (as a unit) and maybe TAB, is asking for headaches. Even TAB is
somewhat troublesome.
> But let's clearly document the changes from 822!
Agreed.
AMC
P.S. The tendency of your MUA to drop spaces at the beginnings of lines
is probably related to its use (or misuse) of format=flowed.
I notice that each of your paragraphs consists of multiple "paragraphs"
in the
format=flowed sense, so that they don't actually flow, but instead end
up looking
like this paragraph.
>Bruce Lilly <bli...@verizon.net> writes:
>
>
>
>>RFC 1036's title is "Standard for Interchange of USENET Messages".
>>While it has no standing as an Internet Standard it certainly claims to
>>be a standard, and moreover it is current and is THE RFC addressing
>>message format w.r.t. Usenet (RFC 850 having been superseded by 1036,
>>and 1036 not having been amended or superseded).
>>
>>
>
>You're simply wrong about this.
>
>
Perhaps, but it will take more than a bald assertion to convince me.
>There is no standard document governing the message format of Usenet.
>
>
RFC 1036 says otherwise.
>There is one obsolete informational RFC that does not agree with current
>practice,
>
According to the latest rfc-index, 1036 has not been obsoleted. It
might well be
your opinion that it should be obsoleted or reclassified, but that does
not affect its
official status. You are of course free to petition for reclassification
of RFC 1036
to historic status (as RFCs 3166, 3638, etc. have done for other RFCs).
> and an effort to write a standard which is completely stalled.
>
>
It seems to be stalled for lack of agreement on just what comprises
"current practice",
not to mention what *should* comprise best practice. That and the
mantra that "the
draft just needs a few minor tweaks" that has been repeated for
literally years. Unless
I've missed something (entirely possible), in spite of new leadership
(as of about a year
ago) and a planned rechartering of the WG that was set to work on a
successor to 1036
some eight years ago in order to deal with "urgent issues", that WG in
fact has no new
charter. That's not a good sign. I share your frustration, and I wish
1036 had long been
obsoleted by an update that incorporated those things upon which the WG
could agree
(as I had suggested quite some time ago). But it hasn't happened, and
the sad fact is
that 1036 is still the current standard.
>Presenting the obsolete informational RFC as a standard because it
>contains the word "standard" in the title and no one has yet written
>anything better is not doing anyone any favors.
>
>
And pretending that there is a well-defined "current practice" does no
favors either.
>>a) ignore RFC 1036, which means ignoring Usenet, since 1036 is THE RFC
>> dealing with Usenet message format
>>
>>
>
>This is simply nonsensical garbage. The world does not magically stop
>working when you don't have an RFC to work from.
>
>
Did I say the world would stop working? Did I say that there isn't an
applicable RFC?
News flash: Usenet is not the world. In the grand scheme of things, it
isn't even close
to being important. It would be difficult to present a convincing
argument that Usenet
isn't the least important application that uses the text message format.
Certainly email,
which is legitimately used (i.e. excluding spam) to a much larger extent
than Usenet, and
which has become an essential tool for commerce, is far more important
than Usenet.
Voice mail, fax, EDI, and even SIP could be argued as more important
than Usenet. Not to
mention the fact that Usenet's historically appallingly low
signal-to-noise ratio (which
seems to keep falling to new lows) is making Usenet largely irrelevant,
it having been
supplanted by weblogs and the like to a large extent.
>c) pick and choose bits and pieces from the various standards
>
>
>
>It's incomprehensible to me how you can omit the choice that essentially
>all Usenet implementors actually take, which is:
>
> d) write code which works with the Usenet messages in the wild and
> follows commonly accepted best practice, using RFC 1036 as one of many
> guides but not giving it all that much weight
>
>
>
That amounts to precisely the same thing as c, viz. picking and choosing
bits and pieces,
and has the predictable and observable consequences noted, viz.
incompatibilities and
interoperability problems. Writing code which attempts to "work with
the Usenet
messages in the wild" amounts to picking and choosing from a plethora of
incompatible
options; is a header field restricted to the set of octets prescribed by
RFCs 1036, 850, 822,
and 2822, or is it "anything goes"? Are charsets other than the default
properly tagged
per the MIME RFCs or left up to the recipient to guess? And exactly
which charset *is*
the default? Does the existence of the magic incantation " Re: " at the
start of a
Subject field body define a message as a reply, or is it the presence of
a References field
that defines a message as a reply? Where are all of these supposedly
well-defined issues
of "current practice" actually defined? Has the cognizant WG come to
rapid agreement
on any of those issues?
>Bruce Lilly <bli...@verizon.net> wrote:
>
>
>
>>Adam M. Costello wrote:
>>
>>
>>
>>>Consider the following example:
>>>
>>>From: "foo\
>>>bar" <blah@example>
>>>
>>>
>
>You have misquoted me (or more likely, your MUA has). The example was:
>
>From: "foo\
> bar" <blah@example>
>
>The space before "bar" is crucial, because the alternative (without the
>space) is another equally interesting example:
>
>
Yes, my MUA (Mozilla 1.6) did it.
>In fact, when you presented your own example, I thought you omitted the
>space deliberately:
>
>From: "Foo Bar" <"foo\
>bar"@example>
>
>
>
I did. It was intended that the CRLF was part of the local-part
(ignoring whether
or not that was sensible or advisable). And I didn't want there to be
confusion about
whether it was CRLF or CRLFSP.
>I don't expect us to be able to settle this. I think RFC 822 is not
>self-consistent on this issue.
>
>
>
Agreed, and that's an indication in favor of deprecation.
>>If backslash-escaped CR and LF (ignoring NUL for the moment) were
>>permitted, one could have:
>>
>>From: "foo\CR\LFbar" <blah@example>
>>
>>etc., which ought to present no problems;
>>
>>
>
>Until it gets converted to the local line-ending conventions. Your
>example contains all three possibilities: CR not followed by LF, LF not
>preceded by CR, and CRLF (terminating the field). Imagine saving this
>message to an mbox file on a Unix machine, where lines are terminated
>by LF. How will you do it? Normally CRLF gets translated to LF, but
>that's not reversible if the input already contains LF not preceded by
>CR.
>
>
In this case, there's no reason to unescape before saving, and therefore
there's no
CRLF (as opposed to \CR\LF) to convert.
Conversion of line endings is tricky business, and I'd recommend against
it except when
saving an attachment of text type. I suspect there would be problems on
many systems
saving a message with structure:
  multipart/mixed, content-transfer-encoding 8bit
    application/octet-stream, content-transfer-encoding 8bit
      some binary content including 0x0a 0x0d 0x0a 0x0d
Assuming line endings must be converted, doing it correctly requires
parsing the
MIME structure -- the embedded 0x0d 0x0a in the binary content must not be
altered. In the example if 0x0d 0x0a (i.e. CRLF) is converted to a lone
0x0a, the
four-octet sequence becomes the three-octet sequence 0x0a 0x0a 0x0d. As you
noted, that's not a reversible transformation; it's likely to yield the
five-octet
sequence 0x0d 0x0a 0x0d 0x0a 0x0d.
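The octet arithmetic above can be checked with a few lines of Python. This is a sketch assuming the same naive whole-message replacement (CRLF to LF, then LF back to CRLF) applied without regard to MIME structure; the variable names are illustrative.

```python
# The four octets of binary content from the example: LF CR LF CR
binary = bytes([0x0A, 0x0D, 0x0A, 0x0D])

# Naive conversion to local line endings: the embedded CR LF collapses to LF
stored = binary.replace(b"\r\n", b"\n")

# Naive conversion back: every LF, original or not, becomes CR LF
restored = stored.replace(b"\n", b"\r\n")

if __name__ == "__main__":
    print(binary.hex(" "))
    print(stored.hex(" "))
    print(restored.hex(" "))
```

Four octets go in, three are stored, and five come back, so the binary part is corrupted in both directions unless the converter parses the MIME structure and leaves non-text parts alone.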
>Other problems: How would this field display? Could it be cut and
>pasted?
>
>
>
Display is one issue. Cut and paste should be verbatim, i.e. using the
on-the-wire
representation.
>I think any sort of control characters in header fields, other than
>CRLF (as a unit) and maybe TAB, is asking for headaches. Even TAB is
>somewhat troublesome.
>
>
True, including security implications for some control characters.
Probably the
safest for display purposes is to use a textual representation for
control characters,
possibly with some form of highlighting to avoid confusion with literal
text.
>P.S. The tendency of your MUA to drop spaces at the beginnings of lines
>is probably related to its use (or misuse) of format=flowed.
>
>I notice that each of your paragraphs consists of multiple "paragraphs"
>in the
>format=flowed sense, so that they don't actually flow, but instead end
>up looking
>like this paragraph.
>
>
format=flowed is another issue, regarding which I haven't yet added my 2
cents worth.
So here it is, FWIW. Format=flowed is far too complex for text/plain --
it amounts to
a markup language (a minimalist one, perhaps, but markup nevertheless).
Implementation
differences (and/or lack of implementation support) are probably
indicative of the complexity
and incompatibility with text/plain. IMO it would be best for it to
have its own subtype,
just as other markup languages do (e.g. text/html, text/richtext).
There is no shortage of implementors who can be misled into thinking
that 1036 is the thing to support.
Arnt
I think this is a bit too legalistic. RFC 1036 is the best available
document for a widely used and well-known system, which gives it some
standing in practice, and which makes a textual repudiation reasonable
(even if not strictly necessary).
IMO it would be better for an RFC 1036 successor to repudiate this, but
I've heard they're waiting to get RFC number 10036, so that won't
happen soon.
Arnt
- You only include specific fields in the hash, fields that are likely
to be maintained end-to-end.
- You hash the fields in a predetermined order.
- Before hashing each of those fields, you delete preceding and
trailing whitespace, and you replace all internal whitespace (including
CRLF-space) with a single space. You also canonicalize the field-name (say,
all lower case letters).
- Some fields might have specific rules for canonicalization before
hashing. e.g. For fields containing addresses you would remove all
phrases and comments, leaving only the addresses; you would then
canonicalize each of the addresses (e.g. any local part that isn't a
quoted string would be enclosed in double quotes, contents of all local
parts would be lower-cased, all domains would be lower-cased; < and >
enclosing addresses would be removed); the resulting addresses would be
sorted; and any white space next to a special would be removed. (I'm
not sure yet whether I think address fields should be included in the
hash, but this is how you might do it if such fields were included)
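The general (non-address) steps in the list above might be sketched in Python roughly as follows. This is only a sketch under stated assumptions: the set and order of SIGNED_FIELDS, the use of SHA-256, and the function names are all hypothetical choices made for illustration, and the address-specific canonicalization is left out entirely.

```python
import hashlib
import re

# Hypothetical choice of fields likely to survive end-to-end, in the
# predetermined order in which they are hashed.
SIGNED_FIELDS = ("from", "date", "subject", "message-id")


def canonicalize_field(name: str, body: str) -> str:
    """Lower-case the field-name, trim the body, collapse internal whitespace."""
    name = name.strip().lower()
    # \s+ covers runs of space, tab, CR and LF, so CRLF-space folds collapse too
    body = re.sub(r"\s+", " ", body.strip())
    return f"{name}:{body}"


def header_digest(fields) -> str:
    """Hash selected fields from a list of (name, body) pairs."""
    by_name = {}
    for name, body in fields:
        by_name.setdefault(name.strip().lower(), []).append(body)
    digest = hashlib.sha256()  # assumed hash function, not from the proposal
    for name in SIGNED_FIELDS:          # fixed order, independent of arrival order
        for body in by_name.get(name, []):
            digest.update(canonicalize_field(name, body).encode("utf-8") + b"\n")
    return digest.hexdigest()


if __name__ == "__main__":
    sent = [("From", "a@example"), ("Subject", "hello   world"),
            ("Message-ID", "<1@example>")]
    received = [("MESSAGE-ID", " <1@example> "), ("subject", "hello\r\n world"),
                ("X-Extra", "added in transit"), ("from", "a@example")]
    print(header_digest(sent) == header_digest(received))
```

Reordered fields, refolded bodies, changed field-name case and added fields leave the digest unchanged, while a different Message-ID changes it.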
I don't think it would be easy to work out the details; but I do
believe it's feasible.
A few systems would still mung headers to the point that the
originator-id verification would fail. The consequences would mostly
be these: For systems that munged outgoing mail, their recipients' mail
would not be verifiable. For systems munging incoming mail, their
recipients would not be able to verify originator-ids on mail that they
received or on mail that they forwarded to other systems. In either
case there would be an incentive to fix the implementations that munged
mail.
It might also be desirable to impose some constraints on messages that
are originated with an originator-id, such as limiting the permissible
variation on the format of address fields, date fields, etc.
Note again, the purpose isn't to prove that the sender sent the exact
message that the recipient received. The purpose is to make it
difficult for the sender to credibly say "I didn't send anything
resembling that message" for large numbers of messages. A sender
should not be penalized for a small number of messages considered
offensive to recipients regardless of whether the sender claims that he
didn't send them.
> 2. the mechanism is not subject to replay attacks (e.g. copying the
> signed field from one
> message to another)
You include the message-id and/or the message body in the fields to be
hashed.
>> This is simply nonsensical garbage. The world does not magically stop
>> working when you don't have an RFC to work from.
> There is no shortage of implementors who can be misled into thinking that
> 1036 is the thing to support.
I know. Which is why it would be really good to replace it. But when
they discover that their software doesn't interoperate, they generally
shift to supporting what's actually present in the wild.
> And pretending that there is a well-defined "current practice" does no
> favors either.
I didn't mean to do that. I am actually arguing the exact opposite by
saying that RFC 1036 is *also* not a well-defined standard to follow since
it won't interoperate with Usenet as it exists today. I am arguing
exactly that Usenet is simply not well-defined, by RFC 1036 or anything
else.
>>> a) ignore RFC 1036, which means ignoring Usenet, since 1036 is THE RFC
>>> dealing with Usenet message format
>> This is simply nonsensical garbage. The world does not magically stop
>> working when you don't have an RFC to work from.
> Did I say the world would stop working? Did I say that there isn't an
> applicable RFC? News flash: Usenet is not the world. In the grand
> scheme of things, it isn't even close to being important.
You missed my point.
You're saying that if you ignore RFC 1036 you have to ignore Usenet. This
is obvious nonsense, since there are reams of software written to work
with Usenet articles that ignore RFC 1036. My point is twofold: one
cannot simply write to something that claims to be a standard because
sometimes it's badly obsolete, and it is actually possible to write
interoperable software without a standard (it's just very annoying).
> It would be difficult to present a convincing argument that Usenet isn't
> the least important application that uses the text message
> format. Certainly email, which is legitimately used (i.e. excluding
> spam) to a much larger extent than Usenet, and which has become an
> essential tool for commerce, is far more important than Usenet. Voice
> mail, fax, EDI, and even SIP could be argued as more important than
> Usenet. Not to mention the fact that Usenet's historically appallingly
> low signal-to-noise ratio (which seems to keep falling to new lows) is
> making Usenet largely irrelevant, it having been supplanted by weblogs
> and the like to a large extent.
This is a completely different discussion that I'm not going to have on
this mailing list. I would prefer that this group *did* ignore Usenet
except for the small and very specific places where exactly following RFC
2822 makes it very difficult to use with Usenet-like applications, such as
allowing whitespace in the middle of message IDs. Other than those very
specific issues, RFC 2822 work would be best served by ignoring Usenet
because Usenet would be best served by ceasing its attempts to tread off
into uncharted territory and moving back to using the same message format
as e-mail.
> That amounts to precisely the same thing as c, viz. picking and choosing
> bits and pieces, and has the predictable and observable consequences
> noted, viz. incompatibilities and interoperability problems.
Okay, fair enough.
> Writing code which attempts to "work with the Usenet messages in the
> wild" amounts to picking and choosing from a plethora of incompatible
> options;
Yes. And that's how you have to write Usenet software right now.
> Where are all of these supposedly well-defined issues of "current
> practice" actually defined?
I did not mean to claim they were well-defined.
> Has the cognizant WG come to rapid agreement on any of those issues?
No. The cognizant WG is completely dysfunctional.
>According to the latest rfc-index, 1036 has not been obsoleted. It
>might well be your opinion that it should be obsoleted or
>reclassified, but that does not affect its official status. You are
>of course free to petition for reclassification of RFC 1036 to
>historic status (as RFCs 3166, 3638, etc. have done for other RFCs).
Oy. If I go ahead and get 1036 moved to historic, can we stop having
this discussion?
pr
--
Pete Resnick <http://www.qualcomm.com/~presnick/>
QUALCOMM Incorporated - Direct phone: (858)651-4478, Fax: (858)651-1102
Speaking for myself, I really would like the option to not use bandwidth
receiving email from previously unknown senders unless its source can be
traced.
I think the damage being done by spam flooding my mailbox is somewhat
greater than the damage caused by other agencies monitoring its
content. I'm not inclined to trust my secrets to email, in any case.
#g
------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact
>On 1/15/04 at 7:09 PM +0100, Arnt Gulbrandsen wrote:
>>3. The date syntax is reordered, and four-digit years seem to have
>>disappeared from the obsolete syntax.
>Nope. 2*DIGIT means 2 or more digits, which is correct.
So you can have a year with 17 digits in it? I don't think that is a good idea.
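For reference, the ABNF repetition 2*DIGIT sets a lower bound of two and no upper bound. A small Python sketch using the equivalent regular expression shows what that accepts; the function name is hypothetical.

```python
import re

# ABNF "2*DIGIT": at least two DIGITs, no upper limit
TWO_OR_MORE_DIGITS = re.compile(r"[0-9]{2,}")


def matches_two_or_more_digits(text: str) -> bool:
    """Return True if the whole string matches 2*DIGIT."""
    return TWO_OR_MORE_DIGITS.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ("4", "04", "2004", "1" * 17):
        print(candidate, matches_two_or_more_digits(candidate))
```

A bounded form such as 2*4DIGIT would be the ABNF way to rule out the 17-digit case, if that were wanted.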
--
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133 Web: http://www.cs.man.ac.uk/~chl
Email: c...@clerew.man.ac.uk Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5
>Pete Resnick wrote:
>> I agree. These should be removed.
>IMO, Subject should be unstructured, period. Unfortunately, RFC 1036
>introduced two
>hacks:
>1. "Re: " (which it got wrong!), which *requires* certain actions when
>encountered, i.e. it is
> effectively part of the syntax. See RFC 1036 sections 2.1.4, 2.2.5.
>2. "cmsg", which *requires* certain actions when encountered. See RFC
>1036 sections 2.2.6
> and 3 (including subsections).
>This is the same "Subject" field described in RFCs 822 and 2822.
>Now, if an RFC 2822 successor were to repudiate those requirements
>spelled out in RFC 1036,
>that would be fine.
No! If they are to be repudiated, then it is for an RFC 1036 successor to
do so. Indeed, Usefor is taking care of both of these (though the exact
form of such care is not in concrete yet).
[...]
> I don't think it would be easy to work out the details; but I do
> believe it's feasible.
OK, thanks. I'm not sure that messing with address local-parts is the
right thing to do
(local-parts can be case-sensitive), but I get the idea. Sounds quite
complex, to the
point that I have some doubts about how well different implementations
would work.
I think there are also a few missing bits and pieces which are necessary
to make
such verification useful from a practical (i.e. minimizing manual setup
and intervention)
point of view, but that's a separate issue.
Thanks again.
>Keith Moore wrote:
>> That, and there is a semantic difference between a signed message and
>> a cryptographically verifiable trace field in a message.
>Could somebody outline a process whereby a single field or group of
>fields in a
>message could be signed, with the following conditions:
>1. the mechanism is robust w.r.t. common types of message munging
>(reordered fields,
> possible dropping of fields (obviously, let's assume that the field
>that is signed isn't
> dropped), addition of trailing whitespace, etc.)
>2. the mechanism is not subject to replay attacks (e.g. copying the
>signed field from one
> message to another)
Usefor did some work on this (but for other reasons). However, it was
decided it was a step too far for the current draft, but a possibility for
a later 'security' document.
The internet-draft has expired now, but you can still find it on
<http://www.landfiled.com/usefor/drafts/draft-lindsey-usefor-signed-01.txt>
It included a very elaborate canonicalization. Even so, I would write it
differently now. And worse things have happened since I wrote it, for
example how do you canonicalize IDNA domain names, as UTF-8 or as
punycode?
>On 1/15/04 at 9:59 PM +0000, Charles Lindsey wrote:
>>Now we come to the obs- syntax, where there are still many
>>ambiguities. As things stand, sometimes the obs- syntax allows
>>something that is already in the regular syntax (that is ambiguous).
>>OTOH, sometimes it does not, and sometimes it allows only a part of
>>what is in the regular syntax, all of which can be very confusing to
>>the reader who tries to work out exactly how the obs- syntax differs
>>from the regular.
>I disagree completely. I think it's much easier for the reader to
>have some complete pieces in the obs- syntax even if there is
>redundancy. For example, I think the horrible hoop you have to jump
>through below for obs-local-part:
Yes, that syntax was pretty ugly, but most of them come out reasonably clean.
But I think the real point is that you need to be consistent. Either the
obs- version of a rule should consistently include the regular version of
the same rule, or they should be consistently disjoint. Problem is that at
the moment you have neither of these situations.
But whatever is done, I think the text needs to make it clear what policy
has been followed.
>>>obs-phrase = word *(word / ("." [CFWS]))
>>
>>OK, but that is not "obsolete". It is intended as an extension to be
>>allowed sometime in the future on a "MUST accept, SHOULD NOT
>>generate yet" basis. So please can we rename it as 'extended-phrase'
>>(which is what I have currently put in Usefor).
>I am not convinced this is worth it. It's explained perfectly well in the text.
Yes, but it is too easily missed if it is hidden away in the obsolete
stuff, which is how I came to miss it when constructing the Usefor syntax
until Bruce pointed it out.
It is an extension to the preceding standards. It should be in a section
entitled "extensions", or some such.
>>Currently, RFC2822 requires:
>>
>>1. Return-Path
>>2. 1*Received
>>3. *Resent-xxx
>>4. Other headers
>No, it doesn't. Look at the parens and the repeats. It requires:
OK.
>>Here is another example (a real one this time, which some readers of
>>uk.net.news.management may recognize):
>>
>>Received: from lon-mail-1.gradwell.net (localhost [127.0.0.1])
>> by clerew.man.ac.uk (8.11.7+Sun/8.11.7) with ESMTP id i05HCjF01021
>> for <c...@clerew.man.ac.uk>; Mon, 5 Jan 2004 17:12:45 GMT
>>Delivered-To: postmaster@A
>>Received: (qmail 81124 invoked by uid 800); 5 Jan 2004 12:54:22 -0000
>[...]
>You're right, that's not allowed, and I think that is a bug that
>needs to be fixed.
>>Now there are all sorts of perfectly genuine "tracing headers" in
>>there, all added in transit, and all useful.
>So, likely we need optional-field to appear in trace. I think that's
>the logical answer.
OK. But I still think that the syntax is the wrong place to enforce this.
The number of ways of including tracing information is going to increase
(see separate thread currently running), so you need to allow maximum
flexibility for future developments.
Also, my example included a Reply-To header inserted by a mailing list
expander in the midst of the tracings. Was that right or wrong?
>Further note that this is obs-syntax, which is not for generation of new
>messages.
>Also note that ASCII NUL is included in obs-char.
Note also that NULs and naked CR and LF in headers will not be transported
correctly by many transports, and there is nothing corresponding to CTE
Q-P to tunnel them through as there is in the case of bodies.
Which also raises the issue of which parts of the obs-syntax are there
because there exist (or once existed) agents that routinely generated such
things, and which parts are there because they were strictly legal under
RFC 822, but nobody ever actually generated them (except those
participating in obfuscated header contests)?
I have this feeling that we are requiring agents to recognize obs-
features that have never ever been seen in the wild.
>I didn't mean to do that. I am actually arguing the exact opposite by
>saying that RFC 1036 is *also* not a well-defined standard to follow since
>it won't interoperate with Usenet as it exists today. I am arguing
>exactly that Usenet is simply not well-defined, by RFC 1036 or anything
>else.
>
>
OK, I see. Presumably 1036 was reasonably accurate when issued, since
it described
the operation of a widely used implementation. The fact that actual use
has deviated
with no revision to the documentation says more about Usenet than about
1036.
>You're saying that if you ignore RFC 1036 you have to ignore Usenet. This
>is obvious nonsense, since there are reams of software written to work
>with Usenet articles that ignore RFC 1036. My point is twofold: one
>cannot simply write to something that claims to be a standard because
>sometimes it's badly obsolete, and it is actually possible to write
>interoperable software without a standard (it's just very annoying).
>
>
I'm saying it's one option. Picking and choosing is another. Making things
up with no supporting standard is another -- and you're right about that
being annoying.
>>It would be difficult to present a convincing argument that Usenet isn't
>>the least important application that uses the text message
>>format.
>>
[...]
>This is a completely different discussion that I'm not going to have on
>this mailing list.
>
It simply means that given the alternatives (pick and choose, and justify
the choices made, or make things up and justify them), it's far simpler to
say "Usenet? Sorry, not supported."
That isn't to say that that's a great choice; it's the least of evils.
> I would prefer that this group *did* ignore Usenet
>except for the small and very specific places where exactly following RFC
>2822 makes it very difficult to use with Usenet-like applications, such as
>allowing whitespace in the middle of message IDs. Other than those very
>specific issues, RFC 2822 work would be best served by ignoring Usenet
>because Usenet would be best served by ceasing its attempts to tread off
>into uncharted territory and moving back to using the same message format
>as e-mail.
>
>
I agree that there are some issues where 2822 could make some things easier
for some uses of the text message format. And I wholeheartedly agree that
Usenet should continue to use the common text message format (indeed, RFC
1036 (section 2) goes to great lengths to indicate that it is in fact the
same format).
Unfortunately, some provisions of RFC 1036 have an impact on that common
message format, and the Subject field hacks are such provisions.
>So you can have a year with 17 digits in it? I don't think that is a good idea.
>
>
>
Pete didn't want to have to revise the document in the year 9999 :-).
> On 1/18/04 at 8:56 AM -0500, Bruce Lilly wrote:
>
>> According to the latest rfc-index, 1036 has not been obsoleted. It
>> might well be your opinion that it should be obsoleted or
>> reclassified, but that does not affect its official status. You are
>> of course free to petition for reclassification of RFC 1036 to
>> historic status (as RFCs 3166, 3638, etc. have done for other RFCs).
>
>
> Oy. If I go ahead and get 1036 moved to historic, can we stop having
> this discussion?
Well, if you can get 1036's requirements for parsing structure of an
unstructured field (Subject) repudiated by any means, yes, we can stop
having this discussion.
I don't think moving 1036 to historic at this point would be wise, as 1036
does define several extension fields.
But as the cognizant WG has not seen fit to repudiate the conflicting
sections of 1036, it probably should be done by 2822's successor as the
text message format is affected.
On the other hand, if you can prompt the cognizant WG to either issue a
1036 amendment that specifically addresses the conflicting issues or to
issue a draft of a 1036 successor that has a non-zero chance of becoming an
RFC, that would work also. To date that WG has produced neither (though at
one point about a year ago there was a draft proffered by Dan Kohn that
showed considerable promise).
well I suppose you could say that the reason I'm publicly discussing
these ideas now is that spam has gotten bad enough that extreme
measures seem to be required.
also, earlier versions of this scheme made surveillance easier than the
one I just proposed.
Keith
In the year 999999999999999, it will be a good idea. Don't contribute to
the Y10P problem. ;-)
But maybe it should be " 4DIGIT / %x31-39 3*DIGIT / obs-year ", i.e.
years with more than 4 digits must not have leading zeros.
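For illustration only, the proposed production (leaving obs-year aside) can be checked with a regular expression. This is a rough sketch in Python; the name is_valid_year is made up for the example and is not part of any draft.

```python
import re

# 4DIGIT / %x31-39 3*DIGIT, with obs-year deliberately left out.
# Alternative 1: exactly four digits (leading zeros allowed, e.g. "0999").
# Alternative 2: a nonzero digit followed by three or more digits.
YEAR_RE = re.compile(r"(?:[0-9]{4}|[1-9][0-9]{3,})")


def is_valid_year(text: str) -> bool:
    """Return True if text matches the proposed (non-obsolete) year syntax."""
    return YEAR_RE.fullmatch(text) is not None
```

Under this reading "0999" and "12345" are accepted, while "01234" is rejected because a year longer than four digits starts with a zero.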
Claus
--
http://www.faerber.muc.de
I think making mail traceable is the very best thing that the working group
can do. People are sick of SPAM and they need a solution.
I already spoke on the reasons why I believe this information needs to be in
DNS in a previous post, but I did not go into the details as to what this
could do to help the situation.
In the example below, the information that would be used by the client's mail
software or the MTA is the portion before the double quote; the information
after the double quote is for ''non-machine viewing''.
After the GL RR identifier you have the two-character country code followed
by the postal code.
GL US.45420.1910
We now have states that have enacted laws concerning spam and now the
federal government as well. But even if you put this as a reason to use
this idea aside, having this information would allow people to configure
email software they are using to block emails from a country they do not
wish to receive email from as well as being able to trace where the email
came from.
In addition, just as there is no rule forcing an ISP to use this RR, mail
software could be set to block email from addresses that do not contain the
information, in the same way that ISPs now block email coming from an A
record that does not have a PTR record.
            IN  NS     ns.akc.net.
uspring     IN  A      192.188.192.2
            IN  MX     5 mail
            IN  HINFO  Vax VMS
            IN  GL     US.45420.1910 "1425 Arbor Avenue, Dayton OH"
ftp         IN  CNAME  uspring
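To make the split between the machine-readable part and the quoted part concrete, here is a small sketch of how mail software might pick apart the data of the proposed GL record. It assumes only what the example shows: dot-separated fields before the double quote, beginning with a two-letter country code and then the postal code. The meaning of any further field (1910 above) is not spelled out here, so it is kept as opaque extra data. The names GLRecord and parse_gl_rdata are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class GLRecord(NamedTuple):
    country: str                 # two-letter country code, e.g. "US"
    postal_code: Optional[str]   # None if only the country was published
    extra: Tuple[str, ...]       # further dot-separated fields, kept opaque
    comment: Optional[str]       # quoted text meant for human readers


def parse_gl_rdata(rdata: str) -> GLRecord:
    """Split GL record data into its machine part and its quoted comment."""
    machine, quote, rest = rdata.partition('"')
    # Everything up to the last double quote is the human-readable comment.
    comment = rest.rsplit('"', 1)[0] if quote else None
    parts = machine.strip().split(".")
    country = parts[0]
    if len(country) != 2 or not country.isalpha():
        raise ValueError("expected a two-letter country code")
    postal_code = parts[1] if len(parts) > 1 else None
    return GLRecord(country.upper(), postal_code, tuple(parts[2:]), comment)
```

A filter that blocks by country would then only need to compare the country field against a locally configured list.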
I will re-post the draft shortly
Al Costanzo
Keith
And while I agree the DNS is probably a natural place for the
information, I feel obliged to note the problems that exist getting
reverse lookup entries properly configured in the DNS.
Also, while the DNS might seem a "secure" location for information, surely we
all realize the DNS is just a "major security incident waiting to
happen." But then maybe this is the "killer app" for DNSSEC?
Jim
On Sat, 17 Jan 2004, Al Costanzo wrote:
Date: Sat, 17 Jan 2004 17:08:55 -0500
From: Al Costanzo <a...@akc.com>
To: James M Galvin <galvin+...@eListX.com>
Cc: ietf...@imc.org
Subject: Re: making mail traceable
To this point I would say, we need to know the physical location of the
machine (node) on the Internet sending the email.
I would say that DNS is the perfect location for storing this information,
since, unlike mail headers, it is more difficult for the casual user to
munge and is usually administered properly by ISPs.
With the information stored in the DNS, IMO, it gives us a second level of
repudiation to protect us all from spammers and help the US enforce the new
law.
Al
----- Original Message -----
From: "James M Galvin" <galvin+...@elistx.com>
To: "Al Costanzo" <a...@akc.com>
Cc: <ietf...@imc.org>
Sent: Saturday, January 17, 2004 3:55 PM
Subject: Re: making mail traceable
>
> To the extent it is "information" we agree we need for email to be
> traceable and we agree that having it in the DNS is the right place for
> it, then it would be helpful. Dave's point was that we should be
> discussing what information we need and agree on that before we try to
> agree on where to put it.
>
> Jim
>
>
>
>
> On Fri, 16 Jan 2004, Al Costanzo wrote:
>
> Date: Fri, 16 Jan 2004 19:42:09 -0500
> From: Al Costanzo <a...@akc.com>
> To: James M Galvin <galvin+...@eListX.com>, ietf...@imc.org
> Subject: Re: making mail traceable
>
>
>
> I may be missing the point here but about a year ago I wrote a draft for a
> DNS RR record to keep track of the physical location of the A or MX record
> as a "physical postal address" just for this purpose, with the intention of
> being able to track back the location of SPAM.
>
> With new state laws I thought it was the way to go, and then have the MTA
> use this information.
>
> Would this not help? If so I have a copy of the draft I am currently
> revising.
>
> Al Costanzo
> ----- Original Message -----
> From: "James M Galvin" <galvin+...@elistx.com>
> To: "Dave Crocker" <dcro...@brandenburg.com>
> Cc: "ietf-822" <ietf...@imc.org>
> Sent: Friday, January 16, 2004 5:02 PM
> Subject: Re: making mail traceable
>
>
> >
> >
> > On Fri, 16 Jan 2004, Dave Crocker wrote:
> >
> > > > The "Received" header is woefully inadequate for spam tracing
> > >
> > > True. Then again, so is the rest of the message format and mail
> > > transport.
> > >
> > > I don't agree it's "woefully inadequate."
> >
> > When we have some agreement on the information that is needed to
> > facilitate spam tracing, then we can decide whether it is better to
> > add it to Received or create a new header.
> >
> > Absolutely.
> >
> > And let us not forget some means to validate that the information we do
> > have or get is accurate and correct. Or is that what Nathaniel and
> > Keith meant by "traceable?"
> >
> > Jim
> >
> >
>
>
>
>
On Sun, 18 Jan 2004, Keith Moore wrote:
>> Doesn't RFC1847 (security multiparts) already provide both of these, or
>> at least the framework sans the actual algorithm?
>
> Only if everyone's user agents treated a security multipart
> containing a message semantically the same as they would the
> original message.
>
> Sure, so your primary concern is that the use of security multiparts
> would be a less backwards compatible change than a change to the
> Received syntax or adding a new header.
That, and there is a semantic difference between a signed message and a
cryptographically verifiable trace field in a message.
Agreed. My point was that the same technology would be used but would
be interpreted differently.
>> - If you really wanted to, you could augment Received or add a new
>> trace field that recomputed the hash at each hop (to show if and
>> where the message was corrupted in transport)
>>
>> Why would you forward a message that was discovered to be corrupted?
>
> Probably because the message might have been corrupted in a way that
> makes the hash invalid without actually changing the content of the
> message. Something tells me it will be difficult to write a
> canonicalization function that accommodates all of the various kinds
> of message and header munging that is out there.
>
> I can imagine it might be a local policy preference for all messages to
> be examined by a person instead of rejecting it if the hash validation
> fails. However, it should be the case that except for human review the
> message is rejected (for whatever we decide rejection means). There may
> be some edge cases where broken mailers do really obscure things to
> messages to break the hash, but either you are doing security right or
> you don't do it.
Actually I don't see this as a security measure. It doesn't really
protect systems from any kind of attack, and it doesn't authenticate
the message as being authored or witnessed by any publicly known
principal name. It doesn't even have to provide an unimpeachable
assurance of non-repudiation -- it just has to be good enough that it's
infeasible to make large quantities of spam look as if it were
associated with originator-ids other than those controlled by the
spammers.
This seems incongruous to me.
Some principal is going to assign a hash. There has to be a principal
because the hash has to be digitally signed. If it's not there's no
point having it for the purpose as described.
This means the message is at least witnessed and there's non-repudiation
to the extent the principal can not deny having handled the message.
That is a security measure.
As to whether any of the content is impeachable, that depends on when
the hash is actually created. Certainly if it's created after a "Date:"
is added or after the local "Received:" is added, then those fields are
impeachable. Presumably the rest of the content is opaque to the
creation of the hash so its content may not be impeachable against the
principal, but with the Date: field you've got a security service.
Did you have something different in mind?
>> I'm sorry but I just don't see what you have in mind that is worse
>> than it is today,
>
> Anyway, the reason my proposal allows originator-ids to be
> ephemeral is to make it hard for ordinary people to track messages - I
> believe anonymous speech is important - and also out of recognition that
> spammers can probably get ephemeral accounts anyway.
>
> My point is that we are not that [far] from this already. To stretch the
> example to the extreme, the US Government could make a law that ISPs
> have to make sure they can map a From: email address to an actual person
> before accepting submission of a message (in the US of course). Why do
> we need a special identifier?
The US Congress might or might not care, but for a variety of reasons
it would not be appropriate to use From in this way. There is no
defined header field that can be used for this purpose without
conflicting with valid and existing uses of the field. (Sender would
have been the right thing, but it's too widely misused now.)
What you say is true but it doesn't refute what I said. Does this mean
you agree with me?
Jim
I want to know where the email is coming from.
On Sun, 18 Jan 2004, Keith Moore wrote:
> To this point I would say, we need to know the physical location of the
> machine (node) on the Internet sending the email.
do we really care where the message was sent from, as opposed to who
sent it?
This is a really good question.
My current thinking is I don't want to tightly couple the message
directly to a person, even in an ephemeral way. I would rather note
sites, especially those that have multiple users.
So, rather than even having a home user site create a "who sent it
identifier", I would prefer the ISP submission server create a "message
passed through my server identifier", along with additional private log
entries that make it possible to trace the message back to its actual
origin. Those log entries can then be retrieved through ordinary law
enforcement means.
Now, perhaps from an implementation point of view that's not much
different than an ephemeral originator identifier, but I believe it will
be perceived differently and the distinction is important.
Also, just as I don't want to expose the actual identity of every
message sender, I wouldn't want to expose the location of every message
sender. (I can see it now - send an e-mail critical of Dubya, and a
G-man knocks on your door within the hour...)
Yes, well, any security measure that involves a known principal carries
that risk so I don't view it as a reason not to do this.
Jim
On Sun, 18 Jan 2004, Bruce Lilly wrote:
Could somebody outline a process whereby a single field or group of
fields in a message could be signed, with the following conditions:
1. the mechanism is robust w.r.t. common types of message munging
(reordered fields, possible dropping of fields (obviously, let's
assume that the field that is signed isn't dropped), addition of
trailing whitespace, etc.)
This is where I think multipart/signed got it right and is a big win (he
says with all humility :-). There are only two real requirements:
a) the object to be signed must be canonicalized
b) it must be universally representable
There's a couple details in each of those but basically you create a
7bit opaque object to sign and transfer from the originator to the
recipient. That object is pushed one level inside the message, inside
the outermost multipart/signed.
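As a rough illustration of what "canonicalized" could mean for condition 1 above (reordered fields, trailing whitespace), here is a sketch that reduces a chosen set of header fields to one byte string and hashes it. This is not the multipart/signed procedure; the canonical form (lowercased names, unfolded and whitespace-collapsed values, sorted lines), the choice of SHA-256, and the function names are all assumptions made for the example. Collapsing whitespace is also a simplification, since it would alter whitespace inside quoted strings.

```python
import hashlib
import re


def canonicalize_fields(fields, names):
    """Build a canonical byte string from the selected (name, value) pairs."""
    wanted = {n.lower() for n in names}
    lines = []
    for name, value in fields:
        if name.lower() not in wanted:
            continue  # fields outside the signed set may be added or dropped
        unfolded = re.sub(r"\r?\n[ \t]+", " ", value)       # undo folding
        collapsed = re.sub(r"[ \t]+", " ", unfolded).strip()  # trim whitespace
        lines.append(f"{name.lower()}:{collapsed}")
    lines.sort()  # field order in transit no longer matters
    # Encoding as ASCII enforces the 7bit assumption (raises otherwise).
    return "\r\n".join(lines).encode("ascii") + b"\r\n"


def digest_fields(fields, names):
    """Hash the canonical form; a signer would sign this digest."""
    return hashlib.sha256(canonicalize_fields(fields, names)).hexdigest()
```

Two copies of a message that differ only in field order, folding, or trailing whitespace give the same digest under this scheme; a change to a covered field value does not.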
Now, if you're looking to add a signature (or some other cryptographic
trace value) to the outermost headers and have it apply over the entire
message (including the outermost headers), then you've got a really hard
problem.
Actually, I don't think "really hard problem" begins to describe just
how hard it would be. I can not imagine why anyone would want to do
such a thing. It just isn't practical in today's Internet email
environment.
S/MIME and PGP/MIME inherit this to the extent they use
multipart/signed.
2. the mechanism is not subject to replay attacks (e.g. copying the
signed field from one message to another)
This is a separate requirement, easily included with a protected Date:
and/or nonce available.
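A minimal sketch of such a check on the receiving side, assuming the signed data covers a Date value and a nonce. The class name ReplayGuard, the in-memory store, and the window size are illustrative assumptions only.

```python
from datetime import datetime, timedelta, timezone


class ReplayGuard:
    """Reject signed data that is too old, too new, or already seen."""

    def __init__(self, window: timedelta):
        self.window = window
        self._seen = {}  # nonce -> signed date

    def accept(self, nonce: str, signed_date: datetime, now: datetime) -> bool:
        # Forget nonces whose date has fallen out of the window; anything
        # that old is rejected by the date check below anyway.
        cutoff = now - self.window
        self._seen = {n: d for n, d in self._seen.items() if d >= cutoff}
        if abs(now - signed_date) > self.window:
            return False  # outside the acceptance window
        if nonce in self._seen:
            return False  # same nonce presented twice: replay
        self._seen[nonce] = signed_date
        return True
```

The window bounds how long nonces must be remembered, which is why the protected Date and the nonce work best together.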
I believe that S/MIME and PGP/MIME signed messages are robust w.r.t.
those criteria, since the signed message is itself transfer encoded
(if necessary) and encapsulated via MIME;
They can be robust with respect to replay but are not by default.
Jim
It's going to be very similar technology, but the actual protocol used
needs to be different. If submission servers start applying security
multiparts to messages it's going to break things, and it's going to be
misinterpreted.
What I was responding to is the "either you are doing security right or
you don't do it". I certainly agree that this should be "done right"
(to the extent that we can figure out what "right" is) but I think that
calling this a "security" measure invites confusion between this
application of security technologies (public-key cryptography,
noninvertible hash functions) and other applications of those
technologies, for which "doing it right" means something different than
it does here.
> As to whether any of the content is impeachable, that depends on when
> the hash is actually created.
It also depends on what is included in the hash. I believe it will be
necessary to omit some information from the hash in order to get the
hash to survive most existing mail transports. I don't think this is a
problem as long as we don't treat the originator-id tag as a digital
signature.
>>> I'm sorry but I just don't see what you have in mind that is worse
>>> than it is today,
>>
>> Anyway, the reason my proposal allows originator-ids to be
>> ephemeral is to make it hard for ordinary people to track messages -
>> I believe anonymous speech is important - and also out of recognition
>> that spammers can probably get ephemeral accounts anyway.
>>
>> My point is that we are not that [far] from this already. To stretch
>> the example to the extreme, the US Government could make a law that
>> ISPs have to make sure they can map a From: email address to an
>> actual person before accepting submission of a message (in the US of
>> course). Why do we need a special identifier?
>
> The US Congress might or might not care, but for a variety of
> reasons it would not be appropriate to use From in this way. There is
> no defined header field that can be used for this purpose without
> conflicting with valid and existing uses of the field. (Sender would
> have been the right thing, but it's too widely misused now.)
>
> What you say is true but it doesn't refute what I said. Does this
> mean you agree with me?
I still think an Originator-id field would be at least somewhat easier
for governments to abuse in this way than any of the existing fields.
Variant use of From is sufficiently widespread that I suspect there
would be some pushback against the government trying to constrain its
use.
there are too many nutcases in the world who would physically attack
people if they knew the actual, physical source of their email.
I don't really like doing that either. But I don't think the
granularity of "site" is good enough to identify and marginalize
spammers - actual experience with trying to blacklist sites (or IP
address blocks associated with sites) IMHO indicates that it is not
good enough for this purpose.
Also, for lots of reasons I don't think that giving law enforcement a
way to track down spammers is a desirable way to solve the spam
problem. It would get the government too involved in mediating
people's communications, it would invite favoritism, it would require
too many LE resources, and for that reason it would be hard to limit
abuse. I'd far rather find a way for the net to be self-policing by
allowing recipients (or recipient sites) to marginalize spammers
without actually finding out who the spammers are. Of course, for
serious infractions of the law, LE will still be able to trace
originator-ids to the actual people who sent the messages - I don't see
any way of avoiding this.
> So, rather than even having a home user site create a "who sent it
> identifier", I would prefer the ISP submission server create a "message
> passed through my server identifier", along with additional private log
> entries that make it possible to trace the message back to its actual
> origin. Those log entries can then be retrieved through ordinary law
> enforcement means.
This would be an acceptable implementation of originator-id. But to
make the system complete ordinary users would need to be able to query
the ISP to learn something about the sender's reputation.
> Also, just as I don't want to expose the actual identity of every
> message sender, I wouldn't want to expose the location of every
> message sender. (I can see it now - send an e-mail critical of Dubya,
> and a G-man knocks on your door within the hour...)
>
> Yes, well, any security measure that involves a known principal carries
> that risk so I don't view it as a reason not to do this.
Knowing who you are and knowing where you are are different things. If
they were the same thing Usama bin Laden would be dead now.
Keith
> do we really care where the message was sent from, as opposed to who
> sent it?
>
> My current thinking is I don't want to tightly couple the message
> directly to a person, even in an ephemeral way. I would rather note
> sites, especially those that have multiple users.
>
> So, rather than even having a home user site create a "who sent it
> identifier", I would prefer the ISP submission server create a "message
> passed through my server identifier",
We need to distinguish between having a mechanism with semantics about the
MTA, versus one with semantics about the author of the message. Which
underlying semantic do we really care about and why?
I believe the semantic we care about, for MTAs, is that they are part of
well-behaved networks, not that they are "authorized" by the author, or that
they in turn vouch directly for the author. In other words, is the hosting
ISP running a coherent, controlled MTA environment? If the answer is yes, we
are not certain that other MTA sources from that ISP are rogue, but we are
certain they are not vouched for, by the ISP.
I believe the biggest semantic we want to see is "this author is
well-behaved". Anything about the MTA is indirect. It might be useful, but
it's not core. A step in that direction is to authenticate the author or to
provide an assurance that they can be located.
Schemes that involve MTA registration for/by the author (SPF, LMAP, RMX, ...)
confuse these two semantics and they create very Procrustean usage and
administration scenarios. For an operation with users that send from outside
the ISP, they create problematic usage patterns.
In contrast, schemes that focus on the author directly are more flexible. It
is quite straightforward for an ISP to operate that scheme on behalf of the
author, in those environments that permit it. This means that direct
author-focused schemes permit a number of operational styles, one of which is
equivalent to the MTA registrations such as SPF, LMAP and RMX. This includes
having the ISP mask the actual author, while retaining accountability for
them.
d/
--
Dave Crocker <dcrocker-at-brandenburg-dot-com>
Brandenburg InternetWorking <http://brandenburg.com>
I agree with you on this 100%.
The exact location may be a disaster to publish, but the precision of the RR
record is configurable.
You could just list the Country or Country & State or Country & State &
City.
So I don't think a nutcase is an issue here.
Al Costanzo
----- Original Message -----
From: "Keith Moore" <mo...@cs.utk.edu>
To: "Al Costanzo" <a...@akc.com>
Cc: "Keith Moore" <mo...@cs.utk.edu>; <ietf...@imc.org>
Sent: Monday, January 19, 2004 12:19 PM
Subject: Re: making mail traceable
you've obviously never known someone well who was stalked by her
nutcase ex-husband.
Keith
Perhaps there exists a way to combine a mail trace field with DNS
(in part to reduce the size of the hash necessary for email).
The way it could work is that a public key is available for verification
from DNS (this must include version information, allowing a company to
use multiple keys at the same time when moving from one to another or
just when it's convenient, i.e. when it has multiple servers). This public
key can be used to verify the private verification id embedded in the email
as well as parameters of the actual email as it was being processed.
My thinking is that good parameters for the hash are RCPT TO and MAIL FROM,
date and message id (unfortunately RCPT TO can have multiple addresses; in
fact it can be a very long list under the current SMTP standard).
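One way to keep a long RCPT TO list from inflating the result is to hash the parameters down to a fixed-size digest before anything is signed. The following is only a sketch under assumptions: SHA-256, NUL as field separator, and the name trace_digest are choices made for illustration, not part of the proposal.

```python
import hashlib


def trace_digest(mail_from, rcpt_to, date, message_id):
    """Reduce the envelope parameters to a fixed-size hex digest."""
    # Sort and de-duplicate recipients so that list order does not matter.
    recipients = sorted(set(rcpt_to))
    # NUL cannot appear in these values, so it is a safe separator.
    fields = [mail_from, date, message_id, *recipients]
    data = "\0".join(fields).encode("utf-8")
    return hashlib.sha256(data).hexdigest()
```

The digest is 32 bytes (64 hex characters) whether the message has one recipient or a thousand; it is this digest, not the raw list, that would be fed to the signing step.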
The best would actually be if this could be done as part of the "Received"
information, where each mail server on the way could add an additional
"Received" header in a standard format that is easily parsed by automated
systems, something like:
Received: FROM above.proper.com (above.proper.com [208.184.76.39])
    BY sokol.elan.net (8.12.5/8.12.5)
    FOR wil...@elan.net ENVELOPE-FROM owner-i...@mail.imc.org
    SENT-ON 200301191237 MESSAGE-ID i0JIT3lQ001354
    VERIFY-DSN key-name#mail.imc.org BY-USING algorithm-name
    VERIFICATION-ID xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
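To show that a keyword/value layout like the one above would indeed be easy to parse, here is a small sketch. It assumes the keyword set is exactly the one used in the example, that keywords are upper case and never occur as values, and that a folded VERIFICATION-ID is rejoined without spaces. The name parse_trace is hypothetical.

```python
KEYWORDS = {
    "FROM", "BY", "FOR", "ENVELOPE-FROM", "SENT-ON", "MESSAGE-ID",
    "VERIFY-DSN", "BY-USING", "VERIFICATION-ID",
}


def parse_trace(field_body: str) -> dict:
    """Split the body of the proposed trace field into keyword -> value."""
    collected = {}
    current = None
    # split() treats folding (newline plus indentation) like any whitespace.
    for token in field_body.split():
        if token in KEYWORDS:
            current = token
            collected[current] = []
        elif current is not None:
            collected[current].append(token)
    parsed = {key: " ".join(tokens) for key, tokens in collected.items()}
    if "VERIFICATION-ID" in parsed:
        # The id was folded over several lines; rejoin it without spaces.
        parsed["VERIFICATION-ID"] = parsed["VERIFICATION-ID"].replace(" ", "")
    return parsed
```

A real grammar would need to say how a value that happens to equal a keyword is quoted; the sketch simply assumes that never happens.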
The systems concerned about message origin and its path would then use
the known parameters of the message already included in the Received
header (FOR, FROM, MESSAGE-ID) together with the VERIFICATION-ID and the
algorithm specified (this should perhaps be embedded as part of the
verification id) and check whether such a verification id could only have
been created by somebody who used the private key corresponding to the
public key available from the VERIFY-DSN location. I'm however concerned
that standard X.509 or similar algorithms would create an ID that is too
long, but 2-4 lines as specified above is probably ok, which gives us up to
256 bytes. Another thing to be concerned about is how quickly such
verification ids can be created and verified (and how much CPU is needed).
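A back-of-the-envelope check of the size concern, as a sketch. It assumes the id is a raw signature encoded in base64 and folded at 64 characters per line; both are assumptions, since no encoding is specified above. RSA is used only as a familiar example: an RSA signature is as long as the key modulus, so 128 bytes for a 1024-bit key and 256 bytes for a 2048-bit key.

```python
import math


def base64_length(n_bytes: int) -> int:
    """Characters needed to base64-encode n_bytes (with padding)."""
    return 4 * math.ceil(n_bytes / 3)


def lines_needed(n_bytes: int, line_width: int = 64) -> int:
    """Header lines needed for the encoded value at the given width."""
    return math.ceil(base64_length(n_bytes) / line_width)
```

With these assumptions a 128-byte signature takes 172 characters, or 3 lines, which fits the 2-4 line budget; a 256-byte signature takes 344 characters, or 6 lines, which does not.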
--
William Leibzon
Elan Networks
wil...@elan.net
> Also, for lots of reasons I don't think that giving law enforcement a
> way to track down spammers is a desirable way to solve the spam
> problem. It would get the government too involved in mediating
> people's communications, it would invite favoritism, it would require
> too many LE resources, and for that reason it would be hard to limit
> abuse. I'd far rather find a way for the net to be self-policing
> by....
I don't find it desirable either, and I too would rather have seen the
community find a self-policing solution. (For the record, I would also
prefer for people to be so loving and compassionate that violence never
reached the level of requiring police or governments.) However,
CANSPAM represents the official rejection of our preferences by the US
government, and I see little gain in dwelling on battles already lost.
Washington today is full of people trying to figure out their new
mandate of regulating email. They're going to do it, regardless of our
preferences, and I think some of us who understand the complexities
should try to play a constructive role in fleshing out that mandate. I
consider it virtually inevitable that they will require some form of
enhanced traceability in the name of stopping spam -- it's just about
the very first instinct of a bureaucracy -- so the question I am trying
to pose is, essentially, what is the least harmful way of doing this?
Preserving anonymous email is very important to me, and a battle worth
fighting very passionately in the long run. The architecture of any
anti-spam tracing facilities will be critical to that battle. I think
we need to accept that we've lost the "whether" battle -- e.g. whether
or not there will be police on the net trying to trace email -- and
try to respond in a strategically intelligent way, by helping define a
tracing architecture that is anonymity-friendly. I expect that some
people I respect enormously may choose to fall on their swords in total
opposition to the "email police." But I hope that some of the many
great minds of the IETF will focus on the challenging design question
of how to preserve interpersonal anonymity while facilitating spam
tracing. I'm not sure I would want to live with the result of forcing
that to be an either/or choice. -- Nathaniel
---
Nathaniel S. Borenstein <n...@cpsr.org>
President, Computer Professionals for Social Responsibility
http://www.cpsr.org
I'm interested in solving a problem, not in trying to make the US
government look good. Now if it turns out that, by some accident or
divine intervention, CANSPAM really does turn out to be useful, that's
fine with me. But it doesn't seem wise to depend on it working.
> Washington today is full of people trying to figure out their new
> mandate of regulating email. They're going to do it, regardless of
> our preferences, and I think some of us who understand the
> complexities should try to play a constructive role in fleshing out
> that mandate.
I think some of us who understand the complexities should try to play a
constructive role by solving the problem as best as we can - but that
doesn't include taking direction from a technically inept, parochial
legislative body, that is easily swayed by direct marketing proponents
into acting contrary to the public interest.
> I consider it virtually inevitable that they will require some form
> of enhanced traceability in the name of stopping spam -- it's just
> about the very first instinct of a bureaucracy -- so the question I am
> trying to pose is, essentially, what is the least harmful way of doing
> this?
I believe the least harmful way of doing this is to provide
traceability in such a way that government intervention isn't needed to
solve most of the spam problem. Then there will be strong arguments
for restricting government intervention to more dangerous problems of
email abuse - e.g. mailing of child porn and bomb threats - and to
impose judicial review and auditing on such interventions as safeguards
against abuse.
Keith
IMHO it is a question of what one wants to accomplish with the signature.
If it's end-to-end, aka MUA <> sMTA <> rMTA <> MUA, signatures are nice
to have, but as I can trust my rMTA's Received: line about the sending MTA
they do not add much information.
The more interesting situation is if the message is multi hop, like with
forwards. Then we have a situation of MUA <> sMTA <> forw MTA <> rMTA <> MUA
and one has to trust "forw MTA" to add correct trace information about
sMTA. It is desirable to have some signature that allows my rMTA to
verify that the envelope sender (aka the originator address) and the
trace information provided by forw MTA are correct and also to protect
against replay attacks, where a signature derived from MUA <> sMTA
is intercepted and abused by an evil MTA to send spam and forge sender
addresses and trace information.
I don't think it is an easy task to find information to add to the hash.
Even if the Date field looks kinda sexy to be added it will cause all
sorts of problems, as even today a not too small number of emails take
a week or more to reach their rMTA and there are a lot of MTAs with
wrong time out there. So the acceptance window has to be quite big and
that leaves enough time to abuse such a hash for replay attacks.
Adding the body of the email to the hash is also playing va banque, as
e.g. mailing lists add trailers to the message and break the hash.
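The trailer problem is easy to demonstrate. The sketch below uses SHA-256 and a made-up list trailer purely for illustration; any hash over the complete body behaves the same way.

```python
import hashlib


def body_digest(body: bytes) -> str:
    """Hash the complete message body as submitted."""
    return hashlib.sha256(body).hexdigest()


# Body as the originator sent it.
original = b"Hello list,\r\nsee you at the meeting.\r\n"
# Hypothetical trailer appended by a list expander.
trailer = b"-- \r\nTo unsubscribe, mail list-request@example.org\r\n"

as_sent = body_digest(original)
as_delivered = body_digest(original + trailer)
# The two digests differ although the author's text is untouched.
```

So a verifier that sees the delivered copy cannot tell a harmless list trailer from a forged body.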
\Maex
--
SpaceNet AG | Joseph-Dollinger-Bogen 14 | Fon: +49 (89) 32356-0
Research & Development | D-80807 Muenchen | Fax: +49 (89) 32356-299
"The security, stability and reliability of a computer system is reciprocally
proportional to the amount of vacuity between the ears of the admin"
> I'm interested in solving a problem, not in trying to make the US
> government look good. Now if it turns out that, by some accident or
> divine intervention, CANSPAM really does turn out to be useful, that's
> fine with me. But it doesn't seem wise to depend on it working.
If you believe that CANSPAM might be merely "useless" then I think you
are a wild-eyed optimist. This isn't primarily about doing good --
although that would be nice -- it's about mitigating damage. You're in
for a rude awakening when the government starts regulating the contents
of your sendmail.cf file.
> I think some of us who understand the complexities should try to play
> a constructive role by solving the problem as best as we can - but
> that doesn't include taking direction from a technically inept,
> parochial legislative body, that is easily swayed by direct marketing
> proponents into acting contrary to the public interest.
I'm not advocating your taking direction from anyone, Keith. I'm
proposing that we put our heads together and give them the best
*advice* we can collectively come up with on *how* to do some things
they are determined to do one way or another.
> I believe the least harmful way of doing this is to provide
> traceability in such a way that government intervention isn't needed
> to solve most of the spam problem.
Sounds perfect. Unfortunately the train has left the station -- the
government *is* intervening, and we'd better be prepared to suggest a
strategy for them, or they'll do something unnecessarily stupid, I
promise. While we discuss these issues, the Beltway traffic is snarled
by software vendors pitching spam "solutions" to the feds. I'm sure
some of them are much worse than others; it would help if the ietf
could provide a guiding vision. (But then again, I'm not sure I know
why I would expect the ietf-822 list to succeed where the ASRG has
failed, alas.) -- Nathaniel
On Jan 19, 2004, at 4:41 PM, Markus Stumpf wrote:
> On Mon, Jan 19, 2004 at 12:18:14PM -0500, Keith Moore wrote:
>> It also depends on what is included in the hash. I believe it will be
>> necessary to omit some information from the hash in order to get the
>> hash to survive most existing mail transports. I don't think this is a
>> problem as long as we don't treat the originator-id tag as a digital
>> signature.
>
> IMHO it is a question of what one wants to accomplish with the signature.
> If it's end-to-end, aka MUA <> sMTA <> rMTA <> MUA signatures are nice
> to have, but as I can trust my rMTA's Received: line about the sending
> MTA it is not of much additional information.
it's a moot point. it's much easier to make e2e sigs work than to make
hop-by-hop sigs work.
> I don't think it is an easy task to find information to add to the hash.
offhand:
- subject field (perhaps truncated to XX bytes)
- message body
- source IP address and port
- precise date/time (not the Date header field)
- *maybe* some form of the envelope recipient list
> Adding the body of the email to the hash is also playing vabanque as
> e.g. mailing lists add trailers to the message and break the hash.
interesting point that. but if the body is the last thing to be hashed
you may be able to recheck the hash at every line boundary. you might
also have the id field include the number of bytes from the body that
are hashed.
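To illustrate the byte-count idea, here is a minimal Python sketch. It is only an assumption about how such a hash might be built: the function names, the 64-byte subject limit (standing in for "XX bytes"), and the choice of SHA-256 are all hypothetical, and the other candidate inputs (source IP, timestamp, recipient list) are left out. The body is fed to the hash last, and the number of body bytes covered is kept alongside the digest, so a trailer appended by a mailing list falls outside the hashed range.

```python
import hashlib
from typing import Optional, Tuple

SUBJECT_LIMIT = 64  # assumed stand-in for "truncated to XX bytes"


def originator_hash(subject: str, body: bytes,
                    body_len: Optional[int] = None) -> Tuple[str, int]:
    """Hash the truncated subject first and the body last.

    Returns the hex digest and the number of body bytes it covers.
    """
    if body_len is None:
        body_len = len(body)
    h = hashlib.sha256()  # hash choice is an assumption, not from the thread
    h.update(subject.encode("utf-8")[:SUBJECT_LIMIT])
    h.update(b"\r\n")  # separator between subject and body
    h.update(body[:body_len])  # body goes in last, limited to body_len bytes
    return h.hexdigest(), body_len


def verify(subject: str, received_body: bytes,
           digest: str, body_len: int) -> bool:
    """Recompute the hash over the first body_len bytes of the received body."""
    if len(received_body) < body_len:
        return False  # body was shortened in transit
    return originator_hash(subject, received_body, body_len)[0] == digest


original = b"Hello,\r\nthis is the message.\r\n"
digest, covered = originator_hash("making mail traceable", original)

# A list server appends a trailer; the covered prefix is unchanged.
with_trailer = original + b"-- \r\nTo unsubscribe, mail the list owner.\r\n"
trailer_ok = verify("making mail traceable", with_trailer, digest, covered)
```

Note the trade-off: anything appended after the covered bytes is not protected by the hash, and a change inside the covered range (or a prepended banner) still breaks verification.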
I don't think you and I are in significant disagreement. I do think
we're probably talking about slightly different things.
Keith
But all of this can be used for replay attacks. Get an account at
big-email-provider-1 and send the spam mail to your address at
big-email-provider-2. Take the message and reinject it via a proxy
server by adding a fake Received: line and using a faked envelope sender
to make it look like a forward and a consistent chain of mailservers.
You can use the same message some thousand times with any of some
thousand open proxy servers and for any envelope recipient you like
(replay attack).
With Bcc mimic and multiple RCPT TOs in one stream it might be dangerous
or impossible to use some form of the envelope recipient list.
Maybe I am missing something, but I can't see how this helps making
the trace of the message more trustworthy than the Received lines only.
either you strip off the originator-id field after it arrives at
big-email-provider-2, or you don't.
if you don't strip it off, then the original originator-id is still
there, and it associates your spam with your account at
big-email-provider-1.
if you do strip it off, another originator-id field gets added by
big-email-provider-2, and your spam gets associated with that account.
either way, the spam is traceable to an account that is associated with
you. recipients of the message can complain to whichever ISP issued
the originator-id field, and that ISP will figure out pretty quickly
that you're a spammer, and blacklist you. when other recipients
inquire about that originator-id (or even a different originator-id
that maps to the same account), they'll find out that you're
blacklisted.
> Take the message and reinject it via a proxy
> server by adding a fake Received: line and using a faked envelope sender
> to make it look like a forward and a consistent chain of mailservers.
> You can use the same message some thousand times with any of some
> thousand open proxy servers and for any envelope recipient you like
> (replay attack).
if I understand what you're saying, it shouldn't matter. the number
of hops that the message takes, and the depth of branching in the tree,
should be irrelevant. what really matters is that you can't claim that
you didn't send the message.
the real trick is to prevent the other kind of attack - some miscreant
wants to discredit some vendor, so they take a single message that the
vendor sent legitimately and re-send it to a few million people. we
need to make sure that the message is traced to the miscreant, not the
vendor.
> With Bcc mimic and multiple RCPT TOs in one stream it might be dangerous
> or impossible to use some form of the envelope recipient list.
there are a number of problems with using the recipient list. that's
why I said "maybe".
Keith
Normally I can see your point of view, but not in this case. I find a
gaping hole in the logic of your response. Allowing someone to know what
City or State an email came from does NOTHING to give away a person's actual
location, only the location of the machine that sent it.
And if a person is worried about being stalked they should contact the
police. If they are afraid that email is going to be giving the location
away they should not be sending email.
Let's be realistic here. Using your logic the person should not use a phone
either because a stalker may have caller ID. Getting back to reality and the
problem at hand:
To the point: IMO there needs to be a way to determine where mail came from,
a physical location. When I first became active in the IETF I worked at a
college; since then I have worn many hats, and as the owner of an ISP I can see
the need for this type of information.
First of all this could be used to filter email, second, to determine
jurisdiction of law, third, to allow an ISP to decide what action it would /
should take against the SPAMMER or the ISP that is allowing the SPAM to
flood into his mail server or network.
The only thing that completely anonymous email does is cause people to send
volumes of SPAM and waste millions and millions of dollars in credit card
fraud. We receive thousands of dollars worth of orders from people using
stolen credit cards with anonymous email accounts that cause chargebacks all
the time.
Because of all this fraud, most of our customers have to make hard choices,
such as, ignore orders from Yahoo! and HotMail accounts or accept the
possibility that these orders will result in charge backs.
Credit card owners want to report the card lost and want to prosecute
the thief, but frankly views like yours are going to stifle the ability to
bring thieves to justice.
I am sure it is not your intention to help people commit credit card fraud
or send SPAM but realistically there needs to be some type of responsibility
in email assigned to email accounts and their ISPs and I think it is high
time that this is done.
I do not mean to offend you in any way but on this topic I think you are
totally wrong and hope you re-consider your thoughts on this subject. I can
tell you have a strong view on this subject, but I would like your help in a
solution and not just say this will be done "over my dead body". Statements
such as this are not constructive and your argument has no merit.
The physical location of the machine sending the email is an important thing
to know to fight SPAM and credit card fraud.
By not considering this, I do not think you are helping solve a serious
problem that I see existing RIGHT NOW and getting worse every day.
Creating the GL RR and using this in mail software will be a first step in
solving a serious problem.
With Regards,
Al Costanzo
----- Original Message -----
From: "Keith Moore" <mo...@cs.utk.edu>
To: "Al Costanzo" <a...@akc.com>
Cc: "Keith Moore" <mo...@cs.utk.edu>; <ietf...@imc.org>
Sent: Monday, January 19, 2004 12:49 PM
Subject: Re: making mail traceable
>This is where I think multipart/signed got it right and is a big win (he
>says with all humility :-). There are only two real requirements.
>
> the object to be signed must be canonicalized
>
> it must be universally representable
>
>
[...]
>Now, if you're looking to add a signature (or some other cryptographic
>trace value) to the outermost headers and have it apply over the entire
>message (including the outermost headers), then you've got a really hard
>problem.
>
>Actually, I don't think "really hard problem" begins to describe just
>how hard it would be. I can not imagine why anyone would want to do
>such a thing. It just isn't practical in today's Internet email
>environment.
>
>S/MIME and PGP/MIME inherit this to the extent they use
>multipart/signed.
>
>
One problem with encapsulation is that the signed information *is*
encapsulated. That's fine for an end-to-end application, but doesn't work
well for a message that needs to be validated and interpreted at multiple
points. Examples of the latter would be a Usenet control message, or a
Usenet article posted to a moderated group. In each case, some of the
message header fields and the body would be signed, while fields which are
expected to change in transit (trace fields) would not be signed. Avoiding
encapsulation permits backwards compatibility; existing software can handle
the message with the same lack of authentication that is currently the
case, while newer software can verify authenticity. With encapsulation,
everything has to be rewritten to look inside the wrapper.
I think Keith's outline of a solution is viable.
There are a number of other issues that would have to be addressed in the
Usenet examples:
1. Authorization; a mechanism for distributing and updating authorization
   information (who is authorized to approve a moderated article, or to
   issue control messages) is needed.
2. A standardized open mechanism for obtaining information necessary to
   complete authentication. Aside from cost issues, some people are leery
   of dealing with CAs, especially after last year's Verisign DNS
   shenanigans. Some people distribute public keys via finger service or
   via a web site, but that generally involves manual intervention to make
   the keys available to authentication software. If an email address
   serves as an identifier for the purpose of authorization, then perhaps
   an SMTP extension could be used to retrieve a public key corresponding
   to that email address -- but that's just a thought. Of course, given a
   distributed authorization database, public keys could be part of that.
   The SMTP extension idea might be useful where authentication is needed
   but authorization is not.
The same sort of solution might be useful in email. Assuming a viable means
of signing non-trace header fields plus message body, and availability of
an open automated mechanism for obtaining public keys, signed email which
is fully compatible with existing MTAs and MUAs could quickly become
popular. Without a multipart/signed wrapper, such a message could be read
by existing MUAs. Enhanced MUAs/MTAs could handle signed messages as
follows:
1. author composes message, MUA (or possibly submission MTA) computes a
   hash over non-trace header fields plus body and adds a header field
   consisting of that hash encrypted with the user's private key (and
   possibly transfer-encoded for robustness).
2. MTAs transfer message as usual, adding trace fields.
3. recipient's MUA checks for header field containing encrypted hash:
   a) no field: message is not signed (or hash field has been lost in
      transit)
   b) encrypted hash field is found: MUA queries DNS for MX records
      corresponding to sender envelope address, then opens connection to
      an SMTP server to retrieve public key, finally attempts
      authentication by decrypting hash and verifying that hash is valid
      i) if hash is valid, message is authentic and unchanged (with a
         high degree of probability)
      ii) if hash is not valid, message is forged or has been modified in
          transit
      iii) in case of DNS or SMTP problems, authenticity cannot be
           established.
Obviously such a scheme requires support in MUAs (possibly submission
MTAs), and SMTP servers. If submission MTAs are used for generation of the
encrypted hash, ISPs could implement the necessary mechanisms with the
exception of MUA authentication support (the ISP would generate a key pair
for each email address, using the private key for encryption and returning
the public one in response to the SMTP extension request; all of that
could be handled transparently to computer-illiterate end users).
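The hashing part of step 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a worked-out proposal: the field name `X-Message-Hash`, the whitespace canonicalization, and the use of SHA-256 are all hypothetical, and the private-key operation and the SMTP key-retrieval extension are deliberately left out. The point shown is only that trace fields added in transit do not alter the digest, while a change to any other field or to the body does.

```python
import hashlib
from typing import List, Tuple

# Trace fields as defined by RFC 2822; these are excluded from the hash.
TRACE_FIELDS = {"received", "return-path"}
# Hypothetical name for the field that would carry the encrypted hash.
HASH_FIELD = "x-message-hash"


def message_digest(headers: List[Tuple[str, str]], body: bytes) -> str:
    """Digest over non-trace header fields (in order) plus the body."""
    h = hashlib.sha256()  # hash choice is an assumption
    for name, value in headers:
        lname = name.strip().lower()
        if lname in TRACE_FIELDS or lname == HASH_FIELD:
            continue  # skip fields that change in transit, and the hash itself
        # Assumed canonical form: lowercase name, collapsed whitespace.
        canonical_value = " ".join(value.split())
        h.update(lname.encode("ascii") + b":" +
                 canonical_value.encode("utf-8") + b"\r\n")
    h.update(b"\r\n")  # header/body separator
    h.update(body)
    return h.hexdigest()


submitted = [
    ("From", "author@example.org"),
    ("Subject", "making mail traceable"),
]
body = b"Signed text.\r\n"
digest_at_submission = message_digest(submitted, body)

# In transit, MTAs prepend trace fields; the digest is unaffected.
delivered = [
    ("Return-Path", "<author@example.org>"),
    ("Received", "from mta1.example.org by mta2.example.net; "
                 "Mon, 19 Jan 2004 12:00:00 -0500"),
] + submitted
digest_at_delivery = message_digest(delivered, body)
```

The sketch keeps header fields in the order they appear; a real design would also have to decide what happens when an MTA reorders or refolds non-trace fields.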
>To the point IMO there needs to be a way to determine where mail came from,
>a physical location. When I first became active in the IETF I worked at a
>college, since then I have worn many hats, as the owner of an ISP, I can see
>the need for this type of information.
>
>First of all this could be used to filter email, second, to determine
>jurisdiction of law, third allow an ISP decide what action it would /
>should take against the SPAMMER or the ISP that is allowing the SPAM to
>flood into his mail server or network.
>
>
First I'll say that I agree with Keith's points. But I want to add a
different perspective.
While *you* may think that you can filter email based on geographic
location of the originator, assuming that you can accurately determine
that, I cannot. I correspond with people around the world, and I cannot
implement such a crude means of "filtering".
Second, jurisdiction may depend on where intermediate servers are located.
In any event, I don't hold much hope for solutions based on hordes of
shysters. Moreover, knowing the location of the originator might not help;
according to what I've heard the UselesS Congress' recent
SPAMMERS-CAN-SPAM bill overrides tougher state laws. And obviously US law
is ineffective w.r.t. spam originated outside of the USA.
Third, if an ISP's customer is spamming, that ISP's terms of service should
be sufficient to deal with the problem. If an ISP's networking peers are
sources of spam, cutting off peering arrangements (or threats to do so) are
a potential solution. If an ISP is operating an open relay, he should
purchase a clue.
From a practical point of view, it is unlikely that DNS can hold
information about the location of portable machines (such as the laptop on
which I'm composing this message, which has been in 5 states in the past
month). It also won't help for RFC 1918 IP addresses.
>[...]
>The physical location of the machine sending the email is an important thing
>to know to fight SPAM and fighting credit card fraud.
>
>
>
I disagree. The network topology (where the source connects to the
Internet) is important; geographical location is largely irrelevant.
>Creating the GL RR and using this in mail software will be a first step in
>solving a serious problem.
>
>
For the moment assuming that geographical location has some value; LOC RRs
already exist -- why reinvent the wheel?
Let me first speak about the re-invention of the wheel. IMHO when a wheel
does not turn it is because it is square or flat and needs to be redesigned.
Walk up to someone in IT and ask them the Long & Lat of their office and see
if they know it. Now ask them their office's postal address. I would say
100% of the people will know where they work but few if any will know their
long and lat position save the few who received a hand held GPS unit for the
holiday.
Because of this LOC is seldom used. So my first point is to get valid data
into the DNS that people can 'easily' get without a GPS unit.
Next, different laws apply to different parts of the country and the world.
I am not only speaking about spammers. I am talking about credit card fraud,
identity theft, etc. There are many issues and unfortunately the world is
not one happy IETF meeting.
There needs to be a way to know where the mail is physically coming from
and what laws apply so that in the case of credit card fraud, it can quickly
be turned over to the proper authorities.
Let's look at a non-spam, non-government application for a moment to pique
your interest in the concept.
We have a person that just placed a credit card order on our website. The
credit card information is all valid and the address is correct. All is
verified. This of course does not mean that the card is not stolen. An
additional step would be to know where the email originated from.
This is not a perfect solution but it helps greatly. I know you will retort
it won't work with laptops and it won't work with yada yada yada... but it
will work in many cases.
It will prevent someone from Pakistan (for example) saying they live in Ohio
USA and using a stolen credit card from Ohio. (I saw this example myself.)
Having the data available will help both electronic mail as well as e-commerce
applications.
I know it will not be perfect overnight but it will help greatly; if you
have a better solution let me know. But the thing that is needed to make
this work is NOT verifying an email address is valid but verifying a
location is valid and making the solution easy enough for the average joe
without a GPS to make it work.
If you have a better idea, PLEASE tell me. But the solution is knowing the
physical address IMO.
Al Costanzo
"Bruce Lilly" wrote:
----- Original Message -----
From: "Bruce Lilly" <bli...@verizon.net>
To: "Al Costanzo" <a...@akc.com>
Cc: <mo...@cs.utk.edu>; <ietf...@imc.org>
Sent: Monday, January 19, 2004 11:27 PM
Subject: Re: making mail traceable
>
>But maybe it should be " 4DIGIT / %x31-39 3*DIGIT / obs-year ", i.e.
>years with more than 4 digits must not have leading zeros.
Yes, I like that. There remains the issue of how big a field implementors
should provide to store such years, but if they do not anticipate their
implementations still being around when the Y10K bug looms, they have no
problem. If they do so anticipate, then they choose to be safe until the
Y100K bug looms, and so on.
--
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133 Web: http://www.cs.man.ac.uk/~chl
Email: c...@clerew.man.ac.uk Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5
If it won't work for the CEO sitting in his hotel room far away typing
on his laptop, the company won't adopt it.
That is to say, "work in many cases" is not sufficient. It must also not
break too many critical cases.
Arnt
Er, shouldn't that be:
4DIGIT / %x31-39 4*DIGIT / obs-year
to avoid syntactic ambiguity between the first and second options?
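The ambiguity is easy to see by translating the alternatives into regular expressions. The following Python sketch does that; the pattern and function names are made up for illustration, and obs-year is left out since it is not reproduced here.

```python
import re

# 4DIGIT
ALT_4DIGIT = re.compile(r"[0-9]{4}")
# %x31-39 3*DIGIT : a non-zero digit followed by three or more digits
ALT_LONG_3 = re.compile(r"[1-9][0-9]{3,}")
# %x31-39 4*DIGIT : a non-zero digit followed by four or more digits
ALT_LONG_4 = re.compile(r"[1-9][0-9]{4,}")


def matching_alternatives(year, long_alt):
    """Return the names of the alternatives that match the whole string."""
    names = []
    if ALT_4DIGIT.fullmatch(year):
        names.append("4DIGIT")
    if long_alt.fullmatch(year):
        names.append("long")
    return names


# With 3*DIGIT, an ordinary four-digit year matches both alternatives.
print(matching_alternatives("2004", ALT_LONG_3))   # ['4DIGIT', 'long']
# With 4*DIGIT, each year matches at most one alternative.
print(matching_alternatives("2004", ALT_LONG_4))   # ['4DIGIT']
print(matching_alternatives("12004", ALT_LONG_4))  # ['long']
print(matching_alternatives("012004", ALT_LONG_4)) # []
```

Both forms accept the same set of strings; the 4*DIGIT version just ensures that four-digit years are produced by the first alternative only.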
#g
------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact