
How can I send mail with the word "From" at the start of a line?


Kari E. Hurtta

Jun 21, 1995
[ Added comp.mail.mime as receiver. ]

sta...@haas.berkeley.edu (Richard Stanton) writes in comp.mail.sendmail:
»It seems that whenever I send mail which contains a line (anywhere in
»the body of the message) starting with the word "From", the word is
»converted to ">From". This is a bit of a pain if I'm trying to send
»papers etc.

»Is there a way to avoid this?

Send the mail encoded with base64 by using MIME (quoted-printable may
also be sufficient -- most implementations encode "From ").
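
[ Editor's sketch, not part of the original post: a minimal Python illustration of why either encoding sidesteps the problem. `protect_from_qp` and `encode_base64` are hypothetical helper names; the first handles only the "From " case and is not a full quoted-printable encoder. ]

```python
import base64

def protect_from_qp(body: str) -> str:
    """Quoted-printable style protection: write the leading 'F' of any
    line starting with 'From ' as '=46' (the hex code for 'F')."""
    out = []
    for line in body.split("\n"):
        if line.startswith("From "):
            line = "=46" + line[1:]
        out.append(line)
    return "\n".join(out)

def encode_base64(body: str) -> str:
    """Base64 output never contains a space, so no encoded line can
    begin with the five characters 'From '."""
    return base64.encodebytes(body.encode("utf-8")).decode("ascii")

body = "Dear editor,\nFrom the data we conclude...\n"
qp = protect_from_qp(body)
b64 = encode_base64(body)
print(qp)
print(b64)
```

A MIME-aware reader undoes either encoding before display, so the recipient sees the original line rather than ">From".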

[ Hint cc'ed to questioner. ]

Phillip Vandry

Jun 22, 1995
hur...@dionysos.fmi.fi (Kari E. Hurtta) writes:

>»Is there a way to avoid this?

>Send the mail encoded with base64 by using MIME (quoted-printable may
>also be sufficient -- most implementations encode "From ").

Aren't we supposed to use Content-Length: instead of "From " to detect
the start of messages?

-Phil

Keith Moore

Jun 22, 1995
> Aren't we supposed to use Content-Length: instead of "From " to detect
> the start of messages?

NO. Content-Length is NOT standard. It is brain-damage left over
from AT&T Mail (which is NOT compatible with Internet mail, though
they are similar), and leaked into the Internet via various SysV-based
products, though much of the damage is from Solaris. It doesn't work
in Internet mail transport because different hosts have different
representation of end-of-line.
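
[ Editor's sketch, not part of the original post: the same two-line body has a different byte count depending on the end-of-line representation, which is why a count computed on one host need not hold on another. `content_length` is a hypothetical helper. ]

```python
def content_length(body: str, eol: str) -> int:
    """Byte count of the body when every line ends with `eol`."""
    lines = body.split("\n")
    return len(eol.join(lines).encode("ascii"))

body = "first line\nsecond line\n"
unix_len = content_length(body, "\n")    # LF, as stored on a UNIX host
wire_len = content_length(body, "\r\n")  # CRLF, as sent over SMTP
print(unix_len, wire_len)
```

The two counts differ by exactly one byte per line, so a header written with the sender's count is wrong for any receiver that stores lines differently.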

You should NEVER send a message with a Content-Length header in
Internet mail -- at best it is redundant, and at worst it is
misleading enough to cause your message to get trashed.

As for the format of messages in your local mailbox, you can do
anything you like. Older UNIX boxes uniformly use "From " to separate
messages within a mailbox, which means that they have to somehow quote
any "From " that appears in the text of the message itself. Some (but
not all) newer UNIX boxes use Content-Length. If you're trying to
share mailboxes between systems of both types, you're screwed. One
popular approach to solving the problem is to continue to quote all
"From " lines, but adding a Content-Length header on delivery. This,
of course, has all the disadvantages of both schemes.

Content-Length is a clear case where the cure is worse than the disease.

Keith Moore

Mark Crispin

Jun 22, 1995, to Phillip Vandry
On 22 Jun 1995, Phillip Vandry wrote:
> Aren't we supposed to use Content-Length: instead of "From " to detect
> the start of messages?

If you believe what appears in Content-Length: headers, I have a special
on bridges right now. I've managed to knock out more than one MUA by
sending it evil Content-Length: values.
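
[ Editor's sketch, not part of the original post: one defensive way a reader might treat a Content-Length: value, assuming it accepts the count only when the claimed end of the body lands at end of file or directly before another "From " line. `trusted_body_end` is a hypothetical name, and this is an illustration rather than any particular MUA's logic. ]

```python
from typing import Optional

def trusted_body_end(mbox: bytes, body_start: int, claimed: int) -> Optional[int]:
    """Return the end offset of the body if the claimed length is plausible,
    else None so the caller can fall back to scanning for 'From ' lines."""
    if claimed < 0:
        return None
    end = body_start + claimed
    if end > len(mbox):
        return None  # count runs past the end of the mailbox
    rest = mbox[end:]
    # Plausible only if what follows is end of file or a new separator,
    # optionally preceded by the customary blank line.
    if rest in (b"", b"\n") or rest.startswith(b"From ") or rest.startswith(b"\nFrom "):
        return end
    return None

mbox = (b"From a@b Thu Jun 22 10:00:00 1995\nContent-Length: 6\n\nhello\n"
        b"\nFrom c@d Thu Jun 22 11:00:00 1995\n\nbye\n")
start = mbox.index(b"\n\n") + 2
print(trusted_body_end(mbox, start, 6))     # plausible
print(trusted_body_end(mbox, start, 9999))  # evil value, rejected
```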

A better solution is to make the test for "From " headers more rigid
(don't just consider every line that appears with "From " to be a start of
message mark, examine the syntax of the line carefully), and apply a
similar check to the agent that applies quoting. False positives, with a
rigid enough test, are very rare.
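
[ Editor's sketch, not part of the original post: a rigid test in the spirit described above, assuming the separator is "From ", an address without spaces, and a ctime()-style date. The pattern is an assumption for illustration; real mailers accept more variants. ]

```python
import re

DAYS = "Mon|Tue|Wed|Thu|Fri|Sat|Sun"
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# "From <address> <Day> <Mon> <dd> <hh:mm:ss> <yyyy>" and nothing else.
FROM_LINE = re.compile(
    rf"^From \S+ +(?:{DAYS}) (?:{MONTHS}) [ 0-3]\d \d\d:\d\d:\d\d \d{{4}}$"
)

def is_separator(line: str) -> bool:
    """True only for lines that match the rigid separator syntax."""
    return FROM_LINE.match(line) is not None

print(is_separator("From moore@cs.utk.edu Thu Jun 22 10:00:00 1995"))
print(is_separator("From the data we conclude that"))
```

The same function would be used by the delivery agent to decide which body lines need quoting. Note that this pattern rejects a separator carrying a time zone name, which is the false-negative risk a rigid test takes on.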

-- Mark --

DoD #0105, R90/6 pilot, FAX: (206) 685-4045 ICBM: N 47 39'35" W 122 18'39"
Science does not emerge from voting, party politics, or public debate.

Mark Crispin

Jun 22, 1995
On 22 Jun 1995, Keith Moore wrote:
> > While Keith's answer is essentially correct, it is filled with much
> > unnecessary vitriol.

It's all a matter of opinion.

Some of us who have had to fight battles with the Content-Length: header
might say that Keith's answer was quite restrained.

> "Calling something brain-damaged is really bad; it also implies that
> it is unusable, and that its failure to work is due to poor design
> rather than some accident." - The New Hacker's Dictionary

The definition includes other details: "obviously wrong; extremely poorly
designed".

> The Content-Length header isn't brain-damaged in and of itself. If
> you are designing a mailbox format to put multiple messages in one
> file, which doesn't have to be backward compatible with anything, and
> you have the same representation for the mailbox on each machine,
> using a byte count is a reasonable design choice.

Except that in Content-Length:, the byte count is buried somewhere in the
RFC-822 header, instead of being carried out of band (e.g. in the "From "
line). The same brain-damage is also in the Status: header.

Compare the mail.txt format, which conveys considerably richer status and
a byte count in a single, one-line out-of-band prologue. There is no need
to look at any part of the RFC-822 header at all.

Lesson for software designers: design your data representations so they
can be handled well even when they scale to huge proportions.

> The brain damage is in using the Content-Length header with Internet
> mail (especially when transmitting it on the wire), and in using it in
> mailbox formats that needed to be compatible with old user agents, and
> (via NFS) with other platforms.

And this has, indeed, been a major disaster.

> Perhaps I'm just being naive, but I'd rather attribute this to a lack
> of foresight caused by "brain damage", than to any kind of conscious
> decision or malicious intent.

I agree that it isn't a conscious decision or malicious intent; but I
think more than lack of foresight was involved. This is an artifact of
the late unlamented "Unix Wars" of the 1980s, and the whole "System V:
Consider It Standard" campaign. I doubt that compatibility with the
installed base of non-SVR4 systems was ever even considered; the issue
probably never came up.

In other words, it was arrogance; "there is only one Unix, and we define
it."

Keith Moore

Jun 22, 1995
> While Keith's answer is essentially correct, it is filled with much
> unnecessary vitriol. I agree that the use of content-length should never
> have leaked outside of the system, should have remained strictly as part of
> the mailbox format, and it is EXTREMELY unfortunate that it did so. Calling
> it brain damage is going a bit too far.

I should clarify this a bit.

"Brain damage" is of course a technical term:

"Calling something brain-damaged is really bad; it also implies that
it is unusable, and that its failure to work is due to poor design
rather than some accident." - The New Hacker's Dictionary

The Content-Length header isn't brain-damaged in and of itself. If
you are designing a mailbox format to put multiple messages in one
file, which doesn't have to be backward compatible with anything, and
you have the same representation for the mailbox on each machine,
using a byte count is a reasonable design choice. Other choices are
also reasonable, but as Tony says, there are drawbacks to whatever
choice you make.

The brain damage is in using the Content-Length header with Internet
mail (especially when transmitting it on the wire), and in using it in
mailbox formats that needed to be compatible with old user agents, and
(via NFS) with other platforms.

Perhaps I'm just being naive, but I'd rather attribute this to a lack
of foresight caused by "brain damage", than to any kind of conscious
decision or malicious intent.

Keith

Paul Eggert

Jun 22, 1995
Mark Crispin <m...@CAC.Washington.EDU> writes:

> A better solution is to make the test for "From " headers more rigid ...
> False positives, with a rigid enough test, are very rare.

Unfortunately this can turn into a portability problem.
Many weird "From " header formats are used,
so once you make the test rigid enough,
you start to run into the problem of false negatives.
This can complicate the test considerably.
For example, here is the GNU Emacs 19.29 string that represents
the regular expression for matching "From " lines:

"^From \\([^ \n]*\\(\\|\".*\"[^ \n]*\\)\\|<[^<>\n]+>\\) ?\\([^ \n]*\\) *\\([^ ]*\\) *\\([0-9]*\\) *\\([0-9:]*\\) *\\([A-Z]?[A-Z][A-Z][A-Z]\\( DST\\)?\\|[-+]?[0-9][0-9][0-9][0-9]\\|\\) * [0-9][0-9]\\([0-9]*\\) *\\([A-Z]?[A-Z][A-Z][A-Z]\\( DST\\)?\\|[-+]?[0-9][0-9][0-9][0-9]\\|\\) *\\(remote from .*\\)?\n"

The source code that computes and explains this regular expression is
over 40 lines long.

The Emacs maintainers understandably call those lines ``pinhead headers''.

John Gardiner Myers

Jun 22, 1995
Keith Moore <mo...@CS.UTK.EDU> writes:
> Content-Length is a clear case where the cure is worse than the disease.

Content-Length is a clear case where the cure was botched.

The problem with the mailbox format that uses Content-Length is not so
much that it is incompatible with the traditional unix mailbox format,
it is that it is both incompatible with and indistinguishable from the
traditional unix mailbox format. If the designers of the byte-counted
mailbox format had changed the format's magic number, then mail
clients would be able to detect and deal with the incompatibility.

Mark's comments about the in-band placement of the byte count are
right on the mark.

--
_.John G. Myers Internet: jg...@CMU.EDU
LoseNet: ...!seismo!ihnp4!wiscvm.wisc.edu!give!up

Michael B. Smith

Jun 23, 1995
In article <3sd17e$i...@due.unit.no>,
Arnt Gulbrandsen <agu...@nvg.unit.no> wrote:
>In article <Pine.NXT.3.92.95062...@Tomobiki-Cho.CAC.Washington.EDU>,
>Mark Crispin <m...@CAC.Washington.EDU> wrote:
>>A better solution is to make the test for "From " headers more rigid
>>(don't just consider every line that appears with "From " to be a start of
>>message mark, examine the syntax of the line carefully), and apply a
>>similar check to the agent that applies quoting. False positives, with a
>>rigid enough test, are very rare.
>
>The From line shouldn't be more rigid. Some versions of Ultrix mail
>do that, and it sucks. An undocumented restriction which surfaced
>when I replaced sendmail: No fun.

Pine already has an extremely rigid check on the format of the From
line.

Completely improperly, IMO.

Also IMO the date format on the From line is already broken, as it is
apparently ctime() format, instead of an RFC822 date or an RFC1123 date.

>But I might go for "From " on one line and a valid SMTP header on
>the next (roughly, word-colon).

Which is close to what my reader does (word-colon-whitespace).

Laurence Lundblade

Jun 23, 1995
I think history would shed some light here, though I'm not sure I've got my
facts straight. Please correct me!

I believe that the old Berkeley mail file format with the "From " line
actually predates RFC-822! It was certainly invented independently of
RFC-822. I believe the "From " line was *the entire header* for the mail
system it was designed for, a very simple little e-mail system for
Berkeley UNIX before there were any Internet e-mail standards at all. Back
then e-mail was moved mostly via UUCP.

When RFC-822 came along, things were set up so the mail systems worked
together, which involved storing the new 822 header in the same old format - a
natural and expedient thing to do. Today we kind of think of "From " lines as
being part of Internet e-mail, but they're not.

So, there's nothing wrong with the date format being ctime() and not 822
because the date is not part of the 822 header at all. It belongs to the
message store formatting. Technically 822 is an on the wire standard too.
It really only specifies the format of the message when it is being
transmitted, though it makes sense to use it other places.

What's amazing to me is that we're still using this format which has this
obvious flaw and is something like 15 years old. We really could use
something better and more efficient, especially considering the size of
mail boxes and demands we make of them now (e.g. threading, indexing...).
I know Mark has got the Tenex format, and the folks at CMU have one for
their IMAP server, and there are others.

Laurence Lundblade
L...@CSGrad.CS.VT.EDU (and a few other addresses)
http://oneworld.wa.com/laurence/home.html
Virginia Tech CS -- Blacksburg, Virginia, US -- 703-552-2537

Keith Moore

Jun 23, 1995
> >The From line shouldn't be more rigid. Some versions of Ultrix mail
> >do that, and it sucks. An undocumented restriction which surfaced
> >when I replaced sendmail: No fun.
>
> Pine already has an extremely rigid check on the format of the From
> line.
>
> Completely improperly, IMO.

This is a black art, not a science. If you really know about all
possible formats of the "From " line, it makes sense to do a rigid
check. If there are variants that you don't know about, a rigid check
won't work. But no naive strategy (either just checking "From " or
doing a very rigid check) is going to work well.

My personal preference would be to look for something beginning with
"From " but which includes things that look like an email address, a
month name, a time, and a year.
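
[ Editor's sketch, not part of the original post: a loose heuristic along the lines of this preference, assuming four independent checks (address-like token, month name, clock time, four-digit year) applied in any order. The patterns are illustrative assumptions; a bare local user name with no "@" or "!" would not pass the address check. ]

```python
import re

ADDRESS = re.compile(r"\S+[@!]\S+")  # user@host or a UUCP bang path
MONTH = re.compile(r"\b(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\b")
CLOCK = re.compile(r"\b\d{1,2}:\d\d(:\d\d)?\b")
YEAR = re.compile(r"\b(19|20)\d\d\b")

def looks_like_separator(line: str) -> bool:
    """'From ' plus something address-like, a month, a time, and a year."""
    if not line.startswith("From "):
        return False
    rest = line[5:]
    return all(p.search(rest) for p in (ADDRESS, MONTH, CLOCK, YEAR))

print(looks_like_separator("From moore@cs.utk.edu Fri Jun 23 19:41:00 1995"))
print(looks_like_separator("From May 1995 onward, traffic doubled."))
```

Because the pieces may appear in any order and with extra tokens between them, variants such as a time zone name or a trailing "remote from ..." still pass.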

> Also IMO the date format on the From line is already broken, as it is
> apparently ctime() format, instead of an RFC822 date or an RFC1123 date.

The "From " line isn't supposed to be in RFC822 format because it
isn't part of the message. It's part of the mailbox format, which in
the Internet world is left up to the local system. UNIX mail has used
ctime() format ever since V7 (at least). It predates RFC 822 by a few
years.

> >But I might go for "From " on one line and a valid SMTP header on
> >the next (roughly, word-colon).
>
> Which is close to what my reader does (word-colon-whitespace).

Note that while this is common, RFC 822 doesn't require that the colon
immediately follow the word, or that a space immediately follow the
colon.

From :(random stuff)mo...@cs.utk.edu

is a legal RFC 822 header.

--
Keith Moore http://www.cs.utk.edu/~moore/
Computer Science Dept. / Univ of Tenn / 107 Ayres Hall / Knoxville TN 37996

US Gov't at war with the Internet: Senator Exon rapes 1st Amendment.
Justice Department harasses PGP author Phil Zimmerman. Clinton denies
crypto export, pushes Clipper Chip. Fight now, while there's still a chance.

Arnt Gulbrandsen

Jun 23, 1995
>On 22 Jun 1995, Phillip Vandry wrote:
>> Aren't we supposed to use Content-Length: instead of "From " to detect
>> the start of messages?
>
>If you believe what appears in Content-Length: headers, I have a special
>on bridges right now. I've managed to knock out more than one MUA by
>sending it evil Content-Length: values.

Seconded.

>A better solution is to make the test for "From " headers more rigid
>(don't just consider every line that appears with "From " to be a start of
>message mark, examine the syntax of the line carefully), and apply a
>similar check to the agent that applies quoting. False positives, with a
>rigid enough test, are very rare.

The From line shouldn't be more rigid. Some versions of Ultrix mail
do that, and it sucks. An undocumented restriction which surfaced
when I replaced sendmail: No fun.

But I might go for "From " on one line and a valid SMTP header on
the next (roughly, word-colon).
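
[ Editor's sketch, not part of the original post: a splitter using the "From " plus header-on-next-line test, assuming a header field name is any run of printable ASCII other than space and colon, optionally followed by whitespace before the colon. `split_mbox` is a hypothetical name. ]

```python
import re

# Field name: printable ASCII except space and ':', then optional
# whitespace, then the colon.
HEADER = re.compile(r"^[!-9;-~]+[ \t]*:")

def split_mbox(lines):
    """Split mailbox lines into messages; a message starts at a 'From '
    line only when the following line looks like a header field."""
    messages, current = [], None
    for i, line in enumerate(lines):
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if line.startswith("From ") and HEADER.match(nxt):
            current = []
            messages.append(current)
        elif current is not None:
            current.append(line)
    return messages

lines = [
    "From a@b Thu Jun 22 10:00:00 1995", "Subject: one", "",
    "From the data we conclude", "more text",
    "From c@d Thu Jun 22 11:00:00 1995", "To: x", "", "bye",
]
msgs = split_mbox(lines)
print(len(msgs))
```

A body line starting with "From " that happens to be followed by a line like "Note: ..." would still be misread, so this reduces false positives rather than eliminating them.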

--Arnt

Keith Moore

Jun 23, 1995
This is my second followup to a message written by Tony earlier today...

> While Keith's answer is essentially correct, it is filled with much
> unnecessary vitriol.

After thinking about it for awhile, he's right. Actually he put it
rather well.

I'll admit to a fair amount of frustration as a result of all the
incompatibilities caused by the "UNIX wars". I expressed my earlier
responses to the Content-Length question in very strong terms --
hoping that doing so would discourage use of Content-Length in any new
mail-based applications, and minimize the amount of lossage that it
causes.

But frustration can hinder clear expression, and my venting does
nothing to ease the tensions between the casualties on both sides of
the UNIX wars.

So while I stand by the technical details of my earlier statements, I
want to publicly apologize for any ill feeling that I might have
caused by my means of expressing them.

Keith

Rahul Dhesi

Jun 23, 1995
In <950622204...@ig1.att.att.com> han...@pegasus.ATT.COM writes:

>< NO. Content-Length is NOT standard. It is brain-damage left over from
>< AT&T Mail...

>...Calling
>it brain damage is going a bit too far.

The Content-Length header is a sign of brain damage because:
Counting characters in text files is not very portable.

They should have counted lines.
--
Rahul Dhesi <dh...@rahul.net>
"please ignore Dhesi" -- Mark Crispin <m...@CAC.Washington.EDU>

Barton E. Schaefer

Jun 23, 1995
On Jun 23, 7:41pm, Keith Moore wrote:
} Subject: Re: How can I send mail with the word "From" at the start of a li

}
} The "From " line isn't supposed to be in RFC822 format because it
} isn't part of the message. It's part of the mailbox format, which in
} the Internet world is left up to the local system. UNIX mail has used
} ctime() format ever since V7 (at least). It predates RFC 822 by a few
} years.

The lore I'm familiar with says that the original "From " line format
was the same as the one defined in RFC976 section 2.4 (UUCP envelopes),
except that the "remote from ..." part is typically dropped when used
in mailboxes. This makes sense to me, since the earliest UNIX mail
predated TCP networks and SMTP. Have I been misled all these years?

--
Bart Schaefer Vice President, Technology, Z-Code Software
scha...@z-code.com Division of NCD Software Corporation
http://www.well.com/www/barts


Leslie Mikesell

Jun 23, 1995
In article <950622204...@ig1.att.att.com>,
<han...@pegasus.ATT.COM> wrote:

>< Content-Length is a clear case where the cure is worse than the disease.
>
>While Keith's answer is essentially correct, it is filled with much
>unnecessary vitriol.

Yes, the "disease" in this case is simply a refusal to agree on the
proper handling. Escaping "From " could have worked if it had
been done in a way that could have been reversed. Likewise,
Content-Length: could have worked with a few additional tweaks. But,
rather than making someone else's concept work everyone has preferred
to start over from scratch.

>I agree that the use of content-length should never
>have leaked outside of the system, should have remained strictly as part of
>the mailbox format, and it is EXTREMELY unfortunate that it did so. Calling
>it brain damage is going a bit too far.

It is SMTP that is brain damaged, and Content-Length isn't quite
enough to fix it even when the receiver regenerates it. Suppose
you have a message that does not end with a newline. SMTP
requires a message to end with a newline so the sender must append
one. Now, even if the rest of the message is sent unchanged, the
receiver's concept of length will be wrong since it has no indication
that the final newline was not part of the message. The problem shows
up when you try to use an AT&T PMX-mailer that uses nested Content-Type:,
Content-Length: headers for multipart attachments. The PC versions
at least get very upset when the initial Content-Length: is one
more than the length of the actual total of the parts.
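
[ Editor's sketch, not part of the original post: the off-by-one described above, assuming a sender that terminates every line with CRLF for transport and a receiver that converts CRLF back to LF. `to_wire` and `from_wire` are hypothetical helpers. ]

```python
def to_wire(body: bytes) -> bytes:
    """Transport form: every line, including the last, ends in CRLF."""
    lines = body.split(b"\n")
    if lines[-1] == b"":
        lines.pop()  # body already ended with a newline
    return b"".join(line + b"\r\n" for line in lines)

def from_wire(wire: bytes) -> bytes:
    """Receiver's local form: CRLF becomes LF."""
    return wire.replace(b"\r\n", b"\n")

original = b"part one\npart two"  # no trailing newline
received = from_wire(to_wire(original))
print(len(original), len(received))
```

The receiver cannot tell whether the final newline was in the original, so a length recomputed on arrival is one byte larger than the sender's.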

>When you start dealing with unencoded arbitrary-content (binary or
>otherwise) messages, you need SOME form of delineation for quickly scanning
>a mailbox, and using content-length is far superior to using From<sp> as the
>delimiter. Using separate files per message is another approach which some
>systems have successfully used. Using a database (message store, etc.) is
>another approach. Using out-of-band information is another approach. Other
>schemes exist. ALL have their drawbacks.

For some reason I've never seen anyone use the scheme that seems the
most reasonable to me. Let the transport drop 1 message per file
in a directory for each user, using links for multiple recipients.
Then have the user agent collect anything you want to keep for long
term storage into some compressed format like zip or zoo. This
eliminates the need for MTA and MUA to agree on delimiters and
generally avoids locking contention.
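
[ Editor's sketch, not part of the original post: one message per file with hard links for multiple recipients, assuming a spool directory with one subdirectory per user on a single file system. `deliver`, the directory layout, and the message id are all hypothetical. ]

```python
import os
import tempfile

def deliver(spool, recipients, msg_id, data):
    """Write the message once, then hard-link it into each further
    recipient's directory. Returns the per-recipient paths."""
    paths, first = [], None
    for user in recipients:
        box = os.path.join(spool, user)
        os.makedirs(box, exist_ok=True)
        path = os.path.join(box, msg_id)
        if first is None:
            tmp = path + ".tmp"
            with open(tmp, "wb") as f:
                f.write(data)
            os.rename(tmp, path)  # file appears under its final name complete
            first = path
        else:
            os.link(first, path)  # same inode, no second copy
        paths.append(path)
    return paths

spool = tempfile.mkdtemp()
paths = deliver(spool, ["alice", "bob"], "msg-0001",
                b"From the start, no quoting needed.\n")
print(paths)
```

Since the file boundary is the message boundary, a body line starting with "From " is stored untouched.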

Les Mikesell
l...@mcs.com

Keith Moore

Jun 24, 1995
> >< Content-Length is a clear case where the cure is worse than the disease.
> >
> >While Keith's answer is essentially correct, it is filled with much
> >unnecessary vitriol.
>
> Yes, the "disease" in this case is simply a refusal to agree on the
> proper handling. Escaping "From " could have worked if it had
> been done in a way that could have been reversed. Likewise,
> Content-Length: could have worked with a few additional tweaks. But,
> rather than making someone else's concept work everyone has preferred
> to start over from scratch.

Yes, one of the problems with starting over from scratch is that you
make a different set of mistakes.

> It is SMTP that is brain damaged, and Content-Length isn't quite
> enough to fix it even when the receiver regenerates it. Suppose
> you have a message that does not end with a newline. SMTP
> requires a message to end with a newline so the sender must append
> one.

You could call this a design defect of SMTP...if SMTP were ever
intended to carry anything but text files. But SMTP was specifically
designed to carry only text files, and to do so in an environment
where text files had a different storage representation on every
system...as they did in the late 70's ARPANET. The only brain damage
is in (a) trying to use unmodified SMTP for something other than text
messages, and (b) trying to use Content-Length to encode anything
transmitted over SMTP.

Of course, this would probably not have happened if AT&T mail had been
clearly distinguishable from RFC 822 (and vice versa).

> Now, even if the rest of the message is sent unchanged, the
> receiver's concept of length will be wrong since it has no
> indication that the final newline was not part of the message. The
> problem shows up when you try to use an AT&T PMX-mailer that uses
> nested Content-Type:, Content-Length: headers for multipart
> attachments.

In other words, the problem shows up when you try to shove something
through SMTP that isn't a valid message for SMTP. SMTP is
specifically limited to text messages consisting of lines of 1000
characters or less of ASCII characters, each line terminated by a CRLF
pair.
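
[ Editor's sketch, not part of the original post: a checker for the constraints just listed. It assumes the 1000-character limit counts the terminating CRLF, which is an assumption about how the limit is measured. `smtp_text_problems` is a hypothetical name. ]

```python
def smtp_text_problems(wire: bytes, max_line: int = 1000):
    """List the ways `wire` fails to be plain SMTP text: CRLF-terminated
    lines of ASCII, each at most `max_line` octets including the CRLF."""
    problems = []
    if not wire.endswith(b"\r\n"):
        problems.append("does not end with CRLF")
    for n, line in enumerate(wire.split(b"\r\n"), start=1):
        if len(line) + 2 > max_line:
            problems.append(f"line {n} too long")
        if any(octet > 127 for octet in line):
            problems.append(f"line {n} has non-ASCII octets")
        if b"\r" in line or b"\n" in line:
            problems.append(f"line {n} has a bare CR or LF")
    return problems

print(smtp_text_problems(b"hello\r\nworld\r\n"))
print(smtp_text_problems(b"caf\xe9\r\n"))
```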

> For some reason I've never seen anyone use the scheme that seems the
> most reasonable to me. Let the transport drop 1 message per file
> in a directory for each user, using links for multiple recipients.

CMU's Andrew Message System does something very similar to this.
Indeed, the mail transport mechanism is trivial.

Keith

Rahul Dhesi

Jun 24, 1995
For storing mailboxes in index format, the simplest approach is the
one used by MH: one message per file, and a mailbox is a directory.

For storing messages in a single file, here is a simple scheme that
will remove all ambiguity. It's a combination of old and new
strategies.

1. (old) 'From ...' (standard pattern) at beginning of line (BOL)
begins a message
2. (old) 'From ' at BOL, when part of the message body, is
converted to '>From ' before the message is added into
a mailbox.
3. (new) '>From ', '>>From ', '>>>From ', etc., at BOL, when part of the
message body, are escaped by prepending one '>' before the message
is added into a mailbox.
4. (new) The mail agent always strips out one '>' from any instance of
'>From ', '>>From ', '>>>From ', etc. at BOL, before showing the message
to the user or moving it from a mailbox into any non-mailbox location.

Ok, look at the advantages of this scheme.

1. A mail reader using the above scheme is 100% compatible with
existing mailbox formats.

2. Said mail reader will correctly show 'From ' at BOL in a message
body, by stripping out the superfluous '>' that is added by existing
mail delivery agents.

3. If said mail reader finds occurrences of '>>From ', '>>>From ',
etc., at BOL in a message body, it may unnecessarily strip out one '>'.
In practice this is unlikely to cause problems.

4. Mail readers and delivery agents can be incrementally revised to
use this scheme.
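
[ Editor's sketch, not part of the original post: the quoting rules (2 and 3) and the stripping rule (4) above as a pair of functions, assuming the body is handled as one string with LF line endings. `quote_body` and `unquote_body` are hypothetical names. ]

```python
import re

# Any run of zero or more '>' followed by 'From ' at beginning of line.
QUOTABLE = re.compile(r"^(>*From )", re.MULTILINE)
QUOTED = re.compile(r"^>(>*From )", re.MULTILINE)

def quote_body(body: str) -> str:
    """Delivery side: prepend one '>' to 'From ', '>From ', '>>From ', ..."""
    return QUOTABLE.sub(r">\1", body)

def unquote_body(body: str) -> str:
    """Reader side: strip exactly one '>' from '>From ', '>>From ', ..."""
    return QUOTED.sub(r"\1", body)

body = "From here\n>From there\n>>From far\nplain\n"
stored = quote_body(body)
print(stored)
print(unquote_body(stored) == body)
```

Because every level gains exactly one '>' on the way in and loses exactly one on the way out, the transformation is reversible, unlike quoting only the bare "From " case.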

John Gardiner Myers

Jun 24, 1995
han...@pegasus.ATT.COM writes:
> I sincerely think SOME scheme will be necessary in the future when we start
> seeing binary mime messages flowing around the network. Will YOUR mailbox be
> able to hold them without re-encoding the binary pieces?

Mine would, were it not that IMAP4 prohibits transmitting NUL octets.
Re-encoding at fetch time would be possible, though challenging.

In order to hold binary MIME messages, the mailbox format really needs
to store the message in canonical MIME format. That is, lines have to
be separated by CRLF, not LF. None of the existing popular mailbox
formats qualify.
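
[ Editor's sketch, not part of the original post: why an LF-based store cannot hold binary parts, assuming the store converts CRLF to LF on the way in and back to CRLF on the way out. `to_local` and `to_canonical` are hypothetical helpers. ]

```python
import re

def to_local(data: bytes) -> bytes:
    """LF-based mailbox storage: CRLF becomes LF."""
    return data.replace(b"\r\n", b"\n")

def to_canonical(data: bytes) -> bytes:
    """Canonical form: every LF not already preceded by CR becomes CRLF."""
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)

text = b"line one\r\nline two\r\n"
binary = b"\x00\n\x01"  # a bare LF octet that is data, not a line break

print(to_canonical(to_local(text)) == text)      # text survives
print(to_canonical(to_local(binary)) == binary)  # binary does not
```

The store cannot tell a bare LF that is data from one that is a local line ending, so only a format that keeps CRLF as written preserves binary content.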

I think we need standard mailbox protocols (like IMAP) more than we
need standard mailbox formats. If there is to be a standard mailbox
interchange format capable of handling binary MIME messages, we need
to start from scratch. I have some ideas for how to design
multipart/mailbox--IETF'ers can talk to me at Stockholm if they're
interested.

Phillip Vandry

Jun 24, 1995
l...@MCS.COM (Leslie Mikesell) writes:

>For some reason I've never seen anyone use the scheme that seems the
>most reasonable to me. Let the transport drop 1 message per file
>in a directory for each user, using links for multiple recipients.
>Then have the user agent collect anything you want to keep for long
>term storage into some compressed format like zip or zoo. This
>eliminates the need for MTA and MUA to agree on delimiters and
>generally avoids locking contention.

Sure, mh and derived systems do it more or less this way. The linking
idea is not necessarily good, as the messages would have to be write
protected for the recipient to avoid messing with other people's copies.
And it would have been a great system if we could in fact allow ourselves
to ditch everything we already have and start again.

-Phil

Keith Moore

Jun 24, 1995
> I think we need standard mailbox protocols (like IMAP) more than we
> need standard mailbox formats.

I agree. But standard mailbox formats are the next thing, because
you'll still have a lot of installations that want to support IMAP,
POP, *and* direct file access from lots of different user agents.
Having a standard mailbox format would mean that all of these
could interoperate with each other.

> If there is to be a standard mailbox
> interchange format capable of handling binary MIME messages, we need
> to start from scratch.

Agreed.

Keith

Rahul Dhesi

Jun 25, 1995
In <3sg2m5$c...@Mars.mcs.com> l...@MCS.COM (Leslie Mikesell) writes:

>Suppose
>you have a message that does not end with a newline.

Isn't this, by definition, impossible in a text file?

A text file that ends with
a line that does not end with a 'newline'
ends with
a line that does not end.

Therefore said text file does not end.

Phillip Vandry

Jun 25, 1995
mo...@cs.utk.edu (Keith Moore) writes:

>You could call this a design defect of SMTP...if SMTP were ever
>intended to carry anything but text files. But SMTP was specifically
>designed to carry only text files, and to do so in an environment
>where text files had a different storage representation on every
>system...as they did in the late 70's ARPANET. The only brain damage
>is in (a) trying to use unmodified SMTP for something other than text
>messages, and (b) trying to use Content-Length to encode anything
>transmitted over SMTP.

So maybe there should be an SMTP extension where, for example, only
two CR-LF pairs in a row are considered an actual line terminator,
so that all single pairs are soft breaks for transmission. An
implementation supporting this extension as well as 8BITMIME would
be suitable for the transmission of binary data, since this extension
allows one to get around the 1000 character line length limit.

-Phil

Phillip Vandry

Jun 25, 1995
John Gardiner Myers <jg...@CMU.EDU> writes:

>I think we need standard mailbox protocols (like IMAP) more than we
>need standard mailbox formats. If there is to be a standard mailbox
>interchange format capable of handling binary MIME messages, we need
>to start from scratch. I have some ideas for how to design
>multipart/mailbox--IETF'ers can talk to me at Stockholm if they're
>interested.

Really? As was just recently said in this newsgroup, starting over
from scratch means we make a new set of mistakes. It seems to me a
modified IMAP would do the trick.

-Phil

John Gardiner Myers

Jun 25, 1995
van...@CAM.ORG (Phillip Vandry) writes:
> So maybe there should be an SMTP extension where, for example, only
> two CR-LF pairs in a row are considered an actual line terminator,
> so that all single pairs are soft breaks for transmission.

There's also the problem of NUL characters, as well as CR and LF
characters which are not part of a CRLF line separation sequence.

There is an experimental BINARYMIME extension which is used with
a "CHUNKING" extension for sending the message over using octet
counting instead of dot stuffing.
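
[ Editor's sketch, not part of the original post: dot stuffing next to octet counting. The `BDAT <size> LAST` command shape is an assumption taken from the CHUNKING specification as later published; check the RFC before relying on it. `dot_stuff` and `bdat_chunk` are hypothetical helpers. ]

```python
def dot_stuff(wire: bytes) -> bytes:
    """Classic SMTP DATA framing: double a leading '.' on each line and
    finish with a line containing only '.'. Input must be CRLF lines."""
    lines = wire.split(b"\r\n")
    if lines and lines[-1] == b"":
        lines.pop()
    stuffed = [b"." + l if l.startswith(b".") else l for l in lines]
    return b"".join(l + b"\r\n" for l in stuffed) + b".\r\n"

def bdat_chunk(payload: bytes, last: bool = True) -> bytes:
    """Octet-counted framing: the receiver reads exactly len(payload)
    octets, so the payload needs no escaping and no line structure."""
    header = b"BDAT %d" % len(payload) + (b" LAST" if last else b"") + b"\r\n"
    return header + payload

print(dot_stuff(b"hi\r\n.hidden\r\n"))
print(bdat_chunk(b"\x00\n.\r\n"))
```

With a count sent out of band in the command, NUL octets and bare CR or LF pass through unchanged, which is the problem line-oriented framing cannot solve.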

Ned Freed

Jun 25, 1995
> > You could call this a design defect of SMTP...if SMTP were ever
> > intended to carry anything but text files. But SMTP was specifically
> > designed to carry only text files, and to do so in an environment
> > where text files had a different storage representation on every
> > system...as they did in the late 70's ARPANET. The only brain damage
> > is in (a) trying to use unmodified SMTP for something other than text
> > messages, and (b) trying to use Content-Length to encode anything
> > transmitted over SMTP.

> So maybe there should be an SMTP extension where, for example, only
> two CR-LF pairs in a row are considered an actual line terminator,
> so that all single pairs are soft breaks for transmission. An
> implementation supporting this extension as well as 8BITMIME would
> be suitable for the transmission of binary data, since this extension
> allows one to get around the 1000 character line length limit.

In fact an extension to carry binary data over SMTP was defined well over a
year ago and just received approval from the IESG as an experimental protocol.
It's somewhat different from what you propose -- approaches along the lines of
yours were considered but rejected for various reasons I don't want to get into
here.

See the Internet Draft draft-mailext-smtp-binary-05.txt for further details, or
else wait for the RFC to come out.

Ned

Leslie Mikesell

Jun 25, 1995
In article <3sje29$5...@hustle.rahul.net>, Rahul Dhesi <dh...@rahul.net> wrote:

>>Suppose
>>you have a message that does not end with a newline.
>
>Isn't this, by definition, impossible in a text file?

Yes, but many years ago people realized that it is useful to send
more than lines of text by email. The "Content-Length:" header
being discussed was an attempt at a general solution rather than
simply addressing how to allow lines starting with "From " in
a mailbox file. That is, it allows encapsulating arbitrary
content within a mailbox. It just doesn't mix well with SMTP.

Les Mikesell
l...@mcs.com

Leslie Mikesell

Jun 25, 1995
In article <1995062408...@wilma.cs.utk.edu>,
Keith Moore <mo...@cs.utk.edu> wrote:

>Yes, one of the problems with starting over from scratch is that you
>make a different set of mistakes.

But more to the point, no one ever puts the big picture together when
they start over to handle each new detail separately.

>You could call this a design defect of SMTP...if SMTP were ever
>intended to carry anything but text files. But SMTP was specifically
>designed to carry only text files, and to do so in an environment
>where text files had a different storage representation on every
>system...as they did in the late 70's ARPANET. The only brain damage
>is in (a) trying to use unmodified SMTP for something other than text
>messages, and (b) trying to use Content-Length to encode anything
>transmitted over SMTP.

Yes, so now we have modified UA's that try to work around the problems
of SMTP, but of course they have no way to know when sending whether
the recipient's UA is compatible. There are also modified versions
of SMTP, which *can* know whether the other end is compatible, but
the modifications don't quite ensure binary transparency, so another
round of modifications is guaranteed. And of course there is almost
no hope that IMAP will track the SMTP modifications. The most likely
result of all this is that in a few years we will all have UA's
encoding everything even though the transports no longer need it.
Worse, the encoding into a text format by the UA makes the data
nonportable when you attempt to move it by any means other than SMTP.

Les Mikesell
l...@mcs.com

Kari E. Hurtta

Jun 26, 1995, 3:00:00 AM
In article <01HS4LYPI...@SIGURD.INNOSOFT.COM> N...@sigurd.innosoft.com (Ned Freed) wrote:
<...>
» See the Internet Draft draft-mailext-smtp-binary-05.txt for further details, or
» else wait for the RFC to come out.

You mean: draft-ietf-mailext-smtp-binary-06.txt ?
----

--
- K E H / Elämä on monimutkaista
Kari....@Helsinki.FI

Keith Moore

Jun 26, 1995, 3:00:00 AM
> So maybe there should be an SMTP extension where, for example, only
> two CR-LF pairs in a row are considered an actual line terminator,
> so that all single pairs are soft breaks for transmission. An
> implementation supporting this extension as well as 8BITMIME would
> be suitable for the transmission of binary data, since this extension
> allows one to get around the 1000 character line length limit.

Someone's already written an SMTP extension that is binary transparent,
I believe the IESG just approved it as an Experimental Protocol.
(see:
ftp://ds.internic.net/internet-drafts/draft-ietf-mailext-smtp-binary-07.txt
)

BTW, binary transparent SMTP opens up a HUGE can of worms when you
try to interface it to any other binary transparent mail transport.
If you're trying to relay a MIME message from one environment to
the other, you have to translate some body parts to the EOL convention
of the destination environment (those that are text encoded with
content-transfer-encoding: binary) but not others. This is why
it's an experimental protocol -- the protocol itself is fine, but
the implications of interfacing it with other mail systems
are not yet well understood.

Anyway, the problem with the "From " line is not with SMTP; it's with
mailbox formats. Solving the SMTP problem won't fix the lack of
transparency in the mailbox.

Keith
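
To illustrate the point above that the "From " problem lives in the mailbox format rather than in SMTP, here is a minimal sketch of the classic quoting step. The function names `escape_from_lines` and `unescape_from_lines` are hypothetical, and the naive inverse is an assumption about how a simple reader might try to undo the quoting; it is not taken from any particular mailer.

```python
def escape_from_lines(body: str) -> str:
    """Quote body lines that begin with "From " the way classic mbox writers do."""
    out = []
    for line in body.split("\n"):
        if line.startswith("From "):
            line = ">" + line
        out.append(line)
    return "\n".join(out)


def unescape_from_lines(body: str) -> str:
    """Naive inverse (assumed): strip one ">" from lines that begin with ">From "."""
    out = []
    for line in body.split("\n"):
        if line.startswith(">From "):
            line = line[1:]
        out.append(line)
    return "\n".join(out)


# The second line already starts with ">From " in the sender's own text.
original = "From here on, things differ.\n>From the sender's own text.\n"
stored = escape_from_lines(original)
restored = unescape_from_lines(stored)

print(stored)
print(restored == original)
```

The stored copy cannot be mapped back reliably: a line the sender really wrote as ">From " is indistinguishable from one the mailbox writer quoted, which is the lack of transparency being described.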

Phillip Vandry

Jun 26, 1995, 3:00:00 AM
Keith Moore Wrote:
>
> Someone's already written an SMTP extension that is binary transparent,
> I believe the IESG just approved it as an Experimental Protocol.
> (see:
> ftp://ds.internic.net/internet-drafts/draft-ietf-mailext-smtp-binary-07.txt
> )

Interestingly, I came across that draft only about an hour after my post.

> Anyway, the problem with the "From " line is not with SMTP; it's with
> mailbox formats. Solving the SMTP problem won't fix the lack of
> transparency in the mailbox.

As I view it, the "From " problem is much less of a pain. If you want
it solved for yourself, just use an MUA that doesn't have it, and
arrange for your local mail to be delivered differently. Of course
solving the problem for yourself doesn't make it go away :-(

-Phil

Keith Moore

Jun 26, 1995, 3:00:00 AM

I'll put it another way: if there is to be a standard mailbox format
capable of handling binary MIME messages, it must be incompatible with
both of the commonly used UNIX mailbox formats. (Because one of them
isn't transparent, and you can't reliably tell one from the other.)

Keith

Keith Moore

Jun 26, 1995, 3:00:00 AM
> Yes, so now we have modified UA's that try to work around the problems
> of SMTP, but of course they have no way to know when sending whether
> the recipient's UA is compatible. There are also modified versions
> of SMTP, which *can* know whether the other end is compatible, but
> the modifications don't quite ensure binary transparency, so another
> round of modifications is guaranteed.

Yes, but SMTP "Classic"'s 7-bit-only design was an attempt to work
around the incompatibilities between the machines of the early 1970s
ARPAnet. What we see today as "stripping the 8th bit on transmit"
was intended as "everyone will use the same character set on-the-wire".

In that day there was no such thing as a binary file format which was
portable across machines, because the machines were too different
from one another and because most i/o devices were limited to text
anyway.

It's also worth pointing out that SMTP and 822 were very successful,
while some more capable and ambitious mail systems aren't used very
much. If SMTP is so bad, why didn't the more capable systems succeed?

Of course, you could say that they should have had the foresight
to realize that there would be tens of millions of folks on the
Internet in just 20 years time, that the Internet would reach
around the world, that high resolution color video displays,
audio devices, laser printers, etc. would be cheap and commonplace,
and that RFC 822 and SMTP would be relatively unchanged for something
like 20 years before they got substantially revised.

If you've got a crystal ball that can see that well into the
future, why aren't you rich yet?

Keith

Phillip Vandry

Jun 26, 1995, 3:00:00 AM
Keith Moore Wrote:
>
> I'll put it another way: if there is to be a standard mailbox format
> capable of handling binary MIME messages, it must be incompatible with
> both of the commonly used UNIX mailbox formats. (Because one of them
> isn't transparent, and you can't reliably tell one from the other.)

Oops, I think I misread your intentions. I thought you wanted to start
an IMAP-like protocol from scratch.

-Phil

Leslie Mikesell

Jun 26, 1995, 3:00:00 AM
In article <1995062604...@wilma.cs.utk.edu>,
Keith Moore <mo...@cs.utk.edu> wrote:

>Yes, but SMTP "Classic"'s 7-bit-only design was an attempt to work
>around the incompatibilities between the machines of the early 1970s
>ARPAnet. What we see today as "stripping the 8th bit on transmit"
>was intended as "everyone will use the same character set on-the-wire".

And we now know that design was inadequate for today's needs. But
even though we know the only correct way to handle multiple character
sets is to label them at the source and let the destination do its
best to map them into something useful, no one wants to re-think
text handling in general in that same context so that binary transparent
transports can work.

>In that day there was no such thing as a binary file format which was
>portable across machines, because the machines were too different
>from one another and because most i/o devices were limited to text
>anyway.

I understand the history, and agree that a text-only mechanism with
forced conversions is useful for dealing with unusual machines.
However, where the only difference is the end-of-line convention, giving
up the possibility of simple binary transparent mechanisms seems
like a pretty high price compared to just dealing with the difference
at the receiving side.

>It's also worth pointing out that SMTP and 822 were very successful,
>while some more capable and ambitious mail systems aren't used very
>much. If SMTP is so bad, why didn't the more capable systems succeed?

The success of SMTP has more to do with the existence of a centrally
managed name and address system and free software than its feature
set.

>Of course, you could say that they should have had the foresight
>to realize that there would be tens of millions of folks on the
>Internet in just 20 years time, that the Internet would reach
>around the world, that high resolution color video displays,
>audio devices, laser printers, etc. would be cheap and commonplace,
>and that RFC 822 and SMTP would be relatively unchanged for something
>like 20 years before they got substantially revised.

Not at all. I have no particular problem with SMTP/RFC822 as a
70's design to move text around or even with maintaining it as
a least-common-denominator fallback. My complaints are about the
changes that are supposed to bring it into the 90's. Even though
everyone should know that we need a general and efficient mechanism
to label and transport arbitrary content we have instead lots of
separate and diverse add-ons that don't solve the whole problem
and each break something else. The differences between the
requirements for a system backup/archive and a mailbox holding
arbitrary content are trivial compared to the similarities but no
one seems to consider using a mechanism that would accommodate both.
Likewise the needs are very similar for host<->host transport and
user-agent<->mailbox transport, especially as we move to one or
more machines per user instead of many users/machine, but there
is not a protocol to handle both. The worst part about this is
that each specific mechanism is inherently tied to tcp/ip in its
own specific way. That means that you have to solve the same problem
over again for machines that are not ip-connected, even if you
already have arranged convenient binary-transparent file exchange.
It's almost enough to make me hope that IPng doesn't solve the
address space problem anytime soon so that everyone will be forced
to deal with a mix of IP and alternative protocols and perhaps
realize that standards that only define on-the-wire formats are
not enough.

Les Mikesell
l...@mcs.com

Keith Moore

Jun 26, 1995, 3:00:00 AM
> But
> even though we know the only correct way to handle multiple character
> sets is to label them at the source and let the destination do its
> best to map them into something useful

Actually, we don't know anything of the sort. It happens that this
is about the best we can do at the present time. In a few years,
it's possible that a truly international character set will be
deployed on and used by most platforms, and people like you will
be saying that we were brain-damaged to have so many separate
character sets in MIME.

> no one wants to re-think
> text handling in general in that same context so that binary transparent
> transports can work.

Wrong. It's been thought about a great deal, but it's very difficult
to do without significant negative impact to the installed base, and
very little immediate benefit to drive it.

> >In that day there was no such thing as a binary file format which was
> >portable across machines, because the machines were too different
> >from one another and because most i/o devices were limited to text
> >anyway.
>
> I understand the history, and agree that a text-only mechanism with
> forced conversions is useful for dealing with unusual machines.

Except that such machines weren't unusual in that day.

If we want a mail transport, message format, or mailbox format
to be accepted, it needs to be targeted toward usual machines
today, and the kinds of support that they already have in place.

> However, where the only difference is the end-of-line convention, giving
> up the possibility of simple binary transparent mechanisms seems
> like a pretty high price compared to just dealing with the difference
> at the receiving side.

The price of fixing things as you suggest is a mandatory change
on every end-system, with little or no user-perceived benefit.
That's a pretty high price.

> >It's also worth pointing out that SMTP and 822 were very successful,
> >while some more capable and ambitious mail systems aren't used very
> >much. If SMTP is so bad, why didn't the more capable systems succeed?
>
> The success of SMTP has more to do with the existence of a centrally
> managed name and address system and free software than its feature
> set.

Except that SMTP doesn't use a centrally managed name and address system.
And the existence of usable free software for SMTP is not unrelated to
its simplicity.

> >Of course, you could say that they should have had the foresight
> >to realize that there would be tens of millions of folks on the
> >Internet in just 20 years time, that the Internet would reach
> >around the world, that high resolution color video displays,
> >audio devices, laser printers, etc. would be cheap and commonplace,
> >and that RFC 822 and SMTP would be relatively unchanged for something
> >like 20 years before they got substantially revised.
>
> Not at all. I have no particular problem with SMTP/RFC822 as a
> 70's design to move text around or even with maintaining it as
> a least-common-denominator fallback. My complaints are about the
> changes that are supposed to bring it into the 90's. Even though
> everyone should know that we need a general and efficient mechanism
> to label and transport arbitrary content we have instead lots of
> separate and diverse add-ons that don't solve the whole problem
> and each break something else.

MIME and SMTP extensions are both compromises, designed to give
as much new value as possible with minimum impact to the installed
base. We could have provided a little bit more value, but doing
so would have had a much greater impact, which in turn would have
made it much more difficult for MIME to be accepted.

> The differences between the
> requirements for a system backup/archive and a mailbox holding
> arbitrary content are trivial compared to the similarities but no
> one seems to consider using a mechanism that would accommodate both.

Maybe you just haven't looked at the real requirements for each.
Off the top of my head I can think of a lot of things that a
backup format should do that would be superfluous for mailboxes,
(like tape management and incremental backup/restore)
and a lot of things that I want a mailbox format to support
(like searching of content, efficient deleting, annotating) that
have nothing to do with a backup format.

> Likewise the needs are very similar for host<->host transport and
> user-agent<->mailbox transport, especially as we move to one or
> more machines per user instead of many users/machine, but there
> is not a protocol to handle both.

People do use SMTP for posting messages. It turns out that it's
not quite adequate. On the other hand, NNTP was *designed* to
use the same protocol for both, but it's moving toward separate
protocols for reading/posting and relaying, because combining
the two turns out to be sub-optimal.

> The worst part about this is
> that each specific mechanism is inherently tied to tcp/ip in its
> own specific way.

SMTP is not tied to tcp/ip. People can and do use it over X.25,
DECnet, and even bare modem connections. RFC 822 isn't tied to
tcp either.

Having a canonical end of line convention is not a constraint
imposed by TCP. It's a practical way to get dissimilar systems
to interoperate.

> That means that you have to solve the same problem
> over again for machines that are not ip-connected, even if you
> already have arranged convenient binary-transparent file exchange.
> It's almost enough to make me hope that IPng doesn't solve the
> address space problem anytime soon so that everyone will be forced
> to deal with a mix of IP and alternative protocols and perhaps
> realize that standards that only define on-the-wire formats are
> not enough.

On the other hand, standards that are grossly incompatible with
current practice are non-starters. If the purpose of standards
is to encourage interoperation, the ones that get implemented
do much better than those that sit on a shelf.

Keith

Kari E. Hurtta

Jun 28, 1995, 3:00:00 AM
van...@CAM.ORG (Phillip Vandry) writes:
»hur...@dionysos.fmi.fi (Kari E. Hurtta) writes:

»>»Is there a way to avoid this?

»>Send mail with encoded with base64 (also quoted-printable may be
»>sufficient -- most of implementations encodes "From ") by using MIME.

»Aren't we supposed to use Content-Length: instead of "From " to detect
»the start of messages?

Content-Length: is one of the mailbox formats. It has nothing to do with
formats on the wire. And because it is a mailbox format, you must be able
to control the receiver's environment to use that solution.

So it isn't a solution.

[ Comment cc'ed to Phillip Vandry ]
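
By contrast, the base64 suggestion quoted above does not depend on the receiver's mailbox format. A minimal sketch using Python's standard `base64` module follows; the sample body text is made up for illustration. Since the base64 alphabet contains no space character, no encoded line can begin with "From " (with its trailing space).

```python
import base64

# Made-up message body with two lines that start with "From ".
body = (
    b"Dear editor,\n"
    b"From the data we conclude the following.\n"
    b"From now on, see the appendix.\n"
)

encoded = base64.encodebytes(body)   # wrapped into short text lines
lines = encoded.splitlines()
decoded = base64.decodebytes(encoded)

print(any(line.startswith(b"From ") for line in lines))
print(decoded == body)
```

The sender controls the encoding, so nothing has to be arranged on the receiving side beyond a MIME-capable reader.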


Michael J Matthews

Jun 30, 1995, 3:00:00 AM
In article <1995062219...@wilma.cs.utk.edu>,
Keith Moore <mo...@CS.UTK.EDU> wrote:
>> Aren't we supposed to use Content-Length: instead of "From " to detect
>> the start of messages?
>
>NO. Content-Length is NOT standard. It is brain-damage left over
>from AT&T Mail (which is NOT compatible with Internet mail, though
>they are similar), and leaked into the Internet via various SysV-based
>products, though much of the damage is from Solaris. It doesn't work
>in Internet mail transport because different hosts have different
>representation of end-of-line.

I think the point of CL is to eliminate the need for mail transports to
worry about the concept of 'lines' and was put there to support the
transfer of non-ascii mail. If all mail transports adhered to it, it would
be a very robust solution. You might wonder why the mail transport has to
go in and fool with the message. Let the mail agent do that. I guess if
there is a proliferation of bizarre EOL's out there it would be necessary,
but doesn't it boil down to DOS/UNIX CRNL to NL? If you must filter, why
not CRNL to SPNL, which won't affect the byte count?

Using ATT mail I received mail with From starting a line, no problem.

I think email should embody a simple straightforward ascii header
encapsulating some kind of message, whatever that might be. I think that
is what ATT Mail AND MIME embody.
--
Michael J Matthews
mmat...@fast.net

Hugh McIntyre

Jun 30, 1995, 3:00:00 AM
In article <3svms0$1...@nn.fast.net>, m...@mmatthew.fast.net (Michael J Matthews) writes:
|> I think the point of CL is to eliminate the need for mail transports to
|> worry about the concept of 'lines' and was put there to support the
|> transfer of non-ascii mail. If all mail transports adhered to it, it would
|> be a very robust solution. You might wonder why the mail transport has to
|> go in and fool with the message. Let the mail agent do that. I guess if
|> there is a proliferation of bizarre EOL's out there it would be necessary,
|> but doesn't it boil down to DOS/UNIX CRNL to NL? If you must filter, why
|> not CRNL to SPNL, which won't affect the byte count?

Many files have the convention of backslash-newline (without intervening space)
meaning a line continuation. If you add a space you'll screw this up.

Hugh.

--
| Hugh McIntyre, | hu...@bristol.st.com
| SGS-Thomson Microelectronics Ltd, |or hugh.m...@bristol.st.com
| 1000 Aztec West, Bristol, BS12 4SQ, UK. |or hu...@inmos.co.uk
| Tel:+44(0)1454 611443, Fax:+44(0)1454 620688 |or hugh.m...@st.com
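
The objection above can be checked with a short sketch. The helper names `crlf_to_splf` and `has_continuation` and the sample makefile-style text are hypothetical, chosen only to show the effect: replacing CR LF with SP LF does keep the byte count, but a backslash that used to sit directly before the newline no longer does.

```python
def crlf_to_splf(data: bytes) -> bytes:
    """The proposed filter: replace each CR LF pair with SP LF (same length)."""
    return data.replace(b"\r\n", b" \n")


def crlf_to_lf(data: bytes) -> bytes:
    """The ordinary conversion to the UNIX convention (one byte shorter per line)."""
    return data.replace(b"\r\n", b"\n")


def has_continuation(data: bytes) -> bool:
    """True if some line ends with a backslash directly before the newline."""
    return any(line.endswith(b"\\") for line in data.split(b"\n"))


# Made-up sample: a backslash-newline continuation, stored with CR LF line ends.
source = b"CFLAGS = -O2 \\\r\n  -Wall\r\n"

filtered = crlf_to_splf(source)
native = crlf_to_lf(source)

print(len(filtered) == len(source))   # byte count preserved
print(has_continuation(native))       # continuation survives plain CRLF -> LF
print(has_continuation(filtered))     # continuation lost with the SP LF filter
```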

Keith Moore

Jun 30, 1995, 3:00:00 AM
> I think the point of CL is to eliminate the need for mail transports to
> worry about the concept of 'lines' and was put there to support the
> transfer of non-ascii mail. If all mail transports adhered to it, it would
> be a very robust solution.

I hate to keep beating on a dead horse, but there are some dangerous
misconceptions here.

(a) the vast majority of mail transports *don't* adhere to it, and
never have in the 20+ year history of email on the {ARPA,Inter}net.
Trying to change all of those MTAs is a complete non-starter.

(b) the message header is for use by USER AGENTS, not by mail transport.

(c) while content-length might be usable end-to-end in some environments,
the only practical use in Internet mail is in the mailbox.
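
A minimal sketch of why a byte count set by the sender does not survive transport, assuming a sender that counts a body stored with bare LF line endings and a transport that puts CR LF on the wire. The helper name `to_wire` and the sample body are hypothetical.

```python
def to_wire(body: bytes) -> bytes:
    """Convert line endings to the CR LF form used on the wire."""
    return body.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


# Made-up three-line body as stored on a UNIX sender.
body_unix = b"line one\nline two\nline three\n"
declared = len(body_unix)          # what a sender-side Content-Length would say

on_wire = to_wire(body_unix)       # one extra byte per line
truncated = on_wire[:declared]     # what a reader trusting the header would take

print(declared, len(on_wire))
print(truncated == on_wire)
```

A reader that trusts the declared count on the converted copy stops short of the real end of the message, and whatever follows is then misread as the start of the next one. Within a single mailbox, where the writer and reader agree on the representation, the count stays valid.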

> You might wonder why the mail transport has to go in and
> fool with the message. Let the mail agent do that. I guess if there is a
> proliferation of bizarre EOL's out there it would be necessary, but doesn't
> it boil down to DOS/UNIX CRNL to NL?

Actually, no, it doesn't. There's a lot more "bizarre" behavior out there
than you imagine.

> If you must filter, why not CRNL to SPNL, which won't affect the byte count?

Because doing so would change the content/meaning of many messages.

Keith Moore
