IMAP (RfC 3501) not adequate to handle non MIME mails?

Sascha Wilde

Nov 16, 2009, 10:10:05 AM
Hi *,

I just learned that the proposed IMAP standard forbids unencoded NUL
characters in mail bodies. At least this is how I understand:

4.3.1. 8-bit and Binary Strings

[...]
Although a BINARY body encoding is defined, unencoded binary strings
are not permitted. A "binary string" is any string with NUL
characters. Implementations MUST encode binary data into a textual
form, such as BASE64, before transmitting the data.
[...]

by contrast RfC822 (STD11) as well as all proposed successors (RfC2822
and 5322), _do_ allow any ASCII character (including NUL) except CRLF in
the mail body. For example in RfC2822:

2.3. Body

The body of a message is simply lines of US-ASCII characters. The
only two limitations on the body are as follows:

- CR and LF MUST only occur together as CRLF; they MUST NOT appear
independently in the body.

- Lines of characters in the body MUST be limited to 998 characters,
and SHOULD be limited to 78 characters, excluding the CRLF.

[...]

In conclusion, this seems to mean that, despite the goals stated in the
abstract section of RfC3501, IMAP is not suitable as a generic tool to
handle arbitrary RfC2822 conforming messages.

I would like to know if there are any flaws in my above analysis.

Actually I would really be happy if somebody could prove me wrong, as
otherwise I would have to consider IMAP broken by design, which
would be a pity. ;-)

cheers
sascha
--
Sascha Wilde
- no sig today... sorry!

Netsurfeur

Nov 16, 2009, 1:48:09 PM
to Sascha Wilde
Sascha Wilde wrote :

Hi,

From RFC 2822:

2.1. General Description

At the most basic level, a message is a series of characters. A
message that is conformant with this standard is comprised of
characters with values in the range 1 through 127 and interpreted as
US-ASCII characters [ASCII]. For brevity, this document sometimes
refers to this range of characters as simply "US-ASCII characters".

Note: This standard specifies that messages are made up of characters
in the US-ASCII range of 1 through 127. There are other documents,
specifically the MIME document series [RFC2045, RFC2046, RFC2047,
RFC2048, RFC2049], that extend this standard to allow for values
outside of that range. Discussion of those mechanisms is not within
the scope of this standard.
[...]

US-ASCII is defined as characters in the range 1 to 127; so obviously,
NUL (0) is out of range.
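
As a rough illustration of that range check, here is a minimal Python sketch. The helper name `find_out_of_range` is made up for this example and does not come from any RFC or library; it only reports bytes that fall outside 1 through 127.

```python
def find_out_of_range(message: bytes) -> list[tuple[int, int]]:
    """Return (offset, value) pairs for bytes outside the 1..127 range.

    Hypothetical helper: RFC 2822 section 2.1 describes the range,
    this function merely illustrates one way to check it.
    """
    return [(i, b) for i, b in enumerate(message) if not 1 <= b <= 127]


# Made-up sample message containing one NUL byte in the body.
sample = b"Subject: test\r\n\r\nbody with a NUL\x00 byte\r\n"
print(find_out_of_range(sample))
```

An empty list means every byte is inside the range; both NUL (0) and 8-bit values (128 and up) are reported.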

--
Netsurfeur

Mark Crispin

Nov 16, 2009, 4:25:03 PM
On Mon, 16 Nov 2009, Sascha Wilde posted:

> I just learned that the proposed IMAP standard forbids unencoded NUL
> characters in mail bodies.

Correct.

NULs in data can be transmitted using the uncommon BINARY extension (RFC
3516) but not otherwise.
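
Without that extension, the data has to be put into a textual form first. A minimal sketch of the BASE64 route, using only Python's standard `base64` module; the sample bytes are invented for illustration:

```python
import base64

# Made-up sample data with embedded NULs.
raw = b"binary\x00payload\x00with NULs"

encoded = base64.b64encode(raw)          # textual form, 7-bit clean
assert b"\x00" not in encoded            # no NUL survives the encoding
assert base64.b64decode(encoded) == raw  # the receiver recovers the data

print(encoded.decode("ascii"))
```

A real message would also need the MIME headers that declare the encoding and the line wrapping that MIME requires; this sketch covers only the byte transformation.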

> by contrast RfC822 (STD11) as well as all proposed successors (RfC2822
> and 5322), _do_ allow any ASCII character (including NUL) except CRLF in
> the mail body.

Read RFC 2822 and 5322 more carefully. Both explicitly define US-ASCII as
being the range from 0x01 - 0x7f. For example, in RFC 5322:

At the most basic level, a message is a series of characters. A
message that is conformant with this specification is composed of
characters with values in the range of 1 through 127 and interpreted
as US-ASCII [ANSI.X3-4.1986] characters. For brevity, this document
sometimes refers to this range of characters as simply "US-ASCII
characters".

Note: This document specifies that messages are made up of
characters in the US-ASCII range of 1 through 127. There are
other documents, specifically the MIME document series ([RFC2045],
[RFC2046], [RFC2047], [RFC2049], [RFC4288], [RFC4289]), that
extend this specification to allow for values outside of that
range. Discussion of those mechanisms is not within the scope of
this specification.

> In conclusion, this seems to mean that, despite the goals stated in the
> abstract section of RfC3501, IMAP is not suitable as a generic tool to
> handle arbitrary RfC2822 conforming messages.

As an RFC 2822/5322 compliant message does not contain NULs, your
conclusion is not valid.

> Actually I would really be happy if somebody could prove me wrong, as
> otherwise I would have to consider IMAP broken by design, which
> would be a pity. ;-)

I did indeed set out to abolish NULs in email by design, and eventually
succeeded (as can be noted in 2822 and 5322). If I had been able to, I
would have abolished the lamentable CTRL characters as well.

2822 and 5322 also prohibit the use of CR and LF except as part of a CRLF
sequence to indicate newline. Specifically, the historical use of CR for
overstriking text, and LF to move the cursor down vertically without
horizontal motion to the left edge, is abolished. IMAP does not enforce
that prohibition, but in general anything that you see in IMAP will also
respect it.

There is no use case today for unencoded NULs in strings that cannot be
better accommodated with some form of encoding.

There are abundant reasons not to allow NULs in strings. First and
foremost, no matter how we rant and rave that programmers should not use
it, many programming environments (including C) use the NUL-terminated
string as the basic string type. It is all too tempting for programmers
to use it. [For example, OpenSSL has a nasty security bug in which the
name of an X509 certificate is passed as a char* with no size count, which
causes a naive validator that extracts the CN from this string to get a
truncated string.]
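
The truncation effect can be sketched in Python with the standard `ctypes` module, which exposes C-style buffers. The host names are invented for illustration and this is not the OpenSSL code path, only the same general pattern: a NUL-terminated view of the data versus a length-counted one.

```python
import ctypes

# Made-up certificate name with an embedded NUL; not taken from a real case.
name = b"www.bank.example\x00.attacker.example"

buf = ctypes.create_string_buffer(name)  # C char array with a trailing NUL

as_c_string = buf.value             # NUL-terminated view: stops at the first NUL
with_length = buf.raw[:len(name)]   # length-counted view: keeps every byte

print(as_c_string)
print(with_length)
```

A validator that compares only the NUL-terminated view sees a different name from the one the certificate actually carries.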

A secondary issue is that NUL has no widely accepted representation in
visible text. It, like other CTRLs, is often represented as its name in a
box; but there is no reason to believe that the recipient of an email
message will see it in that form.

One old reason for allowing NUL (its use for the center-dot glyph) is
ancient history that people under the age of 50 never experienced, and
those of us older than that prefer to forget.

Transport imposes further issues that make email unsuitable for binary
without some form of encoding. An email message is a series of "lines",
which in turn is defined as 0-998 7-bit values followed by 0x0d 0x0a (CR
LF) as a newline marker. Different systems have different newline
markers; for example, UNIX uses 0x0a without 0x0d. Other system-dependent
issues can also interfere with the transport of unencoded binary in email.
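
A minimal sketch of those line rules in Python. The helper names `check_lines` and `unix_to_wire` are hypothetical, and the checks cover only what is mentioned in this thread: CR and LF occur only together as CRLF, and a line holds at most 998 characters.

```python
def check_lines(message: bytes, max_len: int = 998) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper: it tests only for bare CR or LF and for
    lines longer than max_len bytes (excluding the CRLF).
    """
    problems = []
    for number, line in enumerate(message.split(b"\r\n"), start=1):
        if b"\r" in line or b"\n" in line:
            problems.append(f"line {number}: bare CR or LF")
        if len(line) > max_len:
            problems.append(f"line {number}: {len(line)} bytes, limit is {max_len}")
    return problems


def unix_to_wire(text: bytes) -> bytes:
    """Convert UNIX LF newlines to CRLF, assuming the input contains no CR."""
    return text.replace(b"\n", b"\r\n")


print(check_lines(b"first\nsecond\r\n"))                # UNIX newline on the wire
print(check_lines(unix_to_wire(b"first\nsecond\n")))    # after conversion
```

Arbitrary binary data fails such checks almost immediately, since 0x0a and 0x0d bytes turn up in it by chance.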

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.

Sascha Wilde

Nov 17, 2009, 4:04:10 AM
Mark Crispin <m...@panda.com> wrote:
> Read RFC 2822 and 5322 more carefully. Both explicitly define
> US-ASCII as being the range from 0x01 - 0x7f. For example, in RFC
> 5322:
>
> At the most basic level, a message is a series of characters. A
> message that is conformant with this specification is composed of
> characters with values in the range of 1 through 127 and interpreted
> as US-ASCII [ANSI.X3-4.1986] characters. [...]

Many thanks for your elaborate and insightful answer, and for reading
the more current RfCs to me. I have to admit that I'm more familiar
with 822 and only skimmed briefly through 2822 and 5322 to verify my
interpretation -- as it turns out, much too briefly, sorry.

Thanks again for your reply and the bunch of valuable extra information
in it!

cheers
sascha
--
Sascha Wilde

Well, *my* brain likes to think it's vastly more powerful than any
finite Turing machine but it hasn't proven that to me...
-- Christopher Koppler in comp.lang.lisp

Grant Taylor

Nov 19, 2009, 8:58:53 PM
On 11/16/2009 9:10 AM, Sascha Wilde wrote:
> Actually I would really be happy if somebody could prove me wrong...

It is nice to see someone else that can get unhappy if they are right.

Grant. . . .
