Semantics of empty quoted strings (RFC 2822)

Stephan Bergmann

Jun 13, 2007, 4:35:10 AM
Hi all,

Should one be fuzzy about any differences between

From: m...@example.com
From: <m...@example.com>
From: "" <m...@example.com>

or do the three lines contain the same information, just formatted
differently? RFC 2822 "3.2.5. Quoted Strings" states that semantically
the quote characters do not belong to the quoted string (so the second
and third line would have the same semantics). However, the distinction
between second and third line could be used to disambiguate whether a
name-addr contains an optional display-name, even if the textual content
represented by the display-name happens to be the empty string.
(Admittedly a /very/ fine hair to split...)
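
To make the distinction concrete, here is a minimal Python sketch (an
illustration added to the thread, not part of the original post) that tells
the three shapes apart. The helper name classify_from and the placeholder
address user@example.com are assumptions; the regular expression only
covers the shapes discussed here and is not a full RFC 2822 parser.

```python
import re

# Hypothetical helper: recognises "phrase <addr>" with an optional phrase.
# Deliberately simplified; comments, groups and folding are not handled.
_ANGLE = re.compile(r'^\s*(?P<phrase>.*?)\s*<(?P<addr>[^<>]+)>\s*$')

def classify_from(value):
    """Return (shape, display_name, address) for a From: field body."""
    m = _ANGLE.match(value)
    if m is None:
        # No angle brackets at all: a bare addr-spec.
        return ("addr-spec", None, value.strip())
    phrase = m.group("phrase")
    if phrase == "":
        # angle-addr with the optional display-name absent.
        return ("no display-name", None, m.group("addr"))
    if phrase == '""':
        # display-name present, but its content is the empty string.
        return ("empty display-name", "", m.group("addr"))
    return ("display-name", phrase, m.group("addr"))

for line in ['user@example.com', '<user@example.com>', '"" <user@example.com>']:
    print(classify_from(line))
```

Using None for "absent" and "" for "present but empty" keeps the two cases
distinguishable for any caller that cares about the difference.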

-Stephan

Mark Crispin

Jun 13, 2007, 8:51:25 AM
On Wed, 13 Jun 2007, Stephan Bergmann wrote:
> From: m...@example.com
> From: <m...@example.com>
> From: "" <m...@example.com>

For the purposes of reply, all three are the same.

Semantically, each of these is different.

The first two are effectively the same thing but formatted differently.
The reason for the second syntax was to allow route-addr without a leading
display-name. Now that route-addr is fully deprecated by RFC 2822 ("It's
dead! The corpse stinks! Bury it!"), there is no reason to use the
second syntax at all. However, all software must handle it correctly if
it encounters it on input.

The third syntax is, distinctly, "display-name is empty string" as opposed
to "no display-name". However, IMHO, this is a difference that makes no
difference; most people treat these two cases as the same. If you want to
uphold the specification strictly, you'll treat these as separate cases,
but that raises the question: to what purpose? Your interface to the human
user will probably not make any distinction; and a lot of other software
won't do so either.
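
As one illustration of software that makes no distinction (an added sketch,
not from the original post): Python's standard email.utils.parseaddr
appears to return the same (name, address) pair for all three forms. The
address user@example.com is a placeholder assumed for the example.

```python
from email.utils import parseaddr

# The three From: field bodies discussed in this thread.
forms = ['user@example.com', '<user@example.com>', '"" <user@example.com>']

# parseaddr returns a (display name, address) tuple for each input.
results = [parseaddr(form) for form in forms]

for form, (name, addr) in zip(forms, results):
    print(f"{form!r:28} -> name={name!r}, addr={addr!r}")

# True when the parser collapsed all three forms to the same result.
print(len(set(results)) == 1)
```

A caller that needs to tell "no display-name" from "empty display-name"
would therefore have to inspect the raw header text itself.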

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.

Randolf Richardson

Aug 8, 2007, 10:44:29 AM
On Wed, 13 Jun 2007 01:35:10 -0700, Stephan Bergmann
<stephan....@sun.com> wrote:

> Hi all,
>
> Should one be fuzzy about any differences between
>
> From: m...@example.com
> From: <m...@example.com>
> From: "" <m...@example.com>
>
> or do the three lines contain the same information, just formatted

The first example is simply an "addr-spec" address. The other two use the
"angle-addr" form; only the third carries the optional display-name
(albeit an empty one).

> differently? RFC 2822 "3.2.5. Quoted Strings" states that semantically
> the quote characters do not belong to the quoted string (so the second
> and third line would have the same semantics). However, the distinction
> between second and third line could be used to disambiguate whether a
> name-addr contains an optional display-name, even if the textual content
> represented by the display-name happens to be the empty string.
> (Admittedly a /very/ fine hair to split...)

When parsing this sort of data, in my own code I preserve the quotation
marks and pass them along to the users, especially since the following is
also valid: a "phrase" (which is all that "display-name" is) may consist
of one or more "word" rules, each an "atom" or a "quoted-string", according
to section 3.2.6 of RFC 2822 (Miscellaneous tokens):

From: """" <m...@example.com>

As I understand the syntax, this is not a set of quotation marks being
quoted; rather, it is two pairs of quotation marks, hence the intended
quoted text should probably occur between the first two and/or last two
quotation marks (but the specifications don't detail this, although it's a
safe bet that nesting isn't supported since comments do support nesting).

In another example, this is also valid:

From: "abc"def"ghi" <m...@example.com>

Only "abc" and "ghi" are quoted-strings while "def" is merely an atom.
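
The two examples above can be checked with a small tokenizer. This is a
minimal Python sketch added for illustration, not the code described in the
post: split_phrase is a hypothetical name, and the grammar is simplified
(no comments, no folding white space, no obsolete syntax).

```python
import re

# atext characters from RFC 2822 section 3.2.4.
_ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
# A word is a quoted-string (DQUOTE ... DQUOTE, backslash escapes allowed)
# or an atom (one or more atext characters).
_WORD = re.compile(r'"(?:[^"\\]|\\.)*"|' + _ATEXT + r'+')

def split_phrase(phrase):
    """Split a display-name phrase into (kind, raw_text) words."""
    words = []
    pos = 0
    while pos < len(phrase):
        if phrase[pos] in " \t":
            pos += 1  # skip white space between words
            continue
        m = _WORD.match(phrase, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        token = m.group()
        kind = "quoted-string" if token.startswith('"') else "atom"
        words.append((kind, token))  # the quotation marks are preserved
        pos = m.end()
    return words

print(split_phrase('""""'))
print(split_phrase('"abc"def"ghi"'))
```

Under these assumptions the first input splits into two empty
quoted-strings, and the second into a quoted-string, an atom, and another
quoted-string.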

I hope that helps.

--
Randolf Richardson - kingpi...@lumbercartel.ca
The Lumber Cartel, local 42 (Canadian branch)
http://www.lumbercartel.ca/

D. Stussy

Sep 28, 2007, 12:26:44 AM
Mark Crispin wrote:
> The third syntax is, distinctly, "display-name is empty string" as opposed
> to "no display-name". However, IMHO, this is a difference that makes no
> difference; most people treat these two cases as the same. ...

Maybe for you, it's the same. However, so many spammers have failed
to set up their spamware properly that this construct has leaked out often
enough to be noticeable as a tell-tale sign of a spam message. My
system kills null-display-name messages on sight as spam - without
exception.
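
A check of this kind might look like the following minimal Python sketch.
It is an added illustration, not the poster's actual filter: the function
name has_empty_display_name and the sample messages are assumptions, and
it only reports whether the construct is present, leaving the decision
about what to do with the message to the caller.

```python
import re
from email import message_from_string

# Matches a From: field body that starts with an empty quoted-string
# immediately followed by an angle-addr, e.g.  "" <user@example.com>
_EMPTY_NAME = re.compile(r'^\s*""\s*<')

def has_empty_display_name(raw_message):
    """Return True if the message's From: field has a "" display-name."""
    msg = message_from_string(raw_message)
    value = str(msg.get("From", ""))  # raw field body; "" if header absent
    return bool(_EMPTY_NAME.match(value))

flagged = 'From: "" <user@example.com>\nSubject: hi\n\nbody\n'
plain = 'From: <user@example.com>\nSubject: hi\n\nbody\n'
print(has_empty_display_name(flagged), has_empty_display_name(plain))
```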

Mark Crispin

Sep 28, 2007, 11:37:39 AM

That works real well until you lose a message from a clueless family
member whose mailer was not properly set up.

Some SMS->email gateways also do this. So a false positive may be your
little sister sending you email from her cell phone.

What you're filtering against is a clueless sender, not a spammer.
There's an overlap between the two, but you'll have both false positives
and false negatives. As annoying as spam may be, false positives from a
spam filter are far more annoying.
