Is Accept-language an email header field?


Graham Klyne

Apr 6, 2004, 11:57:55 AM
to ietf...@imc.org, bli...@erols.com

Following its definition in RFC 3282, should the Accept-language header
field be considered to be an email header field, in the sense of being a
recognized extension of RFC 2822? (If so, what does it mean in the context
of an RFC 2822 mail message?)

#g


------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact

Keith Moore

Apr 6, 2004, 12:32:39 PM
to Graham Klyne, mo...@cs.utk.edu, ietf...@imc.org, bli...@erols.com

I don't think it makes sense to use the accept-language field, as it's
currently defined, with email. partially this is because there's no clear
indication as to whose preferences are being described, and partially because
a reply to a message (the most likely use of accept-language) might go to the
author, the reply-to field, some subset of to and cc recipients, etc.

more generally, I don't think it makes sense to try to add descriptive
information about an address in a header field by using other fields
that don't explicitly reference that address. (note that these addresses
are sometimes changed in transit while leaving the other fields intact)

Keith

> Following its definition in RFC 3282, should the Accept-language header
> field be considered to be an email header field, in the sense of being a
> recognized extension of RFC 2822? (If so, what does it mean in the context
> of an RFC 2822 mail message?)

--
Power corrupts; Powerpoint corrupts absolutely. - Vint Cerf

ned+ie...@mrochek.com

Apr 6, 2004, 1:39:26 PM
to Keith Moore, Graham Klyne, mo...@cs.utk.edu, ietf...@imc.org, bli...@erols.com

> I don't think it makes sense to use the accept-language field, as it's
> currently defined, with email. partially this is because there's no clear
> indication as to whose preferences are being described, and partially because
> a reply to a message (the most likely use of accept-language) might go to the
> author, the reply-to field, some subset of to and cc recipients, etc.

> more generally, I don't think it makes sense to try to add descriptive
> information about an address in a header field by using other fields
> that don't explicitly reference that address. (note that these addresses
> are sometimes changed in transit while leaving the other fields intact)

Let me start by pointing out that the accept-language field is currently in
fairly widespread use in email. A number of very popular clients generate these
headers and a fair number of automatic response agents honor them. Strong
customer demand led us to support them in the autoresponses our product
generates a few years ago, which means I have a fair amount of experience with
them and the problems they do and do not have.

As you might expect, I have observed a number of operational problems with
this header:

(1) Far and away the most common problem has been the absence of a single,
recommended field being defined for this purpose. As is so often the
case, this has led to a number of different fields being used, some
supported by some agents and others not. The fields I've observed
operationally are: X-Accept-language, Accept-Language and
Preferred-Language. I've observed X-Accept-Language to be the
most popular, but that's from a small and probably unrepresentative
sample.

(2) Lack of use of the field in all cases. Although several popular clients
generate the field, other popular clients do not. The result is that
sometimes you get nicely internationalized responses and other times
you don't. This violates the least astonishment principle, and leads
to frenzied attempts to guess the appropriate language to use from other
data, e.g. the domain of some address ends in .jp, use Japanese. These
other mechanisms don't work nearly as well, of course, and have been
known to cause more problems than they solve.

(3) There are a lot of languages out there, and internationalization
is expensive. Implementations have to decide what languages they want
to support - there's no way to support them all. Additionally, some
products are still caught up in the old localization mindset, where
you have to take extra steps to install "language support". This all
leads to sporadic support for properly internationalized responses,
which again violates the least astonishment principle.

(4) Support for language subtags is hard to get right. Cases exist
where it is better to fall back to English than to use the wrong
dialect.

(5) Charsets. Figuring out the right charset to use can be a problem since
support for utf-8 is nowhere near universal and there can be disparate
user communities that use the same language but written with different
charsets.
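
Points (1) and (4) above can be illustrated with a minimal Python sketch. It is not taken from any product mentioned in this thread. The lookup order in LANGUAGE_FIELDS, the function names, and the matching policy (exact tag, then the generic primary tag, otherwise the default) are all assumptions made for illustration. Weights are stripped and ignored here.

```python
from email import message_from_string

# Field names reported in the message above; the order is an assumption.
LANGUAGE_FIELDS = ("Accept-Language", "X-Accept-Language", "Preferred-Language")


def language_field(msg):
    """Return the value of the first language-preference field found, or None."""
    for name in LANGUAGE_FIELDS:
        value = msg.get(name)  # header lookup is case-insensitive
        if value is not None:
            return value
    return None


def choose_language(requested, supported, default="en"):
    """Pick a supported tag for the requested tags, else the default.

    A request such as "pt-br" matches a supported "pt-br" exactly or a
    generic supported "pt", but never a different dialect such as "pt-pt".
    """
    supported = {tag.lower() for tag in supported}
    for item in requested:
        tag = item.split(";")[0].strip().lower()  # drop any ";q=..." weight
        if tag in supported:
            return tag
        primary = tag.split("-")[0]
        if primary in supported:
            return primary
    return default


# Hypothetical message using the X- variant of the field name.
msg = message_from_string(
    "From: a@example.org\nX-Accept-Language: pt-BR, fr\n\nbody\n"
)
requested = (language_field(msg) or "").split(",")
reply_language = choose_language(requested, {"pt-pt", "fr", "en"})
```

With the hypothetical message shown, "pt-BR" is not matched against the supported "pt-pt", so the second preference "fr" is chosen; if "fr" were not supported either, the sketch would fall back to the default "en".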

Notable by its absence from my list is Keith's concern that the lack of a
binding of this information to a specific address or addresses in the header
will lead to the wrong language being used in some cases. Simply put, I have never
encountered a case where this has been a problem. The reality seems to be that
language choice is a fairly coarse thing, and that if the originator of the
message expresses a preference in the header of the message, it seems to work
at least as well as using some sort of implementation default (usually
English). Alternately, since I'm dealing with automatically generated responses
here, I suppose you could say I'm mostly using a binding to the MAIL FROM
address and finding that it works pretty well.

Another potential issue I haven't found to be a problem in practice is the
syntax of the field itself. The HTTP syntax for the field is rather complex and
allows for each value to have an attached weight. Weights make sense when there
are other factors to consider when deciding what document to return, but
they're just unnecessary complexity for email and I doubt that most agents that
look at these fields handle them properly. But this has been a nonissue in
practice - the only fields with weights in them I've ever seen have been ones
I've generated myself.
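
For readers unfamiliar with the weighted form, here is a minimal Python sketch of handling it. It assumes the common "tag;q=value" shape of the HTTP-style syntax and ignores comments and folding. The function name and the choice to drop entries with a malformed, zero, or out-of-range weight are assumptions of this sketch, not behaviour required by any specification.

```python
def parse_accept_language(value):
    """Return (tag, weight) pairs, highest weight first.

    Entries without a weight default to 1.0. Entries whose weight is
    malformed, zero, or outside (0, 1] are dropped in this sketch.
    """
    prefs = []
    for item in value.split(","):
        tag, _, params = item.partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        weight = 1.0
        valid = True
        for param in params.split(";"):
            name, _, raw = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    weight = float(raw.strip())
                except ValueError:
                    valid = False
        if valid and 0.0 < weight <= 1.0:
            prefs.append((tag, weight))
    # sorted() is stable, so equal weights keep their original order.
    return sorted(prefs, key=lambda pref: -pref[1])


preferences = parse_accept_language("en;q=0.5, fr, de;q=0.8")
```

For the sample value, the unweighted "fr" sorts first, followed by "de" and then "en".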

In summary, I think this is a case where we've let the best be the enemy of the
good in a fairly major way. Is there a potential problem with multiplicity of
addresses in the header and with there not being a way to attach language
preference information to each address? Sure, but in practice the biggest
problem with having a single header for this information has been the lack of
a single, standardized field, which we could have fixed easily had we been able
to get past the binding issue.

Ned

Graham Klyne

Apr 7, 2004, 10:28:08 AM
to Charles Lindsey, ietf...@imc.org

At 09:40 07/04/04 +0000, Charles Lindsey wrote:
>This raises a more general problem regarding headers that migrate from one
>medium to another.

I see no great difficulty here. Header fields can be listed under multiple
protocols so that these subtleties can be captured faithfully. The
registration document also notes that an entry may refer to multiple
specifications in cases like this:

[[
In some cases, the same field name may be specified differently (by
different documents) for use with different application protocols;
e.g. The Date: header field used with HTTP has a different syntax
than the Date: used with Internet mail. In other cases, a field name
may have a common specification across multiple protocols (ignoring
protocol-specific lexical and character set conventions); e.g. this
is generally the case for MIME header fields with names of the form
'Content-*'.

Thus, we need to accommodate application-specific fields, while
wishing to recognize and promote (where appropriate) commonality of
other fields across multiple applications. Common repositories are
used for all applications, and each registered header field specifies
the application protocol for which the corresponding definition
applies. A given field name may have multiple registry entries for
different protocols; in the Permanent Message Header Field registry,
a given header field name may be registered only once for any given
protocol. (In some cases, the registration may reference several
defining documents.)
]]
-- http://www.ietf.org/internet-drafts/draft-klyne-msghdr-registry-07.txt
section 2.2.1

In the case of Accept-Language, it is defined generically in RFC 3282, but
it wasn't clear to me that it is defined for use specifically with email,
hence my question here. It is defined for use with HTTP, and I had
originally anticipated that it would be included in the registry for use
with HTTP.

In summary, the registry gives us an opportunity to record the use of
common headers with each of the protocols to which they are applicable.
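
The one-entry-per-protocol rule quoted above can be pictured with a minimal Python sketch. The class and method names are hypothetical and this is not the registry's actual format; the Date entries use RFC 2822 (mail) and RFC 2616 (HTTP) purely as an illustration of the same name carrying different definitions.

```python
class HeaderFieldRegistry:
    """Toy model: one entry per (field name, protocol) pair."""

    def __init__(self):
        # (field name, protocol) -> list of defining documents
        self._entries = {}

    def register(self, name, protocol, documents):
        key = (name.lower(), protocol.lower())
        if key in self._entries:
            raise ValueError(f"{name} is already registered for {protocol}")
        # An entry may reference several defining documents.
        self._entries[key] = list(documents)

    def protocols_for(self, name):
        """Return the protocols for which a field name has an entry."""
        name = name.lower()
        return sorted(p for (n, p) in self._entries if n == name)


registry = HeaderFieldRegistry()
registry.register("Date", "mail", ["RFC 2822"])
registry.register("Date", "http", ["RFC 2616"])
date_protocols = registry.protocols_for("Date")
```

Registering Date a second time for mail would raise ValueError in this sketch, while the separate HTTP entry is accepted.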

As for the designated expert mechanism: you are correct. But that should
not be seen as displacing discussion on the mailing lists appropriate for
the protocol(s) concerned, which is why I raised the matter of
Accept-Language *for email* on this list.

For netnews-defined header fields that are also used in email, I think it
is appropriate for the registration *as an email header field* to be
discussed by the email community. Personally, in this case, I think it
appropriate that the mail use registration be contained in a separate
document for which review by the email community is sought, said document
referencing Usefor for the technical specification.

#g
--

PS: while talking about the registry in general, I have recently revived
some report generation software I adapted for generating
registration-template XML2RFC source code and browseable HTML pages from
header field descriptions in RDF/N3. The most recent version of the
software is not yet documented, but the Python source code is on the web
[1]. An earlier version of the software is described at [2] and the pages linked there. I have
recently implemented a program to compile a more friendly form of "report
definition" (the current definition for header field registry generation is
[3]) into RDF/N3 (e.g. see [4]) for interpretation by the report
generator; the Haskell [8] source code of this compiler is at
[5][6][7]. More information about RDF/N3 (aka Notation3) can be found at
[9]. All this software is work-in-progress.

[1] http://www.ninebynine.org/Software/PythonN3/
(The main module is N3GenReport.py)

[2] http://www.ninebynine.org/Software/Intro.html#RDFReportGenerator
http://www.ninebynine.org/RDFNotes/RDFForLittleLanguages.htm

[3] Source of header registry report definition:
http://www.ninebynine.org/Software/HdrRegistry/GenHeaderRegistry.rep
Directory with related data:
http://www.ninebynine.org/Software/HdrRegistry/

[4] Compiled to Notation3 (not for human consumption [dogfood only?]):
http://www.ninebynine.org/Software/HdrRegistry/GenHeaderRegistry.n3

[5] http://www.ninebynine.org/Software/CompileRDF/
(Main program is RepToRDF.hs)
[6] http://www.ninebynine.org/Software/HaskellRDF/
[7] http://www.ninebynine.org/Software/HaskellUtils/

[8] Info about Haskell:
http://www.haskell.org/

[9] More info about RDF/Notation3:
http://www.w3.org/DesignIssues/Notation3.html
http://www.w3.org/2000/10/swap/Primer.html
http://infomesh.net/2002/notation3/
See also:
http://www.w3.org/2000/10/swap/doc/cwm.html


At 09:40 07/04/04 +0000, Charles Lindsey wrote:

>In <5.1.0.14.2.200404...@127.0.0.1> Graham Klyne
><g...@ninebynine.org> writes:
>
> >Notwithstanding the operational problems you mention, this suggests to me
> >that Accept-language should be included in the initial (permanent) registry
> >of email message headers, possibly carrying a warning about the operational
> >issues you note?
>
>This raises a more general problem regarding headers that migrate from one
>medium to another.
>
>AIUI, Accept-Language is already an official HTTP header. Therefore it
>will appear in the registry already, which itself should act as a strong
>hint that it is the preferred header-name for a similar feature in other
>media.
>
>But if you want to go further and list it under those other media, then
>you are required to refer to a defining document, and in such a case the
>defining document will say nothing about those other media. So is that
>allowed? I think the answer has to be that you go through the mechanism
>defined in Graham's RFC-to-be which is to ask the IESG (via their
>"designated expert"), following discussion in the "designated email
>discussion list". Well, none of that machinery is set up yet, so I suppose
>discussing it on this list is the next best thing (BTW, could this list be
>designated for that purpose? I doubt the suggested some...@iana.org will
>receive many subscriptions.)
>
>A similar problem arises with the User-Agent header. Currently, this is
>defined for use in HTTP, but is quite widely used in email and news. In
>this case, the USEFOR WG has taken it on board and is defining it as an
>official News header, with a throwaway remark that "It is also intended
>that this header be suitable for use in Email".
>
>Now I have just been writing the registration templates for the USEFOR
>IANA Considerations section, so naturally the User-Agent header appears in
>it. But also, on the strength of that throwaway remark, I have tentatively
>included it for both netnews and email. Opinions welcomed on that.
>
>--
>Charles H. Lindsey ---------At Home, doing my own thing------------------------
>Tel: +44 161 436 6131 Fax: +44 161 436 6133 Web: http://www.cs.man.ac.uk/~chl
>Email: c...@clerew.man.ac.uk Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
>PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5

Jacob Palme

Apr 8, 2004, 5:20:54 AM
to IETF mailing list on MIME and e-mail

At 07.43 -0400 04-04-07, Keith Moore wrote:
>"Accept-language wasn't designed for email, but has been found to be useful
>as input into the generation of automatic replies. It should apply to the
>SMTP MAIL FROM or Return-Path address rather than some other address."

That sounds reasonable, because Accept-Language will mostly
be used to control automatically generated responses, and
those should not be sent to all the members of a mailing
list. For one-to-one messages, the MAIL FROM address and the
From mailbox are usually identical.
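
A minimal Python sketch of that binding follows, assuming the delivering system has recorded the envelope sender in a Return-Path field. The function name and the decision to send no automatic reply when the field is absent or empty ("<>") are assumptions of this sketch.

```python
from email import message_from_string
from email.utils import parseaddr


def autoreply_target(msg):
    """Return the address an automatic reply should go to, or None."""
    return_path = msg.get("Return-Path")
    if return_path is None:
        # No envelope sender recorded; this sketch declines to guess.
        return None
    _name, address = parseaddr(return_path)
    # An empty reverse-path "<>" yields no address, so no reply is sent.
    return address or None


# Hypothetical message: the From mailbox differs from the envelope sender.
msg = message_from_string(
    "Return-Path: <alice@example.org>\n"
    "From: Bob <bob@example.com>\n"
    "Accept-Language: sv\n"
    "\n"
    "body\n"
)
target = autoreply_target(msg)
```

Here the reply goes to the Return-Path address rather than the From mailbox, so any language preference in the message is applied to the envelope sender.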
--
Jacob Palme <jpa...@dsv.su.se> (Stockholm University and KTH)
for more info see URL: http://www.dsv.su.se/jpalme/

Charles Lindsey

Apr 8, 2004, 12:20:14 PM
to ietf...@imc.org

In <5.1.0.14.2.200404...@127.0.0.1> Graham Klyne <GK-l...@ninebynine.org> writes:

>This header field [[[Accept-language]]] is commonly used in email, but some
>problems have been noted, including but not limited to: determination of
>the email address to which it refers; use of different field names
>by some mail agents for the same purpose; lack of consistent recognition
>and use by receiving agents; cost and lack of effective
>internationalization of email responses; problems with interpretation of
>language subtags; problems determining what character set encoding should
>be used (UTF-8 is not universally supported).
>]]

I would remove the mention of different field names. The main purpose of
the Registry AISI is to discourage unnecessary proliferation of lots of
headers all doing the same thing. So whichever the registry endorses, that
effectively causes the others to be deprecated.

As to the problems in determining the address to which it refers (I
presume that particular issue does not arise in its HTTP usage), either it
is too severe a problem, in which case the proper procedure is for someone
to write a standards-track proposal, possibly introducing some additional
syntax; or else it is OK as it stands, in which case it should be
registered. If someone does produce such a draft, then of course it could be
registered in the provisional registry.

Graham Klyne

Apr 13, 2004, 1:07:27 PM
to Charles Lindsey, ietf...@imc.org

At 10:51 08/04/04 +0000, Charles Lindsey wrote:

>In <5.1.0.14.2.200404...@127.0.0.1> Graham Klyne
><GK-l...@ninebynine.org> writes:
>
> >This header field [[[Accept-language]]] is commonly used in email, but some
> >problems have been noted, including but not limited to: determination of
> >the email address to which it refers; use of different field names
> >by some mail agents for the same purpose; lack of consistent recognition
> >and use by receiving agents; cost and lack of effective
> >internationalization of email responses; problems with interpretation of
> >language subtags; problems determining what character set encoding should
> >be used (UTF-8 is not universally supported).
> >]]
>
>I would remove the mention of different field names.

Done.

>The main purpose of
>the Registry AISI is to discourage unnecessary proliferation of lots of
>headers all doing the same thing. So whichever the registry endorses, that
>effectively causes the others to be deprecated.


>As to the problems in determining the address to which it refers (I
>presume that particular issue does not arise in its HTTP usage), either it
>is too severe a problem, in which case the proper procedure is for someone
>to write a standards-track proposal, possibly introducing some additional
>syntax; or else it is OK as it stands, in which case it should be
>registered. If someone does produce such a draft, then of course it could be
>registered in the provisional registry.

HTTP usage is per-session, so the problem doesn't arise. But the suggested
text here is specifically with regard to email use.

#g
