Will case of non-ascii character be preserved by IDNA?

Jacob Palme

Jan 16, 2004, 1:29:41 PM
to IETF mailing list on MIME and e-mail

Will case of non-ascii character be preserved by IDNA?

My understanding of the current standards is as follows:

If I send a message with the following header:
To: Maria <MA...@DEMO.NET>
Then the mail will arrive to all recipients with this header
exactly as when I sent it as shown above, i.e. in upper case.

If I send a message with the following header:
To: Maria <ma...@demo.net>
Then the mail will arrive to the same recipients with this header
exactly as when I sent it as shown above, i.e. in lower case.

If I send a message with the following header:
To: Maria <maria@münchen.net>
Then the mail will arrive to all recipients with this header
exactly as when I sent it as shown above, i.e. in lower case.
"münchen" will be translated to IDNA format, and then back
to user-friendly format at receipt again.

If I send a message with the following header:
To: MARIA <MARIA@MÜNCHEN.NET>
Then the mail will arrive to recipients in the following
format:
To: MARIA <MARIA@MüNCHEN.NET>
i.e. The upper case Ü in the original message has through
translation to IDNA format, and back again, been converted
to a lower case ü.

Is this right, or have I not correctly understood the IDNA
standard? Will, in fact, upper case Ü come to the recipients
as upper case Ü, which of course would be a little neater,
although this is surely not a very important issue?
--
Jacob Palme <jpa...@dsv.su.se> (Stockholm University and KTH)
for more info see URL: http://www.dsv.su.se/jpalme/

Simon Josefsson

Jan 16, 2004, 10:01:50 PM
to ietf...@imc.org

Jacob Palme <jpa...@dsv.su.se> writes:

> Will case of non-ascii character be preserved by IDNA?

No. Characters (ASCII and non-ASCII) are case-folded into lowercase
by Nameprep, iff the string contains _any_ non-ASCII character.

> If I send a message with the following header:
> To: Maria <maria@münchen.net>
> Then the mail will arrive to all recipients with this header
> exactly as when I sent it as shown above, i.e. in lower case.
> "münchen" will be translated to IDNA format, and then back
> to user-friendly format at receipt again.

Headers cannot contain non-ASCII, so you must use IDNA before putting
the data in the header, but that's a technicality.

münchen.net => xn--mnchen-3ya.net

Thus, you will send:

To: Maria <ma...@xn--mnchen-3ya.net>

Which is translated, by the receiver, into:

To: Maria <maria@münchen.net>

> If I send a message with the following header:
> To: MARIA <MARIA@MÜNCHEN.NET>
> Then the mail will arrive to recipients in the following
> format:
> To: MARIA <MARIA@MüNCHEN.NET>
> i.e. The upper case Ü in the original message has through
> translation to IDNA format, and back again, been converted
> to a lower case ü.

Similar in this case; you invoke IDNA on the IDN and then put the
output in the header.

MÜNCHEN.NET => xn--mnchen-3ya.NET

Thus, you will send:

To: MARIA <MA...@xn--mnchen-3ya.NET>

Which is translated, by the receiver, into:

To: Maria <maria@münchen.NET>
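
The two domain conversions above can be reproduced with a short sketch. It assumes Python's built-in "idna" codec, which implements the IDNA 2003 ToASCII and ToUnicode operations; the helper names to_ascii and to_unicode are only illustrative and are not part of any specification.

```python
def to_ascii(domain: str) -> str:
    """Apply ToASCII to each label using Python's built-in IDNA 2003 codec."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Apply ToUnicode to each label of an all-ASCII domain name."""
    return domain.encode("ascii").decode("idna")


for name in ("münchen.net", "MÜNCHEN.NET"):
    ace = to_ascii(name)
    # The all-ASCII label (net / NET) passes through untouched; only the
    # label containing a non-ASCII character is case-folded by Nameprep.
    print(name, "=>", ace, "=>", to_unicode(ace))
```

Only the label that contains a non-ASCII character is folded, which is why the top-level label keeps its upper case in the second example.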

Regards,
Simon

Adam M. Costello

Jan 17, 2004, 12:45:26 AM
to IETF mailing list on MIME and e-mail

Jacob Palme <jpa...@dsv.su.se> wrote:

> Will case of non-ascii character be preserved by IDNA?

Simon gave a perfect explanation of what will happen in practice.

> Characters (ASCII and non-ASCII) are case-folded into lowercase by
> Nameprep, iff the string contains _any_ non-ASCII character.

That is indeed what will happen if the sender applies ToASCII, which is
almost certainly what all existing implementations of IDNA do.

In theory, a conformant IDNA implementation need not apply ToASCII,
but could instead apply some other operation that returns equivalent
results. The requirement in the IDNA spec is:

Whenever a domain name is put into an IDN-unaware domain name slot,
it MUST contain only ASCII characters. Given an internationalized
domain name (IDN), an equivalent domain name satisfying this
requirement can be obtained by applying the ToASCII operation to
each label...

In other words, ToASCII is sufficient, but is a little stricter than
necessary. For example, given the IDN München.Net, a sender that simply
uses ToASCII will convert it to xn--mnchen-3ya.Net, but it would also
be permissible to send XN--MNCHEN-3YA.Net or Xn--MnChEn-3Ya.NeT or
xn--Mnchen-3ya.Net. That last possibility is interesting because when
the receiver applies ToUnicode to it, the result will be München.Net.
Thus it would be trivial for senders to preserve the case of ASCII
letters without relying on any special cooperation from receivers.
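
As a rough sketch of how a sender might do this, the code below builds the ACE label from the Nameprep-folded form and then restores the original case of the ASCII letters in the literal portion. The function name to_ascii_keep_case is hypothetical; it relies on Python's "punycode" and "idna" codecs and on the nameprep helper in encodings.idna, and it falls back to the plain folded output whenever Nameprep changes the ASCII letters in any way other than case.

```python
from encodings.idna import nameprep


def to_ascii_keep_case(label: str) -> str:
    """Hypothetical ToASCII variant that keeps the case of ASCII letters."""
    if label.isascii():
        return label  # ToASCII leaves all-ASCII labels alone anyway
    folded = nameprep(label)
    ace = "xn--" + folded.encode("punycode").decode("ascii")
    # Punycode copies the ASCII characters, in order, to the front of its output.
    original_ascii = "".join(c for c in label if c.isascii())
    folded_ascii = "".join(c for c in folded if c.isascii())
    if original_ascii.lower() != folded_ascii:
        return ace  # Nameprep did more than case folding; do not guess
    n = len(original_ascii)
    return "xn--" + original_ascii + ace[4 + n:]


ace = to_ascii_keep_case("München")
print(ace)
# An unmodified receiver still decodes it, and sees the capital M.
print(ace.encode("ascii").decode("idna"))
```

The receiver needs no changes because the ASCII letters of the label are copied through Punycode decoding unchanged.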

Non-ASCII letters are another story. Although the underlying Punycode
encoding is capable of carrying case information, IDNA makes no use of
that capability (IDNA is complex enough without it). For a receiver
that uses ToUnicode to display IDNs (as all existing implementations
surely do), there is no way to make it output uppercase Ü.
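
A small check of that claim, again assuming Python's built-in IDNA 2003 codec and its nameprep helper: the upper-case and lower-case spellings produce the very same ACE label, so nothing is left for the receiver to recover.

```python
from encodings.idna import nameprep

upper = "MÜNCHEN".encode("idna")
lower = "münchen".encode("idna")

print(nameprep("MÜNCHEN"))   # the folded form that actually gets encoded
print(upper == lower)        # both spellings yield one and the same ACE label
print(upper.decode("idna"))  # what a ToUnicode-based receiver displays
```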

In theory, receivers need not use ToUnicode, but could instead use some
other operation that returns equivalent results. The requirement in the
IDNA spec is:

ACE labels obtained from domain name slots SHOULD be hidden from
users... Given an internationalized domain name, an equivalent
domain name containing no ACE labels can be obtained by applying the
ToUnicode operation to each label.

Therefore, it is conceivable that someday a pair of operations
newToASCII and newToUnicode could be implemented that return results
equivalent to those of ToASCII and ToUnicode, but which work together to
preserve the case of non-ASCII letters. For example, given the label
MÜNCHEN, newToASCII might return xN--MNCHEN-3yA, which a receiver using
newToUnicode would display as MÜNCHEN, and which a receiver using the
old ToUnicode would display as MüNCHEN. All of these are equivalent
(IDNs are case-insensitive). Working out the details of newToASCII and
newToUnicode would be non-trivial.
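
To make the idea a bit more concrete, here is a sketch of only the Punycode half of such a scheme: a decoder that honours the mixed-case annotation described in Appendix A of the Punycode specification (RFC 3492), where the case of the last digit of each delta suggests the case of the inserted character. The name decode_with_case is hypothetical, the input is the label without its ACE prefix, and none of the checks that a real newToUnicode would need are attempted.

```python
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def _adapt(delta, numpoints, first_time):
    """Bias adaptation as given in RFC 3492, section 6.1."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def _digit_value(ch):
    c = ch.lower()
    if "a" <= c <= "z":
        return ord(c) - ord("a")
    if "0" <= c <= "9":
        return ord(c) - ord("0") + 26
    raise ValueError("not a Punycode digit: %r" % ch)


def decode_with_case(encoded):
    """Decode Punycode, applying the case hint carried by each delta."""
    basic, _, extended = encoded.rpartition("-")
    out = list(basic)  # ASCII characters keep whatever case they arrived in
    n, i, bias, pos = INITIAL_N, 0, INITIAL_BIAS, 0
    while pos < len(extended):
        old_i, w, k = i, 1, BASE
        while True:
            ch = extended[pos]
            pos += 1
            digit = _digit_value(ch)
            i += digit * w
            t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
            if digit < t:
                break
            w *= BASE - t
            k += BASE
        bias = _adapt(i - old_i, len(out) + 1, old_i == 0)
        n += i // (len(out) + 1)
        i %= len(out) + 1
        c = chr(n)
        # ch is the last digit of this delta; its case is the annotation.
        # Note that upper() can expand some characters (e.g. ß), which a
        # real implementation would have to deal with.
        out.insert(i, c.upper() if ch.isupper() else c.lower())
        i += 1
    return "".join(out)


print(decode_with_case("MNCHEN-3yA"))  # annotation asks for upper case
print(decode_with_case("MNCHEN-3ya"))  # annotation asks for lower case
```

A decoder that ignores the annotation, as ToUnicode does, treats both inputs alike, which is why the two forms stay equivalent.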

But it's not clear to what extent case preservation is demanded, or
even expected. I've noticed that email addresses often get coerced to
all-caps, even the local part (in blatant disregard of the standards,
which say that local parts can be case sensitive).

AMC

Charles Lindsey

Jan 19, 2004, 7:16:40 AM
to ietf...@imc.org

In <iluk73r...@latte.josefsson.org> Simon Josefsson <j...@extundo.com> writes:

>Similar in this case; you invoke IDNA on the IDN and then put the
>output in the header.

>MÜNCHEN.NET => xn--mnchen-3ya.NET

>Thus, you will send:

>To: MARIA <MA...@xn--mnchen-3ya.NET>

>Which is translated, by the receiver, into:

>To: Maria <maria@münchen.NET>

That doesn't seem right. How did the local-part maria come to be in lower
case?

--
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133 Web: http://www.cs.man.ac.uk/~chl
Email: c...@clerew.man.ac.uk Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5

Adam M. Costello

Jan 19, 2004, 8:56:26 AM
to ietf...@imc.org

Charles Lindsey <c...@clerew.man.ac.uk> wrote:

> In <iluk73r...@latte.josefsson.org> Simon Josefsson
> <j...@extundo.com> writes:
>
> > Thus, you will send:
> >
> > To: MARIA <MA...@xn--mnchen-3ya.NET>
> >
> > Which is translated, by the receiver, into:
> >
> > To: Maria <maria@münchen.NET>
>
> That doesn't seem right. How did the local-part maria come to be in
> lower case?

Good catch. That must have been a typo. IDNA has nothing to say about
(and no effect on) local parts.
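
A minimal sketch of what that means for software handling addresses, assuming Python's built-in IDNA 2003 codec: split at the last "@", convert only the domain, and pass the local part through untouched. The function names are illustrative only.

```python
def address_to_ascii(address: str) -> str:
    """Apply ToASCII to the domain of an address; leave the local part alone."""
    local, at, domain = address.rpartition("@")
    if not at:
        raise ValueError("no '@' in address: %r" % address)
    return local + "@" + domain.encode("idna").decode("ascii")


def address_to_unicode(address: str) -> str:
    """Apply ToUnicode to the domain of an address; leave the local part alone."""
    local, at, domain = address.rpartition("@")
    if not at:
        raise ValueError("no '@' in address: %r" % address)
    return local + "@" + domain.encode("ascii").decode("idna")


wire = address_to_ascii("MARIA@MÜNCHEN.NET")
print(wire)
print(address_to_unicode(wire))
```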

AMC

Simon Josefsson

Jan 19, 2004, 10:15:52 AM
to ietf...@imc.org

"Charles Lindsey" <c...@clerew.man.ac.uk> writes:

> In <iluk73r...@latte.josefsson.org> Simon Josefsson <jas@extundo.com> writes:
>
>>Similar in this case; you invoke IDNA on the IDN and then put the
>>output in the header.
>
>>MÜNCHEN.NET => xn--mnchen-3ya.NET
>
>>Thus, you will send:
>
>>To: MARIA <MA...@xn--mnchen-3ya.NET>
>
>>Which is translated, by the receiver, into:
>
>>To: Maria <maria@münchen.NET>
>

> That doesn't seem right. How did the local-part maria come to be in lower
> case?

A cut and paste error by the author. Unfortunately, IDNA doesn't
protect against that.

Seriously, the local part and text name are not relevant in the
example. For completeness, though, the last header should have been:

To: MARIA <MARIA@münchen.NET>

I hope she is running a spam filter... (It is with mixed feelings I
have noticed that I get spam for non-ASCII domains.)

Regards,
Simon
