UTF8 slugs & domain names - Are we ready?


infograf768

Nov 17, 2009, 10:35:10 AM
to Joomla! CMS Development
See
http://search.conduit.com/ResultsExt.aspx?ctid=CT2336579&SearchSource=3&q=first+Arabic+language+internet+domain
or simply http://www.idnnews.com/?p=9550

Not only can slugs be UTF-8 using percent-encoding (see the 1.5 system
plugin I released recently), but since last week's ICANN decision,
new UTF-8 domain names are to be created soon. Egypt is taking the lead.

Is Joomla ready?
Not that I can see.

I tested with an External Link menu item, entering the kind of address
we will have to deal with quite soon:
http://نطاقي.مصر
which basically means
http://my domain.egypt

And I get the following slug
http://xn--mgb5a8anx.xn--wgbh1c/
instead of percent encoding
http://%D9%86%D8%B7%D8%A7%D9%82%D9%8A.%D9%85%D8%B5%D8%B1

That is obviously an approximate translation (I have no idea what
RTL domain names will look like), but the issue is real and I guess
we had better deal with it sooner rather than later:

1. By implementing a global configuration parameter letting the user choose
between transliteration and UTF-8 slugs
2. By allowing Joomla to use UTF-8 domain names

Any volunteers to help get this done, at least in 1.6?

Andrew Eddie

Nov 18, 2009, 3:27:22 AM
to joomla-...@googlegroups.com
If there is anyone that can help with this, please contact us. This
is a very important issue that we need solved and we really need some
fair-dinkum, committed developers to get some traction on this and
other transliteration issues.

Regards,
Andrew Eddie
http://www.theartofjoomla.com - the art of becoming a Joomla developer





Ole Ottosen (ot2sen)

Nov 18, 2009, 3:44:25 AM
to joomla-...@googlegroups.com
Agree.
We need to pay attention to the fact that ICANN has approved this after such a long preparation.

With Joomla 1.6 in alpha stage, we have a golden chance to be prepared for what is now reality.
It was on my radar (http://twitter.com/ot2sen/status/5315336849), and I would certainly repeat Andrew's call for help.

Anyone from the wider third-party development community with good insight is more than welcome to share their skills and thoughts on this important matter.
 
Shoot, please :)
 
Ole

infograf768

Nov 18, 2009, 5:57:18 AM
to Joomla! CMS Development
Looks like we have to separate the domain name issue from the
percent-encoded slug issue.
Contrary to what I wrote above (thanks Sam), we should expect
Punycode, not percent-encoding, for the domain names.
Looks like Joomla can handle that.
Example: http://tūdaliņ.lv/
gives
http://xn--tdali-d8a8w.lv/ (not nice to look at in FF, but an IE
setting lets you see the original UTF-8)
We have no experience yet with the last part of the name, i.e. the
".something".
It remains to test this all over J. The only way to do it seems to be to get
such a domain.
For the slugs, i.e. what comes after the domain in the URL, it's just a matter of
implementing the choice in global config.
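
A minimal sketch of that split, in Python for illustration only (Joomla itself is PHP): the host goes through Punycode and the slug through percent-encoding. The function name `to_ascii_url` and the `/blog/café` path are hypothetical, and Python's built-in `idna` codec follows the older IDNA 2003 rules, so treat it as an approximation of what a browser does.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url: str) -> str:
    """Return an ASCII-only form of url: Punycode host, percent-encoded path."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError("URL has no host part")
    # Each host label is converted with the stdlib "idna" codec (IDNA 2003).
    host = parts.hostname.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # The path (the slug) is UTF-8 percent-encoded; "%" stays safe so that
    # input which is already encoded is not encoded twice.
    path = quote(parts.path, safe="/%")
    # Simplification: user info is dropped; query and fragment pass through.
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))


print(to_ascii_url("http://tūdaliņ.lv/"))
print(to_ascii_url("http://tūdaliņ.lv/blog/café"))
```

The point of keeping the two steps separate is that a percent-encoded host name is not something DNS can resolve, while a Punycode slug would be unreadable for no benefit.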


ndee

Nov 20, 2009, 11:09:45 AM
to Joomla! CMS Development
Hi guys,

Maybe I can help you with that. Tell me which domain you want (which
special chars it should have) and I will register one for testing
purposes. I can also set up a test installation if you want. Or just
give me the IP of the web server it should point to.



infograf768

Nov 21, 2009, 1:24:24 PM
to Joomla! CMS Development
We have no access yet to UTF-8 TLDs, although the part of the domain
before the TLD can already be UTF-8.
i.e. one could get a "dûmanié.org", but not a "dûmanié.рф", where .рф is
the future TLD for Russia (already accepted by ICANN, not on the
market yet).

Testing 1.5.15 further, I found that Punycode is used
systematically for http URLs,
i.e.
http://www.domanié.рф,
whether in an external link or in an article, is correctly transformed
to
http://www.xn--domani-gva.xn--p1ai/
when displayed, which IMHO is really good, for JURI as well as
external links.

What fails miserably are emails, i.e.
infograf@domanié.рф in an article
will give something like
infograf@domanié.рф
in an e-mail client, and I guess there are similar discrepancies throughout J!.
(Evidently email cloaking is totally killed.)

(Percent-encoding for the aliases is not at issue here and is easy to
implement.)

ndee

Nov 22, 2009, 6:05:43 PM
to Joomla! CMS Development
aight, got it.

com_weblinks (J! 1.5.15) seems to be broken too, in case you did not know
already. If I add a link to http://höhö.at, I get redirected to
h%c3%b6h%c3%b6.at instead of the correct Punycode, xn--hh-fkab.at
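
To make the difference concrete, here is a small Python sketch (illustrative only, not Joomla code) that takes a host name which was percent-encoded by mistake and produces the Punycode form instead. The helper name `host_to_punycode` is hypothetical, and the stdlib `idna` codec implements IDNA 2003.

```python
from urllib.parse import unquote


def host_to_punycode(host: str) -> str:
    """Undo mistaken percent-encoding of a host and return its Punycode form."""
    # unquote() turns "%c3%b6" back into "ö" (UTF-8 is its default encoding).
    decoded = unquote(host)
    # The stdlib "idna" codec then converts each non-ASCII label.
    return decoded.encode("idna").decode("ascii")


print(host_to_punycode("h%c3%b6h%c3%b6.at"))
print(host_to_punycode("höhö.at"))
```

Both calls give the same result, which suggests the fix is to run host names through an IDNA conversion and keep percent-encoding for the path only.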



infograf768

Nov 23, 2009, 4:25:32 AM
to Joomla! CMS Development
Yes, we have to look into this weblinks issue too.
I guess we would have to prevent percent-encoding for these URLs.

Concerning emails:

1. Internal Joomla mail functionality:
I hacked
function isEmailAddress($email)
in
libraries/joomla/mail/helper.php
and I can now get UTF-8 emails saved in the db (the hack needs a review to
be sure no forbidden characters are accepted).
What remains to do is to punyencode these before they get to
PHPMailer, as it chokes on them.
It would be good to find all places in the code where this has to be done.
For the moment we could use a SimplePie class we have in core to
punyencode:
libraries/simplepie/idn/idna_convert.class.php

2. Mail addresses in content:
That looks more complex. Some choices have to be made.
Do we let the UTF-8 mail address display and filter to always change the
href to Punycode, or systematically punyencode everything before saving in the db?
infôgraf@domanié.рф
gives
xn--inf...@xn--domani-gva.xn--p1ai

We could force it to
<a href="mailto:xn--inf...@xn--domani-gva.xn--p1ai">infograf@domanié.рф</a>
or
<a href="mailto:xn--inf...@xn--domani-gva.xn--p1ai">xn--infgraf-v...@xn--domani-gva.xn--p1ai</a>

Here is an online Punycode translator:
http://idn2.com/IDNTools/Punycode/Default.aspx
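
The two display choices for addresses in content can be sketched as follows. This is Python for illustration, not the PHP that would live in Joomla; `mailto_link` and its `show_unicode` flag are hypothetical names. The local part is passed through unchanged, which is an assumption: how it should be encoded is still an open question here.

```python
from html import escape


def mailto_link(address: str, show_unicode: bool = True) -> str:
    """Build a mailto anchor whose href always carries a Punycode domain."""
    local, _, domain = address.rpartition("@")
    if not local or not domain:
        raise ValueError("expected an address of the form local@domain")
    # Only the domain is converted (stdlib "idna" codec, IDNA 2003).
    ascii_domain = domain.encode("idna").decode("ascii")
    href = f"mailto:{local}@{ascii_domain}"
    # Either keep the readable UTF-8 address as link text, or show Punycode.
    text = address if show_unicode else f"{local}@{ascii_domain}"
    return f'<a href="{escape(href)}">{escape(text)}</a>'


print(mailto_link("infograf@domanié.рф"))
print(mailto_link("infograf@domanié.рф", show_unicode=False))
```

Converting at output time, as here, keeps the readable address in the db; punyencoding before saving would make stored content harder to edit later.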


Sam Moffatt

Nov 23, 2009, 4:48:51 AM
to joomla-...@googlegroups.com
Are you sure that the username part of the email address needs to be Punycode
encoded? I would have thought that would be different again, with the
destination mail server working out where to route something. I
suppose it could use Punycode, but they could just as easily speak
straight UTF-8.

http://tools.ietf.org/html/rfc4952 and
http://tools.ietf.org/html/rfc5335 seem to imply that UTF-8 is going
to be used, or at least encouraged, for local parts. There doesn't seem to be a
clearly defined standard on these yet, especially given the Punycode
decision. Is there any MTA that supports this stuff yet anyway?

Sam Moffatt
http://pasamio.id.au

infograf768

Nov 23, 2009, 5:32:31 AM
to Joomla! CMS Development
I am not sure of anything, Sam ;) That was the reason behind this
topic.
I guess that whatever is decided, we may punycode only the domain part,
separating the username from it, so as to have
infôgraf@xn--domani-gva.xn--p1ai for example.

As for real-world implementation by mail servers, we may have
some time ahead of us; it is maybe not as urgent as the URL matter.
Do we have to code these right now? Not sure. Just trying to forecast
issues and prepare solutions.

-> By the way, I confirm the com_weblinks issue with UTF-8 domains on Firefox. On
Opera it works OK. Weird...
Test made with
http://tūdaliņ.lv/


ndee

Nov 27, 2009, 4:44:00 PM
to Joomla! CMS Development
Hi guys,

IMHO IDN support for domain names is more important, as any browser
newer than IE6 supports them.

IDN emails (EAI, IMA) are another story; it seems that this is at a
very early stage. See: http://www.idnnews.com/?tag=idn-e-mail
Anyway, I registered an IDN domain and will test it with Gmail, GMX, and
my own Postfix servers. But if the major ISPs and email providers do not
support IDN, I do not think it should be addressed now?

Greets,
ndee


infograf768

Nov 28, 2009, 1:42:45 AM
to Joomla! CMS Development
Thanks for the research.
This document answers Sam's question:
"In the future, Internet users can use their native languages as
mailbox names to send and receive e-mail"
i.e. we will indeed have some emails like
xn--inf...@xn--domani-gva.xn--p1ai

As soon as these are implemented, ISPs and everyone else will have to adapt.
As 1.6 and subsequent versions will certainly have to deal with this,
I guess it would still be good to be prepared.
What we could maybe do is list all the necessary tasks and where to
implement them in code.
1. Accept UTF-8 mail addresses in clear from registered users (rather simple)
2. Verify that RSS feeds from an IDN domain work (as SimplePie
contains this idna_convert.class, it could already be solved)
3. Research whether a filter to Punycode will be necessary wherever we send
mail from Joomla, or whether the encoding is expected to
be done by the mail apps in the same way URLs are handled by browsers (with
some hiccups, as stated above for FF)


ndee

Nov 28, 2009, 1:11:49 PM
to Joomla! CMS Development
Hi guys,

my findings.

1. Email, local part without special chars, domain with special chars,
e.g. test@dömäin.at:
Sent from GMX (http://www.gmx.net) -> Error 5.1.2: cannot resolve your
domain name
Sent from my Postfix 2.5.x -> Error 5.1.3: Bad recipient address syntax
Sent from Gmail -> Error that there are unsupported chars in the mail
address

2. Email, local part without special chars, domain in Punycode, e.g.
te...@xn--dmin-moa0i.at:
works with all 3 above

3. Local part with special chars, e.g. löcäl...@test.at, löcäl@dömäin.at,
löcäl...@xn--dmin-moa0i.at:
Does not work in any case. Tested with Thunderbird 2.0.0.23, Gmail,

4. Local part in Punycode + domain in Punycode:
works with all 3 (Gmail, Postfix, GMX)
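
Read as a rule of thumb, the four cases above say that only all-ASCII addresses got through. A minimal Python sketch of that check (illustration only; `classify` is a hypothetical helper that mirrors these test results and nothing more, certainly not a standard):

```python
def classify(address: str) -> str:
    """Say which part of an address, if any, contains non-ASCII characters."""
    local, _, domain = address.rpartition("@")
    if not local.isascii():
        # Case 3: failed with every client and server tried.
        return "non-ASCII local part"
    if not domain.isascii():
        # Case 1: rejected until the domain is converted to Punycode.
        return "non-ASCII domain"
    # Cases 2 and 4: plain ASCII or Punycode in both parts worked.
    return "ASCII only"


for addr in ("test@dömäin.at", "test@xn--dmin-moa0i.at", "löcäl@dömäin.at"):
    print(addr, "->", classify(addr))
```

`str.isascii()` needs Python 3.7 or later.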

---
@infograf768
You are right, by encoding to Punycode J! could support IDN
emails as of today. RSS feeds and weblinks should work
too, as IDN domains have been around for some years now.
