Translatable: URLs


Ingo Schommer

Apr 1, 2009, 9:49:47 PM
to silverst...@googlegroups.com
Now that we're getting close to a releasable Translatable update,
the question remains on how we treat URLs in different languages -
this behaviour hasn't really changed much since 2.2.

There are different ways to determine the language through URLs:

1. Cookies/Sessions
The first time the website is accessed in a certain language, the language is stored for further requests. Having languages stored in the session is hard to debug and bad for SEO (different content under the same URL).

2. ?locale= GET parameter
On each page URL (automatically appended) - hard to work with, as you have to merge it with other GET parameters. Most of our core and modules don't assume URLSegments with GET parameters.

3. Subdomains for each language
(en-US.mysite.com, de-DE.mysite.com) - these could then be handled similar to the subsites module by setting a global locale, or by rewriting the URLs to auto-include a ?locale GET parameter through Apache. Subdomains are hard to set up for most SS devs and might not even be possible on shared hosting, so not a feasible default? Also, subdomains might cause SSL and cookie problems.

4. Enforcing unique URLs
By default, appending the locale to a page URL, e.g. "mypage-de_DE". This way, it's clear which language a page refers to. This is how Translatable handles the uniqueness problem on trunk at the moment - I don't see this as the ideal solution.

5. Subfolders for each language
/de_DE/mypage, /en_US/mypage - this way, the same URLSegment "mypage" could be reused for different languages. This imposes a URL structure on the system, but seems to be a widespread way of doing things. Would have to be tested for compatibility with the nestedurl work.

In "HTTP speak", you can either see each translation as a separate "resource" and argue that it should have its own URI, or see it as the same resource with a different modifier; after all, there might be some global properties which are not translated. In the latter case a GET parameter denoting the language seems feasible?

I tend towards #5 as a default, with #4 as an alternative fallback. #2 should work regardless, if just for debugging purposes. #3 should be a recipe at most, doesn't need to be baked into core I think.

In terms of using fully qualified locales (de_DE) instead of short language codes (de), I think
we should provide a setting to use short language codes for subfolders instead. The mapping
is of course ambiguous (why is de->de_DE, and not de->de_AT?). I've added some good
defaults for this from unicode.org in i18n::$likely_subtags - it's a public property,
so devs could override it with their own preferences.
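
To make the short-code fallback concrete, here is a minimal sketch in Python (for illustration only, this is not SilverStripe code). The table contents and the expand_language() helper are assumptions; the real defaults are whatever i18n::$likely_subtags contains.

```python
# Hypothetical defaults, standing in for the idea behind i18n::$likely_subtags.
LIKELY_SUBTAGS = {
    "de": "de_DE",
    "en": "en_US",
    "fr": "fr_FR",
}


def expand_language(code, overrides=None):
    """Return a full locale for a short language code.

    Full locales such as "de_AT" pass through unchanged. Unknown short
    codes raise KeyError so the caller can decide on a fallback.
    """
    if "_" in code:
        return code
    table = dict(LIKELY_SUBTAGS)
    table.update(overrides or {})  # developer preferences win over defaults
    return table[code.lower()]
```

A site that prefers Austrian German would call expand_language("de", {"de": "de_AT"}), which is the kind of override the public property is meant to allow.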

Thoughts? Best practices from other frameworks?
Interesting articles where developers went down one path
or the other?


-------
Ingo Schommer | Senior Developer
SilverStripe

Skype: chillu23

Sigurd Magnusson

Apr 1, 2009, 10:23:41 PM
to silverst...@googlegroups.com
This seems on the right track. To further your point about #3: while
it might be difficult to set up, this mode would be common on larger
production environments. A common way for this to be implemented would
be with regional TLDs, in other words www.mysite.com, www.mysite.de,
www.mysite.fr, etc. To me, it seems this is a needed 'mode' (either core
or recipe) but certainly not a default. To be able to use this system
easily on a development environment, you would want #5 (subfolders) to
work at the same time, so you don't have to set up complex DNS for
a local developer copy.

I agree #5 and #4 seem like good default modes that are easy to switch
a site to. I don't see one as outright better or worse; personalities
and the scope of the project will determine which mode is more
relevant for that particular project.

Sigurd

Jamie Neil

Apr 2, 2009, 2:47:15 AM
to silverst...@googlegroups.com
#5 gets my vote, although I could put up with using a GET (assuming it
is stateless) as it could probably be converted to #5 with some
rewrite rules.
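
A rough sketch of that conversion, in Python for illustration only (in practice this would be an Apache rewrite rule). The locale pattern and the subfolder_to_get() name are assumptions, not anything that exists in SilverStripe.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed locale format: two lowercase letters, underscore, two uppercase letters.
LOCALE_PREFIX = re.compile(r"^/([a-z]{2}_[A-Z]{2})(/.*)?$")


def subfolder_to_get(url):
    """Rewrite /de_DE/mypage?x=1 to /mypage?x=1&locale=de_DE."""
    parts = urlsplit(url)
    match = LOCALE_PREFIX.match(parts.path)
    if not match:
        return url  # no locale prefix, leave the URL untouched
    locale, rest = match.group(1), match.group(2) or "/"
    # Keep the other GET parameters, but let the path prefix win over
    # any locale parameter that was already present.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "locale"
    ]
    query.append(("locale", locale))
    return urlunsplit(
        (parts.scheme, parts.netloc, rest, urlencode(query), parts.fragment)
    )
```

Because the locale is carried in the URL itself, this stays stateless: nothing is read from or written to a session.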

Jamie

Johannes Weberhofer, Weberhofer GmbH

Apr 3, 2009, 6:23:21 AM
to silverst...@googlegroups.com
Ingo,

I think #5 is the most useful way to handle languages. #2 would work, too. When #2 and #5 both work, it is easy to append the "?locale=" parameter using a simple mod_rewrite rule to serve the different languages from different (sub-)domains.

#1 should be avoided altogether: search engines handle cookies/sessions very badly. One language's content overwrites the other languages in the index (when you use cookies), and the user will not see the result in the same language as the search engine did.
When you pass session data in GET parameters, search engines will crawl your website endlessly and slow it down, because they get a new session ID each time they start crawling; users will have the same problem as above.

It would be very useful to allow usage of the language code only, because in many cases there will be no difference between the websites for - for example - de_DE and de_AT.

In another multilingual project I'm working on, we analyze the browser's language setting when the user comes to the start page without the locale parameter set. E.g. a visit to http://www.test.com/ triggers a temporary HTTP redirect (302) to http://www.test.com/LOCALE/. This has the big advantage that (because Google and other crawlers send locales, too) the search engines get redirected to the best start page as well.
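
The start-page redirect could look roughly like the following sketch (Python, illustration only). It assumes the browser's language setting arrives as an Accept-Language header; the function names and the list of available locales are hypothetical.

```python
def parse_accept_language(header):
    """Return the language tags of an Accept-Language header, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip()
        if not tag or tag == "*":
            continue
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality <= 0:
            continue  # q=0 means "not acceptable"
        # Sort by descending quality, then by position in the header.
        weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def start_page_redirect(header, available, default):
    """Return (status, location) for a request to "/" without a locale."""
    exact = {locale.lower(): locale for locale in available}
    by_language = {}
    for locale in available:
        by_language.setdefault(locale.split("_")[0].lower(), locale)
    for tag in parse_accept_language(header):
        key = tag.replace("-", "_").lower()
        locale = exact.get(key) or by_language.get(key.split("_")[0])
        if locale:
            return 302, "/%s/" % locale
    return 302, "/%s/" % default
```

The redirect is temporary (302) on purpose: the start page has no single permanent target, since the destination depends on who is asking.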

Best regards,
Johannes
--


|---------------------------------
| weberhofer GmbH | Johannes Weberhofer
| information technologies
| Austria, 1080 Wien, Blindengasse 52/3
|----------------------------------------------------------->>

Sigurd Magnusson

Apr 20, 2009, 5:29:25 PM
to SilverStripe Development
Ingo, what was committed or decided in the end, or is the jury still
out?

Ingo Schommer

Apr 20, 2009, 5:34:44 PM
to silverst...@googlegroups.com
We're planning to implement subfolders, see http://open.silverstripe.com/ticket/3877.
Not going to happen for 2.3.2 though, as we have too much else on.

For now, we'll enforce unique URLs across languages to work around
appending ?locale properties to every URL (which will have its own share
of side-effects if it becomes a requirement for Director to find a page).





Chris Bryer

May 4, 2011, 12:12:54 AM
to silverst...@googlegroups.com
Hey guys,
I built a little module to handle the 'top level domains' approach Sigurd mentioned, and the code is currently on github:

https://github.com/cbryer/Translatable-Domains

I had a brief conversation with Ingo and this may make a nice addition to the new translatable module but needs some further testing, review, and a little more development.

The module currently lets you register which locale each top level domain should use, then enforces that locale. If a request is made for an English page on a German domain, it will try to find the German translation and show that if it exists; if it doesn't, it'll find the requested page and set the URL to the correct top level domain.
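
The behaviour described above can be sketched as follows. This is Python for illustration only and is not the module's actual code; the TLD_LOCALES and PAGES tables and the resolve() function are hypothetical stand-ins for the registration and the site tree.

```python
# Hypothetical registrations: TLD -> locale.
TLD_LOCALES = {"com": "en_US", "de": "de_DE", "fr": "fr_FR"}

# Hypothetical stand-in for the site tree:
# translation group -> {locale: URL segment}.
PAGES = {
    "about": {"en_US": "about-us", "de_DE": "ueber-uns"},
    "jobs": {"en_US": "jobs"},
}


def resolve(host, segment):
    """Return the (host, segment) pair that should serve the request."""
    name, _, tld = host.rpartition(".")
    wanted = TLD_LOCALES.get(tld)
    for translations in PAGES.values():
        for locale, candidate in translations.items():
            if candidate != segment:
                continue
            if wanted is None or locale == wanted:
                return host, segment  # already in the right place
            if wanted in translations:
                # A translation exists: show it on the current domain.
                return host, translations[wanted]
            # No translation: keep the page, switch to the domain
            # registered for the page's own locale.
            target = next(
                (t for t, loc in TLD_LOCALES.items() if loc == locale), tld
            )
            return "%s.%s" % (name, target), segment
    return None  # unknown page
```

With the sample data, a request for "about-us" on www.mysite.de resolves to the German "ueber-uns" on the same host, while "jobs" (no German translation) is sent to www.mysite.com.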

I also realized that I could test this configuration locally by setting up additional virtual hosts (localhost-de, localhost-fr, etc.) to mimic different TLDs, so I built that into the module and included directions to set it up as well.

I did a bit of research and Google seems to recommend handling multilingual website URLs by using different top level domains if you can afford all the domain registrations, primarily because the domain name has a country attached to it and users have a better idea what the language of the resulting page will be just from looking at the base URL. Google does recommend different strategies depending on target audience and cost: if you don't want to buy a bunch of domain names, you may want to use subdomains if you want to target a global audience, or subdirectories if you have a regional audience that is multilingual.

I'm thinking that the subdomain approach is very similar to the TLD approach I'm using; it'd just look at the beginning of the HTTP_HOST instead of the end. Subdirectory support will be a little different, but I could ultimately see this module being able to support 3 different approaches for handling multilingual URLs.

Thoughts? I hope this comes in useful for other developers,
-Chris

Sigurd Magnusson

Jun 1, 2011, 10:26:48 PM
to silverst...@googlegroups.com
Chris,

A belated reply given some weeks on vacation, but I just wanted to mention I enjoyed seeing the common sense being espoused by Google that you included in your email below;

"I did a bit of research and google seems to recommend handling multi-lingual website urls by using different top level domains if you can afford all the domain registrations, primarily because the domain name has a country attached to it and users have a better idea what the language of the resulting page will be just from looking at the base url."

The idea of setting domains by various means you've described makes sense; anyone care to offer their thoughts on the implementation done here?

Cheers,
Sigurd.

--
You received this message because you are subscribed to the Google Groups "SilverStripe Core Development" group.

To post to this group, send email to silverst...@googlegroups.com.
To unsubscribe from this group, send email to silverstripe-d...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/silverstripe-dev?hl=en.



Rafał Raatz

Jun 2, 2011, 8:52:37 AM
to silverst...@googlegroups.com
I have tried URLs of the form "mysite.com/en/contact" and "mysite.com/de/contact", but it does not work. Has anyone found a solution without digging into the root domain?

Chris Bryer

Jun 2, 2011, 9:26:30 PM
to SilverStripe Core Development
Thanks Sigurd,
When I first researched multilingual approaches I wasn't sure if it
mattered to search engines how multilingual URLs were structured.
It's generally recommended to use different country code top level
domains (ccTLDs like .fr, .de etc.), but using subdomains and
subdirectories is also acceptable, and there may be different
circumstances why people would prefer one over the other. I built a
small module (https://github.com/cbryer/Translatable-Domains) that
handles ccTLDs, and the code could easily be rewritten to handle
subdomains. I haven't put any thought into how to handle
subdirectories though (not sure how it would relate to or integrate
with the nested URL controller), but it's written to extend SiteTree so
it doesn't hack the core. My original thought was there could be 3
different modules that could work with the new translatable module,
then Ingo mentioned that there is no dependency management and module
loader, so it would be better if this module was added to translatable.
I'd love to get comments on this though. Before it can be
integrated with translatable it needs more eyes on it and some unit
tests. I haven't had a chance to pick this up again yet but am hoping
in a month or so I can do a little more on it.

I did notice (after I posted the code to github) that there is no
content-language meta tag, so I extended the MetaTags function to add
the locale in. I'm not sure if something is broken in 2.4.4 that
makes it not appear or if it's just code that should be included. This
bug causes problems with the ss18n javascript.

anyways, i'd love to get some feedback on this module.

-Chris

Ingo Schommer

Jun 7, 2011, 5:26:56 PM
to silverst...@googlegroups.com
Hey Chris,

Just for reference, I'm pasting my email to you from a while ago here -
maybe somebody else wants to pick up these suggestions and
turn them into patches? :)

> i did notice (after i posted the code to github) that there is no
> content-language metatag, so i extended the MetaTags function to add
> that locale in..  im not sure if something is broken in 2.4.4 that
> makes it not appear or if its just code that should be included.  this
> bug causes problems with ss18n javascript.

<html lang=...> over <meta> tags.

On 3/05/2011, at 6:37 PM, Chris Bryer wrote:
> hey guys,
>
> I built a little module to handle the 'top level domains' approach
> sigurd mentioned, the code is on github and i'll probably be
> submitting it as a silverstripe module shortly:
> https://github.com/cbryer/Translatable-Domains
> The module lets you register what locale each top level domain should
> use, then enforces the locale.  if a request is made for an english
> page on a german domain, it will try to find the german translation
> and show that if it exists and if it doesnt, it'll find the requested
> page and set the url to the correct top level domain.  my customer has
> several domains all with the same batch of tld's, and the module
> handles this domain-masking situation pretty well, but if you have
> mysite.de and anothersite.fr, it wont be able to handle a switch
> between those two, it'll only handle a switch between mysite.de to
> mysite.fr.
>
> i also realized that i could test this configuration locally by
> setting up additional virtual hosts (localhost-de, localhost-fr, etc)
> to mimic different tld's, so i built that into the module and included
> directions to set it up as well.


I'd leave the environment-specific switches (isLocalhost()) out of the main logic,
and move that to mysite/_config.php, which could set different url/locale maps
based on environments. Stuff like stripping out HTTP port information should
apply to all environments. I was about to suggest PHP's built-in parse_url()
for TLD detection, but then remembered that it doesn't deal with double TLDs like co.uk, heh.
Either way, I'd try to find a more solid way to parse URLs than hand-rolled regexes.
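
To illustrate both points, here is a small Python sketch that uses urllib.parse as a stand-in for PHP's parse_url() (illustration only; the helper names are made up). A URL parser reliably strips the port, but taking the last label of the host still gives the wrong answer for double TLDs.

```python
from urllib.parse import urlsplit


def host_without_port(url_or_host):
    """Return the lower-cased host name with any port removed."""
    # urlsplit only recognises a host when the input has a "//" part,
    # so bare HTTP_HOST values get one prepended.
    candidate = url_or_host if "//" in url_or_host else "//" + url_or_host
    return urlsplit(candidate).hostname or ""


def naive_tld(host):
    """Last label of the host; wrong for double TLDs such as co.uk."""
    return host.rsplit(".", 1)[-1]


host = host_without_port("http://www.MySite.co.uk:8080/page")
print(host)             # www.mysite.co.uk
print(naive_tld(host))  # uk, although the effective TLD is co.uk
```

The port handling is the same on localhost:8080 and on a live host, which is why it does not need an environment switch.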


> i did a bit of research and google seems to recommend handling
> multi-lingual website urls by using different top level domains if you can
> afford all the domain registrations, primarily because the domain name
> has a country attached to it and users have a better idea what the
> language of the resulting page will be just from looking at the base
> url.  Google does recommend different strategies depending on target
> audiences and cost..  if you dont want to buy a bunch of domain names
> you may want to use subdomains if you want to target a global audience
> or you may want to use subdirectories if you have a regional audience
> that is multi-lingual.

Are you ensuring that each URL is canonical, or redirects appropriately? E.g. what happens if I call
mywebsite.de/mypage?Locale=fr_FR on a page with German and French translations?
It'd have to ignore the Locale parameter, or redirect to the French version.
Same goes for multiple TLDs pointing to the same language -
maybe it should inject <link rel="canonical"> in SiteTree->MetaTags()?
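
One way to read the "ignore the Locale parameter" option is sketched below, in Python for illustration only. The host tables and function names are assumptions, and this is not what SiteTree->MetaTags() does today.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical configuration: several hosts may share a locale, but each
# locale has exactly one preferred (canonical) host.
HOST_LOCALES = {
    "mywebsite.de": "de_DE",
    "mywebsite.at": "de_DE",
    "mywebsite.fr": "fr_FR",
}
CANONICAL_HOSTS = {"de_DE": "mywebsite.de", "fr_FR": "mywebsite.fr"}


def canonical_url(url):
    """Drop any Locale parameter and point at the locale's preferred host."""
    parts = urlsplit(url)
    locale = HOST_LOCALES[parts.hostname]
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() != "locale"  # the domain decides, not the parameter
    ]
    return urlunsplit(
        (parts.scheme, CANONICAL_HOSTS[locale], parts.path, urlencode(query), "")
    )


def canonical_link_tag(url):
    """Build the tag that would be injected into the page head."""
    return '<link rel="canonical" href="%s" />' % escape(canonical_url(url), quote=True)
```

The alternative (redirecting mywebsite.de/mypage?Locale=fr_FR to the French domain) would use the same tables, but look up the host for the requested locale instead of discarding the parameter.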


> It looks like the talk in this thread is suggesting one approach being
> built into the translatable module, however i think this choice could
> come in separate modules instead of baking it all into the
> translatable module.

Given that Translatable is now a module, I don't see a reason why we can't
add this capability to the codebase. It'll need some unit testing
and more peer review, but in general I see per-TLD routing
as quite essential for multilingual sites. If you're worried
about maintenance and commit access for this addition, I'm sure we can find a workable solution.


Thanks!

Ingo

Chris Bryer

Jun 7, 2011, 10:04:52 PM
to silverst...@googlegroups.com
Thanks Ingo,
My time has been split between iOS development and building a foxycart module for SilverStripe lately, and I can probably get back to this in a month. In the meantime, if anyone else wants to add patches that would be great.

Regarding the $ContentLocale flag, that's awesome, there's a fix for this. I didn't see anything about that in any docs like http://doc.silverstripe.org/sapphire/en/topics/translation. It'd be a nice recipe to document.

I haven't done anything with canonical URLs, that's a good thought. Regarding the locale parameter, the domain always overrides the parameters. The one time that is a problem is when you click to view the published or draft site inside the CMS in a non-default locale: the CMS always stays inside the current TLD when viewing a page, which will make you view the default locale's record instead. Any ideas on this? Should we make it so that using a ?locale param overrides the TLD, or switches the domain to the correct one for the record? Any other options?

-Chris





Ingo Schommer

Jun 8, 2011, 7:34:07 PM
to silverst...@googlegroups.com
Hey Chris,


On 8/06/2011, at 2:04 PM, Chris Bryer wrote:

> I havent done anything with canonical urls, thats a good thought. regarding the locale parameter, the domain always overrides the parameters.
Override in which way? Browser language detection? Google?
I'm not aware of any built-in domain logic to Translatable in 2.4,
so that shouldn't interfere with things.

> the one time that is a problem is when you click to view the published or draft site inside the cms in a non-default locale.. the cms always sticks inside the current TLD when viewing a page which will make you view the default locale's record instead.  any ideas on this?  should we make it so that using a ?locale param overrides the TLD or switches the domain to the correct one for the record?  any other options?
You could influence this by customizing SiteTree->AbsoluteLink(). SilverStripeNavigatorItem (the class responsible for the bottom links)
already uses this method rather than RelativeLink() in 2.4.

Ingo


Chris Bryer

Jun 9, 2011, 9:22:43 PM
to silverst...@googlegroups.com
Hey Ingo,
 
> Override in which way? Browser language detection? Google?
> I'm not aware of any built-in domain logic to Translatable in 2.4,
> so that shouldn't interfere with things.

Sorry, my explanation was a little brief. The module lets users associate locales with TLDs. The module's logic looks at the TLD, looks up the locale that pages in that domain should be in, and enforces that locale. If the returned record is not in the correct locale, the module finds the correct translation and presents it. Adding the locale parameter returns a record that doesn't match the locale of the domain; the module's logic sees this as an incorrect locale and again finds the record with the domain's locale. I could make the module allow locale parameters, but I'm not sure if that's necessary: the locale parameters have really been a way to do what this module is doing. Any thoughts on this? I'm probably overlooking something here.


> You could influence this by customizing SiteTree->AbsoluteLink().

perfect!  thanks for the insight!

-Chris




Chris Bryer

Jun 21, 2011, 3:57:16 AM
to SilverStripe Core Development
Hey Ingo,
I just pushed some updates to the translatable-domains code on
github.com/cbryer/Translatable-Domains. I removed the isLocalhost
switch, added some unit tests, and made the domain-locale registration
a little less specific so people can register virtual hosts or
domains in the same array with the same methods.

I did quite a bit of searching for a better way to find the TLDs and
I haven't found any great way to do it. Some scripts have a giant
list of acceptable TLDs, but I just saw that ICANN will be opening up
a whole bunch of generic TLDs in 2012
(http://mashable.com/2011/06/20/new-gtld-faq/), so I don't think those
scripts will hold up long. The best approach that I can think of is to
compare part of the HTTP_HOST with user-registered domains and find
matches. The regex is fairly simple and I boiled it down so it only
exists in one function now. The regex is more flexible now as well, so
people can register mysite.com to use en_US and mygermansite.de to use
de_DE, instead of registering just the TLDs.
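
The matching idea can be sketched like this (Python, illustration only). Note that it uses plain suffix matching on label boundaries rather than the module's regex, and the REGISTERED table and locale_for_host() name are hypothetical.

```python
# Hypothetical registrations: full domains, bare TLDs and local virtual
# hosts can all live in the same table.
REGISTERED = {
    "mysite.com": "en_US",
    "mygermansite.de": "de_DE",
    "fr": "fr_FR",
    "localhost-de": "de_DE",
}


def locale_for_host(http_host, registered=REGISTERED):
    """Return the locale of the most specific registration matching the host."""
    host = http_host.split(":", 1)[0].lower().rstrip(".")  # drop port
    best = None
    for domain, locale in registered.items():
        # Match the whole host, or a suffix that starts at a label boundary,
        # so "notmysite.com" does not match "mysite.com".
        if host == domain or host.endswith("." + domain):
            if best is None or len(domain) > len(best[0]):
                best = (domain, locale)
    return best[1] if best else None
```

Because nothing here depends on a list of known TLDs, new generic TLDs need no code change; they only need to be registered.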

I've tested the code in a localhost environment with unit tests that
simulate live url's, and will be testing it a little more tomorrow on
a staging server, but i'd love to get some thoughts on this if you or
anyone else has time.

-Chris