We have somewhat unorganized discussions in at least two bugs on what
URLs with localized content should look like on mozilla.com.
https://bugzilla.mozilla.org/show_bug.cgi?id=314119 and
https://bugzilla.mozilla.org/show_bug.cgi?id=336338. If you have more,
please add them in your replies. Replies should go to the newsgroup
only; I CCed a bunch of people to make sure they get the initial
invitation. I didn't CC anyone from IT, as I assume they read .planning,
as do many others who participated in the bugs.
I'd like to gather a few requirements that I have seen in the bugs, and
that I draw from what I consider the experience of hosting the
mozilla-europe.org content as well as the startpage snippets.
Please add your bullet points or comments, both from an architecture and
from an implementation point of view.
Bullets:
* hosted on mozilla.com
* locale should be part of URL [1]
* non-existing content should fall back gracefully [2]
[1] is actually an open question. What's the locale? There have been
concerns about using the user agent, is the value of
general.useragent.locale good enough? Or do we need a build config
setting, maybe something additional in browserconfig.properties?
[2] We should keep the amount of localized web content realistic. We
have had a really disappointing experience with managing changesets
across a bunch of locales, both on mozilla-europe.org and for the
snippets to be included in ab-CD.start.mozilla.com/firefox. The solution
here is both to cut down on the number of languages that actually
localize content, and to make the content itself optional and fall back
without 404s.
For fallback, there are two options: fall back hard to en-US, or use
accept_lang as an intermediate step. The intermediate step would be good
for the bunch of minority languages we get, as those localizations have
usually set the majority language to be part of intl.accept_languages,
thus giving us a better user experience overall.
I'm pretty pessimistic about maintaining these redirects by hand, too. I
think this time I'm allowed to be, because I don't see another lap than
mine to land this on, and I expect that to be error prone, and a waste
of cycles that I could spend adding more value to the project.
Are there bullets or arguments in other people's heads? I'm pretty sure
there are; please let them be read.
Axel
Thanks for consolidating this discussion.
On Jul 4, 2006, at 10:35, Axel Hecht wrote:
> we have somewhat unorganized discussions in at least two bugs on how
> URLs with localized content should look like on mozilla.com.
>
> https://bugzilla.mozilla.org/show_bug.cgi?id=314119 and
> https://bugzilla.mozilla.org/show_bug.cgi?id=336338. If you have more,
> please add them in your replies
See also 340629.
> Bullets:
>
> * hosted on mozilla.com
or mozilla.org, as the case may be.
> * non-existing content should fall back gracefully [2]
>
> [2] We should try to keep the amount of localized web content to a
> realistic amount. We have a really disappointing experience with
> managing changesets across a bunch of locales, both on
> mozilla-europe.org and for the snippets to be included in
> ab-CD.start.mozilla.com/firefox. The solution here is to both cut down
> on the number of languages that actually localize content, and to make
> the content itself optional and fall back without 404s.
I'm not sure that localizing less content is the right answer to what
sounds essentially like a process problem.
I have no doubt that you're right about the disappointing state of
change management, but I would like to at least try to improve that
process before we accept a degraded user experience for all of those
locales.
> For fallback, there are two options, fallback to en-US hard, or use
> accept_lang as an intermediate step. This step would be good for the
> bunch of minority languages we get, as the localizations have usually
> set the majority language in the localization to be part of
> intl.accept_languages, thus giving us a better user experience over
> all.
> I'm pretty pessimistic about maintaining these redirects by hand,
> too. I
> think this time I'm allowed to be, because I don't see another lap
> than
> mine to land this on, and I expect that to be error prone, and a waste
> of cycles that I could spend adding more value to the project.
I tend to agree that a manually-maintained redirection table is a
non-starter, no matter whose lap it falls on. I think we could live with
a single top-level redirection -- i.e., all ab-CD content gets
redirected to ef-GH -- but a piecemeal approach, some pages localized
and some not, sounds like a disaster.
Otherwise, I think you hit the important points. I don't have much
to add.
-Phil
It's not merely a process problem. Localizing full-text webpages has a
slightly different gene-pool-requirement than localizing applications.
That is, localizing web content may feel more painful to some of our
localizers and may take away resources that could be better spent on the
app itself.
On top of that, I do expect that the big locales will find resources,
either inside the localization team itself or in its vicinity to
localize some web content. Usually, we get more offers than we need. But
when it comes down to small teams on scarce resources, I'd rather have
them spend three hours on fixing dialog sizes or the like than do yet
another webpage that only a fraction of the users will see.
>> For fallback, there are two options, fallback to en-US hard, or use
>> accept_lang as an intermediate step. This step would be good for the
>> bunch of minority languages we get, as the localizations have usually
>> set the majority language in the localization to be part of
>> intl.accept_languages, thus giving us a better user experience over all.
>> I'm pretty pessimistic about maintaining these redirects by hand, too. I
>> think this time I'm allowed to be, because I don't see another lap than
>> mine to land this on, and I expect that to be error prone, and a waste
>> of cycles that I could spend adding more value to the project.
>
> I tend to agree that a manually-maintained redirection table is a
> non-starter, no matter whose lap it falls on. I think we could live
> with a single top-level redirection -- ie, all ab-CD content gets
> redirected to ef-GH -- but a piecemeal approach, some pages localized
> and some not, sounds like a disaster.
Yeah, I'm not so fond of fragments either, but usually that's what
happens to some locales over time. Having a staging server for the
content helps mozilla-europe.org a good deal in avoiding fragmentary
translations, but for locales not catching up on changes, there's little
constructive one can do. At least, we haven't found anything yet.
> Otherwise, I think you hit the important points. I don't have much to add.
Good, thanks for the feedback
Axel
On Jul 5, 2006, at 9:14, Axel Hecht wrote:
> It's not merely a process problem. Localizing full-text webpages
> has a slightly different gene-pool-requirement than localizing
> applications. That is, localizing web content may feel more painful
> to some of our localizers and may take away resources that could be
> better spent on the app itself.
> On top of that, I do expect that the big locales will find
> resources, either inside the localization team itself or in its
> vicinity to localize some web content. Usually, we get more offers
> than we need. But when it comes down to small teams on scarce
> resources, I'd rather have them spend three hours on fixing dialog
> sizes or the like than do yet another webpage that only a fraction
> of the users will see.
OK, that's fair. But for the localization teams that have the
bandwidth, it sounds like we are in agreement that we would like to
accommodate that content?
Given that some web content is more important than others -- the
start page and first-run page, say, which practically everyone will
see at least once -- and that many teams may not have the bandwidth
to localize the others, it seems like the discussion below about
piecemeal content becomes all the more relevant.
> Yeah, I'm not so fond of fragments either, but usually that's what
> happens to some locales over time. Having a staging server for the
> content helps mozilla-europe.org a good deal for not adding
> fragmentary translations, but for locales not catching up on
> changes, there's little constructive one can do. At least, we
> haven't found something yet.
If that's the case (I'm not surprised), then I propose to flesh out
your requirement along the lines of:
* non-existent content should fall back gracefully, either
  - automatically, based on accept-languages; or
  - with not more than one top-level redirect per locale to catch
    un-localized pages; and
  - using en-US as content of last resort
So that the server functions roughly as either:
    if the page exists for the user's locale:
        display the page and exit
    for each language in accept-languages:
        if the page exists for that language:
            display the page and exit
    display the en-US page
or:
    if the page exists for the user's locale:
        display the page and exit
    if there is a top-level redirect for missing pages in this locale:
        if the page exists for the redirected language:
            display the page and exit
    display the en-US page as a last resort
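As a rough illustration, the two variants above could be sketched in Python as follows. This is only a sketch under assumptions: the function names, the `existing` set (standing in for whatever check the server does to see if a page is there), and the `REDIRECTS` table are all hypothetical and do not describe any existing mozilla.com code.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical top-level redirect table: one entry per locale, not per page.
REDIRECTS: Dict[str, str] = {"es-AR": "es-ES"}

Page = Tuple[str, str]  # (locale, path)


def resolve_by_accept_languages(
    locale: str, path: str, accept_languages: Iterable[str], existing: Set[Page]
) -> Page:
    """Variant 1: try the user's locale, then each accept-language, then en-US."""
    if (locale, path) in existing:
        return (locale, path)
    for lang in accept_languages:
        if (lang, path) in existing:
            return (lang, path)
    return ("en-US", path)


def resolve_by_redirect(
    locale: str,
    path: str,
    existing: Set[Page],
    redirects: Optional[Dict[str, str]] = None,
) -> Page:
    """Variant 2: try the user's locale, then its single redirect target, then en-US."""
    table = REDIRECTS if redirects is None else redirects
    if (locale, path) in existing:
        return (locale, path)
    target = table.get(locale)
    if target is not None and (target, path) in existing:
        return (target, path)
    return ("en-US", path)
```

Both functions assume the en-US page always exists, which matches the "content of last resort" bullet. Neither needs a per-page table.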
I'd love to hear from someone familiar with the infrastructure we're
talking about; are either of these proposals realistic?
-Phil
I'd say "strongly recommended" is a good attitude for us to have here.
> Given that some web content is more important than others -- the start
> page and first-run page, say, which practically everyone will see at
> least once -- and that many teams may not have the bandwidth to
> localize the others, it seems like the discussion below about piecemeal
> content becomes all the more relevant.
The start page is not really on mozilla.com, so we don't need to worry
about that one. And the snippets are already part of the requirements we
have, as are a few other webpages not hosted on mozilla.com/.org.
See http://wiki.mozilla.org/L10n:Firefox_Extras for more details. I
totally forgot to forward that page to you, sorry for that.
>> Yeah, I'm not so fond of fragments either, but usually that's what
>> happens to some locales over time. Having a staging server for the
>> content helps mozilla-europe.org a good deal for not adding
>> fragmentary translations, but for locales not catching up on changes,
>> there's little constructive one can do. At least, we haven't found
>> something yet.
>
> If that's the case (I'm not surprised), then I propose to flesh out your
> requirement along the lines of:
>
> * non-existent content should fall back gracefully, either
>   - automatically, based on accept-languages; or
>   - with not more than one top-level redirect per locale to catch
>     un-localized pages; and
>   - using en-US as content of last resort
>
> So that the server functions roughly as either:
>
>     if the page exists for the user's locale:
>         display the page and exit
>     for each language in accept-languages:
>         if the page exists for that language:
>             display the page and exit
>     display the en-US page
>
> or:
>
>     if the page exists for the user's locale:
>         display the page and exit
>     if there is a top-level redirect for missing pages in this locale:
>         if the page exists for the redirected language:
>             display the page and exit
>     display the en-US page as a last resort
>
> I'd love to hear from someone familiar with the infrastructure we're
> talking about; are either of these proposals realistic?
Yeah, we need some IT wisdom here.
Axel
Both options are reasonable. The second sounds like it might take a
little more manual work (someone has to manage the redirects), but
either would get the job done.
Wil
On Jul 6, 2006, at 14:07, Wil Clouser wrote:
>>     if the page exists for the user's locale:
>>         display the page and exit
>>     if there is a top-level redirect for missing pages in this locale:
>>         if the page exists for the redirected language:
>>             display the page and exit
>>     display the en-US page as a last resort
>
> Both options are reasonable. The second sounds like it might take
> a little more manual work (someone has to manage the redirects),
> but either would get the job done.
I want to make sure that we're both saying the same things. We'd
like to avoid the fate of manually redirecting each and every missing
page, because that will quickly get out of hand.
So are you referring to the extra work of maintaining a single
top-level redirect (for example: any page missing from the es-AR/* tree
should fall back to its es-ES/* equivalent)?
Or are you saying that that's not possible, and we need to redirect
each page? In which case, I think option #2 comes off the table.
Thanks,
-Phil
I was talking about maintaining the single top-level redirect. In other
words, deciding that non-existent es-AR pages map to es-ES and not es-XY
(just an example :).
The example you gave above should be no problem - we won't have to map
individual pages.
Wil
Ok, so should we just settle for
http://www.mozilla.com/ab-CD/path-for-english ?
As in, build automation would make the release notes link
http://www.mozilla.com/fy-NL/firefox/releases/
which would fall back to
http://www.mozilla.com/nl/firefox/releases/, which would fall back to
http://www.mozilla.com/firefox/releases/,
given that fy-NL has
intl.accept_languages=fy-nl, nl, en-us, en
That'd be pretty sweet.
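The chain in this example can be derived mechanically from the locale and its intl.accept_languages value. Here is a minimal Python sketch, assuming English content lives at the unprefixed path as in the URLs above; the helper names `normalize` and `fallback_chain` are made up for illustration.

```python
from typing import List

BASE = "http://www.mozilla.com"


def normalize(tag: str) -> str:
    """Turn 'fy-nl' into 'fy-NL'; leave bare language codes such as 'nl' alone."""
    parts = tag.strip().split("-")
    if len(parts) == 1:
        return parts[0].lower()
    return parts[0].lower() + "-" + parts[1].upper()


def fallback_chain(locale: str, accept_languages: str, path: str) -> List[str]:
    """List the URLs to try in order, ending at the unprefixed English page."""
    chain: List[str] = []
    seen = set()
    for tag in [locale] + accept_languages.split(","):
        code = normalize(tag)
        if code in ("en-US", "en"):
            break  # English lives at the unprefixed path, appended below
        if code not in seen:
            seen.add(code)
            chain.append(f"{BASE}/{code}/{path}")
    chain.append(f"{BASE}/{path}")
    return chain
```

Calling `fallback_chain("fy-NL", "fy-nl, nl, en-us, en", "firefox/releases/")` yields the fy-NL, nl, and unprefixed URLs in that order.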
That'd leave only one question open: from a build perspective, it may be
easier to make the links go to mozilla.com/en-US, but it's probably only
slightly easier, and may not be worth the server trouble.
Axel
Hey, I haven't forgotten about this. The URL scheme above would work
great but I'm a little worried about performance. I wrote some proof of
concept code and it's a bit slow (about 25% of the speed of flat files).
I sent a couple of questions to IT and I'm waiting for a response.
Wil
Are you using mod_rewrite rules or PHP? For this stuff, I think we'd
be a lot faster with mod_rewrite, though we would probably want to
generate the rules rather than maintain them by hand.
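Generating the rules could look roughly like the Python sketch below. The `FALLBACKS` table and the function name are hypothetical, and the emitted directives are an untested guess at a workable mod_rewrite configuration (one rule group per locale, rewriting only when the requested file or directory is missing), not a verified setup.

```python
from typing import Dict, List

# Hypothetical locale fallback table; the real mapping would come from the l10n team.
FALLBACKS: Dict[str, str] = {"es-AR": "es-ES", "fy-NL": "nl"}


def rewrite_rules(fallbacks: Dict[str, str]) -> List[str]:
    """Emit one rule group per locale: if the path is missing, retry in the fallback locale."""
    lines = ["RewriteEngine On"]
    for locale, target in sorted(fallbacks.items()):
        lines += [
            # $1 refers to the part of the path captured by the RewriteRule below.
            f"RewriteCond %{{DOCUMENT_ROOT}}/{locale}/$1 !-f",
            f"RewriteCond %{{DOCUMENT_ROOT}}/{locale}/$1 !-d",
            f"RewriteRule ^/?{locale}/(.*)$ /{target}/$1 [L]",
        ]
    return lines
```

Printing `"\n".join(rewrite_rules(FALLBACKS))` gives a block that could be included from the server configuration. This sketch does not handle the final step down to English when the fallback locale is missing the page too.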
Mike
Alternatively, if we're careful enough with how we put it together, we
could use 404 handlers to incur basically zero overhead for the most
common cases (for which we will have content).
That would let us optimize "downgrade" cases with symlinks as well, if
we had a strange hotspot.
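The idea is that the fallback logic only runs after a direct lookup has already failed. A minimal Python sketch of that control flow follows, assuming a hypothetical `on_missing` callback that returns candidate paths to try; it models the idea with plain files on disk rather than a real Apache 404 handler.

```python
from pathlib import Path
from typing import Callable, List, Optional


def serve(
    root: Path, url_path: str, on_missing: Callable[[str], List[str]]
) -> Optional[Path]:
    """Return the file to send, consulting the fallback logic only on a miss.

    url_path is relative to root (no leading slash).
    """
    direct = root / url_path
    if direct.is_file():
        return direct  # fast path: no fallback code runs at all
    for candidate in on_missing(url_path):  # slow path, the "404 handler"
        fallback = root / candidate
        if fallback.is_file():
            return fallback
    return None  # a genuine 404
```

Requests for content that exists never touch `on_missing`, so the common case costs no more than serving a flat file.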
Mike
That sounds attractive to me.
Axel
The numbers above are using mod_rewrite. justdave suggested looking at
Apache's "MultiViews" so I'm going to run some numbers using that next.
Wil
https://bugzilla.mozilla.org/show_bug.cgi?id=343203 is on this list, too.
Wil, got any new numbers to share with us?
Axel
Hi Axel,
I looked into Apache's MultiViews, but it doesn't sound like that is
going to do what we'd like (namely, we're not interested in picking up
on the accept-lang header so much as using a parameter in the URI).
Regardless, it looks like we're going in a different direction at the
moment. Mozilla Europe (http://mozilla-europe.org/) has already
implemented l10n on their site and I'm working with them and morgamic to
evaluate if their site layout and CMS would work for us. It sounds like
we'll be incorporating a lot of their efforts in the end product (it's
based on URL parameters and gettext).
Wil
Yeah, that's what we're doing with various AMO URLs. Where it's
static content, I suspect we'll use 404 handlers to do the rarer
fallback cases, and let the common static cases scream through the
serving-a-file-from-disk fast path.
I'm not sure what the CMS's characteristics are, but I suspect
it won't be in play on AMO, so I haven't really been that motivated to
dig deeply into it.
Mike