A proposal for how to encode locales in Ambuda's URLs. I'm sharing it for feedback from the group.
We want to make Ambuda available in a variety of languages. (More technically, we could call these locales, each of which refers to a language-region pair. But for our purposes, a language and a locale are usually synonymous.)
It's useful to encode a locale in a page's URL. By doing so, we:
- give our users stable URLs that they can share with other people in their language community.
- help search engines index multi-lingual content.
- better indicate to users that our site is multi-lingual.
- improve our site's usability by giving users an obvious way to change the site language.
What options are there?
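Broadly, there are four ways to encode this information in the URL:
(1) country-specific domains, one per region
(2) subdomains, such as en.ambuda.org
(3) subdirectories, such as ambuda.org/en/
(4) URL parameters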
(1) is expensive since we need to buy each domain separately. It is also a poor fit for India, where the single .in country code covers dozens of languages. (4) is discouraged by Google, and it is much clumsier technically than (2) and (3).
Why I think we should use subdirectories
I prefer subdomains for a variety of reasons:
- They display nicely on mobile browsers: the user knows that they're on en.ambuda.org specifically.
- They make the URL more navigable for common use cases. A user who wants to return to the main page can just replace the URL path with `/` as opposed to `/en/`.
- They don't have an SEO penalty. Google is flexible about URL structure if we provide the right sitemap, and its docs recommend subdomains along with subdirectories.
- I think they look nicer.
But here's why I think we should use subdirectories:
- They're much simpler to support in the dev environment.
- They have a 1:1 mapping to subdomains, so it's easy to convert one structure to the other.
- If we ever choose to switch, we can migrate cheaply by adding redirects to the site.
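To make the 1:1 mapping concrete, here is a minimal sketch of converting between the two URL forms. The locale set, the `BASE_HOST` constant, and both function names are assumptions for illustration, not part of Ambuda's codebase, and the sketch ignores ports and credentials in the URL.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical values for illustration; the real locale list is not decided here.
BASE_HOST = "ambuda.org"
LOCALES = {"en", "sa", "hi"}


def subdomain_to_subdirectory(url: str) -> str:
    """Rewrite https://en.ambuda.org/texts/ as https://ambuda.org/en/texts/."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    locale, _, rest = host.partition(".")
    if locale not in LOCALES or rest != BASE_HOST:
        return url  # not a locale subdomain; leave the URL unchanged
    path = f"/{locale}{parts.path or '/'}"
    return urlunsplit((parts.scheme, BASE_HOST, path, parts.query, parts.fragment))


def subdirectory_to_subdomain(url: str) -> str:
    """Rewrite https://ambuda.org/en/texts/ as https://en.ambuda.org/texts/."""
    parts = urlsplit(url)
    segments = parts.path.split("/", 2)  # "/en/texts/" -> ["", "en", "texts/"]
    if parts.hostname != BASE_HOST or len(segments) < 2 or segments[1] not in LOCALES:
        return url  # no locale prefix in the path; leave the URL unchanged
    locale = segments[1]
    rest = "/" + (segments[2] if len(segments) > 2 else "")
    return urlunsplit((parts.scheme, f"{locale}.{BASE_HOST}", rest, parts.query, parts.fragment))
```

If we did migrate, a redirect handler could call one of these functions on each incoming URL and respond with a permanent redirect to the result.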
Proposal
The splash page can prioritize Sanskrit and English. We can show the other languages in alphabetical order, as a site like Wikipedia does.
Other schemes I considered:
- We could list all languages alphabetically, but some languages are going to be much more common than others. In particular, a simple alphabetical sort will list all Western languages before all Indian ones, which is poor UX given that roughly 60% of our users are from India.
- We could split languages into two categories ("Indian" and "International"), but that raises further questions. For example, Urdu is a scheduled language in India, but it is also the national language of Pakistan. With an Indian/International scheme, it's not obvious how to list Urdu.
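The proposed ordering (Sanskrit and English first, then everything else alphabetically) can be sketched in a few lines. The `PRIORITY` list, the function name, and the sample language names are assumptions for illustration. Sorting by English display name is also an assumption, since the proposal does not say which name to sort by.

```python
# Hypothetical priority list: Sanskrit and English come first, per the proposal.
PRIORITY = ["sa", "en"]


def order_languages(names: dict[str, str]) -> list[str]:
    """Return locale codes: prioritized ones first, the rest alphabetical by name."""
    first = [code for code in PRIORITY if code in names]
    rest = sorted(
        (code for code in names if code not in PRIORITY),
        key=lambda code: names[code].casefold(),  # case-insensitive sort on display name
    )
    return first + rest


# Sample data for illustration only.
languages = {"hi": "Hindi", "en": "English", "te": "Telugu", "sa": "Sanskrit", "de": "German"}
ordered = order_languages(languages)
```

Keeping the priority list separate from the sort means we can promote another language later without touching the alphabetical logic.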