agency_lang: two-letter ISO 639-1 to BCP 47

Tom Brown

May 1, 2009, 8:01:49 PM
to gtfs-c...@googlegroups.com
Today the spec says:
The agency_lang field contains a two-letter ISO 639-1 code for the primary language used by this transit agency.

I've heard people say that this is old-fashioned and should be updated, because ISO 639-1 alone cannot represent many languages and language variants, such as:
zh-Hant (Traditional Chinese)
es-VE (Spanish, Venezuela)
es-AR (Spanish, Argentina)
fil (Filipino)
sr-Latn (Serbian written in Latin script)
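To make the difference concrete: each example above is either a language with no two-letter code (fil) or a language subtag combined with a script or region subtag, which a single two-letter field cannot carry. The Python sketch below splits such tags into their parts. It handles only a simplified language[-script][-region] subset of the RFC 5646 grammar, and the names `parse_tag` and `LanguageTag` are hypothetical, not part of GTFS or any library.

```python
import re
from typing import NamedTuple, Optional

# Simplified subset of the RFC 5646 (BCP 47) grammar:
#   language (2-3 letters) [-script (4 letters)] [-region (2 letters or 3 digits)]
# Extended language subtags, variants, extensions and private-use subtags
# are deliberately not handled in this sketch.
_SIMPLE_TAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)


class LanguageTag(NamedTuple):
    language: str
    script: Optional[str]  # e.g. "Hant", "Latn"
    region: Optional[str]  # e.g. "AR", "VE", "419"


def parse_tag(tag: str) -> LanguageTag:
    """Split a simple BCP 47 tag into its language, script and region subtags."""
    m = _SIMPLE_TAG.fullmatch(tag)
    if m is None:
        raise ValueError(f"not a language[-script][-region] tag: {tag!r}")
    # Optional groups that did not participate in the match come back as None.
    return LanguageTag(m.group("language"), m.group("script"), m.group("region"))


# The examples from the message: only the language subtag could fit into
# a two-letter ISO 639-1 field, and "fil" does not fit at all.
for example in ["zh-Hant", "es-VE", "es-AR", "fil", "sr-Latn"]:
    print(example, parse_tag(example))
```

In this sketch the script and region parts are exactly the information that a two-letter agency_lang value has to drop.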

This proposal is to change agency_lang to use BCP 47 language tags. Justifications:
- Let GTFS use the same language codes as other standards, in particular XML and HTML
- ISO 639-1 has no code for the majority of the world's languages and cannot express script or regional variants; see the examples above
- BCP 47 tags are a superset of ISO 639-1, so old GTFS files continue to be valid
- See http://www.w3.org/International/articles/bcp47/ for additional justification.

Here is the proposed text:
agency_lang, Optional - The agency_lang field contains an <a href="http://www.rfc-editor.org/rfc/bcp/bcp47.txt">IETF BCP 47 language code</a> for the primary language used by this transit agency, for example <code>en</code> for English or <code>es-AR</code> for Spanish (Argentina). BCP 47 language tags are the language identifiers used in HTML and XML documents. Please refer to http://www.w3.org/International/articles/language-tags/ for an introduction.
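A feed validator would need to accept the new values while still accepting existing feeds. The Python sketch below contrasts the current two-letter rule with a simplified BCP 47 syntax check, to illustrate the superset claim in the justifications. The function names are hypothetical, and the pattern covers only language, script, region and variant subtags; extended language subtags, extensions, private-use subtags and lookups in the IANA subtag registry are left out.

```python
import re

# Hypothetical agency_lang checks; not part of any official GTFS tool.
# Simplified subset of RFC 5646 syntax: language[-script][-region][-variant]*.
# This checks syntax only; it does not look subtags up in the IANA registry.
_SIMPLE_BCP47 = re.compile(
    r"[a-z]{2,3}"                                 # language
    r"(?:-[a-z]{4})?"                             # script, e.g. Hant
    r"(?:-(?:[a-z]{2}|[0-9]{3}))?"                # region, e.g. AR or 419
    r"(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*",  # variants
    re.IGNORECASE,
)

# Shape required by the current spec: exactly two letters.
_ISO_639_1_SHAPE = re.compile(r"[a-z]{2}", re.IGNORECASE)


def is_old_style(value: str) -> bool:
    """True if value has the two-letter shape the current spec requires."""
    return _ISO_639_1_SHAPE.fullmatch(value) is not None


def is_new_style(value: str) -> bool:
    """True if value is well-formed under the simplified BCP 47 subset."""
    return _SIMPLE_BCP47.fullmatch(value) is not None


samples = ["en", "es-AR", "zh-Hant", "fil", "sr-Latn", "es-419", "english"]
for value in samples:
    # !s turns the bool into a str first, so the width spec pads "True"/"False".
    print(f"{value:8} old={is_old_style(value)!s:5} new={is_new_style(value)}")

# For these samples, every value accepted by the old rule is also accepted by the new one.
assert all(is_new_style(v) for v in samples if is_old_style(v))
```

Because a two-letter code is itself a valid language subtag, an existing value such as `en` passes both checks, while `es-AR` passes only the new one.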

Are there any data providers who have had difficulty picking an agency_lang value because of the limitations of ISO 639-1? In any case, I think that XML and HTML use a language tag for the same reason that GTFS has agency_lang, and it won't hurt us to take advantage of their experience.

Tom B

May 19, 2009, 9:27:09 PM
to Google Transit Feed Spec Changes
It may also be handy to link from the GTFS spec to
http://cldr.unicode.org/index/cldr-spec/picking-the-right-language-code
because feed creators probably won't be language-code experts.
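Since feed creators won't be language-code experts, a validator might also normalize how tags are written. BCP 47 tags are case-insensitive, but RFC 5646 (section 2.1.1) recommends lowercase language subtags, titlecase four-letter (script) subtags and uppercase two-letter (region) subtags, with everything after a singleton such as `x` in lowercase. The Python sketch below applies that convention. `canonical_case` is a hypothetical helper, and treating `_` like `-` (to catch POSIX-style values such as `es_AR`) is an extra assumption, not something the proposal or the RFC requires. It does not check that the tag is well-formed or registered.

```python
def canonical_case(tag: str) -> str:
    """Rewrite a language tag using the RFC 5646 section 2.1.1 case conventions.

    Tags are case-insensitive, so this changes presentation only: the first
    subtag lowercase, later 2-letter subtags uppercase, later 4-letter subtags
    titlecase, and everything from a singleton (such as "x") onward lowercase.
    """
    # Assumption for this sketch: treat "_" (as in POSIX-style "es_AR") like "-".
    subtags = tag.replace("_", "-").split("-")
    out = [subtags[0].lower()]
    after_singleton = len(subtags[0]) == 1  # e.g. a private-use tag "x-..."
    for sub in subtags[1:]:
        if after_singleton or len(sub) == 1:
            # A singleton starts an extension or private-use sequence.
            after_singleton = True
            out.append(sub.lower())
        elif len(sub) == 2:
            out.append(sub.upper())  # region, e.g. "AR"
        elif len(sub) == 4 and sub.isalpha():
            out.append(sub[0].upper() + sub[1:].lower())  # script, e.g. "Hant"
        else:
            out.append(sub.lower())  # numeric regions and variants
    return "-".join(out)


for raw in ["EN", "ZH-hant", "es_ar", "SR-LATN-rs"]:
    print(raw, "->", canonical_case(raw))
```

The sketch builds the titlecase form by hand rather than with `str.title()`, because `str.title()` also uppercases letters that follow digits.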
