names of locales

Helmut Richter

Aug 7, 1996, 3:00:00 AM

POSIX 1003.1 recommends (but does not prescribe) to use the following
syntax of locale names: language_TERRITORY.Code, e.g.:

de_AT.ISO8859-1
hu_HU.ISO8859-2
ja_JP.AJEC

It looks as if it were meant that abbreviations (e.g. just "de") are
allowed. On the system where I write this note (Sun Solaris 2), not
one locale name is according to this syntax. They have:

C de es iso_8859_1 sv
POSIX en_US fr it

A portable program should therefore consider variants in the names of
locales. This ranges from simple spelling problems (e.g. upper/lower
case) to complicated decisions (somebody requesting the non-existent
de_AT.ISO8859-1 is probably better off with de_DE.ISO8859-1 than with
ja_JP.AJEC). Here are my questions:

1. Is an implementation of the "setlocale" function POSIX conformant
if it makes such decisions? The answer depends on whether a
"supported" locale is one that has a definition file or is one
that happens to be accepted by the setlocale function.

2. (Not a standards question!) What is a reasonable solution for the
poor programmer who wants to write a program that works not only
in places where different locales are valid but also on systems
where different locale naming conventions are used?

Looking forward to any hint.

Helmut Richter

Antoine Leca

Aug 7, 1996, 3:00:00 AM

to Helmut....@lrz-muenchen.de

Helmut Richter wrote:
>
> POSIX 1003.1 recommends (but does not prescribe) to use the following
> syntax of locale names: language_TERRITORY.Code, e.g.:
>

I thought the _TERRITORY and .codeset parts were optional, but I have
not actually read the original POSIX.1 text.

> de_AT.ISO8859-1
> hu_HU.ISO8859-2
> ja_JP.AJEC
>
> It looks as if it were meant that abbreviations (e.g. just "de") are
> allowed. On the system where I write this note (Sun Solaris 2), not
> one locale name is according to this syntax. They have:
>
> C de es iso_8859_1 sv
> POSIX en_US fr it
>

C and POSIX are required independently of the rule you cite: "C" is the
default locale of the ISO/IEC 9899 standard (ANSI C, if you prefer), and
"POSIX" is the default locale of a conforming POSIX system.

If my reasoning above is correct, all names but iso_8859_1 are valid.

> A portable program should therefore consider variants in the names of
> locales. This ranges from simple spelling problems (e.g. upper/lower
> case) to complicated decisions (somebody requesting the non-existent
> de_AT.ISO8859-1 is probably better off with de_DE.ISO8859-1 than with
> ja_JP.AJEC). Here are my questions:
>
> 1. Is an implementation of the "setlocale" function POSIX conformant
> if it makes such decisions? The answer depends on whether a
> "supported" locale is one that has a definition file or is one
> that happens to be accepted by the setlocale function.

I don't know exactly about POSIX, but with respect to standard C, if you
specify a string other than the implementation-dependent set of allowed
ones, no change of locale occurs and NULL is returned.

You correctly point out that this set is not always the set of the
available definition files (or the "installed locales") (except if the
POSIX standard states this).
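
The standard C behaviour described above (an unsupported name leaves the locale unchanged and yields a null pointer) can be used as a probe. The following is a minimal sketch; the helper name locale_is_accepted is made up for illustration, and it only reports what setlocale() accepts on the running system, not which definition files are installed.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if setlocale() accepts `name` for `category`, 0 otherwise.
   The locale in effect before the call is restored afterwards. */
int locale_is_accepted(int category, const char *name)
{
    const char *current = setlocale(category, NULL); /* query only */
    char *saved;
    int accepted;

    if (current == NULL)
        return 0;

    /* The returned string may be overwritten by the next setlocale()
       call, so keep a private copy before probing. */
    saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return 0;
    strcpy(saved, current);

    accepted = (setlocale(category, name) != NULL);

    setlocale(category, saved); /* put things back as they were */
    free(saved);
    return accepted;
}
```

A call such as locale_is_accepted(LC_ALL, "de_AT.ISO8859-1") then answers the question for one spelling on one system.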

> 2. (Not a standards question!) What is a reasonable solution for the
> poor programmer who wants to write a program that works not only
> in places where different locales are valid but also on systems
> where different locale naming conventions are used?
>

Given the existing installed base, I would try the following names in
this order (for German in Belgium):
de_BE.ISO8859-1
de_BE.iso8859-1
de_BE.iso88591
de_BE
De_BE             AIX like, IBM 850 codepage
DEB               Microsoft Win32-like, CP 1252
belgian-german    HP-UX like, HP Roman8 set
nl_BE
Nl_BE             AIX like, IBM 850 codepage
NLB               Microsoft Win32, CP 1252
flemish           HP-UX like, HP Roman8 set
fr_BE
Fr_BE             AIX like, IBM 850 codepage
FRB               Microsoft Win32, CP 1252
wallon            HP-UX like, HP Roman8 set
de
de_DE
De_DE             AIX, IBM 850 codepage
DEU               Microsoft Win32, CP 1252
german            HP-UX, HP Roman8 set

The relative order of de/german/... versus xx_BE/... depends on your
application: if you are interested in LC_MONETARY or LC_TIME, you will
prefer xx_BE before de. Conversely, if your goal is LC_COLLATE, you
will prefer de before xx_BE.

Of course, you have to put all this data in tables and do lookups.

Note that with any algorithm of this sort (except for the first 3 cases),
you are betting on the character set. A reasonable guess in Western
Europe is ISO 8859-1, but this does not apply worldwide.
See my comments on the right.
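
The table-and-lookup idea can be sketched as follows. This is only an illustration: set_first_locale and de_BE_candidates are hypothetical names, and the table is an abridged copy of the list above, so the character-set caveat still applies to every entry without an explicit codeset.

```c
#include <locale.h>
#include <stddef.h>

/* Candidate names for German in Belgium, most specific first.
   Abridged from the list above; the array ends with NULL. */
const char *const de_BE_candidates[] = {
    "de_BE.ISO8859-1", "de_BE.iso8859-1", "de_BE.iso88591",
    "de_BE", "De_BE", "DEB", "belgian-german",
    "de", "de_DE", "De_DE", "DEU", "german",
    NULL
};

/* Tries each name in the NULL-terminated table in order and returns
   the first one setlocale() accepts, or NULL if none is accepted
   (in which case the locale is left unchanged). */
const char *set_first_locale(int category, const char *const *candidates)
{
    for (; *candidates != NULL; candidates++) {
        if (setlocale(category, *candidates) != NULL)
            return *candidates;
    }
    return NULL;
}
```

A caller interested in LC_COLLATE would reorder the table so that the de/german entries come before the xx_BE ones, as discussed above.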

Hope it helps.

NUMATA Toshinori

Aug 8, 1996, 3:00:00 AM

In article <4ua48b$k...@sparcserver.lrz-muenchen.de>

Helmut....@lrz-muenchen.de (Helmut Richter) writes:
> POSIX 1003.1 recommends (but does not prescribe) to use the following
> syntax of locale names: language_TERRITORY.Code, e.g.:
>

> de_AT.ISO8859-1
> hu_HU.ISO8859-2
> ja_JP.AJEC

I never saw such recommendation in the POSIX.1 standard. Could you
tell me in what section such recommendation is made?

As far as I know, the format is defined in XPG4. The XPG4
specification allows omission of the "territory" and "codeset" parts,
and XPG4 does not specify the contents of each field, so there can be
lots of variations. For example, the locale name for Japanese can be
"ja_JP.AJEC", "ja_JP", "ja", "Japanese_Japan", or "Japanese".

> A portable program should therefore consider variants in the names of
> locales. This ranges from simple spelling problems (e.g. upper/lower
> case) to complicated decisions (somebody requesting the non-existent
> de_AT.ISO8859-1 is probably better off with de_DE.ISO8859-1 than with
> ja_JP.AJEC). Here are my questions:

> 1. Is an implementation of the "setlocale" function POSIX conformant
> if it makes such decisions? The answer depends on whether a
> "supported" locale is one that has a definition file or is one
> that happens to be accepted by the setlocale function.

The POSIX standard only mandates support of the "C" and "POSIX"
locales. Support of other locales is implementation-defined.

> 2. (Not a standards question!) What is a reasonable solution for the
> poor programmer who wants to write a program that works not only
> in places where different locales are valid but also on systems
> where different locale naming conventions are used?

Call the setlocale() function with an empty string as the second argument:

setlocale(LC_ALL, "");

and let the users select their favorite locale. On POSIX conforming
systems, a user can select locale by setting the environment variables
LANG, LC_ALL, and so on, if you called setlocale() in the above
mentioned way.
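
A slightly fuller sketch of that call, with the return value checked. The function name init_locale_from_environment and the wording of the warning are illustrative only.

```c
#include <locale.h>
#include <stdio.h>

/* Selects the locale named by the user's environment (LANG, LC_ALL, ...).
   Returns the name of the locale in effect afterwards. If the environment
   names a locale the system does not support, setlocale() returns NULL and
   changes nothing, so the program keeps running in its current locale
   ("C" at program startup). */
const char *init_locale_from_environment(void)
{
    const char *name = setlocale(LC_ALL, "");

    if (name == NULL) {
        fputs("warning: locale requested in the environment is not "
              "supported; keeping the current locale\n", stderr);
        name = setlocale(LC_ALL, NULL); /* query the current locale */
    }
    return name;
}
```

The program itself never spells out a locale name here; the choice of name stays with the user and the system.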
--
NUMATA, Toshinori Fujitsu Limited
Planning Dept. 1, CSS Strategy and Alliance, 1-1, Kamikodanaka 4-Chome,
Open Systems Group Nakahara-ku, Kawasaki 211 JAPAN
Phone: +81-44-754-3474 Fax: +81-44-754-3585

Ulrich Drepper

Aug 8, 1996, 3:00:00 AM

In article <4ua48b$k...@sparcserver.lrz-muenchen.de> Helmut....@lrz-muenchen.de (Helmut Richter) writes:

> POSIX 1003.1 recommends (but does not prescribe) to use the following
> syntax of locale names: language_TERRITORY.Code, e.g.:
>
> de_AT.ISO8859-1
> hu_HU.ISO8859-2
> ja_JP.AJEC

POSIX does not standardize this. It only mentions what XPG3 describes:

language[_TERRITORY[.codeset]]

What your system accepts only depends on the names of the files you
indirectly describe by the locale name.
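
For illustration, a name following the language[_TERRITORY[.codeset]] pattern can be split into its parts as sketched below. The struct and function names are hypothetical, the split is purely syntactic (it says nothing about whether the system accepts the name), and any other suffix a system may append is not handled.

```c
#include <string.h>

struct locale_parts {
    char language[32];
    char territory[32];  /* empty string if absent */
    char codeset[32];    /* empty string if absent */
};

/* Copies `len` bytes of `src` into `dst` and terminates it.
   Returns -1 if the part does not fit. */
static int copy_part(char *dst, size_t dst_size, const char *src, size_t len)
{
    if (len >= dst_size)
        return -1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}

/* Splits language[_TERRITORY[.codeset]]. Returns 0 on success, -1 if the
   language part is empty or a part is too long for its buffer. */
int split_locale_name(const char *name, struct locale_parts *out)
{
    const char *dot = strchr(name, '.');
    size_t base_len = dot ? (size_t)(dot - name) : strlen(name);
    /* Look for '_' only in the part before the codeset. */
    const char *us = memchr(name, '_', base_len);
    size_t lang_len = us ? (size_t)(us - name) : base_len;

    if (lang_len == 0)
        return -1;
    if (copy_part(out->language, sizeof out->language, name, lang_len) != 0)
        return -1;

    out->territory[0] = '\0';
    if (us != NULL &&
        copy_part(out->territory, sizeof out->territory,
                  us + 1, base_len - lang_len - 1) != 0)
        return -1;

    out->codeset[0] = '\0';
    if (dot != NULL &&
        copy_part(out->codeset, sizeof out->codeset,
                  dot + 1, strlen(dot + 1)) != 0)
        return -1;

    return 0;
}
```

Note that a name like iso_8859_1 also "parses" (language iso, territory 8859_1), which shows why syntax alone cannot tell a conventional name from an unconventional one.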

The setlocale() function has some methods to map the name for the
specified category to a file name. In case one name is not supported
you can add some symlinks at the right place or you can run localedef
using the correct locale name (of course this requires root access or
your implementation must know about LOCPATH).

Of course this is a poor solution. In the locale implementation for
GNU libc I added something which is also available in the X Window System:
locale aliases. E.g., my alias file contains

German de_DE.ISO_8859-1

The implementation also knows about the different spellings of the
codeset name (say, iso-8859-1, 88591, etc).
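
One way to make such spellings compare equal is to reduce them to a canonical form. The sketch below is not the GNU libc code; normalize_codeset is a hypothetical helper, and its rules (drop punctuation, fold to lower case, prefix digit-only names with "iso") are assumptions chosen so that the spellings mentioned above come out the same.

```c
#include <ctype.h>
#include <string.h>

/* Writes a canonical form of `codeset` into `out`:
   - characters that are not letters or digits are dropped,
   - letters are folded to lower case,
   - a name consisting only of digits gets the prefix "iso".
   Returns 0 on success, -1 if nothing is left or `out` is too small. */
int normalize_codeset(const char *codeset, char *out, size_t out_size)
{
    const unsigned char *p;
    size_t count = 0, pos = 0;
    int only_digits = 1;

    /* First pass: measure the result and detect digit-only names. */
    for (p = (const unsigned char *)codeset; *p != '\0'; p++) {
        if (isalnum(*p)) {
            count++;
            if (!isdigit(*p))
                only_digits = 0;
        }
    }
    if (count == 0)
        return -1;
    if (only_digits)
        count += 3;              /* room for the "iso" prefix */
    if (count >= out_size)
        return -1;

    /* Second pass: build the canonical name. */
    if (only_digits) {
        memcpy(out, "iso", 3);
        pos = 3;
    }
    for (p = (const unsigned char *)codeset; *p != '\0'; p++) {
        if (isalnum(*p))
            out[pos++] = (char)tolower(*p);
    }
    out[pos] = '\0';
    return 0;
}
```

With these rules, ISO8859-1, iso-8859-1, ISO_8859-1 and 88591 all map to iso88591.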

Besides the X/Open definition above there is also a standard by some
committees of the European Community. This paper defines a much better
format. You can express things which do not fit in the limited
X/Open scheme.

And one final word:

> 2. (Not a standards question!) What is a reasonable solution for the
> poor programmer who wants to write a program that works not only
> in places where different locales are valid but also on systems
> where different locale naming conventions are used?

Why should the programmer care about the name? Normally you should
simply have

setlocale (LC_ALL, "");

in your programs. The user has to specify the value. If you want to
have some "user friendly" menus to select the name consider using
a configuration file.
--
-- Uli
--------------. dre...@cygnus.com ,-. Rubensstrasse 5
Ulrich Drepper \ ,--------------------' \ 76149 Karlsruhe/Germany
Cygnus Support `--' dre...@gnu.ai.mit.edu `------------------------

Helmut Richter

Aug 8, 1996, 3:00:00 AM

nu...@rp.open.cs.fujitsu.co.jp (NUMATA Toshinori) writes:

>In article <4ua48b$k...@sparcserver.lrz-muenchen.de>
> Helmut....@lrz-muenchen.de (Helmut Richter) writes:
>> POSIX 1003.1 recommends (but does not prescribe) to use the following
>> syntax of locale names: language_TERRITORY.Code, e.g.:
>>
>> de_AT.ISO8859-1
>> hu_HU.ISO8859-2
>> ja_JP.AJEC

>I never saw such recommendation in the POSIX.1 standard. Could you

>tell me in what section such recommendation is made?

E.1.3 (an informative annex)

The funny thing is that they use a different syntax in the example in
section B.8.1.2 (also an informative annex).

>> 2. (Not a standards question!) What is a reasonable solution for the
>> poor programmer who wants to write a program that works not only
>> in places where different locales are valid but also on systems
>> where different locale naming conventions are used?

>Call the setlocale() function with an empty string as the second argument:

> setlocale(LC_ALL, "");

>and let the users select their favorite locale. On POSIX conforming
>systems, a user can select locale by setting the environment variables
>LANG, LC_ALL, and so on, if you called setlocale() in the above
>mentioned way.

This requires that users are aware of the supported locales (which is,
of course, not the responsibility of the application programmer).
Since the actual names of the supported locales vary from one system
to another, it cannot be reasonably expected that users know which
spelling of locale names is to be used on which systems. The two
questions I asked are two different possible solutions:

1. to shift it into the setlocale function (on which the application
programmer has no influence, however)

2. to write locale-conscious code in the application: if the locale
specified by the user does not exist, select a "similar" one instead.
This is dubious from a standards standpoint, but possibly useful.
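
Option 2 could look roughly like the sketch below, which takes "similar" to mean "the same name with the codeset, and then the territory, stripped off". That definition of similarity, and the name set_locale_with_fallback, are assumptions made for the example; picking de_DE for a missing de_AT would need a table in addition.

```c
#include <locale.h>
#include <string.h>

/* Tries `requested`, then the same name without ".codeset", then without
   "_TERRITORY". The name that was finally accepted is left in `buf` and
   returned; NULL means nothing was accepted (the locale is unchanged) or
   `buf` was too small. An empty `requested` is rejected so that the
   environment is never consulted by accident. */
const char *set_locale_with_fallback(int category, const char *requested,
                                     char *buf, size_t buf_size)
{
    size_t len = strlen(requested);
    char *cut;

    if (len == 0 || len >= buf_size)
        return NULL;
    memcpy(buf, requested, len + 1);

    for (;;) {
        if (setlocale(category, buf) != NULL)
            return buf;

        /* Drop the last component: codeset first, then territory. */
        cut = strrchr(buf, '.');
        if (cut == NULL)
            cut = strrchr(buf, '_');
        if (cut == NULL || cut == buf)
            return NULL;         /* nothing left to strip */
        *cut = '\0';
    }
}
```

For de_AT.ISO8859-1 this tries de_AT.ISO8859-1, de_AT and de in turn. Once the codeset is dropped, the program is guessing at the character set.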

Helmut Richter

Gianni Mariani

Aug 8, 1996, 3:00:00 AM

The other added feature is the ability to select parts of the
locale.

Category        1st Env. Var.   2nd Env. Var.
_____________________________________________
LC_CTYPE:       LC_CTYPE        LANG
LC_COLLATE:     LC_COLLATE      LANG
LC_TIME:        LC_TIME         LANG
LC_NUMERIC:     LC_NUMERIC      LANG
LC_MONETARY:    LC_MONETARY     LANG
LC_MESSAGES:    LC_MESSAGES     LANG
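
The lookup order in the table can be written down as a small function. This is a sketch for illustration: on POSIX systems LC_ALL, when set, takes precedence over both columns, so it is checked first here; the function names are made up, and real programs get this behaviour for free from setlocale(LC_ALL, "").

```c
#include <stdlib.h>

/* Picks the locale name for one category from three candidate values,
   in the order LC_ALL, the category's own variable, LANG. Unset (NULL)
   and empty values are skipped. Returns NULL if none is usable, in which
   case the system's default applies. */
const char *choose_locale_name(const char *lc_all,
                               const char *lc_category,
                               const char *lang)
{
    if (lc_all != NULL && *lc_all != '\0')
        return lc_all;
    if (lc_category != NULL && *lc_category != '\0')
        return lc_category;
    if (lang != NULL && *lang != '\0')
        return lang;
    return NULL;
}

/* Same decision, reading the values from the environment.
   `category_var` is a variable name such as "LC_TIME". */
const char *locale_name_from_env(const char *category_var)
{
    return choose_locale_name(getenv("LC_ALL"),
                              getenv(category_var),
                              getenv("LANG"));
}
```

For example, with LANG=fr and LC_TIME=de in the environment and LC_ALL unset, locale_name_from_env("LC_TIME") yields de while locale_name_from_env("LC_NUMERIC") yields fr.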

Systems based on AT&T MNLS also use a special LANG syntax,

LANG="/fr/fr/de/en/en_US/es"

The above says: I want my data treated as French, my time
in German format, numbers in English format, money in
US dollars and messages in Spanish.

This allows you to select different bits of individual locales.
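
As an illustration only, the composite value shown above can be split into its six fields as follows. The field order (LC_CTYPE, LC_COLLATE, LC_TIME, LC_NUMERIC, LC_MONETARY, LC_MESSAGES) is inferred from the example and the table, not taken from MNLS documentation, and the names in the code are hypothetical.

```c
#include <string.h>

#define COMPOSITE_FIELDS 6
#define COMPOSITE_FIELD_SIZE 32

/* Assumed order of the fields in a composite LANG value. */
const char *const composite_category[COMPOSITE_FIELDS] = {
    "LC_CTYPE", "LC_COLLATE", "LC_TIME",
    "LC_NUMERIC", "LC_MONETARY", "LC_MESSAGES"
};

/* Splits a value of the form "/a/b/c/d/e/f" into exactly six non-empty
   fields. Returns 0 on success, -1 if the value does not have that shape
   or a field is too long. */
int split_composite_lang(const char *lang,
                         char fields[COMPOSITE_FIELDS][COMPOSITE_FIELD_SIZE])
{
    int i;

    if (lang[0] != '/')
        return -1;

    for (i = 0; i < COMPOSITE_FIELDS; i++) {
        const char *start = lang + 1;           /* skip the '/' */
        const char *end = strchr(start, '/');
        size_t len = end ? (size_t)(end - start) : strlen(start);

        if (len == 0 || len >= COMPOSITE_FIELD_SIZE)
            return -1;
        memcpy(fields[i], start, len);
        fields[i][len] = '\0';

        lang = start + len;                     /* at '/' or at the end */
        if (i < COMPOSITE_FIELDS - 1 && *lang != '/')
            return -1;                          /* too few fields */
    }
    return *lang == '\0' ? 0 : -1;              /* reject extra fields */
}
```

An application should not need this: the point above is that it should leave such values to setlocale() and make no assumptions about locale names.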

In particular, with the up-coming IRIX releases we will be making
it very easy for users to do this. Hence it's important that
your application makes no assumption about the locale and locale
names.

--

_ ` _ ` Globalization R&D
/ \ / / \ /-- /-- /
/ // / / / / / / / Graphics is cool
\_/ \ \_ \/ /_/ /_/ o Internationalization c'est magnifique
/ /
\_/ (415) 933 4387 Opinions mine etc ...

Bob Goudreau

Aug 8, 1996, 3:00:00 AM

NUMATA Toshinori (nu...@rp.open.cs.fujitsu.co.jp) wrote:
: In article <4ua48b$k...@sparcserver.lrz-muenchen.de>

: Helmut....@lrz-muenchen.de (Helmut Richter) writes:
: > POSIX 1003.1 recommends (but does not prescribe) to use the following
: > syntax of locale names: language_TERRITORY.Code, e.g.:
: >
: > de_AT.ISO8859-1
: > hu_HU.ISO8859-2
: > ja_JP.AJEC

: I never saw such recommendation in the POSIX.1 standard. Could you
: tell me in what section such recommendation is made?

See Annex E to ISO 9945-1 (POSIX.1-1990), the "Sample National Profile"
for Denmark. Lines 76-78 say:

The following guideline is used for specifying the locale
identification string:

"%2.2s_%2.2s.%s,%s",<language>,<territory>,<character_set>,
<version>

A footnote on the guideline notes that it "was inspired by the X/Open
Portability Guide", as you noted.
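
Reading the guideline as a printf-style format, an identification string can be built as sketched below. The function name format_locale_id is hypothetical, and the character set and version values in the usage note are made-up examples, not taken from the Danish profile.

```c
#include <stdio.h>

/* Builds "<language>_<territory>.<character_set>,<version>" using the
   format from the guideline. "%2.2s" keeps at most the first two
   characters of its argument (and pads shorter ones to width two).
   Returns 0 on success, -1 if the result does not fit in `buf`. */
int format_locale_id(char *buf, size_t size,
                     const char *language, const char *territory,
                     const char *character_set, const char *version)
{
    int n = snprintf(buf, size, "%2.2s_%2.2s.%s,%s",
                     language, territory, character_set, version);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For instance, format_locale_id(buf, sizeof buf, "da", "DK", "ISO_8859-1", "1.0") produces da_DK.ISO_8859-1,1.0.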

----------------------------------------------------------------------
Bob Goudreau Data General Corporation
goud...@dg-rtp.dg.com 62 Alexander Drive
+1 919 248 6231 Research Triangle Park, NC 27709, USA

Ulrich Drepper

Aug 8, 1996, 3:00:00 AM

In article <320A09...@engr.sgi.com> Gianni Mariani <gia...@engr.sgi.com> writes:

> Systems based on AT&T MNLS also use a special LANG syntax,
>
> LANG="/fr/fr/de/en/en_US/es"

This is not unusual. Any system must be able to return the locale names
in effect when setlocale() is called. When some categories are set to
different locale names, the returned string must indicate this.
In GNU libc the syntax would be
"LC_TIME=fr:LC_NUMERIC=de:LC_MONETARY=en:LC_MESSAGES=es:LC_CTYPE=C:LC_COLLATE=C".