
Portable locale usage


ssegvic

Sep 6, 2011, 5:59:07 AM
Hi,

I am musing on how to write portable Python3 code which would
take advantage of the standard locale module.

For instance, it would be very nice if we could say something like:

# does not work!
myISOCountryCode='hr'
locale.setlocale(locale.LC_ALL, (myISOCountryCode,
                                 locale.getpreferredencoding()))

Up to now, I have found ways to set locale on Linux and Windows:

import locale
locale.setlocale(locale.LC_ALL, 'hr_HR.utf8') # works on linux
locale.setlocale(locale.LC_ALL, 'hrv_HRV.1250') # works on windows

I have noticed that locale defines a dictionary locale.locale_alias,
and that it contains the following promising keys: 'hr_hr',
'hrvatski', 'hr'.
Unfortunately, both on Windows and Linux all these keys
are bound to the same outdated string 'hr_HR.ISO8859-2'.

My questions are the following:

1. Is there a way to write portable Python code dealing with locales
(as sketched at the beginning)?

2. If not, is there anything wrong with that idea?

3. What is the status of locale.locale_alias (official documentation
does not mention it)?


Cheers,

Sinisa

http://www.zemris.fer.hr/~ssegvic/index_en.html

Thomas Jollans

Sep 6, 2011, 7:16:50 AM
to pytho...@python.org
On 06/09/11 11:59, ssegvic wrote:
> Hi,
>
> I am musing on how to write portable Python3 code which would
> take advantage of the standard locale module.
>
> For instance, it would be very nice if we could say something like:
>
> # does not work!
Doesn't it?

> myISOCountryCode='hr'

This is a language code. (there also happens to be a country code 'hr',
but you're referring to the Croatian language, 'hr')

> locale.setlocale(locale.LC_ALL, (myISOCountryCode,
> locale.getpreferredencoding()))

As far as I can tell, this does work. Can you show us a traceback?

> Up to now, I have found ways to set locale on Linux and Windows:
>
> import locale
> locale.setlocale(locale.LC_ALL, 'hr_HR.utf8') # works on linux
> locale.setlocale(locale.LC_ALL, 'hrv_HRV.1250') # works on windows
>
> I have noticed that locale defines a dictionary locale.locale_alias,
> and that it contains the following promising keys: 'hr_hr',
> 'hrvatski', 'hr'.
> Unfortunately, both on Windows and Linux all these keys
> are bound to the same outdated string 'hr_HR.ISO8859-2'.

It looks like you don't actually care about the encoding: in your first
example, you use the default system encoding, which you do not control,
and in your second example, you're using two different encodings on the
two platforms. So why do you care whether or not the default uses ISO
8859-2 ?

> My questions are the following:
>
> 1. Is there a way for writing portable Python code dealing with
> locales
> (as sketched in the beginning)?
>
> 2. If not, is there anything wrong with that idea?

As I said, I believe the above code should work. It works on my Linux
system.

What are you attempting to achieve with this setting of the locale,
without even setting the encoding? Doesn't it make more sense to simply
use the user's usual locale, and interact with them on their own terms?

> 3. What is the status of locale.locale_alias (official documentation
> does not mention it)?

I don't know, but I'd assume it's not considered part of the public API,
and that you shouldn't assume that it'll exist in future versions of Python.

Thomas

Vlastimil Brom

Sep 6, 2011, 9:13:17 AM
to pytho...@python.org
2011/9/6 ssegvic <sinisa...@fer.hr>:
There may be some differences between OSes and versions, but using
Python 2.7 and 3.2 on Win XP and Win7 (Czech)
I get the following results for setlocale:

>>> locale.setlocale(locale.LC_ALL,'Croatian')
'Croatian_Croatia.1250'
>>> locale.getlocale()
('Croatian_Croatia', '1250')
>>> locale.getpreferredencoding(do_setlocale=False)
'cp1250'
>>>

However, "hr" is not recognised on these systems:

>>> locale.setlocale(locale.LC_ALL, "hr")
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "locale.pyc", line 531, in setlocale
Error: unsupported locale setting
>>>

regards,
vbr

ssegvic

Sep 6, 2011, 10:46:46 AM
On 6 Sep, 13:16, Thomas Jollans <t...@jollybox.de> wrote:
> > locale.setlocale(locale.LC_ALL, (myISOCountryCode,
> > locale.getpreferredencoding()))
>
> As far as I can tell, this does work. Can you show us a traceback?

Sorry, I was imprecise.

I wanted to say that the above snippet
does not work on both Windows and Linux.

This is what I get on Windows:

>>> import sys
>>> sys.version
'3.2 (r32:88445, Feb 20 2011, 21:29:02) [MSC v.1500 32 bit (Intel)]'


>>> myISOCountryCode='hr'
>>> locale.setlocale(locale.LC_ALL, (myISOCountryCode, locale.getpreferredencoding()))

Traceback (most recent call last):
  File "<pyshell#113>", line 1, in <module>
    locale.setlocale(locale.LC_ALL, (myISOCountryCode, locale.getpreferredencoding()))
  File "C:\apps\Python32\lib\locale.py", line 538, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting

The snippet actually works on Linux, as you note.

> It looks like you don't actually care about the encoding: in your first
> example, you use the default system encoding, which you do not control,
> and in your second example, you're using two different encodings on the
> two platforms.

That's true.

That's because currently I care most about
lists of strings being sorted properly (see below).

Nevertheless, it *appears* to me that, in the Unicode era,
the locales could well be decoupled from particular encodings.
But this is another topic.

> So why do you care whether or not the default uses ISO 8859-2 ?

It's not that I care about encoding,
it's that Windows throws locale.Error at me :-)

> > My questions are the following:
>
> > 1. Is there a way for writing portable Python code dealing with
> > locales
> >     (as sketched in the beginning)?
>
> > 2. If not, is there anything wrong with that idea?
>
> As I said, I believe the above code should work. It works on my Linux
> system.
>
> What are you attempting to achieve with this setting of the locale,
> without even setting the encoding? Doesn't it make more sense to simply
> use the user's usual locale, and interact with them on their own terms?

For the moment, I only wish to properly sort a Croatian text file
both on Windows and Linux (I am a cautious guy, I like reachable
goals).
When the locale is properly set, sorting works like a charm
with mylist.sort(key=locale.strxfrm).

My current solution to the portability problem is:

import locale
try:
    locale.setlocale(locale.LC_ALL, 'hr_HR.utf8')  # linux
except locale.Error:
    locale.setlocale(locale.LC_ALL, 'Croatian_Croatia.1250')  # windows
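
The same try/except idea extends to any number of candidate names. A minimal sketch, assuming a helper called set_first_available (a hypothetical name, not part of the locale module) and using only the two locale names mentioned in this thread:

```python
import locale

def set_first_available(category, candidates):
    """Try each locale name in order and return the one setlocale accepted.

    Returns None if every candidate raised locale.Error.
    set_first_available is a hypothetical helper, not a stdlib function.
    """
    for name in candidates:
        try:
            return locale.setlocale(category, name)
        except locale.Error:
            continue
    return None

# Names taken from this thread; which one works depends on the OS
# and on which locales are installed there.
chosen = set_first_available(
    locale.LC_ALL, ['hr_HR.utf8', 'Croatian_Croatia.1250'])
print('locale in effect:', chosen)
```

If chosen is None, no Croatian locale was accepted and the program has to decide how to degrade, for example by sorting in plain code-point order.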

Thanks for your feedback!

Sinisa

ssegvic

Sep 6, 2011, 11:31:59 AM
On 6 Sep, 15:13, Vlastimil Brom <vlastimil.b...@gmail.com> wrote:

> There may be some differences between OSes and versions, but using
> python 2.7 and 3.2 on Win XP and Win7 (Czech)
> I get the following results for setlocale:
>
> >>> locale.setlocale(locale.LC_ALL,'Croatian')
> 'Croatian_Croatia.1250'
> >>> locale.getlocale()
> ('Croatian_Croatia', '1250')
>
> >>> locale.getpreferredencoding(do_setlocale=False)
> 'cp1250'
>
> However, "hr" is not recognised on these systems:
>
> >>> locale.setlocale(locale.LC_ALL, "hr")
>
> Traceback (most recent call last):
>   File "<input>", line 1, in <module>
>   File "locale.pyc", line 531, in setlocale
> Error: unsupported locale setting

Thanks for your feedback!


So this works only on Linux (in concordance with the documentation):

locale.setlocale(locale.LC_ALL, ('croatian', locale.getpreferredencoding()))

And this works only on Windows (incomplete locale spec probably filled
in by Windows API):

locale.setlocale(locale.LC_ALL, 'croatian')


Obviously, there is a misunderstanding between Python,
which uses standard (IANA) language codes,
and Windows, which, as usual, has its own ways :-(


One possible solution would be to change
locale.locale_alias on Windows so that
it honors the custom Windows conventions:
'hr' -> 'Croatian_Croatia.1250'
instead of
'hr' -> 'hr_HR.ISO8859-2'

In addition, locale.getpreferredencoding()
should probably be changed in order to return
valid Windows encodings ('1250' instead of 'cp1250').
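
The alias lookup discussed here can be inspected without touching the process locale through locale.normalize(), which is documented (locale_alias itself is not). A short sketch; the exact strings printed depend on the Python version, and a name missing from the table comes back unchanged:

```python
import locale

# normalize() only does a table lookup, so it never raises locale.Error
# and says nothing about whether the OS actually supports the result.
for name in ('hr', 'hr_hr', 'hrvatski'):
    print(name, '->', locale.normalize(name))
```

A result that setlocale() then rejects on a given platform is exactly the mismatch described above.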

Cheers,

Sinisa

Thomas Jollans

Sep 6, 2011, 11:53:37 AM
to pytho...@python.org
On 06/09/11 16:46, ssegvic wrote:
> For the moment, I only wish to properly sort a Croatian text file
> both on Windows and Linux (I am a cautious guy, I like reachable
> goals).
> When the locale is properly set, sorting works like a charm
> with mylist.sort(key=locale.strxfrm).

The problem with that is of course that a Croatian locale has to be
installed. Many Linux systems don't have locales that aren't used.
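
Whether a given locale is installed can be probed at run time. A minimal sketch, assuming a helper named locale_is_available (hypothetical, not part of the standard library) that restores the previous setting after the probe:

```python
import locale

def locale_is_available(name, category=locale.LC_COLLATE):
    """Return True if setlocale accepts `name`; the current locale is restored.

    locale_is_available is a hypothetical helper, not a stdlib function.
    """
    saved = locale.setlocale(category)      # query only, changes nothing
    try:
        locale.setlocale(category, name)
    except locale.Error:
        return False
    else:
        return True
    finally:
        locale.setlocale(category, saved)   # undo the probe either way

print(locale_is_available('hr_HR.utf8'))
```

Note that setlocale() is process-wide, so a probe like this is best done once at start-up rather than from several threads.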

garabik-ne...@kassiopeia.juls.savba.sk

Sep 6, 2011, 4:58:09 PM
Thomas Jollans <t...@jollybox.de> wrote:

> It looks like you don't actually care about the encoding: in your first
> example, you use the default system encoding, which you do not control,
> and in your second example, you're using two different encodings on the
> two platforms. So why do you care whether or not the default uses ISO
> 8859-2 ?
>

Maybe because, with an 8859-2 locale, (unicode) strings not representable in the
encoding will be sorted - how?

I would care, I prefer not to have undefined behaviour.

--
-----------------------------------------------------------
| Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
| __..--^^^--..__ garabik @ kassiopeia.juls.savba.sk |
-----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!

ssegvic

Sep 7, 2011, 7:02:18 AM
On 6 Sep, 22:58, garabik-news-2005...@kassiopeia.juls.savba.sk wrote:
> Thomas Jollans <t...@jollybox.de> wrote:
> > It looks like you don't actually care about the encoding: in your first
> > example, you use the default system encoding, which you do not control,
> > and in your second example, you're using two different encodings on the
> > two platforms. So why do you care whether or not the default uses ISO
> > 8859-2 ?
>
> Maybe because using 8859-2 locale, (unicode) strings not representable in the
> encodings will be sorted - how?

Exactly.

Additionally, fonts supporting 8859-2 are scarce.
My favourite fonts were never available in 8859-2.

Sinisa

ssegvic

Sep 7, 2011, 6:39:14 AM

It appears we did not understand each other completely.

Python locales on Linux work as advertised,
I have no problems with locales on Linux whatsoever
(yes, the Croatian locale had to be manually installed).

On the other hand, it appears that
Python locales on Windows do not work as advertised.
Consider for instance my initial example:
locale.setlocale(locale.LC_ALL, ('hr', locale.getpreferredencoding()))
The code above does not work on Windows even though the fine manual
says:
http://docs.python.org/py3k/library/locale.html
'''
locale.setlocale(category, locale=None)
...
If (the locale) is a tuple, it is converted to a string using the
locale aliasing engine.
...
'''
I do not believe my troubles could be solved by installing anything,
since the OS support for Croatian appears to be present:
locale.setlocale(locale.LC_ALL, 'Croatian_Croatia.1250')

To conclude, it seems to me that the Windows implementation
of the locale aliasing engine has some space for improvement.

All further comments shall be greatly appreciated :-)

Cheers,

Sinisa

ssegvic

Sep 7, 2011, 7:17:52 AM
On 6 Sep, 17:53, Thomas Jollans <t...@jollybox.de> wrote:

I already concluded that on Linux there are no problems whatsoever
(the Croatian locale was kindly installed by the distribution setup).

Since my initial snippet does not work on Windows, I would conclude
that the locale aliasing engine on Windows should be improved.

Any opposing views will be appreciated :-)

For convenience, I repeat the snippet here:
import locale


locale.setlocale(locale.LC_ALL, ('hr', locale.getpreferredencoding()))

Cheers,

Sinisa

Thomas Jollans

Sep 7, 2011, 9:15:53 AM
to pytho...@python.org
On 07/09/11 12:39, ssegvic wrote:
> On 6 Sep, 17:53, Thomas Jollans <t...@jollybox.de> wrote:
>> On 06/09/11 16:46, ssegvic wrote:
>>
>>> For the moment, I only wish to properly sort a Croatian text file
>>> both on Windows and Linux (I am a cautious guy, I like reachable
>>> goals).
>>> When the locale is properly set, sorting works like a charm
>>> with mylist.sort(key=locale.strxfrm).
>>
>> The problem with that is of course that a Croatian locale has to be
>> installed. Many Linux systems don't have locales that aren't used.
>
> It appears we did not understand each other completely.

Yes we did. I was just pointing out that your code wouldn't be portable
to systems that don't have that specific locale.

Siniša Šegvić

Sep 7, 2011, 2:33:57 PM
to Laszlo Nagy, pytho...@python.org
> From: "Laszlo Nagy" <gan...@shopzeus.com>
> To: "ssegvic" <sinisa...@fer.hr>, pytho...@python.org
> Sent: Wednesday, September 7, 2011 4:51:20 PM
> Subject: Re: Portable locale usage

> > 1. Is there a way for writing portable Python code dealing with
> > locales (as sketched in the beginning)?
> I usually do this at the top of my main program, before importing
> other modules:
>
> import locale
> locale.setlocale(locale.LC_ALL, '')

I have set the system-wide locale to Croatian (Croatia)
on my development system as instructed by:
http://windows.microsoft.com/en-US/windows-vista/Change-the-system-locale

Nevertheless, your proposal produces:
('English_United States','1252')

Note that I would very much like
to avoid changing the system locale
(this requires Administrator password and system restart).

Setting the locale for my program only would be interesting,
but AFAIK this can not be done on Windows (?).

> Why are you trying to force a specific locale to your program anyway?

Because I wish to be able to correctly sort Croatian names.

I expect that most of my Windows users will not care
to configure their computers with the national locale
(and besides, that does not seem to work, anyway).

Cheers,

Sinisa

Thomas Jollans

Sep 7, 2011, 5:14:26 PM
to pytho...@python.org
On 07/09/11 20:33, Siniša Šegvić wrote:
> I expect that most of my Windows users will not care
> to configure their computers with the national locale
> (and besides, that does not seem to work, anyway).

Are, on Windows, the default system region/language setting, and the
locale, distinct? (And, if so, why?!)

Laszlo Nagy

Sep 8, 2011, 4:41:22 AM
to Siniša Šegvić, pytho...@python.org

> I have set the system-wide locale to Croatian (Croatia)
> on my development system as instructed by:
> http://windows.microsoft.com/en-US/windows-vista/Change-the-system-locale
>
> Nevertheless, your proposal produces:
> ('English_United States','1252')
This is what I see on my Hungarian Windows:


C:\Users\User>python
Python 2.7.1 (r271:86832, Nov 27 2010, 17:19:03) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_ALL, '')
'Hungarian_Hungary.1250'
>>> locale.getlocale()
('Hungarian_Hungary', '1250')
>>>

So I'm 100% sure that the problem is with your system locale settings,
not Python.

> Note that I would very much like
> to avoid changing the system locale
> (this requires Administrator password and system restart).
All right. But you understand, that Croatian ISO8859-2 is not supported
on windows? So you will not be able to sort names with that under a
windows system? (And it is not a limitation of Python.)
>> Why are you trying to force a specific locale to your program anyway?
> Because I wish to be able to correctly sort Croatian names.
Well, all right. If you want to sort Croatian names from a program that
runs on an English (or whatever) system, then you will have to check the
platform and use a locale that is supported by the platform. (But again,
this is not Python's limitation. Python doesn't know what encodings are
supported, in advance, and you cannot use a locale that is not supported...)
> I expect that most of my Windows users will not care
> to configure their computers with the national locale
> (and besides, that does not seem to work, anyway).
Croatian users will most likely use a Croatian Windows, out of the box.
And on those systems, using locale.setlocale(locale.LC_ALL, '') will
work perfectly. I'm not sure why it doesn't work on an English Windows
with locale changed... I'm not a big fan of Windows, but I remember once
I had to install a language pack for Windows before I could use a
localized program. This might be what you need?
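
The "check the platform" approach could look roughly like this. A sketch only: croatian_locale_name is a hypothetical helper, and the two names are the ones reported to work earlier in this thread, so other Unix systems may spell theirs differently:

```python
import locale
import sys

def croatian_locale_name():
    """Pick a Croatian locale name by platform (names taken from this thread)."""
    if sys.platform.startswith('win'):
        return 'Croatian_Croatia.1250'
    # Assumption: a glibc-style name; not every Unix uses this spelling.
    return 'hr_HR.utf8'

try:
    locale.setlocale(locale.LC_COLLATE, croatian_locale_name())
except locale.Error:
    print('Croatian locale not installed on this system')
```

The except branch is still needed, because the platform check only picks a spelling and cannot tell whether the locale is installed.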

Best,

Laszlo

Siniša Šegvić

Sep 8, 2011, 11:39:42 AM
to Laszlo Nagy, pytho...@python.org
> From: "Laszlo Nagy" <gan...@shopzeus.com>
> To: "Siniša Šegvić" <sse...@zemris.fer.hr>, pytho...@python.org
> Sent: Thursday, September 8, 2011 10:41:22 AM
> Subject: Re: Portable locale usage
> > I have set the system-wide locale to Croatian (Croatia)
> > on my development system as instructed by:
> > http://windows.microsoft.com/en-US/windows-vista/Change-the-system-locale
> >
> > Nevertheless, your proposal produces:
> > ('English_United States','1252')
> This is what I see on my Hungarian Windows:
>
>
> C:\Users\User>python
> Python 2.7.1 (r271:86832, Nov 27 2010, 17:19:03) [MSC v.1500 64 bit
> (AMD64)] on
> win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import locale
> >>> locale.setlocale(locale.LC_ALL, '')
> 'Hungarian_Hungary.1250'
> >>> locale.getlocale()
> ('Hungarian_Hungary', '1250')
> >>>
>
> So I'm 100% sure that the problem is with your system locale settings,
> not Python.

I've just found out how to set the user locale on Windows.

One has to go to Control panel -> Regional and language options,
then select the tab named Formats, and finally
set the box Current format to the desired language,
which is in my case Croatian (Croatia).

The whole tab says nothing about locales; I found this by trial and error.
This recipe is not affected by the system locale (which I was setting before)!

Now locale.setlocale(locale.LC_ALL, '') sets the Croatian locale.
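
With the user locale configured, sorting needs nothing platform-specific. A minimal sketch, assuming a helper named sort_in_user_locale (hypothetical) that falls back to plain code-point order when the environment names a locale that is not installed:

```python
import locale

def sort_in_user_locale(strings):
    """Sort strings by the collation rules of the user's default locale."""
    try:
        # '' asks for the user's own settings instead of a fixed name.
        locale.setlocale(locale.LC_COLLATE, '')
    except locale.Error:
        # The environment points at a locale the system does not have.
        return sorted(strings)
    return sorted(strings, key=locale.strxfrm)

print(sort_in_user_locale(['b', 'c', 'a']))
```

Under a Croatian user locale, a name such as 'Čakovec' should then sort right after the names starting with plain 'C' instead of after 'Z'.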

> > I expect that most of my Windows users will not care
> > to configure their computers with the national locale
> > (and besides, that does not seem to work, anyway).
> Croatian users will most likely use a Croatian Windows, out of the
> box.
> And on those systems, using locale.setlocale(locale.LC_ALL, '') will
> work perfectly.

Yes, it's true, you were right: I was setting
the Croatian language in the wrong place
(I am not a Windows fan either, I normally work on Linux).

However, I am not completely happy with this.
OK, no need for system restart, but still,
it would be nice if Python program could
manage around this by itself,
of course, provided that the required locale is installed.

> > Note that I would very much like
> > to avoid changing the system locale
> > (this requires Administrator password and system restart).
> All right. But you understand, that Croatian ISO8859-2 is not
> supported on windows?

Yes I do understand that.

I have commented that Python's locale aliasing engine
should not propose iso8859-2 on Windows systems,
exactly for the reason you mention.


> >> Why are you trying to force a specific locale to your program
> >> anyway?
> > Because I wish to be able to correctly sort Croatian names.
> Well, all right. If you want to sort Croatian names from a program that
> runs on an English (or whatever) system, then you will have to check the
> platform and use a locale that is supported by the platform. (But again,
> this is not Python's limitation. Python doesn't know what encodings are
> supported, in advance, and you cannot use a locale that is not supported...)

I fully agree.

I commented that, if a proper locale is installed,
the following should work on any system:

locale.setlocale(locale.LC_ALL, ('hr', locale.getpreferredencoding()))

Currently the above does not work on Windows,
and that is because the locale_alias for 'hr'
is bound to 'hr_HR.ISO8859-2'.
Check the source: .../Python-3.2.2/Lib/locale.py, line 537

I was arguing that, on a Windows system,
the locale_alias for 'hr' should be bound
to 'Croatian_Croatia.1250'.

Cheers,

Sinisa

Laszlo Nagy

Sep 9, 2011, 5:39:52 AM
to Siniša Šegvić, pytho...@python.org

>>>> Why are you trying to force a specific locale to your program
>>>> anyway?
>>> Because I wish to be able to correctly sort Croatian names.
>> Well, all right. If you want to sort Croatian names from a program that
>> runs on an English (or whatever) system, then you will have to check the
>> platform and use a locale that is supported by the platform. (But again,
>> this is not Python's limitation. Python doesn't know what encodings are
>> supported, in advance, and you cannot use a locale that is not supported...)
> I fully agree.
>
> I commented that, if a proper locale is installed,
> the following should work on any system:
>
> locale.setlocale(locale.LC_ALL, ('hr', locale.getpreferredencoding()))
>
> Currently the above does not work on Windows,
> and that is because the locale_alias for 'hr'
> is bound to 'hr_HR.ISO8859-2'.
> Check the source: .../Python-3.2.2/Lib/locale.py, line 537
>
> I was arguing that, on a Windows system,
> the locale_alias for 'hr' should be bound
> to 'Croatian_Croatia.1250'.

Looks like you have found a bug! :-) Why don't you post a bug report?

L

Siniša Šegvić

Sep 12, 2011, 8:45:38 AM
to Laszlo Nagy, pytho...@python.org
> From: "Laszlo Nagy" <gan...@shopzeus.com>
> To: "Siniša Šegvić" <sse...@zemris.fer.hr>, pytho...@python.org
> Sent: Friday, September 9, 2011 11:39:52 AM
> Subject: Re: Portable locale usage
>
> Looks like you have found a bug! :-) Why don't you post a bug report?

I just did:

http://bugs.python.org/issue12964

Thanks everyone for helping me to sort this out!

Sinisa