Set default encoding to UTF-8 for Windows using Python 2.7


Lynn Oliver

Nov 5, 2012, 4:08:59 PM
to pyins...@googlegroups.com
For OSX, you can set the encoding by placing the following as the first or second line of a module:
#-*- coding: utf-8 -*-

That doesn't work on Windows, as Python apparently sets the encoding when starting up and won't change it later.  

It appears that the proper way is to create a module called sitecustomize.py on PYTHONPATH that contains:
import sys
sys.setdefaultencoding('utf-8')

Or to edit site.py to set the default encoding.  Either of these methods has the drawback that it affects all Python processes.

As a practical matter, this seems to work if placed so it executes at the beginning of a script:
import sys
reload(sys)
sys.setdefaultencoding("utf-8")

Does this sound correct?

WRT pyinstaller, what is the best way to handle this?
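
The `reload(sys)` workaround above could be sketched with a version guard, as below. This assumes the usual CPython 2 behaviour in which `site.py` removes `sys.setdefaultencoding` during startup, which is why `reload(sys)` is needed to get it back. The helper name `force_default_encoding` is made up for illustration. On Python 3 the helper changes nothing, because the default encoding there is already UTF-8 and cannot be set.

```python
import sys


def force_default_encoding(name="utf-8"):
    """Set the interpreter-wide default encoding on Python 2.

    Hypothetical helper, not part of Python or PyInstaller.
    Returns the default encoding in effect afterwards.
    """
    if sys.version_info[0] == 2:
        if not hasattr(sys, "setdefaultencoding"):
            # site.py deleted the function at startup; reloading the
            # sys module makes it available again (Python 2 builtin).
            reload(sys)
        sys.setdefaultencoding(name)
    return sys.getdefaultencoding()
```

Calling `force_default_encoding()` at the very top of the entry script mirrors the three-line snippet above. Like that snippet, it changes behaviour for the whole process, including third-party modules.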


Martin Zibricky

Nov 5, 2012, 5:31:30 PM
to pyins...@googlegroups.com
Lynn Oliver wrote on Mon, 05 Nov 2012 at 13:08 -0800:
> For OSX, you can set the encoding by placing the following as the
> first or second line of a module: #-*- coding: utf-8 -*-
>
> That doesn't work on Windows, as Python apparently sets the encoding
> when starting up and won't change it later.

That's weird. The coding line should work everywhere, right?

When you speak about Python do you mean 'Python interpreter' or the
executable created by pyinstaller?

How is it related to pyinstaller itself? Pyinstaller just uses compiled
python byte code and thus I think the coding line does not matter in
this situation.

Could it be that the python locale should be set properly in your app?

What are you trying to fix?






Lynn Oliver

Nov 5, 2012, 6:03:45 PM
to pyins...@googlegroups.com
When I refer to Python, I mean the Python interpreter. The coding line *should* work everywhere but it is well known that it doesn't.

This is related to pyinstaller only to the extent that the "proper" solution involves either modifying a Python system file (site.py) or adding a file that gets executed as Python starts. I don't know if either of those is a problem when configuring a build with pyinstaller. It seems this would be a common problem unless no-one uses unicode, so I thought the best solution might be known.

As I posted separately, I'm unable to build on Windows XP now anyway because the "ImportError: No module named multiarray" is back despite the file being present in the build. Since I was able to build fine yesterday, it seems likely that something I've added for unicode support is related to the failure. I honestly don't see how, since the changes are all pretty simple and do not involve any new libraries.

Lynn

Hartmut Goebel

Nov 7, 2012, 2:43:18 PM
to pyins...@googlegroups.com
On 05.11.2012 22:08, Lynn Oliver wrote:
> For OSX, you can set the encoding by placing the following as the first or second line of a module:
> #-*- coding: utf-8 -*-
>
> That doesn't work on Windows, as Python apparently sets the encoding when starting up and won't change it later.
>
> [...]
> Does this sound correct?

No, this sounds wrong.

The coding line affects only the *single* file that contains it. It does not set any default encoding, nor does it affect other files. Each file has to declare its encoding itself. See http://docs.python.org/2/reference/lexical_analysis.html#encoding-declarations
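
A small sketch of this per-file behaviour: it compiles two in-memory sources whose coding lines differ and checks that each literal is decoded according to its own declaration, while `sys.getdefaultencoding()` stays unchanged. The file names `a.py` and `b.py` and the helper `load_text` are placeholders for illustration. The sketch is written against Python 3's `compile()`, which accepts bytes and honours the coding line; Python 2.7 is expected to behave the same way, but that is an assumption here.

```python
import sys

# Two tiny "modules" as raw bytes; each declares its own source encoding.
SRC_LATIN1 = b"# -*- coding: latin-1 -*-\ntext = u'\xe9'\n"
SRC_UTF8 = b"# -*- coding: utf-8 -*-\ntext = u'\xc3\xa9'\n"


def load_text(source, filename):
    """Compile and run `source`, returning the value it binds to `text`."""
    namespace = {}
    exec(compile(source, filename, "exec"), namespace)
    return namespace["text"]


before = sys.getdefaultencoding()
latin1_text = load_text(SRC_LATIN1, "a.py")
utf8_text = load_text(SRC_UTF8, "b.py")
after = sys.getdefaultencoding()

# Different bytes in each source, same character once decoded.
assert latin1_text == utf8_text == u"\xe9"
# The declarations did not touch the interpreter's default encoding.
assert before == after
```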

What are the problems you encounter?

--
Kind regards
Hartmut Goebel
Dipl.-Informatiker (univ), CISSP, CSSLP

Goebel Consult
http://www.goebel-consult.de

Monthly column: http://www.cissp-gefluester.de/2011-02-fleisige-datensammler-fur-lukratives-geschaeftsmodell-gesucht
Blog: http://www.goebel-consult.de/blog/20060215

Goebel Consult is a member of http://www.7-it.de/

Lynn Oliver

Nov 8, 2012, 1:36:22 AM
to pyins...@googlegroups.com
You are correct.  #-*- coding: utf-8 -*- specifies the source encoding for that module.  I drew an incorrect conclusion from the results of some debugging.  On OSX Python handles utf-8 fine as-is, but the same fails on Windows unless you do something to change the default encoding as described earlier.

The problem I've run into on Windows is that one or more Tkinter widgets generate an error when passed unicode text that contains code points greater than 127.  After changing the default encoding to utf-8 it works just fine.
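
An alternative to changing the process-wide default, sketched under the assumption that the error comes from byte strings being mixed with unicode text: on Python 2 such mixing triggers an implicit decode with the default encoding (ASCII), which fails for any byte above 127. Decoding explicitly where data enters the program avoids relying on the default. The helper name `to_text` and the choice of UTF-8 as the input encoding are assumptions for illustration, not something stated in this thread.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as unicode text, decoding byte strings explicitly.

    Hypothetical helper; `encoding` must match how the bytes were produced.
    """
    if isinstance(value, bytes):
        # On Python 2, `bytes` is an alias of `str`, so this branch
        # catches ordinary byte strings there as well.
        return value.decode(encoding)
    return value


raw = b"Sch\xc3\xb6nen Gru\xc3\x9f"  # UTF-8 bytes, e.g. read from a file
label_text = to_text(raw)            # unicode text from here on

assert label_text == u"Sch\xf6nen Gru\xdf"
assert to_text(label_text) == label_text  # already text: returned unchanged
```

With Tkinter, the decoded value would then be what gets handed to the widget, so no implicit conversion is left to the default encoding.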

My question, however, was whether pyinstaller will have any problems picking up changes to site.py.  So far the reload(sys) hack has been working fine, so I may not need to deal with the 'proper' solution in any case.

Lynn

Hartmut Goebel

Nov 8, 2012, 2:44:17 PM
to pyins...@googlegroups.com
On 08.11.2012 07:36, Lynn Oliver wrote:
> The problem I've run into on Windows is that one or more Tkinter widgets generates an error when passed unicode text that contains code points greater than 127.  After changing the encoding to utf-8 it works just fine.
>
> My question, however, was whether pyinstaller will have any problems picking up changes to site.py.  So far the reload(sys) hack has been working fine, so I may not need to deal with the 'proper' solution in any case.

So did I understand correctly: this is not a PyInstaller problem, but a Tkinter problem. You are only seeking a solution to make your workaround work with PyInstaller, too.

Correct?

PyInstaller does not handle changes to site.py. Even worse (for your case): site.py is replaced by the module `fake-site.py` to make PyInstaller work in virtualenv. See hook_site.py and PyInstaller/fake/fake-site.py. Thus using sitecustomize would not solve your problem, as it is not used.

A simple solution could be to add *two* scripts in your .spec file: one for fixing the encoding and then the real one.

HTH


Martin Zibricky

Nov 8, 2012, 3:27:55 PM
to pyins...@googlegroups.com
Hartmut Goebel wrote on Thu, 08 Nov 2012 at 20:44 +0100:
> On 08.11.2012 07:36, Lynn Oliver wrote:
> > The problem I've run into on Windows is that one or more Tkinter
> > widgets generates an error when passed unicode text that contains
> > code points greater than 127. After changing the encoding to utf-8
> > it works just fine.
> >
> >
> > My question, however, was whether pyinstaller will have any problems
> > picking up changes to site.py. So far the reload(sys) hack has been
> > working fine, so I may not need to deal with the 'proper' solution
> > in any case.
>
> So did I understand correctly: This is not a PyInstaller problem, but a
> Tkinter problem. You are only seeking a solution to make your
> work-around work with PyInstaller, too.

If he is experiencing issues with Tkinter then we should probably fix
the tkinter hook in pyinstaller.

Hartmut Goebel

Nov 8, 2012, 3:40:54 PM
to pyins...@googlegroups.com
On 08.11.2012 21:27, Martin Zibricky wrote:
> If he is experiencing issues with Tkinter then we should probably fix the tkinter hook in pyinstaller.

If I understood him right, this problem occurs even if *not* using PyInstaller.

Lynn Oliver

Nov 8, 2012, 9:06:40 PM
to pyins...@googlegroups.com
There is no problem.  There is a question: If I modify c:\Python27\Lib\site.py, will pyinstaller pick it up?  site.py is imported during python initialization.

Martin Zibricky

Nov 9, 2012, 8:54:33 AM
to pyins...@googlegroups.com
Lynn Oliver wrote on Thu, 08 Nov 2012 at 18:06 -0800:
> There is no problem. There is a question: If I modify
> c:\Python27\Lib\site.py will pyinstaller pick it up? site.py is
> imported during python initialization.

Pyinstaller will not pick it up. It will replace it with its own dummy
site.py module.

This way we can ensure that the created executables won't load any
modules from user-specific locations.
