Chinese language support of Python?

Leon Wang

Jul 6, 2002, 10:19:45 AM

How can I enable Chinese language support in Python? In IDLE, I cannot
even save a source file if it contains any non-ASCII characters (codes
of 128 and above). I want to set the window title in Chinese, but bit 7
is masked by Tkinter:

from Tkinter import *
from Tkconstants import *

def filenew():
    print 'filenew'
def fileopen():
    print 'fileopen'
def fileexit():
    print 'fileexit'
def helpabout():
    print 'helpabout'
root=Tk()
menu = Menu(root)
root.config(menu=menu)
root.title('中文') # this is Chinese

filemenu = Menu(menu)
menu.add_cascade(label="File", menu=filemenu)
filemenu.add_command(label="New", command=filenew)
filemenu.add_command(label="Open...", command=fileopen)
filemenu.add_separator()
filemenu.add_command(label="Exit", command=fileexit)

helpmenu = Menu(menu)
menu.add_cascade(label="Help", menu=helpmenu)
helpmenu.add_command(label="About...", command=helpabout)

#frame=Frame(root)
#frame.master.title('ETUS')
#frame.pack()
mainloop()

Boudewijn Rempt

Jul 6, 2002, 3:25:28 PM

Leon Wang wrote:

> How can I enable Chinese language support in Python? In IDLE, I cannot
> even save a source file if it contains any non-ASCII characters (codes
> of 128 and above). I want to set the window title in Chinese, but bit 7
> is masked by Tkinter:

> root.title('中文') # this is Chinese
>

Well, this isn't Chinese -- at least not when it arrived at my
server, but probably not even when it originated with you, because
I see your headers advertise ISO-8859-1 as the encoding. It's a plain
string that contains an assortment of ampersands, hash marks, semicolons
and numbers in an order that hasn't much meaning.
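
The ampersands, hash marks, semicolons and numbers are HTML numeric character references: each `&#NNNNN;` spells out one code point in decimal. A minimal sketch in modern Python 3 (an assumption; the thread itself uses Python 2.2) of what the title would be if something decoded those references:

```python
import html

# The string as it arrived: HTML numeric character references, plain ASCII.
raw_title = "&#20013;&#25991;"

# html.unescape turns each &#NNNNN; reference into the character it names.
decoded_title = html.unescape(raw_title)

# The decimal numbers are simply the Unicode code points of the characters.
code_points = [hex(ord(ch)) for ch in decoded_title]
print(code_points)
```

Neither Python nor Tkinter performs this decoding on a string literal, which is why the title showed the references themselves.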

If you use _real_ unicode -- for instance
root.title(u'\u028A\u0288') # no chinese, because of lacking fonts, but IPA
then everything works fine -- at least, with my window manager, on my OS.

--
Boudewijn Rempt | http://www.valdyas.org

Martin v. Loewis

Jul 6, 2002, 4:52:26 PM

Boudewijn Rempt <bo...@valdyas.org> writes:

> > root.title('中文') # this is Chinese

[...]

> If you use _real_ unicode -- for instance
> root.title(u'\u028A\u0288') # no chinese, because of lacking fonts, but IPA
> then everything works fine -- at least, with my window manager, on my OS.

It's more likely that the OP meant

root.title(u'\u4e2d\u6587')

Regards,
Martin

Leon Wang

Jul 7, 2002, 1:54:30 AM

Hi, I got the Chinese displayed correctly in the window title, without
changing the default encoding in site.py, by using:

root.title(u'\u4e2d\u6587')

But I still cannot put Chinese directly into a string in the source. I
cannot live with so many \u... escapes for a whole Chinese sentence or
paragraph; it's impossible to read and edit them :(
However, I can print a Chinese string (a normal string, without the u
prefix and \u codes) in the console with the command-line python.exe.
How can I get Tkinter to accept that?
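
One way to avoid typing the \u form by hand is to keep the readable text elsewhere and generate the escapes mechanically with the `unicode_escape` codec. A small sketch, written for Python 3 (an assumption; in the Python 2.2 of this thread the text would first have to be a unicode object):

```python
# Readable text, typed directly (Python 3 source files default to UTF-8).
readable = "中文"

# unicode_escape produces the ASCII-only \uXXXX spelling of the same text.
escaped = readable.encode("unicode_escape").decode("ascii")
print(escaped)

# Decoding the escapes again gives back the original characters.
round_tripped = escaped.encode("ascii").decode("unicode_escape")
```

This only converts between the two spellings; it does not change how Tkinter treats an ordinary byte string.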


Boudewijn Rempt

Jul 7, 2002, 2:22:10 AM

Leon Wang wrote:

> Hi, I got the Chinese displayed correctly in the window title, without
> changing the default encoding in site.py, by using:
>
> root.title(u'\u4e2d\u6587')
>
> But I still cannot put Chinese directly into a string in the source. I
> cannot live with so many \u... escapes for a whole Chinese sentence or
> paragraph; it's impossible to read and edit them :(
> However, I can print a Chinese string (a normal string, without the u
> prefix and \u codes) in the console with the command-line python.exe.
> How can I get Tkinter to accept that?

I don't think that's going to work (caveat: I use PyQt which has different
conventions). If you absolutely want to have Chinese characters in your
source files*, you can do something like the following**:

root.title(unicode('伱好?', 'utf-8'))

Note that you _will_ have to construct a unicode object, not an ordinary
string, if you want the system to understand what you mean: ordinary
strings are just containers for bytes, one character per byte.

You can find out which encodings are available by inspecting the
python/lib/encodings directory (or, python\lib\encodings): you can use
any encoding instead of the 'utf-8'. Of course, the string must then
be in the right encoding, too.
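
In Python 3 terms (an assumption; the thread predates it), `unicode(s, encoding)` corresponds to `bytes.decode(encoding)`. The sketch below uses only standard-library codecs and shows that the bytes and the named encoding have to match; the sample text is illustrative:

```python
import codecs

text = "\u4e2d\u6587"  # the two characters from the window title

# The same text is stored as different bytes under different encodings.
utf8_bytes = text.encode("utf-8")
gb_bytes = text.encode("gb2312")

# Decoding with the encoding the bytes were written in recovers the text.
from_utf8 = utf8_bytes.decode("utf-8")
from_gb = gb_bytes.decode("gb2312")

# Decoding GB2312 bytes as UTF-8 does not: for these bytes it raises an error.
try:
    gb_bytes.decode("utf-8")
    mismatch_detected = False
except UnicodeDecodeError:
    mismatch_detected = True

# codecs.lookup raises LookupError when no codec is installed under a name,
# so it can be used to check availability before decoding.
codec_name = codecs.lookup("gb2312").name
print(len(utf8_bytes), len(gb_bytes), mismatch_detected, codec_name)
```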

There are some errors in my handling of this topic in my book, but it might
still be useful to you:

http://www.opendocspublishing.com/pyqt/index.lxp?lxpwrap=c2029%2ehtm

errata:

http://www.valdyas.org/python/book.html

The paper version has nice pictures that are quite useful in this chapter.

* Actually I still think it would be great to be able to have sourcefiles
in utf-8, not limited to unicode strings. I want to type:

def 印刷():
    pass

That this would make my source code unreadable for a lot of other people, tant
pis, I still would like the power. Just as I want the power to do a quick
sys.setAppDefaultEncoding('utf-8') to make sure this application sees all
its strings as encoded in utf-8.

** Note that this posting is encoded in utf-8. If you see gibberish instead
of a friendly greeting, then either the message is mangled, or your
newsreader can't handle the encoding, or you don't have the fonts to show
Chinese.

David LeBlanc

Jul 7, 2002, 2:49:28 AM

This may not be of much help, but Tk, the library behind Tkinter, is quite
popular for displaying Asian languages, and a substantial body of work has
been done on it. Some changes were made to standard Tk to accommodate Asian
language display. You might find it necessary to build a custom version of
Tk for Tkinter to link with and possibly also modify some of the Tkinter
wrapper. You should be able to get more information about Asian language
support in Tk by asking on the comp.lang.tcl newsgroup.

You can also find some good information on Tk at http://www.tcl.tk/.
You can also find a pointer to a "traditional Chinese" page from
http://tcl.sourceforge.net/faqs/tcl/

Then, of course, there's Chinese Python at
http://chinesepython.cosoft.org.cn/cgi-bin/cgb/home.html (in Chinese; it also
has a SourceForge page at http://sourceforge.net/projects/chinesepython/ in
English).

There are other links about Tkinter and Chinese at Yahoo; I just entered
tkinter and chinese (without the word "and").

Note: while Tcl is a nice language, I do not suggest you abandon Python for
Tcl, especially if more than basic MS-Windows support is important to you. I
personally think Python might have a speed edge too.

David LeBlanc
Seattle, WA USA


Martin v. Loewis

Jul 7, 2002, 4:01:05 AM

guidance...@yahoo.com.cn (Leon Wang) writes:

> But I still cannot put Chinese directly into a string in the source. I
> cannot live with so many \u... escapes for a whole Chinese sentence or
> paragraph; it's impossible to read and edit them :(

This is a known problem, and it will be addressed with PEP 263
(http://www.python.org/peps/pep-0263.html).

Meanwhile, you have the following options:

- Don't use IDLE to edit Python source code (but, say, notepad), and
only put Chinese text into string literals.
- Set the default encoding in site.py to the encoding you want to use.
- Apply the patch at
http://sourceforge.net/tracker/index.php?func=detail&aid=508973&group_id=9579&atid=309579
which allows you to declare the source encoding for IDLE.

In any of these cases, you cannot use Chinese in Unicode literals.
Instead, you should always use

unicode("chinese string", "chinese encoding")

For portability, and if your editors support it, I recommend using
UTF-8 as the "chinese encoding".
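
PEP 263 was implemented in Python 2.3, and since then a source file can declare its own encoding in a comment on its first or second line. A hedged sketch in Python 3 (an assumption; not available to the Python 2.2 discussed here), building the file contents in memory rather than on disk; the file name passed to `compile` is made up for illustration:

```python
import io
import tokenize

# Bytes of a small source file saved in GB2312, with a PEP 263
# encoding declaration on its first line.
source_bytes = (
    "# -*- coding: gb2312 -*-\n"
    "title = '中文'\n"
).encode("gb2312")

# tokenize.detect_encoding reads the declaration from the raw bytes.
encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)

# compile() accepts bytes and honors the same declaration, so the string
# literal is decoded from GB2312 into proper text.
namespace = {}
exec(compile(source_bytes, "<gb2312 example>", "exec"), namespace)
print(encoding, [hex(ord(ch)) for ch in namespace["title"]])
```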

Regards,
Martin

Martin v. Loewis

Jul 7, 2002, 4:08:17 AM

Boudewijn Rempt <bo...@valdyas.org> writes:

> I don't think that's going to work (caveat: I use PyQt which has different
> conventions). If you absolutely want to have Chinese characters in your
> source files*, you can do something like the following**:
>
> root.title(unicode('伱好?', 'utf-8'))

The problem is that this won't work in IDLE.

> * Actually I still think it would be great to be able to have sourcefiles
> in utf-8, not limited to unicode strings. I want to type:

This can only happen after PEP 263 is adopted, otherwise, it will be
difficult to find out which bytes denote letters. Even then, it will
be difficult to find out when two identifiers are equal - __dict__
dictionaries would need to allow Unicode strings as keys.

Notice that this is only a step towards what ChinesePython is doing,

http://www.python.org/doc/NonEnglish.html#chinese

which changes all the keywords to allow you to type Chinese-based
keywords instead of the traditional English-based keywords.

> That this would make my source code unreadable for a lot of other people, tant
> pis, I still would like the power. Just as I want the power to do a quick
> sys.setAppDefaultEncoding('utf-8') to make sure this application sees all
> its strings as encoded in utf-8.

It could not guarantee this. If you read a byte string from some
external source, it might well not be UTF-8, and Python would have no
way to find out.
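
To illustrate why no default can guarantee this: the same bytes are often valid in more than one encoding, so decoding succeeds either way and silently yields different text. A small Python 3 sketch (an assumption; the sample bytes are chosen for illustration):

```python
# Four bytes read from some external source, with no label saying what they are.
data = b"\xd6\xd0\xce\xc4"

# Read as GB2312, they are the two Chinese characters U+4E2D U+6587.
as_gb2312 = data.decode("gb2312")

# Read as ISO-8859-1, they are four accented Latin letters; no error is raised.
as_latin1 = data.decode("iso-8859-1")

print([hex(ord(ch)) for ch in as_gb2312])
print([hex(ord(ch)) for ch in as_latin1])
```

Only information from outside the bytes (a header, a declaration, a convention) can say which reading was intended.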

Regards,
Martin

Leon Wang

Jul 7, 2002, 8:26:08 AM

I installed ChineseCodecs1.2.0 into lib/encodings; it converts GB2312
(Simplified Chinese) to UTF-8, and now I can use this:

root.title(unicode('中文',"eucgb2312_cn"))

Great! I can put whole raw Chinese strings in the source now!
Before that, I also tried ChinesePython, a Chinese translation of
Python 2.1; it even enabled this:

root.title('中文') #directly put Chinese in normal string! The Best!!

But it is a bit of a pity: it translated all prompts and error messages
into BIG5 (Traditional Chinese), which I cannot view in my GB Windows
environment, and no GB version is available yet. I had to uninstall
it and adopt the first solution.

An even bigger pity: the "Python GUI" utility, IDLE, cannot handle
Chinese strings in a source file (it seems bit 7 is removed), in neither
Python 2.2.1 from python.org nor the ChinesePython version above. If I
open my source with IDLE and save it back, all the Chinese strings are
changed, which means I cannot use it even for editing. So how can I
debug the script in a GUI?

Thanks!
Leon Wang


Leon Wang

Jul 7, 2002, 6:25:53 PM

I found the best option: PythonWin, in the win32 extension module,
which includes a source editor and a debugger!
To summarize the Python installation for Chinese:
1) Python package from python.org
2) Win32all module from
http://starship.python.net/crew/mhammond/win32/
3) ChineseCodecs module from
ftp://freebsd.sinica.edu.tw/pub/ycheng/python/ChineseCodecs1.2.0.tar.gz

I think this is the best solution so far. Use this to display
Chinese (GB) in Tkinter:
>>> root.title(unicode('中文',"eucgb2312_cn"))
and this in console:
>>> root.title('中文')

Thanks for all of your help!!
Leon Wang

Wenshan Du

Jul 8, 2002, 12:10:11 AM

hi,
I think this problem is simple.
Try MBCSP 1.0
http://www.dohao.org/python/mbcsp
or visit http://www.dohao.org/python

glace

Jul 8, 2002, 5:08:34 AM

There will not be a separate GB version of ChinesePython. GB support is
enabled by the '-g' flag when you start the ChinesePython
interpreter.

{'-b':"for BIG5(default)", '-g':"for GB"}.

For source files stored in different encodings, the "#--GBK--" and
"#--BIG5--" magic comment strings should be placed in the sources to
indicate the respective encodings used.

Try again ^_^
glace
