Please see error output below where I am trying to show the EURO sign
(http://www.fileformat.info/info/unicode/char/20ac/index.htm):
Python 3.0 (r30:67507, Dec 3 2008, 20:14:27) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print('\u20ac')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "c:\python30\lib\io.py", line 1491, in write
b = encoder.encode(s)
File "c:\python30\lib\encodings\cp437.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in
position 0: character maps to <undefined>
>>>
>>> print ("\N{EURO SIGN}")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "c:\python30\lib\io.py", line 1491, in write
b = encoder.encode(s)
File "c:\python30\lib\encodings\cp437.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in
position 0: character maps to <undefined>
it's a self compiled version:
~ $ python3
Python 3.0 (r30:67503, Dec 29 2008, 21:35:15)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print("\u20ac")
€
>>> print ("\N{EURO SIGN}")
€
>>>
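For what it's worth, both spellings name the same character; a quick check with unicodedata (Python 3 sketch) shows the `\N{...}` escape and the Unicode name database agree:

```python
import unicodedata

# '\N{EURO SIGN}' and '\u20ac' denote the same character; the \N{...}
# escape is resolved through the Unicode name database that unicodedata
# also exposes at runtime.
print(unicodedata.lookup("EURO SIGN") == "\u20ac")  # True
print(unicodedata.name("\u20ac"))                   # EURO SIGN
```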
2009/1/26 jefm <jef.mang...@gmail.com>:
> What am I doing wrong ?
"\N{EURO SIGN}".encode("ISO-8859-15") ## could be something but I'm
pretty sure I'm totally wrong on this
--
http://soup.alt.delete.co.at
http://www.xing.com/profile/Martin_Marcher
http://www.linkedin.com/in/martinmarcher
You are running on Linux. Mine is on Windows.
Anyone else have this issue on Windows ?
As Benjamin Kaplin said, Windows consoles default to an old OEM code page
(cp437, per your traceback), which cannot display the euro sign. You'll
either have to run it in something more modern, like the cygwin rxvt
terminal, or output some other way, such as through a GUI.
True
> In the Python language reference (http://docs.python.org/3.0/reference/
> lexical_analysis.html) I read that I can show Unicode character in
> several ways.
> "\uxxxx" supposedly allows me to specify the Unicode character by hex
> number and the format "\N{name}" allows me to specify by Unicode
> name.
These are ways to *specify* unicode chars on input.
> Neither seem to work for me.
If you separate text creation from text printing, you would see that
they do. Try
s='\u20ac'
print(s)
> What am I doing wrong ?
Using the interactive interpreter running in a Windows console.
> Please see error output below where I am trying to show the EURO sign
> (http://www.fileformat.info/info/unicode/char/20ac/index.htm):
>
> Python 3.0 (r30:67507, Dec 3 2008, 20:14:27) [MSC v.1500 32 bit
> (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
>>>> print('\u20ac')
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "c:\python30\lib\io.py", line 1491, in write
> b = encoder.encode(s)
> File "c:\python30\lib\encodings\cp437.py", line 19, in encode
> return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in
> position 0: character maps to <undefined>
With the standard console, I get the same. But with IDLE, using the
same Python build through a different interface:
>>> s='\u20ac'
>>> len(s)
1
>>> str(s)
'€' # euro sign
I have fiddled with the shortcut to supposedly make it work better, as
claimed by posts found on the web, but to no avail. Very frustrating,
since I have fonts on the system for at least all of the first 64K
characters. Scream at Microsoft, or try to find or encourage a console
replacement that Python could use. In the meantime, use IDLE. Not
perfect for Unicode, but better.
Terry Jan Reedy
>With the standard console, I get the same. But with IDLE, using the
>same Python build but through a different interface
>Scream at Microsoft or try to find or encourage a console
>replacement that Python could use. In the meanwhile, use IDLE. Not
>perfect for Unicode, but better.
So, if I understand it correctly, it should work as long as you run
your Python code on something that can actually print the Unicode
character.
Apparently, the Windows command line cannot.
I mainly program command-line tools to be used by Windows users, so I
guess I am screwed.
Other than converting my tools to a graphical interface, is there any
solution, short of giving Bill Gates a call and bringing his command
line into the 21st century?
cp1252 can represent the euro sign (<http://en.wikipedia.org/wiki/Windows-1252>). Apparently the chcp command can be used to change the code page
active in the console (<http://technet.microsoft.com/en-us/library/bb490874.aspx>). I've never tried this myself, though.
Jean-Paul
Windows uses codepages to display different character sets. (http://
en.wikipedia.org/wiki/Code_page)
The Windows chcp command allows you to change the character set from
the original 437 set.
When you type on the command line: chcp 65001
it sets your console in UTF-8 mode.
(http://en.wikipedia.org/wiki/Code_page_65001)
Unfortunately, it still doesn't do what I want. Instead of printing
the error message above, it prints nothing.
Short answer: it doesn't work.
Test [Windows XP SP3, Python 2.6.1]:
C:\junk>chcp
Active code page: 850
C:\junk>chcp 1252
Active code page: 1252
C:\junk>chcp
Active code page: 1252
C:\junk>\python26\python
Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys; sys.stdout.encoding; sys.stderr.encoding
'cp1252'
'cp1252'
# So far, so good
>>> import unicodedata as ucd
>>> for b in range(128, 256):
... c = chr(b)
... u = c.decode('cp1252', 'replace')
... name = ucd.name(u)
... print hex(b), c, repr(u), name
...
0x80 € u'\u20ac' EURO SIGN
0x81 u'\ufffd' REPLACEMENT CHARACTER
0x82 ‚ u'\u201a' SINGLE LOW-9 QUOTATION MARK
[snip]
0xfb û u'\xfb' LATIN SMALL LETTER U WITH CIRCUMFLEX
0xfc ü u'\xfc' LATIN SMALL LETTER U WITH DIAERESIS
0xfd ý u'\xfd' LATIN SMALL LETTER Y WITH ACUTE
[snip]
Ignore what you are seeing in the second field of each above line; it
could well look OK. However what I see on the console is:
capital C with cedilla
small u with diaeresis (umlaut)
small e with acute
superscript one
superscript three
superscript two [yes, out of order]
IOW, the bridge might think it's in cp1252 mode, but nobody told the
engine room, which is still churning out cp850.
You will have to debug the Python interpreter to find out what's
going wrong in code page 65001. Nobody has ever resolved that mystery,
although it's been known for some time.
If you merely want to see *something* (and not actually the glyph
for the character (*)):
py> print(ascii('\u20ac'))
'\u20ac'
should work fine.
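Along the same lines, the codec error handlers can degrade the output instead of raising (a Python 3 sketch; this substitutes printable stand-ins rather than showing the real glyph):

```python
s = "\u20ac"

# cp437 has no euro sign, so a strict encode raises UnicodeEncodeError;
# an error handler substitutes something printable instead.
print(s.encode("cp437", errors="backslashreplace"))  # b'\\u20ac'
print(s.encode("cp437", errors="replace"))           # b'?'
```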
Regards,
Martin
(*) Windows doesn't support displaying *all* unicode characters even
in code page 65001, nor is it reasonable to expect it to. It can, at
best, only display those characters it has glyphs for in the font
that it is using. As Unicode constantly evolves, the fonts necessarily
get behind. Plus, in a fixed-size font, some characters just don't
render too well.
I think you must use a different font in the console, too, such as
Lucida Sans Unicode.
Regards,
Martin
True. I was just about to post that I'd stumbled across that!
Maybe the problem is not in the Python interpreter. Running this tiny
C program
#include <stdio.h>

int main(int argc, char **argv) {
    printf("<\xc2\x80>\n");
    return 0;
}
compiled with mingw32 (gcc (GCC) 3.4.5 (mingw-vista special r3))
and using "Lucida Console" font:
After CHCP 1252, this prints < A-circumflex Euro >, as expected.
After CHCP 65001, it prints < hollow-square >.
Perhaps you could try that with an MS C compiler [which I don't
have] ...
I see. What happens if you add it to encoding/aliases.py?
Regards,
Martin
I have this same issue on Windows.
Note that on Python 2.6 it works:
Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit
(Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print unicode('\u20ac')
\u20ac
This is pretty serious, IMHO, since it breaks any Windows software
printing Unicode to stdout.
I've filed an issue on the Python bug tracker:
http://bugs.python.org/issue5081
--- Giampaolo
http://code.google.com/p/pyftpdlib/
Shouldn't this be
print unicode(u'\u20ac')
on 2.6? Without the 'u' prefix, 2.6 will just encode it as a normal
(byte) string and escape the backslash. In Python 3.0 you don't need
to do this because all strings are "unicode" to start with. I suspect
you will see the same error with 2.6 on Windows once you correct this.
(note to Giampaolo: sorry, resending this because I accidentally
selected "reply" instead of "reply to all")
--
Denis Kasak
Hello hello -- (1) that's *not* attempting to print Unicode. Look at
your own output ... "\u20ac" was printed, not a euro character!!!
With 2.X for *any* X:
>>> guff ='\u20ac'
>>> type(guff)
<type 'str'>
>>> len(guff)
6
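To spell out why len(guff) is 6: in 2.X a plain string literal never interprets \u escapes, so guff holds the six characters backslash, u, 2, 0, a, c. The Python 3 equivalent of that byte-string content is:

```python
# What Python 2's '\u20ac' (no u prefix) actually contained: the literal
# backslash escape, not the euro character itself.
guff = "\\u20ac"
print(len(guff))  # 6
print(guff)       # \u20ac
```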
(2) Printing Unicode to a Windows console has never *worked*; that's
why this thread was pursuing the faint ray of hope offered by cp65001.
For printing to stdout you have to give an encoding that the terminal
understands and that contains the character. In your case the terminal
says "I speak cp850", but of course there is no euro sign in there. Why
should that be a bug?
Thorsten
You are trying to create a Unicode object from a Unicode object. Doesn't
make any sense.
> on 2.6? Without the 'u' prefix, 2.6 will just encode it as a normal
> (byte) string and escape the backslash.
You are confusing encoding and decoding. unicode(str) = str.decode. To
print it you have to encode it again to a character set that the
terminal understands and that contains the desired character.
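To make that concrete (a Python 3 sketch): the same character encodes to different bytes, or fails outright, depending on the target character set:

```python
s = "\u20ac"

print(s.encode("cp1252"))       # b'\x80'  -- cp1252 does contain the euro
print(s.encode("iso-8859-15"))  # b'\xa4'
try:
    s.encode("cp850")           # cp850 (the poster's console) has no euro
except UnicodeEncodeError as e:
    print("no euro in cp850:", e.reason)
```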
Thorsten
<snip>
>> >>>> print unicode('\u20ac')
>> > \u20ac
>>
>> Shouldn't this be
>>
>> print unicode(u'\u20ac')
>
> You are trying to create a Unicode object from a Unicode object. Doesn't
> make any sense.
Of course it doesn't. :-)
Giampaolo's example was wrong because he was creating a str object
with a non-escaped backslash inside it (which automatically got
escaped) and then converting it to a unicode object. In other words,
he was doing:
print unicode('\\u20ac')
so the Unicode escape sequence didn't get interpreted the way he
intended it to. I then modified that by adding the extra 'u' but
forgot to delete the extraneous unicode().
> You are confusing encoding and decoding. unicode(str) = str.decode. To
> print it you have to encode it again to a character set that the
> terminal understands and that contains the desired character.
I agree (except for the first sentence :-) ). As I said, I simply
forgot to delete the call to the unicode builtin.
--
Denis Kasak
This is not surprising: this character is U+0080, which is a control
character. Try \xe2\x82\xac instead.
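Indeed: \xc2\x80 is the two-byte UTF-8 encoding of U+0080, while the euro sign needs three bytes. A quick check (Python 3):

```python
# U+0080 (a C1 control character) and U+20AC (the euro sign) in UTF-8:
print("\u0080".encode("utf-8"))  # b'\xc2\x80'
print("\u20ac".encode("utf-8"))  # b'\xe2\x82\xac'
```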
Regards,
Martin
A slight improvement. Get this:
C:\junk\console>chcp 65001
Active code page: 65001
C:\junk\console>python
Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys; sys.stdout.encoding
'cp65001'
>>> print u'\xff'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
LookupError: unknown encoding: cp65001
>>> print u'\xff'.encode('utf8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IOError: [Errno 13] Permission denied
>>>
Adding an entry to ...\lib\encodings\aliases.py as suggested did fix
the LookupError; it went straight to the same IOError as above.
Next step?
Doh! I'm a nutter. That works. Thanks. The only font choice offered
apart from "Raster Fonts" in the Command Prompt window's Properties
box is "Lucida Console", not "Lucida Sans Unicode". It will let me
print Cyrillic characters from a C program, but not Chinese. I'm off
looking for how to get a better font.
Cheers,
John
In this post, Raymond Chen explains all the conditions a font must meet to
actually be usable in a console window:
http://blogs.msdn.com/oldnewthing/archive/2007/05/16/2659903.aspx
In short, most TrueType fonts (even the fixed-width ones) aren't eligible.
--
Gabriel Genellina
You need to use the Visual Studio debugger to find out where
precisely the IOError comes from.
Regards,
Martin
Big step. I don't have Visual Studio and have never used it before.
Which version of VS do I need to debug which released version of Python
2.X and where do I get that VS from? Or do I need to build Python from
source to be able to debug it?
I have
[HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont]
"00"="DejaVu Sans Mono"
Note that you have to /reboot/ (no, I'm not kidding) to make this work.
Thorsten
You need Visual Studio 2008 (Professional, not sure about Standard), or
Visual C++ 2008 (which is free). You need to build Python from source in
debug mode; the released version is in release mode, and with no debug
information.
Regards,
Martin
Thorsten Kampe <thor...@thorstenkampe.de> wrote:
>I have
>[HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont]
>"00"="DejaVu Sans Mono"
As near as I can tell, the DejaVu Sans Mono font doesn't include Chinese
characters. If you want to display Chinese characters in a console window
on Windows, you'll probably have to change the (global) system locale to an
appropriate Chinese locale and reboot.
Note that a complete Unicode console font is essentially an impossibility,
so don't bother looking for one. There are many characters in Unicode
that can't be reasonably mapped to a single fixed-width "console" glyph.
There are also characters in Unicode that should be represented as
single-width glyphs in Western contexts, but as double-width glyphs in
Far-Eastern contexts.
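Python's unicodedata module exposes exactly that width property, for anyone curious (a Python 3 sketch):

```python
import unicodedata

# 'Na' = narrow, 'W' = wide (double-width in East Asian rendering),
# 'A' = ambiguous (narrow or wide depending on context).
print(unicodedata.east_asian_width("A"))       # Na
print(unicodedata.east_asian_width("\u4e2d"))  # W  (a CJK ideograph)
print(unicodedata.east_asian_width("\u20ac"))  # A  (the euro sign)
```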
Ross Ridge