Please see error output below where I am trying to show the EURO sign
(http://www.fileformat.info/info/unicode/char/20ac/index.htm):
Python 3.0 (r30:67507, Dec 3 2008, 20:14:27) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print('\u20ac')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "c:\python30\lib\io.py", line 1491, in write
b = encoder.encode(s)
File "c:\python30\lib\encodings\cp437.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in
position 0: character maps to <undefined>
>>>
>>> print ("\N{EURO SIGN}")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "c:\python30\lib\io.py", line 1491, in write
b = encoder.encode(s)
File "c:\python30\lib\encodings\cp437.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in
position 0: character maps to <undefined>
it's a self compiled version:
~ $ python3
Python 3.0 (r30:67503, Dec 29 2008, 21:35:15)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print("\u20ac")
€
>>> print ("\N{EURO SIGN}")
€
>>>
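For what it's worth, both spellings name the same character; a quick check with unicodedata (Python 3 sketch) shows the `\N{...}` escape and the Unicode name database agree:

```python
import unicodedata

# '\N{EURO SIGN}' and '\u20ac' denote the same character; the \N{...}
# escape is resolved through the Unicode name database that unicodedata
# also exposes at runtime.
print(unicodedata.lookup("EURO SIGN") == "\u20ac")  # True
print(unicodedata.name("\u20ac"))                   # EURO SIGN
```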
2009/1/26 jefm <jef.mang...@gmail.com>:
> What am I doing wrong ?
"\N{EURO SIGN}".encode("ISO-8859-15") ## could be something but I'm
pretty sure I'm totally wrong on this
--
http://soup.alt.delete.co.at
http://www.xing.com/profile/Martin_Marcher
http://www.linkedin.com/in/martinmarcher
You are running on Linux. Mine is on Windows.
Anyone else have this issue on Windows ?
As Benjamin Kaplin said, Windows consoles default to an old OEM code page
(cp437, per your traceback), which cannot display the euro sign. You'll
either have to run it in something more modern, like the cygwin rxvt
terminal, or output some other way, such as through a GUI.
True
> In the Python language reference (http://docs.python.org/3.0/reference/
> lexical_analysis.html) I read that I can show Unicode character in
> several ways.
> "\uxxxx" supposedly allows me to specify the Unicode character by hex
> number and the format "\N{name}" allows me to specify by Unicode
> name.
These are ways to *specify* unicode chars on input.
> Neither seem to work for me.
If you separate text creation from text printing, you would see that
they do. Try
s='\u20ac'
print(s)
> What am I doing wrong ?
Using the interactive interpreter running in a Windows console.
> Please see error output below where I am trying to show the EURO sign
> (http://www.fileformat.info/info/unicode/char/20ac/index.htm):
>
> Python 3.0 (r30:67507, Dec 3 2008, 20:14:27) [MSC v.1500 32 bit
> (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
>>>> print('\u20ac')
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "c:\python30\lib\io.py", line 1491, in write
> b = encoder.encode(s)
> File "c:\python30\lib\encodings\cp437.py", line 19, in encode
> return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in
> position 0: character maps to <undefined>
With the standard console, I get the same. But with IDLE, using the
same Python build through a different interface:
>>> s='\u20ac'
>>> len(s)
1
>>> str(s)
'€' # euro sign
I have fiddled with the shortcut to supposedly make it work better, as
claimed by posts found on the web, but to no avail. Very frustrating,
since I have fonts on the system for at least all of the first 64K
characters. Scream at Microsoft, or try to find or encourage a console
replacement that Python could use. In the meantime, use IDLE. Not
perfect for Unicode, but better.
Terry Jan Reedy
>With the standard console, I get the same. But with IDLE, using the
>same Python build but through a different interface
>Scream at Microsoft or try to find or encourage a console
>replacement that Python could use. In the meanwhile, use IDLE. Not
>perfect for Unicode, but better.
So, if I understand it correctly, it should work as long as you run
your Python code on something that can actually print the Unicode
character.
Apparently, the Windows command line cannot.
I mainly program command-line tools to be used by Windows users, so I
guess I am screwed.
Other than converting my tools to a graphical interface, is there any
solution, short of giving Bill Gates a call and bringing his command
line into the 21st century?
cp1252 can represent the euro sign (<http://en.wikipedia.org/wiki/Windows-1252>). Apparently the chcp command can be used to change the code page
active in the console (<http://technet.microsoft.com/en-us/library/bb490874.aspx>). I've never tried this myself, though.
Jean-Paul
Windows uses codepages to display different character sets. (http://
en.wikipedia.org/wiki/Code_page)
The Windows chcp command allows you to change the character set from
the original 437 set.
When you type on the command line: chcp 65001
it sets your console in UTF-8 mode.
(http://en.wikipedia.org/wiki/Code_page_65001)
Unfortunately, it still doesn't do what I want. Instead of printing
the error message above, it prints nothing.
Short answer: it doesn't work.
Test [Windows XP SP3, Python 2.6.1]:
C:\junk>chcp
Active code page: 850
C:\junk>chcp 1252
Active code page: 1252
C:\junk>chcp
Active code page: 1252
C:\junk>\python26\python
Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys; sys.stdout.encoding; sys.stderr.encoding
'cp1252'
'cp1252'
# So far, so good
>>> import unicodedata as ucd
>>> for b in range(128, 256):
... c = chr(b)
... u = c.decode('cp1252', 'replace')
... name = ucd.name(u)
... print hex(b), c, repr(u), name
...
0x80 € u'\u20ac' EURO SIGN
0x81 u'\ufffd' REPLACEMENT CHARACTER
0x82 ‚ u'\u201a' SINGLE LOW-9 QUOTATION MARK
[snip]
0xfb û u'\xfb' LATIN SMALL LETTER U WITH CIRCUMFLEX
0xfc ü u'\xfc' LATIN SMALL LETTER U WITH DIAERESIS
0xfd ý u'\xfd' LATIN SMALL LETTER Y WITH ACUTE
[snip]
Ignore what you are seeing in the second field of each above line; it
could well look OK. However what I see on the console is:
capital C with cedilla
small u with diaeresis (umlaut)
small e with acute
superscript one
superscript three
superscript two [yes, out of order]
IOW, the bridge might think it's in cp1252 mode, but nobody told the
engine room, which is still churning out cp850.
You will have to debug the Python interpreter to find out what's
going wrong in code page 65001. Nobody has ever resolved that mystery,
although it's been known for some time.
If you merely want to see *something* (and not actually the glyph
for the character (*)):
py> print(ascii('\u20ac'))
'\u20ac'
should work fine.
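Along the same lines, the codec error handlers can degrade the output instead of raising (a Python 3 sketch; this substitutes printable stand-ins rather than showing the real glyph):

```python
s = "\u20ac"

# cp437 has no euro sign, so a strict encode raises UnicodeEncodeError;
# an error handler substitutes something printable instead.
print(s.encode("cp437", errors="backslashreplace"))  # b'\\u20ac'
print(s.encode("cp437", errors="replace"))           # b'?'
```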
Regards,
Martin
(*) Windows doesn't support displaying *all* unicode characters even
in code page 65001, nor is it reasonable to expect it to. It can, at
best, only display those characters it has glyphs for in the font
that it is using. As Unicode constantly evolves, the fonts necessarily
get behind. Plus, in a fixed-size font, some characters just don't
render too well.
I think you must use a different font in the console, too, such as
Lucida Sans Unicode.
Regards,
Martin
True. I was just about to post that I'd stumbled across that!
Maybe the problem is not in the Python interpreter. Running this tiny
C program
#include <stdio.h>

int main(int argc, char **argv) {
    printf("<\xc2\x80>\n");
    return 0;
}
compiled with mingw32 (gcc (GCC) 3.4.5 (mingw-vista special r3))
and using "Lucida Console" font:
After CHCP 1252, this prints < A-circumflex Euro >, as expected.
After CHCP 65001, it prints < hollow-square >.
Perhaps you could try that with an MS C compiler [which I don't
have] ...
I see. What happens if you add it to encoding/aliases.py?
Regards,
Martin
I have this same issue on Windows.
Note that on Python 2.6 it works:
Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit
(Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print unicode('\u20ac')
\u20ac
This is pretty serious, IMHO, since it breaks any Windows software
printing Unicode to stdout.
I've filed an issue on the Python bug tracker:
http://bugs.python.org/issue5081
--- Giampaolo
http://code.google.com/p/pyftpdlib/
Shouldn't this be
print unicode(u'\u20ac')
on 2.6? Without the 'u' prefix, 2.6 will just encode it as a normal
(byte) string and escape the backslash. In Python 3.0 you don't need
to do this because all strings are "unicode" to start with. I suspect
you will see the same error with 2.6 on Windows once you correct this.
(note to Giampaolo: sorry, resending this because I accidentally
selected "reply" instead of "reply to all")
--
Denis Kasak
Hello hello -- (1) that's *not* attempting to print Unicode. Look at
your own output ... "\u20ac" was printed, not a euro character!!!
With 2.X for *any* X:
>>> guff ='\u20ac'
>>> type(guff)
<type 'str'>
>>> len(guff)
6
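To spell out why len(guff) is 6: in 2.X a plain string literal never interprets \u escapes, so guff holds the six characters backslash, u, 2, 0, a, c. The Python 3 equivalent of that byte-string content is:

```python
# What Python 2's '\u20ac' (no u prefix) actually contained: the literal
# backslash escape, not the euro character itself.
guff = "\\u20ac"
print(len(guff))  # 6
print(guff)       # \u20ac
```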
(2) Printing Unicode to a Windows console has never *worked*; that's
why this thread was pursuing the faint ray of hope offered by cp65001.
For printing to stdout you have to give an encoding that the terminal
understands and that contains the character. In your case the terminal
says "I speak cp850", but of course there is no euro sign in there. Why
should that be a bug?
Thorsten
You are trying to create a Unicode object from a Unicode object. Doesn't
make any sense.
> on 2.6? Without the 'u' prefix, 2.6 will just encode it as a normal
> (byte) string and escape the backslash.
You are confusing encoding and decoding. unicode(str) = str.decode. To
print it you have to encode it again to a character set that the
terminal understands and that contains the desired character.
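To make that concrete (a Python 3 sketch): the same character encodes to different bytes, or fails outright, depending on the target character set:

```python
s = "\u20ac"

print(s.encode("cp1252"))       # b'\x80'  -- cp1252 does contain the euro
print(s.encode("iso-8859-15"))  # b'\xa4'
try:
    s.encode("cp850")           # cp850 (the poster's console) has no euro
except UnicodeEncodeError as e:
    print("no euro in cp850:", e.reason)
```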
Thorsten
<snip>
>> >>>> print unicode('\u20ac')
>> > \u20ac
>>
>> Shouldn't this be
>>
>> print unicode(u'\u20ac')
>
> You are trying to create a Unicode object from a Unicode object. Doesn't
> make any sense.
Of course it doesn't. :-)
Giampaolo's example was wrong because he was creating a str object
with a non-escaped backslash inside it (which automatically got
escaped) and then converting it to a unicode object. In other words,
he was doing:
print unicode('\\u20ac')
so the Unicode escape sequence didn't get interpreted the way he
intended it to. I then modified that by adding the extra 'u' but
forgot to delete the extraneous unicode().
> You are confusing encoding and decoding. unicode(str) = str.decode. To
> print it you have to encode it again to a character set that the
> terminal understands and that contains the desired character.
I agree (except for the first sentence :-) ). As I said, I simply
forgot to delete the call to the unicode builtin.
--
Denis Kasak
This is not surprising: this character is U+0080, which is a control
character. Try \xe2\x82\xac instead.
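Indeed: \xc2\x80 is the two-byte UTF-8 encoding of U+0080, while the euro sign needs three bytes. A quick check (Python 3):

```python
# U+0080 (a C1 control character) and U+20AC (the euro sign) in UTF-8:
print("\u0080".encode("utf-8"))  # b'\xc2\x80'
print("\u20ac".encode("utf-8"))  # b'\xe2\x82\xac'
```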
Regards,
Martin
A slight improvement. Get this:
C:\junk\console>chcp 65001
Active code page: 65001
C:\junk\console>python
Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys; sys.stdout.encoding
'cp65001'
>>> print u'\xff'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
LookupError: unknown encoding: cp65001
>>> print u'\xff'.encode('utf8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IOError: [Errno 13] Permission denied
>>>
Adding an entry to ...\lib\encodings\aliases.py as suggested did fix
the LookupError; it went straight to the same IOError as above.
Next step?
Doh! I'm a nutter. That works. Thanks. The only font choice offered
apart from "Raster Fonts" in the Command Prompt window's Properties
box is "Lucida Console", not "Lucida Sans Unicode". It will let me
print Cyrillic characters from a C program, but not Chinese. I'm off
looking for how to get a better font.
Cheers,
John
In this post, Raymond Chen explains all the conditions a font must meet to
actually be usable in a console window:
http://blogs.msdn.com/oldnewthing/archive/2007/05/16/2659903.aspx
In short, most TrueType fonts (even the fixed-width ones) aren't eligible.
--
Gabriel Genellina
You need to use the Visual Studio debugger to find out where
precisely the IOError comes from.
Regards,
Martin
Big step. I don't have Visual Studio and have never used it before.
Which version of VS do I need to debug which released version of Python
2.X and where do I get that VS from? Or do I need to build Python from
source to be able to debug it?
I have
[HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont]
"00"="DejaVu Sans Mono"
Note that you have to /reboot/ (no, I'm not kidding) to make this work.
Thorsten
You need Visual Studio 2008 (Professional, not sure about Standard), or
Visual C++ 2008 (which is free). You need to build Python from source in
debug mode; the released version is in release mode, and with no debug
information.
Regards,
Martin
Thorsten Kampe <thor...@thorstenkampe.de> wrote:
>I have
>[HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont]
>"00"="DejaVu Sans Mono"
As near as I can tell, the DejaVu Sans Mono font doesn't include Chinese
characters. If you want to display Chinese characters in a console window
on Windows, you'll probably have to change the (global) system locale to an
appropriate Chinese locale and reboot.
Note that a complete Unicode console font is essentially an impossibility,
so don't bother looking for one. There are many characters in Unicode
that can't be reasonably mapped to a single fixed-width "console" glyph.
There are also characters in Unicode that should be represented as
single-width glyphs in Western contexts, but as double-width glyphs in
Far-Eastern contexts.
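Python's unicodedata module exposes exactly that width property, for anyone curious (a Python 3 sketch):

```python
import unicodedata

# 'Na' = narrow, 'W' = wide (double-width in East Asian rendering),
# 'A' = ambiguous (narrow or wide depending on context).
print(unicodedata.east_asian_width("A"))       # Na
print(unicodedata.east_asian_width("\u4e2d"))  # W  (a CJK ideograph)
print(unicodedata.east_asian_width("\u20ac"))  # A  (the euro sign)
```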
Ross Ridge