Unicode issue on Windows cmd line

jeffg

Feb 11, 2009, 1:35:35 PM

I'm having an issue on the Windows cmd line.
> Python.exe
>>> a = u'\xf0'
>>> print a

This gives a unicode error.

It works fine in IDLE, PythonWin, and on my MacBook, but I need to run
this from a Windows batch file.

Character should look like this "ð".

Please help!

Albert Hopkins

Feb 11, 2009, 2:35:56 PM

to pytho...@python.org

You forgot to paste the error.

jeffg

Feb 11, 2009, 2:50:32 PM

The error looks like this:
  File "<stdin>", line 1, in <module>
  File "C:\python25\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xf0' in position 0: character maps to <undefined>

Running Python 2.5.4 on Windows XP
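
For anyone reproducing this, here is a minimal sketch of the failure. It assumes Python 3 (the thread uses Python 2.5, where the literal would be written u'\xf0') and calls the cp437 codec directly instead of relying on the console's active code page.

```python
# Python 3 sketch; in Python 2.5 the literal would be u'\xf0'.
text = "\xf0"  # U+00F0, LATIN SMALL LETTER ETH

try:
    # cp437 (the US OEM code page) has no mapping for U+00F0.
    text.encode("cp437")
except UnicodeEncodeError as exc:
    print("cp437 cannot encode it:", exc.reason)

# errors="replace" substitutes "?" instead of raising.
replaced = text.encode("cp437", errors="replace")
print(replaced)
```

The "replace" handler avoids the exception but loses the character, so it is only a stopgap when the exact output matters.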

"Martin v. Löwis"

Feb 11, 2009, 3:57:26 PM

> Character should look like this "ð".
>
> Please help!

Well, your terminal just cannot display this character by default; you
need to use a different terminal program, or reconfigure your terminal.

For example, do

chcp 1252

and select Lucida Console as the terminal font, then try again.

Of course, this will cause *different* characters to become
non-displayable.

Regards,
Martin

MRAB

Feb 11, 2009, 4:34:59 PM

to pytho...@python.org

Benjamin Kaplan wrote:
[snip]
> Whoops. Didn't mean to hit send there. I was going to say, you can't
> have everything when Microsoft is only willing to break the programs
> that average people are going to use on a daily basis. I mean, why
> would they do something nice for the international community at the
> expense of breaking some 20 year old batch scripts? Those were the
> only things that still worked when Vista first came out.
>
I remember when I had to use MS-Access but it could be either of 2 versions.

The newer version couldn't open a database from the older version unless
I let it convert it first, after which point I wouldn't be able to open
it in the older version... :-(

jeffg

Feb 11, 2009, 5:10:50 PM

On Feb 11, 3:57 pm, "Martin v. Löwis" <mar...@v.loewis.de> wrote:
> > Having issue on Windows cmd.
> > > Python.exe
> > >>>a = u'\xf0'
> > >>>print a
>
> > This gives a unicode error.
>
> > Works fine in IDLE, PythonWin, and my Macbook but I need to run this
> > from a windows batch.
>
> > Character should look like this "ð".
>
> > Please help!
>
> Well, your terminal just cannot display this character by default; you
> need to use a different terminal program, or reconfigure your terminal.
>
> For example, do
>
> chcp 1252
>
> and select Lucida Console as the terminal font, then try again.
>
> Of course, this will cause *different* characters to become
> non-displayable.
>
> Regards,
> Martin

Thanks, I ended up using encode('iso-8859-15', "replace")
Perhaps more up to date than cp1252...??

It still didn't print correctly, but it did write correctly, which was
my main problem.

"Martin v. Löwis"

Feb 11, 2009, 6:30:07 PM

to jeffg

> Thanks, I ended up using encode('iso-8859-15', "replace")
> Perhaps more up to date than cp1252...??
>
> It still didn't print correctly, but it did write correctly, which was
> my main problem.

If you encode as iso-8859-15, but this is not what your terminal
expects, it certainly won't print correctly. To get correct printing,
the output encoding must be the same as the terminal encoding. If the
terminal encoding is not up to date (as you consider cp1252), then
the output encoding should not be up to date, either.

If you want a modern encoding that supports all of Unicode, and you
don't care whether the output is legible, use UTF-8.

Regards,
Martin
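
The rule above (output encoding must equal terminal encoding) can be sketched roughly as follows. This assumes Python 3; `encode_for_terminal` is a hypothetical helper name, not something from the thread or the standard library, and falling back to UTF-8 when no terminal encoding is reported is an assumption as well.

```python
import sys

def encode_for_terminal(text, encoding=None, errors="replace"):
    """Hypothetical helper: encode text with the encoding the terminal expects.

    If no encoding is passed, use what Python reports for stdout, and fall
    back to UTF-8 when stdout reports nothing (for example when redirected).
    """
    target = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    return text.encode(target, errors=errors)

# The same character under three different target encodings.
print(encode_for_terminal("\xf0", encoding="cp1252"))
print(encode_for_terminal("\xf0", encoding="cp437"))
print(encode_for_terminal("\xf0", encoding="utf-8"))
```

Passing the encoding explicitly is mostly useful for writing files; for console output the safer default is whatever the console itself reports.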

jeffg

Feb 11, 2009, 8:11:37 PM

I did try UTF-8 but it produced the upper case character instead of
the proper lower case, so the output was incorrect for the unicode
supplied.
I think both 8859-15 and cp1252 produced the correct output, but I
figured 8859-15 would have additional character support (though not
sure this is the case - if it is not, please let me know and I'll use
1252). I'm dealing with large data sets and this just happened to be
one small example. I want to have the best ability to write future
unicode characters properly based on running from the windows command
line (unless there is a better way to do it on windows).
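
One plausible explanation for the "upper case character" (an assumption, since the actual output was not posted): the UTF-8 encoding of "ð" is two bytes, and a viewer that interprets those bytes as cp1252 or iso-8859-15 shows an upper-case "Ã" followed by "°". A small Python 3 sketch of that effect:

```python
# UTF-8 encodes U+00F0 as two bytes.
utf8_bytes = "\xf0".encode("utf-8")
print(utf8_bytes)

# Decoding those bytes with a single-byte Western encoding gives two
# characters: U+00C3 (upper-case A with tilde) and U+00B0 (degree sign).
as_cp1252 = utf8_bytes.decode("cp1252")
as_latin9 = utf8_bytes.decode("iso-8859-15")
print(ascii(as_cp1252), ascii(as_latin9))
```

If this is what happened, the UTF-8 bytes were correct and only the program reading them used the wrong encoding.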

Gabriel Genellina

Feb 11, 2009, 10:00:45 PM

to pytho...@python.org

En Wed, 11 Feb 2009 23:11:37 -0200, jeffg <jeffg...@gmail.com> escribió:

> On Feb 11, 6:30 pm, "Martin v. Löwis" <mar..@v.loewis.de> wrote:

>> > Thanks, I ended up using encode('iso-8859-15', "replace")
>> > Perhaps more up to date than cp1252...??

>> If you encode as iso-8859-15, but this is not what your terminal
>> expects, it certainly won't print correctly. To get correct printing,
>> the output encoding must be the same as the terminal encoding. If the
>> terminal encoding is not up to date (as you consider cp1252), then
>> the output encoding should not be up to date, either.

> I did try UTF-8 but it produced the upper case character instead of
> the proper lower case, so the output was incorrect for the unicode
> supplied.
> I think both 8859-15 and cp1252 produced the correct output, but I
> figured 8859-15 would have additional character support (though not
> sure this is the case - if it is not, please let me know and I'll use
> 1252). I'm dealing with large data sets and this just happened to be
> one small example. I want to have the best ability to write future
> unicode characters properly based on running from the windows command
> line (unless there is a better way to do it on windows).

As Martin v. Löwis already said, the encoding Python uses when writing
to the console must match the encoding the console expects. (You should
also use a font capable of displaying such characters.)

windows-1252 and iso-8859-15 are similar, but not identical. This table
shows the differences (fewer than 30 printable characters):
http://en.wikipedia.org/wiki/Western_Latin_character_sets_(computing)
If your script encodes its output using iso-8859-15, the corresponding
console code page should be 28605.
"Western European" (whatever that means exactly) Windows versions use the
windows-1252 encoding as the "Ansi code page" (GUI applications), and
cp850 as the "OEM code page" (console applications) -- cp437 in the US
only.

C:\Documents and Settings\Gabriel>chcp 1252
Tabla de códigos activa: 1252

C:\Documents and Settings\Gabriel>python
Python 2.6 (r26:66721, Oct 2 2008, 11:35:03) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
py> unichr(0x0153).encode("windows-1252")
'\x9c'
py> print _
œ
py> ^Z

C:\Documents and Settings\Gabriel>chcp 28605
Tabla de códigos activa: 28605

C:\Documents and Settings\Gabriel>python
Python 2.6 (r26:66721, Oct 2 2008, 11:35:03) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
py> unichr(0x0153).encode("iso-8859-15")
'\xbd'
py> print _
œ
py> unichr(0x0153).encode("latin9")
'\xbd'

--
Gabriel Genellina

jeffg

Feb 11, 2009, 11:16:09 PM

On Feb 11, 10:00 pm, "Gabriel Genellina" <gagsl-...@yahoo.com.ar>
wrote:
> En Wed, 11 Feb 2009 23:11:37 -0200, jeffg <jeffgem...@gmail.com> escribió:

Thanks, switched it to windows-1252.