Re: how to get my character?

Peter Otten

Jan 26, 2012, 8:05:30 AM
to pytho...@python.org
contro opinion wrote:

> how can i get "你好" from '\xc4\xe3\xba\xc3' ?

>>> print '\xc4\xe3\xba\xc3'.decode("gbk")
你好

General rule: use the decode() method to convert from bytestring to unicode
and encode() to convert from unicode to bytestring.

bytestring.encode(x) will implicitly try
bytestring.decode("ascii").encode(x) which is likely to fail.
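
To make the rule concrete, here is a minimal sketch. It assumes the bytes from the question really are GBK-encoded, and it uses a b'' literal so the same lines run on Python 2.6+ and Python 3. The variable names are only illustrative. The last part spells out by hand the implicit ASCII step described above, rather than relying on Python 2 doing it behind the scenes.

```python
# GBK-encoded bytes taken from the question.
raw = b'\xc4\xe3\xba\xc3'

# decode(): bytestring -> unicode
text = raw.decode('gbk')

# encode(): unicode -> bytestring
round_trip = text.encode('gbk')

# The implicit step written out explicitly: decoding as ASCII fails
# because 0xc4 is outside range(128), so encode() is never reached.
try:
    raw.decode('ascii').encode('gbk')
    implicit_step_failed = False
except UnicodeDecodeError:
    implicit_step_failed = True
```

Here text holds the two characters u'\u4f60\u597d', round_trip equals the original bytes, and the ASCII detour raises UnicodeDecodeError.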

Lutz Horn

Jan 26, 2012, 8:03:14 AM
to pytho...@python.org
Hi,

On Thu, 26 Jan 2012 20:52:48 +0800, contro opinion wrote:
> how can i get "你好" from '\xc4\xe3\xba\xc3' ?

Please share any results you get from
http://stackoverflow.com/questions/9018303/how-to-get-my-character with
python-list.

Lutz

Dave Angel

Jan 26, 2012, 8:17:49 AM
to contro opinion, python-list
On 01/26/2012 07:52 AM, contro opinion wrote:
> my system: XP + Python 2.7; the codecs: XP uses gbk, Python 2.7 uses ascii
>
> a = '你好'
> a
> '\xc4\xe3\xba\xc3'
> print a
> 你好
> '\xc4\xe3\xba\xc3'.decode('gbk')
> u'\u4f60\u597d'
> '\xc4\xe3\xba\xc3'.encode('gbk')
> Traceback (most recent call last):
>   File "", line 1, in
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0:
> ordinal not in range(128)
>
> how can i get "你好" from '\xc4\xe3\xba\xc3' ?
>
I don't have 'gbk' as my encoding. But on your system, if you simply
print it, you should get the proper characters.

Try:
a = '\xc4\xe3\xba\xc3'
print repr(a)
print a

And see if it now makes sense. You're looking at the encoded form of the
two characters. You could decode it to the two-character unicode string,
as you showed above. But it makes no sense to try to encode something
that's already encoded.
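
A rough sketch of what "already encoded" means in practice, again assuming the bytes are GBK. The four bytes stand for two characters, and if a different encoding is wanted (UTF-8 is used here purely as an illustration, it is not part of the original question), the route goes through unicode first. Names are hypothetical; the code runs on Python 2.6+ and Python 3.

```python
# The four bytes are the GBK-encoded form of two characters.
raw = b'\xc4\xe3\xba\xc3'
text = raw.decode('gbk')

byte_count = len(raw)    # number of bytes in the encoded form
char_count = len(text)   # number of characters after decoding

# To change encodings, decode to unicode first, then encode again.
# Encoding the raw bytes directly would skip the decode step.
as_utf8 = raw.decode('gbk').encode('utf-8')
```

With these inputs byte_count is 4 and char_count is 2, and as_utf8 is a six-byte string, since each of the two characters takes three bytes in UTF-8.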

--

DaveA