[Python-3000] Bytes to Unicode Conversion


Pb2Au

Nov 15, 2008, 11:46:31 AM
to pytho...@python.org
Hello,

I recently changed from Python 2.5 to Python 3.0 rc2, and have
been trying to find out how to convert byte strings (b"example")
to unicode strings ("example").  I noticed that some of these had
changed in the latest version.

One reason for a conversion between the two is the urllib.request.urlopen()
function, which requires the URL to be a unicode string rather than bytes, or
else you receive an AttributeError about the 'bytes' object having no
attribute 'timeout'.  The read() method of the object returned by
urllib.request.urlopen() returns a byte string, which means I can't parse
information out of that byte string to use in a second
urllib.request.urlopen() call unless it is converted to unicode first.

Am I simply overlooking something, or is there a built-in function for
converting bytes to unicode?  It seems like a function could be written
pretty easily if one does not already exist, but there isn't much sense in
reinventing the wheel if the function is already there.

Thanks for your help.

Chris Rebert

Nov 16, 2008, 1:31:22 PM
to Pb2Au, pytho...@python.org

Already exists. Has for quite a while now:

the_unicode = unicode(some_bytes, "name of encoding")

Cheers,
Chris
--
Follow the path of the Iguana...
http://rebertia.com

_______________________________________________
Python-3000 mailing list
Pytho...@python.org
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: http://mail.python.org/mailman/options/python-3000/python-3000-garchive-63646%40googlegroups.com

Pb2Au

Nov 16, 2008, 3:13:48 PM
to Chris Rebert, pytho...@python.org
On Sun, Nov 16, 2008 at 4:31 PM, Chris Rebert <cvre...@gmail.com> wrote:
>Already exists. Has for quite a while now:
>
>the_unicode = unicode(some_bytes, "name of encoding")
>
>Cheers,
>Chris
>--
>Follow the path of the Iguana...
>http://rebertia.com

I know that it worked in version 2.5, but Python 3.0 rc2 doesn't
seem to recognize it as a function.

Python 3.0rc2 (r30rc2:67141, Nov  7 2008, 11:43:46) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> unicode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'unicode' is not defined

"Martin v. Löwis"

Nov 16, 2008, 3:28:24 PM
to Pb2Au, pytho...@python.org
> I know that it had worked in the version 2.5, Python 3.0 rc2 doesn't
> seem to recognize it as a function.

a) I discourage usage of unicode and str converters; consider using
.encode/.decode instead
b) unicode is now called str
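
A minimal sketch of the .encode/.decode route in Python 3, assuming the data is UTF-8 (the sample bytes and variable names are illustrative, not from the thread):

```python
# Illustrative sample: the UTF-8 encoding of "café"
raw = b"caf\xc3\xa9"

# bytes -> str: name the encoding explicitly
text = raw.decode("utf-8")

# str -> bytes: the reverse direction
round_trip = text.encode("utf-8")

# Invalid input raises UnicodeDecodeError by default; passing "replace"
# as the error handler substitutes U+FFFD for the bad byte instead
lenient = b"abc\xff".decode("utf-8", "replace")
```

The right encoding depends on where the bytes came from; UTF-8 is only a common guess.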

Regards,
Martin

Chris Rebert

Nov 16, 2008, 3:20:45 PM
to Pb2Au, pytho...@python.org
Ah, my bad. Should never have referred to the Python 2.6 docs. :)

Replace "unicode" with "str" in my line of code and I think it should work.
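
With that substitution, the call might look like the sketch below, assuming UTF-8 input (the value of some_bytes is a made-up sample). The encoding argument matters: without it, str() returns the printable representation of the bytes object rather than decoding it.

```python
some_bytes = b"example"  # hypothetical sample data

# Python 3 spelling of unicode(some_bytes, "name of encoding")
the_unicode = str(some_bytes, "utf-8")

# Without an encoding, str() does not decode; it returns the repr
not_decoded = str(some_bytes)
```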

Cheers,
Chris

Nick Coghlan

Nov 16, 2008, 3:34:01 PM
to Pb2Au, pytho...@python.org
Pb2Au wrote:
> On Sun, Nov 16, 2008 at 4:31 PM, Chris Rebert <cvre...@gmail.com> wrote:
>>
>>Already exists. Has for quite a while now:
>>
>>the_unicode = unicode(some_bytes, "name of encoding")
>
> I know that it had worked in the version 2.5, Python 3.0 rc2 doesn't
> seem to recognize it as a function.
>
> Python 3.0rc2 (r30rc2:67141, Nov 7 2008, 11:43:46) [MSC v.1500 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> unicode()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> NameError: name 'unicode' is not defined

unicode becomes str in Py3k (as "type('')" will tell you).

bytes.decode() works as well.

Use str.encode() to go the other way.
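
Applied to the urlopen() case from the original question, a minimal sketch might look like the following. The helper name next_url, the href pattern, the example.com URL, and the UTF-8 default are all assumptions for illustration; a real page may declare a different encoding, and the sample body stands in for an actual network read.

```python
import re
from urllib.parse import urljoin


def next_url(body: bytes, base_url: str, encoding: str = "utf-8") -> str:
    """Decode a response body and return its first href as an absolute URL."""
    text = body.decode(encoding)  # bytes -> str
    match = re.search(r'href="([^"]+)"', text)
    if match is None:
        raise ValueError("no link found in response body")
    return urljoin(base_url, match.group(1))


# Stand-in for urllib.request.urlopen(first_url).read(), which returns bytes
body = b'<a href="/page2.html">next</a>'
url = next_url(body, "http://example.com/index.html")
# url is a str, so it can be handed to urllib.request.urlopen(url)
```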

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia
---------------------------------------------------------------

Pb2Au

Nov 16, 2008, 3:37:15 PM
to pytho...@python.org
Ah, I understand now.  Thank you everyone for your help.