
unicode bit me


anurag...@yahoo.com

May 8, 2009, 10:53:44 AM
# How can I print a list of objects which may return a unicode representation?
# -*- coding: utf-8 -*-

class A(object):

    def __unicode__(self):
        return u"©au"

    __str__ = __repr__ = __unicode__

a = A()

try:
    print a # doesn't work?
except UnicodeEncodeError, e:
    print e
try:
    print unicode(a) # works, ok fine, great
except UnicodeEncodeError, e:
    print e
try:
    print unicode([a]) # what!!!! doesn't work?
except UnicodeEncodeError, e:
    print e
"""
Now how can I print a list of objects which may return a unicode
representation?
A loop/map is not an option, as it goes much deeper in my real code.
And can anyone explain what is happening here under the hood?
"""

Diez B. Roggisch

May 8, 2009, 11:08:46 AM
anurag...@yahoo.com wrote:

> #how can I print a list of object which may return unicode
> representation?
> # -*- coding: utf-8 -*-
>
> class A(object):
>
>     def __unicode__(self):
>         return u"©au"
>
>     __str__ = __repr__ = __unicode__

__str__ and __repr__ are supposed to return *byte*strings. Yet you return
unicode here.

Diez
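To make the type rule concrete: in Python 2, str() tries to encode a unicode result from __str__ with the default ASCII codec, which is where the error comes from. The sketch below is written for Python 3 (not the 2.x used in this thread), where the analogous rule is enforced outright: __str__ must return the native string type. The class name Bad is made up for illustration.

```python
# Python 3 sketch; "Bad" is a hypothetical class used only for illustration.
class Bad:
    def __str__(self):
        # Wrong type on purpose: bytes instead of the native str type.
        return "\N{COPYRIGHT SIGN}au".encode("utf-8")


message = None
try:
    str(Bad())
except TypeError as exc:
    # Python 3 refuses a __str__ that returns the wrong string type.
    message = str(exc)

print(message)
```

The point is only that each Python version has one "native" string type that __str__ and __repr__ are expected to return: byte strings in 2.x, text strings in 3.x.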

Scott David Daniels

May 8, 2009, 11:47:11 AM

<rant>It would be a bit easier if people would bother to mention
their Python version, as we regularly get questions from people
running 2.3, 2.4, 2.5, 2.6, 2.7a, 3.0, and 3.1b. They run computers
with differing operating systems and versions such as: Windows 2000,
OS/X Leopard, ubuntu Hardy Heron, SuSE, ....

You might be shocked to learn that a good answer often depends on the
particular situation above. Even though it is easy to say, for example:
platform.platform() returns 'Windows-XP-5.1.2600-SP3'
sys.version is
'2.6.2 (r262:71605, Apr 14 2009, 22:40:02) [MSC v.1500 32 bit (Intel)]'
</rant>

What is happening is that print is writing to sys.stdout, and
apparently that doesn't know how to send unicode to that destination.
If you are running under IDLE, print goes to the output window, and
if you are running from the command line, it is going elsewhere.
The encoding that is being used for output is sys.stdout.encoding.

--Scott David Daniels
Scott....@Acm.Org
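A minimal sketch of checking the output encoding before printing, written in Python 3 syntax (the thread uses Python 2, where the same attribute exists). The fallback to "ascii" and the variable names are assumptions made for this example.

```python
import sys

text = "\N{COPYRIGHT SIGN}au"

# sys.stdout.encoding can be None when output goes to some file-like
# objects, so fall back to ASCII in that case (an assumption of this sketch).
encoding = getattr(sys.stdout, "encoding", None) or "ascii"

# errors="replace" substitutes "?" for anything the target codec lacks,
# so this step never raises UnicodeEncodeError.
printable = text.encode(encoding, errors="replace").decode(encoding)
print(encoding, printable)
```

On a terminal whose encoding can represent the copyright sign the text comes through unchanged; elsewhere the sign is replaced rather than raising.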

Terry Reedy

May 8, 2009, 2:22:32 PM
to pytho...@python.org
Scott David Daniels wrote:

> <rant>It would be a bit easier if people would bother to mention
> their Python version, as we regularly get questions from people
> running 2.3, 2.4, 2.5, 2.6, 2.7a, 3.0, and 3.1b. They run computers
> with differing operating systems and versions such as: Windows 2000,
> OS/X Leopard, ubuntu Hardy Heron, SuSE, ....

And if they copy and paste the actual error messages instead of saying
'It doesn't work'

J. Cliff Dyer

May 8, 2009, 3:04:55 PM
to anurag...@yahoo.com, pytho...@python.org
On Fri, 2009-05-08 at 07:53 -0700, anurag...@yahoo.com wrote:
> #how can I print a list of object which may return unicode
> representation?
> # -*- coding: utf-8 -*-
>
> class A(object):
>
>     def __unicode__(self):
>         return u"©au"
>
>     __str__ = __repr__ = __unicode__
>

Your __str__ and __repr__ methods don't return strings. You should
encode your unicode to the encoding you want before you try to print it.

class A(object):
    def __unicode__(self):
        return u"©au"

    def get_utf8_repr(self):
        return self.__unicode__().encode('utf-8')

    def get_koi8_repr(self):
        return self.__unicode__().encode('koi8-r')

    # Inside the class body the method is referenced by its bare name;
    # there is no "self" at this point.
    __str__ = __repr__ = get_utf8_repr

> a = A()
>
> try:
>     print a # doesn't work?
> except UnicodeEncodeError,e:
>     print e
> try:
>     print unicode(a) # works, ok fine, great
> except UnicodeEncodeError,e:
>     print e
> try:
>     print unicode([a]) # what!!!! doesn't work?
> except UnicodeEncodeError,e:
>     print e
> """
> Now how can I print a list of object which may return unicode
> representation?
> loop/map is not an option as it goes much deepr in my real code
> any can anyoen explain what is happening here under the hood?
> """


Piet van Oostrum

May 8, 2009, 5:22:26 PM
>>>>> "J. Cliff Dyer" <j...@sdf.lonestar.org> (JCD) wrote:

>JCD> On Fri, 2009-05-08 at 07:53 -0700, anurag...@yahoo.com wrote:
>>> #how can I print a list of object which may return unicode
>>> representation?
>>> # -*- coding: utf-8 -*-
>>>
>>> class A(object):
>>>
>>>     def __unicode__(self):
>>>         return u"©au"
>>>
>>>     __str__ = __repr__ = __unicode__
>>>

>JCD> Your __str__ and __repr__ methods don't return strings. You should
>JCD> encode your unicode to the encoding you want before you try to print it.

>JCD> class A(object):
>JCD>     def __unicode__(self):
>JCD>         return u"©au"

>JCD>     def get_utf8_repr(self):
>JCD>         return self.__unicode__().encode('utf-8')

>JCD>     def get_koi8_repr(self):
>JCD>         return self.__unicode__().encode('koi8-r')

>JCD>     __str__ = __repr__ = get_utf8_repr

It might be nicer to have a method that specifies the encoding to be
used in order to make switching encodings easier:

*untested code*

class A(object):
    _encoding = 'utf-8'  # default, so __repr__ works before set_encoding()

    def __unicode__(self):
        return u"©au"

    def set_encoding(self, encoding):
        self._encoding = encoding

    def __repr__(self):
        return self.__unicode__().encode(self._encoding)

    __str__ = __repr__

Of course this feels very wrong because the encoding should be chosen when
the string goes to the output channel, i.e. outside of the object.
Unfortunately this is one of the leftovers from Python's pre-unicode
heritage. Hopefully in Python3 this will work without problems. Anyway,
in Python 3 the string type is unicode, so at least __repr__ can return
unicode.
--
Piet van Oostrum <pi...@cs.uu.nl>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: pi...@vanoostrum.org

Steven D'Aprano

May 8, 2009, 8:47:24 PM

"I tried to copy and paste the actual error message, but it doesn't
work..."


*grin*


--
Steven

anurag...@yahoo.com

May 9, 2009, 12:44:39 AM
Sorry for not being specific and not giving all the info.

"""
Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
'Linux-2.6.24-19-generic-i686-with-debian-lenny-sid'
"""

My question does not have much to do with stdout, because I am able to
print unicode, so
print unicode(a) works
print unicode([a]) doesn't

Without print, too:
s1 = u"%s"%a works
s2 = u"%s"%[a] doesn't
neither does s3 = u"%s"%unicode([a])
The error is UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in
position 1: ordinal not in range(128)

So the question is: how can I use a list of objects whose representation
contains unicode in another unicode string?

I am now using __repr__ = unicode(self).encode("utf-8")
but it gives the error anyway.

anurag...@yahoo.com

May 9, 2009, 3:04:46 AM
also not sure why (python 2.5)
print a # works
print unicode(a) # works
print [a] # works
print unicode([a]) # doesn't work

Piet van Oostrum

May 9, 2009, 8:01:51 AM
>>>>> "anurag...@yahoo.com" <anurag...@yahoo.com> (ac) wrote:

>ac> also not sure why (python 2.5)
>ac> print a # works
>ac> print unicode(a) # works
>ac> print [a] # works
>ac> print unicode([a]) # doesn't works

Which code do you use now?

And what does this print?

import sys
print sys.stdout.encoding

J. Clifford Dyer

May 9, 2009, 11:26:53 AM
to anurag...@yahoo.com, pytho...@python.org
You're still not asking questions in a way that we can answer them.

Define "Doesn't work." Define "a".


anurag...@yahoo.com

May 9, 2009, 11:37:59 AM
Sorry for being unclear again; hmm, I am becoming an expert at it.

I pasted that code as a continuation of my old code at the start,
i.e.

class A(object):
    def __unicode__(self):
        return u"©au"

    def __repr__(self):
        return unicode(self).encode("utf-8")
    __str__ = __repr__

"Doesn't work" means it throws a unicode error.
My question boils down to: what is the difference between these two,
and why does one throw an error while the other doesn't?
print unicode(a)
vs
print unicode([a])

Steven D'Aprano

May 9, 2009, 12:08:37 PM
On Sat, 09 May 2009 08:37:59 -0700, anurag...@yahoo.com wrote:

> Sorry being unclear again, hmm I am becoming an expert in it.
>
> I pasted that code as continuation of my old code at start i.e
> class A(object):
>     def __unicode__(self):
>         return u"©au"
>
>     def __repr__(self):
>         return unicode(self).encode("utf-8")
>     __str__ = __repr__
>
> doesn't work means throws unicode error my question

What unicode error?

Stop asking us to GUESS what the error is, and please copy and paste the
ENTIRE TRACEBACK that you get. When you ask for free help, make it easy
for the people trying to help you. If you expect them to copy and paste
your code and run it just to answer the smallest questions, most of them
won't bother.


--
Steven

ru...@yahoo.com

May 9, 2009, 12:41:45 PM
On May 9, 10:08 am, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:

Creua H Jiest!

It took me less than 45 seconds to open a terminal window, start
Python, and paste the OP's code to get:
>>> class A(object):
...     def __unicode__(self):
...         return u"©au"
...     def __repr__(self):
...         return unicode(self).encode("utf-8")
...     __str__ = __repr__
...
>>> print unicode(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined
>>> a=A()
>>> print unicode(a)
©au
>>> print unicode([a])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
1: ordinal not in range(128)

Which is the same error he had already posted!

I am all for encouraging posters to provide a good description
but let's not be ridiculous.

Anecdote:
My sister always gives her dogs the table scraps after eating
dinner. One day when I ate there, I tossed the dogs a piece
of meat I hadn't eaten. "No", she cried! "You mustn't give
him anything without making him do a trick first! Otherwise
he'll forget that you are the boss!".

Scott David Daniels

May 9, 2009, 1:39:50 PM
ru...@yahoo.com wrote:
> On May 9, 10:08 am, Steven D'Aprano <st...@REMOVE-THIS-
> cybersource.com.au> wrote:
>> On Sat, 09 May 2009 08:37:59 -0700, anuraguni...@yahoo.com wrote:
>>> Sorry being unclear again, hmm I am becoming an expert in it.
>>> I pasted that code as continuation of my old code at start i.e
>>> class A(object):
>>>     def __unicode__(self):
>>>         return u"©au"
>>>     def __repr__(self):
>>>         return unicode(self).encode("utf-8")
>>>     __str__ = __repr__
>>> doesn't work means throws unicode error my question
>> What unicode error?
>>
>> Stop asking us to GUESS what the error is, and please copy and paste the
>> ENTIRE TRACEBACK that you get. When you ask for free help, make it easy
>> for the people trying to help you. If you expect them to copy and paste
>> your code and run it just to answer the smallest questions, most of them
>> won't bother.

> It took me less than 45 seconds to open a terminal window, start
> Python, and paste the OP's code to get:
>>>> class A(object):
> ...     def __unicode__(self):
> ...         return u"©au"
> ...     def __repr__(self):
> ...         return unicode(self).encode("utf-8")
> ...     __str__ = __repr__
> ...
>>>> print unicode(a)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> NameError: name 'a' is not defined
>>>> a=A()
>>>> print unicode(a)
> ©au
>>>> print unicode([a])
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
> 1: ordinal not in range(128)
>
> Which is the same error he had already posted!

It is _not_clear_ that is what was going on.
Your 45 seconds could have been his 45 seconds.
He was describing results rather than showing them.

From your demo, I get to:
    unicode(u'\N{COPYRIGHT SIGN}au'.encode('utf-8'))
raises an exception (which it should).
    unicode(u'\N{COPYRIGHT SIGN}au'.encode('utf-8'), 'utf-8')
does _not_ raise an exception (as it should not).
Note that his __repr__ produces characters which are not ASCII.
So, str or repr of a list containing those elements will also
be non-ASCII. To convert non-ASCII strings to unicode, you must
specify a character encoding.

The object a (created with A()) can be converted directly to
unicode (via its __unicode__ method). No problem.
The object a may also have its repr taken, which is a (non-unicode)
string which is not ASCII. But you cannot take unicode(repr(a)),
because repr(a) contains a character > '\x7f'.
What he was trying to do was masking the issue. Imagine:

class B(object):
    def __unicode__(self):
        return u'one'
    def __repr__(self):
        return 'two'
    def __str__(self):
        return 'three'

b = B()
print b, unicode(b), [b]

By the way, pasting code with non-ASCII characters does not mean
your recipient will get the characters you pasted.

--Scott David Daniels
Scott....@Acm.Org
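The decode step described above can be sketched in Python 3, where bytes.decode() plays roughly the role that unicode(bytestring) plays in Python 2. This is an illustration under that assumption, not the code from the thread; the variable names are made up. It also shows why the traceback says "position 1": the first byte of the list's string form is the ASCII bracket, and the first UTF-8 byte follows it.

```python
# Python 3 sketch: bytes.decode() stands in for Python 2's unicode(bytestring).
element = "\N{COPYRIGHT SIGN}au".encode("utf-8")   # b'\xc2\xa9au'
list_repr = b"[" + element + b"]"                  # what str([a]) produced

try:
    list_repr.decode("ascii")        # like unicode(str([a])) in Python 2
    failed_at = None
except UnicodeDecodeError as exc:
    failed_at = exc.start            # offset of the first undecodable byte

decoded = list_repr.decode("utf-8")  # like unicode(str([a]), "utf-8")

# ascii() keeps the output printable on any console.
print(failed_at, ascii(decoded))
```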

Mark Tolonen

May 9, 2009, 2:06:56 PM
to pytho...@python.org
<anurag...@yahoo.com> wrote in message
news:994147fb-cdf3-4c55...@u9g2000pre.googlegroups.com...

That is still an incomplete example. Your results depend on your source
code's encoding and your system's stdout encoding. Assuming a=A(),
unicode(a) returns u'©au', but then is converted to stdout's encoding for
display. An encoding such as cp437 (U.S. Windows console) will fail. The
repr of [a] is a byte string in the encoding of your source file. The
unicode() function, given a byte string of unspecified encoding, uses the
ASCII codec. Assuming your source encoding was utf-8,
unicode(str([a]),'utf-8') will correctly convert it to unicode, and then
printing that unicode string will attempt to convert it to stdout encoding.
On a utf-8 console, it will work; on a cp437 console it will not.

Here's a new one:

In PythonWin (from pywin32-313), stdout is utf-8, so:

>>> print '©' # this is a utf8 byte string
©
>>> '©' # view the utf8 bytes
'\xc2\xa9'
>>> u'©' # view the unicode character
u'\xa9'
>>> print '\xc2\xa9' # stdout is utf8, so it is understood
©
>>> print u'\xa9' # auto-converts to utf8.
©
>>> print unicode('\xc2\xa9') # encoding not given, defaults to ASCII.
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0:
ordinal not in range(128)
>>> print unicode('\xc2\xa9','utf8') # provide the encoding
©

This gives different results when the stdout encoding is different. Here's
a couple of the same instructions on my Windows console with cp437 encoding,
which doesn't support the copyright character:

>>> print '\xc2\xa9' # stdout is cp437
©
>>> print u'\xa9' # tries to convert to cp437
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\dev\python\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xa9' in
position 0: character maps to <undefined>

Hope that helps your understanding,
Mark
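The difference between the two consoles can be reproduced without a console at all by calling the codecs directly. The sketch below uses Python 3 syntax (the session above is Python 2); the variable names are invented for the example.

```python
# Python 3 sketch of the two codecs discussed above.
copyright_sign = "\N{COPYRIGHT SIGN}"
utf8_bytes = copyright_sign.encode("utf-8")     # b'\xc2\xa9'

# cp437 has no copyright sign, so encoding fails, as in the second traceback.
try:
    copyright_sign.encode("cp437")
    cp437_ok = True
except UnicodeEncodeError:
    cp437_ok = False

# Every byte value is valid in cp437, so the UTF-8 bytes decode without
# error, but into two unrelated characters rather than one copyright sign.
misread = utf8_bytes.decode("cp437")

# ascii() keeps the output printable on any console.
print(cp437_ok, len(misread), ascii(misread))
```

This is why sending UTF-8 byte strings straight to a cp437 console produces no exception but also not the intended character.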

Piet van Oostrum

May 9, 2009, 3:31:26 PM
>>>>> "Mark Tolonen" <metolon...@gmail.com> (MT) wrote:

>MT> <anurag...@yahoo.com> wrote in message
>MT> news:994147fb-cdf3-4c55...@u9g2000pre.googlegroups.com...


>>> Sorry being unclear again, hmm I am becoming an expert in it.
>>>
>>> I pasted that code as continuation of my old code at start
>>> i.e
>>> class A(object):
>>>     def __unicode__(self):
>>>         return u"©au"
>>>
>>>     def __repr__(self):
>>>         return unicode(self).encode("utf-8")
>>>     __str__ = __repr__
>>>
>>> doesn't work means throws unicode error
>>> my question boils down to
>>> what is diff between, why one doesn't throws error and another does
>>> print unicode(a)
>>> vs
>>> print unicode([a])

>MT> That is still an incomplete example. Your results depend on your source
>MT> code's encoding and your system's stdout encoding. Assuming a=A(),
>MT> unicode(a) returns u'©au', but then is converted to stdout's encoding for
>MT> display.

You are confusing the issue. It does not depend on the source code's
encoding (supposing that the encoding declaration in the source is
correct). repr returns unicode(self).encode("utf-8"), so it is utf-8
encoded even when the source code had a different encoding. The u"©au"
string is not dependent on the source encoding.

Mark Tolonen

May 9, 2009, 5:21:43 PM
to pytho...@python.org

"Piet van Oostrum" <pi...@cs.uu.nl> wrote in message
news:m263gag...@cs.uu.nl...

Sorry about that. I'd forgotten that the OP'd forced __repr__ to utf-8.
You bring up a good point, though, that the encoding the file is actually
saved in and the encoding declaration in the source have to match. Many
people get that wrong as well.

-Mark


anurag...@yahoo.com

May 10, 2009, 12:19:36 AM
First of all, thanks everybody for putting time into my confusing post,
and I apologize for not being clear after so many efforts.

Here is my last try (you are free to ignore my request for free
advice):

# -*- coding: utf-8 -*-

class A(object):

    def __unicode__(self):
        return u"©au"

    def __repr__(self):
        return unicode(self).encode("utf-8")

    __str__ = __repr__

a = A()
u1 = unicode(a)
u2 = unicode([a])

Now I am not using print, so it doesn't matter whether stdout can print
unicode or not.
My naive question is: why does the line u2 = unicode([a]) throw
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
1: ordinal not in range(128)

Shouldn't the list class call unicode on its elements? I was expecting
that. So instead, do I have to do this?
u3 = "["+u",".join(map(unicode,[a]))+"]"

anurag...@yahoo.com

May 10, 2009, 12:21:22 AM
And yes, replace the string with u'\N{COPYRIGHT SIGN}au';
as mentioned earlier, non-ASCII characters may not come through correctly when posted here.

On May 10, 9:19 am, "anuraguni...@yahoo.com" <anuraguni...@yahoo.com>
wrote:

Scott David Daniels

May 10, 2009, 2:19:21 AM
anurag...@yahoo.com wrote:
> class A(object):
>     def __unicode__(self):
>         return u"©au"
>     def __repr__(self):
>         return unicode(self).encode("utf-8")
>     __str__ = __repr__
> a = A()
> u1 = unicode(a)
> u2 = unicode([a])
>
> now I am not using print so that doesn't matter stdout can print
> unicode or not
> my naive question is line u2 = unicode([a]) throws
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
> 1: ordinal not in range(128)
>
> shouldn't list class call unicode on its elements?
> I was expecting that so instead do i had to do this
> u3 = "["+u",".join(map(unicode,[a]))+"]"

Why would you expect that? str([a]) doesn't call str on its elements.
Using our simple expedient:
class B(object):
    def __unicode__(self):
        return u'unicode'
    def __repr__(self):
        return 'repr'
    def __str__(self):
        return 'str'
>>> unicode(B())
u'unicode'
>>> unicode([B()])
u'[repr]'
>>> str(B())
'str'
>>> str([B()])
'[repr]'

Now if you ask _why_ it calls repr on its elements,
the answer is, "so that the following is not deceptive":

>>> repr(["a, b", "c"])
"['a, b', 'c']"
which, thanks to the quotes, does not look like a 3-element list.

--Scott David Daniels
Scott....@Acm.Org
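The same element-formatting rule still holds in Python 3, so it can be checked there as well. A small sketch in Python 3 syntax, with B mirroring the class above minus __unicode__, which Python 3 does not use:

```python
# Python 3 sketch; B mirrors the class used above.
class B:
    def __repr__(self):
        return "repr"

    def __str__(self):
        return "str"


single = str(B())          # uses B.__str__
in_list = str([B()])       # list formatting uses repr() of each element
in_format = "%s" % [B()]   # same thing through string formatting
print(single, in_list, in_format)
```

Containers never consult __str__ (or, in Python 2, __unicode__) of their elements, whichever conversion is applied to the container itself.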

Piet van Oostrum

May 10, 2009, 2:29:12 AM
>>>>> "anurag...@yahoo.com" <anurag...@yahoo.com> (ac) wrote:

>ac> and yes replace string by u'\N{COPYRIGHT SIGN}au'
>ac> as mentioned earlier non-ascii char may not come correct posted here.

That shouldn't be a problem for any decent news agent when there is a
proper charset declaration in the headers.

Peter Otten

May 10, 2009, 2:32:34 AM
anurag...@yahoo.com wrote:

> First of all thanks everybody for putting time with my confusing post
> and I apologize for not being clear after so many efforts.
>
> here is my last try (you are free to ignore my request for free
> advice)

Finally! This is the first of your posts that makes sense to me ;)

> # -*- coding: utf-8 -*-
>
> class A(object):
>
>     def __unicode__(self):
>         return u"©au"
>
>     def __repr__(self):
>         return unicode(self).encode("utf-8")
>
>     __str__ = __repr__
>
> a = A()
> u1 = unicode(a)
> u2 = unicode([a])
>
> now I am not using print so that doesn't matter stdout can print
> unicode or not
> my naive question is line u2 = unicode([a]) throws
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
> 1: ordinal not in range(128)

list doesn't have a __unicode__ method. unicode() therefore converts the
list to str as a fallback and then uses sys.getdefaultencoding() to convert
the result to unicode.

> shouldn't list class call unicode on its elements?

No, it calls repr() on its elements. This is done to avoid confusing output:

>>> items = ["a, b", "[c]"]
>>> items
['a, b', '[c]']
>>> "[%s]" % ", ".join(map(str, items))
'[a, b, [c]]'

> I was expecting that so instead do i had to do this
> u3 = "["+u",".join(map(unicode,[a]))+"]"

Peter

Nick Craig-Wood

May 10, 2009, 3:30:05 AM
anurag...@yahoo.com <anurag...@yahoo.com> wrote:
> First of all thanks everybody for putting time with my confusing post
> and I apologize for not being clear after so many efforts.
>
> here is my last try (you are free to ignore my request for free
> advice)
>
> # -*- coding: utf-8 -*-
>
> class A(object):
>
>     def __unicode__(self):
>         return u"©au"
>
>     def __repr__(self):
>         return unicode(self).encode("utf-8")
>
>     __str__ = __repr__
>
> a = A()
> u1 = unicode(a)
> u2 = unicode([a])
>
> now I am not using print so that doesn't matter stdout can print
> unicode or not
> my naive question is line u2 = unicode([a]) throws
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
> 1: ordinal not in range(128)
>
> shouldn't list class call unicode on its elements?

You mean when you call unicode(a_list) it should call unicode() on each of
the elements to build the result?

Yes that does seem sensible, however list doesn't have a __unicode__
method at all so I guess it is falling back to using __str__ on each
element, which explains your problem exactly.

If you try your example on python 3 then you don't need the
__unicode__ method at all (all strings are unicode) and you won't have
the problem I predict. (I haven't got a python 3 in front of me at the
moment to test.)

So I doubt you'll find the momentum to fix this since unicode and str
integration was the main focus of python 3, but you could report a
bug. If you attach a patch to fix it - so much the better!

Here is my demonstration of the problem with python 2.5.2

>>> class A(object):
...     def __unicode__(self):
...         return u"\N{COPYRIGHT SIGN}au"
...     def __repr__(self):
...         return unicode(self).encode("utf-8")
...     __str__ = __repr__
...
>>> a = A()
>>> str(a)
'\xc2\xa9au'
>>> repr(a)
'\xc2\xa9au'
>>> unicode(a)
u'\xa9au'
>>> L=[a]
>>> str(L)
'[\xc2\xa9au]'
>>> repr(L)
'[\xc2\xa9au]'
>>> unicode(L)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
1: ordinal not in range(128)
>>> unicode('[\xc2\xa9au]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position
1: ordinal not in range(128)
>>> L.__unicode__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'list' object has no attribute '__unicode__'
>>> unicode(str(L),"utf-8")
u'[\xa9au]'

--
Nick Craig-Wood <ni...@craig-wood.com> -- http://www.craig-wood.com/nick

anurag...@yahoo.com

May 10, 2009, 5:46:46 AM
OK, that explains it.
So unicode(obj) calls __unicode__ on that object, and if it isn't there,
__repr__ is used.
__repr__ of list by default returns a str even if __repr__ of the element
is unicode.

So my only solution looks like using my own list class everywhere I
use list:
class mylist(list):
    def __unicode__(self):
        return u"["+u",".join(map(unicode,self))+u"]"

Diez B. Roggisch

May 10, 2009, 5:59:58 AM
anurag...@yahoo.com wrote:

Or you use a custom unicode_list-function whenever you care to print out
a list.

Diez

anurag...@yahoo.com

May 10, 2009, 12:04:03 PM
Yes, but my lists sometimes contain lists of lists.

On May 10, 2:59 pm, "Diez B. Roggisch" <de...@nospam.web.de> wrote:
> anuraguni...@yahoo.com schrieb:
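The nested case can be handled by making such a helper recursive. The following is only a sketch, in Python 3 syntax, of what a function like the suggested unicode_list might look like; the name text_of and the choice to recurse only into plain lists are assumptions for illustration. In Python 2 the same shape would use unicode() and u"" literals in place of str() and plain literals.

```python
# Python 3 sketch; text_of is a hypothetical helper, not an existing API.
def text_of(obj):
    """Format nested lists using str() on the leaves instead of repr()."""
    if isinstance(obj, list):
        return "[" + ", ".join(text_of(item) for item in obj) + "]"
    return str(obj)


class A:
    def __str__(self):
        return "\N{COPYRIGHT SIGN}au"

    def __repr__(self):
        return "A()"


a = A()
nested = [a, [a, [a]]]
formatted = text_of(nested)

# ascii() keeps the output printable on any console.
print(ascii(formatted))
```

One trade-off: because the leaves are no longer shown via repr(), strings that themselves contain commas or brackets become ambiguous again, which is the reason list formatting uses repr() in the first place.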

Terry Reedy

May 11, 2009, 1:47:06 AM
to pytho...@python.org
anurag...@yahoo.com wrote:

> so unicode(obj) calls __unicode__ on that object

It will look for the existence of type(ob).__unicode__ ...

> and if it isn't there __repr__ is used

According to the below, type(ob).__str__ is tried first.

> __repr__ of list by default return a str even if __repr__ of element
> is unicode

From the fine library manual, built-in functions section:
(I recommend using it, along with interactive experiments.)

"repr( object)
Return a string ..."

"str( [object])
Return a string ..."

"unicode( [object[, encoding [, errors]]])

Return the Unicode string version of object using one of the following
modes:

If encoding and/or errors are given, ...

If no optional parameters are given, unicode() will mimic the behaviour
of str() except that it returns Unicode strings instead of 8-bit
strings. More precisely, if object is a Unicode string or subclass it
will return that Unicode string without any additional decoding applied.

For objects which provide a __unicode__() method, it will call this
method without arguments to create a Unicode string. For all other
objects, the 8-bit string version or representation is requested and
then converted to a Unicode string using the codec for the default
encoding in 'strict' mode.
"

'unicode(somelist)' has no optional parameters, so skip to third
paragraph. Somelist is not a unicode instance, so skip to the last
paragraph. If you do dir(list) I presume you will *not* see
'__unicode__' listed. So skip to the last sentence.
unicode(somelist) == str(somelist).decode(default,'strict').

I do not believe str() and repr() are specifically documented for
builtin classes other than the general description, but you can figure
that str(collection) or repr(collection) will call str or repr on the
members of the collection in order to return a str, as the doc says.
(Details are available by experiment.) str(uni_string) encodes with the
default encoding, which seems to be 'ascii' in 2.x. I am sure it uses
'strict' errors.

I would agree that str(some_unicode) could be better documented, like
unicode(some_str) is.

> so my only solution looks like to use my own list class everywhere i
> use list
> class mylist(list):
>     def __unicode__(self):
>         return u"["+u",".join(map(unicode,self))+u"]"

Or write a function and use that instead, or, if and when you can,
switch to 3.x where str and repr accept and produce unicode.

tjr
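For comparison, here is how the class from this thread might look after a move to Python 3. This is a sketch written for illustration, not code posted by anyone above: there is a single text type, __unicode__ is not used, and __repr__ may return non-ASCII text, so formatting a list involves no decoding step.

```python
# Python 3 sketch of the class from this thread.
import sys


class A:
    def __str__(self):
        return "\N{COPYRIGHT SIGN}au"

    __repr__ = __str__   # returning text from __repr__ is fine in Python 3


a = A()
as_text = str([a])       # list formatting calls repr(a), which is already text

# ascii() keeps the output printable on any console.
print(sys.getdefaultencoding(), ascii(as_text))
```

Printing the resulting string still depends on the encoding of the output stream, as discussed earlier in the thread.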

anurag...@yahoo.com

May 11, 2009, 8:14:18 AM
On May 11, 10:47 am, Terry Reedy <tjre...@udel.edu> wrote:
Thanks for the explanation.

norseman

May 11, 2009, 1:16:19 PM
to pytho...@python.org
==========================
In Linux get/use gpm and copy paste is simple.
In Microsoft see: Python-List file dated May 6, 2009 (05/06/2009) sent
by norseman.