__str__ vs. __repr__

Randall Hopper

Nov 1, 1999, 3:00:00 AM
to pytho...@python.org

1) Is there a convention for what __str__ and __repr__ should return for
classes?
2) Or, whatever they return, should they return the same value?
3) If so, why have both in the language?

Searching the archives yields an example with odd behavior that
suggests they should generally return the same value:

class c:
    def __str__(self): return 'foo'
    def __repr__(self): return 'bar'

>>> str(c())
'foo'

>>> str([c(), c(), c()])
'[bar, bar, bar]'

--
Randall Hopper
aa...@yahoo.com

Gerrit Holl

Nov 1, 1999, 3:00:00 AM
to Randall Hopper
Randall Hopper wrote:
>
> 1) Is there a convention for what __str__ and __repr__ should return for
> classes?

__str__ should return a string, __repr__ may also return, for example, a
dictionary.

> 2) Or, whatever they return, should they return the same value?

I don't know: I think it's wrong to use both.

regards,
Gerrit.
--
linuxgames.nl.linux.org All about games on Linux in Dutch (under construction).
www.nl.linux.org The Dutch resource for Dutch Linux documentation.
www.nl.linux.org/discoverb Learn foreign words and definitions.
www.nl.linux.org/~gerrit/asperger I'm an Asperger, go here for more info.
www.nl.linux.org/~gerrit About me and pictures of me.

Gordon McMillan

Nov 1, 1999, 3:00:00 AM
to Randall Hopper, pytho...@python.org
Randall Hopper wrote:

> 1) Is there a convention for what __str__ and __repr__ should
> return for
> classes?

> 2) Or, whatever they return, should they return the same value?

> 3) If so, why have both in the language?

If possible, __repr__ should return something that when eval'd
yields an identical object. If that's not possible, it should be
the programmer's view of the object.

__str__ is free to return a highly munged user-friendly string.
str(obj) will use obj.__repr__ if obj.__str__ doesn't exist.

Consider this:

class HTMLTable:
    blah blah blah

t = HTMLTable(...)
repr(t) # -> "MyModule.HTMLTable([...])"
str(t) # -> "<TABLE><TR>....."

(see IPWP for this idea carried to extreme lengths).
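
A minimal sketch of the HTMLTable idea in present-day Python 3 (an assumption; the thread predates it). The constructor argument and the markup are hypothetical, chosen only to show one object with a programmer-facing repr and a user-facing str:

```python
class HTMLTable:
    """Hypothetical stand-in for the HTMLTable class mentioned above."""

    def __init__(self, rows):
        self.rows = rows

    def __repr__(self):
        # Programmer's view: looks like an expression that rebuilds the object.
        return "HTMLTable(%r)" % (self.rows,)

    def __str__(self):
        # User's view: the rendered markup.
        body = "".join(
            "<TR>" + "".join("<TD>%s</TD>" % cell for cell in row) + "</TR>"
            for row in self.rows
        )
        return "<TABLE>" + body + "</TABLE>"


t = HTMLTable([[1, 2], [3, 4]])
print(repr(t))  # HTMLTable([[1, 2], [3, 4]])
print(str(t))   # <TABLE><TR><TD>1</TD><TD>2</TD></TR><TR><TD>3</TD><TD>4</TD></TR></TABLE>
```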

> Searching the archives yields an example with odd behavior
> that
> suggests they should generally return the same value:
>
> class c:
>     def __str__(self): return 'foo'
>     def __repr__(self): return 'bar'
>
> >>> str(c())
> 'foo'
>
> >>> str([c(), c(), c()])
> '[bar, bar, bar]'

Sick enough, but not convoluted enough, to be a Tim-ism...

- Gordon

David Gobbi

Nov 1, 1999, 3:00:00 AM
to
Gerrit Holl <gerri...@pobox.com> wrote:
> Randall Hopper wrote:
>>
>> 1) Is there a convention for what __str__ and __repr__ should return for
>> classes?

> __str__ should return a string, __repr__ may also return, for example, a
> dictionary.

The Python Language Reference Manual has a few lines on this subject,
specifically in Section 3.3.1.

__repr__ (self)
Called by the repr() built-in function and by string conversions
(reverse quotes) to compute the "official" string representation
of an object. This should normally look like a valid Python
expression that can be used to recreate an object with the same
value. By convention, objects which cannot be trivially converted
to strings which can be used to create a similar object produce a
string of the form "<...some useful description...>".

__str__ (self)
Called by the str() built-in function and by the print statement
to compute the "informal" string representation of an object.
This differs from __repr__() in that it does not have to be a valid
Python expression: a more convenient or concise representation may
be used instead.

In short: no, they are not the same, and yes, you can define them both. If you
don't define __str__(), then str(object) will default to
repr(object).
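
The fallback can be checked with a small sketch, written for present-day Python 3 as an assumption (the class names are made up for illustration):

```python
class OnlyRepr:
    # No __str__ here, so str() falls back to __repr__.
    def __repr__(self):
        return "OnlyRepr()"


class Both:
    def __repr__(self):
        return "Both()"

    def __str__(self):
        return "a friendly Both"


print(str(OnlyRepr()))  # OnlyRepr()
print(str(Both()))      # a friendly Both
print(repr(Both()))     # Both()
```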

- David


>> 2) Or, whatever they return, should they return the same value?

> I don't know: I think it's wrong to use both.

> regards,
> Gerrit.
> --
> linuxgames.nl.linux.org All about games on Linux in Dutch (under construction).
> www.nl.linux.org The Dutch resource for Dutch Linux documentation.
> www.nl.linux.org/discoverb Learn foreign words and definitions.
> www.nl.linux.org/~gerrit/asperger I'm an Asperger, go here for more info.
> www.nl.linux.org/~gerrit About me and pictures of me.

--
--David Gobbi, MSc dgo...@irus.rri.on.ca
Advanced Imaging Research Group
Robarts Research Institute, University of Western Ontario

Tino Wildenhain

Nov 1, 1999, 3:00:00 AM
to Randall Hopper
Hello Randall,

Randall Hopper wrote:
>
> 1) Is there a convention for what __str__ and __repr__ should return for
> classes?

> 2) Or, whatever they return, should they return the same value?

> 3) If so, why have both in the language?
>

> Searching the archives yields an example with odd behavior that
> suggests they should generally return the same value:

If you test it, it's obvious (I think it's mentioned somewhere in the docs
too):

__str__ represents a human-readable version of the object as a string, for
use with "print" or a formatted string ("%s").
__repr__ should produce a form that can be cut and pasted to construct the
named object.

>>> print "Hello"
Hello

>>> "Hello"
'Hello'

see the difference?

best regards

Tino Wildenhain

Michael Hudson

Nov 1, 1999, 3:00:00 AM
to
Gerrit Holl <gerri...@pobox.com> writes:

> Randall Hopper wrote:
> >
> > 1) Is there a convention for what __str__ and __repr__ should return for
> > classes?
>

> __str__ should return a string, __repr__ may also return, for example, a
> dictionary.

Ehh? No, I don't think so:

>>> class Class:
...     def __repr__(self):
...         return {}
...
>>> Class()
Traceback (innermost last):
  File "<stdin>", line 1, in ?
TypeError: repr not string
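
The same check can be sketched for present-day Python 3 (an assumption; the session above is from an older interpreter). The wording of the message differs between versions, so the sketch only relies on the exception type:

```python
class Class:
    def __repr__(self):
        return {}  # deliberately wrong: not a string


error = None
try:
    repr(Class())
except TypeError as exc:
    # repr() refuses a non-string result from __repr__.
    error = exc

print("TypeError raised:", error is not None)  # TypeError raised: True
```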

From the reference manual (which I could paraphrase, but there'd be no
real point):

__repr__ (self)
Called by the repr() built-in function and by string
conversions (reverse quotes) to compute the ``official'' string
representation of an object. This should normally look like a
valid Python expression that can be used to recreate an object
with the same value. By convention, objects which cannot be
trivially converted to strings which can be used to create a
similar object produce a string of the form "<...some useful
description...>".

__str__ (self)
Called by the str() built-in function and by the print
statement to compute the ``informal'' string representation of
an object. This differs from __repr__() in that it does not
have to be a valid Python expression: a more convenient or
concise representation may be used instead.

The idea is that

eval(repr(x)) == x

should be true as much as is possible, whereas

str(x)

should be fit for human consumption.
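
A small sketch of that round trip, assuming present-day Python 3 and a made-up Point class that is not part of the thread:

```python
class Point:
    """Hypothetical value class used only to illustrate the convention."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):
        # Looks like the constructor call that would recreate the object.
        return "Point(%r, %r)" % (self.x, self.y)

    def __str__(self):
        # Shorter form meant for people.
        return "(%s, %s)" % (self.x, self.y)

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


p = Point(1, 2)
assert eval(repr(p)) == p
print(repr(p))  # Point(1, 2)
print(str(p))   # (1, 2)
```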

> > 2) Or, whatever they return, should they return the same value?

It's worth noting that the *only* type I know of that has a different
`str' to its `repr' is the string type:

>>> print str("hello")
hello
>>> print repr("hello")
'hello'

Regards,
Michael

Tim Peters

Nov 2, 1999, 3:00:00 AM
to pytho...@python.org
[Randall Hopper gives an example]

> class c:
>     def __str__(self): return 'foo'
>     def __repr__(self): return 'bar'
>
> >>> str(c())
> 'foo'
>
> >>> str([c(), c(), c()])
> '[bar, bar, bar]'

[which Gordon McMillan explains, adding]


> Sick enough, but not convoluted enough, to be a Tim-ism...

It's too convoluted for my tastes! The real illness is that lists (and
dicts, and tuples) don't pass str-vs-repr'ness *down*. That is, even though
the example explicitly asks for str of a list, the list object asks for the
repr of its elements. That's convoluted: if (as happens to be true) str is
meant to give a friendly string, why do the builtin container types ask
their containees to produce unfriendly strings regardless? For that matter,
why is repr implied at an interactive prompt given a raw expression? Seems
that friendly strings would be more appropriate there. And why is there a
shorthand (`x`) for repr(x) but not for str(x)?

These are Great Mysteries debated even among the Ancient Ones. If the
scheme didn't already exist, I don't think we'd have much luck arguing for
its adoption <wink>.
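
The behavior under discussion still holds in present-day Python 3, which is assumed for this sketch. The built-in containers use repr() for their elements even when str() is called on the container, so getting the friendly strings takes an explicit step:

```python
class C:
    def __str__(self): return 'foo'
    def __repr__(self): return 'bar'


items = [C(), C(), C()]
print(str(C()))                 # foo
print(str(items))               # [bar, bar, bar]
print(str((C(),)))              # (bar,)
print(str({'k': C()}))          # {'k': bar}
print([str(x) for x in items])  # ['foo', 'foo', 'foo']
```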

3/4ths-of-one-idea-plus-1/4th-of-another-adds-up-to-2/3rds-ly y'rs - tim

Randall Hopper

Nov 2, 1999, 3:00:00 AM
to Tim Peters
Tim Peters:

|[Randall Hopper gives an example]
|> class c:
|>     def __str__(self): return 'foo'
|>     def __repr__(self): return 'bar'
|>
|> >>> str(c())
|> 'foo'
|>
|> >>> str([c(), c(), c()])
|> '[bar, bar, bar]'
|
|[which Gordon McMillan explains, adding]
|> Sick enough, but not convoluted enough, to be a Tim-ism...
|
|The real illness is that lists (and dicts, and tuples) don't pass
|str-vs-repr'ness *down*. That is, even though the example explicitly
|asks for str of a list, the list object asks for the repr of its
|elements. That's convoluted: if (as happens to be true) str is meant to
|give a friendly string, why do the builtin container types ask their
|containees to produce unfriendly strings regardless?

Thanks. That's the source of my confusion. I wanted to follow the
language convention. But Python's internals currently don't follow it, so
the point is somewhat moot.

For consistency, would it make sense to change this for Python 1.5.3 (that
is, have sequence and dict types pass 'str-vs-repr'ness down)?

Thanks,

Randall

--
Randall Hopper
aa...@yahoo.com

Guido van Rossum

Nov 2, 1999, 3:00:00 AM
to
Randall Hopper <aa...@yahoo.com> writes:

> Tim Peters:


> |The real illness is that lists (and dicts, and tuples) don't pass
> |str-vs-repr'ness *down*. That is, even though the example explicitly
> |asks for str of a list, the list object asks for the repr of its
> |elements. That's convoluted: if (as happens to be true) str is meant to
> |give a friendly string, why do the builtin container types ask their
> |containees to produce unfriendly strings regardless?
>
> Thanks. That's the source of my confusion. I wanted to follow the
> language convention. But Python's internals currently don't follow it, so
> the point is somewhat moot.

Actually, Python's internals *do* follow the convention. Dicts and
lists don't define __str__(), so str() for them defaults to
__repr__(), and then of course repr() is used for the items. It may
be confusing, it may not be what you want, but it *is* consistent ;-(

> For consistency, would it make sense to change this for Python 1.5.3 (that
> is, have sequence and dict types pass 'str-vs-repr'ness down)?

This has been asked a few times before (and usually it's been reported
as a bug, which it isn't -- see above). I happened to see this post
and it made me delve deep into my intuition trying to figure out why I
don't like propagating str() down container items.

Here's what I think it is. There's no reason why an object's str()
should be particularly suited to being included in list syntax. For
example, I could have a list containing the following items:

1 # an integer
'1' # a string (of an integer literal)
'2, 3' # a string containing a comma and a space
'], [' # a string containing list delimiters

Under the proposed rules, this list would print as:

[1, 1, 2, 3], []

I would find this confusing and I worry that it could be used to fool
the user.
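
The ambiguity can be reproduced with a sketch in present-day Python 3 (an assumption). It takes "the proposed rules" to mean joining str() of each item with ", " inside brackets, which is one reading among several. The exact string differs by a stray comma from the hand-typed one above, but the effect is the same: the output no longer shows where items begin and end.

```python
items = [1, '1', '2, 3', '], [']

# Hypothetical "pass str() down" rendering of the list.
naive = '[' + ', '.join(map(str, items)) + ']'

print(naive)        # [1, 1, 2, 3, ], []
print(repr(items))  # [1, '1', '2, 3', '], [']
```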

--Guido van Rossum (home page: http://www.python.org/~guido/)

Donn Cave

Nov 2, 1999, 3:00:00 AM
to
Quoth Randall Hopper <aa...@yahoo.com>:
...

| Thanks. That's the source of my confusion. I wanted to follow the
| language convention. But Python's internals currently don't follow it, so
| the point is somewhat moot.
|
| For consistency, would it make sense to change this for Python 1.5.3 (that
| is, have sequence and dict types pass 'str-vs-repr'ness down)?

I'd go back to an earlier followup on this and try to refine the
critical point, why __str__? The man said __str__ is intended to
be ``friendly''.

I think it might help to be a little more specific about whose friend
this string is. For me, anyway, __str__ is for an object whose data
model supports a string value. For example, if we toss a TypeError
exception value into a context where a string is expected, it might
act like a string 'illegal argument type for built-in operation'.
This isn't just an abridged representation. In a sense it doesn't
represent the object at all directly, but rather allows the object
to substitute a data value that the programmer expects will be useful,
based on some insight into the meaning of the object and its data
in some context. In this context, that string value is friendly, but
in any other context, it's deceptive and inappropriate. In the
distribution library, you'll find lots of __repr__ methods and
relatively few __str__ methods, I guess because it's relatively
unusual to find a string value that's really friendly in a broad
enough context for a library class.

Now there is no such accounting for the context in which sequences
and dicts will be used, they're used everywhere. The string value
of these objects should directly represent them as the kind of objects
they are, and I think it follows that in this context their contents
should be rendered in the same literal way - to substitute __str__
values instead would only compromise that representation.

Donn Cave, University Computing Services, University of Washington
do...@u.washington.edu

Tim Peters

Nov 3, 1999, 3:00:00 AM
to Guido van Rossum, pytho...@python.org
[Randall Hopper wonders about str-vs-repr, Tim explains that lists ask
their elements to produce repr() even if the list itself was passed
to str()]

[Guido]


> Actually, Python's internals *do* follow the convention. Dicts and
> lists don't define __str__(), so str() for them defaults to
> __repr__(), and then of course repr() is used for the items. It may
> be confusing, it may not be what you want, but it *is* consistent ;-(

Not to mention undocumented <wink>.

[Randall]


> For consistency, would it make sense to change this for Python 1.5.3 (that
> is, have sequence and dict types pass 'str-vs-repr'ness down)?

[Guido]


> This has been asked a few times before (and usually it's been reported
> as a bug, which it isn't -- see above). I happened to see this post
> and it made me delve deep into my intuition trying to figure out why I
> don't like propagating str() down container items.
>
> Here's what I think it is. There's no reason why an object's str()
> should be particularly suited to being included in list syntax.

This seems much more a consequence of the current design than an argument in
favor of it. That is, had Python been designed so that the builtin
container types did "pass down" str-vs-repr'ness, an object's str() would
have every reason to produce a string suited to being etc.

> For example, I could have a list containing the following items:
>
> 1 # an integer
> '1' # a string (of an integer literal)
> '2, 3' # a string containing a comma and a space
> '], [' # a string containing list delimiters
>
> Under the proposed rules, this list would print as:
>
> [1, 1, 2, 3], []
>
> I would find this confusing

Me too, but I find the current design more rigidly consistent than useful
(see below). In a world where containers passed str() down, a container's
str() would presumably be responsible for adding disambiguating delimiters
to element str() results when needed (the container knows its own output
syntax, and can examine the strings produced by its elements -- not rigidly
consistent, but useful <wink>).
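
One way such a container str() might look, as a rough sketch in present-day Python 3. The function name and the rule for when to add quotes are assumptions made for illustration, not anything proposed in the thread:

```python
def friendly_list_str(items):
    """Hypothetical list str(): use str() of each item, quoting only when needed."""
    parts = []
    for item in items:
        s = str(item)
        # Fall back to a quoted form when the friendly string would be
        # confused with the list's own syntax (or would be invisible).
        if s == '' or s != s.strip() or any(ch in s for ch in ",[]'\""):
            s = repr(s)
        parts.append(s)
    return '[' + ', '.join(parts) + ']'


print(friendly_list_str([1, '1', '2, 3', '], [']))  # [1, 1, '2, 3', '], [']
print(friendly_list_str(['Tim', 'a b']))            # [Tim, a b]
```

Item boundaries stay visible under this rule, but the integer 1 and the string '1' still look alike.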

> and I worry that it could be used to fool the user.

People can already define __repr__ to return anything whatsoever; the
reports of people getting fooled by this are conspicuous by absence <wink>.

Here's something wholly typical of what I dislike:

>>> from Rational import Rat, Format
>>> Rat.set_df(Format.Format(mode=Format.FIXED, prec=3, use_tag=0))
Format(mode=Format.MIXED, prec=8, base=10, round=Round.NEAREST_EVEN,
use_tag=1, use_letters=1)
>>> one_tenth = Rat.Rat(.1)
>>> one_tenth
Rat(3602879701896397L, 36028797018963968L)
>>> print one_tenth
0.100
>>>

That is, in interactive mode, I'm forever using "print" because the default
of applying repr() to raw expressions produces the output least useful in
interactive hacking (I don't care about reproducing the object exactly from
the string when typing a raw expression at the prompt! The mental ratio of
two giant integers isn't helpful here.).

Carry it one more step, and nothing simple suffices anymore:

>>> values = [one_tenth, one_tenth + 100]
>>> values
[Rat(3602879701896397L, 36028797018963968L),
Rat(3606482581598293197L, 36028797018963968L)]
>>> print values
[Rat(3602879701896397L, 36028797018963968L),
Rat(3606482581598293197L, 36028797018963968L)]
>>>

So I'm forever typing this instead:

>>> map(str, values)
['0.100', '100.100']
>>>

Throw a dict into it, and it's hopeless:

>>> recip = {one_tenth: 1/one_tenth, 1/one_tenth: one_tenth}
>>> print recip
{Rat(3602879701896397L, 36028797018963968L):
Rat(36028797018963968L, 3602879701896397L),
Rat(36028797018963968L, 3602879701896397L):
Rat(3602879701896397L, 36028797018963968L)}
>>>

Having gone thru the same business in dozens of classes over the years, I
find the current design simply unusable in interactive mode. For a while I
defined just __str__, bound __repr__ to that too, and added a .repr()
*method* for the unusual cases in which I really needed a faithful string.
But that frustrated other code that expected explicit repr() calls, and/or
the `` notation, to produce the long-winded version. So that sucked too.
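
The Rat module used above is not generally available, so this sketch swaps in fractions.Fraction from the Python 3 standard library as a stand-in (an assumption; its str() shows a ratio rather than a fixed-point decimal). The pattern is the same: containers show the long repr, and the friendly form needs an explicit conversion.

```python
from fractions import Fraction

one_tenth = Fraction(1, 10)
values = [one_tenth, one_tenth + 100]

print(values)                    # [Fraction(1, 10), Fraction(1001, 10)]
print([str(v) for v in values])  # ['1/10', '1001/10']

recip = {one_tenth: 1 / one_tenth}
print(recip)                     # {Fraction(1, 10): Fraction(10, 1)}
print({str(k): str(v) for k, v in recip.items()})  # {'1/10': '10'}
```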

It's even an irritation sticking to builtin types; e.g., here assuming
Latin-1 comes across intact:

>>> names = ["François", "Tim"]
>>> print names[0]
François
>>> print names
['Fran\347ois', 'Tim']
>>>

That isn't helpful either -- it's frustrating.

in-the-face-of-ambiguity-refuse-the-temptation-to-do-the-opposite-of-
what-the-user-asked-for<0.9-wink>-ly y'rs - tim

Guido van Rossum

Nov 3, 1999, 3:00:00 AM
to Tim Peters
[Tim Peters, arguing against the current design of str() and repr()]

Hm... What kind of things would you expect e.g. the list str() to do
to its item str()s? Put backslashes before commas?

These are all good points.

In a typical scenario which I wanted to avoid, a user has a variable
containing the string '1' but mistakenly believes that it contains the
integer 1. (This happens a lot, e.g. it could be read from a file
containing numbers.) The user tries various numeric operations on the
variable and they all raise exceptions. The user is inexperienced and
doesn't understand what the exceptions are, but gets the idea to
display its value to see if something's wrong with it. One of the
first things users learn is to use interactive Python as a power
calculator, so my hypothetical user just types the name of the
variable. If this would use str() to format the value, the user is no
wiser, and perhaps more confused, since str('1') is the same as
str(1). So I designed Python's read-eval-print loop to use repr()
instead of str(): when the user tries to display the variable, it will
show the string quotes which are a pretty good hint that it's a
string. (Alternative scenario: the user has this problem and shows it
to a somewhat more experienced user, who displays the variable and
notes the problem from the output.)

Where have I gone wrong? It seems that you are suggesting that the
read-eval-print loop (a.k.a. the >>> prompt or the interactive
interpreter) should use str(), not repr(). This would solve your
first example; if str() for lists, tuples and dictionaries were to
apply str() to their items, your second and third example would also
be solved.

We can then argue over what str() of a list L should return; one
extreme possibility would be to return string.join(map(str, L)); a
slightly less radical solution would be '[' + string.join(map(str, L),
', ') + ']'. In the first case, your last example would go like this:

>>> names
François Tim
>>>

while the second choice would give

>>> names
[François, Tim]
>>>
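
The two candidate formats can be compared side by side in a sketch for present-day Python 3 (an assumption; str.join replaces string.join there). ASCII-only names are used to keep the separate character-set question out of the picture, and the second name is made up to show where the space-joined form becomes ambiguous:

```python
names = ['Tim', 'Guido van Rossum']

plain = ' '.join(map(str, names))                   # the "extreme" option
bracketed = '[' + ', '.join(map(str, names)) + ']'  # the less radical option

print(plain)      # Tim Guido van Rossum
print(bracketed)  # [Tim, Guido van Rossum]
print(names)      # ['Tim', 'Guido van Rossum']
```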

There may be other solutions -- e.g. in Tcl, a list is displayed as
the items separated by spaces, with the proviso that items containing
spaces are displayed inside (matching) curly braces; unmatched braces
are displayed using backslashes, guaranteeing that the output can be
parsed back into a list with the same value as the original. (Hey!
That's the same as Python's rule! So why does it work in Tcl?
Because variables only contain strings, and the equivalent of the Rat
class used above can't be coded in Tcl.)

The problem with displaying 'François' has been mentioned to me
before. (By the way, I have no idea how to *type* that. I'm just
cutting and pasting it from Tim's message.)

There's another scenario I was trying to avoid. This is probably
something that happened once too many times when I was young and
innocent, so I may be overreacting. Consider the following:

>>> f = open("somefile")
>>> a = f.readline()
>>> print a
%âãÏÓ1.3

>>>

Now this example is relatively harmless. Using repr(), I see that the
string contains a \r character that caused the cursor to back up to
the start of the line, overwriting what was already written:

>>> a
'%PDF-1.3\015%\342\343\317\323\015\012'
>>>

But the thing that some big bad program did to me long ago was more
like spit out several thousand garbage bytes which contained enough
escape sequences to lock up my terminal requiring me to powercycle and
log in again. (The fact that the story refers to a terminal indicates
how long ago this was. :-)

So I vowed that *my* language would not (easily) let this happen by
accident, and the way I enforced that was by making sure that all
non-ASCII characters would be printed as octal escapes, unless you use
the print statement.
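
In present-day Python 3 (assumed here) repr() of a string keeps printable non-ASCII characters, and the built-in ascii() is the closest match to the escaping described above. It uses hexadecimal escapes rather than octal ones:

```python
# The line read from the file in the example above.
a = '%PDF-1.3\r%\xe2\xe3\xcf\xd3\r\n'

escaped = ascii(a)
print(escaped)  # '%PDF-1.3\r%\xe2\xe3\xcf\xd3\r\n'

# Every character of the escaped form is plain printable ASCII,
# so displaying it cannot move the cursor or confuse a terminal.
assert all(32 <= ord(ch) < 127 for ch in escaped)
```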

It's a separate dilemma from the other examples. My problem here is
that I hate to make assumptions about the character set in use. How
do I know that '\237' is unprintable but '\241' is a printable
character? How do I know that the latter is an upside-down
exclamation point? Should I really assume stdout is capable of
displaying Latin-1? Strictly, the str() function doesn't even know
that its output is going to stdout. I suppose I could use isprint(),
and then François could use the locale module in his $PYTHONSTARTUP
file to make it do the right thing. Is that good enough? (I just
tried this briefly. It seems that somehow the locale module doesn't
affect this?!?!)

I still think that ['a', 'b'] should be displayed like that, and not
like [a, b]. I'm not sure what to do about a dict of Rat()
items, except perhaps giving up on the fiction that __repr__() should
return something expression-like for user-defined classes...

Any suggestions?

David Ascher

Nov 3, 1999, 3:00:00 AM
to Guido van Rossum
On Wed, 3 Nov 1999, Guido van Rossum wrote:

> Any suggestions?

I certainly appreciate the justification for the current behavior, which I
didn't really understand either.

I have two suggestions which would make the current read-eval-print
behavior somewhat more palatable to me:

-- container types (lists, dicts, tuples) should have both repr and str
methods, even if they differ only in the methods of the contents that
they call. You can make the str methods use the repr-type display
(with brackets/parens/braces and commas), as that wouldn't break
anyone's expectations (since that's what is currently being used), and
would 'solve' the current inconsistency.

I understand str output is meant to be human readable -- that
doesn't mean that []'s and ,'s can't be used.

-- make the str of a long integer not have the postfix L. After all, if
the __str__ is supposed to be for human consumption, the L is no more
useful than the quotes around strings.

The issue of determining the printability of characters is one *I*
wouldn't touch with a 10-foot snake.

--david


Bernhard Herzog

Nov 4, 1999, 3:00:00 AM
to
Guido van Rossum <gu...@CNRI.Reston.VA.US> writes:

> It's a separate dilemma from the other examples. My problem here is
> that I hate to make assumptions about the character set in use. How
> do I know that '\237' is unprintable but '\241' is a printable
> character? How do I know that the latter is an upside-down
> exclamation point? Should I really assume stdout is capable of
> displaying Latin-1? Strictly, the str() function doesn't even know
> that its output is going to stdout. I suppose I could use isprint(),
> and then François could use the locale module in his $PYTHONSTARTUP
> file to make it do the right thing. Is that good enough? (I just
> tried this briefly. It seems that somehow the locale module doesn't
> affect this?!?!)
>
> I still think that ['a', 'b'] should be displayed like that, and not
> like [a, b]. I'm not sure what to do about a dict of Rat()
> items, except perhaps giving up on the fiction that __repr__() should
> return something expression-like for user-defined classes...
>
> Any suggestions?

How about a hook function a la __import__? I.e. have a function
__format__ [1] in __builtin__ that is bound to repr by default. Whenever
Python prints the result of an expression in interactive mode, it calls
__format__ with the result as parameter and expects it to return a
string ready for printing.

If somebody wants str instead they can put

import __builtin__
__builtin__.__format__ = str

in their $PYTHONSTARTUP file.

This scheme even allows for more fancy formatting like not escaping
printable chars in latin1 if the user wants it.


[1] the name __format__ is perhaps a bit too generic, but something more
descriptive like __interactive_repr__ may be a bit awkward, but then
again you don't normally call it manually.
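
Later Python versions provide a hook of this kind under the name sys.displayhook. A minimal sketch for present-day Python 3 (an assumption relative to this thread; the function name is made up) that makes the interactive prompt show str() instead of repr():

```python
import sys


def show_with_str(value):
    # Called by the interactive interpreter with the value of each
    # expression statement; None is skipped, as the default hook does.
    if value is not None:
        print(str(value))


# Typically placed in the file named by $PYTHONSTARTUP.
sys.displayhook = show_with_str
```

Unlike the default hook, this minimal version does not store the value in the builtin _.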

--
Bernhard Herzog | Sketch, a drawing program for Unix
her...@online.de | http://www.online.de/home/sketch/

Guido van Rossum

Nov 4, 1999, 3:00:00 AM
to David Ascher
[David Ascher]

> I certainly appreciate the justification for the current behavior, which I
> didn't really understand either.
>
> I have two suggestions which would make the current read-eval-print
> behavior somewhat more palatable to me:
>
> -- container types (lists, dicts, tuples) should have both repr and str
> methods, even if they differ only in the methods of the contents that
> they call. You can make the str methods use the repr-type display
> (with brackets/parens/braces and commas), as that wouldn't break
> anyone's expectations (since that's what is currently being used), and
> would 'solve' the current inconsistency.
>
> I understand str output is meant to be human readable -- that
> doesn't mean that []'s and ,'s can't be used.

I'm reluctant but Tim's dictionary-of-rationals example suggests
you're right.

> -- make the str of a long integer not have the postfix L. After all, if
> the __str__ is supposed to be for human consumption, the L is no more
> useful than the quotes around strings.

Good one!

> The issue of determining the printability of characters is one *I*
> wouldn't touch with a 10-foot snake.

Unfortunately I will have to touch this... I cannot think of
something better than isprint(), which is designed for this. But the
weirdness is that if I write a tiny test program, the default (== C
locale) behavior is for isprint() to return false for anything over
\177; however in Python, it seems to always assume a Latin character
set. To be precise, the following non-ASCII characters are considered
printable: 'ФХШЩЪЯазийкмнорстфхцшщьэю'. Is there anything I may have
forgotten about isprint()? Using the Python locale module's
setlocale() function doesn't seem to make a difference, which is also
weird (Python doesn't call setlocale(), so it *should* default to the
C locale). (And yes, I'm including <ctype.h>.)

I have one more suggestion myself: add an "expert mode" where values
are displayed using str() instead of repr() by the read-eval-print
loop. I call it expert mode because it's only useful when (a) you are
manipulating custom types like Tim's rational number class, and (b)
you are aware of the ambiguities and risks of using str() (garbage
characters and multiple interpretations).

By the way, there's another issue here that fortunately hasn't muddled
the discussion. The C API allows an object to specify *three* ways of
formatting itself: str(), repr() and print-to-file. The print-to-file
call should faithfully mimic either str() or repr() depending on a
'mode' argument flag; however some 3rd party extensions (none in the
core distribution as far as I know) have a str() or repr()
implementation that deviates from print-to-file. These should be
fixed. Forget I even mentioned this in this thread, though!

Thomas Heller

Nov 4, 1999, 3:00:00 AM
to python-list
> How about a hook function a la __import__? I.e. have a function
> __format__ [1] in __builtin__ that is bound to repr by default. Whenever
> Python prints the result of an expression in interactive mode, it calls
> __format__ with the result as parameter and expects it to return a
> string ready for printing.

But you have a custom python interpreter, don't you? Or how do you
manage to use your __format__ as the printer for interactive mode?

Thomas Heller


Bernhard Herzog

Nov 4, 1999, 3:00:00 AM
to
"Thomas Heller" <thomas...@ion-tof.com> writes:

> > How about a hook function a la __import__? I.e. have a function
> > __format__ [1] in __builtin__ that is bound to repr by default. Whenever
> > Python prints the result of an expression in interactive mode, it calls
> > __format__ with the result as parameter and expects it to return a
> > string ready for printing.
>
> But you have a custom python interpreter, don't you?

No, I haven't.

> Or how do you
> manage to use your __format__ as the printer for interactive mode?

My phrasing might have been a bit misleading. I meant it as a suggestion
for how to deal with the different user preferences when it comes to
formatting the result in interactive mode for printing. The python interpreter
would have to be changed if my suggestion were adopted.

Emile van Sebille

Nov 4, 1999, 3:00:00 AM
to Tim Peters, Guido van Rossum
----- Original Message -----
From: Guido van Rossum <gu...@cnri.reston.va.us>
To: Tim Peters <tim...@email.msn.com>
Cc: <pytho...@python.org>
Sent: Wednesday, November 03, 1999 5:57 PM
Subject: Re: __str__ vs. __repr__
<snip>
> Any suggestions?
>
> --Guido van Rossum (home page: http://www.python.org/~guido/)


It seems to me that issues arise when using the interactive interpreter.
I suspect that little useful work is done in this mode, and that it is used
predominantly for quick testing while creating python programs.

Also, as entering a value without the print is a shortcut/convenience
used during testing, why not supply even more information? Supply the
class, type, lengths, indexes, nested, pretty-printed, etc. Use the
repr and str if it helps. Interactive mode is for learning, testing,
and experimenting. Let it be of as much help as possible.

I-type-print-so-fast-sometimes-it-comes-out-printprint-ly yr's


Emile van Sebille
em...@fenx.com
-------------------

Guido van Rossum

Nov 4, 1999, 3:00:00 AM
to
Bernhard Herzog <her...@online.de> writes:

> How about a hook function a la __import__? I.e. have a function
> __format__ [1] in __builtin__ that is bound to repr by default. Whenever
> Python prints the result of an expression in interactive mode, it calls
> __format__ with the result as parameter and expects it to return a
> string ready for printing.

Brilliant! The default function could do all the current built-in
magic -- print nothing if it's None, assign it to builtin _, and so
on.

I wonder if it would have to have a __magic__ name? It could be
called display and it could be documented and usable in
non-interactive programs as well. Or am I getting carried away?
(Possibly the habit of assigning to _ and suppressing None would make
the default display() function a bit clumsy to use.)

Tim Peters

Nov 5, 1999, 3:00:00 AM
to Guido van Rossum
[amazingly enough, still sticking to what the subject line sez <ahem>!]

Let me back off to what repr and str "should do":

repr(obj) should return a string such that
eval(repr(obj)) == obj
I'd go further and formalize a property that's currently honored but
implicitly: the string returned by repr should contain only characters c
such that
c in "\t\n" or 32 <= ord(c) < 127
i.e. it should be restricted to the set of 7-bit ASCII characters that the C
std guarantees can be read back in faithfully in text mode. Together, those
properties spell out what repr() is "good for": a highly portable and
eval'able text representation that captures *everything* important about an
object's value.
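
The two properties above can be checked mechanically. A minimal sketch in present-day Python 3 (not the 1999 interpreter under discussion); the helper names `is_portable_text` and `round_trips` are made up for illustration:

```python
def is_portable_text(s):
    """True if every character is tab, newline, or printable 7-bit ASCII."""
    return all(c in "\t\n" or 32 <= ord(c) < 127 for c in s)


def round_trips(obj):
    """True if evaluating repr(obj) rebuilds an object equal to obj."""
    # eval() is only acceptable here because we produced the string ourselves.
    return eval(repr(obj)) == obj


name = "François"
print(round_trips(name))               # repr() of a string is eval'able
print(is_portable_text(repr(name)))    # Python 3 repr() keeps the non-ASCII letter
print(is_portable_text(ascii(name)))   # ascii() backslash-escapes it instead
```

In Python 3 the built-in `ascii()` is the function that honors the 7-bit restriction; `repr()` no longer escapes printable non-ASCII characters.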

In contrast, str() should return a string that's pleasant for people to
read. This may involve suppressing obscure or minor details, displaying it
in a form that's inefficient-- or impossible --to process, or even plain
lying. Examples include the Rat class I used before (the ratio of two huge
integers is exact but unhelpful), a DateTime class that stores times as
seconds from the epoch (ditto), or a Matrix class whose instances may
contain megabytes of information (and so a faithful repr() string may be
intolerable almost all the time).
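
As an illustration of that split, here is a small sketch of a rational-number class in modern Python 3. It is a hypothetical stand-in, not the Rat class posted earlier in the thread, and the three-decimal display format is an assumption:

```python
class Rat:
    """Hypothetical stand-in for the Rat class discussed in the thread."""

    def __init__(self, num, den):
        self.num, self.den = num, den

    def __eq__(self, other):
        # Compare by cross-multiplication so 1/2 == 2/4.
        return (isinstance(other, Rat)
                and self.num * other.den == other.num * self.den)

    def __repr__(self):
        # Exact and eval'able, even if the integers are huge.
        return "Rat(%d, %d)" % (self.num, self.den)

    def __str__(self):
        # Pleasant to read, but deliberately lossy.
        return "%.3f" % (self.num / self.den)


r = Rat(1, 10)
print(str(r))    # friendly, rounded form
print(repr(r))   # faithful form
```

Here `eval(repr(r)) == r` holds, while `str(r)` throws information away on purpose.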

Viewed that way, I'd collapse all of your (Guido's) characterizations of
what I'm after into one: repr should never be invoked implicitly. If you
want a faithful (but often unhelpful) string, use repr(obj) or `obj`
explicitly. Part of not invoking repr implicitly includes containers not
inventing repr() calls out of thin air <wink>.

David Ascher suggested that str(long) drop the trailing "L", and that's a
good example. The "L" is appropriate for repr(), for so long as Python
maintains such a sharp distinction between ints and longs, but is really of
no help to most users most of the time.

repr() currently cheats in another way: for purely aesthetic reasons,
repr(float) and repr(complex) don't generate enough digits to allow exact
reconstruction of their arguments (assuming high-quality float<->string in
the platform libc). This is a case where an argument appropriate to str()
was applied to repr(), not because it makes *sense* for repr, but because so
much output goes thru an *implicit* repr() now and you didn't want to see
all those "ugly" long float strings <0.50000000000000079 wink>.

repr(float) should do its job correctly here; str(float) should continue to
produce the "nice" (but numerically inadequate) strings. (BTW, on an
IEEE-754 platform repr(float) should generate up to 17 significant digits
(suppression of trailing zeroes is fine) -- the 754 std guarantees that if
conforming output generates that many, a conforming input routine will
exactly reconstruct the original value -- note that this does *not* require
best-possible conversion in either direction -- it turns out that 17 is
enough for somewhat sloppy conversions to succeed, and many platforms are up
to this easier task)
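
The 17-digit claim is easy to try out. A short sketch in modern Python 3, assuming IEEE-754 doubles (true of essentially every current platform); the 12-digit format mimics the truncated output complained about in this thread:

```python
x = 12345678901234567.0

short = "%.12g" % x   # 12 significant digits, like the old repr(float)
full = "%.17g" % x    # 17 significant digits

print(short, float(short) == x)   # the short form loses information
print(full, float(full) == x)     # the 17-digit form reads back exactly

# The same check over a few more values, including awkward ones.
samples = [0.1, 1.0 / 3.0, 1e-300, 2.5e100]
print(all(float("%.17g" % v) == v for v in samples))
```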

Onward. Or backward:

[Tim]
>> In a world where containers passed str() down, a container's
>> str() would presumably be responsible for adding disambiguating
>> delimiters to element str() results when needed (the container
>> knows its own output syntax, and can examine the strings produced
>> by its elements ...

[Guido]
> Hm... What kind of things would you expect e.g. the list str() to do
> to its item str()s? Put backslashes before commas?

This depends on how seriously you take all this <wink>.

If you take it very seriously, __str__ should take a second flag argument,
saying whether the consumer of the string will be embedding the string in a
larger context or displaying it directly. Then it's up to __str__ to put
its own brand of delimiters around its output when appropriate.

I suspect that's overkill, though -- that the problem here is very specific
to a string object's __str__ (containers will put matching brackets at each
end of their str() or repr() output regardless, numbers of various sort
don't need delimiters, and a user class __str__ generally produces a highly
stylized string -- seems that it's *only* string objects that have wildly
unpredictable display forms).

If true, it may work fine just to be simple and pragmatic: let containers
special-case the snot out of elements of string type, sticking a pair of
quotes around what string's __str__ returns and backslash-escaping any
embedded quotes of the same type.

Then

>>> names = ["François", "Tim", "Gu'ido"]
>>> names # same as str(names) in this make-believe world
['François', 'Tim', 'Gu\'ido']
or
['François', 'Tim', "Gu'ido"]

is what I'd expect.

I certainly don't want to get rid of the commas, colons, parens, braces and
brackets for list, tuple and dict str() output: they're very helpful! I
just want to be able to read what's *between* all that stuff.

> These are all good points.

Ya, well ... everything sounds kinda good before it's implemented <0.6
wink>.

> In a typical scenario which I wanted to avoid, a user has a variable
> containing the string '1' but mistakenly believes that it contains the
> integer 1. (This happens a lot, e.g. it could be read from a file
> containing numbers.) The user tries various numeric operations on the
> variable and they all raise exceptions. The user is inexperienced and
> doesn't understand what the exceptions are, but gets the idea to
> display its value to see if something's wrong with it. One of the
> first things users learn is to use interactive Python as a power
> calculator, so my hypothetical user just types the name of the
> variable. If this would use str() to format the value, the user is no
> wiser, and perhaps more confused, since str('1') is the same as
> str(1).

Understood and appreciated. Using the pragmatic approach above, I would
have PRINT_EXPR call the same routine that container str() implementations
call, i.e. one that special-cases the snot out of strings. Then

>>> x, y = 1, '1'
>>> x
1
>>> y
'1'
>>> print x
1
>>> print y
1
>>>

is what I'd expect.

> ...


> We can then argue over what str() of a list L should return; one
> extreme possibility would be to return string.join(map(str, L));

I definitely don't want to lose the brackets or the commas.

> a slightly less radical solution would be
> '[' + string.join(map(str, L), ', ') + ']'

Yes, but replacing str with str_special_casing_the_snot_out_of_strings.
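
A rough sketch of what such a helper might look like, written in modern Python 3. The names `item_str` and `list_str` are invented here, and the quoting rule (single quotes, with embedded single quotes backslash-escaped) is just one reading of the proposal:

```python
def item_str(item):
    """str() for most items, but quoted and escaped for strings."""
    if isinstance(item, str):
        # Quote the string and escape embedded quotes of the same type,
        # leaving non-ASCII characters readable.
        return "'" + item.replace("'", "\\'") + "'"
    return str(item)


def list_str(items):
    """Keep the brackets and commas; pass the friendly form down."""
    return "[" + ", ".join(map(item_str, items)) + "]"


names = ["François", "Tim", "Gu'ido"]
print(list_str(names))
print(list_str([1, "1"]))   # the integer and the string stay distinguishable
```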

> In the first case, your last example would go like this:
>
> >>> names
> François Tim
> >>>

Yuck!

> while the [second] choice would give
>
> >>> names
> [François, Tim]
> >>>

As above, ['François', 'Tim'] is my ideal.

> There may be other solutions -- e.g. in Tcl, a list is displayed as
> the items separated by spaces, with the proviso that items containing
> spaces are displayed inside (matching) curly braces; unmatched braces
> are displayed using backslashes, guaranteeing that the output can be
> parsed back into a list with the same value as the original. (Hey!
> That's the same as Python's rule! So why does it work in Tcl?
> Because variables only contain strings, and the equivalent of the Rat
> class used above can't be coded in Tcl.)

I'm explicitly giving up the property that str(obj) always be eval'able in a
sensible way. That's repr()'s job. For example, I expect

>>> x = "\\t"
>>> x
'\t'
>>>

and eval('\t') certainly isn't x. This is a tradeoff: we stop pretending
that str() is adequate for equality under evalability <wink>, but take EUE
seriously for repr(). In return, str() gets to do what it was meant to do:
produce "nice" strings, without compromise for the sake of faux
(inconsistent, unpredictable) EUE.

> The problem with displaying 'François' has been mentioned to me
> before. (By the way, I have no idea how to *type* that. I'm just
> cutting and pasting it from Tim's message.)

European keyboard, programmable keyboard, C-q something-or-other in Emacs,
Alt+0231 in Windows (keyboard generation of any 8-bit code is built in to
Windows keyboard drivers), or even copy+paste from Accessories->Character
Map under Windows. My employer spends a lot of time worrying about other
countries' foolish character sets <wink>.

> There's another scenario I was trying to avoid. This is probably
> something that happened once too many times when I was young and
> innocent, so I may be overreacting. Consider the following:
>
> >>> f = open("somefile")
> >>> a = f.readline()
> >>> print a
> %âãÏÓ1.3
>
> >>>
>
> Now this example is relatively harmless. Using repr(), I see that the
> string contains a \r character that caused the cursor to back up to
> the start of the line, overwriting what was already written:
>
> >>> a
> '%PDF-1.3\015%\342\343\317\323\015\012'
> >>>
>
> But the thing that some big bad program did to me long ago was more
> like spit out several thousand garbage bytes which contained enough
> escape sequences to lock up my terminal requiring me to powercycle and
> log in again. (The fact that the story refers to a terminal indicates
> how long ago this was. :-)
>
> So I vowed that *my* language would not (easily) let this happen by
> accident, and the way I enforced that was by making sure that all
> non-ASCII characters would be printed as octal escapes, unless you use
> the print statement.

An irony is that I've locked up a DOS box doing this on many occasions -- in
a script, "print a" is not exactly a *hard* mistake to fall into <wink>.

I'm not unsympathetic, but this is something you really can't stop without
rendering Python useless to grownups. How does Python's Unicode story
relate to this? Displaying a Unicode string as a pile of \uxxxx (whatever)
escapes defeats the whole purpose of Unicode. Ditto displaying it in UTF-7.
OTOH, for some number of years to come it will be a rare display device that
*won't* treat Unicode 16-bit values (whether or not UTF-8 encoded) as a pile
of 8-bit control codes.

> It's a separate dilemma from the other examples. My problem here is
> that I hate to make assumptions about the character set in use. How
> do I know that '\237' is unprintable but '\241' is a printable
> character?

We have no idea.

> How do I know that the latter is an upside-down exclamation point?

Ditto.

> Should I really assume stdout is capable of displaying Latin-1?

No. But I don't grasp why you think you need to know *anything* about this.
Until Unicode takes over the world, there's nothing you can do other than
tell users the truth: "most of" printable 7-bit ASCII displays the same way
across the world, but outside of that *all* bets are off. It will vary by
all of OS, display device, displaying program, and user configuration
choices.

> Strictly, the str() function doesn't even know that its output is
> going to stdout.

That's right, and C doesn't even guarantee that I can read François back in
from a text file! It's that bad. *So* bad that everyone lives with it and
never notices it <0.9 wink>.

> I suppose I could use isprint(), and then François could use the locale
> module in his $PYTHONSTARTUP file to make it do the right thing. Is that
> good enough?

I don't know what you're trying to *accomplish* here. isprint() is a piece
of crap, not least because what can or can't be displayed has much more to
do with the font you're currently using than with anything C knows about.
That is, it's outside C's areas of competence.

For example, here's the relatively new euro symbol: €. That's not in
Latin-1. Windows added it as an extension to Latin-1, at hex code 0x80 (one
of the 32 codes Latin-1 didn't assign glyphs to). In my mailer at the
moment, even on Windows, it shows up as a square box. That's because my
font at the moment happens to be Courier. If I switch my font to Courier
New, it shows up as intended -- but even then only because I happened to
install the service pack that added the euro symbol, and Courier New (unlike
Courier) is a "code page 1252" font. On any non-Windows system, God only
knows what you'll see. The point is that C's isprint(0x80) can't give a
reasonable answer even sticking solely to my machine! What I can or can't
display has, alas, nothing to do with C.
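
The 0x80 ambiguity can be reproduced with the codecs in modern Python 3. This sketch only shows how the same byte decodes under two code pages; whether a given font can draw the result is, as noted above, a separate question:

```python
raw = b"\x80"

as_cp1252 = raw.decode("cp1252")    # Windows code page 1252
as_latin1 = raw.decode("latin-1")   # ISO 8859-1

print(as_cp1252 == "\u20ac")        # cp1252 maps 0x80 to the euro sign
print(as_cp1252.isprintable())      # the euro sign counts as printable
print(as_latin1.isprintable())      # Latin-1 0x80 is a C1 control character
```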

> (I just tried this briefly. It seems that somehow the locale module
> doesn't affect this?!?!

Locale is supposed to affect all of C's isxxx functions, but I wouldn't bet
that most implementations do it correctly. As above, even if they did,
what's actually displayable is a different issue.

> I still think that ['a', 'b'] should be displayed like that, and not
> like [a, b].

Me too! I don't want to make the currently-readable unreadable -- I want to
keep that and get the converse too.

> I'm not sure what to do about a dict of Rat() items,

str(recip) (my example) produces {0.100: 1.000, 1.000: 0.100} (which follows
automatically from str(dict) "passing str() down", and that neither the keys
nor the values are themselves of string type); repr(recip) produces the
unreadable but precise and eval'able string that's produced today (which
follows automatically from repr(dict) passing repr() down).

> except perhaps giving up on the fiction that __repr__() should
> return something expression-like for user-defined classes...
>
> Any suggestions?

Yes: stop being so radical <wink>. That eval(repr(obj)) == obj is a
*wonderful* property of the language design! I don't want to lose that
either. My claim is that repr() is rarely appropriate, but is essential
when it is. This is very often very apparent in user-defined classes (at
least mine <wink -- but I seem to hit this every time!>). It was hard to
see at first because the builtin types other than string treated repr() and
str() as synonyms, and nobody bitched enough about str(long)'s trailing "L",
and I didn't bitch enough about repr(float)'s == str(float)'s systematic
inaccuracy, and Python's European groupies didn't bitch enough about
repr(euro_string) being an unreadable backslashed mess.

IOW, I don't want to get rid of the str/repr design! I want to take it
seriously and do all the obvious <wink> things that follow from that.

telling-someone-what-they-really-think-takes-many-words-ly y'rs - tim

Thomas Wouters

Nov 5, 1999, 3:00:00 AM
to pytho...@python.org

On Fri, Nov 05, 1999 at 02:04:30AM -0500, Tim Peters wrote:

> and Python's European groupies didn't bitch enough about
> repr(euro_string) being an unreadable backslashed mess.

We're too busy bitching about the idiocy of people who expect to change
a country's (usually) ancient currency, habits and not to mention every
keyboard, font and accountancy software in the world in a matter of years.
Or perhaps we just don't _care_ about the euro symbol. Your Euro symbol, by
the way, Tim, ended up as '?' in my font -- but possibly because isprint()
returned 0 and my mail-reader therefore refused to print it ;)

Oh, and we bitch about why they had to rotate the Quake II logo 90 degrees,
too. I think acceptance of the logo would have been easier if it'd been
sitting up straight.

ISO-8859-15-aka-Latin-9'ly y'rs,
--
Thomas Wouters <tho...@xs4all.net>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

Mikael Olofsson

Nov 5, 1999, 3:00:00 AM
to Thomas Wouters

On 05-Nov-99 Thomas Wouters wrote:
> We're too busy bitching about the idiocy of people who expect to change
> a country's (usually) ancient currency, habits and not to mention every
> keyboard, font and accountancy software in the world in a matter of years.
> Or perhaps we just don't _care_ about the euro symbol. Your Euro symbol, by
> the way, Tim, ended up as '?' in my font -- but possibly because isprint()
> returned 0 and my mail-reader therefore refused to print it ;)

Actually, many European currencies are not more than approximately 100
years old, and so are many of the countries as well. That's not so
ancient to me. But it's true; many of us don't care.

/Mikael

-----------------------------------------------------------------------
E-Mail: Mikael Olofsson <mik...@isy.liu.se>
WWW: http://www.dtr.isy.liu.se/dtr/staff/mikael
Phone: +46 - (0)13 - 28 1343
Telefax: +46 - (0)13 - 28 1339
Date: 05-Nov-99
Time: 11:26:19

This message was sent by XF-Mail.
-----------------------------------------------------------------------

Just van Rossum

Nov 5, 1999, 3:00:00 AM
to Tim Peters, Guido van Rossum
At 2:04 AM -0500 11/5/99, Tim Peters wrote:
>rendering Python useless to grownups. How does Python's Unicode story
>relate to this? Displaying a Unicode string as a pile of \uxxxx (whatever)
>escapes defeats the whole purpose of Unicode. Ditto displaying it in UTF-7.
>OTOH, for some number of years to come it will be a rare display device that
>*won't* treat Unicode 16-bit values (whether or not UTF-8 encoded) as a pile
>of 8-bit control codes.
[ ... ]

>Until Unicode takes over the world, there's nothing you can do other than
>tell users the truth: "most of" printable 7-bit ASCII displays the same way
>across the world, but outside of that *all* bets are off. It will vary by
>all of OS, display device, displaying program, and user configuration
>choices.

As a step towards a Unicode-savvy Python, would it be an idea to define
Unicode versions of repr(), str(), ob.__repr__() and ob.__str__()?

eg.
urepr()
ustr()
ob.__urepr__()
ob.__ustr__()

'print' could then use these if it knows stdout is a Unicode-savvy output
stream.

Just

M.-A. Lemburg

Nov 5, 1999, 3:00:00 AM
to Tim Peters
Tim Peters wrote:
>
> David Ascher suggested that str(long) drop the trailing "L", and that's a
> good example. The "L" is appropriate for repr(), for so long as Python
> maintains such a sharp distinction between ints and longs, but is really of
> no help to most users most of the time.

If you should really plan to do this, please add a new PyLong_AsString()
API, so that extensions can query the long int value using the
string format (probably the most portable way of passing long int
values from one implementation to another). A PyLong_FromString() API
would be a nice complement to PyLong_AsString()... it could use the
code from the builtin long() as engine.

I'm currently using PyObject_Str() on Python long integers to
have them converted to strings and chop off the trailing 'L' "by hand".
Not really very elegant, but it works :-)

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 56 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

Bernhard Herzog

Nov 5, 1999, 3:00:00 AM
Guido van Rossum <gu...@cnri.reston.va.us> writes:

> Bernhard Herzog <her...@online.de> writes:
>
> > How about a hook function a la __import__? I.e. have a function
> > __format__ [1] in __builtin__ that is bound to repr by default. Whenever
> > Python prints the result of an expression in interactive mode, it calls
> > __format__ with the result as parameter and expects it to return a
> > string ready for printing.
>
> Brilliant!

Glad you like it!

> The default function could do all the current built-in
> magic -- print nothing if it's None, assign it to builtin _, and so
> on.

It seems to me that assigning to _ and printing the result are two
separate things. IMO, the hook function should only do the formatting,
the assignment could still be hardwired in the interpreter loop. Pseudo
code:

    while 1:
        statement = read_statement()
        if not statement:
            break
        result = run_statement(statement)
        __builtin__._ = result
        if result is not None:
            print __format__(result)


> I wonder if it would have to have a __magic__ name? It could be
> called display and it could be documented and usable in
> non-interactive programs as well. Or am I getting carried away?
> (Possibly the habit of assigning to _ and suppressing None would make
> the default display() function a bit clumsy to use.)

Well, I chose a magic name because it's called in special circumstances.
Calling it display is fine with me, especially if it really does only
formatting and no assignment to _ and no special-casing of None, as is
assumed in my pseudo code above. Such a version of display would also be
usable in non-interactive programs.

One could take this even further by moving the entire interactive mode
to a python module. Most of the python code for that is already
available in the code.py module, IIRC.
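
For readers on a current interpreter: Python 3 exposes a hook of this kind as `sys.displayhook`. The sketch below is an assumption-laden illustration, not the design debated here. The factory `make_display_hook` and its `formatter` and `out` parameters are invented names, and unlike the pseudo code above it keeps the assignment to `_` inside the hook:

```python
import builtins
import io
import sys


def make_display_hook(formatter=repr, out=None):
    """Build a display function that formats results with `formatter`."""
    def display(value):
        if value is None:
            return                      # print nothing for None
        builtins._ = value              # remember the last result as _
        stream = out if out is not None else sys.stdout
        stream.write(formatter(value) + "\n")
    return display


# Exercise the hooks against an in-memory stream instead of the console.
buf = io.StringIO()
make_display_hook(str, buf)("1")     # friendly form: no quotes
make_display_hook(repr, buf)("1")    # faithful form: quotes kept
make_display_hook(repr, buf)(None)   # writes nothing
print(buf.getvalue())
```

To try it at an interactive prompt, one would assign `sys.displayhook = make_display_hook(str)`.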

Fred L. Drake, Jr.

Nov 5, 1999, 3:00:00 AM
to Just van Rossum

Just van Rossum writes:
> As a step towards a Unicode-savvy Python, would it be an idea to define
> Unicode versions of repr(), str(), ob.__repr__() and ob.__str__()?

Perhaps it would be more useful for file objects to offer an
attribute or query method that allowed client code to determine if the
stream supports unicode?
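
In present-day Python 3, text streams do carry an `encoding` attribute, so client code can ask roughly this question. A small sketch under that assumption; `stream_can_encode` is a made-up helper, and treating an unknown encoding as "no" is an arbitrary, conservative choice:

```python
import io


def stream_can_encode(stream, text):
    """Guess whether `text` can be written to `stream` without errors."""
    encoding = getattr(stream, "encoding", None)
    if encoding is None:
        return False                 # unknown encoding: be conservative
    try:
        text.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        return False
    return True


# Two in-memory text streams with different encodings.
ascii_out = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
utf8_out = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")

print(stream_can_encode(ascii_out, "François"))
print(stream_can_encode(utf8_out, "François"))
```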


-Fred

--
Fred L. Drake, Jr. <fdr...@acm.org>
Corporation for National Research Initiatives

Donn Cave

Nov 5, 1999, 3:00:00 AM
Quoth "Tim Peters" <tim...@email.msn.com>:

| Let me back off to what repr and str "should do":
|
| repr(obj) should return a string such that
| eval(repr(obj)) == obj
| I'd go further and formalize a property that's currently honored but
| implicitly: the string returned by repr should contain only characters c
| such that
| c in "\t\n" or 32 <= ord(c) < 127
| i.e. it should be restricted to the set of 7-bit ASCII characters that the C
| std guarantees can be read back in faithfully in text mode. Together, those
| properties spell out what repr() is "good for": a highly portable and
| eval'able text representation that captures *everything* important about an
| object's value.

For real? I'm very used to seeing repr's that don't eval at all -

'<socket object, fd=3, family=2, type=1, protocol=0>'
"<open file '<stdout>', mode 'w' at 80005660>"
'<exceptions.NameError instance at 80048ae0>'

Now it's fine with me if the author does choose to implement repr that
way, it's just hard to imagine it could be meaningfully implemented by
all objects, or that there's any purpose in trying.

Well, really I guess I would prefer that authors _not_ try to capture
*everything* important, if that means a large data dump. repr is used
in lots of places, like the interactive interpreter, where the output
only has to be roughly indicative, and I have never seen and can't find
any example of its use with eval or in any context where it has to be
definitive.

If there are folks who want to provide this functionality at the expense
of making a string that's unusable for repr's normal applications,
they should put it in a different function.

| In contrast, str() should return a string that's pleasant for people to
| read. This may involve suppressing obscure or minor details, displaying it
| in a form that's inefficient-- or impossible --to process, or even plain
| lying. Examples include the Rat class I used before (the ratio of two huge
| integers is exact but unhelpful), a DateTime class that stores times as
| seconds from the epoch (ditto), or a Matrix class whose instances may
| contain megabytes of information (and so a faithful repr() string may be
| intolerable almost all the time).

Bah. If we get str() because it's pleasant to read, aren't we worried
that repr() will be harsh and stressful? Will one repr() too many send
one of us into a homicidal rage one of these days?

Let us think about taming the cruel repr()! If I may speculate on what
you're trying to say, str() is pleasant because it shows the data in
form that has been digested for human consumption. In contrast to
repr(), which is intended for the machine.

But as I don't agree with the latter, I don't have to agree with
the former either. On the contrary, it's str() that's intended for the
machine, and repr() for the human. Consider rfc822, one of the three
Python modules in the standard distribution that actually implements
__str__:

>>> fp = open('mimetools', 'r')
>>> m = rfc822.Message(fp)
>>> str(m)
'Path: news.u.washington.edu!logbridge.uoregon.edu!news.mind.net!not-for
-mail\012From: drag...@integral.org (The Dragon De Monsyne)\012Newsgrou
ps: comp.lang.python\012Subject: RFC complient mimetools...\012Date: 29
Sep 1999 13:27:03 GMT\012Organization: InfoStructure - Ashland, OR\012Li
nes: 443\012Sender: drag...@polar.integral.org\012Message-ID: <7st437$r
9s$2...@utah.laird.net>\012NNTP-Posting-Host: ip202.mind.net\012Mime-Versio
n: 1.0\012Content-Type: text/plain; charset=us-ascii\012X-Newsreader: kn
ews 1.0b.0\012Xref: news.u.washington.edu comp.lang.python:68258\012'

Pleasant, eh? [wink, wink, nudge, nudge] I wrapped the lines, in case
the original line length of 529 might hurt someone's software.

So look at the way it's really used. repr is for us, str is for the
machine. Now for sure, there are quite a number of objects out there
that don't repr as readably as they could, but I think very
few of them are that way because they didn't get a chance to use their
str instead.

Gordon McMillan

Nov 5, 1999, 3:00:00 AM
to Donn Cave, pytho...@python.org
Donn Cave writes:

> Quoth "Tim Peters" <tim...@email.msn.com>:
>
> | Let me back off to what repr and str "should do":
> |
> | repr(obj) should return a string such that
> | eval(repr(obj)) == obj

> For real? I'm very used to seeing repr's that don't eval at all -
>
> '<socket object, fd=3, family=2, type=1, protocol=0>'
> "<open file '<stdout>', mode 'w' at 80005660>"
> '<exceptions.NameError instance at 80048ae0>'

import StringIO
x = ['a', 1, 'b', {'x':(4,5L),'z':None}]
f = StringIO.StringIO()
f.write(repr(x))
y = eval(f.getvalue())
if y == x:
    print "You betcha"

I've even been known to implement __repr__ in classes and C
extension types so this property holds true, and I've been
bitten by the fact that UserList just forwards repr to self.data.
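
The same round trip in modern Python 3, as a sketch. It swaps in `ast.literal_eval` for `eval`, which only accepts literals and so is a safer way to read back text that may have come from elsewhere; that substitution is this example's choice, not something proposed in the thread:

```python
import ast
import io

data = ['a', 1, 'b', {'x': (4, 5), 'z': None}]

# Write the repr() to a file-like object, as the code above does.
f = io.StringIO()
f.write(repr(data))

# Read it back without executing arbitrary code.
restored = ast.literal_eval(f.getvalue())
print(restored == data)
```

This works only for objects whose repr() is built from literals (strings, numbers, tuples, lists, dicts, None and the like), which is the property being argued for.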

- Gordon

Tim Peters

Nov 6, 1999, 3:00:00 AM
to pytho...@python.org
[Tim]

>> repr(obj) should return a string such that
>> eval(repr(obj)) == obj

[Donn Cave]
> For real?

This, along with the later claim about what str() "should do", are
paraphrased from the Language Reference Manual (3.3.1 Basic customization).
It's "why" e.g. the equation above holds for repr(string) but not for
str(string). The builtin types took this stuff as seriously as they could,
modulo some unfortunate lapses (like repr(float)) covered elsewhere in this
thread.

> I'm very used to seeing repr's that don't eval at all -
>
> '<socket object, fd=3, family=2, type=1, protocol=0>'
> "<open file '<stdout>', mode 'w' at 80005660>"
> '<exceptions.NameError instance at 80048ae0>'

Are you for real <0.9 wink>? Be reasonable.

> ...


> Well, really I guess I would prefer that authors _not_ try to capture
> *everything* important, if that means a large data dump. repr is used
> in lots of places, like the interactive interpreter, where the output
> only has to be roughly indicative,

The latter is str()'s purpose (see 3.3.1 again), and the thread was arguing
(among other things) that the interpreter use it (instead of repr())
interactively.

> and I have never seen and can't find any example of its use with
> eval or in any context where it has to be definitive.

I've seen it used often as a poor-man's pickle, and as a way to email Python
objects as plain text. repr() is very useful for that. Unfortunately, most
class authors seem to ignore the distinction (my classes don't!), and that
limits its usefulness. The lapses in the core also hurt. One email
response I didn't see posted here describes a typical example of the latter:

<ANONYMOUS>

I've only been writing Python for a year or so, but bumped into this 'cheat'
and had to write floats out to a file using binary format, a portability
nightmare. The floats were in an array, and you can't pickle arrays.

>>> repr(12345678901234567.0)
'1.23456789012e+016' # Just 12 digits!

is currently what you get.

>>> x = 12345678901234567.0
>>> x == eval(repr(x)) # Nope.
0

It sure would be nice to be able to count on
eval(repr(obj)) == obj.

On the other hand, while repr(1L) has to return '1L', there's no reason
str(1L) shouldn't be '1'.

</ANONYMOUS>

OTOH,

>>> x == eval("%24.16e" % x)
1
>>>

on any IEEE-754 conforming platform, as was covered in detail earlier.
That's what repr(float) should do.

> If there are folks who want to provide this functionality at the expense
> of making a string that's unusable for repr's normal applications,
> they should put it in a different function.

The language took its stand on this issue in 1991 <wink>, and you're arguing
against *that*. I'm arguing for the language to finish what it started.

> ...


> Bah. If we get str() because it's pleasant to read, aren't we worried
> that repr() will be harsh and stressful? Will one repr() too many send
> one of us into a homicidal rage one of these days?

I said before that repr() is rarely appropriate -- but essential when it is.

> Let us think about taming the cruel repr()! If I may speculate on what
> you're trying to say, str() is pleasant because it shows the data in
> form that has been digested for human consumption. In contrast to
> repr(), which is intended for the machine.

Ah -- you *have* read section 3.3.1 <wink>.

> ...


> So look at the way it's really used.

Inconsistently, confusingly, and inappropriately, in both classes and (but
substantially less so) in the core types. We can't deduce anything useful
about what should happen now based on what is a mess in practice. My aim is
to get the mess cleaned up.

and-thanks-for-your-support<wink>-ly y'rs - tim

Thomas A. Bryan

Nov 6, 1999, 3:00:00 AM
Donn Cave wrote:

> repr is used
> in lots of places, like the interactive interpreter,

I think that was Tim's point. Earlier he said:
"""For that matter, why is repr implied at an interactive prompt given
a raw expression? Seems that friendly strings would be more
appropriate there."""

He wants str to be used at the interpreter since it's supposed to be
the human readable form. One of Guido's complaints was that
1 and '1' would then look the same.

> On the contrary, it's str() that's intended for the machine, and
> repr() for the human.

From the library reference:
"""repr (object)
Return a string containing a printable representation of an object.
This is the same value yielded by conversions (reverse quotes). It
is sometimes useful to be able to access this operation as an
ordinary function. For many types, this function makes an attempt
to return a string that would yield an object with the same value
when passed to eval().
str (object)
Return a string containing a nicely printable representation
of an object. For strings, this returns the string itself.
The difference with repr(object) is that str(object) does not
always attempt to return a string that is acceptable to eval();
its goal is to return a printable string. """

The problem is that this thread is the first time many of us have
ever thought about the problem. I've never noticed those two lines
in the library reference, and I haven't seen the str/repr distinction
clearly made anywhere else. Furthermore, since the interactive
interpreter *does* use repr(), many of us make __repr__ return the
human readable form so that we can easily test our classes.

So, what I'm hearing is:
1. __str__ *should* be a human readable string
2. __repr__ *should* be a representation that can be eval'd back to the
original object
3. This convention is not widely known, but
3b. the interactive interpreter discourages the convention by using
repr to show the result of an expression
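
The convention in points 1 and 2, and the place where containers fall back to repr, can be seen in a few lines of modern Python 3. The `Celsius` class is a made-up example:

```python
class Celsius:
    """Made-up example class with distinct str() and repr() forms."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __eq__(self, other):
        return isinstance(other, Celsius) and self.degrees == other.degrees

    def __repr__(self):
        # Eval'able form, aimed at the programmer.
        return "Celsius(%r)" % self.degrees

    def __str__(self):
        # Human readable form, aimed at the user.
        return "%.1f C" % self.degrees


t = Celsius(21.5)
print(str(t))        # what print uses
print(repr(t))       # what the interactive prompt shows
print(str([t, t]))   # containers call repr() on their elements
```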

Therefore, we could switch the convention for str and repr, which breaks
things that are currently following the convention.

Or we could switch the interactive interpreter to use str to encourage
class writers to follow the convention. Of course, any class writers
who intentionally switched __repr__ and __str__ to get nice interactive
behavior will have to switch the methods back.

Since Tim voted for the second option, that's probably what Guido will
decide. ;) Either way, Guido still has to deal with non-printing
characters, string-vs-number distinctions, and other insanity in the
interpreter.
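
Points 1 and 2 of the summary above can be sketched roughly as follows. This is written for a modern Python 3 interpreter rather than the 1.5.2 under discussion, and the Point class and its fields are made up for illustration. Note how the str form makes 1 and '1' look the same, which is the complaint mentioned earlier.

```python
class Point:
    """Hypothetical example class; not from the thread."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        # Constructor-style text that eval() can turn back into an equal object.
        return "Point(%r, %r)" % (self.x, self.y)

    def __str__(self):
        # Free-form text meant for people.
        return "(%s, %s)" % (self.x, self.y)


p = Point(1, '1')
print(repr(p))   # eval-able form, keeps the quotes around '1'
print(str(p))    # readable form, 1 and '1' look alike
print([p])       # a list shows the repr of its items
```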

who-uses-the-interpreter-anyway-<wink>-ly yours
---Tom

Donn Cave

unread,
Nov 6, 1999, 3:00:00 AM11/6/99
to
Quoth "Thomas A. Bryan" <tbr...@python.net>:
...

| The problem is that this thread is the first time many of us have
| ever thought about the problem. I've never noticed those two lines
| in the library reference, and I haven't seen the str/repr distinction
| clearly made anywhere else. Furthermore, since the interactive
| interpreter *does* use repr(), many of us make __repr__ return the
| human readable form so that we can easily test our classes.
|
| So, what I'm hearing is:
| 1. __str__ *should* be a human readable string
| 2. __repr__ *should* be a representation that can be eval'd back to the
| original object
| 3. This convention is not widely known, but
| 3b. the interactive interpreter discourages the convention by using
| repr to show the result of an expression

One of the problems here is that there are not two types of string
representations, but at least three.

1. Human readable representation of object. Interpreter should display
this in interactive use, for example.
2. Machine readable representation of object, for eval.
3. Machine readable object qua string - intentionally, not the
object's faithful representation, but what it would be if it
had been a string. What the author of rfc822.Message.__str__
appears to have had in mind, for example.

I don't see how any two of these three can be made compatible enough
to comfortably use the same implementation. When usage parted with
documented intentions, it probably wasn't just a lot of people failing
to read the documentation. There's a fundamental need for a machine
__str__ that's different from a machine __repr__, and the machine
__str__ is worse for general display purpose if it's intentionally
inaccurate.

Now I'm arguing that one reason many of us have never considered
this issue is that the current implementation is the best resolution
of these conflicts. It's not completely OK, there is dissatisfaction
because people wish for the eval-able repr, but if we change course
in favor of the documented behavior, it's going to pinch somewhere else.

We already have a glimpse of this in the special cases that would
apparently be necessary. A string's str is of course itself - string
qua string - and its repr adds quotes, and that works out fine the way
we do things now. Change the context in which these are used - like
make str(container) use str() for contents - and you realize that the
repr was really a lot more friendly for some objects, like strings.
I'm saying this is not just a quirk of strings, it's because a string
is computable data and that's what we do use str() for.

Wouldn't it be great if that eval-able string could be shifted to
another function? Then if you exercised that function, you could
be more or less sure to really get something the author intended to
eval with a sensible result. Or you get an AttributeError. Not a
string that may or may not eval, when you get around to trying it.
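
A minimal sketch of what such a separate function might look like, in modern Python 3. The names evalrepr and __evalrepr__ are invented here for illustration; nothing of the kind exists in Python itself.

```python
def evalrepr(obj):
    """Hypothetical helper: ask the object for an eval-able string.

    Objects opt in by defining a method named __evalrepr__ (a made-up name).
    Objects that do not define it raise AttributeError.
    """
    return obj.__evalrepr__()


class Url:
    """Hypothetical class that does offer an eval-able form."""

    def __init__(self, address):
        self.address = address

    def __repr__(self):
        # Free to be a display form; no promise that it evals.
        return "<Url %s>" % self.address

    def __evalrepr__(self):
        return "Url(%r)" % self.address


class Handle:
    """Hypothetical class with no sensible eval-able form, so it defines none."""


u = Url("http://www.python.org/")
print(evalrepr(u))
try:
    evalrepr(Handle())
except AttributeError:
    print("Handle has no eval-able form")
```

The point of the design is that a caller either gets a string the author meant to be evaluated, or an error straight away.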

Donn Cave, do...@u.washington.edu

Toby Dickenson

unread,
Nov 7, 1999, 3:00:00 AM11/7/99
to
Guido van Rossum <gu...@cnri.reston.va.us> wrote:

>Bernhard Herzog <her...@online.de> writes:
>
>> How about a hook function a la __import__? I.e. have a function
>> __format__ [1] in __builtin__ that is bound to repr by default. Whenever
>> Python prints the result of an expression in interactive mode, it calls
>> __format__ with the result as parameter and expects it to return a
>> string ready for printing.
>

>Brilliant! The default function could do all the current built-in
>magic -- print nothing if it's None, assign it to builtin _, and so
>on.
>

...and an IDE could open an object browser window. Mmmmmmmmm.
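
Later Python releases gained a hook of this shape, sys.displayhook, which the interactive interpreter calls with the value of each expression it evaluates. A minimal Python 3 sketch of a replacement that shows str() instead of repr() follows; the function name is made up, and this is only one guess at the behaviour the posters had in mind.

```python
import builtins


def str_displayhook(value):
    """Show str(value) instead of repr(value) at the interactive prompt."""
    if value is None:
        return                  # print nothing for None, like the default hook
    builtins._ = None           # avoid keeping a stale value if str() fails
    print(str(value))
    builtins._ = value          # remember the last result as _


# To try it interactively: import sys; sys.displayhook = str_displayhook
str_displayhook("1")   # the string is shown without quotes
str_displayhook(1)     # the integer looks exactly the same
```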

Toby Dickenson

Toby Dickenson

unread,
Nov 7, 1999, 3:00:00 AM11/7/99
to
"Tim Peters" <tim...@email.msn.com> wrote:

>Let me back off to what repr and str "should do":
>
>repr(obj) should return a string such that
> eval(repr(obj)) == obj

I don't understand the motivation for this requirement. Why would
anyone want to pass such a string to eval? If you anticipate the need
for reconstructing the object from a textual representation, then
surely pickle is a better option?

Alternatively, most classes could implement __repr__ as :-)

    def __repr__(self):
        return "pickle.loads(%s)" % repr(pickle.dumps(self))

>repr() currently cheats in another way: for purely aesthetic reasons,
>repr(float) and repr(complex) don't generate enough digits to allow exact
>reconstruction of their arguments (assuming high-quality float<->string in
>the platform libc). This is a case where an argument appropriate to str()
>was applied to repr(), not because it makes *sense* for repr, but because so
>much output goes thru an *implicit* repr() now and you didn't want to see
>all those "ugly" long float strings <0.50000000000000079 wink>.

OK, so repr as it stands today is clearly not suitable for formatting
for people, and is less than ideal for formatting for the machine.
However there is a third audience for who it is ideal, the programmer,
who need a representation for use in development tools such as
debuggers, or the interactive mode.

Tim's Rat class is a good example of how __repr__ should not be
implemented, because it exposes the internal representation of the
object, not the external value.

However, using a small number of digits for repr(float) does make
sense because it still provides sufficient precision for most
programmers.

Toby Dickenson

Gordon McMillan

unread,
Nov 7, 1999, 3:00:00 AM11/7/99
to Toby Dickenson, pytho...@python.org
Toby Dickenson writes:

> "Tim Peters" <tim...@email.msn.com> wrote:
>
> >Let me back off to what repr and str "should do":
> >
> >repr(obj) should return a string such that
> > eval(repr(obj)) == obj
>
> I don't understand the motivation for this requirement. Why would
> anyone want to pass such a string to eval? If you anticipate the
> need for reconstructing the object from a textual representation,
> then surely pickle is a better option?

It's not human editable. I guess you've never used
open('xxx.xxx','w').write(repr(dict))
and
dict = eval(open('xxx.xxx','r').read())

as a cheap config file.
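
A runnable sketch of the same idea in modern Python 3. The file name and settings are invented, and ast.literal_eval is swapped in for the eval() used above, since it accepts only literals and is the more cautious choice when the file holds plain data.

```python
import ast
import os
import tempfile

settings = {"host": "localhost", "port": 8080, "tags": ["a", "b"]}

path = os.path.join(tempfile.mkdtemp(), "settings.cfg")   # throwaway location

# Write the repr: plain text that can be opened and edited by hand.
with open(path, "w") as f:
    f.write(repr(settings))

# Read it back and rebuild the dictionary from the text.
with open(path) as f:
    loaded = ast.literal_eval(f.read())

print(loaded == settings)
```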

> OK, so repr as it stands today is clearly not suitable for
> formatting for people, and is less than ideal for formatting for
> the machine.

For strings, ints, lists, tuples and dicts it is dandy for both
right now. For longs it's good for machines; for some others
it's OK for some users.

It doesn't make a whole lot of sense for low-level objects
involving open resources, but it does make a great deal of
sense for user objects which would represent a URL as a
string, or a file as a (filename, mode). You can then make
repr(object) yield a valid constructor and include high level
objects in your config file, or cut and paste from interactive to
a script.

Whether you use them or not, these are very valuable features
that are promised (but not entirely fulfilled).

> However there is a third audience for who it is
> ideal, the programmer, who need a representation for use in
> development tools such as debuggers, or the interactive
> mode.
>

There's been some debate about interactive mode. I would
personally prefer that interactive mode err on the side of
excess accuracy, (I find pprint to be a necessity for complex
structures, and whether repr or str is used, that's not going to
change). Debuggers should not be relying on either, but
should pick apart the object through knowledge of Python
internals.

> Tim's Rat class is a good example of how __repr__ should not be
> implemented, because it exposes the internal representation of
> the object, not the external value.

Tim's Rat class should provide a separate __str__
implementation.



> However, using a small number of digits for repr(float) does make
> sense because it still provides sufficient precision for most
> programmers.

For human consumption, I *always* use % formatting with
floats - there's no way that float can know whether the human
is a chemist or a bean-counter, (if the latter qualify at all).
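
For instance, the same float can be rendered several ways depending on the reader. A small sketch (the value is arbitrary):

```python
x = 1234.56789

money = "%.2f" % x       # two decimals, for the bean-counter
science = "%.3e" % x     # scientific notation, for the chemist
column = "%10.1f" % x    # fixed width, for a column in a report

print(money)
print(science)
print(column)
```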


- Gordon

Andy Robinson

unread,
Nov 7, 1999, 3:00:00 AM11/7/99
to
"Tim Peters" <tim...@email.msn.com> wrote:

>> Should I really assume stdout is capable of displaying Latin-1?
>
>No. But I don't grasp why you think you need to know *anything* about this.
>Until Unicode takes over the world, there's nothing you can do other than
>tell users the truth: "most of" printable 7-bit ASCII displays the same way
>across the world, but outside of that *all* bets are off. It will vary by
>all of OS, display device, displaying program, and user configuration
>choices.
>

Well said. I'm going to pass this message around to my team at work as
it encapsulates so many of the issues involved in internationalisation
(and if you think inputting François is bad, try figuring out a way to
clean up Japanese names using vi on an English Solaris box).

I have found there are only three really sane points of reference when
dealing with encodings - anywhere between the gaps just causes
confusion all around:

Level 1: Code that is assumed to work as above (tab, newline and
32-127)
Level 2: Code that keeps all 256 values intact without understanding
the contents, like Python strings.
Level 3: A fully general multi-byte encoding toolkit, where the
programmer is in explicit control of which encoding a string is in,
and has capable libraries which can convert from one to the other, and
can reason in advance about whether a particular data set can survive
a particular round-trip conversion, and knows the exact capabilities
of the fonts ultimately used for display or printing.

AFAIK (3) does not exist yet - we got most of the way at work with
Unilib and a load of ad hoc Python code, and Java goes most of the
way. I think the key concept is a special kind of string which is
tagged to know which encoding it is in.
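
A rough sketch of such a tagged string, written in modern Python 3 on top of the built-in bytes and str types. The class and method names are hypothetical; this is not the Unilib-based code mentioned above.

```python
class TaggedString:
    """Hypothetical sketch: raw bytes plus the name of their encoding."""

    def __init__(self, data, encoding):
        self.data = data            # bytes, kept intact
        self.encoding = encoding    # e.g. "latin-1", "utf-8", "shift_jis"

    def text(self):
        return self.data.decode(self.encoding)

    def can_convert(self, target):
        """True if every character survives conversion to the target encoding."""
        try:
            self.text().encode(target)
        except UnicodeEncodeError:
            return False
        return True

    def convert(self, target):
        return TaggedString(self.text().encode(target), target)

    def __repr__(self):
        return "TaggedString(%r, %r)" % (self.data, self.encoding)


name = TaggedString("François".encode("latin-1"), "latin-1")
print(name.can_convert("utf-8"))   # every Latin-1 character exists in UTF-8
print(name.can_convert("ascii"))   # there is no c-cedilla in ASCII
print(repr(name.convert("utf-8")))
```

Checking can_convert before convert is what lets a program reason in advance about whether a round trip is safe.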

Regards,

Andy

Ivan Frohne

unread,
Nov 7, 1999, 3:00:00 AM11/7/99
to

Toby Dickenson <htr...@zepler.org> wrote in message
news:u4glOIWXzGrqvJ...@4ax.com...

> >repr(obj) should return a string such that
> > eval(repr(obj)) == obj
>
> I don't understand the motivation for this requirement. Why would
> anyone want to pass such a string to eval? If you anticipate the need
> for reconstructing the object from a textual representation, then
> surely pickle is a better option?

Some objects can't be pickled. Arrays, for example.

--Ivan Frohne


Cliff Crawford

unread,
Nov 8, 1999, 3:00:00 AM11/8/99
to
I'm glad this thread cropped up when it did. I've been writing a
simple parser module, and had been trying to track down a "bug"
involving the token scanner inserting extra backslashes in strings.
But then I thought of the __repr__ vs __str__ distinction, and
tried this:

>>> r'blah\\blah'
'blah\\\\blah'
>>> 'blah\\blah'
'blah\\blah'
>>> print r'blah\\blah'
blah\\blah
>>> print 'blah\\blah'
blah\blah

So actually there wasn't anything wrong with my code, it was just
that I was printing my results in the interpreter, which uses
__repr__ and thus inserts an extra backslash for each backslash in
the string.
Well, it only cost me about a half hour of needless debugging =)
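
The same check can be made without relying on what the prompt displays, by counting the characters. A sketch in modern Python 3 syntax:

```python
raw = r'blah\\blah'     # raw literal: both backslashes are kept
cooked = 'blah\\blah'   # ordinary literal: \\ is an escape for one backslash

for s in (raw, cooked):
    # len() and count() look at the real contents; repr() shows the
    # string as a literal, doubling each backslash it contains.
    print(len(s), s.count('\\'), str(s), repr(s))
```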


--
cliff crawford http://www.people.cornell.edu/pages/cjc26/
-><- Shall she hear the lion's roar?

Charles Boncelet

unread,
Nov 8, 1999, 3:00:00 AM11/8/99
to Tim Peters
Tim Peters wrote:

> David Ascher suggested that str(long) drop the trailing "L", and that's a
> good example. The "L" is appropriate for repr(), for so long as Python
> maintains such a sharp distinction between ints and longs, but is really of
> no help to most users most of the time.

On the dissenting side, I find the "L" useful (just as I find the
difference between 1 and '1' useful). Consider the following:

>>> a,b = 2L, 2
>>> a,b
(2L, 2)
>>> a**32
4294967296L
>>> b**32
Traceback (innermost last):
File "<stdin>", line 1, in ?
OverflowError: integer pow()

Without the "L", it is not at all clear why a**32 works and b**32 fails.

--Charlie

-----
Charles Boncelet <bonc...@udel.edu>
University of Delaware
Newark DE 19716 USA
http://www.eecis.udel.edu/~boncelet/

Tim Peters

unread,
Nov 8, 1999, 3:00:00 AM11/8/99
to pytho...@python.org
>> repr(obj) should return a string such that
>> eval(repr(obj)) == obj

[Toby Dickenson]
> I don't understand the motivation for this requirement. Why would
> anyone want to pass such a string to eval? If you anticipate the need
> for reconstructing the object from a textual representation, then
> surely pickle is a better option?

This was well covered by others (not everything of common interest can be
pickled; and pickle strings, while portable, are wholly unreadable by
people).

> ...
> OK, so repr as it stands today is clearly not suitable for formatting
> for people,

It's much better suited for that than is pickle. That is, people can and do
cut & paste reprs with confidence, and even edit them by hand. So, indeed,
the requirement at the top of the msg doesn't capture *everything* a good
repr should do.

> and is less than ideal for formatting for the machine.

Based on the repr(float) example? That one is easy to repair. So long as
interactive mode uses repr() for raw expressions, though, and containers
don't "pass str() down", most people won't want to see the "extra" digits
most of the time.
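
The trade-off between "pretty" and exact float strings can be seen with a short experiment. This sketch assumes IEEE-754 doubles and a modern Python 3, whose repr(float) differs from the 1.5.2 behaviour described in this thread.

```python
x = 0.1 + 0.2

short = "%.12g" % x    # a "pretty" rendering with 12 significant digits
full = "%.17g" % x     # 17 significant digits pin down an IEEE-754 double

print(short, float(short) == x)
print(full, float(full) == x)
print(repr(x), float(repr(x)) == x)
```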

> However there is a third audience for who it is ideal, the programmer,
> who need a representation for use in development tools such as
> debuggers, or the interactive mode.

People (no, not even programmers <wink>) aren't that monolithic; e.g., in
development tools it's likely you're going to want to see the repr() of your
own objects and the str() of mine. I'm not trying to address that --
there's only so much we can squeeze out of two functions. It would be a
real help if those two (repr & str) had clearly understood purposes,
though -- they would cover more of the space if they weren't treated like
fuzzy synonyms in practice.

> Tim's Rat class is a good example of how __repr__ should not be
> implemented, because it exposes the internal representation of the
> object, not the external value.

Beg to differ: it's an excellent example of how __repr__ should be
implemented. The information in a *typical* Rat can require terabytes of
string space to represent if expressed as, e.g., a decimal expansion. It's
a case where "the internal representation" is no accident: a pair of longs
is one of a very few tractable ways to capture the object's true value.
Approximations are fine for __str__, though (and, although I didn't show it,
Rat.__str__ supports a dozen options for making nice-looking and compact
approximation strings -- alas, it's hard to talk Python into *invoking*
Rat.__str__!).
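
A hypothetical stand-in makes the distinction concrete. This is not the actual Rat class, just a minimal Python 3 sketch with an exact __repr__ and an approximate __str__:

```python
class Rat:
    """Hypothetical stand-in for a rational type; not the class discussed above."""

    def __init__(self, num, den):
        self.num = num
        self.den = den

    def __eq__(self, other):
        return isinstance(other, Rat) and self.num * other.den == other.num * self.den

    def __repr__(self):
        # Exact: the numerator/denominator pair is the value.
        return "Rat(%r, %r)" % (self.num, self.den)

    def __str__(self):
        # Approximate and compact, for display only.
        return "%.6g" % (self.num / self.den)


third = Rat(1, 3)
print(repr(third))   # exact pair
print(str(third))    # six-digit approximation
print([third])       # the list shows the repr, not the str
```

The last line shows the difficulty mentioned above: once the object sits inside a container, __str__ is not what gets invoked.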

> However, using a small number of digits for repr(float) does make
> sense because it still provides sufficient precision for most
> programmers.

See above: I told you you'd want to see the str() of everyone's objects
except your own <0.5 wink>. This isn't a joke to people who use floats
seriously -- just as you take your objects seriously.

most-americans-don't-give-a-rip-about-o-umlaut-either-ly y'rs - tim

Jonathan Giddy

unread,
Nov 8, 1999, 3:00:00 AM11/8/99
to
Charles Boncelet <bonc...@udel.edu> writes:

>On the dissenting side, I find the "L" useful (just as I find the
>difference between 1 and '1' useful. Consider the following:

>>>> a,b = 2L, 2
>>>> a,b
>(2L, 2)
>>>> a**32
>4294967296L
>>>> b**32
>Traceback (innermost last):
> File "<stdin>", line 1, in ?
>OverflowError: integer pow()

>Without the "L", it is not at all clear why a**32 works and b**32 fails.

Since the interpreter prints the __repr__ form of values, this would still
work the same. The proposal was to change the __str__ form of long integers.

In fact, your argument dissents against another idea in this thread that the
interpreter should display the __str__ values of results.

Personally, I tend to use the __repr__ form for details that the programmer
needs to know about an object. I use the __str__ form for objects that have
a natural string conversion (exception classes are the main thing that
spring to mind).
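
Exception classes illustrate the split. A small Python 3 sketch with a made-up exception class:

```python
class ConfigError(Exception):
    """Hypothetical exception class, for illustration only."""


err = ConfigError("missing key 'port'")

print(str(err))    # the message itself, suitable for showing to a user
print(repr(err))   # names the class as well, which helps when debugging
```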

Someone using the interactive interpreter is not going to get very far
without 'import', and anyone who uses 'import' is, or is learning to be, a
programmer. Therefore, I would say that interactive users should get the
__repr__ form of the result.

Jon.

Jeremy Hylton

unread,
Nov 9, 1999, 3:00:00 AM11/9/99
to Tim Peters
>>>>> "TP" == Tim Peters <tim...@email.msn.com> writes:

[Toby Dickenson:]
>> Tim's Rat class is a good example of how __repr__ should not be
>> implemented, because it exposes the internal representation of
>> the object, not the external value.

TP> Beg to differ: it's an excellent example of how __repr__ should
TP> be implemented.

It's no accident that the function is called __repr__. Its role is
to expose the representation.

Jeremy


Jim Althoff

unread,
Nov 9, 1999, 3:00:00 AM11/9/99
to ne...@alum.mit.edu, pytho...@python.org
I agree with Neel. The interactive interpreter is an incredibly
powerful tool for learning Python and for developing code.
And with JPython, it is a wonderful, powerful tool
for learning the Java APIs as well.

Jim

At 01:17 AM 11/10/99 +0000, Neel Krishnaswami wrote:


>Emile van Sebille <em...@fenx.com> wrote:
> >
> >It seems to me that issues arise when using the interactive interpreter.
> >I suspect that little useful work is done in this mode, and is used
> >predominately for quick testing while creating python programs.
>
>I'd like to politely disagree. Combined with Tim Peters and Barry
>Warsaw's emacs py-mode, Python's interactive mode is the single most
>useful feature of the language for me. With it, I can test each and
>every method, loop, and statement as I code. This does wonders for my
>understanding of my code. And not incidentally, reliability goes way
>up as well.
>
>I learned Perl before Python, but my Python skill surpassed my Perl
>skill after a much smaller time investment. I lay this pretty much
>wholly at the feet of the interactive interpreter -- I could find out
>the answers to puzzles without a tedious edit-save-run cycle.
>
>
>Neel
>
>--
>http://www.python.org/mailman/listinfo/python-list

Emile van Sebille

unread,
Nov 9, 1999, 3:00:00 AM11/9/99
to pytho...@python.org
Neel,

Ah, but you didn't disagree! ;-) Testing certainly is useful, and
for the tester it *is* work, but that's code that is not in production,
which is when the code *is* working.

I quote now from Paul Foley <myc...@actrix.gen.nz>, who e-mailed me
a sample interactive session using a function describe():

>
> You mean something like this:
>
> >>> x="Testing"
> >>> describe(x)
> A string containing 7 characters.
> >>> import sys
> >>> describe(sys)
> The `sys' module.
> The module has no documentation string.
>
> It exports the following functions: ['setcheckinterval', 'exc_info',
> 'exit', 'setprofile', 'settrace', 'getrefcount']
>
> It exports the following variables: ['platform', 'stderr', 'version',
> 'builtin_module_names', 'modules', 'stdin', 'exec_prefix', 'copyright',
> 'executable', 'exc_type', 'ps1', 'ps2', 'path', 'maxint', 'prefix',
> 'argv', 'stdout']
> >>> describe(sys.exit)
> The built-in function `exit'.
> >>> describe(sys.stdin)
> A file open for reading on `<stdin>' (a TTY device)
> The underlying file descriptor is 0.
> The current file position is 0.

This is what I had in mind: to further enhance the utility of interactive
mode by providing more information to the programmer. Certainly __str__
and __repr__ are The Wrong Way (tm) to do this, and describe() much better.
Any and everything more that describe() could do should be added to it. Add
-v, -vv, -vvv options as appropriate. Show the source entry line. Hey, show
the whole routine. ;-0 (Let's see... get line number from bytecode, read
source for line, note indent, display until match indent... is that it?)
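
A very small sketch of how a describe() helper along these lines might start, in modern Python 3. This is not the code behind the session quoted above; the wording of the messages merely imitates it, and only three kinds of object are handled.

```python
import inspect
import sys


def describe(obj):
    """Hypothetical sketch of a describe() helper; returns a description string."""
    if isinstance(obj, str):
        return "A string containing %d characters." % len(obj)
    if inspect.ismodule(obj):
        lines = ["The `%s' module." % obj.__name__]
        if not getattr(obj, "__doc__", None):
            lines.append("The module has no documentation string.")
        return "\n".join(lines)
    if inspect.isbuiltin(obj):
        return "The built-in function `%s'." % obj.__name__
    # Fall back to the type name for everything else.
    return "An object of type %s." % type(obj).__name__


print(describe("Testing"))
print(describe(sys.exit))
print(describe(3.5))
```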

is-what-we-do-work-ly yr's

--

Emile van Sebille
em...@fenx.com
-------------------


Neel Krishnaswami <ne...@brick.cswv.com> wrote in message
news:slrn82hhtv...@brick.cswv.com...

Neel Krishnaswami

unread,
Nov 10, 1999, 3:00:00 AM11/10/99
to

Tim Peters

unread,
Nov 10, 1999, 3:00:00 AM11/10/99
to pytho...@python.org
[Toby Dickenson:]
> Tim's Rat class is a good example of how __repr__ should not be
> implemented, because it exposes the internal representation of
> the object, not the external value.

[Rat's Tim class]
> Beg to differ: it's an excellent example of how __repr__ should
> be implemented.

[Jeremy Hylton]
> It's no accident that the function is called __repr__. Its role is
> to expose the representation.

No, in that case it would have been named __representation__. __repr__ is
actually an acronym, meaning Regular-Expression Parsable Rendition.
Similarly __str__ is short for Simpler Than Repr. I never learned what
__init__ stands for -- it never interested Tim <wink>.

guess-what-wink-stands-for-and-win-a-fabulous-prize!-ly y'rs - tim

Mikael Olofsson

unread,
Nov 10, 1999, 3:00:00 AM11/10/99
to Tim Peters

On 10-Nov-99 Tim Peters wrote:
> No, in that case it would have been named __representation__. __repr__ is
> actually an acronym, meaning Regular-Expression Parsable Rendition.
> Similarly __str__ is short for Simpler Than Repr. I never learned what
> __init__ stands for -- it never interested Tim <wink>.

My God! I thought this language was supposed to be easy to handle, to
understand, and to explain. If __repr__ has nothing to do with
representation and if __str__ has nothing to do with string, I guess
__init__ has nothing to do with initialization. That's not data
compression; that's information hiding!

someone-should-be-programming-intercal-instead-ly y'rs

/Mikael

-----------------------------------------------------------------------
E-Mail: Mikael Olofsson <mik...@isy.liu.se>
WWW: http://www.dtr.isy.liu.se/dtr/staff/mikael
Phone: +46 - (0)13 - 28 1343
Telefax: +46 - (0)13 - 28 1339

Date: 10-Nov-99
Time: 08:58:29

pi...@cs.uu.nl

unread,
Nov 10, 1999, 3:00:00 AM11/10/99
to
>>>>> Toby Dickenson <htr...@zepler.org> (TD) writes:

TD> "Tim Peters" <tim...@email.msn.com> wrote:
>> Let me back off to what repr and str "should do":
>>

>> repr(obj) should return a string such that
>> eval(repr(obj)) == obj

TD> I don't understand the motivation for this requirement. Why would
TD> anyone want to pass such a string to eval? If you anticipate the need
TD> for reconstructing the object from a textual representation, then
TD> surely pickle is a better option?

for example, if you would want to write a couple of variables to a file in
the form of python statements: var=value.
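
A sketch of that use in modern Python 3. The variable names and file name are invented, and executing a file like this is only reasonable for files you wrote yourself.

```python
import os
import tempfile

state = {"width": 80, "title": "str vs. repr", "ratios": [0.5, 0.25]}

path = os.path.join(tempfile.mkdtemp(), "state.py")   # throwaway location

# Save each variable as a Python assignment statement, using repr for the value.
with open(path, "w") as f:
    for name, value in state.items():
        f.write("%s = %s\n" % (name, repr(value)))

# Load them again by executing the file in a fresh namespace.
restored = {}
with open(path) as f:
    exec(f.read(), {}, restored)

print(restored == state)
```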
--
Piet van Oostrum <pi...@cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP]
Private email: Piet.van...@gironet.nl

Gerrit Holl

unread,
Nov 10, 1999, 3:00:00 AM11/10/99
to pytho...@python.org
Tim Peters wrote:
> No, in that case it would have been named __representation__. __repr__ is
> actually an acronym, meaning Regular-Expression Parsable Rendition.
> Similarly __str__ is short for Simpler Than Repr. I never learned what
> __init__ stands for -- it never interested Tim <wink>.

:-)

> guess-what-wink-stands-for-and-win-a-fabulous-prize!-ly y'rs - tim

Windows-Is-Not-Kewl?

regards,
Gerrit.
--
linuxgames.nl.linux.org All about games on Linux in Dutch (under construction).
www.nl.linux.org The Dutch resource for Dutch Linux documentation.
www.nl.linux.org/discoverb Learn foreign words and definitions.
www.nl.linux.org/~gerrit/asperger I'm an Asperger, go here for more info.
www.nl.linux.org/~gerrit About me and pictures of me.

skaller

unread,
Nov 12, 1999, 3:00:00 AM11/12/99
to pytho...@python.org
> Quoth "Tim Peters" <tim...@email.msn.com>:

>
> | Let me back off to what repr and str "should do":
> |
> | repr(obj) should return a string such that
> | eval(repr(obj)) == obj

FYI: Viperi (my python interpreter) contains an extension, a function erepr,
which provides these semantics for a much wider class of types than
CPython 1.5.2, including: modules, classes, instances, and functions,
provided all the attributes are themselves 'erepr'able. Erepr will not
handle (native) file objects.

There is, however, a caveat.

erepr(f)

returns a string representing the definition of the function f.
However, evaluating the string will NOT work correctly,
unless the globals is correctly set, something like:

eval(erepr(f), f.__module__)

is required. Similarly for classes.

Oh yes, I forgot to mention that all Viper statements are also
expressions, the 'e' in erepr means 'expression'.
In fact, given an expression, such as

x + 1

then, erepr applied to this expression yields a string

'x + 1'

There is currently no way to define such an expression in Viperi.
[A 'lazy' or 'deferred' evaluation control is under consideration]
However, it has facilities for partial evaluation, they're just
disabled for Python compatibility. For example, with partial
evaluation enabled, the result of:

y = 1
z = x + y

is that z is an _expression_ whose value is

x + 1

In Viper, expressions are THE first class type. All other types
are cases of expressions (including statements, objects, and values
of almost all kinds -- slices are not expressions at the moment).

You may be wondering how this works. The answer is that Viperi does not
use bytecode, it executes the 'Abstract Syntax Tree' directly.
Consequently, the original code can be recovered. (without comments,
or the original formatting, of course).
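
CPython's standard ast module gives a rough feel for this kind of recovery. The sketch below assumes Python 3.9 or later, where ast.unparse is available, and says nothing about how Viperi itself is implemented.

```python
import ast

source = "z  =  (x + 1)   # add one\n"

tree = ast.parse(source)        # source text -> abstract syntax tree
recovered = ast.unparse(tree)   # tree -> source text again

# The comment, the extra spaces and the redundant parentheses are gone,
# because none of them are recorded in the tree.
print(recovered)
```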

Note that this will NOT work if the object has been bound,
that is, if names have been replaced by numeric indexes.
(And, it will not work in compiled code either, for the same reason:
bound or compiled code is the 'moral equivalent' of python bytecode)

> | I'd go further and formalize a property that's currently honored but
> | implicitly: the string returned by repr should contain only characters c
> | such that

Viper's repr functions use only 'printable characters' in the range 33-126
of ISO10646, plus space (i.e. printable ASCII characters plus space).
All other codes are represented either using \n (and related specials)
or using \uXXXX or \uXXXXXXXX encoding. [There will be a
python compatibility switch to force use of python lexicology]
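
Python 3 has a built-in along similar lines, ascii(), which behaves like repr() but escapes every non-ASCII character. A small sketch of CPython's behaviour, not of Viper's:

```python
name = "François"

print(repr(name))    # Python 3 keeps the c-cedilla as-is
print(ascii(name))   # escaped, so the result is pure printable ASCII

escaped = ascii(name)
print(all(32 <= ord(c) <= 126 for c in escaped))
print(eval(escaped) == name)   # the escaped literal still evaluates back
```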

--
John Skaller, mailto:ska...@maxtal.com.au
1/10 Toxteth Rd Glebe NSW 2037 Australia
homepage: http://www.maxtal.com.au/~skaller
downloads: http://www.triode.net.au/~skaller

M.-A. Lemburg

unread,
Nov 12, 1999, 3:00:00 AM11/12/99
to skaller
skaller wrote:
> ...viperi's version of repr...

> You may be wondering how this works. The answer is that Viperi does not
> use bytecode, it executes the 'Abstract Syntax Tree' directly.
> Consequently, the original code can be recovered. (without comments,
> or the original formatting, of course).

Could you point me to some resources ? Using ASTs for execution
is an interesting subject and I would like to know how you deal
with Python dynamic nature in this context (are the ASTs self
modifying ?).

BTW, all your software links are broken...

Thanks,
--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 49 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

Fredrik Lundh

unread,
Nov 13, 1999, 3:00:00 AM11/13/99
to M.-A. Lemburg
M.-A. Lemburg <m...@lemburg.com> wrote:
> > You may be wondering how this works. The answer is that Viperi does not
> > use bytecode, it executes the 'Abstract Syntax Tree' directly.
> > Consequently, the original code can be recovered. (without comments,
> > or the original formatting, of course).
>
> Could you point me to some resources ? Using ASTs for execution
> is an interesting subject and I would like to know how you deal
> with Python dynamic nature in this context (are the ASTs self
> modifying ?).

guess my brain doesn't work well today,
but maybe someone could tell me:

1) what's the reason AST's would have to change during
the execution of dynamic code (the byte code doesn't
exactly change, does it?)

2) what's the reason byte code cannot be used to recover
the original source code (Swedish readers may remember
the ABC80, which did exactly this).

</F>


Bjorn Pettersen

unread,
Nov 13, 1999, 3:00:00 AM11/13/99
to Fredrik Lundh, M.-A. Lemburg

> M.-A. Lemburg <m...@lemburg.com> wrote:
> > > You may be wondering how this works. The answer is that Viperi does not
> > > use bytecode, it executes the 'Abstract Syntax Tree' directly.
> > > Consequently, the original code can be recovered. (without comments,
> > > or the original formatting, of course).
> >
> > Could you point me to some resources ? Using ASTs for execution
> > is an interesting subject and I would like to know how you deal
> > with Python dynamic nature in this context (are the ASTs self
> > modifying ?).
>
> guess my brain doesn't work well today,
> but maybe someone could tell me:
>
> 1) what's the reason AST's would have to change during
> the execution of dynamic code (the byte code doesn't
> exactly change, does it?)

It doesn't have to. All you have to do is implement an "Environment" object
that maps variable names to objects in the current scope. E.g.:

def eval_assignment(lhs, rhs, env):
    left = evaluate(lhs)
    right = evaluate(rhs)
    env[left] = right

Most simple interpreters work this way (unless you've spent a lot of time on
implementing your bytecode interpreter, it is generally a very modest win).
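
A runnable toy version of this scheme, using the ast module of modern Python 3 to supply the tree. The function names are made up, a plain dict stands in for the "Environment" object, and only constants, names, +, - and * are supported.

```python
import ast
import operator

# Binary operators this toy interpreter understands.
BINARY_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def evaluate(node, env):
    """Evaluate an expression node, looking names up in env (a dict)."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.BinOp):
        op = BINARY_OPS[type(node.op)]
        return op(evaluate(node.left, env), evaluate(node.right, env))
    raise NotImplementedError(type(node).__name__)


def execute(source, env):
    """Run simple 'name = expression' statements by walking the tree directly."""
    for stmt in ast.parse(source).body:
        if not isinstance(stmt, ast.Assign):
            raise NotImplementedError(type(stmt).__name__)
        value = evaluate(stmt.value, env)
        for target in stmt.targets:
            env[target.id] = value   # the tree is never modified, only env
    return env


env = execute("y = 1\nz = y + 2 * 3\n", {})
print(env)
```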

> 2) what's the reason byte code cannot be used to recover
> the original source code (Swedish readers may remember
> the ABC80, which did exactly this).

As long as the bytecode is relatively close to the language, this is no
problem. (You can even do this for Java bytecodes..)

-- bjorn

Phil Hunt

unread,
Nov 13, 1999, 3:00:00 AM11/13/99
to
In article <01fd01bf2dce$d2c03cf0$f29b...@secret.pythonware.com>

fre...@pythonware.com "Fredrik Lundh" writes:
> 2) what's the reason byte code cannot be used to recover
> the original source code

Perhaps because two different (but obviously functionally
identical and very similar) source codes could produce the same
byte code.
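
One case that can be checked on a modern CPython 3 (no claim is made here about the 1.5.2 compiler) is redundant parentheses:

```python
plain = compile("a + b * c", "<example>", "eval")
parens = compile("a + (b * c)", "<example>", "eval")

# The redundant parentheses leave no trace in the compiled code object.
print(plain.co_code == parens.co_code)
print(plain.co_names == parens.co_names)
```

A decompiler could still produce source that behaves the same; it just cannot know which of the two spellings was typed.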

--
Phil Hunt - - - - - - - - - ph...@vision25.demon.co.uk
Moore's Law: processor speed doubles every 18 months
Gates' Law: software speed halves every 18 months


Michael Hudson

unread,
Nov 14, 1999, 3:00:00 AM11/14/99
to
ph...@vision25.demon.co.uk (Phil Hunt) writes:

> In article <01fd01bf2dce$d2c03cf0$f29b...@secret.pythonware.com>
> fre...@pythonware.com "Fredrik Lundh" writes:
> > 2) what's the reason byte code cannot be used to recover
> > the original source code
>
> Perhaps because two different (but obviously functionally
> identical and very similar) source codes could produce the same
> byte code.

I doubt it actually. Python's bytecode is really pretty high
level. Leastaways, I can't think of any examples.

It's one of those things I might get round to writing one of these
days. I don't really know how hard it would be. Not especially, I
suspect.

Regards,
Michael

Tres Seaver

unread,
Nov 15, 1999, 3:00:00 AM11/15/99
to

In article <m3n1shp...@atrus.jesus.cam.ac.uk>,

Besides, you need to defer it until _after_ this year's Pythonic award,
for which you are already in pole position for inventing bytecodehacks :).

Tres.
--
---------------------------------------------------------------
Tres Seaver tse...@palladion.com 713-523-6582
Palladion Software http://www.palladion.com

William Tanksley

unread,
Nov 15, 1999, 3:00:00 AM11/15/99
to
On 15 Nov 1999 20:48:39 GMT, Tres Seaver wrote:
>Michael Hudson <mw...@cam.ac.uk> wrote:
>>ph...@vision25.demon.co.uk (Phil Hunt) writes:

>>It's one of those things I might get round to writing one of these
>>days. I don't really know how hard it would be. Not especially, I
>>suspect.

>Besides, you need to defer it until _after_ this year's Pythonic award,
>for which you are already in pole position for inventing bytecodehacks :).

You might note that Tim is rather proud of being the only living recipient
of the Pythonic award (for some definition of 'life'). I understand he
might take action to keep that distinction if someone else were to win the
award.

If you get what I mean.

>Tres.

watching-my-back-and-underachieving-ly y'rs,
--
-William "Billy" Tanksley
<1.0 wink>

Michael Hudson

Nov 16, 1999, 3:00:00 AM
to
wtan...@hawking.armored.net (William Tanksley) writes:

> On 15 Nov 1999 20:48:39 GMT, Tres Seaver wrote:
> >Michael Hudson <mw...@cam.ac.uk> wrote:
> >>ph...@vision25.demon.co.uk (Phil Hunt) writes:
>
> >>It's one of those things I might get round to writing one of these
> >>days. I don't really know how hard it would be. Not especially, I
> >>suspect.
>
> >Besides, you need to defer it until _after_ this year's Pythonic award,
> >for which you are already in pole position for inventing bytecodehacks :).
>
> You might note that Tim is rather proud of being the only living recipient
> of the Pythonic award (for some definition of 'life'). I understand he
> might take action to keep that distinction if someone else were to win the
> award.
>
> If you get what I mean.
>

I don't think the bytecodehacks, while sufficiently dark and useless
to be a tim-ism, qualify me in any way for a Pythonic Wizard Hat when
compared to the sheer contribution in terms of wit, code, accurate
advice, experience, emacs modes and general amazingly useful stuff
that Tim Peters has given to the Python community.

<0.0 wink>.

Try http://www.python.org/tim_one/ if you don't see what I mean.

not-a-bot-yet-ly y'rs - Michael

Phil Hunt

Nov 16, 1999, 3:00:00 AM
to
In article <m3n1shp...@atrus.jesus.cam.ac.uk>
mw...@cam.ac.uk "Michael Hudson" writes:
> ph...@vision25.demon.co.uk (Phil Hunt) writes:
>
> > In article <01fd01bf2dce$d2c03cf0$f29b...@secret.pythonware.com>
> > fre...@pythonware.com "Fredrik Lundh" writes:
> > > 2) what's the reason byte code cannot be used to recover
> > > the original source code
> >
> > Perhaps because two different (but obviously functionally
> > identical and very similar) source codes could produce the same
> > byte code.
>
> I doubt it actually. Python's bytecode is really pretty high
> level. Leastaways, I can't think of any examples.

How about:

x = a + b + c
x = (a + b) + c
x=a+b+c

I'd guess these all produce the same bytecode.
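
The guess is easy to check. This is a minimal sketch, assuming a modern
Python 3 interpreter; the helper name bytecode is made up for illustration.
It compiles each spelling and compares the raw co_code strings, and also
tries a fourth spelling that groups the operands differently.

```python
# The three spellings from the post.
sources = ["x = a + b + c", "x = (a + b) + c", "x=a+b+c"]

def bytecode(source):
    """Return the raw bytecode of a one-line module (hypothetical helper)."""
    return compile(source, "<example>", "exec").co_code

# Whitespace and redundant parentheses should vanish during parsing.
same = len({bytecode(s) for s in sources}) == 1

# Regrouping changes the order of evaluation, so the bytecode should differ.
regrouped_differs = bytecode("x = a + (b + c)") != bytecode(sources[0])

print(same, regrouped_differs)
```

Only the grouping that changes evaluation order leaves a trace in the
bytecode; spacing and redundant parentheses do not.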

I'd also not be surprised if you could consistently change the
name of a local variable within a function without changing bytecodes,
i.e. Python internally doesn't know the names of local variables.

> It's one of those things I might get round to writing one of these
> days. I don't really know how hard it would be. Not especially, I
> suspect.

You might want to have a look at how Java bytecode decompilers
work.

Michael Hudson

Nov 18, 1999, 3:00:00 AM
to
ph...@vision25.demon.co.uk (Phil Hunt) writes:

> In article <m3n1shp...@atrus.jesus.cam.ac.uk>
> mw...@cam.ac.uk "Michael Hudson" writes:
> > ph...@vision25.demon.co.uk (Phil Hunt) writes:
> >
> > > In article <01fd01bf2dce$d2c03cf0$f29b...@secret.pythonware.com>
> > > fre...@pythonware.com "Fredrik Lundh" writes:
> > > > 2) what's the reason byte code cannot be used to recover
> > > > the original source code
> > >
> > > Perhaps because two different (but obviously functionally
> > > identical and very similar) source codes could produce the same
> > > byte code.
> >
> > I doubt it actually. Python's bytecode is really pretty high
> > level. Leastaways, I can't think of any examples.
>
> How about:
>
> x = a + b + c
> x = (a + b) + c
> x=a+b+c
>
> I'd guess these all produce the same bytecode.

Well, yes. The original message said "the original code can be
recovered. (without comments, or the original formatting, of course)"
in reference to ASTs; I'm saying much the same about bytecodes.

If the above don't lead to identical ASTs, something odd's going on.

> I'd also not be surprised if you could consistently change the
> name of a local variable within a function without changing bytecodes,
> i.e. Python internally doesn't know the names of local variables.

It does, actually. The names of local variables are in
f.func_code.co_varnames for a function f. True, changing the name
uniformly wouldn't change the bytecode, but the bytecode isn't much
use without a code object.
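
Both halves of that can be seen in a minimal sketch, assuming a modern
Python 3 interpreter, where func_code is spelled __code__. The functions f
and g are made up for illustration and differ only in the name of one local.

```python
def f(a):
    total = a + 1
    return total

def g(a):
    result = a + 1  # same body as f, with the local renamed
    return result

# The raw bytecode refers to locals by index, so it should be identical.
print(f.__code__.co_code == g.__code__.co_code)

# The names themselves live on the code object, next to the bytecode.
print(f.__code__.co_varnames, g.__code__.co_varnames)
```

The bytecode alone cannot tell f from g, but the code object that carries it
can, which is what a decompiler would read.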

> > It's one of those things I might get round to writing one of these
> > days. I don't really know how hard it would be. Not especially, I
> > suspect.
>
> You might want to have a look at how Java bytecode decompilers
> work.

Maybe; I know very little about Java (except from what I can deduce
from C++) and nothing about the bytecode. But yes, how these things
work in other cases would be interesting. Do you have a handy URL?

Cheers,
Michael

Fredrik Lundh

Nov 18, 1999, 3:00:00 AM
to ph...@vision25.demon.co.uk
Phil Hunt <ph...@vision25.demon.co.uk> wrote:
> How about:
>
> x = a + b + c
> x = (a + b) + c
> x=a+b+c
>
> I'd guess these all produce the same bytecode.

footnote: the abc80 computer I mentioned
in my earlier post had an extra byte code
for this purpose. users can buy that you
add spaces and reformat their code, but
they won't accept that you remove their
parentheses...

> I'd also not be surprised if you could consistently change the
> name of a local variable within a function without changing bytecodes,
> i.e. Python internally doesn't know the names of local variables.

print myfunction.func_code.co_names

</F>

<!-- (the eff-bot guide to) the standard python library:
http://www.pythonware.com/people/fredrik/librarybook.htm
(news: just posted an errata for the 'first printing') -->


Steve M S Tregidgo

Nov 18, 1999, 3:00:00 AM
to Fredrik Lundh

Fredrik Lundh wrote:
> Phil Hunt <ph...@vision25.demon.co.uk> wrote:
> > How about:
> >
> > x = a + b + c
> > x = (a + b) + c
> > x=a+b+c
> >
> > I'd guess these all produce the same bytecode.
>
> footnote: the abc80 computer I mentioned
> in my earlier post had an extra byte code
> for this purpose. users can buy that you
> add spaces and reformat their code, but
> they won't accept that you remove their
> parentheses...
>

Two solutions:

(1) When regenerating the source, wrap every single binary op with
parentheses; the bytecode generated by (a+b)+c will also be generated
by ((a+b)+c), and at least the user's original bracketing will be
there too. Looks very messy though.

(2) Use some intelligence in deciding where brackets need to go.
Clearly if (a+b)*c was compiled into bytecodes, we would have to
ensure the brackets remained, but (a*b)+c uses extra brackets that
can be omitted during the decompiling stage.

This second solution wouldn't be so hard -- the order of the
bytecodes tells us which ops come first, and so we'd just need to
bracket a nested op if the op that uses the result is of a certain
type (so if a BINARY_ADD applied to elements a+b and c, there would
be no need to introduce protective parentheses, but there would for a
BINARY_MULTIPLY on a+b and c).
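
The second solution can be sketched as a small stack machine. Everything
here is an assumption for illustration: the instruction tuples, the
PRECEDENCE table and the decompile function are hypothetical and only
loosely modelled on opcodes such as BINARY_ADD and BINARY_MULTIPLY, not on
the real bytecode format.

```python
# Hypothetical mini instruction set: (opcode, argument) pairs.
# Each binary opcode maps to (precedence, source symbol).
PRECEDENCE = {
    "BINARY_ADD": (1, "+"),
    "BINARY_SUBTRACT": (1, "-"),
    "BINARY_MULTIPLY": (2, "*"),
    "BINARY_DIVIDE": (2, "/"),
}
ATOM = 3  # a plain name binds tighter than any operator


def decompile(instructions):
    """Rebuild an expression, adding brackets only where they are needed."""
    stack = []  # entries are (source_text, precedence)
    for op, arg in instructions:
        if op == "LOAD_NAME":
            stack.append((arg, ATOM))
            continue
        prec, symbol = PRECEDENCE[op]
        right, right_prec = stack.pop()
        left, left_prec = stack.pop()
        # A looser-binding left operand must be protected.
        if left_prec < prec:
            left = "(" + left + ")"
        # These operators group to the left, so a right operand of equal
        # precedence needs brackets too, as in a - (b - c).
        if right_prec <= prec:
            right = "(" + right + ")"
        stack.append((left + " " + symbol + " " + right, prec))
    text, _ = stack.pop()
    return text


# (a+b)*c keeps its brackets; (a*b)+c loses the redundant ones.
print(decompile([("LOAD_NAME", "a"), ("LOAD_NAME", "b"), ("BINARY_ADD", None),
                 ("LOAD_NAME", "c"), ("BINARY_MULTIPLY", None)]))
print(decompile([("LOAD_NAME", "a"), ("LOAD_NAME", "b"), ("BINARY_MULTIPLY", None),
                 ("LOAD_NAME", "c"), ("BINARY_ADD", None)]))
```

The first solution would be the same loop with both operands always wrapped
in brackets, which is simpler but gives the messier output described above.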

Anyhow, since decompiling should mainly be used to recover lost code,
as long as the code works the issue of (originally extraneous)
brackets missing isn't really a problem -- I for one would just be
grateful that I got my code back. (And heck, if I actually noticed
that the odd bracket was missing from a thousand-line file, I'd
consider that I'd got a bit _too_ familiar with the code ;-)

Cheers,
Steve
