
difference in binding between strings and tuples?


Iwan van der Kleyn

May 13, 2003, 4:12:25 PM
I tried to explain the concept of name binding, or variable assignment, to a
Java-minded colleague, straight from Python in a Nutshell (which, by the
way, is, together with the Cookbook, a very nice piece of work). But then
it turned out that perhaps I did not fully grasp all aspects of it
myself. Check out:

>>> a = 1
>>> b = 1
>>> a == b
1
>>> a is b
1
>>> a += 1
>>> b += 1
>>> a == b
1
>>> a is b
1
>>> x = 'test'
>>> y = 'test'
>>> x == y
1
>>> x is y
1
>>> x += 's'
>>> x == y
1
>>> x is y
0
>>> q = (0, 1)
>>> r = (0, 1)
>>> q == r
1
>>> q is r
0


OK? Testing for equality gives no surprises, but testing for identity
does, especially considering the differences between strings and tuples
(both of which are immutable).
I had expected (a is b) == True, even after the addition. But how do I
explain the results for the strings and tuples?

Can I presume x and y refer to the same string object before the
append ("addition") but to two separate objects afterwards? Why isn't
that true for the tuples? In other words: why the difference in binding
between strings and tuples?

Regards,

iwan

Donn Cave

May 13, 2003, 5:01:08 PM
Quoth Iwan van der Kleyn <nu...@null.com>:

| I tried to explain the concept of name binding, or variable assignment, to a
| Java-minded colleague, straight from Python in a Nutshell (which, by the
| way, is, together with the Cookbook, a very nice piece of work). But then
| it turned out that perhaps I did not fully grasp all aspects of it
| myself. Check out:
|
| >>> a = 1
| >>> b = 1
| >>> a == b
| 1
| >>> a is b
| 1
| >>> a += 1
| >>> b += 1
| >>> a == b
| 1
| >>> a is b
| 1

You should also try this with other integer values: start with 20407
and increment by 123409, for example. Then "is" will be false.
What you are seeing is not a fundamental consequence of Python's
name binding system; it's a cache of certain types of values: small
integers, small strings, etc.
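A minimal Python 3 sketch of that experiment follows. It assumes CPython, where only small integers (currently -5 through 256) are pre-allocated; this is an implementation detail rather than a language rule, and other interpreters may behave differently. The helper name `same_after_increment` is made up for illustration, and the work is done inside a function so both sums are computed separately at run time.

```python
def same_after_increment(start, step):
    """Bind two names to start, add step to each, report (equal?, identical?)."""
    a = start
    b = start
    a += step  # int is immutable: this builds a result object and rebinds a
    b += step  # the same sum, computed again independently for b
    return a == b, a is b


print(same_after_increment(1, 1))           # the result 2 comes from the small-int cache
print(same_after_increment(20407, 123409))  # 143816 is built twice, as two objects
```

On CPython the first call reports `(True, True)` and the second `(True, False)`: equality holds in both cases, but identity only where the cache supplies the result.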


| >>> x = 'test'
| >>> y = 'test'
| >>> x == y
| 1
| >>> x is y
| 1
| >>> x += 's'
| >>> x == y
| 1
| >>> x is y
| 0

Now you are looking at the effect of +=. Did you mean to also
increment y? Oh well, the result would have been the same!
I don't know, perhaps the small-string cache is used for literals
only.
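That guess can be probed with a short Python 3 sketch, assuming CPython. `sys.intern` is the standard-library hook for string interning; whether any particular literal is interned automatically is an implementation detail, so only the last comparison below is something to rely on.

```python
import sys

literal = "test"
built = "".join(["te", "st"])  # same characters, assembled at run time

print(literal == built)  # equal values
print(literal is built)  # on CPython usually False: join() returns a fresh object

# sys.intern() returns one canonical object for equal strings,
# so identity after interning can be relied on.
print(sys.intern(built) is sys.intern(literal))
```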

| >>> q = (0, 1)
| >>> r = (0, 1)
| >>> q == r
| 1
| >>> q is r
| 0
|
| OK? Testing for equality gives no surprises, but testing for identity
| does, especially considering the differences between strings and tuples
| (both of which are immutable).
| I had expected (a is b) == True, even after the addition. But how do I
| explain the results for the strings and tuples?

There apparently is no cache for tuples.
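A hedged Python 3 sketch of the tuple case, assuming CPython. The tuples are built at run time because modern CPython's compiler may share identical constant tuples that are compiled together (for example, two `(0, 1)` literals inside one function), which would muddy the comparison. `make_pair` is a hypothetical helper.

```python
def make_pair():
    # tuple() over a freshly built list creates a new tuple object on each call
    return tuple([0, 1])


q = make_pair()
r = make_pair()
print(q == r)  # True: same contents
print(q is r)  # on CPython False: two separate tuple objects
```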

| Can I presume x and y refer to the same string object before the
| append ("addition") but to two separate objects afterwards? Why isn't
| that true for the tuples? In other words: why the difference in binding
| between strings and tuples?

It would be safest not to expect any two values to be identical (in the
"is" sense) unless they're directly related by reference, but also not to
expect them _not_ to be identical unless they are mutable (because
Python will never cache mutable objects like that).

Note also that += may or may not rebind the left-hand side, depending on
the type of the object. Like the caching thing, it may confuse the issue
more than it helps.

Donn Cave, do...@u.washington.edu

Alex

May 13, 2003, 5:19:37 PM

Should there have been a y += 's' in there somewhere?

I'll venture an uneducated guess:

With strings, if equality implied identity, then every time a new string
was created during program execution, Python would have to compare it
with every other string. That would be rather costly.
Pooling strings while compiling is relatively cheap.

As for tuples, they can consist of heterogeneous items. To pool them, Python
would have to check that the tuples were of the same size and shape and
that all the tuple items were immutable and equal. Probably not
worth the effort.

I'm not clear why one would need to use identity comparison on immutable
objects anyway. The effect you are seeing just strikes me as an
inconsequential side effect of an optimization.

But then again I've only been at this for a few weeks...

Chad Netzer

May 13, 2003, 5:42:04 PM
On Tue, 2003-05-13 at 13:12, Iwan van der Kleyn wrote:

> >>> x = 'test'
> >>> y = 'test'

> >>> x += 's'
> >>> x == y
> 1

What??? This must be a typo.

What I would expect is that after this operation:

x == 'tests'
y == 'test'

thus,

x != y


To answer your question, though, the fact is that Python can, and
sometimes does, cache immutable objects, thus making the identity test a
little more complicated. These are really implementation details, and
that is one reason you should be careful about testing identity.

For example:

x = y = 1

Here, the '1' integer object is created, and the names x and y both
refer to it. Thus it makes sense that x is y.

x = 1
y = 1

Here, the '1' integer object is (implicitly) created twice. It is just
that, under the hood, Python knows to really create it once, since it is
immutable (you can't change the one object into a two object), and doing
so makes things faster and more memory efficient. So x is y is also true,
but it is not mandated by the language, and is only done for some
immutable objects (like small integers, which are actually
pre-allocated).

Consider this:

>>> a = 1L
>>> b = 1L
>>> a is b
0

Python does NOT cache long integer objects, and thus each '1L' object
is, in fact, a separate object (this is an implementation detail,
subject to change with different versions of the interpreter).

The long and short of it is that if you have a mutable object, it
sometimes makes sense to test for identity (i.e. using 'is'), to see if
different names are referring to the same object (since mutating the
object will change it for both names). With immutable objects, it is
always better to test equality, for the reasons you discovered and that
are (hopefully) explained above.
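The mutable case, sketched in Python 3: identity matters for a list because a change made through one name is visible through every other name bound to the same object, while an equal but separately created list is unaffected. The variable names are arbitrary.

```python
shared = [1, 2]
alias = shared       # a second name for the very same list object
separate = [1, 2]    # a separate list display: always a new, distinct list

shared.append(3)     # mutate the object through one name

print(alias)                                # [1, 2, 3]
print(separate)                             # [1, 2]
print(alias is shared, separate is shared)  # True False
```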


Note that this is almost orthogonal to the notion of name binding. In
either case, a name is bound to an object (or rather, objects can be
referred to with multiple names, from multiple scopes, etc.) The only
confusion is whether creating an object creates something with a unique
id or not.

One final note - Object ids are unique for any instant in time, but as
objects are destroyed and new ones created, the memory allocator is free to
reuse that memory location, and thus the id number. The result is that
if an object has a certain id at some point in time, and then an object
has that id at another point in time, there is no language guarantee
that they are the same object, or even equivalent objects.

So, in theory, identity should ONLY be used to establish that two names
refer to different objects, and never to establish that they refer to
the same object. In practice, this rule is often ignored for the None
object, because it is special. Also, it is unlikely to ever bite you in
practice for any object. Still, I prefer equivalence checking to
identity checking (the counter-argument is that equivalence checking can
be overridden, identity checking cannot).
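The counter-argument in parentheses is easy to demonstrate: a class can redefine `==`, but nothing can redefine `is`. `AlwaysEqual` is a hypothetical class written only for this Python 3 sketch.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims to equal anything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()
print(obj == None)  # True: the overridden __eq__ answers (deliberately not "is")
print(obj is None)  # False: identity cannot be overridden
```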


# Short program showing objects that are not equivalent, but have been
# given the same id as previously deleted objects due to reuse of
# memory. Tested on Linux Python 2.2.2
d = {}
a = 1L
for i in range( 100000 ):
    a += 1L
    key = id(a)

    if d.has_key( key ):
        print "found reused id(): key=%s object=%s " % (key, a)
        break
    else:
        d[key] = key
print key, a
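For Python 3, which has no `1L` long literals or `dict.has_key()` and makes `print` a function, a rough port of the program above might look like this. It is a sketch, not a test of any guarantee: whether and when an id is handed out again depends on the interpreter and its memory allocator. `find_reused_id` is a made-up name.

```python
def find_reused_id(limit=100_000):
    """Return (id, old_repr, new_repr) for the first id() seen twice, or None.

    Only the ids and a text copy of each value are remembered, so the
    old int objects are freed and their memory can be handed out again.
    """
    seen = {}        # id -> repr of the object that had that id
    a = 10 ** 20     # a large int, well outside any small-int cache
    for _ in range(limit):
        a += 1       # builds a new int; the previous one is released
        key = id(a)
        if key in seen:
            return key, seen[key], repr(a)
        seen[key] = repr(a)
    return None


print(find_reused_id())
```

On CPython it typically reports a reuse within the first few iterations, with two different values sharing one id; the exact numbers vary from run to run.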


--

Chad Netzer
(any opinion expressed is my own and not NASA's or my employer's)


Terry Reedy

May 13, 2003, 7:24:32 PM

"Iwan van der Kleyn" <nu...@null.com> wrote in message
news:3ec151a7$0$137$e4fe...@dreader4.news.xs4all.nl...

> >>> a = 1
> >>> b = 1
> >>> a is b
> 1

You are generalizing too quickly. Consider
>>> a = 1000
>>> b = 1000
>>> a is b
0

> >>> x = 'test'
> >>> y = 'test'

> >>> x is y
> 1

This is a version dependent internal implementation optimization.

> >>> x += 's'
> >>> x == y
> 1

No, 'tests' != 'test'! You left something out.

> >>> q = (0, 1)
> >>> r = (0, 1)

> >>> q is r
> 0

Again, implementation choice.

> OK? Testing for equality gives no surprises, but testing for identity
> does, especially considering the differences between strings and tuples
> (both of which are immutable).

Whether the implementation merges duplicate immutables or not is an
internal optimization matter. The main use for 'is' is 'x is None'
instead of 'x == None' which is dependable because None is explicitly
a singleton -- like the empty set in set theory. The next most common
use is for newbies to confuse themselves with insufficient data ;-)
(You are not the first to only test a=1; b=1 and stop there with
ints.)

As for your other questions, 'immutable += something' is (in the
absence of side effects) an abbreviation for 'immutable = immutable +
something'. The name is rebound to a new object. However list += seq
abbreviates list.extend(seq) to modify in place.
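This can be checked directly in Python 3 (a sketch; the names are arbitrary). For the immutable string and tuple, `+=` builds a new object and rebinds only the left-hand name, so the other name keeps the old value; for a list, `+=` extends the existing object in place, so every name bound to it sees the change.

```python
x = "test"
y = x
x += "s"            # str is immutable: same as x = x + "s", so x is rebound
print(x, y)         # tests test

t = (0, 1)
t_alias = t
t += (2,)           # likewise for tuples: a new tuple is bound to t
print(t, t_alias)   # (0, 1, 2) (0, 1)

lst = [0, 1]
lst_alias = lst
lst += (2,)         # list += accepts any iterable and extends the list in place
print(lst, lst_alias, lst is lst_alias)  # [0, 1, 2] [0, 1, 2] True
```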

Terry J. Reedy


Bryan

May 14, 2003, 12:34:24 AM
chad,

thank you for this excellent explanation!

bryan


"Chad Netzer" <cne...@mail.arc.nasa.gov> wrote in message
news:mailman.1052862200...@python.org...
