while testing a module today I stumbled on something that I can work
around but I don't quite understand.
>>> a = "a"
>>> b = "a"
>>> a == b
True
>>> a is b
True
>>> c = "/a"
>>> d = "/a"
>>> c == d
True # all good so far
>>> c is d
False # eeeeek!
Why do c and d point to two different objects with an identical string
content rather than to the same object?
Manu
>>>> c = "/a"
>>>> d = "/a"
>>>> c == d
> True # all good so far
>>>> c is d
> False # eeeeek!
>
> Why do c and d point to two different objects with an identical string
> content rather than to the same object?
Because you instantiated two different objects.
http://docs.python.org/reference/datamodel.html#objects-values-and-types
should get you started on Python and objects.
j
*Do NOT use "is" to compare immutable types.* **Ever!**
It is an implementation choice (usually driven by efficiency considerations) to choose when two strings with the same value are stored in memory once or twice. In order for Python to recognize when a newly created string has the same value as an already existing string, and so use the already existing value, it would need to search *every* existing string whenever a new string is created. Clearly that's not going to be efficient. However, the C implementation of Python does a limited version of such a thing -- at least with strings of length 1.
Gary Herron
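Here is a minimal sketch of Gary's point, assuming Python 3 on CPython. Two equal strings are built at run time with `str.join`, so the compiler has no single constant to share between them. They always compare equal with `==`. Whether they are also the same object is an implementation detail, and a program should not depend on it.

```python
# Build both strings at run time so no compile-time constant can be shared.
parts = ["/", "a"]
c = "".join(parts)
d = "".join(parts)

print(c == d)  # True: equality compares the characters
print(c is d)  # identity: an implementation detail, do not rely on it
```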
>
>>>> a = "a"
>>>> b = "a"
>>>> a == b
> True
>>>> a is b
> True
>
>>>> c = "/a"
>>>> d = "/a"
>>>> c == d
> True # all good so far
>>>> c is d
> False # eeeeek!
>
> Why do c and d point to two different objects with an identical string
> content rather than to the same object?
>
> Manu
Well, "foo is None" is actually recommended practice....
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
Then it should be a detected error to do so.
John Nagle
But since newbies are always falling into this trap, it is still a good
rule to say:
Newbies: Never use "is" to compare immutable types.
and then later point out, for those who have absorbed the first rule:
Experts: Singleton immutable types *may* be compared with "is",
although normal equality with == works just as well.
Gary Herron
That's not really true. If my object overrides __eq__ in a funny way, "is None"
is much safer.
Use "is" when you really need to compare by object identity and not value.
> Emanuele D'Arrigo wrote:
>> Hi everybody,
>>
>> while testing a module today I stumbled on something that I can work
>> around but I don't quite understand.
>>
>
> *Do NOT use "is" to compare immutable types.* **Ever!**
Huh? How am I supposed to compare immutable types for identity then? Your
bizarre instruction would prohibit:
if something is None
which is the recommended way to compare to None, which is immutable. The
standard library has *many* identity tests to None.
I would say, *always* use "is" to compare any type whenever you intend to
compare by *identity* instead of equality. That's what it's for. If you use
it to test for equality, you're doing it wrong. But in the very rare cases
where you care about identity (and you almost never do), "is" is the
correct tool to use.
> It is an implementation choice (usually driven by efficiency
> considerations) to choose when two strings with the same value are stored
> in memory once or twice. In order for Python to recognize when a newly
> created string has the same value as an already existing string, and so
> use the already existing value, it would need to search *every* existing
> string whenever a new string is created.
Not at all. It's quite easy, and efficient. Here's a pure Python string
constructor that caches strings.
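class CachedString(str):
    _cache = {}
    def __new__(cls, value):
        # Return the first string stored under this value, so equal
        # strings passed in come back as one and the same object.
        s = cls._cache.setdefault(value, value)
        return s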
Python even includes a built-in function to do this: intern(), although in
Python 3.0 it has moved to sys.intern().
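To make Steven's suggestion concrete, here is a sketch that assumes Python 3, where the old built-in `intern()` is available as `sys.intern()`. The `cached()` helper is a hypothetical plain-function version of the `CachedString` idea. A dict maps each value to the first string object seen with that value.

```python
import sys

# sys.intern() returns the canonical copy of a string, so equal strings
# passed through it come back as one and the same object.
c = sys.intern("".join(["/", "a"]))
d = sys.intern("".join(["/", "a"]))
print(c == d, c is d)  # True True

# Hypothetical dict-based cache, the same idea as CachedString above.
_cache = {}

def cached(s):
    """Return the first stored string equal to s, storing s if it is new."""
    return _cache.setdefault(s, s)

e = cached("".join(["/", "b"]))
f = cached("".join(["/", "b"]))
print(e is f)  # True: both calls return the first object stored
```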
> Clearly that's not going to be efficient.
Only if you do it the inefficient way.
> However, the C implementation of Python does a limited version
> of such a thing -- at least with strings of length 1.
No, that's not right. The identity test fails for some strings of length
one.
>>> a = '\n'
>>> b = '\n'
>>> len(a) == len(b) == 1
True
>>> a is b
False
Clearly, Python doesn't intern all strings of length one. What Python
actually interns are strings that look like, or could be, identifiers:
>>> a = 'heresareallylongstringthatisjustmade' \
... 'upofalphanumericcharacterssuitableforidentifiers123_'
>>>
>>> b = 'heresareallylongstringthatisjustmade' \
... 'upofalphanumericcharacterssuitableforidentifiers123_'
>>> a is b
True
It also does a similar thing for small integers, currently something
like -5 through to 256 I believe, although this is an implementation
detail subject to change.
--
Steven
> Hi everybody,
>
> while testing a module today I stumbled on something that I can work
> around but I don't quite understand.
Why do you have to work around it?
What are you trying to do that requires that two strings should occupy the
same memory location rather than merely being equal?
> Why do c and d point to two different objects with an identical string
> content rather than to the same object?
Why shouldn't they?
--
Steven
But that definition is the *source* of the trouble. It is *completely*
meaningless to newbies. Until one has experience in programming in
general and experience in Python in particular, the difference between
"object identity" and "value" is a mystery.
So in order to lead newbies away from this *very* common trap they often
fall into, it is still a valid rule to say
Newbies: Never use "is" to compare immutable types.
or even better
Newbies: Never use "is" to compare anything.
This will help them avoid traps, and won't hurt their use of the
language. If they get to a point that they need to contemplate using
"is", then almost by definition, they are not a newbie anymore, and the
rule is still valid.
Gary Herron
Just use:
if something == None
It does *exactly* the same thing.
But... I'm not (repeat NOT) saying *you* should do it this way.
I am saying that since newbies continually trip over incorrect uses of
"is", they should be warned against using "is" in any situation until
they understand the subtle nature of "is".
If they use a couple "something==None" instead of "something is None"
in their code while learning Python, it won't hurt, and they can change
their style when they understand the difference. And meanwhile they
will skip traps newbies fall into when they don't understand these
things yet.
Gary Herron
> Robert Kern wrote:
...
>> Use "is" when you really need to compare by object identity and not
>> value.
>
> But that definition is the *source* of the trouble. It is *completely*
> meaningless to newbies. Until one has experience in programming in
> general and experience in Python in particular, the difference between
> "object identity" and "value" is a mystery.
Then teach them the difference, rather than give them bogus advice.
> So in order to lead newbies away from this *very* common trap they often
> fall into, it is still a valid rule to say
>
> Newbies: Never use "is" to compare immutable types.
Look in the standard library, and you will see dozens of cases of
first-quality code breaking your "valid" rule.
Your rule is not valid. A better rule might be:
Never use "is" to compare equality.
Or even:
Never use "is" unless you know the difference between identity and equality.
Or even:
Only use "is" on Tuesdays.
At least that last rule is occasionally right (in the same way a stopped
clock is right twice a day), while your rule is *always* wrong. It is never
correct to avoid using "is" when you need to compare for identity.
> or even better
>
> Newbies: Never use "is" to compare anything.
Worse and worse! Now you're actively teaching newbies to write buggy code!
--
Steven
>> Huh? How am I supposed to compare immutable types for identity then? Your
>> bizarre instruction would prohibit:
>>
>> if something is None
>>
>
> Just use:
>
> if something == None
>
> It does *exactly* the same thing.
Wrong.
"something is None" is a pointer comparison. It's blindingly fast, and it
will only return True if something is the same object as None. Any other
object *must* return False.
"something == None" calls something.__eq__(None), which is a method of
arbitrary complexity, which may cause arbitrary side-effects. It can have
false positives, where objects with unexpected __eq__ methods may return
True, which is almost certainly not the intention of the function author
and therefore a bug.
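Here is a small sketch of such a false positive, assuming Python 3. `AlwaysEqual` is a hypothetical class invented only to show the effect.

```python
class AlwaysEqual:
    """Hypothetical class with a deliberately perverse __eq__."""

    def __eq__(self, other):
        return True  # claims to equal everything, None included


x = AlwaysEqual()
print(x == None)  # True: a false positive produced by __eq__
print(x is None)  # False: identity cannot be overridden
```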
[...]
> If they use a couple "something==None" instead of "something is None"
> in their code while learning Python, it won't hurt,
Apart from the subtle bugs they introduce into their code.
> and they can change
> their style when they understand the difference. And meanwhile they
> will skip traps newbies fall into when they don't understand these
> things yet.
How about teaching them the right reasons for using "is" instead of giving
them false information by telling them they should never use it?
--
Steven
>>> a = "a"
>>> b = "a"
>>> a is b
True
>>> a = "/a" <- same as above, except the forward slashes!
>>> b = "/a" <- same as above, except the forward slashes!
>>> a is b
False
So, it appears that in the first case a and b are names to the same
string object, while in the second case they are to two separate
objects. Why? What's so special about the forward slash that causes the
two "/a" strings to create two separate objects? Is this an
implementation-specific issue?
Manu
Gary, thanks for your reply: your explanation does pretty much answer
my question. One thing I can add however is that it really seems that
non-alphanumeric characters such as the forward slash make the
difference, not just the number of characters. I.e.
>>> a = "aaaaaaaaaaaaaaaaaaaaaaaaaaa"
>>> b = "aaaaaaaaaaaaaaaaaaaaaaaaaaa"
>>> a is b
True
>>> a = "/aaaaaaaaaaaaaaaaaaaaaaaaaaa"
>>> b = "/aaaaaaaaaaaaaaaaaaaaaaaaaaa"
>>> a is b
False
I just find it peculiar more than a nuisance, but I'll go to the
blackboard and write 100 times "never compare the identities of two
immutables". Thank you all!
Manu
Python special-cases certain objects, like single-character strings or small
ints from -5 to +256, for performance reasons. It's version and
implementation specific and may change in the future. Do NOT rely on it!
Christian
Nonsense. Show me "newbie" level code that's buggy with "==" but
correct with "is".
However, I do like your restatement of the rule this way:
Never use "is" unless you know the difference between identity and
equality.
That warns newbies away from the usual pitfall, and (perhaps) won't
offend those who seem to forget what "newbie" means.
Gary Herron
This question is ambiguous:
a) Why does the Python interpreter behave this way?
(i.e. what specific algorithm produces this result?)
or
b) Why was the interpreter written to behave this way?
(i.e. what is the rationale for that algorithm?)
For a), the answer is in Objects/codeobject.c:
/* Intern selected string constants */
for (i = PyTuple_Size(consts); --i >= 0; ) {
    PyObject *v = PyTuple_GetItem(consts, i);
    if (!PyString_Check(v))
        continue;
    if (!all_name_chars((unsigned char *)PyString_AS_STRING(v)))
        continue;
    PyString_InternInPlace(&PyTuple_GET_ITEM(consts, i));
}
So it interns all strings which only consist of name
characters.
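The C helper `all_name_chars` can be approximated in Python as shown below. This sketch assumes the rule described above: only ASCII letters, digits and underscore count as name characters. It is not the interpreter's own code, and later CPython versions may use a different rule.

```python
import string

# Characters treated as "name characters" in this sketch.
NAME_CHARS = frozenset(string.ascii_letters + string.digits + "_")

def all_name_chars(s: str) -> bool:
    """Return True if every character of s is a name character."""
    return all(ch in NAME_CHARS for ch in s)

for literal in ["a", "/a", "aaaaaaaaaaaaaaaaaaaaaaaaaaa", "123_", "\n"]:
    print(repr(literal), all_name_chars(literal))
```

Under this rule, "a" and the long run of "a"s would be interned. "/a" and "\n" would not, which matches the behaviour Manu saw.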
For b), the rationale is that such string literals
in source code are often used to denote names, e.g.
for getattr() calls and the like. As all names are interned,
name-like strings get interned also.
> What's so special about the forward slash that cause the
> two "/a" strings to create two separate objects?
See above.
> Is this an implementation-specific issue?
Yes, see above.
Martin
Unless you are *trying* to discern something about the implementation
and its attempts at efficiency. Here are several more interesting examples:
>>> 101 is 100+1
True
>>> 1001 is 1000+1
False
>>> 10*'a' is 5*'aa'
True
>>> 100*'a' is 50*'aa'
False
Gary Herron
> Manu
The obvious followup question is then, "when is it ok to use 'is'?"
Robert> Well, "foo is None" is actually recommended practice....
Indeed. It does have some (generally small) performance ramifications as
well. Two trivial one-line examples:
% python -m timeit -s 'x = None' 'x is None'
10000000 loops, best of 3: 0.065 usec per loop
% python -m timeit -s 'x = None' 'x == None'
10000000 loops, best of 3: 0.121 usec per loop
% python -m timeit -s 'x = object(); y = object()' 'x == y'
10000000 loops, best of 3: 0.154 usec per loop
% python -m timeit -s 'x = object(); y = object()' 'x is y'
10000000 loops, best of 3: 0.0646 usec per loop
I imagine the distinction grows if you implement a class with __eq__ or
__cmp__ methods, but that would make the examples greater than one line
long. Of course, the more complex the objects you are comparing the
stronger the recommendation against using 'is' to compare two objects.
Skip
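Skip's shell commands can also be run from inside Python through the `timeit` module. This sketch assumes Python 3. The loop count is arbitrary, and absolute times vary with the machine and the interpreter version, so only the relative difference matters.

```python
import timeit

setup = "x = None"
# timeit.timeit runs the statement `number` times and returns total seconds.
t_is = timeit.timeit("x is None", setup=setup, number=200_000)
t_eq = timeit.timeit("x == None", setup=setup, number=200_000)
print(f"is: {t_is:.4f}s   ==: {t_eq:.4f}s")
```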
>>>> a = "a"
>>>> b = "a"
>>>> a is b
> True
>
>>>> a = "/a" <- same as above, except the forward slashes!
>>>> b = "/a" <- same as above, except the forward slashes!
>>>> a is b
> False
>
> So, it appears that in the first case a and b are names to the same
> string object, while in the second case they are to two separate
> objects. Why? What's so special about the forward slash that causes the
> two "/a" strings to create two separate objects? Is this an
> implementation-specific issue?
With all the answers you got, I hope you now understand that you put the
question backwards: it's not "why aren't a and b the very same object in
the second case?" but "why are they the same object in the first case?".
Two separate expressions, involving two separate literals, don't *have* to
evaluate as the same object. Only because strings are immutable the
interpreter *may* choose to re-use the same string. But Python would still
be Python even if all those strings were separate objects (although it
would perform a lot slower!)
--
Gabriel Genellina
No it isn't, it's asinine advice that's not even a simplified truth,
it's just a lie.
Newbies who don't understand the difference between "==" and "is"
should not be using "is", for any object, immutable or mutable, aside
from None (which, whether you like it or not, is idiomatic Python).
Everyone who's learned the difference between equality and same
identity, including experts, should be using "is" only to test if some
object is the same object they created themselves, or is an object
guaranteed by a library or the language to never change, irrespective
of whether the object is mutable or not.
At no point on the learning curve is the distinction of when to use
"is" or not ever mutability.
Carl Banks
> I just find it peculiar more than a nuisance, but I'll go to the
> blackboard and write 100 times "never compare the identities of two
> immutables". Thank you all!
That's the wrong lesson to learn from this.
The right lesson to learn is, “Equality comparison is not the same
operation as identity comparison. Use the right tool for the situation
at hand.”
--
Ben Finney
That is absolutely wrong:
>>> a = 2^100
>>> b = 2^100
>>> a == b
True
>>> a is b
False
When is it ever necessary to compare for identity?
> Of course, the more complex the objects you are comparing the
> stronger the recommendation against using 'is' to compare two objects.
Why is there so much voodoo advice about "is"? Is object identity really
such a scary concept that people are frightened of it?
Mutable versus immutable is irrelevant. The complexity of the object is
irrelevant. The phase of the moon is irrelevant. The *only* relevant factor
is the programmer's intention:
If you want to test whether two objects have the same value (equality), the
correct way to do it is with "==".
If you want to test whether two objects are actually one and the same
object, that is, they exist at the same memory location (identity), the
correct way to do it is with "is".
If you find it difficult to think of a reason for testing for identity,
you're right, there aren't many. Since it's rare to care about identity, it
should be rare to use "is". But in the few times you do care about
identity, the correct solution is to use "is" no matter what sort of object
it happens to be. It really is that simple.
--
Steven
Caches of arbitrary objects.
When checking if an object (which may have an arbitrarily perverse __eq__) is
None.
Or a specifically constructed sentinel value.
Checking for cycles in a data structure that defines __eq__.
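Here is an illustration of the cycle case: a hypothetical `flatten()` that walks nested lists and tracks the lists it is currently visiting by `id()`, that is, by identity. Equality would be the wrong test here, because two distinct lists that happen to be equal do not form a cycle. A sketch, assuming Python 3:

```python
def flatten(items, _seen=None):
    """Flatten nested lists, skipping any list already on the current path.

    Lists are tracked by id(): only meeting the *same* list object again
    means a cycle, whereas equal-but-distinct lists are walked normally.
    """
    if _seen is None:
        _seen = set()
    if id(items) in _seen:
        return []  # cycle: this exact list is already being walked
    _seen.add(id(items))
    out = []
    for item in items:
        if isinstance(item, list):
            out.extend(flatten(item, _seen))
        else:
            out.append(item)
    _seen.discard(id(items))  # leaving this list; shared sublists stay allowed
    return out


a = [1, 2]
a.append(a)         # a now contains itself
print(flatten(a))   # [1, 2]
```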
>>> Newbies: Never use "is" to compare anything.
>>
>> Worse and worse! Now you're actively teaching newbies to write buggy
>> code!
>
> Nonsense. Show me "newbie" level code that's buggy with "==" but
> correct with "is".
What's "newbie" level code? What does that even *mean*? There's no sandbox
for newbies to play in -- their code runs in the same environment as code
written by experts. Newbies can import modules written by experts and use
their code: any object, no matter how complex, might find itself imported
and used by a newbie. Sometimes code written by newbies might even find its
way into code used by experts.
But regardless of that, the point is, what's your motivation in giving
advice to newbies? Do you want them to learn correct coding techniques, or
to learn voodoo programming and superstition? If you want them to learn
correct coding, then teach them the difference between identity and
equality. If you want them to believe superstitions, then continue telling
them never to use "is" without qualifications.
--
Steven
Which for a new user not familiar with the differing concepts of "is" and
"==" can lead to mistakes.
Steven> If you find it difficult to think of a reason for testing for
Steven> identity, you're right, there aren't many. Since it's rare to
Steven> care about identity, it should be rare to use "is". But in the
Steven> few times you do care about identity, the correct solution is to
Steven> use "is" no matter what sort of object it happens to be. It
Steven> really is that simple.
Right. Again though, when newcomers conflate the concepts they can deceive
themselves into thinking "is" is just a faster "==".
Skip
> Steven> Mutable versus immutable is irrelevant. The complexity of the
> Steven> object is irrelevant. The phase of the moon is irrelevant. The
> Steven> *only* relevant factor is the programmer's intention:
>
> Which for a new user not familiar with the differing concepts of "is" and
> "==" can lead to mistakes.
Right. And for newbies unfamiliar with Python, they might mistakenly think
that ^ is the exponentiation operator rather than **.
So what do we do? Do we teach them what ^ actually is, or give them bad
advice "Never call ^" and then watch them needlessly write their own
exponentiation function?
Do we then defend this terrible advice by claiming that nobody needs
exponentiation? Or that only experts need it? Or that it's necessary to
tell newbies not to use ^ because they're only newbies and can't deal with
the truth?
No, of course not. But that's what some of us are doing with regard to "is".
Because *some* newbies are ignorant and use "is" for equality testing, we
patronisingly decide that *all* newbies can't cope with learning what "is"
really is for, give them bad advice, and thus ensure that they stay
ignorant longer.
> Steven> If you find it difficult to think of a reason for testing for
> Steven> identity, you're right, there aren't many. Since it's rare to
> Steven> care about identity, it should be rare to use "is". But in the
> Steven> few times you do care about identity, the correct solution is to
> Steven> use "is" no matter what sort of object it happens to be. It
> Steven> really is that simple.
>
> Right. Again though, when newcomers conflate the concepts they can
> deceive themselves into thinking "is" is just a faster "==".
Then teach them the difference, don't teach them superstition.
--
Steven
But _you_ only _just_ stated "It does have some (generally small)
performance ramifications as well" and provided timing examples to show
it. Without qualification.
And you're wondering why these mythical newcomers might be confused...
Is that a trick question? The obvious answer is, any time you need to.
The standard library has dozens of tests like:
something is None
something is not None
Various standard modules include comparisons like:
if frame is self.stopframe:
if value is not __UNDEF__:
if b is self.exit:
if domain is Absent:
if base is object:
if other is NotImplemented:
if type(a) is type(b):
although that last one is probably better written using isinstance() or
issubclass(). I have no doubt that there are many more examples.
Comparisons by identity are useful for interning objects, for testing that
singletons actually are singletons, for comparing functions with the same
name, for avoiding infinite loops while traversing circular data
structures, and for non-destructively testing whether two mutable objects
are the same object or different objects.
--
Steven
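The `if other is NotImplemented:` idiom in Steven's list comes from the rich-comparison protocol. A comparison method returns the `NotImplemented` singleton to tell Python to try the other operand instead. Here is a sketch with a hypothetical `Money` class, assuming Python 3:

```python
class Money:
    """Hypothetical value class, used only for illustration."""

    def __init__(self, cents):
        self.cents = cents

    def __eq__(self, other):
        if not isinstance(other, Money):
            return NotImplemented  # "can't compare", which is not "unequal"
        return self.cents == other.cents


print(Money(5) == Money(5))                  # True: equal values
print(Money(5) is Money(5))                  # False: two separate objects
print(Money(5).__eq__(3) is NotImplemented)  # True: checked by identity
print(Money(5) == 3)                         # False: both sides declined
```

When both operands return `NotImplemented` for `==`, Python falls back to comparing identity, which is why the last line is False.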
The performance difference can be large if the objects are (for
example) long lists.
I would think (not having looked) that the implementation of == would
first check for identity (for performance reasons)... but then that led
me to ask: can an object be identical but not equal to itself?
... answered my own question
class Foo:
    def __eq__(self, other):
        return False  # never equal, not even to itself

>>> x = Foo()
>>> x is x
True
>>> x == x
False
> I would think (not having looked) that the implementation of == would
> first check for identity (for performance reasons)...
For some types, it may. I believe that string equality testing first tests
whether the two strings are the same string, then tests if they have the
same hash, and only then does a character-by-character comparison. Or so I've
been told.
> can an object be identical but not equal to itself?
Yes. Floating point NANs are required to compare unequal to all floats,
including themselves. It's part of the IEEE standard.
Python doesn't assume that == must mean equality. If, for some bizarre
reason you want to define == to mean something completely different, then
you can define x == x to return anything you like for your class.
--
Steven
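Here is Steven's NaN example as a Python 3 sketch. It also bears on the earlier question about `==` checking identity first. Plain `==` on floats does not, but the language reference defines `x in y` for lists as `any(x is e or x == e for e in y)`, so containment does find the very same NaN object.

```python
import math

nan = float("nan")

print(nan == nan)       # False: a NaN is unequal to everything, itself included
print(nan is nan)       # True: it is still one and the same object
print(math.isnan(nan))  # True: the reliable test for NaN
print(nan in [nan])     # True: membership checks identity before ==
```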
What should this example show? And where's the singleton here? BTW:
In [367]: a = 2 ^ 100
In [368]: b = 2 ^ 100
In [369]: a == b
Out[369]: True
In [370]: a is b
Out[370]: True
Ciao,
Marc 'BlackJack' Rintsch
I misunderstood at first what you meant by "singleton". Sorry.
btw, has anybody noticed that the subject line "/a" is not "/a" is
actually False.
>>> "/a" is not "/a"
False
>>> a = "/a"
>>> b = "/a"
>>> a is not b
True
(Actually, we had this thread last week.) It's a question of strings that
might be Python names. Every line of code that looks up a name in a
namespace (e.g. global symbols, instance attributes, class attributes, etc.)
needs a string containing the name. This optimization keeps Python data
space from filling up with names. The same thing happens with small,
common, integers.
[ ... ]
> I just find it peculiar more than a nuisance, but I'll go to the
> blackboard and write 100 times "never compare the identities of two
> immutables". Thank you all!
The rule is to know the operations, and use the ones that do what you want
to do. `is` and `==` don't do the same thing. Never did, never will.
<sarcasm>
Python 2.5.2 (r252:60911, Oct 5 2008, 19:24:49)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a=3
>>> b=0
>>> a+b == a-b
True
>>> b=1
>>> a+b == a-b
False
At this point, someone says 'Eek'. But why? `+` and `-` never were the
same operation, regardless of a coincidence or two.
</sarcasm>
Mel.
Ho-hum. MUDD game.
def broadcast(sender, message):
    for p in all_players:
        if p is not sender:
            p.tell(message)  # don't send a message to oneself
Mel.
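Here is a runnable variant of Mel's `broadcast()`, assuming Python 3. Three things are hypothetical additions for the demo: the `Player` class, its name-based `__eq__`, and passing `all_players` as an argument rather than using a global. It shows that `is not` skips exactly the sender, even when another player compares equal to it.

```python
class Player:
    """Hypothetical player whose equality is based on the username."""

    def __init__(self, name):
        self.name = name
        self.inbox = []

    def __eq__(self, other):
        return isinstance(other, Player) and self.name == other.name

    def tell(self, message):
        self.inbox.append(message)


def broadcast(sender, message, all_players):
    for p in all_players:
        if p is not sender:  # identity: skip only the sending object
            p.tell(message)


alice = Player("alice")
impostor = Player("alice")  # equal to alice, but a different object
bob = Player("bob")
broadcast(alice, "hello", [alice, impostor, bob])
print(alice.inbox, impostor.inbox, bob.inbox)  # [] ['hello'] ['hello']
```

With `p != sender` instead, the equal-but-distinct `impostor` would silently miss the message as well.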
As far as I remember that's not correct. It's just the way C has
interpreted the standard and Python inherited the behavior. But you may
prove me wrong on that.
Mark, you are the expert on IEEE 754.
Christian
Thank you Martin, and all others who have responded, I have a much
better picture of the whole issue now. Much appreciated.
Manu
And for a practical, real world example:
In [1]: inf = 1e200 * 1e200
In [2]: inf
Out[2]: inf
In [3]: nan = inf / inf
In [4]: nan
Out[4]: nan
In [5]: nan is nan
Out[5]: True
In [6]: nan == nan
Out[6]: False
Steven is correct. The standard defines how boolean comparisons like ==, !=, <,
etc. should behave in the presence of NaNs. Table 4 on page 9, to be precise.
Since in a MUD game, a player would always have a unique username, I'd
rather compare with that. It doesn't rely on some internals. There are
very, very rare cases where 'is' is really, really needed.
The rationale behind the standard was because NaN can be returned by
many distinct operations, thus one NaN may not be equal to other NaN.
Steven's statement sounds about right to me: IEEE 754
(the current 2008 version of the standard, which supersedes
the original 1985 version that I think Robert Kern is
referring to) says that every NaN compares *unordered*
to anything else (including itself). A compliant language is
required to supply twenty-two(!) comparison operations, including
a 'compareQuietEqual' operation with compareQuietEqual(NaN, x)
being False, and also a 'compareSignalingEqual' operation, such
that compareSignalingEqual(NaN, x) causes an 'invalid operation
exception'. See sections 5.6.1 and 5.11 of the standard for
details.
Throughout the above, 'NaN' means quiet NaN. A comparison
involving a signaling NaN should always cause an invalid operation
exception. I don't think Python really supports signaling NaNs
in any meaningful way.
I wonder what happens if you create an sNaN using
struct.unpack(suitable_byte_string) and then try
to do arithmetic on it in Python...
Mark
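One way to try Mark's experiment, assuming Python 3 and IEEE 754 binary64 floats, is to build the NaN bit patterns with `bytes.fromhex()` and decode them with `struct.unpack()`. The patterns below follow the common convention that a set top fraction bit marks a quiet NaN. What arithmetic does with the signaling one depends on the platform, so this sketch only checks that both values decode as NaN.

```python
import math
import struct

# Big-endian binary64 bit patterns: exponent all ones, non-zero fraction.
QUIET_NAN_BITS = bytes.fromhex("7ff8000000000000")      # top fraction bit set
SIGNALING_NAN_BITS = bytes.fromhex("7ff4000000000000")  # top fraction bit clear

qnan = struct.unpack(">d", QUIET_NAN_BITS)[0]
snan = struct.unpack(">d", SIGNALING_NAN_BITS)[0]

print(math.isnan(qnan), math.isnan(snan))
print(snan + 1.0)  # platform dependent: the sNaN may simply come back quieted
```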
Well, by that criterion you can dismiss almost anything.
Of course you can assign unique ids to most objects and perform your
identity tests that way. The point is that sometimes you do need to
test for the identity of the object, not merely the equivalent
semantic value.
If, faced with this problem (and I'm guessing you haven't faced it
much) your approach is always to define a unique id, so that you can
avoid ever having to use the "is" operator, be my guest. As for me, I
do program in the sort of areas where identity testing is common, and
I don't care to define ids just to test for identity alone, so for me
"is" is useful.
Carl Banks
Yes.
Newbies: never assume that the interpreter keeps just one copy of any
value. Just because a == b that doesn't mean that a is b. *Sometimes* it
will be, but it isn't usually guaranteed.
> or even better
>
> Newbies: Never use "is" to compare anything.
>
> This will help them avoid traps, and won't hurt their use of the
> language. If they get to a point that they need to contemplate using
> "is", then almost by definition, they are not a newbie anymore, and the
> rule is still valid.
Personally, I believe newbies should be allowed the freedom to shoot
themselves in the foot occasionally, and will happily explain the issues
that arise when they do so. It's all good learning.
I think using "is" to compare mutable objects is a difficult topic to
explain, and I think your division of objects into mutable and immutable
types is unhelpful and not to-the-point.
regards
Steve
--
Steve Holden
For example when providing a unique "sentinel" value as a function
argument. The parameter must be tested for identity with the sentinel.
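Here is a sketch of that pattern, assuming Python 3. `_MISSING` and `get_setting()` are hypothetical names. A fresh `object()` serves as the sentinel because `None` might itself be a legitimate argument.

```python
_MISSING = object()  # hypothetical sentinel: no other object can be this one

def get_setting(settings, key, default=_MISSING):
    """Return settings[key]; re-raise KeyError only if no default was given."""
    try:
        return settings[key]
    except KeyError:
        if default is _MISSING:  # identity: was a default passed at all?
            raise
        return default


print(get_setting({"x": 1}, "x"))  # 1
print(get_setting({}, "x", None))  # None: None is a valid default here
```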
a is b
is entirely equivalent to
id(a) == id(b)
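Here is a quick check of that equivalence, assuming Python 3. It holds while both objects are alive. `id()` values are only guaranteed unique among objects that exist at the same time, so comparing the ids of temporaries, as in `id([]) == id([])`, can be misleading.

```python
a = [1, 2]
b = a       # a second name for the same list
c = [1, 2]  # an equal value, but a separate object

print(a is b, id(a) == id(b))  # True True
print(a is c, id(a) == id(c))  # False False
print(a == c)                  # True: equal, though not identical
```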