
is None or == None ?


mk

Nov 6, 2009, 8:20:42 AM
to pytho...@python.org
Hello,

Some claim that one should test for None using:

if x is None:

...but the standard equality check, which is theoretically safer, works as well:

if x == None:

So, which one is recommended?

Can there be two None objects in the interpreter's memory? Is testing a
variable for identity with None safe? Does the language guarantee that?
Or is it just a property of the implementation?

Regards,
mk

Stefan Behnel

Nov 6, 2009, 8:33:14 AM
mk, 06.11.2009 14:20:

> Some claim that one should test for None using:
>
> if x is None:

Which is the correct and safe way of doing it.


> ..but the standard equality which is theoretically safer works as well:
>
> if x == None:

Absolutely not safe, think of

class Test(object):
    def __eq__(self, other):
        return other == None

print Test() == None, Test() is None

Stefan

Alf P. Steinbach

Nov 6, 2009, 8:35:14 AM
* mk:

As I understand it, 'is' will always work and will always be efficient (it just
checks the variable's type), while '==' can depend on the implementation of
equality checking for the other operand's class.


Cheers & hth.,

- Alf

John Machin

Nov 6, 2009, 9:09:03 AM
On Nov 7, 12:35 am, "Alf P. Steinbach" <al...@start.no> wrote:
> * mk:
>
> > Hello,
>
> > Some claim that one should test for None using:
>
> > if x is None:
>
> > ..but the standard equality which is theoretically safer works as well:
>
> > if x == None:
>
> > So, which one is recommended?
>
>
> As I understand it, 'is' will always work and will always be efficient (it just
> checks the variable's type),

It doesn't check the type. It doesn't need to. (x is y) is true if x
and y are the same object. If that is so, then of course (type(x) is
type(y)) is true, and if not so, their types are irrelevant. "is"
testing is very efficient in the CPython implementation: addressof(x)
== addressof(y)
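John's description can be checked from within Python: in CPython, `id()` happens to return an object's address (an implementation detail, not a language guarantee), so `x is y` always agrees with comparing the two ids:

```python
a = object()
b = object()
c = a  # second name bound to the same object as a

# 'is' compares object identity; in CPython id() returns the address,
# so (x is y) matches (id(x) == id(y)) -- the types are never consulted.
print(a is c, id(a) == id(c))  # True True
print(a is b, id(a) == id(b))  # False False
```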

Marco Mariani

Nov 6, 2009, 9:13:14 AM
Alf P. Steinbach wrote:

> As I understand it, 'is' will always work and will always be efficient
> (it just checks the variable's type), while '==' can depend on the
> implementation of equality checking for the other operand's class.

"== None" makes sense, for instance, in the context of the SQLAlchemy
sql construction layer, where the underlying machinery defines __eq__()
/ __ne__() and generates the appropriate 'IS NULL' SQL code when
appropriate.
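As an illustration only (a made-up toy, not SQLAlchemy's actual API), such a construction layer can hijack `==` precisely because it dispatches to `__eq__`:

```python
class Column(object):
    """Hypothetical SQL-expression column for illustration."""
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Comparing against None renders as IS NULL, as SQL requires.
        if other is None:
            return "%s IS NULL" % self.name
        return "%s = %r" % (self.name, other)

print(Column("age") == None)  # age IS NULL
print(Column("age") == 42)    # age = 42
```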

mk

Nov 6, 2009, 9:32:52 AM
to pytho...@python.org
Stefan Behnel wrote:
> mk, 06.11.2009 14:20:
>> Some claim that one should test for None using:
>>
>> if x is None:
>
> Which is the correct and safe way of doing it.

ok

>> ..but the standard equality which is theoretically safer works as well:
>>
>> if x == None:
>
> Absolutely not safe, think of
>
> class Test(object):
>     def __eq__(self, other):
>         return other == None
>
> print Test() == None, Test() is None

Err, I don't want to sound daft, but what is wrong in this example? It
should work as expected:

>>> class Test(object):
...     def __eq__(self, other):
...         return other == None
...
>>> Test() is None
False
>>> Test() == None
True

My interpretation of the 1st call is that it is correct: the instance
Test() is not None (in terms of identity), but it happens to compare
equal to None (2nd call).

Or perhaps your example was supposed to show that I should test for
identity with None, not for value with None?

That, however, opens a can of worms, sort of: whether one should compare
Test() with None for identity or for equality depends on what the
programmer meant at the moment.

Regards,
mk

Daniel Fetchinson

Nov 6, 2009, 9:51:56 AM
to Python
> Some claim that one should test for None using:
>
> if x is None:
>
> ..but the standard equality which is theoretically safer works as well:
>
> if x == None:
>
> So, which one is recommended?

It depends on what you want to do. There is no single answer to your
question, until you tell us what your context is.

The None object in python is a singleton. If you want to check whether
a given object is this singleton None object (note that this is
unambiguous, it either occupies the same address in memory as the
singleton None or it doesn't) then you should use 'x is None'.
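The singleton property is easy to observe: every way of producing None yields the very same object, which is what makes the identity test unambiguous:

```python
def nothing():
    return None  # falls through to the one and only None object

a = None
b = nothing()
# There is a single None in the interpreter, so identity always holds.
print(a is b, b is None)  # True True
```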

If, however you want to test for equality, you should use 'x == None'.
An object 'x' may implement equality checks in a funny way and it may
be that 'x == None' is True although 'x is None' is False. But it may
be that this is exactly what you want, because the author of the
object 'x' intentionally has written the equality check code to make
this so. In this case you need 'x == None'.

So unless you tell us in what context you are asking this question,
there is no single answer to it.

Cheers,
Daniel

--
Psss, psss, put it down! - http://www.cafepress.com/putitdown

Alf P. Steinbach

Nov 6, 2009, 10:16:54 AM
* John Machin:

Maybe.

I imagined it wouldn't waste additional space for e.g. (Python 2.x) int values,
but just use the same space as used for pointer in the case of e.g. string, in
which case it would have to check the type -- an integer can very easily have
the same bitpattern as a pointer residing there.

If you imagine that instead, for an integer variable x it stores the integer
value in the variable in some other place than ordinarily used for pointer, and
let the pointer point to that place in the same variable, then without checking
type the 'is' operator should report false for 'x = 3; y = 3; x is y', but it
doesn't with my Python installation, so if it doesn't check the type then even
this half-measure (just somewhat wasteful of space) optimization isn't there.

In short, you're saying that there is an extreme inefficiency with every integer
dynamically allocated /plus/, upon production of an integer by e.g. + or *,
inefficiently finding the previously allocated integer of that value and
pointing there, sort of piling inefficiency on top of inefficiency, which is
absurd, but I have seen absurd things before, so it's not completely unbelievable.

I hope someone else can comment on these implications of your statement.


Cheers,

- Alf

Marco Mariani

Nov 6, 2009, 10:51:18 AM
Alf P. Steinbach wrote:

> If you imagine that instead, for an integer variable x it stores the
> integer value in the variable in some other place than ordinarily used
> for pointer, and let the pointer point to that place in the same
> variable, then without checking type the 'is' operator should report
> false for 'x = 3; y = 3; x is y', but it doesn't with my Python

Yes, CPython caches a handful of small, "commonly used" integers, and
creates objects for them upon startup. Using "x is y" with integers
makes no sense and has no guaranteed behaviour AFAIK

> In short, you're saying that there is an extreme inefficiency with every
> integer dynamically allocated /plus/, upon production of an integer by
> e.g. + or *, inefficiently finding the previously allocated integer of
> that value and pointing there,

no, it doesn't "point there":

>>> a = 1E6
>>> a is 1E6
False
>>> a = 100
>>> a is 100
True

Alf P. Steinbach

Nov 6, 2009, 11:54:53 AM
* Marco Mariani:

I stand corrected on that issue, I didn't think of cache for small values.

On my CPython 3.1.1 the cache seems to support integer values -5 to +256,
inclusive, apparently using 16 bytes of storage per value (this last assuming
id() just returns the address).
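That cache range can be probed from Python itself; going through `int()` on strings sidesteps any compile-time constant folding. This is CPython-specific behaviour, not anything the language guarantees:

```python
# CPython-specific: small ints (-5..256) are preallocated singletons.
# Parsing from strings avoids compile-time constant folding.
small_a, small_b = int("100"), int("100")
large_a, large_b = int("1000"), int("1000")
print(small_a is small_b)  # True: both refer to the cached object
print(large_a is large_b)  # False: two freshly allocated objects
```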

But wow. That's pretty hare-brained: dynamic allocation for every stored value
outside the cache range, needless extra indirection for every operation.

Even Microsoft COM managed to get this right.

On the positive side, except that it would probably break every C module (I
don't know), in consultant speak that's definitely a potential for improvement. :-p


Cheers,

- Alf

Rami Chowdhury

Nov 6, 2009, 12:04:18 PM
to Alf P. Steinbach, pytho...@python.org
On Fri, 06 Nov 2009 08:54:53 -0800, Alf P. Steinbach <al...@start.no>
wrote:

> But wow. That's pretty hare-brained: dynamic allocation for every stored

> value outside the cache range, needless extra indirection for every
> operation.
>

Perhaps I'm not understanding this thread at all but how is dynamic
allocation hare-brained, and what's the 'needless extra indirection'?

--
Rami Chowdhury
"Never attribute to malice that which can be attributed to stupidity" --
Hanlon's Razor
408-597-7068 (US) / 07875-841-046 (UK) / 0189-245544 (BD)

Raymond Hettinger

Nov 6, 2009, 12:21:55 PM
On Nov 6, 5:20 am, mk <mrk...@gmail.com> wrote:
> Some claim that one should test for None using:
>
> if x is None:
>
> ..but the standard equality which is theoretically safer works as well:
>
> if x == None:
>
> So, which one is recommended?

In the standard library, we use "x is None".

The official recommendation in PEP 8 reads:
'''
Comparisons to singletons like None should always be done with
'is' or 'is not', never the equality operators.

Also, beware of writing "if x" when you really mean "if x is not
None" -- e.g. when testing whether a variable or argument that
defaults to None was set to some other value. The other value
might have a type (such as a container) that could be false in a
boolean context!
'''
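The pitfall in the second paragraph bites as soon as a falsy value is a legitimate argument, e.g. an empty list:

```python
def fill(data=None):
    # Correct: only the missing-argument sentinel triggers the default.
    # Writing 'if not data' would also fire for a caller-supplied [].
    if data is None:
        data = ["default"]
    return data

print(fill())    # ['default']
print(fill([]))  # []  -- an 'if not data' test would wrongly replace this
```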


Raymond

Alf P. Steinbach

Nov 6, 2009, 12:28:08 PM
* Rami Chowdhury:

> On Fri, 06 Nov 2009 08:54:53 -0800, Alf P. Steinbach <al...@start.no>
> wrote:
>
>> But wow. That's pretty hare-brained: dynamic allocation for every
>> stored value outside the cache range, needless extra indirection for
>> every operation.
>>
>
> Perhaps I'm not understanding this thread at all but how is dynamic
> allocation hare-brained, and what's the 'needless extra indirection'?

Dynamic allocation isn't hare-brained, but doing it for every stored integer
value outside a very small range is, because dynamic allocation is (relatively
speaking, in the context of integer operations) very costly even with a
(relatively speaking, in the context of general dynamic allocation) very
efficient small-objects allocator - here talking order(s) of magnitude.

A typical scheme for representing dynamically typed objects goes like, in C++,

enum TypeId { int_type_id, dyn_object_type_id };

struct Object
{
    int type_id;
    union
    {
        void* p;
        int i;
        // Perhaps other special-cased types' values in this union.
    };
};

This would then be the memory layout of what's regarded as a variable at the
script language level.

Then getting the integer value reduces to

int intValueOf( Object const& o )
{
    if( o.type_id != int_type_id ) { throw TypeError(); }
    return o.i;
}

If on the other hand int (and perhaps floating point type, whatever) isn't
special-cased, then it goes like

int intValueOf( Object const& o )
{
    if( o.type_id != int_type_id ) { throw TypeError(); }
    return static_cast<IntType*>( o.p )->value;  // Extra indirection
}

and depending on where the basic type id is stored it may be more extra
indirection, and worse, creating that value then involves a dynamic allocation.


Cheers & hth.

- Alf

Hrvoje Niksic

Nov 6, 2009, 12:22:40 PM
"Alf P. Steinbach" <al...@start.no> writes:

> But wow. That's pretty hare-brained: dynamic allocation for every
> stored value outside the cache range, needless extra indirection for
> every operation.
>
> Even Microsoft COM managed to get this right.
>
> On the positive side, except that it would probably break every C
> module (I don't know), in consultant speak that's definitely a
> potential for improvement. :-p

Tagged integers have been tried, shown not really worth it, and
ultimately rejected by the BDFL:

http://mail.python.org/pipermail/python-dev/2004-July/thread.html#46139

Alf P. Steinbach

Nov 6, 2009, 1:45:02 PM
* Hrvoje Niksic:

Yah, as I suspected. I looked at the first few postings in that thread, and it
seems an inefficient, baroque implementation was created and tested, realizing
no more than a 50% speedup in a test that didn't particularly exercise its
savings. Weighing against it, as mentioned in the thread and in the quoted
material above, is that it breaks lots of existing C code.

Speedup would likely be more realistic with normal implementation (not fiddling
with bit-fields and stuff) not to mention when removing other inefficiencies
that likely dwarf and hide the low-level performance increase, but still I agree
wholeheartedly with those who argue compatibility, not breaking code.

As long as it Works, don't fix it... ;-)


Cheers, (still amazed, though)

- Alf

Carl Banks

Nov 6, 2009, 2:20:40 PM
On Nov 6, 9:28 am, "Alf P. Steinbach" <al...@start.no> wrote:
> * Rami Chowdhury:
>
> > On Fri, 06 Nov 2009 08:54:53 -0800, Alf P. Steinbach <al...@start.no>
> > wrote:
>
> >> But wow. That's pretty hare-brained: dynamic allocation for every
> >> stored value outside the cache range, needless extra indirection for
> >> every operation.
>
> > Perhaps I'm not understanding this thread at all but how is dynamic
> > allocation hare-brained, and what's the 'needless extra indirection'?
>
> Dynamic allocation isn't hare-brained, but doing it for every stored integer
> value outside a very small range is, because dynamic allocation is (relatively
> speaking, in the context of integer operations) very costly even with a
> (relatively speaking, in the context of general dynamic allocation) very
> efficient small-objects allocator - here talking order(s) of magnitude.


Python made a design trade-off, it chose a simpler implementation and
uniform object semantic behavior, at a cost of speed. C# made a
different trade-off, choosing a more complex implementation, a
language with two starkly different object semantic behaviors, so as
to allow better performance.

You don't have to like the decision Python made, but I don't think
it's fair to call a deliberate design trade-off hare-brained.


Carl Banks

Rami Chowdhury

Nov 6, 2009, 2:28:15 PM
to Alf P. Steinbach, pytho...@python.org
On Fri, 06 Nov 2009 09:28:08 -0800, Alf P. Steinbach <al...@start.no>
wrote:

> * Rami Chowdhury:
>> On Fri, 06 Nov 2009 08:54:53 -0800, Alf P. Steinbach <al...@start.no>
>> wrote:
>>
>>> But wow. That's pretty hare-brained: dynamic allocation for every
>>> stored value outside the cache range, needless extra indirection for
>>> every operation.
>>>
>> Perhaps I'm not understanding this thread at all but how is dynamic
>> allocation hare-brained, and what's the 'needless extra indirection'?
>
> Dynamic allocation isn't hare-brained, but doing it for every stored
> integer value outside a very small range is, because dynamic allocation
> is (relatively speaking, in the context of integer operations) very
> costly even with a (relatively speaking, in the context of general
> dynamic allocation) very efficient small-objects allocator - here
> talking order(s) of magnitude.

Well, sure, it may seem that way. But how large a cache would you want to
preallocate? I can't see the average Python program needing to use the
integers from -10000 to 10000, for instance. In my (admittedly limited)
experience Python programs typically deal with rather more complex objects
than plain integers.

> int intValueOf( Object const& o )
> {
>     if( o.type_id != int_type_id ) { throw TypeError(); }
>     return static_cast<IntType*>( o.p )->value;  // Extra indirection
> }

If a large cache were created and maintained, would it not be equally
indirect to check for the presence of a value in the cache, and return
that value if it's present?

> creating that value then involves a dynamic allocation.

Creating which value, sorry -- the type object?

Alf P. Steinbach

Nov 6, 2009, 2:41:41 PM
* Carl Banks:

> On Nov 6, 9:28 am, "Alf P. Steinbach" <al...@start.no> wrote:
>> * Rami Chowdhury:
>>
>>> On Fri, 06 Nov 2009 08:54:53 -0800, Alf P. Steinbach <al...@start.no>
>>> wrote:
>>>> But wow. That's pretty hare-brained: dynamic allocation for every
>>>> stored value outside the cache range, needless extra indirection for
>>>> every operation.
>>> Perhaps I'm not understanding this thread at all but how is dynamic
>>> allocation hare-brained, and what's the 'needless extra indirection'?
>> Dynamic allocation isn't hare-brained, but doing it for every stored integer
>> value outside a very small range is, because dynamic allocation is (relatively
>> speaking, in the context of integer operations) very costly even with a
>> (relatively speaking, in the context of general dynamic allocation) very
>> efficient small-objects allocator - here talking order(s) of magnitude.
>
>
> Python made a design trade-off, it chose a simpler implementation

Note that the object implementation's complexity doesn't have to affect any
other code, since it's trivial to provide abstract accessors (even macros);
i.e., this isn't part of a trade-off unless the original developer(s) had
limited resources -- and if so then it wasn't a trade-off at the language
design level but a trade-off of getting things done then and there.


> and uniform object semantic behavior,

Also note that the script language level semantics of objects is /unaffected/ by
the implementation, except for speed, i.e., this isn't part of a trade-off
either. ;-)


> at a cost of speed.

In summary, the trade-off, if any, couldn't as I see it be what you describe,
but there could have been a different kind of getting-it-done trade-off.

It is usually better with Something Usable than waiting forever (or too long)
for the Perfect... ;-)

Or, it could be that things just evolved, constrained by frozen earlier
decisions. That's the main reason for the many quirks in C++. Not unlikely that
it's also that way for Python.


> C# made a
> different trade-off, choosing a more complex implementation, a
> language with two starkly different object semantic behaviors, so as
> to allow better performance.

Don't know about the implementation of C#, but whatever it is, if it's bad in
some respect then that has nothing to do with Python.


> You don't have to like the decision Python made, but I don't think
> it's fair to call a deliberate design trade-off hare-brained.

OK. :-)


Cheers,

- Alf

Alf P. Steinbach

Nov 6, 2009, 2:50:33 PM
* Rami Chowdhury:
> On Fri, 06 Nov 2009 09:28:08 -0800, Alf P. Steinbach <al...@start.no>
> wrote:
>
>> * Rami Chowdhury:
>>> On Fri, 06 Nov 2009 08:54:53 -0800, Alf P. Steinbach <al...@start.no>
>>> wrote:
>>>
>>>> But wow. That's pretty hare-brained: dynamic allocation for every
>>>> stored value outside the cache range, needless extra indirection for
>>>> every operation.
>>>>
>>> Perhaps I'm not understanding this thread at all but how is dynamic
>>> allocation hare-brained, and what's the 'needless extra indirection'?
>>
>> Dynamic allocation isn't hare-brained, but doing it for every stored
>> integer value outside a very small range is, because dynamic
>> allocation is (relatively speaking, in the context of integer
>> operations) very costly even with a (relatively speaking, in the
>> context of general dynamic allocation) very efficient small-objects
>> allocator - here talking order(s) of magnitude.
>
> Well, sure, it may seem that way. But how large a cache would you want
> to preallocate? I can't see the average Python program needing to use
> the integers from -10000 to 10000, for instance. In my (admittedly
> limited) experience Python programs typically deal with rather more
> complex objects than plain integers.

Uhm, you've misunderstood or failed to understand something basic, but what? The
CPython implementation uses a cache to alleviate problems with performance. A
tagged scheme (the usual elsewhere, e.g. Windows' Variant) doesn't need any
cache and can't benefit from such a cache, since then an integer's value is
directly available in any variable that logically holds an int. In short, a
cache for integer values is meaningless for the tagged scheme.


>> int intValueOf( Object const& o )
>> {
>>     if( o.type_id != int_type_id ) { throw TypeError(); }
>>     return static_cast<IntType*>( o.p )->value;  // Extra indirection
>> }
>
> If a large cache were created and maintained, would it not be equally
> indirect to check for the presence of a value in the cache, and return
> that value if it's present?

Again, that's meaningless. See above.


>> creating that value then involves a dynamic allocation.
>
> Creating which value, sorry -- the type object?

Well it's an out-of-context quote, but t'was about creating the value object
that a variable contains a pointer to with the current CPython implementation.

I'm sure that more information about tagged variant schemes is available on the
net.

E.g. Wikipedia.

Mel

Nov 6, 2009, 3:12:43 PM
Alf P. Steinbach wrote:
> Note that the object implementation's complexity doesn't have to affect to
> any other code since it's trivial to provide abstract accessors (even
> macros), i.e., this isn't part of a trade-off except if the original
> developer(s) had limited
> resources -- and if so then it wasn't a trade-off at the language design
> level but a trade-off of getting things done then and there.

But remember what got us in here: your belief (which followed from your
assumptions) that computing `is` required testing the object types. You
might optimize out the "extra indirection" to get an object's value, but
you'd need the "extra indirection" anyway to find out what type it was
before you could use it.

Mel.


Rami Chowdhury

Nov 6, 2009, 3:19:14 PM
to Alf P. Steinbach, pytho...@python.org
On Fri, 06 Nov 2009 11:50:33 -0800, Alf P. Steinbach <al...@start.no>
wrote:

Oh, I see, you were referring to a tagging scheme as an alternative. Sorry
for the misunderstanding.

>
> Well it's an out-of-context quote, but t'was about creating the value
> object that a variable contains a pointer to with the current CPython
> implementation.
>

Again, perhaps I'm just misunderstanding what you're saying, but as I
understand it, in CPython if you're looking for the value of a
PyIntObject, that's stored right there in the structure, so no value
object needs to be created...

Alf P. Steinbach

Nov 6, 2009, 3:49:41 PM
* Mel:

> Alf P. Steinbach wrote:
>> Note that the object implementation's complexity doesn't have to affect to
>> any other code since it's trivial to provide abstract accessors (even
>> macros), i.e., this isn't part of a trade-off except if the original
>> developer(s) had limited
>> resources -- and if so then it wasn't a trade-off at the language design
>> level but a trade-off of getting things done then and there.
>
> But remember what got us in here: your belief (which followed from your
> assumptions) that computing `is` required testing the object types.

Yes, I couldn't believe what I've now been hearing. Uh, reading. :-)


> You
> might optimize out the "extra indirection" to get an object's value, but
> you'd need the "extra indirection" anyway to find out what type it was
> before you could use it.

No, that type checking is limited (it just checks whether the type is special
cased), doesn't involve indirection, and is there anyway except for 'is'. It can
be moved around but it's there, or something more costly is there. 'is' is about
the only operation you /can/ do without checking the type, but I don't see the
point in optimizing 'is' at cost of all other operations on basic types.

Carl Banks

Nov 6, 2009, 3:53:10 PM
On Nov 6, 11:41 am, "Alf P. Steinbach" <al...@start.no> wrote:
> Note that the object implementation's complexity doesn't have to affect to any
> other code since it's trivial to provide abstract accessors (even macros), i.e.,
> this isn't part of a trade-off except if the original developer(s) had limited
> resources  --  and if so then it wasn't a trade-off at the language design level
> but a trade-off of getting things done then and there.

I totally disagree with this; it would be like squaring the
implementation complexity. It is far from "trivial" as you claim.
Even if it were just a matter of accessor macros (and it isn't) they
don't write themselves, especially when you focused on speed, so
that's a non-trivial complexity increase already. But you besides
writing code you now have reading code (which is now cluttered with
all kinds of ugly accessor macros, as if the Python API wasn't ugly
enough), debugging code, maintaining code, understanding semantics and
nuances, handling all the extra corner cases. To say it's trivial is
absurd.


> >  C# made a
> > different trade-off, choosing a more complex implementation, a
> > language with two starkly different object semantic behaviors, so as
> > to allow better performance.
>
> Don't know about the implementation of C#, but whatever it is, if it's bad in
> some respect then that has nothing to do with Python.

C# is a prototypical example of a language that does what you were
suggesting (also it draws upon frameworks like COM, which you
mentioned) so it is a basis of comparison of the benefits versus
drawbacks of the two approaches.


Carl Banks

Stefan Behnel

Nov 7, 2009, 2:20:20 AM
mk, 06.11.2009 15:32:

> Stefan Behnel wrote:
>> class Test(object):
>>     def __eq__(self, other):
>>         return other == None
>>
>> print Test() == None, Test() is None
>
> Err, I don't want to sound daft, but what is wrong in this example? It
> should work as expected:
>
> >>> class Test(object):
> ...     def __eq__(self, other):
> ...         return other == None
> ...
> >>> Test() is None
> False
> >>> Test() == None
> True

Yes, and it shows you that things can compare equal to None without being None.


> Or perhaps your example was supposed to show that I should test for
> identity with None, not for value with None?

Instead of "value" you mean "equality" here, I suppose. While there are
certain rare use cases where evaluating non-None objects as equal to None
makes sense, in normal use, you almost always want to know if a value is
exactly None, not just something that happens to return True when
calculating its equality to None, be it because of a programmer's conscious
consideration or a buggy implementation.

Stefan

Steven D'Aprano

Nov 7, 2009, 10:05:26 AM
On Fri, 06 Nov 2009 16:51:18 +0100, Marco Mariani wrote:

> Using "x is y" with integers
> makes no sense and has no guaranteed behaviour AFAIK

Of course it makes sense. `x is y` means *exactly the same thing* for
ints as it does with any other object: it tests for object identity.
That's all it does, and it does it perfectly.

Python makes no promise whether x = 3; y = 3 will use the same object for
both x and y or not. That's an implementation detail. That's not a
problem with `is`, it is a problem with developers who make unjustified
assumptions.
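The distinction is clearest with mutable objects, where identity and equality can disagree:

```python
a = [1, 2, 3]
b = a          # second name for the same object
c = [1, 2, 3]  # distinct object with equal contents

print(a is b, a == b)  # True True   -- same object, so also equal
print(a is c, a == c)  # False True  -- equal contents, different objects
```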


--
Steven

Terry Reedy

Nov 7, 2009, 3:31:02 PM
to pytho...@python.org

Which is to say, it normally makes no sense to write 'm is n' for m, n ints.

The *exception* is when one is exploring implementation details, either
to discover them or to test that they are as intended. So, last I
looked, the test suite for ints makes such tests. If the implementation
changes, the test should change also.

The problem comes when newbies use 'is' without realizing that they are
doing black-box exploration of otherwise irrelevant internals.
(White-box exploration would be reading the code, which makes it plain
what is going on ;-).

Terry Jan Reedy


sturlamolden

Nov 7, 2009, 5:22:28 PM
On 6 Nov, 14:35, "Alf P. Steinbach" <al...@start.no> wrote:

> As I understand it, 'is' will always work and will always be efficient (it just
> checks the variable's type), while '==' can depend on the implementation of
> equality checking for the other operand's class.

'==' checks for logical equality. 'is' checks for object identity.

None is a singleton of type NoneType. Since None evaluates to True
only when compared against itself, it is safe to use both operators.

sturlamolden

Nov 7, 2009, 5:27:05 PM
On 6 Nov, 18:28, "Alf P. Steinbach" <al...@start.no> wrote:

> Dynamic allocation isn't hare-brained, but doing it for every stored integer
> value outside a very small range is, because dynamic allocation is (relatively
> speaking, in the context of integer operations) very costly even with a
> (relatively speaking, in the context of general dynamic allocation) very
> efficient small-objects allocator - here talking order(s) of magnitude.

When it matters, we use NumPy and/or Cython.

sturlamolden

Nov 7, 2009, 5:54:34 PM
On 6 Nov, 17:54, "Alf P. Steinbach" <al...@start.no> wrote:

> But wow. That's pretty hare-brained: dynamic allocation for every stored value
> outside the cache range, needless extra indirection for every operation.

First, integers are not used the same way in Python as they are in
C++. E.g. you typically don't iterate over them in a for loop, but
rather iterate on the container itself. Second, if you need an array
of integers or floats, that is usually not done with a list: you would
use numpy.ndarray or array.array, and values are stored compactly.

A Python list is a list, it is not an array. If you were to put
integers in dynamic data structures in other languages (Java, C++),
you would use dynamic allocation as well. Yes a list is implemented as
an array of pointers, amortized to O(1) for appends, but that is an
implementation detail.
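The footprint difference is visible with the stdlib alone: `array.array` stores raw machine-level ints contiguously, while a list holds a pointer to a boxed int object per slot. A small sketch (exact sizes vary by platform):

```python
import array

n = 1000
packed = array.array('i', range(n))  # raw C ints, contiguous storage

# Total payload is n * itemsize bytes -- no per-element object headers,
# unlike a list, which stores a pointer to a full int object per slot.
print(packed.itemsize)               # bytes per element, typically 4
print(packed.buffer_info()[1])       # number of machine-level slots: 1000
```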

Python is not the only language that works like this. There are also
MATLAB and Lisp. I know you have a strong background in C++, but when
you are using Python you must unlearn that way of thinking.

Finally: if none of these helps, we can always resort to Cython.

In 99% of cases where integers are bottlenecks in Python, it is
indicative of bad style. We very often see this from people coming
from a C++ or Java background, and subsequent claims that "Python is
slow". Python is not an untyped Java. If you use it as such, it will
hurt. Languages like Python, Perl, Common Lisp, and MATLAB require a
different mindset from the programmer.

Steven D'Aprano

Nov 7, 2009, 6:09:58 PM
On Sat, 07 Nov 2009 14:22:28 -0800, sturlamolden wrote:

> On 6 Nov, 14:35, "Alf P. Steinbach" <al...@start.no> wrote:
>
>> As I understand it, 'is' will always work and will always be efficient
>> (it just checks the variable's type), while '==' can depend on the
>> implementation of equality checking for the other operand's class.
>
> '==' checks for logical equality. 'is' checks for object identity.

So far so good, although technically == merely calls __eq__, which can be
overridden to do (nearly) anything you like:

>>> class Funny(object):
...     def __eq__(self, other):
...         return self.payload + other
...
>>> f = Funny()
>>> f.payload = 5
>>> f == 10
15


> None is a singleton of type NoneType. Since None evaluates to True only
> when compared against itself,

That's wrong. None never evaluates to True, it always evaluates as None,
in the same way that 42 evaluates as 42 and [1,2,3] evaluates as [1,2,3].
Python literals evaluate as themselves, always.

Perhaps you mean that *comparisons* of None evaluate to True only if both
operands are None. That's incorrect too:

>>> None > None
False

You have to specify the comparison. It would be a pretty strange language
if both None==None and None!=None returned True.

> it is safe to use both operators.

Only if you want unexpected results if somebody passes the wrong sort of
object to your code.


>>> class NoneProxy:
...     def __eq__(self, other):
...         if other is None: return True
...         return False
...
>>> o = NoneProxy()
>>> o is None
False
>>> o == None
True

You should use == *only* if you want to test for objects which compare
equal to None, *whatever that object may be*, and 'is' if you want to
test for None itself.

--
Steven

Hrvoje Niksic

unread,
Nov 7, 2009, 8:25:42 PM11/7/09
to
"Alf P. Steinbach" <al...@start.no> writes:

> Speedup would likely be more realistic with normal implementation (not
> fiddling with bit-fields and stuff)

I'm not sure I understand this. How would you implement tagged integers
without encoding type information in bits of the pointer value?

Alf P. Steinbach

unread,
Nov 8, 2009, 1:27:27 AM11/8/09
to
* Hrvoje Niksic:

A normal tag field, as illustrated in code earlier in the thread.
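For concreteness, a minimal Python sketch of what such a "normal tag field" representation might look like -- illustrative only, not CPython code and not the code posted earlier in the thread:

```python
# Sketch of a tag-field value slot: every slot carries an explicit type tag
# next to its payload, so small ints need no heap-allocated object. All
# names here are illustrative, not taken from any real implementation.
from dataclasses import dataclass
from typing import Any

TAG_INT = 0      # payload is the integer itself (unboxed)
TAG_OBJECT = 1   # payload is a reference to a heap object

@dataclass
class Slot:
    tag: int
    payload: Any

def make_int(n: int) -> Slot:
    return Slot(TAG_INT, n)

def add(a: Slot, b: Slot) -> Slot:
    # Fast path: both operands are unboxed ints, no heap traffic at all.
    if a.tag == TAG_INT and b.tag == TAG_INT:
        return Slot(TAG_INT, a.payload + b.payload)
    raise TypeError("fall back to generic object dispatch")
```

The cost Hrvoje points out is visible here: each slot is now two words (tag plus payload) instead of one pointer.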

Hrvoje Niksic

unread,
Nov 8, 2009, 5:00:17 AM11/8/09
to

Ah, I see it now. That proposal effectively doubles the size of what is
now a PyObject *, meaning that lists, dicts, etc., would also double
their memory requirements, so it doesn't come without downsides. On the
other hand, tagged pointers have been used in various Lisp
implementations for decades, nothing really "baroque" (or inherently
slow) about them.

Alf P. Steinbach

unread,
Nov 8, 2009, 5:34:55 AM11/8/09
to
* Hrvoje Niksic:
> "Alf P. Steinbach" <al...@start.no> writes:
>
>> * Hrvoje Niksic:
>>> "Alf P. Steinbach" <al...@start.no> writes:
>>>
>>>> Speedup would likely be more realistic with normal implementation (not
>>>> fiddling with bit-fields and stuff)
>>> I'm not sure I understand this. How would you implement tagged integers
>>> without encoding type information in bits of the pointer value?
>> A normal tag field, as illustrated in code earlier in the thread.
>
> Ah, I see it now. That proposal effectively doubles the size of what is
> now a PyObject *, meaning that lists, dicts, etc., would also double
> their memory requirements, so it doesn't come without downsides.

Whether it increases memory usage depends on the data mix in the program's
execution.

For a program primarily handling objects of atomic types (like int) it saves
memory, since each value (generally) avoids a dynamically allocated object.

Bit-field fiddling may save a little more memory, and is nearly guaranteed to
save memory.

But memory usage isn't an issue except to the degree it affects the OS's virtual
memory manager.

Slowness is an issue -- except that keeping compatibility is IMO a more
important issue (don't fix, at cost, what works).


> On the
> other hand, tagged pointers have been used in various Lisp
> implementations for decades, nothing really "baroque" (or inherently
> slow) about them.

Unpacking of bit fields generally adds overhead. The bit fields need to be
unpacked for (e.g.) integer operations.
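As an illustration of that unpacking cost, a Python sketch of the usual low-bit tagging scheme (the constants are illustrative; a real implementation does this in C on machine words):

```python
# Sketch of low-bit pointer tagging: steal bit 0 of a machine word as the
# tag. Odd words hold a small int shifted left by one; even words would
# hold an aligned pointer. Constants are illustrative, not from CPython.
INT_TAG = 1
WORD_BITS = 64

def box_int(n: int) -> int:
    # Pack a small integer into a tagged word.
    return ((n << 1) | INT_TAG) & ((1 << WORD_BITS) - 1)

def is_int(word: int) -> bool:
    return word & INT_TAG == INT_TAG

def unbox_int(word: int) -> int:
    # The "unpacking" step: an extra shift (and sign fix-up) before any
    # arithmetic can happen -- this is the per-operation overhead.
    n = word >> 1
    if n >= 1 << (WORD_BITS - 2):   # restore the sign of negative ints
        n -= 1 << (WORD_BITS - 1)
    return n
```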

Lisp once ran on severely memory constrained machines.

Terry Reedy

unread,
Nov 8, 2009, 2:45:31 PM11/8/09
to pytho...@python.org

I believe the use of tagged pointers has been considered and so far
rejected by the CPython developers. And no one else that I know of has
developed a fork for that. It would seem more feasible with 64 bit
pointers where there seem to be spare bits. But CPython will have to
support 32 bit machines for several years.

Terry Jan Reedy

Rhodri James

unread,
Nov 9, 2009, 8:02:49 PM11/9/09
to pytho...@python.org
On Sun, 08 Nov 2009 19:45:31 -0000, Terry Reedy <tjr...@udel.edu> wrote:

> I believe the use of tagged pointers has been considered and so far
> rejected by the CPython developers. And no one else that I know of has
> developed a fork for that. It would seem more feasible with 64 bit
> pointers where there seem to be spare bits. But CPython will have to
> support 32 bit machines for several years.

I've seen that mistake made twice (IBM 370 architecture (probably 360 too,
I'm too young to have used it) and ARM2/ARM3). I'd rather not see it a
third time, thank you.

--
Rhodri James *-* Wildebeest Herder to the Masses

Grant Edwards

unread,
Nov 10, 2009, 10:46:10 AM11/10/09
to

MacOS applications made the same mistake on the 68K. They
reserved the high-end bits in a 32-bit pointer and used them to
contain meta-information. After all, those bits were "extra" --
nobody could ever hope to actually address more than 16MB of
memory, right? Heck, those address lines weren't even brought
out of the CPU package.

Guess what happened?

It wasn't the decades-long global debacle that was the MS-DOS
memory model, but it did cause problems when CPUs came out that
implemented those address lines and RAM became cheap enough
that people needed to use them.

--
Grant Edwards grante Yow! Either CONFESS now or
at we go to "PEOPLE'S COURT"!!
visi.com

Marco Mariani

unread,
Nov 10, 2009, 10:56:06 AM11/10/09
to
Grant Edwards wrote:

> MacOS applications made the same mistake on the 68K.

And an awful lot of the Amiga software, with the same 24/32 bit CPU.

I did it too, every pointer came with 8 free bits so why not use them?


> It wasn't the decades-long global debacle that was the MS-DOS
> memory model, but it did cause problems when CPUs came out that
> implemented those address lines and RAM became cheap enough
> that people needed to use them.

I suppose that's the reason many games didn't work on the 68020+

Grant Edwards

unread,
Nov 10, 2009, 11:46:38 AM11/10/09
to
On 2009-11-10, Marco Mariani <ma...@sferacarta.com> wrote:
> Grant Edwards wrote:
>
>> MacOS applications made the same mistake on the 68K.


> And and awful lot of the Amiga software, with the same 24/32
> bit CPU.
>
> I did it too, every pointer came with 8 free bits so why not
> use them?

TANSTAFB ;)

I should probably add that MacOS itself used the same trick
until system 7.

>> It wasn't the decades-long global debacle that was the MS-DOS
>> memory model, but it did cause problems when CPUs came out that
>> implemented those address lines and RAM became cheap enough
>> that people needed to use them.
>
> I suppose that's the reason many games didn't work on the 68020+

Probably. IIRC, it took a while for some vendors to come out
with "32-bit clean" versions of products.

http://en.wikipedia.org/wiki/Mac_OS_memory_management#32-bit_clean

--
Grant Edwards grante Yow! I know how to do
at SPECIAL EFFECTS!!
visi.com

Steven D'Aprano

unread,
Nov 10, 2009, 1:55:25 PM11/10/09
to
On Tue, 10 Nov 2009 15:46:10 +0000, Grant Edwards wrote:

> On 2009-11-10, Rhodri James <rho...@wildebst.demon.co.uk> wrote:
>> On Sun, 08 Nov 2009 19:45:31 -0000, Terry Reedy <tjr...@udel.edu>
>> wrote:
>>
>>> I believe the use of tagged pointers has been considered and so far
>>> rejected by the CPython developers. And no one else that I know of has
>>> developed a fork for that. It would seem more feasible with 64 bit
>>> pointers where there seem to be spare bits. But CPython will have to
>>> support 32 bit machines for several years.
>>
>> I've seen that mistake made twice (IBM 370 architecture (probably 360
>> too, I'm too young to have used it) and ARM2/ARM3). I'd rather not see
>> it a third time, thank you.
>
> MacOS applications made the same mistake on the 68K. They reserved the
> high-end bits in a 32-bit pointer and used them to contain
> meta-information.


Obviously that was their mistake. They should have used the low-end bits
for the metadata, instead of the more valuable high-end.


High-end-is-always-better-right?-ly y'rs,


--
Steven

Rhodri James

unread,
Nov 10, 2009, 6:18:20 PM11/10/09
to pytho...@python.org

Oh, ARM used the low bits too. After all, instructions were 4-byte
aligned, so the PC had all those bits going spare...

Vincent Manis

unread,
Nov 10, 2009, 7:05:01 PM11/10/09
to pytho...@python.org
On 2009-11-10, at 07:46, Grant Edwards wrote:
> MacOS applications made the same mistake on the 68K. They
> reserved the high-end bits
At the time the 32-bit Macs were about to come on the market, I saw an internal confidential document that estimated that over 80% of the applications that the investigators had looked at (including many from that company named after a fruit, whose head office is on Infinite Loop) were not 32-bit clean. This was in spite of the original edition of Inside Mac (the one that looked like a telephone book) that specifically said always to write 32-bit clean apps, as 32-bit machines were expected in the near future.

It's not quite as bad as the program I once looked at that was released in 1999 and was not Y2K compliant, but it's pretty close.

--v

Steven D'Aprano

unread,
Nov 10, 2009, 10:07:36 PM11/10/09
to
On Tue, 10 Nov 2009 16:05:01 -0800, Vincent Manis wrote:

> At the time the 32-bit Macs were about to come on the market, I saw an
> internal confidential document that estimated that at least over 80% of
> the applications that the investigators had looked at (including many
> from that company named after a fruit, whose head office is on Infinite
> Loop) were not 32-bit clean. This in spite of the original edition of
> Inside Mac (the one that looked like a telephone book) that specifically
> said always to write 32-bit clean apps, as 32-bit machines were expected
> in the near future.

That is incorrect. The original Inside Mac Volume 1 (published in 1985)
didn't look anything like a phone book. The original Macintosh's CPU (the
Motorola 68000) already used 32-bit addressing, but the high eight bits
were ignored since the CPU physically lacked the pins corresponding to
those bits.

In fact, in Inside Mac Vol II, Apple explicitly gives the format of
pointers: the low-order three bytes are the address, the high-order byte
is used for flags: bit 7 was the lock bit, bit 6 the purge bit and bit 5
the resource bit. The other five bits were unused.
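That layout can be sketched in a few lines of Python (masks follow the description above; the names are illustrative, not actual Toolbox code):

```python
# Sketch of the classic 24-bit Mac master-pointer layout: the address in
# the low three bytes, flag bits in the high byte. Bit numbering follows
# the description above (bit 7 of the high byte = bit 31 of the word).
ADDR_MASK    = 0x00FFFFFF
LOCK_BIT     = 1 << 31   # bit 7 of the high byte
PURGE_BIT    = 1 << 30   # bit 6
RESOURCE_BIT = 1 << 29   # bit 5

def address(ptr: int) -> int:
    # Strip the flags to recover the 24-bit address -- exactly the step
    # that breaks once real addresses grow past 24 bits.
    return ptr & ADDR_MASK

def is_locked(ptr: int) -> bool:
    return bool(ptr & LOCK_BIT)
```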

By all means criticize Apple for failing to foresee 32-bit apps, but
criticizing them for hypocrisy (in this matter) is unfair. By the time
they recognized the need for 32-bit clean applications, they were stuck
with a lot of legacy code that were not clean. Including code burned into
ROMs.


--
Steven

Grant Edwards

unread,
Nov 10, 2009, 10:39:54 PM11/10/09
to
On 2009-11-11, Steven D'Aprano <ste...@REMOVE.THIS.cybersource.com.au> wrote:

> By all means criticize Apple for failing to foresee 32-bit
> apps, but criticizing them for hypocrisy (in this matter) is
> unfair. By the time they recognized the need for 32-bit clean
> applications, they were stuck with a lot of legacy code that
> were not clean. Including code burned into ROMs.

They did manage to climb out of the hole they had dug and fix
things up -- something Microsoft has yet to do after 25 years.

Maybe it's finally going to be different this time around with
Windows 7...

--
Grant

Vincent Manis

unread,
Nov 11, 2009, 1:07:43 AM11/11/09
to pytho...@python.org
On 2009-11-10, at 19:07, Steven D'Aprano wrote:

> On Tue, 10 Nov 2009 16:05:01 -0800, Vincent Manis wrote:
> That is incorrect. The original Inside Mac Volume 1 (published in 1985)
> didn't look anything like a phone book. The original Macintosh's CPU (the
> Motorola 68000) already used 32-bit addressing, but the high eight pins
> were ignored since the CPU physically lacked the pins corresponding to
> those bits.
>
> In fact, in Inside Mac Vol II, Apple explicitly gives the format of
> pointers: the low-order three bytes are the address, the high-order byte
> is used for flags: bit 7 was the lock bit, bit 6 the purge bit and bit 5
> the resource bit. The other five bits were unused.

You are correct. On thinking about it further, my source was some kind of internal developer seminar I attended round about 1985 or so, where an Apple person said `don't use the high order bits, we might someday produce machines that use all 32 address bits', and then winked at us.

You are also correct (of course) about the original `Inside Mac', my copy was indeed 2 volumes in looseleaf binders; the phonebook came later.

> By all means criticize Apple for failing to foresee 32-bit apps, but
> criticizing them for hypocrisy (in this matter) is unfair. By the time
> they recognized the need for 32-bit clean applications, they were stuck
> with a lot of legacy code that were not clean. Including code burned into
> ROMs.

That's my point. I first heard about Moore's Law in 1974 from a talk given by Alan Kay. At about the same time, Gordon Bell had concluded, independently, that one needs an extra address bit every 18 months (he was looking at core memory, so the cost factors were somewhat different). All of this should have suggested that relying on any `reserved' bits is always a bad idea.

I am most definitely not faulting Apple for hypocrisy, just saying that programmers sometimes assume that just because something works on one machine, it will work forevermore. And that's unwise.

-- v

Vincent Manis

unread,
Nov 11, 2009, 1:14:14 AM11/11/09
to Vincent Manis, pytho...@python.org
On 2009-11-10, at 22:07, Vincent Manis wrote:

> On 2009-11-10, at 19:07, Steven D'Aprano wrote:

>> In fact, in Inside Mac Vol II, Apple explicitly gives the format of
>> pointers: the low-order three bytes are the address, the high-order byte
>> is used for flags: bit 7 was the lock bit, bit 6 the purge bit and bit 5
>> the resource bit. The other five bits were unused.

I must have inadvertently deleted a paragraph in the response I just posted. Please add: The pointer format would have caused me to write macros or the like (that was still in the days when Apple liked Pascal) to hide the bit representation of pointers.

-- v

greg

unread,
Nov 11, 2009, 4:32:30 AM11/11/09
to
Vincent Manis wrote:

> That's my point. I first heard about Moore's Law in 1974 from a talk given
> by Alan Kay. At about the same time, Gordon Bell had concluded, independently,
> that one needs extra address bit every 18 months

Hmmm. At that rate, we'll use up the extra 32 bits in our
64 bit pointers in another 48 years. So 128-bit machines
ought to be making an appearance around about 2057, and
then we'll be all set until 2153 -- if we're still using
anything as quaintly old-fashioned as binary memory
addresses by then...
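The arithmetic above checks out, taking the one-bit-per-18-months figure as given:

```python
# Sanity-checking the dates above, assuming (as the post does) that
# address-space demand grows by one bit every 18 months.
BIT_PERIOD_YEARS = 1.5

def years_to_exhaust(spare_bits):
    """Years until the given number of spare address bits is used up."""
    return spare_bits * BIT_PERIOD_YEARS

print(years_to_exhaust(32))         # spare bits in a 64-bit pointer -> 48.0
print(2009 + years_to_exhaust(32))  # 128-bit machines appear -> 2057.0
print(2057 + years_to_exhaust(64))  # 128 bits used up -> 2153.0
```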

--
Greg
