[Python-Dev] ctypes: is it intentional that id() is the only way to get the address of an object?


Steven D'Aprano

Jan 17, 2019, 5:28:25 AM
to pytho...@python.org
Disclaimer: I'm not a ctypes expert, so I might have this completely
wrong. If so, I apologise for the noise.

The id() function is documented as returning an abstract ID number. In
CPython, that happens to have been implemented as the address of the
object.

I understand that the only way to pass the address of an object to
ctypes is to use that id. Is that intentional?

As I see it, there is a conflict between two facts:

- that id() returns a memory address is an implementation detail; as
such users should not rely on it, as the implementation could (in
principle) change without notice;

- but users using ctypes have no choice but to rely on id() returning
the object memory address, as if it were an official part of the API.

Implementations like PyPy which emulate ctypes, while objects don't have
fixed memory locations, will surely have a problem here. I don't know
how PyPy solves this.

Have I misunderstood something here?



--
Steve
_______________________________________________
Python-Dev mailing list
Pytho...@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Antoine Pitrou

Jan 17, 2019, 5:39:29 AM
to pytho...@python.org
On Thu, 17 Jan 2019 21:26:06 +1100
Steven D'Aprano <st...@pearwood.info> wrote:
> Disclaimer: I'm not a ctypes expert, so I might have this completely
> wrong. If so, I apologise for the noise.
>
> The id() function is documented as returning an abstract ID number. In
> CPython, that happens to have been implemented as the address of the
> object.
>
> I understand that the only way to pass the address of an object to
> ctypes is to use that id. Is that intentional?

Can you explain in detail what you're doing?
If you're calling a C API taking a PyObject*, it seems like you should
be using ctypes.py_object as argument type specifier. Various examples
can be found with Google.

Regards

Antoine.
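
As a minimal sketch of the py_object approach suggested here (assuming CPython, and picking PyObject_Size purely as an illustrative C API call that this thread does not mention), declaring the argument type lets ctypes pass the PyObject* without going through id():

```python
import ctypes

# ctypes.pythonapi is a handle to the C API of the running CPython interpreter.
object_size = ctypes.pythonapi.PyObject_Size
object_size.argtypes = [ctypes.py_object]   # pass the argument as a PyObject*
object_size.restype = ctypes.c_ssize_t      # PyObject_Size returns Py_ssize_t

items = ["spam", "eggs", "ham"]
length = object_size(items)                 # no id() involved
print(length)
```

The call site never sees an address; ctypes takes care of converting the Python object into the pointer the C function expects.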

eryk sun

Jan 17, 2019, 8:53:02 AM
to pytho...@python.org
On 1/17/19, Steven D'Aprano <st...@pearwood.info> wrote:
>
> I understand that the only way to pass the address of an object to
> ctypes is to use that id. Is that intentional?

It's kind of dangerous to pass an object to C without an increment of
its reference count. The proper way is to use a simple pointer of type
"O" (object), which is already created for you as the "py_object"
type.

>>> import ctypes, sys
>>> ctypes.py_object._type_
'O'
>>> ctypes.py_object.__bases__
(<class '_ctypes._SimpleCData'>,)

It keeps a reference in the readonly _objects attribute. For example:

>>> b = bytearray(b'spam')
>>> sys.getrefcount(b)
2
>>> cb = ctypes.py_object(b)
>>> sys.getrefcount(b)
3
>>> cb._objects
bytearray(b'spam')
>>> del cb
>>> sys.getrefcount(b)
2

If you need the address without relying on id(), cast to a void pointer:

>>> ctypes.POINTER(ctypes.c_void_p)(cb)[0] == id(b)
True

Or instantiate a c_void_p from the py_object as a buffer:

>>> ctypes.c_void_p.from_buffer(cb).value == id(b)
True

Note that ctypes.cast() doesn't work in this case. It's implemented as
an FFI function that takes the object address as a void pointer. The
from_param method of c_void_p doesn't support py_object:

>>> ctypes.c_void_p.from_param(cb)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: wrong type
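
The from_buffer recipe above can be wrapped in a small helper. This is only a sketch for CPython; the name address_of is hypothetical and is not part of ctypes:

```python
import ctypes


def address_of(obj):
    """Hypothetical helper: return the PyObject* address of obj on CPython."""
    holder = ctypes.py_object(obj)   # holds its own reference to obj
    # Reinterpret the pointer stored in holder as a void pointer and read it.
    return ctypes.c_void_p.from_buffer(holder).value


data = bytearray(b"spam")
matches_id = address_of(data) == id(data)   # expected to hold on CPython only
print(matches_id)
```

The comparison with id() is there only to show that both spellings agree on CPython; code that needs the address can use the helper and leave id() for identity checks.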

Gregory P. Smith

Jan 17, 2019, 7:51:15 PM
to Steven D'Aprano, Python-Dev
I've heard that libraries using ctypes, cffi, or cython code of various sorts in the wild today do abuse the unfortunate side effect of CPython's implementation of id(). I don't have specific instances of this in mind, but I trust what I've heard: that it is happening.

id() should never be considered to be the PyObject*, inasmuch as code shouldn't assume it is running on top of a specific CPython implementation.
If there is a _need_ to get a pointer to a C struct handle referencing a CPython C API PyObject, we should make an explicit API for that rather than the id() hack.  That way code can be explicit about its need, and code that is just doing a funky form of identity tracking without using is and is not can continue using id() without triggering regressive behavior on VMs that don't have a CPython compatible PyObject under the hood by default.

[who uses id() anyways?]

-gps


Chris Angelico

Jan 17, 2019, 7:59:42 PM
to Python-Dev
On Fri, Jan 18, 2019 at 11:50 AM Gregory P. Smith <gr...@krypto.org> wrote:
>
> I've heard that libraries using ctypes, cffi, or cython code of various sorts in the real world wild today does abuse the unfortunate side effect of CPython's implementation of id(). I don't have specific instances of this in mind but trust what I've heard: that it is happening.
>
> id() should never be considered to be the PyObject*. In as much as code shouldn't assume it is running on top of a specific CPython implementation.
> If there is a _need_ to get a pointer to a C struct handle referencing a CPython C API PyObject, we should make an explicit API for that rather than the id() hack. That way code can be explicit about its need, and code that is just doing a funky form of identity tracking without using is and is not can continue using id() without triggering regressive behavior on VMs that don't have a CPython compatible PyObject under the hood by default.
>

I would be strongly in favour of ctypes gaining a "get address of
object" function, which happens (in current CPythons) to return the
same value as id() does, but is specifically tied to ctypes.

ChrisA

MRAB

Jan 17, 2019, 10:03:13 PM
to pytho...@python.org
On 2019-01-18 00:48, Gregory P. Smith wrote:
> I've heard that libraries using ctypes, cffi, or cython code of various
> sorts in the real world wild today does abuse the unfortunate side
> effect of CPython's implementation of id(). I don't have specific
> instances of this in mind but trust what I've heard: that it is happening.
>
> id() should never be considered to be the PyObject*.  In as much as code
> shouldn't assume it is running on top of a specific CPython implementation.
> If there is a _need_ to get a pointer to a C struct handle referencing a
> CPython C API PyObject, we should make an explicit API for that rather
> than the id() hack.  That way code can be explicit about its need, and
> code that is just doing a funky form of identity tracking without using
> is and is not can continue using id() without triggering regressive
> behavior on VMs that don't have a CPython compatible PyObject under the
> hood by default.
>
> [who uses id() anyways?]
>
I use it in some of my code.

If I want to cache some objects, I put them in a dict, using the id as
the key. If I wanted to locate an object in a cache and didn't have
id(), I'd have to do a linear search for it.



Nathaniel Smith

Jan 17, 2019, 10:21:45 PM
to Gregory P. Smith, Python-Dev
On Thu, Jan 17, 2019 at 4:51 PM Gregory P. Smith <gr...@krypto.org> wrote:
>
> I've heard that libraries using ctypes, cffi, or cython code of various sorts in the real world wild today does abuse the unfortunate side effect of CPython's implementation of id(). I don't have specific instances of this in mind but trust what I've heard: that it is happening.

IME it's reasonably common with ctypes, for cases where you need to do
some gross hack and there's no other option. Here's an example in
jinja2:

https://github.com/pallets/jinja/blob/9fe9520f2daa1df6079b188adba758d6e03d6af2/jinja2/debug.py#L350

I haven't seen it with cffi or cython. (cffi explicitly doesn't
provide any way to access the CPython C API, and in cython you can
just cast an object to a pointer.)

> id() should never be considered to be the PyObject*. In as much as code shouldn't assume it is running on top of a specific CPython implementation.
> If there is a _need_ to get a pointer to a C struct handle referencing a CPython C API PyObject, we should make an explicit API for that rather than the id() hack. That way code can be explicit about its need, and code that is just doing a funky form of identity tracking without using is and is not can continue using id() without triggering regressive behavior on VMs that don't have a CPython compatible PyObject under the hood by default.

Using id() like this is certainly offensive to our sensibilities, but
in practice I don't see how it causes much harm. If you are doing
*anything* with PyObject*, then you're tying yourself to
implementation details of CPython (and usually a specific version of
CPython). That's not great, but at that point relying on CPython's
implementation of id() is the least of your worries, and it tends to
be a self-correcting problem.

-n

--
Nathaniel J. Smith -- https://vorpus.org

Steve Dower

Jan 18, 2019, 1:11:50 AM
to pytho...@python.org
For everyone who managed to reply *hours* after Eryk Sun posted the
correct answer and still get it wrong, here it is again in full.

As a bonus, here's a link to the place where this answer appears in the
documentation:
https://docs.python.org/3/library/ctypes.html#ctypes.py_object

Cheers,
Steve

Steve Dower

Jan 18, 2019, 1:20:27 AM
to pytho...@python.org
I feel like I should clarify - not everyone who posted got it wrong, and
I understand there's a side discussion among those who are also
interested/participants in
https://discuss.python.org/t/demoting-the-is-operator-to-avoid-an-identity-crisis/86/
- but there was no acknowledgement of Eryk Sun's correct and useful
answer which I find very disappointing and a great way to discourage
contributions.

We can, and should, do better, at least by thanking the person for their
response before running down a barely related side track.

Nathaniel Smith

Jan 18, 2019, 3:20:50 AM
to Steve Dower, Python Dev
On Thu, Jan 17, 2019, 22:11 Steve Dower <steve...@python.org> wrote:
> For everyone who managed to reply *hours* after Eryk Sun posted the
> correct answer and still get it wrong, here it is again in full.
>
> As a bonus, here's a link to the place where this answer appears in the
> documentation:
> https://docs.python.org/3/library/ctypes.html#ctypes.py_object

Eryk's answer is actually much more useful than the documentation. I've read that documentation many times, but always decided not to use py_object because I couldn't figure out what it would actually do...

(I still probably won't use it because IME by the time I'm using ctypes and PyObject* together I usually need manual control over refcounts, but it's nice to know what it actually does.)

-n

Steven D'Aprano

Jan 18, 2019, 4:51:45 AM
to pytho...@python.org
On Thu, Jan 17, 2019 at 11:37:13AM +0100, Antoine Pitrou wrote:

I said:

> > The id() function is documented as returning an abstract ID number. In
> > CPython, that happens to have been implemented as the address of the
> > object.
> >
> > I understand that the only way to pass the address of an object to
> > ctypes is to use that id. Is that intentional?


Antoine:

> Can you explain in detail what you're doing?

Code-wise, I'm not doing anything with ctypes.

Language-wise, I'm trying to get a definitive answer of whether or not
id() returning the address of the object should be a guaranteed feature
or not.

Across the entire Python ecosystem, no it isn't, as Jython and
IronPython return consecutive integers. But should we consider it an
intentional part of the CPython API?

There are developers who insist that when it comes to CPython, id()
returning the object address is an intentional feature that they can and
do rely on, because (so I was told by one of them) using id() is
the only way to get the address of an object from pure-Python.

According to this claim, using id() to get the address for use in ctypes
is the correct and only way to do it, and this is a deliberate design
choice by the core devs rather than an accident of the implementation.
So long as you know you are using CPython, this is (so I was told)
completely safe.

In the grand scheme of things this may be a pretty minor issue. But I
suspect that it could be a pain point for implementations like PyPy that
support both objects that move and a ctypes emulation.



--
Steve

Paul Moore

Jan 18, 2019, 5:18:11 AM
to Steven D'Aprano, Python Dev
On Fri, 18 Jan 2019 at 09:52, Steven D'Aprano <st...@pearwood.info> wrote:
> Code-wise, I'm not doing anything with ctypes.
>
> Language-wise, I'm trying to get a definitive answer of whether or not
> id() returning the address of the object should be a guaranteed feature
> or not.
>
> Across the entire Python ecosystem, no it isn't, as Jython and
> IronPython return consecutive integers. But should we consider it an
> intentional part of the CPython API?
>
> There are developers who insist that when it comes to CPython, id()
> returning the object address is an intentional feature that they can and
> do rely on, because (so I was told by one of them) that using id() is
> the only way to get the address of an object from pure-Python.
>
> According to this claim, using id() to get the address for use in ctypes
> is the correct and only way to do it, and this is a deliberate design
> choice by the core devs rather than an accident of the implementation.
> So long as you know you are using CPython, this is (so I was told)
> completely safe.
>
> In the grand scheme of things this may be a pretty minor issue. But I
> suspect that it could be a pain point for implementations like PyPy that
> support both objects that move and a ctypes emulation.

As per Eryk Sun's reply, the "correct" way to get an object address is
by using ctypes.py_object.

Supporting py_object may be a pain point for other implementations
that emulate ctypes, but then again, so is supporting the whole
CPython C API (which is where the py_object type is needed, so it's
basically the same problem).

So to answer your question, I'd say that no, id() returning the object
address is not, and should not be, a guaranteed aspect of CPython, and
the motivating issue of ctypes is solved within ctypes itself by using
py_object.

Paul

Steven D'Aprano

Jan 18, 2019, 5:23:05 AM
to pytho...@python.org
On Thu, Jan 17, 2019 at 04:48:38PM -0800, Gregory P. Smith wrote:

> I've heard that libraries using ctypes, cffi, or cython code of various
> sorts in the real world wild today does abuse the unfortunate side effect
> of CPython's implementation of id(). I don't have specific instances of
> this in mind but trust what I've heard: that it is happening.

Indeed -- I've been told by one developer in no uncertain terms that
using id() in this fashion is the only way to get the address of an
object for use in ctypes. I don't know enough about ctypes to judge
whether that is correct or not.

The sample code I've been shown is this:

pointer_to_obj = id(obj)
from_deref = ctypes.cast(pointer_to_obj, ctypes.py_object).value
from_deref is obj # True


> id() should never be considered to be the PyObject*. In as much as code
> shouldn't assume it is running on top of a specific CPython implementation.
> If there is a _need_ to get a pointer to a C struct handle referencing a
> CPython C API PyObject, we should make an explicit API for that rather than
> the id() hack. That way code can be explicit about its need, and code that
> is just doing a funky form of identity tracking without using is and is not
> can continue using id() without triggering regressive behavior on VMs that
> don't have a CPython compatible PyObject under the hood by default.

+1 to all of this.



--
Steve

Nathaniel Smith

Jan 18, 2019, 5:33:24 AM
to Steven D'Aprano, Python Dev
On Fri, Jan 18, 2019 at 1:51 AM Steven D'Aprano <st...@pearwood.info> wrote:
> Across the entire Python ecosystem, no it isn't, as Jython and
> IronPython return consecutive integers. But should we consider it an
> intentional part of the CPython API?

It's always worked, there's substantial code in the wild that depends
on it, and AFAICT it doesn't cause any real harm, so to me it seems
like the only possible conclusion is that CPython will continue to
guarantee this.

For this argument I don't think it matters whether it was originally
intentional, or whether there's some other alternative that people
could use in theory.

-n

--
Nathaniel J. Smith -- https://vorpus.org

Steven D'Aprano

Jan 18, 2019, 5:53:24 AM
to pytho...@python.org
Thanks for the detailed answer. A further question below.


On Thu, Jan 17, 2019 at 07:50:51AM -0600, eryk sun wrote:
> On 1/17/19, Steven D'Aprano <st...@pearwood.info> wrote:
> >
> > I understand that the only way to pass the address of an object to
> > ctypes is to use that id. Is that intentional?
>
> It's kind of dangerous to pass an object to C without an increment of
> its reference count.

"Kind of dangerous?" How dangerous?

If I am reading this correctly, I think you are saying that using id()
in this way is never(?) correct.



--
Steve

Antoine Pitrou

Jan 18, 2019, 5:57:04 AM
to pytho...@python.org
On Fri, 18 Jan 2019 20:49:26 +1100
Steven D'Aprano <st...@pearwood.info> wrote:
>
> Language-wise, I'm trying to get a definitive answer of whether or not
> id() returning the address of the object should be a guaranteed feature
> or not.

For me, the definitive answer is "yes, it's a CPython feature".

However, it's obviously not a PyPy feature, and I'm not sure about other
implementations. Anything with an object model that can eliminate
in-memory objects in favour of in-register values (for example using
tagged pointers or type specialization + lifetime analysis) is obviously
not able to hold the promise that id() returns the /address/ of the
"object".

That doesn't mean the CPython feature has to live forever. We may want
to deprecate it at some point (though it's not obvious how to warn the
user: just because you're using id() doesn't mean you're interested in
the actual /address/, rather than some arbitrary unique id).

> According to this claim, using id() to get the address for use in ctypes
> is the correct and only way to do it

I don't know why you keep repeating that. It was already explained to
you that it's /not/ the correct and only way to get the address for use
in ctypes.

Regards

Antoine.

Antoine Pitrou

Jan 18, 2019, 5:59:32 AM
to pytho...@python.org
On Fri, 18 Jan 2019 00:18:17 -0800
Nathaniel Smith <n...@pobox.com> wrote:
> On Thu, Jan 17, 2019, 22:11 Steve Dower <steve...@python.org wrote:
>
> > For everyone who managed to reply *hours* after Eryk Sun posted the
> > correct answer and still get it wrong, here it is again in full.
> >
> > As a bonus, here's a link to the place where this answer appears in the
> > documentation:
> > https://docs.python.org/3/library/ctypes.html#ctypes.py_object
>
>
> Eryk's answer is actually much more useful than the documentation. I've
> read that documentation many times, but always decided not to use py_object
> because I couldn't figure out what it would actually do...

+1

Needless to say, this is an opportunity to improve the documentation ;-)

Regards

Antoine.

Antoine Pitrou

Jan 18, 2019, 6:04:25 AM
to pytho...@python.org
On Fri, 18 Jan 2019 03:00:54 +0000
MRAB <pyt...@mrabarnett.plus.com> wrote:
> On 2019-01-18 00:48, Gregory P. Smith wrote:
> > I've heard that libraries using ctypes, cffi, or cython code of various
> > sorts in the real world wild today does abuse the unfortunate side
> > effect of CPython's implementation of id(). I don't have specific
> > instances of this in mind but trust what I've heard: that it is happening.
> >
> > id() should never be considered to be the PyObject*.  In as much as code
> > shouldn't assume it is running on top of a specific CPython implementation.
> > If there is a _need_ to get a pointer to a C struct handle referencing a
> > CPython C API PyObject, we should make an explicit API for that rather
> > than the id() hack.  That way code can be explicit about its need, and
> > code that is just doing a funky form of identity tracking without using
> > is and is not can continue using id() without triggering regressive
> > behavior on VMs that don't have a CPython compatible PyObject under the
> > hood by default.
> >
> > [who uses id() anyways?]
> >
> I use it in some of my code.
>
> If I want to cache some objects, I put them in a dict, using the id as
> the key. If I wanted to locate an object in a cache and didn't have
> id(), I'd have to do a linear search for it.

Indeed. I've used it for the same purpose in the past (identity-dict).

Regards

Antoine.

Antoine Pitrou

Jan 18, 2019, 6:09:38 AM
to pytho...@python.org
On Thu, 17 Jan 2019 22:18:13 -0800
Steve Dower <steve...@python.org> wrote:
> I feel like I should clarify - not everyone who posted got it wrong, and
> I understand there's a side discussion among those who are also
> interested/participants in
> https://discuss.python.org/t/demoting-the-is-operator-to-avoid-an-identity-crisis/86/
> - but there was no acknowledgement of Eryk Sun's correct and useful
> answer which I find very disappointing and a great way to discourage
> contributions.
>
> We can, and should, do better, at least by thanking the person for their
> response before running down a barely related side track.

I can certainly thank Eryk for posting a much better answer than mine.

Regards

Antoine.

Steven D'Aprano

Jan 18, 2019, 6:13:50 AM
to pytho...@python.org
On Thu, Jan 17, 2019 at 10:09:36PM -0800, Steve Dower wrote:
> For everyone who managed to reply *hours* after Eryk Sun posted the
> correct answer and still get it wrong, here it is again in full.

Sorry, I'm confused by your response here. As far as I can see, nobody
except Eryk Sun gave any technical details about how to correctly pass
objects to ctypes, so I'm not sure what sense of "get it wrong" you
mean.

A couple of people offered the opinion that we ought to offer an
explicit ctypes API for getting the address of an object, decoupling
that functionality from id(). Do you mean "wrong" in the sense that such
an API would be unnecessary, given the existing solution Eryk Sun
quoted?



> As a bonus, here's a link to the place where this answer appears in the
> documentation:
> https://docs.python.org/3/library/ctypes.html#ctypes.py_object

Thanks for the link, that's useful.



--
Steve

Walter Dörwald

Jan 18, 2019, 7:14:22 AM
to Antoine Pitrou, pytho...@python.org

It's useful in all situations where you do topology-preserving
transformations, for example pickling (i.e. object serialization) or a
deep copy of some object structures.

In these cases you need a way to record and quickly detect whether
you've handled a specific object before. In Python we can do that with a
dictionary that has object ids as keys. Java provides IdentityHashMap
for that. JavaScript provides neither, so deep-copying objects in
JavaScript seems to be impossible.

> Regards
>
> Antoine.

Servus,
Walter
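
A rough sketch of the id-keyed memo technique described here, limited to nested lists for brevity. The function name copy_nested is made up for illustration; the standard library's copy.deepcopy relies on a similar memo dict keyed by id():

```python
def copy_nested(value, memo=None):
    """Copy nested lists while preserving sharing and cycles (illustrative)."""
    if memo is None:
        memo = {}
    if not isinstance(value, list):
        return value                  # leaves are reused, not copied
    key = id(value)
    if key in memo:
        return memo[key][1]           # already handled: reuse the same copy
    clone = []
    # Store the original next to its copy so the original stays alive and
    # its id cannot be recycled while the memo is in use.
    memo[key] = (value, clone)
    for item in value:
        clone.append(copy_nested(item, memo))
    return clone


shared = [1]
outer = [shared, shared]
outer.append(outer)                   # make the structure cyclic
copied = copy_nested(outer)
```

In the result, copied[0] and copied[1] are the same new list and copied[2] is copied itself, so the shape of the original structure is kept.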

David Mertz

Jan 18, 2019, 10:32:13 AM
to Antoine Pitrou, Python-Dev
On Fri, Jan 18, 2019, 5:55 AM Antoine Pitrou <soli...@pitrou.net> wrote:

> > id() returning the address of the object should be a guaranteed feature
>
> For me, the definitive answer is "yes, it's a CPython feature".
> That doesn't mean the CPython feature has to live forever.  We may want to deprecate it at some point

Whenever I've taught Python (quite a bit between writing, in person, and webinars), I have been very explicit in stating that id(obj) returns some unique number for each object, and mentioned that for MANY Python objects CPython uses an implementation convenience of using the memory address.

Every time I've explained it I've said not to rely on that implementation detail. It's not true for small integers, for example, even in CPython.

David Mertz

Jan 18, 2019, 10:50:08 AM
to Antoine Pitrou, Python-Dev
Oh, bracket my brain glitch on small integers.  Yes, they still give id() of memory address, they just get reused, which is different.  Nonetheless, I never teach id(obj) == ctypes.c_void_p.from_buffer(ctypes.py_object(obj)).value ... and not only because I only learned the latter spelling from eryk sun.
--
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.

Greg Ewing

Jan 18, 2019, 5:59:39 PM
to Python-Dev
Chris Angelico wrote:
> I would be strongly in favour of ctypes gaining a "get address of
> object" function, which happens (in current CPythons) to return the
> same value as id() does, but is specifically tied to ctypes.

Isn't this what the ctypes.py_object type is for?

Also, any code that does anything with the address of an object
other than just pass it around is going to depend heavily on
the Python implementation being used, so the idea of an
implementation-independent way to deal with object addresses
seems problematic.

--
Greg

Chris Angelico

Jan 18, 2019, 6:04:46 PM
to Python-Dev
On Sat, Jan 19, 2019 at 9:58 AM Greg Ewing <greg....@canterbury.ac.nz> wrote:
>
> Chris Angelico wrote:
> > I would be strongly in favour of ctypes gaining a "get address of
> > object" function, which happens (in current CPythons) to return the
> > same value as id() does, but is specifically tied to ctypes.
>
> Isn't this what the ctypes.py_object type is for?

I didn't know about it when I posted that (as, I suspect, others also
didn't), and as others have pointed out, this is a prime target for a
docs update. Scanning the docs as of today does not suggest a better
way to do things.

ChrisA

Greg Ewing

Jan 18, 2019, 6:08:28 PM
to pytho...@python.org
MRAB wrote:
> If I want to cache some objects, I put them in a dict, using the id as
> the key. If I wanted to locate an object in a cache and didn't have
> id(), I'd have to do a linear search for it.

That sounds dangerous. An id() is only valid as long as the object
it came from still exists, after which it can get re-used for a different
object. So when an object is flushed from your cache, you would have
to chase down all the places its id is being stored and eliminate them.

Are you sure you couldn't achieve the same thing more safely using
weak references?

--
Greg
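
For comparison, a small sketch of the weak-reference alternative raised here. The Node class and the registry name are illustrative only, and lists or dicts cannot be weakly referenced directly, so this assumes objects of an ordinary class:

```python
import gc
import weakref


class Node:
    """Placeholder class; instances support weak references."""


registry = weakref.WeakValueDictionary()

node = Node()
key = id(node)
registry[key] = node          # the registry does not keep node alive
found_while_alive = registry[key] is node

del node                      # drop the last strong reference
gc.collect()                  # for collectors that are not refcount-based
```

Once the object is gone its entry disappears from the registry, so a stale id cannot be mistaken for a different object that later reuses the same value.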

Tim Peters

Jan 18, 2019, 6:28:12 PM
to Greg Ewing, Python Dev
[MRAB]
>> If I want to cache some objects, I put them in a dict, using the id as
>> the key. If I wanted to locate an object in a cache and didn't have
>> id(), I'd have to do a linear search for it.

[Greg Ewing <greg....@canterbury.ac.nz>]
> That sounds dangerous. An id() is only valid as long as the object
> it came from still exists, after which it can get re-used for a different
> object.

The objects are the values in such a dict.

thedict[id(obj)] is obj

Therefore the objects can't become garbage before id(obj) is deleted
from the dict.

> So when an object is flushed from your cache, you would have
> to chase down all the places its id is being stored and eliminate them.

The dict itself keeps the objects alive.

> Are you sure you couldn't achieve the same thing more safely using
> weak references?

I can't say exactly what MRAB is doing. I've done things "like that"
for decades, though, and have happily almost never used weakrefs. I
wouldn't call my uses "caches", though - more like using dicts to
associate info with arbitrary objects (via using the object id as the
dict key), where the object implementations are out of my control and
don't properly support being used as dict keys.

This sometimes includes builtin mutable objects, like lists, or even
other dicts.

No such uses care about object addresses, though - just that id(obj)
returns a value usable as a dict key, unique among all reachable
objects at the time `id()` is called.
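
A minimal sketch of this pattern, assuming nothing beyond the description above. The class name IdentityDict is hypothetical; the point is that each object is stored next to its info, so its id stays valid for as long as the entry exists:

```python
class IdentityDict:
    """Hypothetical helper: map arbitrary (even unhashable) objects to info."""

    def __init__(self):
        self._entries = {}

    def __setitem__(self, obj, info):
        # Keep obj in the value so it stays alive while its id is a key.
        self._entries[id(obj)] = (obj, info)

    def __getitem__(self, obj):
        return self._entries[id(obj)][1]

    def __delitem__(self, obj):
        del self._entries[id(obj)]    # O(1) removal, no search needed

    def __contains__(self, obj):
        return id(obj) in self._entries

    def __len__(self):
        return len(self._entries)


first = [1, 2]
second = [1, 2]               # equal to first, but a distinct object
notes = IdentityDict()
notes[first] = "seen"
```

Lists cannot be ordinary dict keys, yet first can be looked up here, and second is treated as a different key even though the two lists compare equal. Nothing in this sketch depends on id() being a memory address.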

MRAB

Jan 18, 2019, 6:42:08 PM
to pytho...@python.org
On 2019-01-18 23:02, Greg Ewing wrote:
> MRAB wrote:
>> If I want to cache some objects, I put them in a dict, using the id as
>> the key. If I wanted to locate an object in a cache and didn't have
>> id(), I'd have to do a linear search for it.
>
> That sounds dangerous. An id() is only valid as long as the object
> it came from still exists, after which it can get re-used for a different
> object. So when an object is flushed from your cache, you would have
> to chase down all the places its id is being stored and eliminate them.
>
> Are you sure you couldn't achieve the same thing more safely using
> weak references?
>
I'm not storing the id anywhere else.

I could've used a list for the cache, but then when I wanted to remove
an object I'd have to search for it, O(n). Using a dict makes it O(1).

Greg Ewing

Jan 18, 2019, 7:30:19 PM
to Python Dev
Tim Peters wrote:

> The dict itself keeps the objects alive.

Yes, but the idea of a cache is that you're free to flush
things out of it to make room for something else without
breaking anything.

It sounds like MRAB is using ids as weak references,
without the assurance actual weak references give you
that they become invalidated when the referenced object
goes away.

> No such uses care about object addresses, though - just that id(obj)
> returns a value usable as a dict key, unique among all reachable
> objects at the time `id()` is called.

Yep. In hindsight it was probably a mistake for the docs
to talk about addresses in relation to id() -- it seems to
have given some people unrealistic expectations.

--
Greg

Greg Ewing

Jan 18, 2019, 7:37:25 PM
to pytho...@python.org
Steven D'Aprano wrote:

> The sample code I've been shown is this:
>
> pointer_to_obj = id(obj)
> from_deref = ctypes.cast(pointer_to_obj, ctypes.py_object).value
> from_deref is obj # True

There's no need to use id() or casting to create a ctypes.py_object
instance, you can just call it:

>>> obj = (1,2,3)
>>> obj
(1, 2, 3)
>>> p = ctypes.py_object(obj)
>>> p
py_object((1, 2, 3))
>>> p.value
(1, 2, 3)
>>> p.value is obj
True

--
Greg

MRAB

Jan 18, 2019, 9:01:42 PM
to pytho...@python.org
On 2019-01-19 00:28, Greg Ewing wrote:
> Tim Peters wrote:
>
>> The dict itself keeps the objects alive.
>
> Yes, but the idea of a cache is that you're free to flush
> things out of it to make room for something else without
> breaking anything.
>
> It sounds like MRAB is using ids as weak references,
> without the assurance actual weak references give you
> that they become invalidated when the referenced object
> goes away.
>
"Cache" was the wrong word for what it does. I'm not using the id as a
weak reference.

Sometimes I might want to store a collection of objects and their order
isn't important. I can add an object to the collection, or remove an
object from it.

If I used a list, adding would be quick, but removing would require
searching the list.

By putting them in a dict, keyed by the id, I can remove an object in O(1).
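
A minimal sketch of that pattern, assuming the stored objects are unhashable or compare equal by value (so a set would not do). The names `collection`, `add` and `remove` are hypothetical, not taken from any real code. Because the dict holds a strong reference to each value, the ids stay unique for as long as the entries exist:

```python
# Unordered collection keyed by identity; the dict keeps every member alive.
collection = {}

def add(obj):
    # id(obj) is only used as a unique dict key, never as an address.
    collection[id(obj)] = obj

def remove(obj):
    # O(1) removal, no search through a list.
    del collection[id(obj)]

first = [1, 2]
second = [1, 2]   # equal to `first`, but a distinct object

add(first)
add(second)
remove(first)

remaining = list(collection.values())
```

Only `second` is left afterwards, even though the two lists compare equal.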

Trust me, I'm not doing anything that's unreliable! (And I _have_ done
programming in C with the Python API, so I know all about refcounts...) :-)

>> No such uses care about object addresses, though - just that id(obj)
>> returns a value usable as a dict key, unique among all reachable
>> objects at the time `id()` is called.
>
> Yep. In hindsight it was probably a mistake for the docs
> to talk about addresses in relation to id() -- it seems to
> have given some people unrealistic expectations.
>

eryk sun

Jan 19, 2019, 6:08:49 AM1/19/19
to Steven D'Aprano, pytho...@python.org
On 1/18/19, Steven D'Aprano <st...@pearwood.info> wrote:
> On Thu, Jan 17, 2019 at 07:50:51AM -0600, eryk sun wrote:
>>
>> It's kind of dangerous to pass an object to C without an increment of
>> its reference count.
>
> "Kind of dangerous?" How dangerous?

I take that back. Dangerous is too strong a word. It can be managed
if we're careful to avoid expressions like c_function(id(f())). Using
py_object simply avoids that problem.
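
To illustrate why id(f()) is the problematic form, here is a small sketch that assumes CPython's reference counting, where a temporary is finalized as soon as its last reference goes away. The class `Tracked` and the function `make` are hypothetical helpers used only to count finalizations:

```python
import ctypes
import gc

class Tracked:
    """Hypothetical class that counts how many instances were finalized."""
    finalized = 0

    def __del__(self):
        Tracked.finalized += 1

def make():
    return Tracked()

# The temporary returned by make() has no other reference, so on CPython
# it is gone by the time id() has returned: `addr` is already stale.
addr = id(make())
after_id = Tracked.finalized

# py_object keeps its own reference, so the object stays alive.
holder = ctypes.py_object(make())
after_py_object = Tracked.finalized

# Dropping the py_object instance releases that reference.
del holder
gc.collect()
after_del = Tracked.finalized
```

On CPython the three counters should come out as 1, 1 and 2: the object whose id was taken is finalized immediately, while the one wrapped in py_object survives until the wrapper is deleted.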

Bear with me while I make a few more comments about py_object, even
though it's straying off topic.

For a type "O" argument (i.e. py_object is in the function's
`argtypes`), we might be able to borrow the reference from the
argument tuple. As implemented, however, the argument actually keeps
its own reference. For example, we can observe this by calling the
from_param method:

>>> b = bytearray(b'spam')
>>> arg = ctypes.py_object.from_param(b)
>>> print(arg)
<cparam 'O' at 0x7f32a49699b0>
>>> print(arg._obj)
bytearray(b'spam')

This is due to the type "O" setfunc, which needs to keep a reference
to the object when setting the value of a py_object instance. The
reference is stored as the _objects attribute. (For non-simple pointer
and aggregate types, _objects is instead a dict keyed by the index as
a hexadecimal string.)
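
A short sketch of how this looks from Python, assuming a CPython build with ctypes available. Note that `_objects` is a private attribute, so its exact layout is an implementation detail and may change:

```python
import ctypes

obj = bytearray(b'spam')

# Simple type: _objects is the kept object itself.
p = ctypes.py_object(obj)
simple_keeps_obj = p._objects is obj

# Aggregate type: _objects is a dict of kept references.
arr = (ctypes.py_object * 1)()
arr[0] = obj
array_refs = arr._objects
```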

(The getfunc and setfunc of a simple ctypes object are called to get
and set the value, which also includes cases in which we don't have an
actual py_object instance, such as function call arguments; pointer
and array indexes; and struct and union fields. These functions are
defined in Modules/_ctypes/cfield.c.)

IMO, a downside of py_object is that it's a simple type, so the
getfunc gets called automatically when getting fields or indexes. This
is annoying for py_object since a NULL value raises ValueError.
Returning None in this case isn't possible, in contrast to other
simple pointer types. We can work around this by subclassing
py_object. For example:

>>> a1 = (ctypes.py_object * 1)()
>>> a1[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: PyObject is NULL

>>> py_object = type('py_object', (ctypes.py_object,), {})

>>> a2 = (py_object * 1)()
>>> a2[0]
<py_object object at 0x7f10dc7d9158>

Then, like all ctypes pointers, a false boolean value means it's NULL:

>>> bool(a2[0])
False
>>> a2[0] = b'spam'
>>> bool(a2[0])
True

py_object doesn't help if a library holds onto the pointer and tries
to use it later on. For example, with Python's C API there are
functions that 'steal' a reference (with the assumption that it's a
newly created object, in which case it's more like 'claiming'), such
as PyTuple_SetItem. In this case, we need to increment the reference
count via Py_IncRef.
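
As a rough sketch of that pattern, the example below uses PyList_SetItem, which also steals a reference, rather than PyTuple_SetItem; the substitution is mine, made because PyTuple_SetItem places extra requirements on the tuple that are awkward to meet from ctypes. It assumes CPython and calls the C API through ctypes.pythonapi:

```python
import ctypes

# Declare the prototypes so that ctypes passes real object pointers.
Py_IncRef = ctypes.pythonapi.Py_IncRef
Py_IncRef.argtypes = (ctypes.py_object,)
Py_IncRef.restype = None

PyList_SetItem = ctypes.pythonapi.PyList_SetItem
PyList_SetItem.argtypes = (ctypes.py_object, ctypes.c_ssize_t,
                           ctypes.py_object)
PyList_SetItem.restype = ctypes.c_int

items = [None]
value = bytearray(b'spam')

# PyList_SetItem takes over one reference to `value`. The py_object
# argument only borrows for the duration of the call, so we have to
# create the reference that gets stolen ourselves.
Py_IncRef(value)
rc = PyList_SetItem(items, 0, value)
```

Without the Py_IncRef call the list would own a reference that nobody ever created, and the reference count of `value` would end up one too low.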

py_object can be returned from a callback without leaking a reference,
assuming the library manages the new reference. In contrast, other
types that need memory support have to leak a reference (e.g.
c_wchar_p, i.e. type "Z", needs a capsule object for the wchar_t
buffer). In case of a leak, we get warned with RuntimeWarning('memory
leak in callback function.').

> If I am reading this correctly, I think you are saying that using id()
> in this way is never(?) correct.

Yes, it's incorrect, but I've been guilty of using id() like this,
too, because it's convenient. Perhaps we could provide a function
that's explicitly specified to return the address, if implemented.
Maybe call it sys.getaddress()?

In my first reply, I provided two alternatives that use ctypes to
return the address instead of id(). So there's that as well. The fine
print is that ctypes is optional in the standard library. Platforms
and implementations don't have to support it.
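
For readers without the earlier message at hand, here is one way to read an object's address through ctypes without calling id(). It is a sketch that assumes CPython, and it is not necessarily either of the two alternatives from that first reply:

```python
import ctypes

obj = bytearray(b'spam')

# py_object stores a PyObject* in its buffer; viewing that buffer as a
# c_void_p exposes the pointer value as a Python int.
holder = ctypes.py_object(obj)
address = ctypes.c_void_p.from_buffer(holder).value
```

On CPython `address` equals id(obj), but here the value comes from ctypes itself, and `holder` keeps the object alive while the address is in use.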

Antoine Pitrou

Jan 19, 2019, 6:32:41 AM1/19/19
to pytho...@python.org
On Sat, 19 Jan 2019 13:28:06 +1300
Greg Ewing <greg....@canterbury.ac.nz> wrote:
> Tim Peters wrote:
>
> > The dict itself keeps the objects alive.
>
> Yes, but the idea of a cache is that you're free to flush
> things out of it to make room for something else without
> breaking anything.
>
> It sounds like MRAB is using ids as weak references,
> without the assurance actual weak references give you
> that they become invalidated when the referenced object
> goes away.

Hmm... That sounds nonsensical to me. By construction, if you're able
to get a reference to an object in pure Python, then the object is
alive.

(By pure Python I'm excluding ctypes hacks or the exploitation of bugs
in the CPython object implementation.)

By the way, you can also have a WeakValueDictionary where keys are ids
and values are the corresponding objects, if you need both identity
lookup and weak references.
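
A minimal sketch of that combination; the class `Node` and the name `registry` are hypothetical. The entry disappears once the last strong reference to the object is gone, so a stale id cannot be looked up by accident:

```python
import gc
import weakref

class Node:
    """Hypothetical class; instances of plain classes are weak-referenceable."""

registry = weakref.WeakValueDictionary()

node = Node()
key = id(node)
registry[key] = node          # identity lookup, but only a weak reference

found_while_alive = registry.get(key) is node

del node                      # drop the last strong reference
gc.collect()                  # not needed on CPython, harmless elsewhere

found_after_delete = key in registry
```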

Regards

Antoine.