Cdef class fields and the adaptive interpreter


Prakhar Goel

Jan 31, 2025, 2:31:20 PM
to cython...@googlegroups.com
Hi All,

I was working on some low-level code, part of which involved implementing descriptors for field access on objects. I benchmarked my code and noticed that straight C (or rather Python-free Python) code was actually slower than the default Python attribute access and slots. The difference was around 20-30ns for my code vs 9ns for native field access, even when that field access involved __dict__!

This was rather confusing because I've seen the code in PyMember_GetOne, and my code was definitely simpler and should have been faster! I ran a bunch of experiments and I think the difference comes from the adaptive interpreter updates in 3.11 (and 3.10). For slot access, the logic replaces the usual descriptor calls with a type equality check followed by a direct lookup! This only works when the target is a Python object, the lookup is always for the same type, and the lookup goes through the standard Python member code, but that covers a lot of cases.
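One way to see this specialization from pure Python is a minimal sketch like the following. It assumes a `__slots__` class as a stand-in for a member-defined field; `Point`, `read_x`, and `load_attr_opnames` are illustrative names, not from my actual code. On a specializing build (3.11+) I would expect the generic `LOAD_ATTR` to be reported as a specialized variant such as `LOAD_ATTR_SLOT` after warm-up, but the exact opcode name depends on the interpreter version and build.

```python
import dis
import sys


class Point:
    # __slots__ makes `x` a member descriptor, the pure-Python analogue
    # of a PyMemberDef entry on an extension type.
    __slots__ = ("x",)

    def __init__(self, x):
        self.x = x


def read_x(p):
    return p.x


def load_attr_opnames(func, warmup=10_000):
    """Warm up `func`, then list the names of its LOAD_ATTR* instructions."""
    p = Point(1)
    for _ in range(warmup):
        func(p)
    # The `adaptive` flag only exists on Python 3.11+; older versions
    # have no specialized instructions to show.
    kwargs = {"adaptive": True} if sys.version_info >= (3, 11) else {}
    return [
        ins.opname
        for ins in dis.get_instructions(func, **kwargs)
        if ins.opname.startswith("LOAD_ATTR")
    ]


if __name__ == "__main__":
    print(load_attr_opnames(read_x))
```

Because the specialized instruction is guarded by a type check, calling `read_x` with objects of several different types would be expected to make the interpreter fall back to the generic path.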

Finally, how this might affect Cython: for public object fields on cdef classes, it would probably be more efficient to just use Python's native member definition (PyMemberDef) handling, at least for Python 3.11+.

Thoughts? Thanks.

-- PG


da-woods

Jan 31, 2025, 4:33:42 PM
to cython...@googlegroups.com

Thanks - it's a useful suggestion.

There are a few potential complications:

* We can't do it completely generally because Cython supports more complex conversions that Python doesn't. That doesn't necessarily prevent us from using it in specific cases though.

* It potentially has implications for the thread-safety of these attributes in the free-threading build. My (unmerged) proposal is to let users make cdef public attributes thread-safe via a config option. For that to be useful, users will need to ensure that the Cython-side direct access is also thread-safe. That is easy to do with critical sections, but most of the access in PyMember_GetOne is via atomics, which are harder for us to expose.

That might be an argument for only exposing public objects via the member mechanism.

David


da-woods

Feb 1, 2025, 5:36:25 AM
to cython...@googlegroups.com

To follow this up with some measurements:

I tried a class with

```
cdef class C:
    cdef public object o
    cdef public int i
```

and added member accessors too by editing the .c file

```
static PyMemberDef members[] = {
    {"o2", T_OBJECT_EX, offsetof(struct __pyx_obj_10cdefpublic_C, o), 0, 0},
    {"i2", T_INT, offsetof(struct __pyx_obj_10cdefpublic_C, i), 0, 0},
    {0, 0, 0, 0, 0}
};
```

then timed it with

```
from timeit import timeit

# C is the cdef class above, with the hand-added o2/i2 members.

def init():
    c = C()
    c.i = 0
    c.o = 0
    return c

def add_to_i(c):
    c.i += 1

def add_to_i2(c):
    c.i2 += 1

def add_to_o(c):
    c.o += 1

def add_to_o2(c):
    c.o2 += 1

print("add_to_i", timeit("add_to_i(c)", "c = init()", globals=globals(), number=10000000))
print("add_to_i2", timeit("add_to_i2(c)", "c = init()", globals=globals(), number=10000000))
print("add_to_o", timeit("add_to_o(c)", "c = init()", globals=globals(), number=10000000))
print("add_to_o2", timeit("add_to_o2(c)", "c = init()", globals=globals(), number=10000000))
```

Python 3.9:
add_to_i 1.0135185969993472
add_to_i2 1.0346901959273964
add_to_o 1.0226379960076883
add_to_o2 1.0057076970115304

Python 3.11:
add_to_i 0.8202094909502193
add_to_i2 0.8345482919830829
add_to_o 0.7964245920302346
add_to_o2 0.793055891990661

Python 3.13 (not freethreaded):
add_to_i 1.0986144189955667
add_to_i2 1.0725537179969251
add_to_o 1.090039519011043
add_to_o2 1.0854479550616816

Python 3.13 (freethreaded, but not an optimized build of Python):
add_to_i 1.6287080210167915
add_to_i2 1.5761140210088342
add_to_o 1.5513808489777148
add_to_o2 1.5635191620094702

Obviously that's very much a micro-benchmark. I'm not sure why Python 3.11 appears to be fastest. But let's look at the relative numbers for now; these don't seem to change much between Python versions.

The gist seems to be that the Cython generated accessors are usually slightly faster for `int`. The Python-generated ones are usually slightly faster for `object`. There isn't much in it though.

To me it doesn't really seem worth making a change - anyone who really cares about performance here should be accessing these fields in Cython rather than through the Python accessors.

Although possibly what you're saying is: a regular Python class is now beating a cdef class in these cases because the interpreter has detailed knowledge of the internals of a plain Python class? That seems like it may be hard to beat, although the same point remains that the fast path is to access cdef public attributes through Cython rather than through Python.
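A rough pure-Python sketch of that kind of comparison, in case it helps pin down the question. Everything here is a stand-in: `Slotted`, `Plain`, `Field`, `Described`, and `ns_per_read` are hypothetical names, and the hand-written `Field` descriptor only approximates a custom accessor, it is not the code from the original report. The timings include timeit's loop overhead, so only the relative numbers are meaningful.

```python
from timeit import repeat


class Slotted:
    """Attribute stored in a slot (member descriptor)."""
    __slots__ = ("value",)

    def __init__(self):
        self.value = 0


class Plain:
    """Attribute stored in the instance __dict__."""

    def __init__(self):
        self.value = 0


class Field:
    """Hypothetical hand-written descriptor backed by a private attribute."""

    def __set_name__(self, owner, name):
        self._name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self._name)

    def __set__(self, obj, value):
        setattr(obj, self._name, value)


class Described:
    """Attribute routed through the custom descriptor above."""
    value = Field()

    def __init__(self):
        self.value = 0


def ns_per_read(cls, number=1_000_000, rounds=5):
    """Best-of-`rounds` time for one `obj.value` read, in nanoseconds."""
    obj = cls()
    best = min(repeat("obj.value", globals={"obj": obj},
                      number=number, repeat=rounds))
    return best / number * 1e9


if __name__ == "__main__":
    for cls in (Slotted, Plain, Described):
        print(f"{cls.__name__:10s} {ns_per_read(cls):6.1f} ns/read")
```

Taking the minimum over several rounds is just a way to reduce noise from other processes; it does not remove the fixed per-iteration overhead.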

David

Prakhar Goel

Feb 1, 2025, 12:21:57 PM
to cython...@googlegroups.com
Interesting. Thanks for looking.

-- PG
