How to approach the performance of operator.itemgetter()

15 views
Skip to first unread message

Daniele Nicolodi

unread,
Jun 15, 2024, 8:18:01 AMJun 15
to cython...@googlegroups.com
Hello,

I would like to implement a class that has the same basic functionality
of operator.itemgetter(). The simplest version looks like:

cdef class Itemgetter:
cdef Py_ssize_t item

def __init__(self, Py_ssize_t item):
self.item = item

def __call__(self, v):
return v[self.item]

The Cython version is 1.4 time slower than operator.itemgetter().

Looking at the C code produced by Cython 3.0.10, I have the impression
that the performance difference mainly comes from the function arguments
handling.

Is there a way to reduce the Cython overhead?

Thank you.

Cheers,
Dan

da-woods

unread,
Jun 15, 2024, 8:58:00 AMJun 15
to cython...@googlegroups.com
I think the problem may be `cdef Py_ssize_t item`. That means that every
time you call `__call__` you need to convert it to a Python object.
You'd probably actually be better storing `item` as a Python object.

If you're sure that `v` is always going to be something that implements
the sequence protocol then you can leave it types as `Py_ssize_t` and do

from cpython.sequence cimport PySequence_GetItem

[...]

  def __call__(self, v):
    return PySequence_GetItem(v, self.item)

Note that this won't behave correctly for things like a dict with
integer keys though (which uses the mapping protocol instead) so isn't
perfectly equivalent.

Daniele Nicolodi

unread,
Jun 15, 2024, 12:17:46 PMJun 15
to cython...@googlegroups.com
On 15/06/24 14:57, da-woods wrote:
> I think the problem may be `cdef Py_ssize_t item`. That means that every
> time you call `__call__` you need to convert it to a Python object.
> You'd probably actually be better storing `item` as a Python object.

I've tried this already. Typing item as an object makes it slower, not
faster: it goes from 1.4 times slower to 1.5 times slower.

> If you're sure that `v` is always going to be something that implements
> the sequence protocol then you can leave it types as `Py_ssize_t` and do
>
> from cpython.sequence cimport PySequence_GetItem
>
> [...]
>
>   def __call__(self, v):
>     return PySequence_GetItem(v, self.item)
>
> Note that this won't behave correctly for things like a dict with
> integer keys though (which uses the mapping protocol instead) so isn't
> perfectly equivalent.

operator.itemgetter() uses PyObject_GetItem() so I don't think the
difference in performance comes from this. Anyhow, using
PySequence_GetItem() gives approximately the same run time.

Cheers,
Dan

da-woods

unread,
Jun 15, 2024, 4:23:58 PMJun 15
to cython...@googlegroups.com
The two differences that I can see (based on a quick look at version in
the operator module) are:
1. Cython supports keyword arguments and they don't. You can do this
with `def __call__(self, v, /)`. I suspect this won't make a huge
difference though, but does simplify the generated code a little.
2. The operator module supports the vectorcall for `__call__` while
Cython doesn't. That might make a real difference since it avoids
constructing a 1-element tuple for each call. (Cython supports
vectorcall elsewhere though, but not for `__call__` on classes)

One option to try would be to rewrite it as a closure:

def make_itemgetter(Py_ssize_t index):
  def call(v, /):
    return v[index]
  return call

This would support vectorcall, so if that's the key difference it should
be faster. (I haven't timed it myself though)

Daniele Nicolodi

unread,
Jun 15, 2024, 6:06:23 PMJun 15
to cython...@googlegroups.com
On 15/06/24 22:23, da-woods wrote:
> The two differences that I can see (based on a quick look at version in
> the operator module) are:
> 1. Cython supports keyword arguments and they don't. You can do this
> with `def __call__(self, v, /)`. I suspect this won't make a huge
> difference though, but does simplify the generated code a little.
> 2. The operator module supports the vectorcall for `__call__` while
> Cython doesn't. That might make a real difference since it avoids
> constructing a 1-element tuple for each call. (Cython supports
> vectorcall elsewhere though, but not for `__call__` on classes)
>
> One option to try would be to rewrite it as a closure:
>
> def make_itemgetter(Py_ssize_t index):
>   def call(v, /):
>     return v[index]
>   return call
>
> This would support vectorcall, so if that's the key difference it should
> be faster. (I haven't timed it myself though)

This is it! The closure is actually faster than operator.itemgetter().

Do you know why Cython does not support vectorcall in this case? If the
reason is simply that no one has written the code, I may give it a try.

In this specific case, I just need to be able to attach some extra
properties to the itemgetter, thus the closure solution works
(operator.itemgetter is an extension class, thus it does not allow to
assign arbitrary members, while functions allow it), but a general
solution would be very nice to have.

Thanks!

Cheers,
Dan

da-woods

unread,
Jun 16, 2024, 3:37:35 AMJun 16
to cython...@googlegroups.com
On 15/06/2024 23:06, Daniele Nicolodi wrote:
Do you know why Cython does not support vectorcall in this case? If the reason is simply that no one has written the code, I may give it a try.

The one complication with vectorcall is that it's stored per-instance. That's a bit different to Cython's normal code-generation for class slots so it may be complicated to implement. The `tp_call` slot also has a fixed non-vectorcall signature, so you'll need to account for that (although PyVectorcall_Call should do it fairly simply I think).

None of this is impossible so you're welcome to give it a go. As you've seen, it can provide a reasonable amount of optimization so it'd be a useful addition. But I do think it may be harder than it sounds.

Reply all
Reply to author
Forward
0 new messages