Hi everyone,
I’m trying to understand how Cython decides when to emit specialized C-API calls, and where that logic is implemented in the codebase.
To clarify what I am asking about: I am specifically interested in specialization for general Python object types, not numeric optimizations. I’m already aware that Cython can cast int objects to raw C integers where possible; that is not my focus here.
Instead, I’m looking at protocol-based types such as sequences or mappings. For example:
- If the static type of a variable is known to be a Sequence, then calling len(obj) could in theory be compiled to PySequence_Length(obj).
- However, Cython currently appears to emit PyObject_Length(obj) instead, without using that type information.
So my questions are:
1. Where in the Cython codebase does it decide which C-API call to use for operations like len()?
2. Under what conditions does Cython currently specialize operations for object protocol types (e.g. Sequence, Mapping)? Is this handled anywhere, or would it require new machinery?
3. Aside from the familiar for i in range(...) loop optimization, what non-numeric patterns does Cython currently recognize and optimize?
Any pointers, even just a relevant module or filename, would be very helpful.
Thanks a lot in advance!
Hi,
Mostly Cython emits code for specific object types (e.g. list/tuple/dict).
The main issue is: what do you mean when you say "the static type of a variable is known to be a Sequence"? We don't currently do anything with `collections.abc.Sequence` as a type-hint, mainly because it isn't that useful. It defines a Python protocol (that `__getitem__` exists and will take a Python integer) rather than saying that it's implemented in terms of `PySequenceMethods`.
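To make that distinction concrete, here is a minimal Python sketch (it is not something Cython does itself). It assumes CPython, where `ctypes.pythonapi` exposes `PySequence_Check`; the classes `Registered` and `StringKeyed` are made up for the example. It suggests that being a `collections.abc.Sequence` and having a C-level sequence item slot are independent properties.

```python
import ctypes
from collections.abc import Sequence

# Assumption: running on CPython, where ctypes.pythonapi exposes the C API.
PySequence_Check = ctypes.pythonapi.PySequence_Check
PySequence_Check.argtypes = [ctypes.py_object]
PySequence_Check.restype = ctypes.c_int


class Registered:
    """Hypothetical class: declared a Sequence, but defines no __getitem__."""


# ABC registration is purely a Python-level declaration.
Sequence.register(Registered)


class StringKeyed:
    """Hypothetical class: defines __getitem__, but is not sequence-like."""

    def __getitem__(self, key):
        return key.upper()


def describe(obj):
    """Return (isinstance of abc.Sequence, what PySequence_Check reports)."""
    return isinstance(obj, Sequence), bool(PySequence_Check(obj))


results = {
    "list": describe([1, 2, 3]),
    "dict": describe({"a": 1}),
    "Registered": describe(Registered()),
    "StringKeyed": describe(StringKeyed()),
}

for name, (is_abc_sequence, passes_c_check) in results.items():
    print(f"{name}: abc.Sequence={is_abc_sequence}, PySequence_Check={passes_c_check}")
```

In this sketch the two columns disagree for both made-up classes, which is why a `Sequence` annotation alone would not tell a compiler that the `PySequence_*` functions are the right ones to call.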
People do occasionally ask for a way to prefer `PySequence_GetItem` because that is quicker for C sequences than `PyObject_GetItem`. The issue is mainly that we don't have a good way of indicating the type. And you can always just use `PySequence_GetItem` from the C API.
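As a rough illustration of how the two calls differ in behaviour, the following Python sketch calls both through `ctypes.pythonapi`. It assumes CPython, and it shows semantics only: the ctypes overhead makes it useless for timing. In Cython itself the function would normally be cimported (for example from `cpython.sequence`) rather than reached through ctypes.

```python
import ctypes

# Assumption: running on CPython, where ctypes.pythonapi exposes the C API.
api = ctypes.pythonapi

# PyObject_GetItem(o, key): generic lookup, the key is any Python object.
PyObject_GetItem = api.PyObject_GetItem
PyObject_GetItem.argtypes = [ctypes.py_object, ctypes.py_object]
PyObject_GetItem.restype = ctypes.py_object

# PySequence_GetItem(o, i): sequence lookup, the index is a C Py_ssize_t.
PySequence_GetItem = api.PySequence_GetItem
PySequence_GetItem.argtypes = [ctypes.py_object, ctypes.c_ssize_t]
PySequence_GetItem.restype = ctypes.py_object

items = [10, 20, 30]
generic = PyObject_GetItem(items, 1)   # index passed as a Python int object
direct = PySequence_GetItem(items, 1)  # index passed as a raw C integer

# The generic call also works on mappings; the sequence call rejects them.
mapping_value = PyObject_GetItem({"a": 1}, "a")
try:
    PySequence_GetItem({"a": 1}, 0)
    dict_error = None
except TypeError as exc:
    dict_error = exc

print(generic, direct, mapping_value, type(dict_error).__name__)
```

The sequence call takes a C integer index and only works on objects with a C-level sequence slot, so using it safely means already knowing what kind of object you hold.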
Most of the code you're looking for is in Cython/Compiler/Optimize.py. As I say - it's specialized for specific exact types rather than "protocol types".
I don't think it's realistic to make a list of optimizations that Cython performs (there's lots of them in lots of places) - there's definitely lots of loop optimizations for different known types though.
David
Thanks a lot for the detailed explanation, that’s very helpful.
I’ll definitely benchmark instead of assuming based on naming :) I’ll also take a closer look at fused types as you suggested: that might be exactly what I need.
Thanks again for taking the time to clarify all this!
Best,
Albert