Hi Antoine,
Antoine Pitrou wrote on 21.02.24 at 14:09:
> I'm looking at the culprit file (lib.cpp) and a bunch of things stand
> out:
>
> 1. the module state is huge, it has thousands of individual PyObject*
> variables (why not a single tuple or array of PyObject* instead, that
> you would index using #define'd constants? that would make tp_clear and
> tp_traverse much shorter as well).
See
https://github.com/cython/cython/issues/3689
https://github.com/cython/cython/issues/4926
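For illustration only, here is a minimal sketch of the layout proposed in point 1. It uses a mock object type instead of the real PyObject and CPython API, and all names are hypothetical. The #define'd constants index one array, so the clear and traverse functions become loops instead of one statement per field. In real extension code, Py_VISIT and Py_CLEAR would do the per-entry work.

```c
#include <assert.h>
#include <stddef.h>

/* Mock stand-in for PyObject: only a reference count (hypothetical). */
typedef struct MockObject { long refcnt; } MockObject;

/* Hypothetical #define'd indices into the module state array. */
#define IDX_STR_NAME     0
#define IDX_STR_MODULE   1
#define IDX_INT_ZERO     2
#define N_STATE_OBJECTS  3

typedef struct {
    MockObject *objects[N_STATE_OBJECTS];  /* one array instead of N fields */
} ModuleState;

typedef int (*visitproc_t)(MockObject *obj, void *arg);

/* Counterpart of tp_traverse: visit every non-NULL entry in a loop. */
static int state_traverse(ModuleState *st, visitproc_t visit, void *arg) {
    for (size_t i = 0; i < N_STATE_OBJECTS; i++) {
        if (st->objects[i] != NULL) {
            int err = visit(st->objects[i], arg);
            if (err) return err;
        }
    }
    return 0;
}

/* Counterpart of tp_clear: drop every held reference in a loop. */
static void state_clear(ModuleState *st) {
    for (size_t i = 0; i < N_STATE_OBJECTS; i++) {
        if (st->objects[i] != NULL) {
            st->objects[i]->refcnt--;
            st->objects[i] = NULL;
        }
    }
}

/* Example visitor: counts the visited objects. */
static int count_visit(MockObject *obj, void *arg) {
    (void)obj;
    (*(int *)arg)++;
    return 0;
}
```

Adding a new constant then only means adding an index, with no change to the clear/traverse code.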
> 2. tons of hand-written optimizations are inlined into the generated
> code
Many of those optimisations depend on the preprocessor, but yes, they
usually add to the C compile time when compiling for CPython. We use helper
functions most of the time when possible, but often, the generated code is
not entirely the same on each occurrence, which leads to more code being
generated locally. Cython usually has a mix of known and unknown types
available, for example, which often leads to slightly different code being
generated.
Some of these adaptations could possibly be handled with similar effect in
inline functions, but you don't seem to be entirely happy with those either.
>, for example different code paths for calling a function. For
> example this trivial method generates not one but two C++ functions,
> each ~80 lines of C/C++ code. That's 160 lines of generated code for a
> trivial method that's not even performance critical!
>
> ```
> /* "pyarrow/error.pxi":45
> *
> * class ArrowKeyError(KeyError, ArrowException):
> * def __str__(self): # <<<<<<<<<<<<<<
> * # Override KeyError.__str__, as it uses the repr() of the key
> * return ArrowException.__str__(self)
> */
> ```
Not sure, but maybe you're looking at the Python wrapper of the function
(i.e. its signature implementation) and its user code implementation?
I'm only aware of one case where we're really duplicating user code, and
that's the exit part of a "try-finally" statement. Functions aren't duplicated.
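To illustrate the split between a Python wrapper and the user code implementation, here is a minimal sketch with mock types. The names and the argument convention are hypothetical and do not reproduce Cython's actual generated code. The wrapper implements the calling convention and signature checks, and a separate function holds the body of the method.

```c
#include <assert.h>
#include <stddef.h>

/* Mock object carrying the text that __str__ returns (hypothetical). */
typedef struct MockObject { const char *text; } MockObject;

/* User code implementation: the body of "def __str__(self)". */
static const char *str_impl(MockObject *self) {
    return self->text;
}

/* Wrapper: checks the Python-level signature, then delegates.
   Returning NULL stands in for raising a TypeError. */
static const char *str_wrapper(MockObject *self, MockObject **args, size_t nargs) {
    (void)args;
    if (nargs != 0) {
        return NULL;  /* __str__ takes no extra arguments */
    }
    return str_impl(self);
}
```

So two C functions per method does not mean the user code appears twice: only the wrapper's argument handling is added.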
> 3. many functions, generated or not, marked CYTHON_INLINE for no
> obvious reason. Of course, this is just a hint that the compiler is
> free to ignore, but it might still have an undesirable effect (some of
> those CYTHON_INLINE functions are really long and would probably not
> benefit from inlining).
A few helper functions were historically marked with "CYTHON_INLINE" to
avoid "unused" warnings when it's difficult to inject them conditionally.
In many cases, however, they are inlined because we expect the C compiler
to cut them down considerably based on the calling code.
As always, some of those modifiers are there for historical reasons. It's
easier to keep code than to revisit and change code. You've probably
noticed. :)
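As a rough sketch of what "cut them down based on the calling code" can mean, here is an inline helper whose fast path is selected by a flag that is a compile-time constant at some call sites. The types and names are hypothetical mocks. Whether the dead branch is actually removed depends on the compiler and the optimisation level.

```c
#include <assert.h>

/* Hypothetical type tags for "known" vs "unknown" argument types. */
enum { TYPE_INT = 1, TYPE_OTHER = 2 };

typedef struct { int type; long value; } MockObject;

/* Generic slow path, kept out of line; -1 signals a conversion error. */
static long slow_as_long(const MockObject *obj) {
    (void)obj;
    return -1;
}

/* Inline helper: when known_int is a constant at the call site,
   the optimiser can drop the branch that cannot be taken. */
static inline long as_long(const MockObject *obj, int known_int) {
    if (known_int || obj->type == TYPE_INT)
        return obj->value;      /* fast path */
    return slow_as_long(obj);   /* generic fallback */
}

/* Type known statically: after inlining, just a field access. */
static long use_known(const MockObject *obj) { return as_long(obj, 1); }

/* Type unknown: both paths remain in the inlined code. */
static long use_unknown(const MockObject *obj) { return as_long(obj, 0); }
```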
> 4. heroic micro-optimizations accessing PyLong internals in "inline"
> functions such as __Pyx_PyInt_As_long.
Yep – I'm actually quite proud of that. :) It's easy to disable with a C
macro guard as well, if you find it excessive.
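Here is a minimal sketch of the pattern: a small-value fast path that reads the integer's digits directly and is guarded by a macro that can be set to 0 at build time. The integer representation is a mock, only loosely modelled on a sign-and-digits layout, and the macro name is hypothetical rather than Cython's real guard.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical switch: compile with -DMY_USE_INT_INTERNALS=0 to disable. */
#ifndef MY_USE_INT_INTERNALS
#define MY_USE_INT_INTERNALS 1
#endif

#define DIGIT_BITS 30

/* Mock big integer: signed digit count, base 2**30 digits, least
   significant first. Limited to 2 digits here (no overflow checks). */
typedef struct {
    int size;              /* number of digits; negative for negative values */
    uint32_t digit[2];
} MockLong;

/* Generic path: accumulate all digits. */
static long long generic_as_long_long(const MockLong *v) {
    int n = v->size < 0 ? -v->size : v->size;
    long long result = 0;
    for (int i = n - 1; i >= 0; i--)
        result = (result << DIGIT_BITS) | (long long)v->digit[i];
    return v->size < 0 ? -result : result;
}

static long long as_long_long(const MockLong *v) {
#if MY_USE_INT_INTERNALS
    /* Fast path: zero and single-digit values need no loop. */
    switch (v->size) {
        case 0:  return 0;
        case 1:  return (long long)v->digit[0];
        case -1: return -(long long)v->digit[0];
        default: break;
    }
#endif
    return generic_as_long_long(v);
}
```

With the guard set to 0, only the generic path is compiled and the results stay the same.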
> 5. worse, these heroic micro-optimizations are *replicated* (not even
> delegated to a helper routine) in generated code for C/C++ enums. For
> example, I see a generated function:
>
> ```
> static CYTHON_INLINE enum arrow::TimeUnit::type
> __Pyx_PyInt_As_enum____arrow_3a__3a_TimeUnit_3a__3a_type(PyObject *x)
> ```
>
> ... that is 170 lines long for a trivial functionality (unpack a
> Python int into a C++ arrow::TimeUnit::type enum) that's not even
> performance-critical to us.
We could possibly cut down on those a bit, but then, it's only one such
function per user defined type. You probably have tons of usages for each
type that you define in your code, which should outweigh the conversion
functions by far.
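As an illustration of the "delegated to a helper routine" alternative from point 5, here is a sketch in which the int conversion is written once and each enum only gets a thin wrapper. The enum stands in for arrow::TimeUnit::type, the object type is a mock, and all names are hypothetical. The range check is an illustrative choice of this sketch, not a description of Cython's current behaviour.

```c
#include <assert.h>

/* Hypothetical C enum standing in for arrow::TimeUnit::type. */
typedef enum { UNIT_SECOND = 0, UNIT_MILLI = 1, UNIT_MICRO = 2, UNIT_NANO = 3 } TimeUnit;

/* Mock Python object: either an int with a value, or something else. */
typedef struct { long value; int is_int; } MockObject;

/* Shared helper: the (possibly heavily optimised) int conversion,
   written once. Returns 0 on success, -1 on error. */
static int shared_as_long(const MockObject *obj, long *out) {
    if (!obj->is_int) return -1;
    *out = obj->value;
    return 0;
}

/* Per-enum wrapper: only a small range check remains per type. */
static int as_time_unit(const MockObject *obj, TimeUnit *out) {
    long v;
    if (shared_as_long(obj, &v) < 0) return -1;
    if (v < UNIT_SECOND || v > UNIT_NANO) return -1;  /* out of range */
    *out = (TimeUnit)v;
    return 0;
}
```

The trade-off is an extra call on each conversion in exchange for far less code per enum type.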
> 6. It's a bit weird and makes the generated code less readable that
> Cython uses C/C++-level #define'd constants for Cython-related
> compile-time options, instead of switching at code generation time.
That's intentional and IMHO a really great feature. It makes experimenting
with the code configuration very easy, by enabling or disabling different C
macros. It also makes it quite easy for us to adapt the C code to different
Python implementations and runtimes, by switching the fine-granular C
compile time macros on and off, depending on the available C-API features.
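Here is a sketch of how such fine-grained switches can be layered. Each macro has a default that depends on the target runtime but can still be overridden from the build command, for example `cc -DMY_USE_FAST_CALL=0`, without regenerating the C file. The macro names are hypothetical placeholders, not Cython's actual configuration macros.

```c
#include <assert.h>

/* Hypothetical: set to 1 when targeting a more restricted runtime. */
#ifndef MY_LIMITED_RUNTIME
#define MY_LIMITED_RUNTIME 0
#endif

/* Feature switch whose default depends on the runtime,
   but which an explicit -D on the command line overrides. */
#ifndef MY_USE_FAST_CALL
#if MY_LIMITED_RUNTIME
#define MY_USE_FAST_CALL 0
#else
#define MY_USE_FAST_CALL 1
#endif
#endif

/* Reports which code path was compiled in. */
static int fast_call_enabled(void) {
#if MY_USE_FAST_CALL
    return 1;  /* specialised calling path */
#else
    return 0;  /* generic, portable calling path */
#endif
}
```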
> 7. Indeed, the module initialization function is huge, but it doesn't
> need to be, as much of it is mechanical. You don't have to generate 123
> times the same global function creation code. You could instead
> generate a constant array of 123 structures with the necessary
> information (function name, etc.), that you loop on to achieve the same
> effect.
David already mentioned that it's not quite that simple due to ordered
interdependencies and intermediate code sections.
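For comparison, here is a minimal sketch of the table-driven initialisation proposed in point 7. A constant array describes each function, and one loop registers all of them in a mock namespace. All names are hypothetical. The sketch assumes that the entries are independent of each other and of any code that runs in between, which is exactly the assumption that the ordering concerns above call into question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*func_t)(int);

/* Hypothetical module-level functions. */
static int f_double(int x) { return 2 * x; }
static int f_negate(int x) { return -x; }

/* One entry per function: the information needed to create it. */
typedef struct { const char *name; func_t impl; } FuncDef;

static const FuncDef module_funcs[] = {
    {"double", f_double},
    {"negate", f_negate},
};
#define N_FUNCS (sizeof(module_funcs) / sizeof(module_funcs[0]))

/* Mock module namespace filled in by the init loop. */
static FuncDef namespace_[N_FUNCS];
static size_t namespace_len = 0;

/* One loop instead of N copies of the creation code. */
static int module_init(void) {
    for (size_t i = 0; i < N_FUNCS; i++)
        namespace_[namespace_len++] = module_funcs[i];
    return 0;
}

/* Looks up a registered function by name; NULL if absent. */
static func_t lookup(const char *name) {
    for (size_t i = 0; i < namespace_len; i++)
        if (strcmp(namespace_[i].name, name) == 0)
            return namespace_[i].impl;
    return NULL;
}
```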
Thanks for starting the discussion, though. It's always good to revisit the
status quo from time to time, even if it's not always easy to change it
drastically. Not everything that emerged over the years was intentional.
Stefan