In my experience, __cinit__ should be avoided. Thoughts?

Golden Rockefeller

Feb 9, 2022, 2:30:58 AM
to cython-users
In my experience, __cinit__ should be avoided. This is the advice that I would give to beginners in Cython, as there are many consequences to be aware of when using it.
I converted an entire small-to-medium-sized project from Python to Cython and found that __init__ is more than adequate in almost all the use cases I have come across. I disagree with Cython's documentation on this issue and I think users should prefer __init__ over __cinit__. I am here looking for a second opinion.


In brief:
  • __cinit__ is not necessary most of the time.
  • __init__ is fast enough, and there are other flexible options if users need speed.
  • Derived classes don't get to choose whether or when the base class's __cinit__ is called, potentially doubling the work.
  • The __cinit__ of a base class constrains the __init__ of derived classes, which must supply compatible parameters (and derived classes cannot call __cinit__ directly).
  • Defining __cinit__ without __reduce__ prevents pickle and copy.

There are legitimate reasons to use __cinit__, but I strongly disagree with the Cython documentation's statement that "Any initialisation which cannot safely be done in the __cinit__() method should be done in the __init__() method." (link) I also don't like that I increasingly see __cinit__() used in place of __init__() in the documentation.

The benefit of __cinit__ is that it initializes C data types, such as pointers to dynamically allocated arrays, so that there is no undefined behavior down the road. But if a user is converting Python code to Cython, they will most likely not be dealing with these data types. My advice is that, if there has to be a __cinit__ method, it should take no parameters, so as not to restrict the __init__ method of inherited classes.
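
The parameter constraint can be sketched in plain Python, where `__new__` plays roughly the role that `__cinit__` plays for an extension type: both run before `__init__` and receive the same constructor arguments. This is only a model of the behaviour, not Cython's implementation, and all class and function names below are hypothetical. One difference to keep in mind: Cython lets a `__cinit__` declared with no arguments (other than self) ignore whatever the constructor receives, whereas the Python stand-in needs `*args, **kwargs` to get the same effect.

```python
class StrictBase:
    """Stand-in for a base type whose __cinit__ takes a parameter."""

    def __new__(cls, size):
        self = super().__new__(cls)
        self.size = size
        return self


class StrictChild(StrictBase):
    # The subclass wants a different constructor signature.
    def __init__(self, width, height):
        self.area = width * height


class FlexibleBase:
    """Stand-in for a base type whose __cinit__ ignores constructor arguments."""

    def __new__(cls, *args, **kwargs):
        self = super().__new__(cls)
        self.size = 0  # safe default; __init__ may overwrite it
        return self


class FlexibleChild(FlexibleBase):
    def __init__(self, width, height):
        self.area = width * height
        self.size = 2


def try_construct(cls, *args):
    """Return the new instance, or the TypeError raised while constructing it."""
    try:
        return cls(*args)
    except TypeError as exc:
        return exc


# The strict base rejects the subclass's two-argument call before __init__ runs.
strict_result = try_construct(StrictChild, 2, 3)
# The flexible base accepts any arguments and leaves them to __init__.
flexible_result = try_construct(FlexibleChild, 2, 3)
```

With the strict base, the subclass can only be constructed with arguments the base also accepts; with the flexible base, the subclass is free to choose its own signature.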

__cinit__ may be a faster option, but by how much? To my understanding, __init__ is a special method, and it is called speedily through tp_init.

Additionally, a user can define a "cpdef MyClass new_MyClass(...)" function that:
  • is even faster than __cinit__, because it avoids boxing and unboxing the parameters in tuples,
  • works with early binding and avoids,
  • can be called from both Cython and Python (great for switching between the two modes),
  • does not constrain the __init__ of inherited types,
  • could be used to implement custom allocation strategies such as a free list (with unsafe access to (<_object*>obj)[0].ob_refcnt). I did this to avoid the validation overhead of memoryviews when I wanted to create numpy arrays of varying sizes.
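
A minimal sketch of the factory-function idea, written in plain Python so that it runs without compilation. In Cython the class would be a `cdef class` and the function would be declared along the lines of the `cpdef MyClass new_MyClass(...)` mentioned above; the attribute names `values` and `total` are illustrative assumptions. The function creates the instance with `MyClass.__new__(MyClass)`, which skips `__init__`, and fills in the attributes itself.

```python
class MyClass:
    def __init__(self, values):
        # Regular Python-facing constructor: accepts any iterable and copies it.
        self.values = list(values)
        self.total = sum(self.values)


def new_MyClass(values, total):
    """Factory that bypasses __init__ and sets the attributes directly.

    The caller is trusted to pass a list and a matching precomputed total.
    """
    obj = MyClass.__new__(MyClass)  # creates the object without calling __init__
    obj.values = values
    obj.total = total
    return obj


via_init = MyClass((1, 2, 3))
via_factory = new_MyClass([1, 2, 3], 6)
```

Because the factory never forwards arguments to a constructor, subclasses stay free to define whatever `__init__` signature they like.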

For these reasons, I would like it if the documentation did not state a preference for __cinit__ over __init__, but maybe I am missing something. Please let me know your thoughts.

Stefan Behnel

Feb 9, 2022, 3:04:28 AM
to cython...@googlegroups.com
Golden Rockefeller wrote on 09.02.22 at 03:14:
> I would like it if the documentation
> did not state a preference for __cinit__ over __init__

I agree, and I actually hope that it doesn't. But it seems that that's not
clear enough. Apparently, it always says things like

"This is only one of the reasons why the __cinit__() method is safer and
preferable over the normal __init__() method for extension types."

That is correct in the context, but shouldn't read as general as it sounds.

The documentation should explain the differences between the two, and when
to use one or the other.


> "Any initialisation which cannot safely be done
> in the __cinit__() method should be done in the __init__() method."

I think this is very true and important. In general, "__cinit__" should be
used to bring the object into a safe state (so this isn't about speed at
all), and then the rest can be done in "__init__". I'm aware that that's
not quite what the sentence above says or suggests.


> My advice is that, if there has to be a __cinit__ method, it should
> take no parameters, so as not
> to restrict the __init__ method of inherited classes.

It hints at that, but again, probably not clearly enough. Starting with a
no-args "__cinit__" and then seeing if that's enough seems like a good idea
in general.

The documentation was written incrementally over more than a decade, and
isn't always consistent. Improvements are welcome.

Stefan

Jonathan Kliem

Feb 9, 2022, 4:13:50 AM
to cython-users
I agree with Stefan that `__cinit__` is not about speed.

It took me a while (and hints from other people) to figure out from the documentation what the difference between `__cinit__` and `__init__` is, and I agree that the documentation can be improved.

Here are some further thoughts:

"By the time your __cinit__() method is called, memory has been allocated for the object and any C attributes it has have been initialised to 0 or null."

So Cython already does some C-level initialization beforehand, and you do not strictly need `__cinit__`, e.g. for a vector class:

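from libc.stdlib cimport realloc, free

cdef class Vector:
    cdef int* data
    cdef int size
    def __init__(self, int n):
        # self.data starts out as NULL, so the first realloc() behaves like malloc()
        cdef int* new_data = <int*> realloc(self.data, n * sizeof(int))
        if new_data == NULL:
            raise MemoryError()
        self.data = new_data
        self.size = n

    def __dealloc__(self):
        # free(NULL) is a no-op, so this is safe even if __init__ never ran
        free(self.data)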

The above has a disadvantage though, that

cdef Vector foo = Vector.__new__(Vector, 20)

does not work as expected. I would expect container classes to allocate memory on `__new__`, but not initialize the entries. (And usually `__init__` would also initialize the class and not just allocate memory.)

I agree that if your class does not have C attributes, `__cinit__` should probably not be used at all. Otherwise `__cinit__` should ideally be a reflection of `__dealloc__` (or vice versa) and should contain exactly everything that must be called exactly once to get your class into a safe state.
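
To illustrate the "called exactly once" point, here is a plain-Python model in which `__new__` stands in for `__cinit__`. It is a sketch of the semantics only: `SafeVector` and `Sub` are hypothetical names, and a Python list stands in for the C buffer. The setup in `__new__` runs even when `__init__` is bypassed or a subclass forgets to call the base `__init__`, which is the guarantee that keeps `__dealloc__` safe in the Cython version.

```python
class SafeVector:
    def __new__(cls, *args, **kwargs):
        # Runs once per object, before any __init__, like __cinit__.
        self = super().__new__(cls)
        self.data = []  # stand-in for an allocated C buffer
        self.size = 0
        return self

    def __init__(self, n=0):
        # May run zero, one or several times; it only fills in values.
        self.data = [0] * n
        self.size = n


class Sub(SafeVector):
    def __init__(self, n):
        # Deliberately does not call SafeVector.__init__.
        self.label = "sub"


bypassed = SafeVector.__new__(SafeVector)  # __init__ is never called
forgetful = Sub(5)                         # base __init__ is skipped
normal = SafeVector(3)
```

In all three cases the object has a valid `data` and `size`, whether or not any `__init__` ran.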

Jonathan

Stefan Behnel

Feb 9, 2022, 5:00:52 AM
to cython...@googlegroups.com
Stefan Behnel wrote on 09.02.22 at 09:04:
> Golden Rockefeller wrote on 09.02.22 at 03:14:
>> I would like it if the documentation
>> did not state a preference for __cinit__ over __init__
>
> I agree, and I actually hope that it doesn't. But it seems that that's not
> clear enough.

Here's an update. Further improvements welcome as PRs.

https://github.com/cython/cython/commit/01f291a8ca57d9ca91ca309839beb6449f9952ac

Stefan