cython extension class inheriting from datetime?

257 views
Skip to first unread message

Adam Klein

unread,
Feb 5, 2012, 5:31:40 PM2/5/12
to cython...@googlegroups.com
Sorry, fixing subject line and reposting.  Copy-and-paste stupidity.

Hi all,

I am trying to write a Cython extension type, Timestamp, that mimics datetime but enriches it with metadata. This is for the timeseries functionality we're trying to build up in pandas.

It seems like I cannot derive from datetime directly.  I tried this, and it worked syntactically, but there seems to be no way to intercept the constructor call and construct the super class appropriately.

So I went with composition, where datetime is a member variable.  However, I cannot get this sort of thing to work:

datetime <comparison> Timestamp

while this is just fine:

Timestamp <comparison> datetime

How would you all approach this situation?

Thanks in advance,
 Adam

mark florisson

unread,
Feb 5, 2012, 5:55:52 PM2/5/12
to cython...@googlegroups.com
On 5 February 2012 22:31, Adam Klein <ad...@lambdafoundry.com> wrote:
> Sorry, fixing subject line and reposting.  Copy-and-paste stupidity.
>
>> Hi all,
>>
>> I am trying to write a Cython extension type, Timestamp, that mimics
>> datetime but enriches it with metadata. This is for the timeseries
>> functionality we're trying to build up in pandas.
>>
>> It seems like I cannot derive from datetime directly.  I tried this, and
>> it worked syntactically, but there seems to be no way to intercept the
>> constructor call and construct the super class appropriately.

Hm, how does it now construct properly? You tried something along the lines of

cdef extern from "datetime.h":
ctypedef class datetime.datetime [ object PyDateTime_DateTime ]:
pass

cdef class MyDateTime(datetime):
pass

?

>> So I went with composition, where datetime is a member variable.  However,
>> I cannot get this sort of thing to work:
>>
>> datetime <comparison> Timestamp
>>
>> while this is just fine:
>>
>> Timestamp <comparison> datetime
>>
>> How would you all approach this situation?

How does it not work? In the latter case your comparison method will
only be invoked if Timestamp returns NotImplemented.

>> Thanks in advance,
>>  Adam
>
>

Dag Sverre Seljebotn

unread,
Feb 5, 2012, 5:58:28 PM2/5/12
to cython...@googlegroups.com

First thing to note is that when implementing __richcmp__, then,
confusingly, "self" may be the second argument rather than the first. So
you need to do typechecks on both arguments etc.

Dag

mark florisson

unread,
Feb 5, 2012, 6:16:26 PM2/5/12
to cython...@googlegroups.com
On 5 February 2012 22:58, Dag Sverre Seljebotn

I thought the same thing, but a test with a print actually shows that
it inverts the operators. BTW, we should update the documentation to
use Py_LT and friends.

Adam Klein

unread,
Feb 5, 2012, 6:16:32 PM2/5/12
to cython...@googlegroups.com
On Sun, Feb 5, 2012 at 5:55 PM, mark florisson <markflo...@gmail.com> wrote:
On 5 February 2012 22:31, Adam Klein <ad...@lambdafoundry.com> wrote:
> Sorry, fixing subject line and reposting.  Copy-and-paste stupidity.
>
>> Hi all,
>>
>> I am trying to write a Cython extension type, Timestamp, that mimics
>> datetime but enriches it with metadata. This is for the timeseries
>> functionality we're trying to build up in pandas.
>>
>> It seems like I cannot derive from datetime directly.  I tried this, and
>> it worked syntactically, but there seems to be no way to intercept the
>> constructor call and construct the super class appropriately.

Hm, how does it now construct properly? You tried something along the lines of

cdef extern from "datetime.h":
   ctypedef class datetime.datetime [ object PyDateTime_DateTime ]:
       pass

cdef class MyDateTime(datetime):
   pass

?

This is exactly the direction I took. It worked until I tried to alter the constructor arguments, in which case datetime complained it didn't take those arguments.

Or is this A Bad Thing To Do?
 

>> So I went with composition, where datetime is a member variable.  However,
>> I cannot get this sort of thing to work:
>>
>> datetime <comparison> Timestamp
>>
>> while this is just fine:
>>
>> Timestamp <comparison> datetime
>>
>> How would you all approach this situation?

How does it not work? In the latter case your comparison method will
only be invoked if Timestamp returns NotImplemented.

I implemented __richcmp__, which only gets called when Timestamp is on the LHS of the comparison.  I would expect datetime to try to invoke Timestamp's comparison on failure, but this doesn't seem to be what it does ...
 

>> Thanks in advance,
>>  Adam
>
>

mark florisson

unread,
Feb 5, 2012, 6:21:09 PM2/5/12
to cython...@googlegroups.com

Liskov's substitution principle doesn't quite agree, but you can
filter arguments to your heart's content. E.g. def __init__(self,
arg1, arg2): super(MyClass, self).__init__(arg1)

>>
>>
>> >> So I went with composition, where datetime is a member variable.
>> >>  However,
>> >> I cannot get this sort of thing to work:
>> >>
>> >> datetime <comparison> Timestamp
>> >>
>> >> while this is just fine:
>> >>
>> >> Timestamp <comparison> datetime
>> >>
>> >> How would you all approach this situation?
>>
>> How does it not work? In the latter case your comparison method will
>> only be invoked if Timestamp returns NotImplemented.
>
>
> I implemented __richcmp__, which only gets called when Timestamp is on the
> LHS of the comparison.  I would expect datetime to try to invoke Timestamp's
> comparison on failure, but this doesn't seem to be what it does ...

Oh sorry, you're implementing Timestamp, not datetime. Well, yes, the
LHS gets the comparison function called. The RHS only gets its
comparison function called when the LHS returns NotImplemented. If the
LHS thinks it knows the answer, then the RHS can't change that.

>>
>>
>> >> Thanks in advance,
>> >>  Adam
>> >
>> >
>
>

Adam Klein

unread,
Feb 5, 2012, 6:33:05 PM2/5/12
to cython...@googlegroups.com
It seems if I do Timestamp(datetime) to inherit datetime, I cannot implement __init__ with any other arguments than exactly those datetime itself normally takes.  There seems to be no way around this?

In [5]: _tseries.Timestamp()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Volumes/HD2/adam/Documents/Code/python/pandas/pandas/<ipython-input-5-07d31c694237> in <module>()
----> 1 _tseries.Timestamp()

TypeError: Required argument 'year' (pos 1) not found
 

>>
>>
>> >> So I went with composition, where datetime is a member variable.
>> >>  However,
>> >> I cannot get this sort of thing to work:
>> >>
>> >> datetime <comparison> Timestamp
>> >>
>> >> while this is just fine:
>> >>
>> >> Timestamp <comparison> datetime
>> >>
>> >> How would you all approach this situation?
>>
>> How does it not work? In the latter case your comparison method will
>> only be invoked if Timestamp returns NotImplemented.
>
>
> I implemented __richcmp__, which only gets called when Timestamp is on the
> LHS of the comparison.  I would expect datetime to try to invoke Timestamp's
> comparison on failure, but this doesn't seem to be what it does ...

Oh sorry, you're implementing Timestamp, not datetime. Well, yes, the
LHS gets the comparison function called. The RHS only gets its
comparison function called when the LHS returns NotImplemented. If the
LHS thinks it knows the answer, then the RHS can't change that.


Ah. Darn.
 
>>
>>
>> >> Thanks in advance,
>> >>  Adam
>> >
>> >
>
>

Adam Klein

unread,
Feb 5, 2012, 6:38:07 PM2/5/12
to cython...@googlegroups.com
Or, for instance, TypeError: function takes at most 8 arguments (9 given)

even if I define __init__ the same way.  I realized it's because datetime __new__ is being called before init, but I don't seem to be able to redefine or intercept datetime.__new__

mark florisson

unread,
Feb 5, 2012, 7:00:02 PM2/5/12
to cython...@googlegroups.com

You're talking about this error? "__new__ method of extension type
will change semantics in a future version of Pyrex and Cython. Use
__cinit__ instead." You could subclass your class and override
__new__.

Adam Klein

unread,
Feb 5, 2012, 8:34:11 PM2/5/12
to cython...@googlegroups.com
That might be what I have to do - if I understand correctly - subclass the extension type and construct the user-facing object in Python.  Thanks for the help.

mark florisson

unread,
Feb 6, 2012, 5:16:14 AM2/6/12
to cython...@googlegroups.com

That would work, but the point was to write a python subclass, not to
subclass in python :) That means you can do it in either Python or
Cython, i.e. write 'class' instead of 'cdef class'.

Stefan Behnel

unread,
Feb 6, 2012, 10:10:30 AM2/6/12
to cython...@googlegroups.com
mark florisson, 06.02.2012 00:16:

> On 5 February 2012 22:58, Dag Sverre Seljebotn wrote:
>> On 02/05/2012 11:31 PM, Adam Klein wrote:
>>> I am trying to write a Cython extension type, Timestamp, that mimics
>>> datetime but enriches it with metadata. This is for the timeseries
>>> functionality we're trying to build up in pandas.
>>>
>>> It seems like I cannot derive from datetime directly. I tried this,
>>> and it worked syntactically, but there seems to be no way to
>>> intercept the constructor call and construct the super class
>>> appropriately.
>>>
>>> So I went with composition, where datetime is a member variable.
>>> However, I cannot get this sort of thing to work:
>>>
>>> datetime <comparison> Timestamp
>>>
>>> while this is just fine:
>>>
>>> Timestamp <comparison> datetime
>>>
>>> How would you all approach this situation?
>>
>>
>> First thing to note is that when implementing __richcmp__, then,
>> confusingly, "self" may be the second argument rather than the first. So you
>> need to do typechecks on both arguments etc.
>
> I thought the same thing, but a test with a print actually shows that
> it inverts the operators. BTW, we should update the documentation to
> use Py_LT and friends.

... and eventually fix the problem by generating an implementation for
__richcmp__() when __eq__() and friends are defined.

http://trac.cython.org/cython_trac/ticket/130

Stefan

mark florisson

unread,
Feb 6, 2012, 10:12:28 AM2/6/12
to cython...@googlegroups.com

+1 for that, the __richcmp__ is kind of a wart.

Stefan Behnel

unread,
Feb 6, 2012, 10:16:45 AM2/6/12
to cython...@googlegroups.com
mark florisson, 05.02.2012 23:55:

> cdef extern from "datetime.h":
> ctypedef class datetime.datetime [ object PyDateTime_DateTime ]:
> pass
>
> cdef class MyDateTime(datetime):
> pass

On a related note, should we add these things to Cython/Includes/cpython ?
I don't think we currently have any stdlib modules there.

Stefan

mark florisson

unread,
Feb 6, 2012, 10:18:39 AM2/6/12
to cython...@googlegroups.com

It's already there in cpython/object.pxd.

Stefan Behnel

unread,
Feb 6, 2012, 10:24:07 AM2/6/12
to cython...@googlegroups.com
mark florisson, 06.02.2012 16:18:

> On 6 February 2012 15:16, Stefan Behnel wrote:
>> mark florisson, 05.02.2012 23:55:
>>> cdef extern from "datetime.h":
>>> ctypedef class datetime.datetime [ object PyDateTime_DateTime ]:
>>> pass
>>>
>>> cdef class MyDateTime(datetime):
>>> pass
>>
>> On a related note, should we add these things to Cython/Includes/cpython ?
>> I don't think we currently have any stdlib modules there.
>
> It's already there in cpython/object.pxd.

Not on my side - I don't see any "datetime" stuff in cpython/*.pxd.

Stefan

mark florisson

unread,
Feb 6, 2012, 10:26:52 AM2/6/12
to cython...@googlegroups.com

Oh, sorry you were talking about datetime. We definitely could yeah,
although I don't think many people are using it.

Francesc Alted

unread,
Feb 6, 2012, 12:57:31 PM2/6/12
to cython...@googlegroups.com


Hmm, with the introduction of dattetime support in NumPy 1.6 series, I foresee that many people will start using it gradually. So +1 for this.

Francesc Alted


mark florisson

unread,
Feb 6, 2012, 3:56:18 PM2/6/12
to cython...@googlegroups.com

Good point. They are encoded as 64 bit integers though, right? In any
case, supporting more of Python's public interface in pxds is always
good.

Francesc Alted

unread,
Feb 7, 2012, 10:51:05 AM2/7/12
to cython...@googlegroups.com


Yes, int64 for both timedelta64 and datetime64. For more info, see the NEP:

https://github.com/numpy/numpy/blob/master/doc/neps/datetime-proposal.rst

Francesc Alted


Brock Mendel

unread,
May 17, 2020, 1:15:31 PM5/17/20
to cython-users
So 8 years later, I'm poking at the pd.Timestamp constructor.  The pattern we've landed on looks like:

```
cdef class _Timestamp(datetime):
     [...]

class Timestamp(_Timestamp):
     def __new__(self, ...):
          if usually:
                return create_timestamp_from_ts(...)
          else:
                return pd.NaT

cdef _Timestamp create_timestamp_from_ts(int64_t value, npy_datetimestruct dts, object tz, object freq, bint fold):
       cdef _Timestamp ts_base
       ts_base = _Timestamp.__new__(Timestamp, dts.year, dts.month, dts.day, dts.hour, dts.min, dts.sec, dts.us, tz, fold)
       ts_base.value = value
       ts_base.nanosecond = dts.ps // 1000
       ts_base.freq = freq
       return ts_base
```

I would like to take this module-level `create_timestamp_from_ts` and implement it as `_Timestamp.__cinit__`.  When doing so, the `_Timestamp.__new__` would be replaced with `datetime.__new__`.  But when I call `_Timestamp.__new__(Timestamp, value, dts, tz, freq, fold)`, instead of passing these args to `_Timestamp.__cinit__`, it is directly passing them to `datetime.__new__`, which raises since since it has a different signature.

Is there a correct way to do this?  If not, is this a bug?

Stefan Behnel

unread,
May 17, 2020, 1:26:33 PM5/17/20
to cython...@googlegroups.com
Brock Mendel schrieb am 17.05.20 um 19:15:
This seems confused. "__cinit__()" does not create the object, nor should
it call its base class method. It is itself called as part of the call
chain up and down the base classes, after creating the object and
initialising the base classes. Is that unclear from the docs?

https://cython.readthedocs.io/en/latest/src/userguide/special_methods.html?highlight=__cinit__#initialisation-methods-cinit-and-init

Or am I misunderstanding what you are doing?

Stefan

Brock Mendel

unread,
May 17, 2020, 2:06:25 PM5/17/20
to cython...@googlegroups.com
> This seems confused

Definitely accurate.

> "__cinit__()" does not create the object [...] Is that unclear from the docs?

https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#fast-instantiation says that the way to call `Penguin.__cinit__` is with `Penguin.__new__(Pengiuin, 'wheat')`.  The top-voted answer here says "The __cinit__ method is roughly equivalent to a __new__ method in Python.*".  I think these are contributing to confusion about cinit/new.

> nor should it call its base class method.  It is itself called as part of the call chain up and down the base classes

ATM the call to `_Timestamp.__new__` within `create_timestamp_from_ts` is equivalent to `datetime.__new__`, as `_Timestamp` does not implement `__cinit__` or `__new__`.  Since this works fine in the status quo, I'm guessing you're referring to the hypothetical `__cinit__` that would take the place of `create_timestamp_from_ts`.

My understanding is that if `_Timestamp.__cinit__` exists, then `datetime.__cinit__` (or some analogue?) gets called before `_Timestamp.__cinit__`, so `datetime.__cinit__` does not need to be called from within `_Timestamp.__cinit__`.  I think that is what you are re-iterating here.

The trouble is the signature mismatch.  As long as the same args/kwargs are being passed to both `_Timestamp.__cinit__` and `datetime.__cinit__` , this can never work AFAICT.  We need a way to intercept the automatic `datetime.__cinit__` call and instead pass the appropriate arguments.  This is roughly what `create_timestamp_from_ts` is doing.

> Or am I misunderstanding what you are doing?

Hopefully this is clearer, but in bullet point form:
 - trying to move from create_timestamp_from_ts to __cinit__ to be more idiomatic
 - hoping to be more performant as a result


--

---
You received this message because you are subscribed to a topic in the Google Groups "cython-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/cython-users/JBVhaeNMQhg/unsubscribe.
To unsubscribe from this group and all its topics, send an email to cython-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cython-users/0a88764d-a8ba-eddb-e8f8-fca4a13b15e8%40behnel.de.

Stefan Behnel

unread,
May 17, 2020, 2:36:15 PM5/17/20
to cython...@googlegroups.com
Brock Mendel schrieb am 17.05.20 um 20:06:
>> "__cinit__()" does not create the object [...] Is that unclear from the
> docs?
>
> https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#fast-instantiation
> says that the way to call `Penguin.__cinit__` is with
> `Penguin.__new__(Pengiuin, 'wheat')`.

I'm failing to see what it says that.


> The top-voted answer here
> <https://stackoverflow.com/questions/18260095/cant-override-init-of-class-from-cython-extension>says
> "The __cinit__ method is roughly equivalent to a __new__ method in
> Python.*".

The tiny star at the end is _very_ important here, because it refers to a
little paragraph at the end of the answer that states quite clearly that it
actually is not equavalent at all. :)


> I think these are contributing to confusion about cinit/new.

I can see what that is coming from. ;-)


>> nor should it call its base class method. It is itself called as part of
>> the call chain up and down the base classes
>
> ATM the call to `_Timestamp.__new__` within `create_timestamp_from_ts` is
> equivalent to `datetime.__new__`, as `_Timestamp` does not implement
> `__cinit__` or `__new__`. Since this works fine in the status quo, I'm
> guessing you're referring to the hypothetical `__cinit__` that would take
> the place of `create_timestamp_from_ts`.
>
> My understanding is that if `_Timestamp.__cinit__` exists, then
> `datetime.__cinit__` (or some analogue?) gets called before
> `_Timestamp.__cinit__`, so `datetime.__cinit__` does not need to be called
> from within `_Timestamp.__cinit__`. I think that is what you are
> re-iterating here.

If datetime is referring to Python's "datetime.datetime", then there is no
"datetime.__cinit__" (that method is a pure Cython feature). But that also
wouldn't matter, because "__cinit__" is really an implementation details of
the class that implemenents it (or not). Subclasses do not have to care
about that at all.


> The trouble is the signature mismatch. As long as the same args/kwargs are
> being passed to both `_Timestamp.__cinit__` and `datetime.__cinit__` , this
> can never work AFAICT. We need a way to intercept the automatic
> `datetime.__cinit__` call and instead pass the appropriate arguments. This
> is roughly what `create_timestamp_from_ts` is doing.
>
>> Or am I misunderstanding what you are doing?
>
> Hopefully this is clearer, but in bullet point form:
> - trying to move from create_timestamp_from_ts to __cinit__ to be more
> idiomatic
> - hoping to be more performant as a result

Ok, IIUC, you want to subclass "datetime.datetime", but with a different
signature of the constructor, which then calculates the constructor
arguments for the "datetime.__new__()" call and passes them up to create
the object. Is that correct?

This does not work in Cython, because it does not support "__new__" for
extension types. Specifically, "__cinit__" is not a way to do this, because
it gets called _after_ creating the object (whereas "__new__" would get
called before). "__cinit__" is not "__new__", at all.

The reason for the lack of this feature is that Cython currently needs to
control the constructor call chain ("tp_new()") to the base class. If
"__new__" could be implemented by users, then the base class call could
happen at any place in the method (or not at all), which is difficult to
handle. Not impossible, just difficult (or unsafe) enough to not let us
implement it (yet). It would also defeat the assumption that instantiating
an extension type creates an object of that type, which is helpful in
several places.

So, coming back to your problem at hand, I would suggest to leave the code
as it is, pretty much. It's not a bad or unusual pattern to have factory
functions for alternative constructors.

Stefan

Brock Mendel

unread,
May 17, 2020, 7:30:26 PM5/17/20
to cython...@googlegroups.com
Thanks for walking me through this.  Keeping the status quo for the constructor is totally viable.

One related thing I'm noticing is inconsistent optimization of attribute lookups for attributes shared with datetime.  e.g. `self.hour` becomes `__pyx_v_self->hour` while `self.minute` becomes `__Pyx_PyObject_GetAttrStr(((PyObject *)__pyx_v_self), __pyx_n_s_minute)`.  I'm having a hard time coming up with a reason why this would be intentional.  (not hard to fix by using PyDateTime_DATE_GET_MINUTE etc, which improves perf by up to 30%)

--

---
You received this message because you are subscribed to a topic in the Google Groups "cython-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/cython-users/JBVhaeNMQhg/unsubscribe.
To unsubscribe from this group and all its topics, send an email to cython-users...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages