Nils Bruin
Apr 6, 2013, 1:55:02 PM
to cython-users
At present it seems that (at least on Python 2.7) Cython-produced
extension classes do not have the "Py_TPFLAGS_HAVE_VERSION_TAG" flag
set. That means they opt out of the attribute cache and therefore do
not benefit from the speedups that cache can provide. Classes derived
from them don't benefit either, and with a deep MRO the difference in
speed can be considerable.
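To make the mechanism concrete, here is a toy model in plain Python. It is not CPython's code, and it assumes a simplified scheme: each type either opts in (playing the role of Py_TPFLAGS_HAVE_VERSION_TAG) or not; a type only gets a version tag if it and every class in its MRO opted in; and lookups on a tagged type are memoized in a global cache keyed by (version_tag, name), misses included. All names (ToyType, lookup, chain, mro_steps_for) are hypothetical.

```python
# Toy model of a per-type attribute cache (hypothetical names; NOT CPython's code).

class ToyType:
    """Stand-in for a type object: its own dict, its MRO and an opt-in flag."""
    _next_tag = 1

    def __init__(self, name, base=None, attrs=None, has_version_tag=True):
        self.name = name
        self.dict = dict(attrs or {})
        self.mro = [self] + (base.mro if base is not None else [])
        self.has_version_tag = has_version_tag  # plays the role of Py_TPFLAGS_HAVE_VERSION_TAG
        self.version_tag = 0                    # 0 means "no valid tag assigned"

    def assign_version_tag(self):
        """Hand out a tag only if this type and every type in its MRO opted in."""
        if self.version_tag:
            return True
        if not all(t.has_version_tag for t in self.mro):
            return False
        self.version_tag = ToyType._next_tag
        ToyType._next_tag += 1
        return True


_cache = {}               # (version_tag, name) -> result; misses are cached as None
stats = {"mro_steps": 0}  # counts class-dict probes during MRO walks


def _walk_mro(tp, name):
    for t in tp.mro:
        stats["mro_steps"] += 1
        if name in t.dict:
            return t.dict[name]
    return None  # toy model: None stands for "not found"


def lookup(tp, name):
    if tp.assign_version_tag():
        key = (tp.version_tag, name)
        if key not in _cache:
            _cache[key] = _walk_mro(tp, name)
        return _cache[key]
    return _walk_mro(tp, name)  # no tag: walk the whole MRO every time


def chain(root_opts_in, depth=1000):
    """Mimic shallow_and_deep(): T0 defines t, then depth-1 further subclasses."""
    root = ToyType("root", has_version_tag=root_opts_in)
    tp = ToyType("T0", root, {"t": 100})
    for i in range(1, depth):
        tp = ToyType("T%d" % i, tp, {"a%d" % i: 1})
    return tp


def mro_steps_for(deep, repeats=100):
    stats["mro_steps"] = 0
    for _ in range(repeats):
        assert lookup(deep, "t") == 100
    return stats["mro_steps"]


print(mro_steps_for(chain(True)))   # 1000: only the first lookup walks the MRO
print(mro_steps_for(chain(False)))  # 100000: every lookup walks 1000 classes
```

In this model a single opted-out class at the root is enough to disable caching for the whole hierarchy below it, which mirrors the observation that classes derived from a Cython class lose the benefit too.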
The following code snippet illustrates the issue. We produce two
deeply nested subclass hierarchies, one deriving from `object`, the
other from a no-op Cython class. We then test attribute access on
instances of the deepest class of each (as well as on instances of the
class at the top of each hierarchy, for reference).
------------------------ code ---------------------------
# Sage session (Python 2.7): cython() compiles and loads the inline
# Cython code below, and timeit() is Sage's timing helper.
cython("""
cdef class cython_object(object):
    pass

# Partial declaration of PyTypeObject: only the fields accessed below.
cdef extern from "Python.h":
    ctypedef struct PyTypeObject_ext "PyTypeObject":
        void * tp_getattr
        void * tp_setattr
        void * tp_getattro
        void * tp_setattro
        unsigned int tp_version_tag
        long tp_flags

def get_type_flags(T):
    # Reinterpret the type object T as a C PyTypeObject and print some fields.
    cdef PyTypeObject_ext * V
    V = <PyTypeObject_ext *><void *> T
    print "tp_getattr:",<long> V.tp_getattr
    print "tp_setattr:",<long> V.tp_setattr
    print "tp_getattro:",<long> V.tp_getattro
    print "tp_setattro:",<long> V.tp_setattro
    print "tp_version_tag:",V.tp_version_tag
    return V.tp_flags
""")

def shallow_and_deep(base):
    # T0 defines the class attribute t; T1..T999 each add two more attributes.
    shallow = type("T0",(base,),{"t":100})
    P=shallow
    for i in range(1,1000):
        P=type("T%d"%i,(P,),{"a%i"%i:1,"b%i"%i:2})
    deep = P
    return (shallow,deep)

S_py,D_py=shallow_and_deep(object)
s_py=S_py()
s_py.u=100
d_py=D_py()
d_py.u=100
print "python shallow class attribute:"
timeit("s_py.t",number=20000)
print "python deep class attribute:"
timeit("d_py.t",number=20000)
print "python shallow instance attribute:"
timeit("s_py.u",number=20000)
print "python deep instance attribute:"
timeit("d_py.u",number=20000)

S_cy,D_cy=shallow_and_deep(cython_object)
s_cy=S_cy()
s_cy.u=100
d_cy=D_cy()
d_cy.u=100
print "cython shallow class attribute:"
timeit("s_cy.t",number=20000)
print "cython deep class attribute:"
timeit("d_cy.t",number=20000)
print "cython shallow instance attribute:"
timeit("s_cy.u",number=20000)
print "cython deep instance attribute:"
timeit("d_cy.u",number=20000)

print "pure python object fields:"
py_flags=get_type_flags(D_py)
print "cython derived object fields:"
cy_flags=get_type_flags(D_cy)
py_V=py_flags & (1<<18) # Py_TPFLAGS_HAVE_VERSION_TAG
print "pure python object has version tag:",bool(py_V)
cy_V=cy_flags & (1<<18) # Py_TPFLAGS_HAVE_VERSION_TAG
print "cython derived object has version tag:",bool(cy_V)
--------------------------------------------------------------
Output:
python shallow class attribute:
20000 loops, best of 3: 53.7 ns per loop
python deep class attribute:
20000 loops, best of 3: 54 ns per loop
python shallow instance attribute:
20000 loops, best of 3: 52.7 ns per loop
python deep instance attribute:
20000 loops, best of 3: 52.7 ns per loop
cython shallow class attribute:
20000 loops, best of 3: 66.8 ns per loop
cython deep class attribute:
20000 loops, best of 3: 38.2 µs per loop
cython shallow instance attribute:
20000 loops, best of 3: 83.9 ns per loop
cython deep instance attribute:
20000 loops, best of 3: 23.3 µs per loop
pure python object fields:
tp_getattr: 0
tp_setattr: 0
tp_getattro: 139723747864304
tp_setattro: 139723747864976
tp_version_tag: 2440
cython derived object fields:
tp_getattr: 0
tp_setattr: 0
tp_getattro: 139723747864304
tp_setattro: 139723747864976
tp_version_tag: 0
pure python object has version tag: True
cython derived object has version tag: False
As you can see, for instances of the pure Python classes the MRO depth
is irrelevant, because the attribute cache absorbs it. This also holds
for the instance attribute: the lookup of a possible attribute higher
up in the MRO that could shadow the instance attribute (a data
descriptor, for instance) is cached as well, including the result that
no such attribute exists.
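The reason an instance attribute lookup depends on MRO depth at all is the precedence rule of generic attribute access: before reading the instance `__dict__`, Python has to check whether some class in the MRO defines a data descriptor of the same name. The pure-Python sketch below approximates that rule; it assumes no custom `__getattribute__`/`__getattr__` and ignores metaclass subtleties, and `find_in_mro` and `generic_getattr` are hypothetical helper names. The `find_in_mro` walk is the step the attribute cache saves.

```python
_MISSING = object()


def find_in_mro(tp, name):
    """Uncached type lookup: probe each class dict along the MRO."""
    for klass in tp.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return _MISSING


def generic_getattr(obj, name):
    """Approximation of object.__getattribute__ for ordinary instances."""
    tp = type(obj)
    attr = find_in_mro(tp, name)  # happens even when `name` is an instance attribute
    attr_tp = type(attr)
    is_descr = attr is not _MISSING and hasattr(attr_tp, "__get__")
    # 1. A data descriptor on the type (e.g. a property) shadows the instance dict.
    if is_descr and (hasattr(attr_tp, "__set__") or hasattr(attr_tp, "__delete__")):
        return attr_tp.__get__(attr, obj, tp)
    # 2. Otherwise the instance dict wins.
    inst_dict = getattr(obj, "__dict__", {})
    if name in inst_dict:
        return inst_dict[name]
    # 3. Non-data descriptors (plain functions) and ordinary class attributes.
    if is_descr:
        return attr_tp.__get__(attr, obj, tp)
    if attr is not _MISSING:
        return attr
    raise AttributeError(name)


class Base:
    @property
    def u(self):  # a property is a data descriptor, even without a setter
        return "from Base.u"


class Derived(Base):
    pass


d = Derived()
d.__dict__["u"] = 100  # store directly; plain `d.u = 100` would hit the read-only property
print(generic_getattr(d, "u"), d.u)  # both give "from Base.u"
```

Because a data descriptor may sit anywhere in the MRO, even `d.u` for a plain instance attribute requires the full MRO search unless its result (here, "nothing found") is cached.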
The rest of the code confirms that the Cython-derived class does not
have the "HAVE_VERSION_TAG" flag (and its tp_version_tag is 0).
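The same bit can also be read from Python without compiling anything, since CPython exposes a type's `tp_flags` as the read-only `__flags__` attribute. A minimal sketch, assuming Python 2.7, where `Py_TPFLAGS_HAVE_VERSION_TAG` is `1L << 18` as in the code above; the constant and function names are illustrative, and the meaning of this bit on later Python 3 releases may differ.

```python
# Value of Py_TPFLAGS_HAVE_VERSION_TAG in CPython 2.7's object.h (1L << 18).
PY_TPFLAGS_HAVE_VERSION_TAG = 1 << 18


def has_version_tag_flag(tp):
    """Return True if the type's tp_flags (exposed as __flags__) has bit 18 set."""
    return bool(tp.__flags__ & PY_TPFLAGS_HAVE_VERSION_TAG)
```

In the session above, `has_version_tag_flag(D_py)` and `has_version_tag_flag(D_cy)` should report the same values as the `get_type_flags` check.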
The 1000-level-deep hierarchy is of course an exaggeration, but
already with 4 levels (which can easily happen in real-world code)
lookup takes about twice as long.
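For readers without Sage, the measurement can be repeated with the standard `timeit` module. This sketch assumes Python 3.5+ (for the `globals` argument) and uses hypothetical helper names; passing a compiled `cdef class` (such as `cython_object` above) as `base` should reproduce the comparison. Absolute numbers will vary by machine and interpreter.

```python
import timeit


def make_chain(base, depth):
    """Return the deepest of `depth` nested subclasses of `base`; T0 defines t."""
    cls = type("T0", (base,), {"t": 100})
    for i in range(1, depth):
        cls = type("T%d" % i, (cls,), {"a%d" % i: 1, "b%d" % i: 2})
    return cls


def ns_per_access(base, depth, number=100000):
    """Best-of-3 time, in nanoseconds, for a class and an instance attribute."""
    obj = make_chain(base, depth)()
    obj.u = 100
    timings = {}
    for expr in ("obj.t", "obj.u"):
        best = min(timeit.repeat(expr, globals={"obj": obj}, number=number, repeat=3))
        timings[expr] = best / number * 1e9
    return timings


if __name__ == "__main__":
    for depth in (1, 4, 1000):
        print(depth, ns_per_access(object, depth))
```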
----
Cython seems to initialize tp_flags with
Py_TPFLAGS_DEFAULT|Py_TPFLAGS_CHECKTYPES|Py_TPFLAGS_HAVE_NEWBUFFER|Py_TPFLAGS_BASETYPE
The following note from Python's "object.h" seems relevant:
"""
NOTE: when building the core, Py_TPFLAGS_DEFAULT includes
Py_TPFLAGS_HAVE_VERSION_TAG; outside the core, it doesn't. This is so
that extensions that modify tp_dict of their own types directly don't
break, since this was allowed in 2.5. In 3.0 they will have to
manually remove this flag though!
"""
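The breakage the note guards against can be illustrated with a small model (hypothetical names, not CPython code): lookups are memoized under the type's current version tag, so writing to the type's dict without changing the tag leaves a stale cache entry. In CPython the documented remedy after modifying a type's attributes directly is to call `PyType_Modified()`, which `modified()` below stands in for; the real function also invalidates subclasses, which this sketch omits.

```python
class TaggedType:
    """Stand-in for a type object with a version tag (hypothetical, simplified)."""
    _next_tag = 1

    def __init__(self, attrs):
        self.dict = dict(attrs)
        self.version_tag = TaggedType._new_tag()

    @staticmethod
    def _new_tag():
        tag = TaggedType._next_tag
        TaggedType._next_tag += 1
        return tag

    def modified(self):
        """Like PyType_Modified(): a fresh tag makes old cache entries unreachable."""
        self.version_tag = TaggedType._new_tag()

    def setattr(self, name, value):
        """The supported path (cf. type.__setattr__): update the dict, then invalidate."""
        self.dict[name] = value
        self.modified()


_tag_cache = {}  # (version_tag, name) -> value


def cached_lookup(tp, name):
    key = (tp.version_tag, name)
    if key not in _tag_cache:
        _tag_cache[key] = tp.dict.get(name)
    return _tag_cache[key]


T = TaggedType({"x": 1})
print(cached_lookup(T, "x"))  # 1
T.dict["x"] = 2               # direct write, like poking tp_dict from C
print(cached_lookup(T, "x"))  # still 1: stale cache entry
T.modified()
print(cached_lookup(T, "x"))  # 2
T.setattr("x", 3)
print(cached_lookup(T, "x"))  # 3
```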
The note from "object.h" suggests that on Python 3.*, the difference
observed above would not be present. It would be nice if Cython also
had an option for Python 2.7 (probably turned on by default) to include
the "HAVE_VERSION_TAG" flag: in most cases it should be entirely valid
to set it, and it can lead to significant performance gains.