I've encountered an interesting case in the code I'm developing that I can't find clarified in the cython documentation.
If I am building an extension class like so:
vector.pxd:

cdef class Vector:
    cdef x, y, z

vector.pyx:

cdef class Vector:
    <blah, blah>

    property x:
        def __get__(self):
            return self.x
        def __set__(self, double v):
            self.x = v
This compiles fine and, when I include the pxd in other pyx files, direct access is always generated for the cdef attributes. What I wanted to confirm is that, despite defining the x attribute in the python scope via a property while simultaneously having a cdef x attribute, the cython code will *always* use direct attribute access rather than attempting the python calls (when cimporting). i.e. does the cdef attribute always take priority in the generated code? I can't find any confirmation of this ordering in the documentation, so I thought I'd better check this was a definite design choice and not a happy accident. Having the above code working makes the cython side of my api look as tidy as the python side (I originally had cdef _x, _y, _z as the attributes to avoid a clash, but would obviously prefer x, y, z!).
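To make the question concrete, the kind of cimporting code I'm relying on looks roughly like this (module and function names are just for illustration):

# consumer.pyx (hypothetical)
from vector cimport Vector

def scale_x(Vector v, double factor):
    # my assumption: because Vector is cimported, v.x here compiles to a
    # direct C attribute access rather than a call through the x property
    v.x = v.x * factor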
If I've missed this in the documentation, I apologise.
Thanks for the clarification Stefan.
The code I have at present would be better handled by making the cdef attributes public, since the properties just mirror direct attribute access. I refactored my code recently and didn't spot the simplification (I've not needed public before and it didn't spring to mind... I'm still relatively new to cython).
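For reference, the simplified declaration I've switched to is along these lines (a minimal sketch; typing the attributes as double matches the old property setter, and public exposes them to python so the hand-written properties go away):

vector.pxd:

cdef class Vector:
    # directly accessible from both cython and python
    cdef public double x, y, z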
I have had to diverge my cython and python apis in some places, notably vector multiplication. Using __mul__(x, y) is very slow compared to a dedicated cdef Vector mul(self, double v) method - I assume this is because of the overhead required to allow the method to be redefined in a subclass. Is there any way in cython of preventing classes from being extended, so that the cython compiler can safely inline the __mul__ code (knowing it can never change)? Overloading methods would be useful in this case too!
If you are interested, the sort of code I'm finding slow is as follows (everything here is a vector, dot is a cpdef):
Calculate the reflection vector (using __mul__ and __sub__):
r = i - 2*n*(n.dot(i))
Writing this as follows is ~10 times faster (d is a double):
d = n.dot(i)
r.x = i.x - 2*n.x*d
r.y = i.y - 2*n.y*d
r.z = i.z - 2*n.z*d
I'd like to avoid having to drop to such a low level for every bit of vector maths if I can.
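For context, the dedicated cdef methods I mentioned are along these lines (a sketch of the approach rather than the exact code; new_vector is the C-level factory that appears in the timing code further down):

cdef class Vector:
    # attributes, dot(), __mul__, __sub__ etc. as before

    cdef Vector mul(self, double m):
        # scale by m, returning a new Vector
        return new_vector(self.x * m, self.y * m, self.z * m)

    cdef Vector sub(self, Vector v):
        # component-wise subtraction, returning a new Vector
        return new_vector(self.x - v.x, self.y - v.y, self.z - v.z)

These also need matching declarations in vector.pxd so cimporting modules can call them at C speed.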
If you want to have a look, the code (alpha!) is at: www.raysect.org, see the core/math package. Feel free to tell me I'm doing things wrong. The aim of the project is to develop an easily extensible, generic ray-tracing framework for use in high precision scientific and engineering applications. Ease of use is the primary goal, but it still needs to be fast.
There is a bodged-together speed test in raysect/tests.
To build for development:
- dev/build.sh in the package root
- dev/test.sh to run the package tests
- dev/speedtest.sh to run the speed test
Hmm... just had a look at the generated C for my two examples. The __sub__/__mul__ example generates a lot of temporary objects, unlike the lower-level version. Inlining __mul__ etc. won't necessarily fix the creation of the temporaries... I guess it would make a small improvement but not as much as I would like.
Turns out I was wrong - it is substantially faster. Here are the results for a test run:
Test 5: compound maths (reflection vector)
Loop of 2,500,000 operations:
- python: 16218.0 ms
- raysect via python: 13323.5 ms
- raysect via cython: 12758.5 ms
- raysect via optimised cython (high-level): 165.6 ms
- raysect via optimised cython (low-level): 88.0 ms
raysect (python scope) vs python: 1.217 times faster
raysect (cython scope) vs python: 1.271 times faster
raysect (optimised cython, high-level) vs python: 97.922 times faster
raysect (optimised cython, low-level) vs python: 184.318 times faster
The last three results correspond to:
raysect (cython scope) vs python: 1.271 times faster:
def ctest5(int n):
    cdef Vector incident, normal, reflected
    incident = Vector([1, -1, 0]).normalise()
    normal = Vector([0, 1, 0]).normalise()
    for i in range(0, n):
        reflected = incident - 2.0 * normal * normal.dot(incident)
    return reflected
raysect (optimised cython, high-level) vs python: 97.922 times faster:
def cotest5a(int n):
    cdef Vector incident, normal, reflected
    incident = new_vector(1, -1, 0).normalise()
    normal = new_vector(0, 1, 0).normalise()
    for i in range(0, n):
        # r = i - 2*n*(n.i)
        reflected = incident.sub(normal.mul(2.0 * normal.dot(incident)))
    return reflected
raysect (optimised cython, low-level) vs python: 184.318 times faster:
def cotest5b(int n):
    cdef Vector incident, normal, reflected
    cdef double d
    incident = new_vector(1, -1, 0).normalise()
    normal = new_vector(0, 1, 0).normalise()
    for i in range(0, n):
        # r = i - 2*n*(n.i)
        d = normal.dot(incident)
        reflected = new_vector(incident.x - 2.0 * normal.x * d,
                               incident.y - 2.0 * normal.y * d,
                               incident.z - 2.0 * normal.z * d)
    return reflected
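For completeness, new_vector in the above is a cheap module-level factory, roughly along these lines (a sketch; the point is that it bypasses __init__ and sets the C attributes directly):

cdef inline Vector new_vector(double x, double y, double z):
    # skip __init__ entirely and fill in the C attributes
    cdef Vector v = Vector.__new__(Vector)
    v.x = x
    v.y = y
    v.z = z
    return v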
Using __mul__ and __sub__ is slow compared to cdef mul and cdef sub.
I've found the @cython.final directive can prevent subclassing. In theory, would this not allow __mul__, __sub__ etc. to be compiled in as though they were inline cdefs? I've tested my above code with Vector defined as final and it makes no change to the speed, so I take it that this isn't done? Is there something I'm missing that would prevent this optimisation? I'd be very happy to make all my core maths classes final if it meant I got a clean api and speed!
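For reference, what I tested is along these lines (a minimal sketch, with the attributes and methods left as before):

cimport cython

@cython.final   # Vector can no longer be subclassed
cdef class Vector:
    cdef public double x, y, z
    # dot(), mul(), sub(), __mul__, __sub__ etc. as before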