I have been playing with memory views and Cython arrays. I am not sure exactly what is going wrong, but something seems off about how Cython allocates memory for Cython arrays (as in
http://docs.cython.org/src/userguide/memoryviews.html). Creating memory views is also horribly slow. Either that, or I have done something braindead in my code, but I don't think so. The docs for Cython arrays are pretty much non-existent. Where are they in the source tree? I couldn't find the cython.view module anywhere.
I ran a quick check to see what is going on with array creation in Cython. I have attached the pyx and the setup file used to generate this data, and copied them at the end of the email too. FWIW, I'm on Windows 7 with Python 2.7 and Visual Studio 2008, with a fresh Cython from git head. The creation time for Cython arrays seems to be nearly constant, independent of the length of the array. To be fair, the numpy example isn't exactly the same, but still, roughly 10 times slower in Cython than in pure C is not too good. 2 us per array creation for a 10-element array is very slow, and that speed is just not cutting it for my severely speed-constrained application. In fact, what I was trying to do was to extend the code that was started in
https://groups.google.com/forum/?fromgroups=#!topic/cython-users/PpIokkZVOrA to use the native Cython arrays, in order to avoid the hacking in the array example provided there.
My idea is to write my own 1-d floating-point array extension type from the ground up, with a focus on speed for the element-wise operations. I'll still provide the same interface by implementing most or all of the special methods.
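To make the interface I have in mind concrete, here is a rough pure-Python sketch. It only illustrates the special methods and is not the fast Cython extension type itself. The class name FloatVector and the helper _elementwise are made up for illustration, and the storage is the standard-library array module rather than a Cython array.

```python
from array import array


class FloatVector(object):
    """Hypothetical 1-d container of doubles (illustrative name only)."""

    def __init__(self, values):
        # array('d') stores C doubles in one contiguous buffer.
        self._data = array('d', values)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        return self._data[index]

    def __setitem__(self, index, value):
        self._data[index] = value

    def _elementwise(self, other, op):
        # Accept another FloatVector of equal length, or a scalar.
        if isinstance(other, FloatVector):
            if len(other) != len(self):
                raise ValueError("length mismatch")
            return FloatVector(op(a, b) for a, b in zip(self._data, other._data))
        return FloatVector(op(a, other) for a in self._data)

    def __add__(self, other):
        return self._elementwise(other, lambda a, b: a + b)

    def __mul__(self, other):
        return self._elementwise(other, lambda a, b: a * b)

    def tolist(self):
        return self._data.tolist()
```

In the real thing the loops in _elementwise would be typed C loops over a double* buffer; the point here is only which methods the type would expose.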
Incidentally, it does not seem to be possible to inherit from the array.array class in Cython 0.17.1.
Here is the summary (averaged elapsed time per call for one million calls):
Elapsed time to make array with length of 1 is 2.40862075722 us
Elapsed time to make array with length of 10 is 2.36220109038 us
Elapsed time to make array with length of 100 is 2.37456325685 us
Elapsed time to make array with length of 1000 is 2.37144104029 us
Elapsed time to make array with length of 10000 is 2.72447256951 us
Elapsed time to make numpy array with length of 1 is 0.806911490145 us
Elapsed time to make numpy array with length of 10 is 0.869661726899 us
Elapsed time to make numpy array with length of 100 is 0.926880930014 us
Elapsed time to make numpy array with length of 1000 is 1.52742669012 us
Elapsed time to make numpy array with length of 10000 is 7.3662581456 us
Elapsed time with raw c-allocation with length of 1 is 0.195063960543 us
Elapsed time with raw c-allocation with length of 10 is 0.196661594872 us
Elapsed time with raw c-allocation with length of 100 is 0.204236565547 us
Elapsed time with raw c-allocation with length of 1000 is 0.211111834023 us
Elapsed time with raw c-allocation with length of 10000 is 0.363121851293 us
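To put the overhead in perspective, the short script below divides the Cython array timings by the raw malloc/free timings from the summary above. The numbers are the posted ones rounded to three decimals, and the helper name slowdown is just something I made up for this illustration. It works out to roughly a factor of 12 for the short arrays.

```python
# Per-call times in microseconds, copied (rounded) from the summary above.
cython_array_us = {1: 2.409, 10: 2.362, 100: 2.375, 1000: 2.371, 10000: 2.724}
raw_malloc_us = {1: 0.195, 10: 0.197, 100: 0.204, 1000: 0.211, 10000: 0.363}


def slowdown(length):
    """Ratio of Cython array creation time to raw malloc/free time."""
    return cython_array_us[length] / raw_malloc_us[length]


for length in sorted(cython_array_us):
    print("length %5d: cython array takes %.1fx as long as malloc/free"
          % (length, slowdown(length)))
```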
mv_test.pyx:
------------------------------------------------
from cython.view cimport array as cvarray
import time
import numpy as np
cimport numpy as np
from libc.stdlib cimport malloc, free

cdef long N = 1000000   # number of allocations timed per array length
cdef double* ptr

# 1) Cython arrays
for L in [1,10,100,1000,10000]:
    t1 = time.clock()
    for i in range(N):
        a = cvarray((L,),sizeof(double),'d')
    t2 = time.clock()
    print 'Elapsed time to make array with length of ' +str(L)+' is '+str((t2-t1)/N*1e6)+' us'

# 2) numpy arrays (np.arange also fills the array, so not an exact equivalent)
for L in [1,10,100,1000,10000]:
    t1 = time.clock()
    for i in range(N):
        a = np.arange(L)
    t2 = time.clock()
    print 'Elapsed time to make numpy array with length of ' +str(L)+' is '+str((t2-t1)/N*1e6)+' us'

# 3) raw C allocation, freed immediately
for L in [1,10,100,1000,10000]:
    t1 = time.clock()
    for i in range(N):
        ptr = <double*> malloc(sizeof(double) * L)
        free(ptr)
    t2 = time.clock()
    print 'Elapsed time with raw c-allocation with length of ' +str(L)+' is '+str((t2-t1)/N*1e6)+' us'

setup.py
---------------------------
from distutils.core import setup
from Cython.Build import cythonize
import numpy
import Cython
#This will generate HTML to show where there are still pythonic bits hiding out
Cython.Compiler.Options.annotate = True
setup(
    name = "My hello app",
    ext_modules = cythonize('mv_test.pyx'), # accepts a glob pattern
    include_dirs = [numpy.get_include()]
)