Cython array creation is horribly slow

Showing 1-15 of 15 messages
Cython array creation is horribly slow Ian Bell 11/10/12 10:22 AM
I have been playing with memory views and cython arrays.  I am not sure what exactly is going wrong, but it seems like there is something not right about how cython allocates memory for cython arrays (as in http://docs.cython.org/src/userguide/memoryviews.html).  Creating of memory views are also horribly slow.  Or I have done something braindead in my code but I don't think so.  The docs for cython arrays are pretty much non-existent.  Where are they in the source tree?  I couldn't find the cython.view module anywhere

I ran a quick check to see what is going on with the array creation in cython.  I have attached the pyx and the setup file used to generate this data and copied it at the end of the email too.  FWIW, I'm on Windows 7 Python 2.7, visual studio 2008 wih a fresh cython from GIT head.  The array creation time for cython arrays seems to be constant independent of length of the array? To be fair, the numpy example isn't exactly the same, but still, 10 times slower in cython than pure c is not too good.  2us per array creation for a 10 element array is very slow, a speed that is just not cutting it for my severely speed constrained application.  In fact what I was trying to was to extend the code that was started in https://groups.google.com/forum/?fromgroups=#!topic/cython-users/PpIokkZVOrA to use the native cython arrays in order to avoid the hacking in the array example provided.

My idea is I am going to wrap my own 1-d array floating point datatype extension type from the ground up with a focus on speed for the element-wise operations.  I'll still include the same interface by implementing most or all of the special methods.

Incidentally, it does not seem to be possible to inherit from array.array class in 0.17.1

Here is the summary (averaged elapsed time per call for one million calls):
Elapsed time to make array with length of 1 is 2.40862075722 us
Elapsed time to make array with length of 10 is 2.36220109038 us
Elapsed time to make array with length of 100 is 2.37456325685 us
Elapsed time to make array with length of 1000 is 2.37144104029 us
Elapsed time to make array with length of 10000 is 2.72447256951 us
Elapsed time to make numpy array with length of 1 is 0.806911490145 us
Elapsed time to make numpy array with length of 10 is 0.869661726899 us
Elapsed time to make numpy array with length of 100 is 0.926880930014 us
Elapsed time to make numpy array with length of 1000 is 1.52742669012 us
Elapsed time to make numpy array with length of 10000 is 7.3662581456 us
Elapsed time with raw c-allocation with length of 1 is 0.195063960543 us
Elapsed time with raw c-allocation with length of 10 is 0.196661594872 us
Elapsed time with raw c-allocation with length of 100 is 0.204236565547 us
Elapsed time with raw c-allocation with length of 1000 is 0.211111834023 us
Elapsed time with raw c-allocation with length of 10000 is 0.363121851293 us

mv_test.pyx:
------------------------------------------------
from cython.view cimport array as cvarray
import time

import numpy as np
cimport numpy as np

from libc.stdlib cimport malloc, free

cdef long N = 1000000

cdef double* ptr

for L in [1,10,100,1000,10000]:
    t1 = time.clock()
    for i in range(N):
        a = cvarray((L,),sizeof(double),'d')
    t2 = time.clock()
    print 'Elapsed time to make array with length of ' +str(L)+' is '+str((t2-t1)/N*1e6)+' us'
   
for L in [1,10,100,1000,10000]:
    t1 = time.clock()
    for i in range(N):
        a = np.arange(L)
    t2 = time.clock()
    print 'Elapsed time to make numpy array with length of ' +str(L)+' is '+str((t2-t1)/N*1e6)+' us'
   
for L in [1,10,100,1000,10000]:
    t1 = time.clock()
    for i in range(N):
        ptr = <double*> malloc(sizeof(double) * L)
        free(ptr)
    t2 = time.clock()
    print 'Elapsed time with raw c-allocation with length of ' +str(L)+' is '+str((t2-t1)/N*1e6)+' us'


setup.py
---------------------------
from distutils.core import setup
from Cython.Build import cythonize
import numpy

import Cython

#This will generate HTML to show where there are still pythonic bits hiding out
Cython.Compiler.Options.annotate = True

setup(
    name = "My hello app",
    ext_modules = cythonize('mv_test.pyx'), # accepts a glob pattern
    include_dirs = [numpy.get_include()]
)
Re: Cython array creation is horribly slow Ian Bell 11/10/12 11:09 AM
FWIW, same behavior in ubuntu as well
Re: [cython-users] Re: Cython array creation is horribly slow Robert Bradshaw 11/10/12 2:00 PM
Certainly creating an array is doing a lot more than a malloc, so if
all you want is a 1-dimensional contiguous non-Python-visible list of
c values, you're not going to beat that. I'm really not sure why
there's that much overhead though, I'd hoped it would be better for
this case.
Re: [cython-users] Re: Cython array creation is horribly slow Ian Bell 11/10/12 2:33 PM


On Sat, Nov 10, 2012 at 10:59 PM, Robert Bradshaw <robe...@gmail.com> wrote:
Certainly creating an array is doing a lot more than a malloc, so if
all you want is a 1-dimensional contiguous non-Python-visible list of
c values, you're not going to beat that. I'm really not sure why
there's that much overhead though, I'd hoped it would be better for
this case.

Undoubtedly there is some stuff happening behind the scenes, but 10 times the allocation time of a malloc? And still 3 times slower than a numpy array, which isn't exactly the fastest for allocation..

I could live with malloc but the manual memory management is not very fun.  If I implement __dealloc__ could I just do the free() there?  Would that work? 

 

Re: [cython-users] Re: Cython array creation is horribly slow Robert Bradshaw 11/10/12 2:47 PM
On Sat, Nov 10, 2012 at 2:32 PM, Ian Bell <ian.h...@gmail.com> wrote:
>
>
> On Sat, Nov 10, 2012 at 10:59 PM, Robert Bradshaw <robe...@gmail.com>
> wrote:
>>
>> Certainly creating an array is doing a lot more than a malloc, so if
>> all you want is a 1-dimensional contiguous non-Python-visible list of
>> c values, you're not going to beat that. I'm really not sure why
>> there's that much overhead though, I'd hoped it would be better for
>> this case.
>
> Undoubtedly there is some stuff happening behind the scenes, but 10 times
> the allocation time of a malloc? And still 3 times slower than a numpy
> array, which isn't exactly the fastest for allocation..

Yes, I agree it's pretty dismal. We should be able to at least match numpy.

> I could live with malloc but the manual memory management is not very fun.
> If I implement __dealloc__ could I just do the free() there?  Would that
> work?

Yes, if the lifetime of the memory can be tied to the lifetime of an
object, this works well. (I do that a lot.)
Re: [cython-users] Re: Cython array creation is horribly slow Robert Bradshaw 11/10/12 2:53 PM
On Sat, Nov 10, 2012 at 2:46 PM, Robert Bradshaw <robe...@gmail.com> wrote:
> On Sat, Nov 10, 2012 at 2:32 PM, Ian Bell <ian.h...@gmail.com> wrote:
>>
>>
>> On Sat, Nov 10, 2012 at 10:59 PM, Robert Bradshaw <robe...@gmail.com>
>> wrote:
>>>
>>> Certainly creating an array is doing a lot more than a malloc, so if
>>> all you want is a 1-dimensional contiguous non-Python-visible list of
>>> c values, you're not going to beat that. I'm really not sure why
>>> there's that much overhead though, I'd hoped it would be better for
>>> this case.
>>
>> Undoubtedly there is some stuff happening behind the scenes, but 10 times
>> the allocation time of a malloc? And still 3 times slower than a numpy
>> array, which isn't exactly the fastest for allocation..
>
> Yes, I agree it's pretty dismal. We should be able to at least match numpy.

http://trac.cython.org/cython_trac/ticket/794

If you really care about speed, for the time being you're probably not
going to beat a custom extension class with a raw pointer in it. Of
course this is assuming you need to allocate and delete a lot of
arrays quickly.
Re: [cython-users] Re: Cython array creation is horribly slow Ian Bell 11/10/12 3:07 PM


On Sat, Nov 10, 2012 at 11:52 PM, Robert Bradshaw <robe...@gmail.com> wrote:
On Sat, Nov 10, 2012 at 2:46 PM, Robert Bradshaw <robe...@gmail.com> wrote:
> On Sat, Nov 10, 2012 at 2:32 PM, Ian Bell <ian.h...@gmail.com> wrote:
>>
>>
>> On Sat, Nov 10, 2012 at 10:59 PM, Robert Bradshaw <robe...@gmail.com>
>> wrote:
>>>
>>> Certainly creating an array is doing a lot more than a malloc, so if
>>> all you want is a 1-dimensional contiguous non-Python-visible list of
>>> c values, you're not going to beat that. I'm really not sure why
>>> there's that much overhead though, I'd hoped it would be better for
>>> this case.
>>
>> Undoubtedly there is some stuff happening behind the scenes, but 10 times
>> the allocation time of a malloc? And still 3 times slower than a numpy
>> array, which isn't exactly the fastest for allocation..
>
> Yes, I agree it's pretty dismal. We should be able to at least match numpy.

http://trac.cython.org/cython_trac/ticket/794

If you really care about speed, for the time being you're probably not
going to beat a custom extension class with a raw pointer in it. Of
course this is assuming you need to allocate and delete a lot of
arrays quickly.

>> I could live with malloc but the manual memory management is not very fun.
>> If I implement __dealloc__ could I just do the free() there?  Would that
>> work?
>
> Yes, if the lifetime of the memory can be tied to the lifetime of an
> object, this works well. (I do that a lot.)

This is exactly my use case.  Think many millions of creations, copies, and element-wise operations.  I think I will soon have something that will work - might be useful to add into the cython code somehow.  I imagine that other people have similar needs for a stripped down super-fast extension type like I am building right now
 

Re: [cython-users] Cython array creation is horribly slow Mark Florisson 11/10/12 4:28 PM
On 10 November 2012 18:21, Ian Bell <ian.h...@gmail.com> wrote:
> I have been playing with memory views and cython arrays.  I am not sure what
> exactly is going wrong, but it seems like there is something not right about
> how cython allocates memory for cython arrays (as in
> http://docs.cython.org/src/userguide/memoryviews.html).  Creating of memory
> views are also horribly slow.  Or I have done something braindead in my code
> but I don't think so.  The docs for cython arrays are pretty much
> non-existent.  Where are they in the source tree?  I couldn't find the
> cython.view module anywhere

cython.view is a fake module. The cython array implementation and
other things can be found in Cython/Utility/MemoryView.pyx.

The reason it is slow is because no one ever cared to optimize it. It
allocates shape and strides independently on the heap. So it contains
at least 3 malloc calls, and a bunch of other code.

If you want to make it really fast, implement it with a free list,
only malloc the data and make sure the code in the constructor is
native where possible. No one is actively working on it, but it's just
Cython code, so patches would be appreciated and merged.
Re: [cython-users] Cython array creation is horribly slow Mark Florisson 11/10/12 4:31 PM
On 11 November 2012 00:28, mark florisson <markflo...@gmail.com> wrote:
> On 10 November 2012 18:21, Ian Bell <ian.h...@gmail.com> wrote:
>> I have been playing with memory views and cython arrays.  I am not sure what
>> exactly is going wrong, but it seems like there is something not right about
>> how cython allocates memory for cython arrays (as in
>> http://docs.cython.org/src/userguide/memoryviews.html).  Creating of memory
>> views are also horribly slow.  Or I have done something braindead in my code
>> but I don't think so.  The docs for cython arrays are pretty much
>> non-existent.  Where are they in the source tree?  I couldn't find the
>> cython.view module anywhere
>
> cython.view is a fake module. The cython array implementation and
> other things can be found in Cython/Utility/MemoryView.pyx.
>
> The reason it is slow is because no one ever cared to optimize it. It
> allocates shape and strides independently on the heap. So it contains
> at least 3 malloc calls, and a bunch of other code.
>
> If you want to make it really fast, implement it with a free list,
> only malloc the data and make sure the code in the constructor is
> native where possible. No one is actively working on it, but it's just
> Cython code, so patches would be appreciated and merged.

At the time, I considered whether cython.array should be even
documented, the interface is rather horrid. What you'd really want is
variable sized memoryviews and C arrays, like 'cdef double[:M, :N]
myarray', with a check to see whether it should be stack or heap
allocated (depending also on how it is used).
Re: Cython array creation is horribly slow Andreas van Cranenburgh 11/16/12 6:31 AM
That's not an efficient way to create a cpython.array, you want to use clone(). I also suspect that your idea of wrapping a 1-D C array in an object mostly amounts to using cpython.array without the memoryview interface. If you leave out the memoryview interface, you can use the cpython.array pointer directly (so no boundscheck):

from cpython.array cimport array, clone
cdef array uintarray = array("I", [])
[...]
def foo():
    cdef array counts = clone(uintarray, 10, False)
    counts.data.as_uint[1] = 2

As for the documentation, look at the docstrings in Cython/Includes/cpython/array.pxd
and the examples in tests/run/pyarray.pyx (paths relative to directory with cython sources).

Cheers, Andreas
Re: Cython array creation is horribly slow Andreas van Cranenburgh 11/16/12 1:11 PM
I changed your example code to follow my recommendations, to see how much overhead Python arrays have:

from cpython.array cimport array, clone
cdef array ar, template = array('d')
for L in [1,10,100,1000,10000]:
    t1 = time.clock()
    for i in range(N):
        ar = clone(template, L, False)
    t2 = time.clock()
    print 'Elapsed time to make array with length of ' +str(L)+' is '+str((t2-t1)/N*1e6)+' us'

And I get the following results:

Elapsed time to make array with length of 1 is 0.46 us
Elapsed time to make array with length of 10 is 0.47 us
Elapsed time to make array with length of 100 is 0.52 us
Elapsed time to make array with length of 1000 is 0.45 us
Elapsed time to make array with length of 10000 is 0.51 us
[... snip numpy much slower...]
Elapsed time with raw c-allocation with length of 1 is 0.39 us
Elapsed time with raw c-allocation with length of 10 is 0.37 us
Elapsed time with raw c-allocation with length of 100 is 0.42 us
Elapsed time with raw c-allocation with length of 1000 is 0.38 us
Elapsed time with raw c-allocation with length of 10000 is 0.5 us


So in conclusion there's barely any overhead with cpython.array, and it has the advantage of being part of the standard library and it takes memory management out of your hands.
Re: Cython array creation is horribly slow Andreas van Cranenburgh 11/17/12 5:03 AM
I just discovered that, confusingly, cython.array and cpython.array are actually two different modules.
This doesn't change my benchmarks and conclusion of recommending the latter for this case, but perhaps to avoid confusion it's better to just remove cython.array, seeing as it provides a subset of cpython.array (which has an optional memoryview interface) and has performance limitations as noted in this thread.
Re: [cython-users] Re: Cython array creation is horribly slow Mark Florisson 11/17/12 5:51 AM
I'd suggest using NumPy and not allocating small arrays repeatedly.
But cython.view.array it does not provide a subset of functionality,
it supports multi-dimensional and structured arrays (but 'support' is
a big word, it accepts them and allows you to obtain memoryviews from
them). I agree it's not very useful though, it was never really meant
to be used by end users.
Re: [cython-users] Re: Cython array creation is horribly slow Andreas van Cranenburgh 11/17/12 6:50 AM
The overhead of cpython.array over malloc is marginal, while NumPy is more than twice as slow as cpython.array, so the latter is clearly preferred in the 1-dimensional case (in this benchmark I used np.empty so it's uninitialized memory as with malloc), and especially when the arrays should be light weight. You can argue that one should avoid allocating many small arrays, but if the problem requires that and you try to allocate one large array, you'll end up re-implementing a sort of malloc or arena to manage things, which is not worthwhile in most cases. 

[ array creation benchmark script: https://gist.github.com/4096484 ]
Elapsed time to make arrays with lengths [1,10,100,1000,10000]
numpy: 1.09 1.09 1.29 1.35 1.35 us
raw c-allocation: 0.34 0.34 0.46 0.44 0.38 us
array: 0.41 0.4 0.44 0.44 0.45 us
array with memview: [...out of memory...]

That is, if you don't use cpython.array's memoryview support, because when that's enabled there appears to be a huge memory leak (computer grinds to a halt due to low memory). I've already established that the __releasebuffer__ function in array.pxd is called correctly, so it  may me a general bug related to buffers/memory views.
Re: [cython-users] Re: Cython array creation is horribly slow yarde...@gmail.com 3/1/14 12:33 PM
Is it possible to use the 'clone' trick to initialize a 2d array, as substitute for this?

cdef double[:,:] foo = np.empty((N,M),dtype=float)

thanks.