Is using a memoryview slice as a cdef class attribute slow?

刘振海

Apr 9, 2012, 9:54:15 AM

to cython...@googlegroups.com

Hi everyone,

I have been playing around with memoryview slices ever since I learned about them; they are really convenient. Thanks, Cython developers!

When I use a memoryview slice as an attribute of a cdef class, I find that getting or setting items through attribute access (m.x[i], case [1] below) is much slower than going through a local memoryview variable (case [2]), and also slower than indexing the NumPy array directly (case [3]).

Here is the code:

#cython: boundscheck = False
#cython: wraparound = False

import numpy as np
import time

cdef class Mem_slice(object):
    cdef double[::1] x

    def __init__(self, x):
        self.x = x

cdef int i
a = np.ones(10000000, "f8")
cdef Mem_slice m = Mem_slice(a)

# [1] the most convenient way, but the slowest. time: 1.30s
t1 = time.clock()
for i in range(10000000):
    m.x[i] = 2.0
t2 = time.clock()
print t2 - t1

# [2] the fastest way. time: 0.034s
t1 = time.clock()
cdef double[::1] x = m.x
for i in range(10000000):
    x[i] = 2.0
t2 = time.clock()
print t2 - t1

# [3] index the NumPy array directly. time: 1.17s
t1 = time.clock()
for i in range(10000000):
    a[i] = 2.0
t2 = time.clock()
print t2 - t1

I dug into the generated C source a little and found:

/* "mtest.pyx":17
 * t1=time.clock()
 * for i in range(10000000):
 *     m.x[i]=2.0             # <<<<<<<<<<<<<<
 * t2=time.clock()
 * print t2-t1
 */
if (unlikely(!__pyx_v_5mtest_m->x.memview)) {PyErr_SetString(PyExc_AttributeError,"Memoryview is not initialized");{__pyx_filename = __pyx_f[0]; __pyx_lineno = 17; __pyx_clineno = __LINE__; goto __pyx_L1_error;}}

// Second most time-consuming part: the whole memoryview slice struct is copied into a temporary on every iteration.
__pyx_t_4 = __pyx_v_5mtest_m->x;

// Most time-consuming part: PyThread_acquire_lock and PyThread_release_lock (called in __PYX_INC_MEMVIEW and __PYX_XDEC_MEMVIEW).
__PYX_INC_MEMVIEW(&__pyx_t_4, 1);
__pyx_t_5 = __pyx_v_5mtest_i;
*((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_t_4.data) + __pyx_t_5)) )) = 2.0;
__PYX_XDEC_MEMVIEW(&__pyx_t_4, 1);

Since I am not writing parallel code, there may be no need to acquire the lock (maybe add a compiler directive to enable or disable this?).

Maybe the generated code could use a pointer to avoid the copy overhead, or move the assignment (__pyx_t_4 = __pyx_v_5mtest_m->x;) outside the loop.
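A minimal sketch of the hoisting idea, written in plain Python so it runs without compiling. The names MemSlice, fill_via_attribute and fill_via_local are hypothetical, and an array.array stands in for the NumPy array. In compiled Cython the local would be declared as cdef double[::1] x, as in case [2]; plain Python only saves the repeated attribute lookup, so the timings quoted in this thread do not carry over.

```python
from array import array


class MemSlice:
    """Hypothetical stand-in for the cdef class: holds one buffer view."""

    def __init__(self, buf):
        self.x = memoryview(buf)


def fill_via_attribute(m, value):
    # Pattern [1]: the attribute is looked up again on every iteration.
    for i in range(len(m.x)):
        m.x[i] = value


def fill_via_local(m, value):
    # Pattern [2]: bind the view once, then index the local inside the loop.
    x = m.x
    for i in range(len(x)):
        x[i] = value


data = array("d", [1.0] * 1000)
m = MemSlice(data)
fill_via_attribute(m, 2.0)   # every element becomes 2.0
fill_via_local(m, 3.0)       # every element becomes 3.0
```

Both functions write through to the same underlying buffer; only the place where the view is fetched differs.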

I am not familiar with Python's threading or with compiler optimization, so I would be grateful for any pointers.

Best Regards,

liu zhenhai

mark florisson

Apr 9, 2012, 4:00:11 PM

to cython...@googlegroups.com

Thanks for the report, it currently is indeed not as efficient as it
should be. Cython should perform many more optimizations like bounds
check optimizations and other loop optimizations as well as many
others. At least this problem can be fixed quite easily, so we'll try
fixing that for the release.

Generally speaking, the acquisition counting (reference counting for
these slices) should be more efficient and smarter and not rely on
atomics or locks, but use a more GC-like approach. In any case, the
copying overhead could be reduced by creating a new type for each
N-dimensional memoryview, which could support any N without overhead
for the other memoryviews. Both these approaches are somewhat more
involved, so will have to wait until someone is up for the task.
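To see why a per-dimension type would cut the copy cost, here is a rough sketch. It assumes the slice is a plain struct holding two pointers plus fixed-size shape, strides and suboffsets arrays. The field layout, the helper make_slice_struct and the MAX_DIMS value of 8 are assumptions for illustration, not copied from Cython's headers.

```python
import ctypes

MAX_DIMS = 8  # assumed upper bound on dimensions, for illustration only


def make_slice_struct(ndim):
    """Build a hypothetical slice struct sized for `ndim` dimensions."""

    class Slice(ctypes.Structure):
        _fields_ = [
            ("memview", ctypes.c_void_p),            # owning memoryview object
            ("data", ctypes.c_void_p),               # pointer to the first element
            ("shape", ctypes.c_ssize_t * ndim),      # extent per dimension
            ("strides", ctypes.c_ssize_t * ndim),    # byte step per dimension
            ("suboffsets", ctypes.c_ssize_t * ndim), # indirect-dimension offsets
        ]

    return Slice


# Bytes copied per struct assignment for the generic and the 1-D layout.
generic_size = ctypes.sizeof(make_slice_struct(MAX_DIMS))
one_dim_size = ctypes.sizeof(make_slice_struct(1))
print(generic_size, one_dim_size)
```

Under these assumptions a 1-D slice copied through the generic struct moves several times more bytes than a struct sized for one dimension would.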

mark florisson

Apr 10, 2012, 6:58:35 AM

to cython...@googlegroups.com

On 9 April 2012 14:54, 刘振海 <1989...@gmail.com> wrote:

> hi everyone,
> I have played around the memoryview slice since I knew it, it's really
> convenient. Thanks, cython developers!
> when I try to use the memoryview slice inside a cdef class as an attribute,
> I find out using the dot style to get the memoryview slice attribute to set
> item or get item is slower than [2],[3]
> here is the code:

I fixed it, and I now get the following results:

[1]: from 0.275564s to 0.021208s

This is even faster (0.012339s):

m.x[:] = 2.0

You can find the fixes in this branch:
https://github.com/markflorisson88/cython/tree/release
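For readers without a Cython build at hand, here is a rough plain-Python analogue of the whole-slice fill m.x[:] = 2.0; buf and view are hypothetical names. Unlike Cython's typed memoryviews, Python's built-in memoryview does not broadcast a scalar on slice assignment, so the right-hand side here is a full-sized array. The point is only that one slice assignment replaces the per-element loop.

```python
from array import array

buf = array("d", [1.0] * 8)
view = memoryview(buf)

# A built-in memoryview will not broadcast a scalar, so build a
# same-sized source buffer and copy it in with one slice assignment.
view[:] = array("d", [2.0]) * len(view)
```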

刘振海

Apr 10, 2012, 9:49:52 AM

to cython...@googlegroups.com

Hi,

It works fine on my computer. Thank you very much.

Best Regards,

liu zhenhai

刘振海

Apr 10, 2012, 10:43:39 AM

to cython...@googlegroups.com

Hi,

I did some more testing with memoryview slices.

The code below fails at run time:

#cython: boundscheck = False
#cython: wraparound = False

import numpy as np
import time

cdef class Mem_slice(object):
    cdef double[::1] x

    def __init__(self, x):
        self.x = x

    cdef double[::1] func(self):
        return self.x

cdef int i
a = np.ones(10, "f8")
cdef Mem_slice m = Mem_slice(a)

for i in range(10):
    m.func()[i] = 1

print a

    /* "test.pyx":17
 * 
 * for i in range(10):
 *     m.func()[i]=1             # <<<<<<<<<<<<<<
 * 
 * print a
 */
    __pyx_t_4 = ((struct __pyx_vtabstruct_4test_Mem_slice *)__pyx_v_4test_m->__pyx_vtab)->func(__pyx_v_4test_m); if (unlikely(!__pyx_t_4.memview)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 17; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __pyx_t_5 = __pyx_v_4test_i;
    *((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_t_4.data) + __pyx_t_5)) )) = 1.0;
    __PYX_XDEC_MEMVIEW(&__pyx_t_4, 1); // Is this the bug? The reference is only decremented here, with no matching increment.

Best Regards,

Liu zhenhai

mark florisson

Apr 10, 2012, 11:41:04 AM

to cython...@googlegroups.com

2012/4/10 刘振海 <1989...@gmail.com>:

Thanks for the report, I actually fixed that today, could you retry
from my branch?

刘振海

Apr 10, 2012, 10:47:58 PM

to cython...@googlegroups.com

Hi mark,

I have tested your branch, and it works like a charm.

Thanks for your excellent work!

Best Regards,

liu zhenhai

On April 10, 2012 at 11:41 PM, mark florisson <markflo...@gmail.com> wrote:

2012/4/10 刘振海 <1989...@gmail.com>:

> Hi
