Is using a memoryview slice as a cdef class attribute slow?

刘振海

Apr 9, 2012, 9:54:15 AM
to cython...@googlegroups.com
Hi everyone,
I have been playing with memoryview slices since I learned about them, and they are really convenient. Thanks, Cython developers!
When I use a memoryview slice as an attribute of a cdef class,
getting or setting items through attribute access (m.x[i]) is much slower than going through a local memoryview variable [2], and even slightly slower than indexing the NumPy array directly [3].
Here is the code:

#cython: boundscheck = False
#cython: wraparound = False
import numpy as np
import time
cdef class Mem_slice(object):
    cdef double[::1] x
    def __init__(self, x):
        self.x=x

cdef int i
a=np.ones(10000000,"f8")
cdef Mem_slice m=Mem_slice(a)

#[1] the most convenient way but the slowest. time: 1.30s
t1=time.clock()
for i in range(10000000):
    m.x[i]=2.0
t2=time.clock()
print t2-t1

#[2] the fastest way. time: 0.034s 
t1=time.clock()
cdef double[::1] x=m.x
for i in range(10000000):
    x[i]=2.0
t2=time.clock()
print t2-t1

#[3] directly use numpy array index. time: 1.17s
t1=time.clock()
for i in range(10000000):
    a[i]=2.0
t2=time.clock()
print t2-t1


I dug into the generated C source a little and found:

    /* "mtest.pyx":17
 * t1=time.clock()
 * for i in range(10000000):
 *     m.x[i]=2.0             # <<<<<<<<<<<<<<
 * t2=time.clock()
 * print t2-t1
 */
    if (unlikely(!__pyx_v_5mtest_m->x.memview)) {PyErr_SetString(PyExc_AttributeError,"Memoryview is not initialized");{__pyx_filename = __pyx_f[0]; __pyx_lineno = 17; __pyx_clineno = __LINE__; goto __pyx_L1_error;}}
    __pyx_t_4 = __pyx_v_5mtest_m->x; // second most time-consuming part: the memoryview slice struct is copied into a temporary variable.
    // The most time-consuming part: PyThread_acquire_lock and PyThread_release_lock are called inside __PYX_INC_MEMVIEW and __PYX_XDEC_MEMVIEW.
    __PYX_INC_MEMVIEW(&__pyx_t_4, 1); 
    __pyx_t_5 = __pyx_v_5mtest_i;
    *((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_t_4.data) + __pyx_t_5)) )) = 2.0; 
    __PYX_XDEC_MEMVIEW(&__pyx_t_4, 1);


Since I am not writing parallel code, there may be no need to acquire the lock
(maybe a compiler directive could enable or disable it?).
The copy overhead could perhaps be avoided by using a pointer, or by moving the assignment (__pyx_t_4 = __pyx_v_5mtest_m->x;) outside the loop.

I am not familiar with Python's threading or with compiler optimization, so I would be grateful for any pointers.
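
The idea of moving the slice copy and its lock-protected acquisition out of the loop can be modelled in plain Python. This is only a rough sketch, not Cython's implementation: CountedBuffer, acquire, release and the lock_operations counter are hypothetical names invented for illustration, and threading.Lock merely stands in for the PyThread lock seen in the generated C.

```python
import threading


class CountedBuffer:
    """Toy stand-in for a buffer whose views are acquisition-counted."""

    def __init__(self, size):
        self.data = [0.0] * size
        self.acquisitions = 0     # number of live views (hypothetical)
        self.lock_operations = 0  # how often the lock was taken
        self._lock = threading.Lock()

    def acquire(self):
        # Models the increment step: take the lock, bump the count.
        with self._lock:
            self.acquisitions += 1
            self.lock_operations += 1

    def release(self):
        # Models the decrement step: take the lock, drop the count.
        with self._lock:
            self.acquisitions -= 1
            self.lock_operations += 1


def fill_acquire_per_item(buf, value):
    """Acquire and release around every single store, as in loop [1]."""
    for i in range(len(buf.data)):
        buf.acquire()
        buf.data[i] = value
        buf.release()


def fill_acquire_once(buf, value):
    """Acquire once before the loop and release once after it."""
    buf.acquire()
    try:
        for i in range(len(buf.data)):
            buf.data[i] = value
    finally:
        buf.release()
```

For a buffer of n items, the first function takes the lock 2n times and the second takes it twice. That is presumably the saving that workaround [2] obtains by copying m.x into a local variable before the loop.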

Best Regards,
liu zhenhai


mark florisson

Apr 9, 2012, 4:00:11 PM
to cython...@googlegroups.com

Thanks for the report, it currently is indeed not as efficient as it
should be. Cython should perform many more optimizations like bounds
check optimizations and other loop optimizations as well as many
others. At least this problem can be fixed quite easily, so we'll try
fixing that for the release.

Generally speaking, the acquisition counting (reference counting for
these slices) should be more efficient and smarter and not rely on
atomics or locks, but use a more GC-like approach. In any case, the
copying overhead could be reduced by creating a new type for each
N-dimensional memoryview, which could support any N without overhead
for the other memoryviews. Both these approaches are somewhat more
involved, so will have to wait until someone is up for the task.

mark florisson

Apr 10, 2012, 6:58:35 AM
to cython...@googlegroups.com
On 9 April 2012 14:54, 刘振海 <1989...@gmail.com> wrote:
> hi everyone,
> I have played around the memoryview slice since I knew it, it's really
> convenient. Thanks, cython developers!
> when I try to use the memoryview slice inside a cdef class as an attribute,
> I find out using the dot style to get the memoryview slice attribute to set
> item or get item is slower than [2],[3]
> here is the code:

I fixed it. I now get the following results:

[1]: from 0.275564s to 0.021208s

This is even faster (0.012339s):

m.x[:] = 2.0
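
A loose plain-Python analogue of replacing the element loop with a single whole-slice assignment, using only the standard library. It is a sketch under assumptions: the built-in memoryview does not broadcast a scalar the way the Cython statement above does, so the right-hand side is built as a full-length array, and the function names are made up for illustration.

```python
from array import array


def fill_by_loop(view, value):
    """Set every element one at a time, as in loop [1]."""
    for i in range(len(view)):
        view[i] = value


def fill_by_slice(view, value):
    """Set every element with a single slice assignment."""
    # The source must match the view's item format ('d') and length exactly.
    view[:] = array("d", [value]) * len(view)


buf = array("d", [1.0] * 8)
view = memoryview(buf)  # shares memory with buf, no copy is made

fill_by_loop(view, 2.0)
looped = buf.tolist()

fill_by_slice(view, 3.0)
sliced = buf.tolist()
```

Both functions write through the view into the same underlying array. The slice form hands the whole copy to one call instead of running an interpreted loop, which is the same motivation as the Cython one-liner, although the timings quoted in this thread apply only to the Cython version.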

You can find the fixes in this branch:
https://github.com/markflorisson88/cython/tree/release

刘振海

Apr 10, 2012, 9:49:52 AM
to cython...@googlegroups.com
Hi,
It works fine on my computer. Thank you very much.

Best Regards,
liu zhenhai

刘振海

Apr 10, 2012, 10:43:39 AM
to cython...@googlegroups.com
Hi,
I did some more testing with memoryview slices.
The code below fails at run time:

#cython: boundscheck = False
#cython: wraparound = False
import numpy as np
import time
cdef class Mem_slice(object):
    cdef double[::1] x
    def __init__(self, x):
        self.x=x
    cdef double[::1] func(self): 
        return self.x
        
cdef int i
a=np.ones(10,"f8")
cdef Mem_slice m=Mem_slice(a)

for i in range(10):
    m.func()[i]=1

print a

    /* "test.pyx":17
 * 
 * for i in range(10):
 *     m.func()[i]=1             # <<<<<<<<<<<<<<
 * 
 * print a
 */
    __pyx_t_4 = ((struct __pyx_vtabstruct_4test_Mem_slice *)__pyx_v_4test_m->__pyx_vtab)->func(__pyx_v_4test_m); if (unlikely(!__pyx_t_4.memview)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 17; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
    __pyx_t_5 = __pyx_v_4test_i;
    *((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_t_4.data) + __pyx_t_5)) )) = 1.0;
    __PYX_XDEC_MEMVIEW(&__pyx_t_4, 1); // is this the bug? The reference count is decremented without ever being incremented.
Best Regards,
Liu zhenhai

mark florisson

Apr 10, 2012, 11:41:04 AM
to cython...@googlegroups.com
2012/4/10 刘振海 <1989...@gmail.com>:

Thanks for the report, I actually fixed that today, could you retry
from my branch?

刘振海

Apr 10, 2012, 10:47:58 PM
to cython...@googlegroups.com
Hi Mark,
I have tested with your branch, and it works like a charm.
Thanks for your excellent work!

Best Regards,
liu zhenhai
On April 10, 2012 at 11:41 PM, mark florisson <markflo...@gmail.com> wrote:
2012/4/10 刘振海 <1989...@gmail.com>:
> Hi