Message from discussion use the memoryview slice inside a cdef class as an attribute is slow?
From: mark florisson <markflorisso...@gmail.com>
Date: Mon, 9 Apr 2012 21:00:11 +0100
Local: Mon, Apr 9 2012 4:00 pm
Subject: Re: [cython-users] use the memoryview slice inside a cdef class as an attribute is slow?
On 9 April 2012 14:54, 刘振海 <1989l...@gmail.com> wrote:

> Hi everyone,
> I have been playing around with memoryview slices since I learned about
> them; they are really convenient. Thanks, Cython developers!
> When I use a memoryview slice as an attribute of a cdef class, I find that
> getting or setting items through the attribute (dot) syntax, as in [1]
> below, is slower than both [2] and [3].
> Here is the code:

> #cython: boundscheck = False
> #cython: wraparound = False
> import numpy as np
> import time
> cdef class Mem_slice(object):
>     cdef double[::1] x
>     def __init__(self, x):
>         self.x=x

> cdef int i
> a=np.ones(10000000,"f8")
> cdef Mem_slice m=Mem_slice(a)

> #[1] the most convenient way but the slowest. time: 1.30s
> t1=time.clock()
> for i in range(10000000):
>     m.x[i]=2.0
> t2=time.clock()
> print t2-t1

> #[2] the fastest way. time: 0.034s
> t1=time.clock()
> cdef double[::1] x=m.x
> for i in range(10000000):
>     x[i]=2.0
> t2=time.clock()
> print t2-t1

> #[3] directly use numpy array index. time: 1.17s
> t1=time.clock()
> for i in range(10000000):
>     a[i]=2.0
> t2=time.clock()
> print t2-t1

> I dug into the generated C source a little and found:

>     /* "mtest.pyx":17
>  * t1=time.clock()
>  * for i in range(10000000):
>  *     m.x[i]=2.0             # <<<<<<<<<<<<<<
>  * t2=time.clock()
>  * print t2-t1
>  */
>     if (unlikely(!__pyx_v_5mtest_m->x.memview)) {
>         PyErr_SetString(PyExc_AttributeError, "Memoryview is not initialized");
>         {__pyx_filename = __pyx_f[0]; __pyx_lineno = 17; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
>     }
>     /* Second most time-consuming part: the memoryview slice struct is
>        copied into a temporary variable on every iteration. */
>     __pyx_t_4 = __pyx_v_5mtest_m->x;
>     /* Most time-consuming part: PyThread_acquire_lock and
>        PyThread_release_lock, called from __PYX_INC_MEMVIEW and
>        __PYX_XDEC_MEMVIEW. */
>     __PYX_INC_MEMVIEW(&__pyx_t_4, 1);
>     __pyx_t_5 = __pyx_v_5mtest_i;
>     *((double *) ( /* dim=0 */ ((char *) (((double *) __pyx_t_4.data) + __pyx_t_5)) )) = 2.0;
>     __PYX_XDEC_MEMVIEW(&__pyx_t_4, 1);

> Since I am not writing parallel code, there may be no need to acquire the
> lock (maybe add a compiler directive to enable or disable this?).
> Alternatively, the generated code could use a pointer to avoid the copy
> overhead, or move the assignment (__pyx_t_4 = __pyx_v_5mtest_m->x;) out of
> the loop.

> I am not familiar with Python's threading or with compiler optimizations,
> so I would be grateful for any pointers.

> Best Regards,
> liu zhenhai

Thanks for the report, it currently is indeed not as efficient as it
should be. Cython should perform many more optimizations like bounds
check optimizations and other loop optimizations as well as many
others. At least this problem can be fixed quite easily, so we'll try
fixing that for the release.
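
To make the cost difference between [1] and [2] concrete, here is a toy model in plain Python (not Cython, and not the actual generated code). The names ToySlice, Holder, fill_via_attribute and fill_via_local are hypothetical and exist only for illustration. The model counts how many acquire/release operations each access pattern performs, assuming, as the quoted C suggests, that pattern [1] acquires and releases the slice on every iteration.

```python
class ToySlice:
    """Hypothetical stand-in for a memoryview slice with an acquisition count."""

    def __init__(self, data):
        self.data = data
        self.count_ops = 0  # number of acquire/release calls so far

    def acquire(self):
        self.count_ops += 1
        return self

    def release(self):
        self.count_ops += 1


class Holder:
    """Plays the role of the cdef class that stores the slice as attribute x."""

    def __init__(self, data):
        self.x = ToySlice(data)


def fill_via_attribute(m, value):
    # Pattern [1]: every iteration acquires and releases its own temporary.
    for i in range(len(m.x.data)):
        tmp = m.x.acquire()
        tmp.data[i] = value
        tmp.release()


def fill_via_local(m, value):
    # Pattern [2]: one acquisition before the loop, one release after it.
    x = m.x.acquire()
    for i in range(len(x.data)):
        x.data[i] = value
    x.release()


n = 1000
a, b = Holder([1.0] * n), Holder([1.0] * n)
fill_via_attribute(a, 2.0)
fill_via_local(b, 2.0)
print(a.x.count_ops, b.x.count_ops)  # prints: 2000 2
```

Both functions produce the same data; only the number of counting operations differs, which is why binding the attribute to a local typed slice before the loop is a reasonable workaround until the compiler does this itself.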

Generally speaking, the acquisition counting (reference counting for
these slices) should be more efficient and smarter and not rely on
atomics or locks, but use a more GC-like approach. In any case, the
copying overhead could be reduced by creating a new type for each
N-dimensional memoryview, which could support any N without overhead
for the other memoryviews. Both these approaches are somewhat more
involved, so will have to wait until someone is up for the task.
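
As a rough illustration of the copying overhead, the sketch below uses ctypes to compare the size of a slice struct that reserves room for a fixed maximum number of dimensions with one sized for a single dimension. The field layout (memview pointer, data pointer, then shape, strides and suboffsets arrays) and the limit of 8 dimensions are assumptions about how the slice struct is laid out, not something stated in this thread; make_slice_struct is a hypothetical helper.

```python
import ctypes

MAX_DIMS = 8  # assumption: fixed upper bound on dimensions in the generic struct


def make_slice_struct(ndim):
    """Build a ctypes struct resembling a memoryview slice with ndim dimensions."""

    class Slice(ctypes.Structure):
        _fields_ = [
            ("memview", ctypes.c_void_p),               # owning memoryview object
            ("data", ctypes.c_char_p),                  # pointer to the first element
            ("shape", ctypes.c_ssize_t * ndim),
            ("strides", ctypes.c_ssize_t * ndim),
            ("suboffsets", ctypes.c_ssize_t * ndim),
        ]

    return Slice


generic_size = ctypes.sizeof(make_slice_struct(MAX_DIMS))
one_dim_size = ctypes.sizeof(make_slice_struct(1))
print(generic_size, one_dim_size)
```

Under these assumptions, a typical 64-bit build would report 208 bytes for the generic struct and 40 bytes for the one-dimensional one, so a per-dimension type would make each struct copy for a 1-D slice considerably smaller.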


 