declare numpy.array in cython class

1,611 views
Skip to first unread message

Zahra Sheikhbahaee

unread,
Jun 9, 2014, 11:02:13 AM6/9/14
to cython...@googlegroups.com

I a very newbie with cython and c and I want to write a cython class which divides a two dimensional numpy array  to two one dimensional arrays, e.g. x and y. But I don't know how to declare the numpy.array for my class. I have written so far this but it doesn't work(demo.pyx):

 import cython
cimport cython

import numpy as np
cimport numpy as np

DTYPE = np.float64
ctypedef np.float64_t DTYPE_t
cdef class halo_positions(object):
    double *x
    double *y
    def __init__(self, np.ndarray[DTYPE_t,ndim=2] positions):
        self.x = &positions[:,0]
        self.y = &positions[:,1]

If for instance my input array is a=np.array([[2,1.5],[1,0.6]]), I would make this work in order to get the following output
from demo import halo_positions
pos=halo_positions(a)
print pos.x
np.array([2,1.])

I look forward to the replies.




Chris Barker

unread,
Jun 9, 2014, 5:34:27 PM6/9/14
to cython-users
On Mon, Jun 9, 2014 at 8:02 AM, Zahra Sheikhbahaee <sheikh...@gmail.com> wrote:

I a very newbie with cython and c and I want to write a cython class which divides a two dimensional numpy array  to two one dimensional arrays, e.g. x and y.

first -- there is probably little benefit to using Cython for this -- numpy knows how to do it, and will do it if its C library about as fast as it can.

But as this may be a simplification of a larger problem where it does make sense, here are some comments:
 
But I don't know how to declare the numpy.array for my class. I have written so far this but it doesn't work(demo.pyx):

 import cython
cimport cython

import numpy as np
cimport numpy as np

DTYPE = np.float64
ctypedef np.float64_t DTYPE_t

just so you know, this is more complicated than it needs to be -- it's good if you want to be able to change the dtype in one place, but you can also just put np.float64 in where you need a dtype.
 
cdef class halo_positions(object):
    double *x
    double *y

here's you problem -- this is declaring pointers (which are arrays in C), but you would still need to allocate the memory for them, and copy data to them when you want to use them. IF you want these to be numpy arrays, then:

    cdef x np.ndarray[DTYPE_t, ndim=1]
    cdef y np.ndarray[DTYPE_t, ndim=1]
 
(untested)
    def __init__(self, np.ndarray[DTYPE_t,ndim=2] positions):
        self.x = &positions[:,0]
        self.y = &positions[:,1]

and here:
        self.x = positions[:,0]
        self.y = positions[:,1]
 
but again, that's going to be about the same as pure python.

What you had was making a pointer that pointed to teh passed-in numpy array's data block, but:

1) that data would only be valid whilie he numpy array was around -- when it was deleted (garbage collected), the pointers would be invalid.

2) the data wouldn't be arranged in memory the right way anyway. (note: it could be if you forced the numpy array into Fortran order, but I don't think this is what you really want anyway)

HTH,

-Chris


--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris....@noaa.gov

Zahra Sheikh

unread,
Jun 10, 2014, 3:34:41 AM6/10/14
to cython...@googlegroups.com
Hi Chris,

I made the correction you pointed out as following:


import cython
cimport cython

import numpy as np
cimport numpy as np

DTYPE = np.float64
ctypedef np.float64_t DTYPE_t
cdef class halo_positions(object):
    cdef x np.ndarray[DTYPE_t, ndim=1]
    cdef y np.ndarray[DTYPE_t, ndim=1]
    def __init__(self, np.ndarray[DTYPE_t,ndim=2] positions):
        self.x = positions[:,0]
        self.y = positions[:,1]

    def debug(self):
        print self.x, self.y
and I got this error message:


Error compiling Cython file:
------------------------------------------------------------
...

cimport numpy as np

DTYPE = np.float64
ctypedef np.float64_t DTYPE_t
cdef class halo_positions(object):
    cdef x np.ndarray[DTYPE_t, ndim=1]
            ^
------------------------------------------------------------

demo.pyx:10:13: Syntax error in C variable declaration

I try to improve the speed of my code, I figured the bottleneck in my original code which contains scipy.integration.quad, sum over elements of an array and iteration in a loop. I want to use cython inorder to use cython gsl and also by defining in my cython code this halo_positions parameter as an array, later I can use numpy.vectorize and avoid some loops. However I am not completely sure it might help. What would you reckon?

Yury V. Zaytsev

unread,
Jun 10, 2014, 6:24:29 AM6/10/14
to cython...@googlegroups.com
On Tue, 2014-06-10 at 00:34 -0700, Zahra Sheikh wrote:
>
> cdef x np.ndarray[DTYPE_t, ndim=1]

It should be the other way around:

cdef np.ndarray[DTYPE_t, ndim=1] x

The type should precede the variable name.

I don't think that numpy.vectorize() is going to help to speed the code
up, it's more of a convenience type of thing.

--
Sincerely yours,
Yury V. Zaytsev


Zahra Sheikh

unread,
Jun 10, 2014, 6:57:01 AM6/10/14
to cython...@googlegroups.com

Hi Yury,

Even changing the definition to what you have already mentioned caused error messages. Assuming my demo.pyx is as following

import cython
cimport cython

import numpy as np
cimport numpy as np

DTYPE = np.float64
ctypedef np.float64_t DTYPE_t
cdef class halo_positions(object):
    cdef np.ndarray[DTYPE_t, ndim=1] x
    cdef np.ndarray[DTYPE_t, ndim=1] y
    def __init__(self, np.ndarray[DTYPE_t,ndim=2] positions):
        self.x = positions[:,0]
        self.y = positions[:,1]

    def debug(self):
        print self.x, self.y

I am getting this error message:
 Error compiling Cython file:
------------------------------------------------------------
...
cimport numpy as np

DTYPE = np.float64
ctypedef np.float64_t DTYPE_t
cdef class halo_positions(object):
    cdef np.ndarray[DTYPE_t, ndim=1] x
                                    ^
------------------------------------------------------------

demo.pyx:10:37: Buffer types only allowed as function local variables

Error compiling Cython file:
------------------------------------------------------------
...

DTYPE = np.float64
ctypedef np.float64_t DTYPE_t
cdef class halo_positions(object):
    cdef np.ndarray[DTYPE_t, ndim=1] x
    cdef np.ndarray[DTYPE_t, ndim=1] y
                                    ^
------------------------------------------------------------

demo.pyx:11:37: Buffer types only allowed as function local variables

Regards,
Zahra

Chris Barker

unread,
Jun 10, 2014, 1:04:38 PM6/10/14
to cython-users
On Tue, Jun 10, 2014 at 12:34 AM, Zahra Sheikh <sheikh...@gmail.com> wrote:
I made the correction you pointed out as following:
Error compiling Cython file:
------------------------------------------------------------

That's what I get for posting untested code -- sorry!

I am getting this error message:

cdef class halo_positions(object):
    cdef np.ndarray[DTYPE_t, ndim=1] x
                                    ^
------------------------------------------------------------

demo.pyx:10:37: Buffer types only allowed as function local variables

OK -- Igue4ss you can't use numpy arrays as class attributes in cdef classes - not sure why, not, but moving on...

You can use memory views instead, which you may well prefer, actually:


cdef class Halo_Positions(object):

    cdef double [:] _x

creates a 1-d memory view object, that can borrow the memory from a numpy array.

If you want access to that attribute from python (you may not), you can make a property:

cdef class Halo_Positions(object):

    cdef double [:] _x
    property x:
        def __get__(self):
            return np.array(self._x)
        def __set__(self, np.ndarray[DTYPE_t, ndim=1] x):
            self._x = x

(you also may not want a setter...)

Enclosed is a compolete example, pyx file, setup.y to built it, and test python code.

I try to improve the speed of my code, I figured the bottleneck in my original code which contains scipy.integration.quad, sum over elements of an array and iteration in a loop. I want to use cython inorder to use cython gsl and also by defining in my cython code this halo_positions parameter as an array, later I can use numpy.vectorize and avoid some loops.However I am not completely sure it might help. What would you reckon?

numpy.vectorize won't help.

making sure that something is an array, does n't take Cython. Cythonb make help if you re-write the iteration itself in cython -- i.e. re-write scipy.integration.quad. A little more detail might help us suggest the best option -- perhaps a small sample code that does what you want in pure python.
numpy_test.pyx
setup.py
test_numpy_test.py

Zaur Shibzukhov

unread,
Jun 10, 2014, 1:14:02 PM6/10/14
to cython...@googlegroups.com


понедельник, 9 июня 2014 г., 19:02:13 UTC+4 пользователь Zahra Sheikh написал:
I'd propose to use memory views:

import cython
cimport cython

import numpy as np
cimport numpy as np

np.import_array()

DTYPE = np.float64
ctypedef np.float64_t DTYPE_t

cdef class halo_positions(object):
    cdef DTYPE_t[:] x
    cdef DTYPE_t[:] y
    def __init__(self, DTYPE_t[:,::1] positions):
        self.x = positions[:,0]
        self.y = positions[:,1]

    def debug(self):
        print np.array(self.x), np.array(self.y)


def test():
    ap = np.array([[1,2],[3,4]], dtype=np.float64)
    hp = halo_positions(ap)
    hp.debug()
    
 

Daπid

unread,
Jun 10, 2014, 1:26:29 PM6/10/14
to cython...@googlegroups.com
On 10 June 2014 19:03, Chris Barker <chris....@noaa.gov> wrote:
-- i.e. re-write scipy.integration.quad.

That is probably the way to get best performance. But an easier path is to make your integrand as fast as possible, so you only have to cythonise one function. I have done similar things (not specifically for quad), and got an order of magnitude faster for a quite simple function.

It seems like you are doing something more complex than a mere integration, and I don't know how much this would apply, but you should give it a try and see if it is fast enough.

Zahra Sheikh

unread,
Jun 10, 2014, 2:53:13 PM6/10/14
to cython...@googlegroups.com
Thanks a lot Chris! I didn't use the properties of __get__ and __set__  functions to pass the  python array declaration to cython object. Back to my another problem, I have written another cython class which has again problem and I think I need the opinion of one expert. I used cython_gsl to improve the speed of my code (cnfw_model.pyx ) which one of its bottleneck in python version was the integration in Da instance:
import numpy as np
from cython_gsl cimport *

ctypedef double * double_ptr
ctypedef void * void_ptr


cdef class Cosmology(object):
    cdef double omega_c
    def __init__(self, double omega_m=0.3, double omega_lam=0.7):
        # no quintessence, no radiation in this universe!
        self.omega_m = omega_m
        self.omega_lam = omega_lam
        self.omega_c = (1. - omega_m - omega_lam)

    cdef double a(self, double z):
        return 1./(1+z)

    cdef double E(self, double a):
        return (self.omega_m*a**(-3) + self.omega_c*a**(-2) + self.omega_lam)**0.5

    cdef double __angKernel(self, double x, void * params) nogil:
        cdef double alpha, f
        alpha = (<double_ptr> params)[0]
        f=self.E(alpha*x**-1)**-1
        return f

    cdef double Da(self, int size, double *z, double z_ref=0):
        cdef gsl_integration_workspace *w
        cdef double result, error, expected, alpha
        w = gsl_integration_workspace_alloc (1000)
        cdef gsl_function F
        if size>1:
           da = np.zeros(size)
           for i in range(size):
               da[i] = self.Da(1, &z[i], z_ref)
           return da
        else:
           if z[0] < 0:
                    raise ValueError("Redshift z must not be negative")
           if z[0] < z_ref:
                    raise ValueError("Redshift z must not be smaller than the reference redshift")

           expected = -4.0
           alpha = 1

           F.function = &self.__angKernel
           F.params = &alpha

           gsl_integration_qags (&F, z_ref+1, z[0]+1, 0, 1e-7, 1000, w, &result, &error)
           d=result
           rk = (abs(self.omega_c))**0.5
           if (rk*d > 0.01):
              if self.omega_c > 0:
                 d = sinh(rk*d)/rk
              if self.omega_c < 0:
                 d = sin(rk*d)/rk
           return d/(1+z[0])
 
When I tried to build this code, I got this error message:

Error compiling Cython file:
------------------------------------------------------------
...
                    raise ValueError("Redshift z must not be smaller than the reference redshift")

           expected = -4.0
           alpha = 1

           F.function = &self.__angKernel
                       ^
------------------------------------------------------------

cnfw_model.pyx:102:24: Cannot assign type 'double (*)(Cosmology, double, void *) nogil' to 'double (*)(double, void *) nogil'


Error compiling Cython file:
------------------------------------------------------------
...
    cdef double __angKernel(self, double x, void * params) nogil:
        cdef double alpha, f
        alpha = (<double_ptr> params)[0]
        """Integration kernel for angular diameter distance computation.
        """
        f=self.E(alpha*x**-1)**-1
               ^
------------------------------------------------------------

cnfw_model.pyx:73:16: Calling gil-requiring function not allowed without gil
Reply all
Reply to author
Forward
0 new messages