Error compiling Cython with NumPy arrays

Justin

Mar 31, 2012, 11:03:49 AM
to cython...@googlegroups.com
Hi,
This is my first time posting on here, so please let me know
if I'm not following correct formatting or something.

Basically, I keep getting the following error message:


Error compiling Cython file:
------------------------------------------------------------
...
    global cl_positions, cl_radii, cl_elements, positions,\
radii, cl_masses, masses, cl_wpos, cl_wvel,\
            cl_velocities
    cdef:
        int prox, el, temp
        FLOAT_t score, d_big, l=0.0, m_dir
        nu.ndarray[ndim=1,dtype=FLOAT] v_big, v_rel, dist
        ^
------------------------------------------------------------

ast_builder.pyx:406:8: 'ndarray' is not a type identifier


I would provide my code, but it consists of a few hundred
lines and isn't completed. However, every single instance
of ndarray is getting this error message.

Dropping the variable typing here is an option, but the
code references arrays a huge number of times with each
iteration, so I would really rather not do so.

Does anyone have any idea why this might be happening?

(By the way, I'm using Python 2.7.2 with Numpy 1.6.1 and
Cython 0.14.1 on a Windows 7 machine).

Thanks in advance for any help

Robert Bradshaw

Mar 31, 2012, 6:51:11 PM
to cython...@googlegroups.com
On Sat, Mar 31, 2012 at 8:03 AM, Justin <just...@gmail.com> wrote:
> Error compiling Cython file:
> ------------------------------------------------------------
> ...
>    global cl_positions, cl_radii, cl_elements, positions,\
> radii, cl_masses, masses, cl_wpos, cl_wvel,\
>            cl_velocities
>    cdef:
>        int prox, el, temp
>        FLOAT_t score, d_big, l=0.0, m_dir
>        nu.ndarray[ndim=1,dtype=FLOAT] v_big, v_rel, dist
>       ^
> ------------------------------------------------------------
>
> ast_builder.pyx:406:8: 'ndarray' is not a type identifier
>
>
> I would provide my code, but it consists of a few hundred
> lines and isn't completed. However, every single instance
> of ndarray is getting this error message.

At a few hundred lines, a link to the full code could be helpful.

What is nu? Is it supposed to be np (e.g. if you did a cimport numpy as np)?

- Robert

Justin

Apr 1, 2012, 12:28:04 PM
to cython-users
Yes, I've always been in the habit of importing numpy as nu.

I've written another program just to see which aspects of Cython are
working and which aren't, so I can show that source here.

The original had no problems other than the type declarations: it
compiled once I removed every ndarray type declaration. I'm concerned
about the speed loss, though, since my algorithm does a lot of looping
over the arrays.

Basically, I've written the following useless program to test
compilation and execution speed:


import numpy as nu
cimport numpy as nu

from libc.math cimport sqrt
cdef double sqrt(double x):
    return sqrt(x)

#cdef nu.ndarray[nu.float,ndim=1] start=nu.zeros(10000)
#cdef nu.ndarray start=nu.zeros(10000,dtype=nu.float)
start=nu.zeros(10000,dtype=nu.float)

def run_test(int n_loops):
    global start
    cdef:
        nu.ndarray arr=nu.zeros(len(start),dtype=nu.float)
        nu.float_t add=0.0
        int i, j, l
    l=len(start)
    for i in xrange(l):
        start[i]=add+i
    i=0
    while i<n_loops:
        for j in xrange(l):
            if j==0:
                arr[j]=sqrt(start[j]*start[j+1])
                continue
            if j==l-1:
                arr[j]=sqrt(start[j-1]*start[j])
                continue
            arr[j]=sqrt(start[j-1]*start[j]*start[j+1])
        for j in xrange(l):
            arr[j]=0.0
        i+=1
    return True


When I swap in a different one of the three "start" lines below the
sqrt declaration, I get the following results:

-If I use the first line, I get this error:

Error compiling Cython file:
------------------------------------------------------------
...

from libc.math cimport sqrt
cdef double sqrt(double x):
    return sqrt(x)

cdef nu.ndarray[nu.float,ndim=1] start=nu.zeros(10000)
                                 ^
------------------------------------------------------------

cy_test.pyx:8:33: Buffer types only allowed as function local variables

Error compiling Cython file:
------------------------------------------------------------
...
import numpy as nu
^
------------------------------------------------------------

cy_test.pyx:1:0: Buffer vars not allowed in module scope
building 'c_test' extension
C:\Python27\Scripts\gcc.exe -mno-cygwin -mdll -O -Wall -IC:\Python27\lib\site-packages\numpy\core\include -IC:\Python27\include -IC:\Python27\PC -c cy_test.c -o build\temp.win32-2.7\Release\cy_test.o
cy_test.c:1:2: error: #error Do not use this file, it is the result of a failed Cython compilation.
error: command 'gcc' failed with exit status 1


-If I use the second line, it works about twice as fast as Python.

-The third line also works, but only about 1.5 times as fast as
Python.

I'm looking for at least a 10-fold speedup in the code I'm using and
have tweaked the algorithm as much as I can.

I know the problem has something to do with the buffer types, but have
no idea how to solve it.
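
The thread never shows the pure-Python baseline behind the "twice as fast" and "1.5 times as fast" figures. A minimal sketch of what such a baseline might look like follows. It is plain Python with lists instead of NumPy arrays, and the names `neighbour_products` and `run_test_py` are made up for illustration. It assumes at least two elements, as the posted loop does.

```python
import math
import timeit


def neighbour_products(start, arr):
    """Fill arr[j] with the sqrt of the product of start[j] and its neighbours."""
    n = len(start)
    for j in range(n):
        if j == 0:
            arr[j] = math.sqrt(start[j] * start[j + 1])
        elif j == n - 1:
            arr[j] = math.sqrt(start[j - 1] * start[j])
        else:
            arr[j] = math.sqrt(start[j - 1] * start[j] * start[j + 1])


def run_test_py(n_loops, n=10000):
    """Plain-Python counterpart of run_test, with no Cython typing at all."""
    start = [float(i) for i in range(n)]
    arr = [0.0] * n
    for _ in range(n_loops):
        neighbour_products(start, arr)
        for j in range(n):  # reset, as the posted loop does
            arr[j] = 0.0
    return True


if __name__ == "__main__":
    # Time a few calls; the compiled module can be timed the same way.
    seconds = timeit.timeit(lambda: run_test_py(10), number=3)
    print("run_test_py(10): %.3f s per call" % (seconds / 3))
```

Timing the compiled `run_test` with the same `timeit` call gives a like-for-like ratio on one machine.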


Stefan Behnel

Apr 1, 2012, 1:37:28 PM
to cython...@googlegroups.com
Justin, 01.04.2012 18:28:

> from libc.math cimport sqrt
> cdef double sqrt(double x):
>     return sqrt(x)

What is this even supposed to do? When I compile it, I get a compile error:

Error compiling Cython file:
------------------------------------------------------------
...
from libc.math cimport sqrt
cdef double sqrt(double x):
     ^
------------------------------------------------------------
sq.pyx:2:5: Function signature does not match previous declaration

(and I'm quite happy to get one at all, even if it's not exactly the one
I'd like to get)

Stefan

J Diviney

Apr 1, 2012, 4:49:43 PM
to cython...@googlegroups.com
This is just a simple example of the imports I'm doing. Like I said the code is useless, I just wanted to show what I was having trouble with instead of uploading 500 lines of code and the files that go with it.
That's odd, I just checked again, the importing of sqrt from the c library works fine for me. In any case, you can just replace that with:
from math import sqrt.
Once again, apologies if my description is lacking, haven't posted here before.

Aronne Merrelli

Apr 2, 2012, 11:32:32 AM
to cython...@googlegroups.com
On Sun, Apr 1, 2012 at 3:49 PM, J Diviney <just...@gmail.com> wrote:
> This is just a simple example of the imports I'm doing. Like I said the code
> is useless, I just wanted to show what I was having trouble with instead of
> uploading 500 lines of code and the files that go with it.
> That's odd, I just checked again, the importing of sqrt from the c library
> works fine for me. In any case, you can just replace that with:
> from math import sqrt.
> Once again, apologies if my description is lacking, haven't posted here
> before.
>

Correct - if you do:

from libc.math cimport sqrt

Then you are "done" - all later calls to sqrt(x) in your cdef function
will directly use the C function.

In your example code, do you really need to pull in the variable
as a global? You are defining the variable outside of any function,
which isn't really helpful IMO. If it were declared with cdef, I do
not think you would be able to access "start" from Python, since
Python cannot directly access cdef variables or cdef functions (only
def and cpdef functions). Plus, the first version with the declared
dimensions would be the fastest; you can do this easily if you just
add a typed argument to the test function.
Specifically, replace this:

start=nu.zeros(10000,dtype=nu.float)
def run_test(int n_loops):
    global start

    ...

With the following:

def run_test2(int n_loops, nu.ndarray[nu.float_t, ndim=1] start):
    ...


If you also type the arr variable the same way - specifically, replace:

cdef nu.ndarray arr=nu.zeros(len(start),dtype=nu.float)

With this (I like to make this 2 steps, it is clearer to me):

cdef nu.ndarray[nu.float_t, ndim=1] arr
arr=nu.zeros(len(start),dtype=nu.float)

The cython version is then substantially faster - here is a timing
result on my machine (I put those functions into dummy.pyx):

In [14]: %timeit dummy.run_test(10)
10 loops, best of 3: 193 ms per loop

In [15]: %timeit dummy.run_test2(10,np.zeros(10000))
1000 loops, best of 3: 996 us per loop

Hope that helps,
Aronne

J Diviney

Apr 3, 2012, 1:38:43 AM
to cython...@googlegroups.com
For this particular example, I don't need the variable as a global. However, it is needed for the project I'm doing.
I realise that the ndarray declarations you've used are faster, but that's exactly my problem, they refuse to compile and I usually get a message saying:


'ndarray' is not a type identifier


Robert Bradshaw

Apr 3, 2012, 1:53:28 AM
to cython...@googlegroups.com
On Mon, Apr 2, 2012 at 10:38 PM, J Diviney <just...@gmail.com> wrote:
> For this particular example, I don't need the variable as a global. However,
> it is needed for the project I'm doing.

If you really need this, you can assign to a local variable within
your function (and then re-assign on exiting if you didn't just change
it inplace).
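
A minimal sketch of that local-variable pattern, written in plain Python so it runs as-is. In the Cython version the local name is what would carry the buffer type (for example `cdef nu.ndarray[nu.float_t, ndim=1] local = start`), while the module-level `start` stays an untyped Python object. The function names here are made up for illustration.

```python
# Module-level state, kept as an ordinary Python object (no buffer type).
start = [0.0] * 8


def fill_in_place():
    """Mutate the shared array through a local name; no re-assignment needed."""
    local = start  # in Cython, this local would be the typed buffer variable
    for i in range(len(local)):
        local[i] = float(i)


def double_size():
    """Replace the shared array, so the global must be re-assigned on exit."""
    global start
    local = start
    local = local + [0.0] * len(local)  # builds a new object
    start = local  # publish the new object back to the global


def snapshot():
    """Return a copy of the current global array (helper for inspection)."""
    return list(start)
```

Only `double_size` needs the `global` statement, because it is the only function that rebinds the name. In-place changes made through `local` are already visible through `start`.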

> I realise that the ndarray declarations you've used are faster, but that's
> exactly my problem, they refuse to compile and I usually get a message
> saying:
>
>
> 'ndarray' is not a type identifier

Could you please send the smallest, complete example you can create
where you get this error?

J Diviney

Apr 3, 2012, 1:57:59 AM
to cython...@googlegroups.com
I have a series of arrays each with at least 1000 elements and each of which needs to be accessed by about 10 different functions for different purposes.
I've declared them as global to make the code simpler and to avoid passing them as arguments when it's much simpler to pass indices.
In any case, I've tried these things, ndarray declarations with buffer types just aren't working for me, regardless of where I make them.

Robert Bradshaw

Apr 3, 2012, 3:46:14 AM
to cython...@googlegroups.com
On Mon, Apr 2, 2012 at 10:57 PM, J Diviney <just...@gmail.com> wrote:
> I have a series of arrays each with at least 1000 elements and each of which
> needs to be accessed by about 10 different functions for different purposes.
> I've declared them as global to make the code simpler and to avoid passing
> them as arguments when it's much simpler to pass indices.

Perhaps this would be more naturally written as a class with methods?
(Or maybe not, it all depends and I don't have enough information to
judge.)
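
For what it's worth, a class need not give up the flat state vectors. A rough plain-Python sketch, assuming the layout implied elsewhere in the thread (x, y, z of particle i stored at indices 3*i to 3*i+2); the class name, method names and the `seed` parameter are hypothetical:

```python
import random


class ParticleState(object):
    """Flat state vectors bundled on one object instead of module globals."""

    def __init__(self, number, size, seed=None):
        rng = random.Random(seed)
        self.number = number
        self.size = size
        # x, y, z of particle i live at positions[3*i : 3*i + 3]
        self.positions = [rng.uniform(0.1 * size, 0.9 * size)
                          for _ in range(3 * number)]
        self.radii = [rng.uniform(1, 20) for _ in range(number)]
        self.masses = [rng.uniform(1, 80) for _ in range(number)]

    def position_of(self, i):
        """Return the (x, y, z) coordinates of particle i."""
        return tuple(self.positions[3 * i:3 * i + 3])

    def total_mass(self):
        """Sum of all particle masses."""
        return sum(self.masses)
```

Methods can still take indices rather than arrays, which was the stated reason for using globals. The arrays are reached through `self` instead of the module namespace.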

> In any case, I've tried these things, ndarray declarations with buffer types
> just aren't working for me, regardless of where I make them.

Could you give a short, complete example of something that doesn't
work for you? If there's a bug here, we'd like to know.

Justin

Apr 3, 2012, 4:55:05 AM
to cython...@googlegroups.com
Ok, I'll put together an example script with the same error as soon as I can.

I previously (in the pure Python version) used classes, but a closer look at the program and its eventual outputs showed that state vectors were more efficient and effective.

Justin

Apr 5, 2012, 9:43:51 AM
to cython-users
I've not managed to get the same error message, but it is stemming
from the same thing (buffer type declarations):

import numpy as nu
cimport numpy as nu
import random as ra
import cython

#A very stripped down version of the original program:

ctypedef nu.float FLOAT
ctypedef nu.float_t FLOAT_t

#Math functions:
cdef extern from "math.h":
    double sqrt(double x)

cdef:
    nu.ndarray[FLOAT,ndim=1] positions, radii, masses
    double size=1000
    int number=200

def initiate():
    """Begins the simulation"""
    global positions, radii, masses, size, number
    cdef:
        int i
    positions=nu.zeros(number*3)
    radii=nu.zeros(number)
    masses=nu.zeros(number)
    for i in xrange(number):
        positions[i*3]=ra.uniform(0.1*size,0.9*size)
        positions[i*3+1]=ra.uniform(0.1*size,0.9*size)
        positions[i*3+2]=ra.uniform(0.1*size,0.9*size)
        radii[i]=ra.uniform(1,20)
        masses[i]=ra.uniform(1,80)
    return 0

def number_of_collisions():
    """Counts how many collisions there are"""
    global number
    cdef i, j, collisions
    for i in xrange(number):
        j=0
        while i+j<number:
            if do_collide(i,i+j,radii):
                collisions+=1
    return collisions

cdef bint do_collide(int i, int j, nu.ndarray[nu.float,ndim=1] radii):
    """Checks if two particles collide"""
    global positions, radii
    cdef:
        nu.ndarray[FLOAT,ndim=1] dist
        double d2, rad2
    dist=positions[i*3:i*3+3]-positions[j*3:j*3+3]
    d2=nu.dot(dist,dist)
    rad2=(radii[i]+radii[j])*(radii[i]+radii[j])
    if dist<rad2:
        return True
    return False


The above code refuses to compile due to a series of errors basically
telling me that I can only use buffer types as local variables. Is
there really no way to have a global and typed array in Cython?
I'll try and get the error I originally posted again, but I'm on a new
computer now, so the compiler could (but shouldn't) function slightly
differently.


刘振海

Apr 5, 2012, 10:09:43 AM
to cython...@googlegroups.com
Hi,
I think you can use a typed memoryview slice (new in Cython 0.16): replace nu.ndarray[FLOAT,ndim=1] with FLOAT_t[:].

cheers,
Liu zhenhai

刘振海

Apr 5, 2012, 10:50:34 AM
to cython...@googlegroups.com
Hi,
I have modified your code as follows:


import numpy as nu
import random as ra


#A very stripped down version of the original program:

ctypedef double FLOAT_t # you can use nu.float_t as well

#Math functions:
cdef extern from "math.h":
   double sqrt(double x)

cdef:
   FLOAT_t[:] positions, radii, masses
   double size=10
   int number=10

def initiate():
   """Begins the simulation"""
   global positions, radii, masses, size, number
   cdef:
       int i
   positions=nu.zeros(number*3)
   radii=nu.zeros(number)
   masses=nu.zeros(number)
   for i in xrange(number):
       positions[i*3]=ra.uniform(0.1*size,0.9*size)
       positions[i*3+1]=ra.uniform(0.1*size,0.9*size)
       positions[i*3+2]=ra.uniform(0.1*size,0.9*size)
       radii[i]=ra.uniform(1,20)
       masses[i]=ra.uniform(1,80)
   return 0

def number_of_collisions():
   """Counts how many collisions there are"""
   global number,radii
   cdef int i, j, collisions=0 # here added int and init collisions to 0
   for i in xrange(number):
       j=0
       while i+j<number:
           if do_collide(i,i+j,radii):
               collisions+=1
           j+=1 # add
   return collisions

cdef bint do_collide(int i, int j, FLOAT_t[:] radii):
   """Checks if two particles collide"""
   global positions
   cdef:
       double d2, rad2
   dist=positions[i]-positions[j]   # simplified for this prototype; use a "for" loop to do the element-wise comparison
   rad2=(radii[i]+radii[j])*(radii[i]+radii[j])
   if dist<rad2:
       return True
   return False

initiate()
print number_of_collisions()
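
The element-wise "for" loop mentioned in the comment could look roughly like this. It is a plain-Python sketch, not the code from either post. It assumes the flat x, y, z layout from `initiate()`, passes the arrays as arguments instead of reading globals, compares the squared distance (not the raw difference) against the squared radius sum, and starts the inner loop at `i + 1` on the assumption that a particle should not be tested against itself.

```python
def do_collide(i, j, positions, radii):
    """True if spheres i and j overlap (squared distance < squared radius sum)."""
    d2 = 0.0
    for k in range(3):  # x, y, z components, one at a time
        diff = positions[3 * i + k] - positions[3 * j + k]
        d2 += diff * diff
    rad = radii[i] + radii[j]
    return d2 < rad * rad


def number_of_collisions(positions, radii):
    """Count overlapping pairs, visiting each pair (i, j) with j > i once."""
    number = len(radii)
    collisions = 0
    for i in range(number):
        for j in range(i + 1, number):
            if do_collide(i, j, positions, radii):
                collisions += 1
    return collisions
```

Working with squared quantities avoids the sqrt call entirely. In Cython the same loops can be typed with `int` indices and `FLOAT_t[:]` arguments.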


I don't think using global variables is a good idea;
maybe you can put them into a class, or just pass them between functions as arguments.

cheers,
Liu zhenhai

J Diviney

Apr 7, 2012, 10:40:15 AM
to cython...@googlegroups.com
Thanks a lot for the suggestion of using typed memoryviews. I've installed the newest release of Cython and have no trouble compiling with them, and I get good speed improvements in a program I wrote to see if I was getting anywhere.

Out of curiosity, why does everyone keep recommending against using globals? I know that they require a dictionary lookup in Python, but I thought this wasn't the case in Cython and that globals were accessed very quickly. I am very reluctant to do another large rewrite of my code (changing arrays to memoryviews will be fairly trivial), so I'm unlikely to switch to classes unless it results in a very significant speed difference.
