cdef struct for nested ndarray

Felix Schlesinger

Mar 1, 2010, 3:43:43 PM
to cython-users
Hi,

I am working with large numpy ndarrays in cython. These are 'record
arrays' like
np.array([], dtype= [('foo', 'i1'),('bar','S1'),('foobars',('i2',5)),
('barfoos',('i2',5)) ])

What is the right way to declare this type as a struct in cython for
static typing and speed-up?

cdef packed struct arr:
    np.int8_t foo
    np.character_t bar
    np.ndarray[int16_t] foobars
    np.ndarray[int16_t] barfoos

does not work, because
1) character_t does not exist. Is there a cython ctype corresponding
to np.character?
2) More importantly the nested arrays are not supported: "Buffer types
only allowed as function local variables"
Neither does "np.int16_t foobars[5]" work (Syntax error)

Is there a different syntax for this that I missed in the
documentation? If not, is there a good workaround. Since the size of
each 'record' is always the same, it seems it should be possible to
map it to a static ctype.

Thanks for any help
Felix
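
For reference, the fixed size of one such record can be checked with Python's standard `struct` module. This is only a sketch: the format string `<bc5h5h` is an assumed packed, little-endian equivalent of the dtype above, and the name `RECORD` is made up for illustration.

```python
import struct

# Assumed packed, little-endian equivalent of the dtype above:
#   foo: int8, bar: 1-byte string, foobars: 5 x int16, barfoos: 5 x int16
RECORD = struct.Struct("<bc5h5h")

# 1 + 1 + 5*2 + 5*2 bytes, with no padding between fields
print(RECORD.size)

# Round-trip one record through its raw bytes
raw = RECORD.pack(7, b"x", *range(5), *range(5, 10))
values = RECORD.unpack(raw)
foo, bar = values[0], values[1]
foobars, barfoos = values[2:7], values[7:12]
print(foo, bar, foobars, barfoos)
```

Every record has the same byte layout, which is why a static C type for it is conceivable in principle.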

Robert Bradshaw

Mar 1, 2010, 5:15:25 PM
to cython...@googlegroups.com
On Mar 1, 2010, at 12:43 PM, Felix Schlesinger wrote:

> Hi,
>
> I am working with large numpy ndarrays in cython. These are 'record
> arrays' like
> np.array([], dtype= [('foo', 'i1'),('bar','S1'),('foobars',('i2',5)),
> ('barfoos',('i2',5)) ])
>
> What is the right way to declare this type as a struct in cython for
> static typing and speed-up?
>
> cdef packed struct arr:
>     np.int8_t foo
>     np.character_t bar
>     np.ndarray[int16_t] foobars
>     np.ndarray[int16_t] barfoos
>
> does not work, because
> 1) character_t does not exist. Is there a cython ctype corresponding
> to np.character?
> 2) More importantly the nested arrays are not supported: "Buffer types
> only allowed as function local variables"
> Neither does "np.int16_t foobars[5]" work (Syntax error)

This also won't work because you're trying to use a struct--there's
not really a good way to do refcounting of struct members so not even
objects are allowed. Use a cdef class instead.

> Is there a different syntax for this that I missed in the
> documentation? If not, is there a good workaround. Since the size of
> each 'record' is always the same, it seems it should be possible to
> map it to a static ctype.

You can't do this right now, and there are significant (though not
insurmountable) technical issues to be able to do this. If your arrays
are large there shouldn't be much overhead in assigning them to locals
of the right type as local variables in your function.

- Robert

Felix Schlesinger

Mar 1, 2010, 11:47:43 PM
to cython-users
On Mar 1, 5:15 pm, Robert Bradshaw <rober...@math.washington.edu>
wrote:

> On Mar 1, 2010, at 12:43 PM, Felix Schlesinger wrote:

> > I am working with large numpy ndarrays in cython. These are 'record
> > arrays' like
> > np.array([], dtype= [('foo', 'i1'),('bar','S1'),('foobars',('i2',5)),
> > ('barfoos',('i2',5))  ])
>
> > What is the right way to declare this type as a struct in cython for
> > static typing and speed-up?
>
> > cdef packed struct arr:
> >    np.int8_t foo
> >    np.character_t bar
> >    np.ndarray[int16_t] foobars
> >    np.ndarray[int16_t] barfoos

> This also won't work because you're trying to use a struct--there's
> not really a good way to do refcounting of struct members so not even
> objects are allowed. Use a cdef class instead.

But the cdef class could not be used as an array type, right?

When I try this:

cdef class foobar:
    cdef np.ndarray i
    cdef np.ndarray j

def foo(np.ndarray[foobar] arr):
    pass

I get: "dtype must be "object", numeric type or a struct"

> > Is there a different syntax for this that I missed in the
> > documentation? If not, is there a good workaround. Since the size of
> > each 'record' is always the same, it seems it should be possible to
> > map it to a static ctype.
>
> You can't do this right now, and there are significant (though not  
> insurmountable) technical issues to be able to do this. If your arrays  
> are large there shouldn't be much overhead in assigning them to locals  
> of the right type as local variables in your function.

Each struct of the array is small, so typechecking and assigning them
to a cdef individually would be slow I am afraid.

def foo(np.ndarray arr):
    # cdef declarations must come before the loop
    cdef np.ndarray[np.int8_t] bar, bar2
    for i in range(10):
        bar = arr[i]['bar']
        bar2 = arr[i]['bar2']
        do_stuff(bar, bar2)

Instead I could cdef the entire column

def foo(np.ndarray arr):
    cdef np.ndarray[np.int8_t, ndim=2] bars = arr['bar']
    cdef np.ndarray[np.int8_t, ndim=2] bars2 = arr['bar2']
    for i in range(10):
        do_stuff(bars[i], bars2[i])

This leads to somewhat awkward code and, judging by the output of cython
-a, still quite some overhead. But I do not have a complete benchmark
yet, so I am not sure how it will affect the overall application in
the end.
Is this what you had in mind? Or should I use C arrays here?

I guess I do not fully understand what the difference is between
declaring a record array using a cdef struct of primitive types and one
containing a C array of such primitive types of known length. Is there
a fundamental problem with this, or is it just an implementation issue?

Thanks
Felix
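
The column approach relies on NumPy returning a field of a record array as a view rather than a copy. A small pure-Python sketch (it needs NumPy installed), using the dtype from the first message and an arbitrarily chosen length of 4 records, shows what such a 2-D typed buffer would be bound to:

```python
import numpy as np

dt = np.dtype([('foo', 'i1'), ('bar', 'S1'),
               ('foobars', ('i2', 5)), ('barfoos', ('i2', 5))])
arr = np.zeros(4, dtype=dt)   # 4 records; the length is arbitrary

foobars = arr['foobars']      # one field lookup for the whole column
print(foobars.shape, foobars.dtype)

# Rows of the column are one full record apart in memory,
# so the view is strided rather than contiguous.
print(dt.itemsize, foobars.strides)

# The column is a view: writing through it changes the records.
foobars[2, 3] = 42
print(arr[2]['foobars'])
print(np.shares_memory(arr, foobars))
```

Because the field lookup happens once, outside the loop, the per-record Python overhead of the first variant is avoided.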

Dag Sverre Seljebotn

Mar 2, 2010, 2:21:30 AM
to cython...@googlegroups.com
The answer is simply that support for nested arrays is not implemented
yet (but keep reading).

If it were, it would look something like this for a nested 10x20 chunk
of floats:

cdef packed struct MyDtype:
    cdef int a
    cdef float nested[10][20]

This *might* work now if you do

cdef np.ndarray[MyDtype, cast=True] arr = numpy.zeros(....)

however I'm not sure. Please report what you find.

Nested arrays contained in NumPy arrays are very different from the
NumPy arrays themselves, as they are fixed-size at compilation time and
don't act as their own object but just describe layout in memory.


Dag Sverre
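
The last point can be illustrated with the standard library alone. A `ctypes` structure with a nested fixed-size array behaves similarly: the nested array is part of the record's memory rather than a separate object with its own allocation. The class below mirrors the hypothetical `MyDtype`; it is an analogy only, not how Cython or NumPy implement this.

```python
import ctypes
import struct

class MyDtype(ctypes.Structure):
    # int32 followed by float32 data: no padding is needed here,
    # so the layout is the same as that of a packed struct.
    _fields_ = [("a", ctypes.c_int32),
                ("nested", (ctypes.c_float * 20) * 10)]  # nested[10][20]

# 4 bytes for `a` plus 10*20*4 bytes for the nested block
print(ctypes.sizeof(MyDtype))

buf = bytearray(ctypes.sizeof(MyDtype))
rec = MyDtype.from_buffer(buf)  # rec is a view onto buf, not a copy
rec.nested[9][19] = 1.5

# The write landed in the last 4 bytes of the underlying buffer.
print(struct.unpack_from("f", buf, 800)[0])
```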

Felix Schlesinger

Mar 2, 2010, 10:55:26 AM
to cython-users
> cdef packed struct MyDtype:
>     cdef int a
>     cdef float nested[10][20]

The 'cdef's inside the struct are typos, right? At least in cython
0.12 they are a syntax error.

> This *might* work now if you do
>
> cdef np.ndarray[MyDtype, cast=True] arr = numpy.zeros(....)

def foo(np.ndarray arr):
    cdef np.ndarray[MyDtype, cast=True] nestedarr = arr
    return 0

leads to an assertion error at Cython/Compiler/Buffer.py line 630 (see
end of message).

> Nested arrays contained in NumPy arrays are very different from the
> NumPy arrays themselves, as they are fixed-size at compilation time and
> don't act as their own object but just describe layout in memory.

But shouldn't their fixed size and lack of 'ownership' make it easier
to map them to C arrays in cython (if they are in C memory layout in
numpy)?

thanks
Felix

Traceback (most recent call last):
  File "/home/schlesin/python/cython", line 8, in <module>
    load_entry_point('Cython==0.12', 'console_scripts', 'cython')()
  File "/home/schlesin/python/Cython-0.12-py2.6-linux-x86_64.egg/Cython/Compiler/Main.py", line 745, in setuptools_main
    return main(command_line = 1)
  File "/home/schlesin/python/Cython-0.12-py2.6-linux-x86_64.egg/Cython/Compiler/Main.py", line 762, in main
    result = compile(sources, options)
  File "/home/schlesin/python/Cython-0.12-py2.6-linux-x86_64.egg/Cython/Compiler/Main.py", line 737, in compile
    return compile_multiple(source, options)
  File "/home/schlesin/python/Cython-0.12-py2.6-linux-x86_64.egg/Cython/Compiler/Main.py", line 707, in compile_multiple
    result = run_pipeline(source, options)
  File "/home/schlesin/python/Cython-0.12-py2.6-linux-x86_64.egg/Cython/Compiler/Main.py", line 568, in run_pipeline
    err, enddata = context.run_pipeline(pipeline, source)
  File "/home/schlesin/python/Cython-0.12-py2.6-linux-x86_64.egg/Cython/Compiler/Main.py", line 219, in run_pipeline
    data = phase(data)
  File "/home/schlesin/python/Cython-0.12-py2.6-linux-x86_64.egg/Cython/Compiler/Main.py", line 152, in generate_pyx_code
    module_node.process_implementation(options, result)
  File "/home/schlesin/python/Cython-0.12-py2.6-linux-x86_64.egg/Cython/Compiler/ModuleNode.py", line 72, in process_implementation
    self.generate_c_code(env, options, result)
  File "/home/schlesin/python/Cython-0.12-py2.6-linux-x86_64.egg/Cython/Compiler/ModuleNode.py", line 274, in generate_c_code
    self.body.generate_function_definitions(env, code)
  File "/home/schlesin/python/Cython-0.12-py2.6-linux-x86_64.egg/Cython/Compiler/Nodes.py", line 341, in generate_function_definitions
    stat.generate_function_definitions(env, code)
  File "/home/schlesin/python/Cython-0.12-py2.6-linux-x86_64.egg/Cython/Compiler/Nodes.py", line 341, in generate_function_definitions
    stat.generate_function_definitions(env, code)
  File "/home/schlesin/python/Cython-0.12-py2.6-linux-x86_64.egg/Cython/Compiler/Nodes.py", line 1118, in generate_function_definitions
    self.body.generate_execution_code(code)
  File "/home/schlesin/python/Cython-0.12-py2.6-linux-x86_64.egg/Cython/Compiler/Nodes.py", line 347, in generate_execution_code
    stat.generate_execution_code(code)
  File "/home/schlesin/python/Cython-0.12-py2.6-linux-x86_64.egg/Cython/Compiler/Nodes.py", line 2871, in generate_execution_code
    self.generate_assignment_code(code)
  File "/home/schlesin/python/Cython-0.12-py2.6-linux-x86_64.egg/Cython/Compiler/Nodes.py", line 2965, in generate_assignment_code
    self.lhs.generate_assignment_code(self.rhs, code)
  File "/home/schlesin/python/Cython-0.12-py2.6-linux-x86_64.egg/Cython/Compiler/ExprNodes.py", line 1354, in generate_assignment_code
    self.generate_acquire_buffer(rhs, code)
  File "/home/schlesin/python/Cython-0.12-py2.6-linux-x86_64.egg/Cython/Compiler/ExprNodes.py", line 1400, in generate_acquire_buffer
    pos=self.pos, code=code)
  File "/home/schlesin/python/Cython-0.12-py2.6-linux-x86_64.egg/Cython/Compiler/Buffer.py", line 287, in put_assign_to_buffer
    getbuffer = get_getbuffer_call(code, "%s", buffer_aux, buffer_type) # fill in object below
  File "/home/schlesin/python/Cython-0.12-py2.6-linux-x86_64.egg/Cython/Compiler/Buffer.py", line 258, in get_getbuffer_call
    dtype_typeinfo = get_type_information_cname(code, buffer_type.dtype)
  File "/home/schlesin/python/Cython-0.12-py2.6-linux-x86_64.egg/Cython/Compiler/Buffer.py", line 622, in get_type_information_cname
    for f in fields]
  File "/home/schlesin/python/Cython-0.12-py2.6-linux-x86_64.egg/Cython/Compiler/Buffer.py", line 630, in get_type_information_cname
    assert False
AssertionError
