Does cython optimize for contiguous numpy arrays?

Matan

Jan 20, 2024, 1:23:06 AM
to cython-users
Hi,

I was wondering whether Cython has specific optimizations for NumPy, such as specializing its generated C (and the downstream GCC compilation) for the common case where NumPy arrays are contiguous, or otherwise guaranteed to be laid out in memory in certain ways and not in others.

E.g. in the context of code like:

import numpy as np
from libc.math cimport fabs

cdef inline double distance(double a, double b):
  return fabs(a - b)

cpdef inline double[:,:] go(double[:] stream, double[:] query):

  matrix = np.empty((len(stream), len(query)))
  cdef double [:, :] matrix_c = matrix

  cdef int i, j
  cdef int stream_len = len(stream)
  for i in range(stream_len):
    for j in range(len(query)):
      matrix_c[i, j] = distance(stream[i], query[j])

  return matrix

Any comments are welcome.

I'm using Cython 3.0.8 or above.

A small quirk is that, from the point of view of the Python code calling the cpdef function, Cython returns a _memoryviewslice object rather than the numpy.ndarray that the variable `matrix` above refers to. I wonder how that happens technically (although ultimately functions like this will be called by other Cython functions and not by Python in my codebase, so it won't really matter later on).

Thanks!

da-woods

Jan 20, 2024, 4:21:49 AM
to cython...@googlegroups.com
On 20/01/2024 02:43, Matan wrote:
> Hi,
>
> I was just wondering, whether cython has specific optimizations for
> numpy such as optimizing its generated C and downstream GCC
> compilation for the common case that numpy arrays are contiguous or
> guaranteed to be arranged in memory in certain ways and not in others.
>
Only if you tell it that the memoryviews are contiguous e.g.:

cdef double[:, ::1] mview

tells it that a memoryview is contiguous in the last axis.
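
A minimal NumPy sketch, with made-up array names, of which arrays would satisfy such a declaration. A freshly allocated array is C-contiguous (its last axis has a stride of one element), while a strided slice or a transpose is not. Cython is expected to reject the latter two with a ValueError when they are passed as a `double[:, ::1]`, rather than copy them silently.

```python
import numpy as np

a = np.empty((4, 6))   # fresh allocation: C-contiguous by default
b = a[:, ::2]          # every other column: a strided view
c = a.T                # transpose: Fortran-contiguous, not C-contiguous

arrays = (("a", a), ("b", b), ("c", c))

# Stride of the last axis, in elements; 1 is what `::1` promises.
last_stride = {name: arr.strides[-1] // arr.itemsize for name, arr in arrays}

# NumPy's own view of whether each array is C-contiguous.
flags = {name: bool(arr.flags["C_CONTIGUOUS"]) for name, arr in arrays}
```

Only `a` has a last-axis stride of 1 and the C_CONTIGUOUS flag set.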

mattip

Jan 21, 2024, 4:35:56 AM
to cython-users
Is that syntax documented somewhere? Searching for " mview" (with a leading space) [0], I see it in tests but not in the documentation.
[0] https://github.com/search?q=repo%3Acython%2Fcython+%22+mview%22&type=code
Matti

Marcel Martin

Jan 21, 2024, 3:19:26 PM
to cython...@googlegroups.com
Hi,

On 21/01/2024 10.13, mattip wrote:
> Only if you tell it that the memoryviews are contiguous e.g.:
>
> cdef double[:, ::1] mview
>
> tells it that a memoryview is contiguous in the last axis.
>
> Is that syntax documented somewhere? Searching for " mview" (with a
> leading space) [0] I see it in tests but not in documentation
> [0] https://github.com/search?q=repo%3Acython%2Fcython+%22+mview%22&type=code
> Matti

The relevant part is the ::1, see here:
<https://cython.readthedocs.io/en/stable/src/userguide/memoryviews.html#c-and-fortran-contiguous-memoryviews>

Regards,
Marcel

da-woods

Jan 21, 2024, 3:55:54 PM
to cython...@googlegroups.com
On 21/01/2024 20:19, Marcel Martin wrote:
>
> The relevant part is the ::1, see here:
> <https://cython.readthedocs.io/en/stable/src/userguide/memoryviews.html#c-and-fortran-contiguous-memoryviews>
>


Yes - indeed. "mview" was just intended as an example variable name, not
special syntax.

Matan

Jan 25, 2024, 2:46:13 PM
to cython-users
Thanks, I must have missed that. 

When being obsessive or whimsical about local optimization with Cython 3.x, would you allocate the memory for a NumPy array in some way other than through the NumPy API? What are the most idiomatic ways that yield a performance boost, especially when every cell of the array is filled in loops and the array does not need to be initialized with values at allocation time?

How would you end up returning a NumPy array object after the cell assignments, rather than a Cython memoryview, if you needed or chose to? In the current code the returned object ends up being a memoryview, as seen from Python code at least, despite the function returning the variable `matrix` and not the variable `matrix_c`.

import numpy as np
from libc.math cimport fabs

cdef inline double distance(double a, double b):
  return fabs(a - b)

cpdef inline double[:,:] go(double[:] stream, double[:] query):

  matrix = np.empty((len(stream), len(query)))
  cdef double [:, :] matrix_c = matrix

  cdef int i, j
  cdef int stream_len = len(stream)
  cdef int query_len = len(query)
  for i in range(stream_len):
    for j in range(query_len):

      matrix_c[i, j] = distance(stream[i], query[j])

  return matrix

I hope you'll excuse my follow-up question on this learning drill (it's also part of a real project).

Matan

Matan

Jan 26, 2024, 9:23:21 AM
to cython-users
Any idea why this cython function actually returns a _memoryviewslice and not a memory view, regardless of whether I'm using ::1 or not? 

da-woods

Jan 26, 2024, 9:28:54 AM
to cython...@googlegroups.com
It depends on whether you call it via a "def"-like call or a "cdef"-like call, which in turn depends on whether Cython can see the definition of the function. If you've cimported it, then Cython should be able to see the definition and you'll get a "cdef"-like call. If you've imported it, then Cython won't and you'll get a "def"-like call.

If you've got a "cdef" call then Cython should return a typed memoryview. If you've got a "def" call then it *must* return a Python object (since it's just a regular Python call), so you get the memoryviewslice wrapper.

My personal opinion is that cpdef is usually the *wrong* choice. You should choose to have either a Cython (cdef) or a Python (def) interface; trying to have both with cpdef gives you the worst of all worlds. Other people disagree with this opinion.

Matan Addam (Safriel)

Jan 27, 2024, 2:06:27 AM
to cython...@googlegroups.com
Thanks, 

Admittedly, I was not aware that I could call my Cython function from Python unless it was a `cpdef`, short of switching to what the Cython documentation calls the pure Python syntax, where Cython is pulled in through imports and the code is all valid Python rather than a superset of it, or going on a quirky tangent of importing the C module through facilities that don't know it was generated by Cython.

When I switch my cpdef to def, the Cython function signature is no longer valid; it becomes a "Python" declaration, which presumably loses much of the ability for static typing and optimization. Those could only be mimicked by switching to the pure Python syntax, which I am arbitrarily not using in my project in order to avoid two different syntaxes for the same thing (Cython). So I'm currently using only what the docs call the "Cython" syntax and not what they call the "pure Python" syntax.

So I'd still be happy to learn the idiomatic way to turn a NumPy array that was manipulated in Cython through a memoryview, or for that matter any Cython array, into a regular NumPy array for the Python caller to consume.

More generally, shouldn't we expect that at some point in a Python codebase, arrays and variables returned from Cython need to become Python objects again? Otherwise it's a pure Cython project, and I'm not sure there is a reason to have such a thing unless it's a Cython library we're writing.

All the more so now that I got within a factor of 2 of the Cython code posted earlier, merely by using the optimization power of NumPy itself through `np.fromfunction` in pure Python: passing arrays back and forth between Cython and Python appears to be useful for gradual optimization workflows.
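
The `np.fromfunction` code itself was not posted in this thread, so the following is only a guess at what such a pure-NumPy version might look like, next to an equivalent broadcasting expression. The function names `go_fromfunction` and `go_broadcast` are made up for illustration.

```python
import numpy as np

def go_fromfunction(stream, query):
    # np.fromfunction calls the lambda once with whole index arrays i and j,
    # so stream[i] and query[j] are fancy-indexed instead of looped over.
    return np.fromfunction(
        lambda i, j: np.abs(stream[i] - query[j]),
        (len(stream), len(query)),
        dtype=int,
    )

def go_broadcast(stream, query):
    # Same |stream[i] - query[j]| matrix, broadcasting a column against a row.
    return np.abs(stream[:, None] - query[None, :])

stream = np.array([0.0, 1.0, 4.0])
query = np.array([1.0, 3.0])
m1 = go_fromfunction(stream, query)
m2 = go_broadcast(stream, query)
```

Both calls produce the same 3x2 matrix of absolute differences; neither runs a per-element Python loop.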

Any suggestions and thoughts?

Thanks,
Matan

Chris Barker

Mar 6, 2024, 7:26:44 PM
to cython...@googlegroups.com
> Admittedly I was not aware that I can call my cython function from python unless it were a `cpdef`, unless we switched to the python api use of cython where cython is summoned in by imports and it's all valid python and not any superset of python

A "def" function in Cython can still be fully Cython code, and it can call C and other Cython (cdef) functions directly.

The difference is only at the call and return stages, where a def will take and return only Python objects.

My advice: 

Write your "core" code, where computational speed is important, as cdef functions -- particularly if they are small and will be called a lot. I sometimes use cpdef to prototype, for easier testing, and then go back to cdef (I've never profiled the difference).

Write a Python API -- the functions called from Python code -- as def functions.

In those def functions, do any handy polymorphism/conversion -- I make heavy use of `np.asarray()` and `np.ascontiguousarray()` to get inputs into an efficient form for the cdef functions to work on.
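
A minimal sketch of that kind of input normalization, in plain Python with NumPy. The helper name `as_c_double` is hypothetical and not part of any library.

```python
import numpy as np

def as_c_double(obj):
    """Return obj as a C-contiguous float64 array, copying only if needed."""
    return np.ascontiguousarray(obj, dtype=np.float64)

from_list = as_c_double([1, 2, 3])   # list of ints: converted to float64

strided = np.arange(10.0)[::2]       # non-contiguous view of a larger array
packed = as_c_double(strided)        # copied into fresh contiguous memory

already_ok = np.arange(4.0)          # already contiguous float64
same = as_c_double(already_ok)       # passes through without a copy
```

Whatever the caller passes in, the typed code behind the wrapper then only ever sees contiguous float64 data.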

And they can return the types you want -- e.g. numpy arrays from a memoryview.
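
A minimal sketch of that last step. Python's built-in `memoryview` stands in here for a Cython typed memoryview (an assumption, made so that the snippet runs without compiling anything). Both expose the buffer protocol, and `np.asarray` wraps such an object as an ndarray over the same memory rather than copying it.

```python
import numpy as np

matrix = np.arange(6, dtype=np.float64).reshape(2, 3)

# Stand-in for what a typed memoryview looks like from Python: a
# buffer-protocol wrapper around the same memory, not itself an ndarray.
view = memoryview(matrix)

# np.asarray builds an ndarray on top of the existing buffer.
result = np.asarray(view)

# Writing through the new array is visible in the original: no copy was made.
result[0, 0] = 42.0
```

In a def function, the equivalent would be to return `np.asarray(some_memoryview)` instead of the memoryview itself.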

- my $0.02






--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris....@noaa.gov