Questions about arrays, lists, and Memory views


saad khalid

Mar 21, 2018, 8:55:42 AM
to cython-users
Hey everyone:

I'm just getting started with Cython, and I am trying to convert some Mathematica code to code that is compatible with Cython/Sage. I am familiar with Python and reasonably familiar with C (though it has been a while). I'm a bit confused by some of the syntax that is used when defining arrays in Cython. I have seen this syntax in one of the tutorials:

from cpython cimport array
import array
cdef array.array a = array.array('i', [1, 2, 3])
cdef int[:] ca = a

So, from my understanding, array.array creates a C array. Then what is the purpose of the "cdef int[:] ca = a" part of the code? Is this to allow for easier/safer access to the original array? Why is it not possible to simply initialize my array with that line, instead of having to initialize it as an "array.array"? On Stack Overflow, I also saw the following:

cdef int mom2calc[3]
mom2calc[:] = [1, 2, 3]

What is the difference between what this does and what the earlier code did? Also, from what I've read, much of the speed gain in Cython comes from having statically typed variables. So if I were to predefine a variable as a list, would that give a speed boost similar to the above? Perhaps something like:

cdef list b = [1,2,3]

Thank you!


Nathan Goldbaum

Mar 21, 2018, 11:45:17 AM
to cython-users


On Wednesday, March 21, 2018 at 7:55:42 AM UTC-5, saad khalid wrote:
Hey everyone:

I'm just getting started with Cython, and I am trying to convert some Mathematica code to code that is compatible with Cython/Sage. I am familiar with Python and reasonably familiar with C (though it has been a while). I'm a bit confused by some of the syntax that is used when defining arrays in Cython. I have seen this syntax in one of the tutorials:

from cpython cimport array
import array
cdef array.array a = array.array('i', [1, 2, 3])
cdef int[:] ca = a


Normally cython code gets called by higher level python code. In these cases, python allocates the array (usually as a numpy array) and passes it down to cython. What memoryviews allow you to do is to add types to your *interfaces*. If you need to allocate memory in cython then you'd need to do something like the example you found, using e.g. array.array or one of the many methods in numpy that allocates new numpy arrays.
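As a rough illustration of "the caller allocates, the callee only states what it accepts", here is a pure-Python analogue using the built-in memoryview. This is related to, but not the same thing as, a Cython typed memoryview. The name sum_ints is made up for this sketch, and the explicit format check only stands in for what declaring an argument as int[:] would express in Cython.

```python
import array

def sum_ints(buf):
    """Sum a 1-D buffer of C ints supplied by the caller (hypothetical helper)."""
    view = memoryview(buf)  # no copy; raises TypeError if buf exports no buffer
    # Stand-in for the check a typed "int[:]" argument would imply in Cython.
    if view.ndim != 1 or view.format != 'i':
        raise TypeError("expected a 1-D buffer of C ints")
    total = 0
    for value in view:
        total += value
    return total

data = array.array('i', [1, 2, 3])  # the caller allocates the memory
result = sum_ints(data)             # the callee only looks at it through a view
```

The function never allocates anything itself; it works on whatever buffer the higher-level code hands it.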
 
So, from my understanding, array.array creates a C array. Then what is the purpose of the "cdef int[:] ca = a" part of the code? Is this to allow for easier/safer access to the original array? Why is it not possible to simply initialize my array with that line, instead of having to initialize it as an "array.array"?

It is possible, you could do:

  cdef int[:] ca = array.array('i', [1, 2, 3])

 
On stackoverflow, I also saw the following:

cdef int mom2calc[3]
mom2calc[:] = [1, 2, 3]

What is the difference between what this does and what the earlier code did?

array.array returns an object that implements the Python buffer protocol, so Cython is able to convert it to a typed memoryview with little overhead. Python lists do not implement the buffer protocol. In the Stack Overflow snippet, however, mom2calc is a fixed-size C array rather than a memoryview, and since lists are so commonly used to store data, Cython knows how to automatically generate code that copies the list's items into it.
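The buffer-protocol distinction can be checked from plain Python with the built-in memoryview (a sketch only; Cython's typed memoryviews are a separate feature built on the same protocol, and the variable names here are arbitrary):

```python
import array

a = array.array('i', [1, 2, 3])
view = memoryview(a)          # works: array.array exports its buffer
array_format = view.format    # the same 'i' type code the array was built with

try:
    memoryview([1, 2, 3])     # a list has no buffer to export
    list_has_buffer = True
except TypeError:
    list_has_buffer = False

# Getting list data into a buffer therefore means copying it item by item.
copied = array.array('i', [1, 2, 3])
```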
 
Also, from what I've read, much of the speed gain in Cython comes from having statically typed variables. So if I were to predefine a variable as a list, would that give a speed boost similar to the above? Perhaps something like:

cdef list b = [1,2,3]


You might get some speedup, but you will still be dealing with python lists, which are ultimately python objects, so there will always be some python overhead when using them. The nice thing about memoryviews is that they can be used in situations where there are no python objects (e.g. when releasing the GIL). You can also tweak and optimize things to get C-like speed, see e.g.:

https://jakevdp.github.io/blog/2012/08/08/memoryview-benchmarks/
 
Thank you!


Chris Barker

Mar 21, 2018, 9:03:52 PM
to cython-users
one more note:

from cpython cimport array
import array
cdef array.array a = array.array('i', [1, 2, 3])
cdef int[:] ca = a


So, from my understanding, the array.array creates a c array.

no -- array.array creates a Python array.array object:

https://docs.python.org/3/library/array.html

Then what is the purpose of the "cdef int[:] ca = a" part of the code? Is this to allow for easier/safer access to the original array?

sort of -- it creates a "memory view", which, as its name implies, is a view (with some semantics) onto existing memory -- the "int[:]" means that the memory is a 1-d array of ints, so Cython knows how to write the C code to work with it.

array.array objects are one of the Python objects that support the buffer protocol, so Cython knows how to get the underlying C pointer to the memory managed by the array.
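The "view onto existing memory" behaviour can be seen with Python's built-in memoryview, used here only as an analogue of a Cython int[:] (a minimal sketch, not Cython code):

```python
import array

a = array.array('i', [1, 2, 3])
view = memoryview(a)      # shares the array's memory; nothing is copied

view[0] = 10              # write through the view...
first = a[0]              # ...and the original array sees the change

# The view also carries the element type and shape of the underlying data.
info = (view.format, view.ndim, view.shape)
```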
 
Why is it not possible to simply initialize my array with that line, instead of having to initialize it as an "array.array"?

because memoryviews do not allocate or manage memory -- you need to allocate it somehow -- array.array is one of the safest and easiest ways to do that.
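A small sketch of that division of labour, again using the built-in memoryview as a stand-in for a typed memoryview: the array object allocates and owns the memory, and the view is only a window for reading and writing it. The size n and the fill values are arbitrary choices for the example.

```python
import array

n = 5
buf = array.array('i', [0]) * n   # the array allocates and owns n zeroed C ints
view = memoryview(buf)            # the view only borrows that memory

for i in range(n):
    view[i] = i * i               # fill the buffer through the view

squares = buf.tolist()            # the owner holds the data that was written
```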

> It is possible, you could do:

  cdef int[:] ca = array.array('i', [1, 2, 3])

hmm, then there isn't a reference to the array object -- does its reference count get managed properly?
 
-CHB



Stefan Behnel

Mar 22, 2018, 1:21:49 AM
to cython...@googlegroups.com
Chris Barker schrieb am 22.03.2018 um 02:03:
>> Why is it not possible to simply initialize my array with that line,
>>> instead of having to initialize it as an "array.array"?
>>>
>>
> because memoryviews do not allocate or manage memory -- you need to
> allocate it somehow -- array.array is one of the safest and easiest ways to
> do that.
>
>> It is possible, you could do:
>
> cdef int[:] ca = array.array('i', [1, 2, 3])
>
> hmm, then there isn't a reference to the array object -- does its reference
> count get managed properly?

Yes, this is Cython. ;)

There *is* a live reference that the memory view holds to the object that
owns the buffer.
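The same ownership rule can be observed with Python's built-in memoryview, which is only an analogue here; how Cython's typed memoryviews track their owner internally is not shown. The view keeps the exporting object alive and reachable even when no other name refers to it:

```python
import array

# No other variable refers to the array; only the view does.
view = memoryview(array.array('i', [1, 2, 3]))

owner = view.obj            # the exporting array, kept alive by the view
values = view.tolist()      # the data is still readable

# While a view is exported, the owner refuses to resize its buffer.
try:
    owner.append(4)
    resized = True
except BufferError:
    resized = False

view.release()              # drop the view's hold on the buffer
owner.append(4)             # now resizing is allowed again
```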

Stefan

saad khalid

Mar 23, 2018, 1:44:33 AM
to cython-users
Thank you all for your responses! You are all very helpful.