On Mon, 03 Jun 2013 20:44:58 +0600, Yury V. Zaytsev <yu...@shurup.com> wrote:
> Hi,
>
> This thread has to do with another problem with my Python bindings for a
> C++ project: the data transfer in the C++ -> Python direction, where on
> the C++ level things are stored in vectors of int and double.
> 1) Do I understand it correctly, that returning array.arrays is the best
> idea, because memoryview objects are only available in Python 2.7+ ?
memoryview is a newer version of an older builtin named "buffer", which
is available in all Python 2 versions (Python 3 removed it in favor of
memoryview).
Neither buffer nor memoryview owns the data it represents. You always
need an underlying object supporting the buffer protocol. Such objects
include numpy.ndarray, array.array and ctypes arrays.
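A quick Python 3 illustration of the ownership point (Python 3's array.array does implement the buffer protocol): the array owns the memory, and a memoryview over it is only a window, so writes through the view show up in the array. The variable names are just for illustration.

```python
from array import array

# The array.array owns the memory; the memoryview does not copy it.
data = array('i', [10, 20, 30])
view = memoryview(data)

view[1] = 99      # write through the view...
print(data[1])    # ...and the owning array sees the change: 99

# The view exposes the array's layout (format code, item size, total bytes).
print(view.format, view.itemsize, view.nbytes)
```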
array.array is a good (but limited) alternative if you don't want
to depend on numpy.
Bear in mind that numpy can wrap any external chunk of memory without
copying, while array.array always requires an initial copy (unless you
modify your algorithms to take a preallocated array as input).
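To make "take a preallocated array as input" concrete, here is a minimal Python sketch, assuming a hypothetical kernel `squares_into` that writes its results into a caller-supplied array instead of building and returning a new one. The caller allocates once and can reuse the same buffer on every call.

```python
from array import array

def squares_into(out):
    """Hypothetical kernel: fill a caller-supplied array in place (no allocation)."""
    for i in range(len(out)):
        out[i] = i * i
    return out

n = 5
# Preallocate n zeroed doubles once; repeating a 1-element array is a cheap way to do it.
buf = array('d', [0.0]) * n
squares_into(buf)
print(buf.tolist())   # [0.0, 1.0, 4.0, 9.0, 16.0]
```

In the Cython/C++ case the same idea means passing `buf.data.as_doubles` into the C++ routine, so the results land directly in the array that is later returned to Python.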
Also, in Python 2, array.array doesn't in fact support the new buffer
interface (it only implements the old-style one). But Cython (and
apparently numpy) hack around it, so it's not a problem in practice.
> 2) Can I on the level of Python hide the fact that I'm using array.array
> if users have NumPy installed? Will numpy.asarray() produce a view on
> the array without copying the data and big performance losses?
numpy.frombuffer will give you a view, numpy.asarray will copy.
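One way to hide array.array behind NumPy when it is installed is sketched below, using a hypothetical helper `as_array_view` that returns a `numpy.frombuffer` view if NumPy imports and the array.array itself otherwise. It assumes Python 3 (where array.array exposes the buffer protocol) and relies on NumPy's single-character codes matching array's numeric type codes; non-numeric codes such as 'u' are not handled.

```python
from array import array

try:
    import numpy as np
except ImportError:  # NumPy is optional
    np = None

def as_array_view(arr):
    """Hypothetical helper: a NumPy view of `arr` if NumPy is available,
    otherwise `arr` itself. In neither case is the data copied."""
    if np is None:
        return arr
    # NumPy dtype chars match array's numeric codes
    # ('b', 'B', 'h', 'H', 'i', 'I', 'l', 'L', 'q', 'Q', 'f', 'd').
    return np.frombuffer(arr, dtype=arr.typecode)

values = array('d', [1.5, 2.5, 3.5])
v = as_array_view(values)
values[0] = 42.0   # visible through v, because the memory is shared
print(v[0])
```

Note that while a NumPy view exists, resizing the underlying array.array (append, extend) raises BufferError, because its memory is exported.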
> 3) On the level of Cython, what's the best way to allocate and populate
> large array.arrays?
Typical (optimized) array operations:
from cpython cimport array
import array

# Declare global template arrays; you only need one of each per project.
# (see the array module docs for available type specifiers)
cdef array.array INT_ARRAY = array.array('i')
cdef array.array BYTE_ARRAY = array.array('B')
...

# Allocate an empty byte array
cdef array.array buf = array.copy(BYTE_ARRAY)

# Allocate a new int array with n elements (False: don't zero the memory):
cdef Py_ssize_t i
cdef array.array arr = array.clone(INT_ARRAY, n, False)
# ... and access its data
for i in range(n):
    arr.data.as_ints[i] = i

# Append some external data to the int array (the count is in items, not bytes)
cdef int[3] extra = [1, 2, 3]
cdef Py_ssize_t n_extra = 3
array.extend_buffer(arr, <char*>extra, n_extra)

# Iterate over array values (works with any pointer, really):
cdef int value, total = 0
for value in arr.data.as_ints[:len(arr)]:
    total += value

# Accept an iterable from a user and turn it into a C float array
def func(points not None):  # points is any iterable
    cdef array.array pointsArray = array.array('f', points)
    cdef float* pointsData = pointsArray.data.as_floats
    # now I can pass pointsData to any C function
    # (it stays valid while pointsArray is alive and not resized)
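For comparison, a pure-Python sketch of that last snippet, using ctypes (one of the buffer-supporting types mentioned above) instead of Cython. The helper name `points_as_c_floats` is hypothetical. The iterable is copied once into a float32 array.array, and `from_buffer` then exposes that same memory as a ctypes float array without a second copy.

```python
import ctypes
from array import array

def points_as_c_floats(points):
    """Hypothetical helper: copy an iterable into array('f') once, then
    expose that memory as a ctypes float array (no second copy)."""
    if points is None:
        raise TypeError("points must not be None")
    points_array = array('f', points)
    # c_view shares points_array's memory; it does not own a separate copy.
    c_view = (ctypes.c_float * len(points_array)).from_buffer(points_array)
    return points_array, c_view

owner, c_floats = points_as_c_floats([0.5, 1.5, 2.5])
# A float* pointer, i.e. what a C function would receive.
ptr = ctypes.cast(c_floats, ctypes.POINTER(ctypes.c_float))
print(ptr[2])   # 2.5
```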
Arrays in Cython also support typed buffer access, e.g.:
cdef array.array[int] arr = ...
arr[i] = 1  # optimized indexing
But using arr.data.as_ints[i] (and the other .data.as_xxx fields) is
faster still, because it avoids buffer setup and teardown.
> Now, what's the best (fastest) method to copy the data?
>
> arr = clone(...)
> for i in range(N): arr[i] = vec[i]
>
> or
>
> arr = array(...)
> for i in range(N): arr.append(vec[i])
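The fastest way is a single memcpy:
from libc.string cimport memcpy
from libcpp.vector cimport vector
cdef vector[int] vec = ...
cdef array.array arr = array.clone(INT_ARRAY, vec.size(), False)
if vec.size() > 0:  # &vec[0] is not valid on an empty vector
    memcpy(arr.data.as_ints, &vec[0], vec.size() * sizeof(int))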
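The same principle (one bulk copy instead of a per-element loop) is available at the Python level too. A minimal Python 3 sketch, where `src` merely stands in for data coming from C++: preallocate the destination and copy the whole buffer with a single memoryview slice assignment, which requires both sides to have the same format and length.

```python
from array import array

src = array('i', range(1000))   # stands in for the C++ vector's contents

# Preallocate a zeroed destination of the same length and type.
dst = array('i', bytes(len(src) * src.itemsize))

# One bulk buffer copy instead of arr[i] = vec[i] or arr.append(...) in a loop.
memoryview(dst)[:] = memoryview(src)

print(dst == src)   # True
```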
Best regards,
Nikita Nemkin