More explicitly, I have a temporary home-made C structure that holds
a pointer to an array. I prepare (using Cython) a numpy.ndarray using
the PyArray_NewFromDescr function. I can delete my temporary C structure
without freeing the memory holding the array, but I would like the
numpy.ndarray to become the owner of the data.
How can I do such a thing?
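For concreteness, a minimal sketch of the situation (the struct and
field names are made up, and PyArray_SimpleNewFromData stands in for
PyArray_NewFromDescr for brevity):

cimport numpy as cnp
cnp.import_array()

# hypothetical stand-in for my temporary home-made structure
cdef struct temp_buffer:
    double* data
    int n

cdef cnp.ndarray wrap(temp_buffer* buf):
    cdef cnp.npy_intp dims = <cnp.npy_intp>buf.n
    # the returned array points at buf.data but does NOT own it;
    # nothing frees that memory when the array is deallocated
    return cnp.PyArray_SimpleNewFromData(1, &dims, cnp.NPY_DOUBLE,
                                         <void*>buf.data)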
--
Fabrice Silva
You can't, really. numpy-owned arrays will be deallocated with numpy's
deallocator. This may not be the appropriate deallocator for memory
that your library allocated.
If at all possible, I recommend using numpy to create the ndarray and
passing its data pointer to your library. Sometimes the library's API
gets in the way of this. Otherwise, copy the data.
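For instance, a hedged sketch of that pattern (fill_values is a
hypothetical stand-in for your library's API):

cimport numpy as cnp
import numpy as np
cnp.import_array()

cdef extern from "mylib.h":
    void fill_values(double* out, int n)

def compute(int n):
    # numpy allocates and owns the buffer; the library just writes
    # into it, so there is nothing special to free afterwards
    cdef cnp.ndarray[cnp.double_t, ndim=1] arr = np.empty(n, dtype=np.float64)
    fill_values(&arr[0], n)  # assumes n > 0
    return arr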
Devs, looking into this, I noticed that we use PyDataMem_NEW() and
PyDataMem_FREE() (which is #defined to malloc() and free()) for
handling the data pointer. Why aren't we using the appropriate
PyMem_*() functions (or the PyArray_*() memory functions which default
to using the PyMem_*() implementations)? Using the PyMem_*() functions
lets the Python memory manager have an accurate idea how much memory
is being used, which can be important for the large amounts of memory
that numpy arrays can consume.
I assume this is intentional design. I just want to know the rationale
for it and would like it documented. I can certainly understand if it
causes bad interactions with the garbage collector, say (though hiding
information from the GC seems like a suboptimal approach).
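For illustration, a minimal sketch of the difference in question (not
numpy's actual internals):

from libc.stdlib cimport malloc, free              # what PyDataMem_NEW expands to
from cpython.mem cimport PyMem_Malloc, PyMem_Free  # tracked by Python's memory manager

def compare_allocators():
    cdef void* a = malloc(1024)        # invisible to the Python memory manager
    free(a)
    cdef void* b = PyMem_Malloc(1024)  # accounted for by Python's allocator
    PyMem_Free(b)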
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco
> How can one arbitrarily assume that an ndarray owns its data?
>
> More explicitly, I have a temporary home-made C structure that holds
> a pointer to an array. I prepare (using Cython) a numpy.ndarray using
> the PyArray_NewFromDescr function. I can delete my temporary C structure
> without freeing the memory holding the array, but I would like the
> numpy.ndarray to become the owner of the data.
>
> How can I do such a thing?
There is an excellent blog entry from Travis Oliphant that describes how to create an ndarray from existing data without a copy: http://blog.enthought.com/?p=62
The created array does not actually own the data, but its base attribute points to an object which frees the memory when the numpy array gets deallocated. I guess this is the behavior you want to achieve.
Here is a Cython implementation (for a uint8 array):
Gregor
"""
see 'NumPy arrays with pre-allocated memory', http://blog.enthought.com/?p=62
"""
import numpy as np
from numpy cimport import_array, ndarray, npy_intp, set_array_base, PyArray_SimpleNewFromData, NPY_DOUBLE, NPY_INT, NPY_UINT8
cdef extern from "stdlib.h":
void* malloc(int size)
void free(void *ptr)
cdef class MemoryReleaser:
cdef void* memory
def __cinit__(self):
self.memory = NULL
def __dealloc__(self):
if self.memory:
#release memory
free(self.memory)
print "memory released", hex(<long>self.memory)
cdef MemoryReleaser MemoryReleaserFactory(void* ptr):
cdef MemoryReleaser mr = MemoryReleaser.__new__(MemoryReleaser)
mr.memory = ptr
return mr
cdef ndarray frompointer(void* ptr, int nbytes):
import_array()
#cdef int dims[1]
#dims[0] = nbytes
cdef npy_intp dims = <npy_intp>nbytes
cdef ndarray arr = PyArray_SimpleNewFromData(1, &dims, NPY_UINT8, ptr)
#TODO: check for error
set_array_base(arr, MemoryReleaserFactory(ptr))
return arr
def test_new_array_from_pointer():
nbytes = 16
cdef void* mem = malloc(nbytes)
print "memory allocated", hex(<long>mem)
return frompointer(mem, nbytes)
> There is an excellent blog entry from Travis Oliphant that describes
> how to create an ndarray from existing data without a copy:
> http://blog.enthought.com/?p=62
> The created array does not actually own the data, but its base
> attribute points to an object which frees the memory when the numpy
> array gets deallocated. I guess this is the behavior you want to
> achieve.
> Here is a Cython implementation (for a uint8 array):
Even better: the addendum!
http://blog.enthought.com/python/numpy/simplified-creation-of-numpy-arrays-from-pre-allocated-memory/
Within Cython:
cimport numpy
numpy.set_array_base(my_ndarray, PyCObject_FromVoidPtr(pointer_to_Cobj, some_destructor))
Seems OK.
Any objections to that?
--
Fabrice Silva
> On Thursday 15 December 2011 at 18:09 +0100, Gregor Thalhammer wrote:
>
>> There is an excellent blog entry from Travis Oliphant that describes
>> how to create an ndarray from existing data without a copy:
>> http://blog.enthought.com/?p=62
>> The created array does not actually own the data, but its base
>> attribute points to an object which frees the memory when the numpy
>> array gets deallocated. I guess this is the behavior you want to
>> achieve.
>> Here is a Cython implementation (for a uint8 array):
>
> Even better: the addendum!
> http://blog.enthought.com/python/numpy/simplified-creation-of-numpy-arrays-from-pre-allocated-memory/
>
> Within Cython:
> cimport numpy
> numpy.set_array_base(my_ndarray, PyCObject_FromVoidPtr(pointer_to_Cobj, some_destructor))
>
> Seems OK.
> Any objections to that?
This is OK, but PyCObject is deprecated as of Python 3.1, so it's not portable to Python 3.2.
Gregor
My guess is then that the PyCapsule object is the way to go...
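A hedged sketch (untested) of the PyCapsule variant, following the same
pattern as Gregor's MemoryReleaser:

from cpython.pycapsule cimport PyCapsule_New, PyCapsule_GetPointer
from libc.stdlib cimport free
cimport numpy as cnp
cnp.import_array()

cdef void free_capsule(object capsule):
    # destructor: runs when the capsule (the array's base) is collected
    free(PyCapsule_GetPointer(capsule, NULL))

cdef cnp.ndarray frompointer(void* ptr, cnp.npy_intp nbytes):
    cdef cnp.ndarray arr = cnp.PyArray_SimpleNewFromData(
        1, &nbytes, cnp.NPY_UINT8, ptr)
    # the capsule becomes the base object and frees ptr on collection
    cnp.set_array_base(arr, PyCapsule_New(ptr, NULL, free_capsule))
    return arr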
--
Fabrice Silva
Another way: With recent NumPy you should be able to do something like
this in Cython
cdef class SomeBufferWrapper:
    ...
    def __getbuffer__(self, ...): ...
    def __releasebuffer__(self, ...): ...

arr = np.asarray(SomeBufferWrapper(buf))

and then __releasebuffer__ will be called when `arr` goes out of use.
See Cython docs.
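For concreteness, a fuller hedged sketch of this approach (simplified
to a 1-D byte buffer; the wrapper fields and factory are my own
invention, and the memory is freed in __dealloc__ rather than in
__releasebuffer__ so that multiple simultaneous views stay safe):

from libc.stdlib cimport free
from cpython.buffer cimport PyBuffer_FillInfo

cdef class SomeBufferWrapper:
    cdef void* buf
    cdef Py_ssize_t n

    def __getbuffer__(self, Py_buffer* view, int flags):
        # expose the raw memory as a 1-D byte buffer
        PyBuffer_FillInfo(view, self, self.buf, self.n, 0, flags)

    def __releasebuffer__(self, Py_buffer* view):
        pass  # nothing per-view to release

    def __dealloc__(self):
        # numpy keeps the wrapper alive via arr.base, so this runs
        # only after the last array view is gone
        if self.buf != NULL:
            free(self.buf)

cdef SomeBufferWrapper wrap(void* buf, Py_ssize_t n):
    cdef SomeBufferWrapper w = SomeBufferWrapper.__new__(SomeBufferWrapper)
    w.buf = buf
    w.n = n
    return w

From Cython, np.asarray(wrap(ptr, n)) then gives a uint8 array backed
by ptr, freed once the last reference disappears.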
Dag
> > How can I do such a thing?
> You can't, really. numpy-owned arrays will be deallocated with numpy's
> deallocator. This may not be the appropriate deallocator for memory
> that your library allocated.
Coming late to the battle, but I recently followed the same route and
came to similar conclusions: the OWNDATA flag is not suited to this, and
you will need your own deallocator.
I implemented demo code showing all the steps of this strategy for
binding an existing C library with Cython:
https://gist.github.com/1249305
In particular, the deallocator is in
https://gist.github.com/1249305#file_cython_wrapper.pyx
I hope that this code sample is useful.
Gael