malloc vs PyMem_Malloc


Aronne Merrelli

Dec 10, 2011, 11:31:41 AM
to cython-users
Hello,

I've found myself writing some simple C functions and using Cython to bridge between the C calculations and NumPy arrays. I'm very much a C novice, but I understand the need to control the memory allocations for arrays that I create at the C level. At the moment I'm just calling malloc/free within the C functions, and within Cython functions, exposing malloc/free with:

from libc.stdlib cimport malloc, free

My current use case is primarily extending python/NumPy by speeding up un-vectorizable calculations. So, any results from C-level calculations are written into NumPy arrays if I need to keep them. In other words, anything that I am malloc'ing at the C level will be freed when the calculation is done. Now, I've noticed there is a similar pair of memory allocation functions in pymem.h: PyMem_Malloc and PyMem_Free. What is the difference between those two and the "plain" malloc/free? Which should I be using in Cython functions or the C functions I call from those?

The comment in pymem.h indicates that you shouldn't mix the two types, but I don't understand precisely what that means. I could either take that to mean don't mix calls within one function (meaning malloc is OK), or within one application (which would then mean I should use PyMem_Malloc, since the parent Python session would presumably be doing so).

Thanks,
Aronne

Robert Bradshaw

Dec 10, 2011, 2:16:57 PM
to cython...@googlegroups.com

You can use either, as long as you free with the one you allocated
with. The PyMem functions allocate memory on the Python heap, and are
optimized for allocating many small objects of similar size over and
over, but IIRC defer to the standard system malloc (plus some
bookkeeping) if the size is big enough (assuming a standard compile
that hasn't #defined it to go elsewhere). Personally, I tend to use
malloc/free. On this note, a useful pattern is

x = malloc(...)
try:
    # ... use x ...
finally:
    free(x)

It could be nice to encapsulate this in a context manager.
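For illustration, here is roughly what such a context manager could look like at the Python level, using ctypes to reach the C allocator. The helper name `c_malloc` is made up for this sketch, not an existing Cython API, and the `CDLL(None)` lookup is POSIX-specific:

```python
import ctypes
from contextlib import contextmanager

# Load the C runtime (POSIX-specific; on Windows the lookup differs).
libc = ctypes.CDLL(None)
libc.malloc.argtypes = [ctypes.c_size_t]
libc.malloc.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]

@contextmanager
def c_malloc(nbytes):
    """Allocate nbytes with malloc() and guarantee free() on block exit."""
    ptr = libc.malloc(nbytes)
    if not ptr:
        raise MemoryError("malloc(%d) failed" % nbytes)
    try:
        yield ptr
    finally:
        libc.free(ptr)

with c_malloc(16) as p:
    buf = (ctypes.c_char * 16).from_address(p)
    buf[:4] = b"demo"
    data = buf[:4]   # copy out before the memory is freed
```

In Cython proper the same shape works with `from libc.stdlib cimport malloc, free` and a plain try/finally.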

- Robert

mark florisson

Dec 10, 2011, 3:39:41 PM
to cython...@googlegroups.com

I think I'd prefer variable-sized arrays that would always get
deallocated on exit of the function (which could be implemented as C99
variable sized arrays, with alloca or with malloc, depending on the
size of the array and the availability of the respective
functionalities).

> - Robert

mark florisson

Dec 10, 2011, 3:44:02 PM
to cython...@googlegroups.com

That wouldn't tackle every use case, such as for instance mallocing
stuff in a parallel section (until we get declarations in blocks!),
but special cases can still just malloc and use try blocks, as
demonstrated.

Stefan Behnel

Dec 10, 2011, 5:00:15 PM
to cython...@googlegroups.com
mark florisson, 10.12.2011 21:44:
> On 10 December 2011 20:39, mark florisson wrote:

>> On 10 December 2011 19:16, Robert Bradshaw wrote:
>>> On this note, a useful pattern is
>>>
>>> try:
>>>     x = malloc(...)
>>> finally:
>>>     free(x)
>>>
>>> It could be nice to encapsulate this in a context manager.

Absolutely.


>> I think I'd prefer variable-sized arrays that would always get
>> deallocated on exit of the function

Why? A context manager is much clearer and gives users total control over
the lifetime of the memory.


>> (which could be implemented as C99
>> variable sized arrays, with alloca or with malloc, depending on the
>> size of the array and the availability of the respective
>> functionalities).

That could still be done for a context manager, just like we do with
gil/nogil blocks today.


> That wouldn't tackle every use case, such as for instance mallocing
> stuff in a parallel section (until we get declarations in blocks!),
> but special cases can still just malloc and use try blocks, as
> demonstrated.

I would consider the usage of memory over the whole lifetime of a function
the special case, not the other way round.

Stefan

mark florisson

Dec 10, 2011, 7:02:09 PM
to cython...@googlegroups.com, Core developer mailing list of the Cython compiler
On 10 December 2011 22:00, Stefan Behnel <stef...@behnel.de> wrote:
> mark florisson, 10.12.2011 21:44:
>>
>> On 10 December 2011 20:39, mark florisson wrote:
>>
>>> On 10 December 2011 19:16, Robert Bradshaw wrote:
>>>>
>>>> On this note, a useful pattern is
>>>>
>>>> try:
>>>>    x = malloc(...)
>>>> finally:
>>>>    free(x)
>>>>
>>>> It could be nice to encapsulate this in a context manager.
>
>
> Absolutely.
>
>
>
>>> I think I'd prefer variable-sized arrays that would always get
>>> deallocated on exit of the function
>
>
> Why? A context manager is much clearer

That is highly subjective; I think it would be harder to read and
would introduce more code blocks and nesting.

> and gives users total control over
> the lifetime of the memory.
>

Yes, but very often you don't need it. And if Cython supported
declarations in blocks you'd get it for free. Supporting that
(disregarding the difficulties of doing so) would also be helpful in
identifying the scope and privatization rules in parallel blocks.

The thing is that a context manager would be very Cython-specific,
whereas most people are already familiar with arrays of variable size
from C or Java. Let's compare the following statements and decide which
is more aesthetically pleasing:

cdef int array1[m]
cdef double array2[n]

vs

cdef int *array1
cdef double *array2

with cython.malloc(sizeof(int) * m) as array1, cython.malloc(sizeof(double) * n) as array2:
    ...

>
>>> (which could be implemented as C99
>>> variable sized arrays, with alloca or with malloc, depending on the
>>> size of the array and the availability of the respective
>>> functionalities).
>
>
> That could still be done for a context manager, just like we do with
> gil/nogil blocks today.
>

Sure (it was more of an observation than an argument).

>
>> That wouldn't tackle every use case, such as for instance mallocing
>> stuff in a parallel section (until we get declarations in blocks!),
>> but special cases can still just malloc and use try blocks, as
>> demonstrated.
>
>
> I would consider the usage of memory over the whole lifetime of a function
> the special case, not the other way round.

Yes, but the point is not where to deallocate the memory, the point is
that you very often don't care. You need it somewhere in the function,
and deallocation on return is fine (or, "at the end of the block").
Analogously, you don't 'del' your variables once you have stopped
using them.

I also gave this functionality some thought for memoryviews, e.g.

cdef int[:m, :n] myslice  # this gets you a view on a cython.array of size m * n

> Stefan

Stefan Behnel

Dec 11, 2011, 11:51:59 AM
to cython...@googlegroups.com, Cython-devel
mark florisson, 11.12.2011 01:02:

> On 10 December 2011 22:00, Stefan Behnel wrote:
>> mark florisson, 10.12.2011 21:44:
>>> On 10 December 2011 20:39, mark florisson wrote:
>>>> On 10 December 2011 19:16, Robert Bradshaw wrote:
>>>>> On this note, a useful pattern is
>>>>>
>>>>> try:
>>>>>     x = malloc(...)
>>>>> finally:
>>>>>     free(x)
>>>>>
>>>>> It could be nice to encapsulate this in a context manager.
>>
>> Absolutely.
>>
>>>> I think I'd prefer variable-sized arrays that would always get
>>>> deallocated on exit of the function
>>
>> Why? A context manager is much clearer
>
> That is highly subjective, I think it would be harder to read and
> introduce more code blocks and nesting.
>
>> and gives users total control over
>> the lifetime of the memory.
>
> Yes, but very often you don't need it. And if Cython would support
> declarations in blocks you'd get it for free. Supporting that
> (disregarding the difficulties of that) would also be helpful in
> identifying the scope and privatization rules in parallel blocks.
>
> The thing is that a context manager would be very Cython-specific

Not at all. It's the One Way To Do It in Python.

Stefan

Nikolaus Rath

Dec 11, 2011, 6:16:04 PM
to cython...@googlegroups.com

FWIW, I think that the variable-sized array approach is much nicer than
using context managers.

Cython already supports C style fixed-array definitions and the C & and
* operators. All these would be written differently in Python, so I
think there's nothing wrong with having variable-length array
definitions in C style rather than with context managers.


Best,

-Nikolaus

--
»Time flies like an arrow, fruit flies like a Banana.«

PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C

Chris.Barker

Dec 12, 2011, 12:08:57 PM
to cython...@googlegroups.com
Hi,

Some good discussion of the issues brought up here, but a comment:

On 12/10/11 8:31 AM, Aronne Merrelli wrote:

> My current use case is primarily extending python/NumPy by speeding up
> un-vectorizable calculations. So, any results from C-level calculations
> are written into NumPy arrays if I need to keep them.

There may well be a real need for allocating your memory here, but I've
found that I can generally (i.e. every use case I've had so far) speed
up non-vectorizable numpy calculations with pure Cython with no need for
custom memory allocation -- usually just small temporaries on the stack.

See various examples in the wiki.

We'd need to know your use case, but you may be making things more
complicated than you need to.

Just a thought,

-Chris


--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris....@noaa.gov

Aronne Merrelli

Dec 12, 2011, 12:55:56 PM
to cython...@googlegroups.com
On Mon, Dec 12, 2011 at 11:08 AM, Chris.Barker <chris....@noaa.gov> wrote:
Hi,

Some good discussion of the issues brought up here, but a comment:


On 12/10/11 8:31 AM, Aronne Merrelli wrote:

My current use case is primarily extending python/NumPy by speeding up
un-vectorizable calculations. So, any results from C-level calculations
are written into NumPy arrays if I need to keep them.

There may well be a real need for allocating your memory here, but I've found that I can generally (i.e. every use case I've had so far) speed up non-vectorizable numpy calculations with pure Cython with no need for custom memory allocation -- usually just small temporaries on the stack.

See various examples in the wiki.



Chris,

I'm not sure I understand what you mean in terms of specific syntax (C novice!); I think what this means is doing:

cdef double[10] x

instead of

cdef double *x = <double *>malloc(10 * sizeof(double))
... stuff ...
free(x)

Is that correct? I did originally try that but hit a roadblock somewhere - I think I had trouble getting returns from functions if they were not declared as pointers. I should double check this, though. There is at least one case where the size of the array is an algorithm parameter, though, so in that case the malloc/free is necessary.

Thanks for the suggestion,
Aronne

Robert Bradshaw

Dec 12, 2011, 2:48:18 PM
to cython...@googlegroups.com

I would propose that most people using Cython are not familiar with C
and Java, but all are familiar with Python.

> Lets compare the following statements and decide which
> is more aesthetically pleasing:
>
>    cdef int array1[m]
>    cdef double array2[n]
>
> vs
>
>    cdef int *array1
>    cdef double *arrays2
>
>    with cython.malloc(sizeof(int) * m), cython.malloc(sizeof(double) * n) as array1, array2:

I agree. I'd say that

cdef int[m] array1
cdef double[n] array2

is an even clearer way to declare m ints and n doubles.
However, arrays are a bit painful to deal with: they
can't be returned (short of copying them into another type) or
assigned or resized (perhaps that could be supported; not sure of the
syntax). Just think how many buffer overflows are due to a fixed-size
C array holding user data... it's the classic way to smash the stack.
Perhaps this is OK for function-scoped objects.

>>>> (which could be implemented as C99
>>>> variable sized arrays, with alloca or with malloc, depending on the
>>>> size of the array and the availability of the respective
>>>> functionalities).
>>
>>
>> That could still be done for a context manager, just like we do with
>> gil/nogil blocks today.
>>
>
> Sure (it was more of an observation than an argument).
>
>>
>>> That wouldn't tackle every use case, such as for instance mallocing
>>> stuff in a parallel section (until we get declarations in blocks!),
>>> but special cases can still just malloc and use try blocks, as
>>> demonstrated.
>>
>>
>> I would consider the usage of memory over the whole lifetime of a function
>> the special case, not the other way round.
>
> Yes, but the point is not where to deallocate the memory, the point is
> that you very often don't care. You need it somewhere in the function,
> and deallocation on return is fine (or, "at the end of the block").
> Analogously, you don't 'del' your variables once you have stopped
> using them.
>
> I also gave this functionality some thought for memoryviews, e.g.
>
>    cdef int[:m, :n] myslice  # this gets you a view on a cython.array of size m * n

I think something like this would be great. The best solution would be
a chunk of memory that's optionally allocated on the stack (depending
on size and scope), but can be passed around and whose lifetime is
gc'd or tied to the lifetime of a Python object as needed.
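At the Python level, the "lifetime tied to a refcounted object" half of this already exists via the buffer protocol. A sketch of the idea, not Cython's eventual design:

```python
def make_buffer(nbytes):
    """Return a writable view on heap memory owned by a refcounted object."""
    owner = bytearray(nbytes)   # owns the allocation; freed when collected
    return memoryview(owner)    # can be passed around; keeps the owner alive

view = make_buffer(8)
view[0] = 0xFF                  # write through the view
# The bytearray cannot be reclaimed while any view on it is alive,
# which is the "lifetime tied to a Python object" behaviour described above.
```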

- Robert

Chris.Barker

Dec 12, 2011, 3:59:39 PM
to cython...@googlegroups.com
On 12/12/11 9:55 AM, Aronne Merrelli wrote:
> I'm understand what you mean in terms of specific syntax (C novice!); I
> think what this means is doing:
>
> cdef double[10] x
>
> instead of
>
> cdef double *x = <double *>malloc(10 * sizeof(double))
> ... stuff ...
> free(x)
>
> Is that correct?

yup.

> I did originally try that but hit a roadblock somewhere
> - I think I had trouble getting returns from functions if they were not
> declared as pointers.

Well, I just said this in another note in this thread:

On 12/12/11 11:48 AM, Robert Bradshaw wrote:

> cdef int[m] array1
> cdef double[n] array2
>
> is an even clearer way to declare m ints and n doubles.
> However, arrays are a bit painful to deal with: they
> can't be returned (short of copying them into another type)

...

which may be what you ran into.

But I also meant that you may be able to do everything in your extension
with numpy arrays, and not have to deal with raw C arrays or pointers at
all.

It all depends on your use-case, of course.

mark florisson

Dec 12, 2011, 4:00:13 PM
to cython...@googlegroups.com

Sure, but this discussion is for 'with cython.malloc():' vs arrays of
variable size. In either case the lifetime of the memory is bound to
the block or function. If that's not what you want, then you should
obviously use something else.

> Just think how many buffer overflows are due to a fixed-size
> C array holding user data... it's the classic way to smash the stack.
> Perhaps this is OK for function-scoped objects.

I don't understand; if variable sized arrays don't fix that problem
then the code is simply too broken. No form of memory allocation will
protect you from that.

The memory would always be on the heap there; it would simply create a
view of a cython.array. That would be somewhat more heavy-weight than
a regular (variable-sized) array though, and it would require the GIL.

Robert Bradshaw

Dec 13, 2011, 3:41:50 AM
to cython...@googlegroups.com
On Mon, Dec 12, 2011 at 1:00 PM, mark florisson

The advantage of malloc is that you can realloc, variable sized arrays
don't provide this ability. (I agree smashing the stack is a separate
issue, I was pointing out the problem with arrays in general.)

Perhaps we could infer when it's entirely local and no GIL/refcounting
is needed as an optimization. I still think there could be a case for
a Python list-like structure of primitives (in particular
automatically growing, which is a case that arrays/memory views don't
support).
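For comparison, CPython's stdlib array module already provides a growable, list-like container of unboxed primitives, which is roughly the semantics being described (usable from Python today, though not from nogil Cython code):

```python
from array import array

a = array('d')                       # contiguous C doubles, list-like API
a.append(1.5)                        # grows automatically, amortized like a list
a.extend([2.0, 3.0])
address, length = a.buffer_info()    # the raw C buffer is reachable if needed
```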

- Robert

Sturla Molden

Dec 13, 2011, 5:23:26 AM
to cython...@googlegroups.com

>On this note, a useful pattern is
>try:
>    x = malloc(...)
>finally:
>    free(x)
>
>It could be nice to encapsulate this in a context manager. - Robert

I often encapsulate malloc in an extension class, so I can
rely on Python to clean it up for me.

cimport stdlib

cdef class buffer:

    cdef void *buf
    cdef readonly Py_intptr_t addr

    def __cinit__(buffer self, int n):
        self.buf = stdlib.malloc(n)
        if self.buf == NULL:
            raise MemoryError("malloc(%d) failed" % (n,))
        self.addr = <Py_intptr_t> self.buf

    def __dealloc__(buffer self):
        if self.buf != NULL:
            stdlib.free(self.buf)

This also applies to other resource allocation such as fopen/fclose.

When using Cython or C++, it is important to know that an
exception can cause resource leaks if not handled carefully. One of
the most important causes of memory leaks in C++ is use of new[]
and delete[] outside class constructors and destructors. It seems many
programmers are unaware that an exception can cause parts of the
code to be skipped. The error is even common in C++ textbooks,
which might be the reason it is so common.

Your code with try/finally works correctly of course, though it would
not be possible in C++ as there is no finally. Using class constructors
and destructors is an idiom that works in Cython and C++ alike.

It should perhaps be noted that simpler options exist. In C++
we can use std::vector and in Cython we can use numpy.ndarray.
AFAIK, we cannot put C++ std::vector on the stack with Cython,
which limits their usefulness in ensuring proper clean-up (we must
call delete on them manually in Cython).

I can see that a context manager would be useful in Python code
using malloc/free with ctypes, as __del__ might not be called,
e.g. if there is a circular reference, unlike __enter__ and __exit__.
But what would the benefit be in Cython, when the C initialization
and clean-up methods __cinit__ and __dealloc__ are deterministic?
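The determinism contrast being drawn here can be seen in plain Python: __exit__ fires at the exact point the with-block ends, whereas finalizer timing is an implementation detail. A toy illustration:

```python
log = []

class Resource:
    """Toy resource that records when it is acquired and released."""
    def __enter__(self):
        log.append("acquire")
        return self
    def __exit__(self, exc_type, exc, tb):
        log.append("release")   # runs exactly when the with-block exits
        return False

with Resource():
    log.append("use")
# "release" is guaranteed to be logged here, before any later code runs.
```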


Sturla

Sturla Molden

Dec 13, 2011, 5:38:51 AM
to cython...@googlegroups.com
On 13.12.2011 09:41, Robert Bradshaw wrote:
> The advantage of malloc is that you can realloc, variable sized arrays
> don't provide this ability. (I agree smashing the stack is a separate
> issue, I was pointing out the problem with arrays in general.)

The only language I know of that handles stack smashing and automatic
arrays gracefully is Fortran 90 (and later). An automatic array might be
placed on the stack or the heap depending on its size. The compilers are
usually smart enough to emit code that knows (or guesses) when to use
malloc or alloca (or some equivalents).

Cython could do that too, if anyone cares to implement it ;-)

Sturla

mark florisson

Dec 13, 2011, 3:13:09 PM
to cython...@googlegroups.com

I was thinking that cython.malloc could be implemented however it
liked, so you wouldn't be able to call realloc on such a pointer. In
any case, I think realloc is quite a special case, one that doesn't
need any special language support, especially considering that you're
going to free the pointer when the block exits. You can always resize
your numpy array or realloc your malloced pointer manually, and if you
really claim try/finally is too hard you can use a cdef class with a
destructor like Sturla demonstrated.

Hm, if we implemented fused types for cdef classes a user could
reasonably easily write a cdef class with list-like behaviour for the
types needed. One could also resize a numpy array, and cython.array
could implement similar behaviour if needed. Often when I use lists I
already have objects, though; I don't often find myself needing a list
of primitive types in a situation where the conversion would be too
expensive.

> - Robert

Stefan Behnel

Dec 13, 2011, 3:31:32 PM
to cython...@googlegroups.com
mark florisson, 13.12.2011 21:13:

> On 13 December 2011 08:41, Robert Bradshaw wrote:
>> The advantage of malloc is that you can realloc, variable sized arrays
>> don't provide this ability. (I agree smashing the stack is a separate
>> issue, I was pointing out the problem with arrays in general.)
>
> I was thinking that cython.malloc could be implemented however it
> liked, so you wouldn't be able to call realloc on such a pointer.

I think users should be able to do that if they want.


> In
> any case, I think realloc is quite a special case, one that doesn't
> need any special language support, especially considering that you're
> going to free the pointer when the block exits. You can always resize
> your numpy array or realloc your malloced pointer manually

What's wrong with

with cython.malloc(x) as mem:
    # do stuff with mem[]
    mem.realloc(y)  # raise MemoryError on allocation failure
    # do more stuff with mem[]

?


> and if you
> really claim try/finally is too hard you can use a cdef class with a
> destructor like Sturla demonstrated.

Destructors have the disadvantage that they are not guaranteed to get
called immediately when the current reference to the object dies. The
"with" statement is made to guarantee exactly this.

Stefan

mark florisson

Dec 13, 2011, 3:55:15 PM
to cython...@googlegroups.com
On 13 December 2011 20:31, Stefan Behnel <stef...@behnel.de> wrote:
> mark florisson, 13.12.2011 21:13:
>
>> On 13 December 2011 08:41, Robert Bradshaw wrote:
>>>
>>> The advantage of malloc is that you can realloc, variable sized arrays
>>> don't provide this ability. (I agree smashing the stack is a separate
>>> issue, I was pointing out the problem with arrays in general.)
>>
>>
>> I was thinking that cython.malloc could be implemented however it
>> liked, so you wouldn't be able to call realloc on such a pointer.
>
>
> I think users should be able to do that if they want.
>
>

Users could trivially write such a class themselves though. Again, I
think reallocing memory is quite specific and doesn't deserve any
special language support. A language doesn't have to tackle every
problem in the most convenient way possible, and variable sized arrays
are simply much more intuitive and easier to use as the type is
already in there and you don't see any ugly arithmetic and pointers.

>> In
>> any case, I think realloc is quite a special case, one that doesn't
>> need any special language support, especially considering that you're
>> going to free the pointer when the block exits. You can always resize
>> your numpy array or realloc your malloced pointer manually
>
>
> What's wrong with
>
>  with cython.malloc(x) as mem:
>      # do stuff with mem[]
>      mem.realloc(y)  # raise MemoryError on allocation failure
>      # do more stuff with mem[]
>
> ?
>

How does cython.malloc() know the type? How can it know what to return
through indexing?

I think 'cdef int[:m, :n] myslice' would be better suited for that.
The memoryviewslice struct could set the memoryview object to NULL
and keep a pointer to the runtime type information. When it would then
be coerced to an object it could create a cython.array and it could go
through the buffer interface. It simply deallocates memory when all
references are lost, and it could work without the GIL.

>
>> and if you
>> really claim try/finally is too hard you can use a cdef class with a
>> destructor like Sturla demonstrated.
>
>
> Destructors have the disadvantage that they are not guaranteed to get called
> immediately when the current reference to the object dies. The "with"
> statement is made to guarantee exactly this.

Again I would argue that I am extremely skeptical of your supposed
need to deallocate the memory immediately. Besides, the class wouldn't
have any reference cycles, so I don't see why it wouldn't deallocate
the object and memory immediately.

> Stefan

Chris Barker

Dec 13, 2011, 4:13:44 PM
to cython...@googlegroups.com
On Tue, Dec 13, 2011 at 12:41 AM, Robert Bradshaw <robe...@math.washington.edu> wrote:
 I still think there could be a case for
a Python list-like structure of primitives (in particular
automatically growing, which is a case that arrays/memory views don't support).

absolutely -- there have been a number of discussions about that on this list, and other places.

A Cython-only version would be great, but even better would be a numpy or numpy-compatible version. I've written one in Python for numpy but the performance isn't great, and you can't access it conveniently from Cython -- so a Cython solution would be very nice.

-Chris
 