using context manager for C malloc/free

Max Bachmann

Apr 24, 2021, 12:04:00 PM
to cython-users
I have a lot of code which looks like this:

```
var1 = malloc(...)
try:
    var2 = malloc(...)
    try:
        do_something(var1, var2)
    finally:
        free(var2)
finally:
    free(var1)
```

It would be a lot simpler to write this as:
```
with Manager(...) as var1, Manager(...) as var2:
    do_something(var1, var2)
```

From my experiments it appears that I can write this manager in the following way:

```
cdef class Manager:
    cdef type name

    def __cinit__(self, ...):
        self.name = malloc(...)

    cdef type __enter__(self):
        return self.name

    def __exit__(self, type, value, tb):
        free(self.name)
```
It was required to use `def __exit__` (though it appears that Cython generates a C function for it) and `cdef type __enter__` so that the type of the return value is known.
However, I wanted to double-check whether this is actually the correct way to write this.
I use this context manager in multiple files. Because of the `def` functions it appears that I cannot place it in a pxd file and have to define it in each of the `pyx` files. Is this correct, or is there a way to define it once and import it from the other `pyx` files?
Thanks in advance

Stefan Behnel

Apr 24, 2021, 12:15:42 PM
to cython...@googlegroups.com
'Max Bachmann' via cython-users wrote on 24.04.21 at 17:52:
> I have a lot of code which looks like this:
>
> ```
> var1 = malloc(...)
> try:
>     var2 = malloc(...)
>     try:
>         do_something(var1, var2)
>     finally:
>         free(var2)
> finally:
>     free(var1)
> ```
>
> It would be a lot simpler to write this as:
> ```
> with Manager(...) as var1, Manager(...) as var2:
> do_something(var1, var2)
> ```
>
> From my experiments it appears like I can write this manager in the
> following way:
>
> ```
> cdef class Manager:
>     cdef type name
>
>     def __cinit__(self, ...):
>         name = malloc(...)
>
>     cdef type __enter__(self):
>         return self.name
>
>     def __exit__(self, type, value, tb):
>         free(self.name)
> ```

Looks good to me. (I'd probably use different names for the class and its
attribute, e.g. "Malloc" and "data".)


> It was required to use `def __exit__` (but it appears like cython generates
> a C function for it) and `cdef type __enter__` so the type of the return
> value is known.

There is a slight overhead in the "__exit__" method because it takes three
Python object arguments, but it's not big.

It cannot currently be a cdef method because something needs to keep the
context manager object alive until the method is called. That could be
resolved differently, but that's how it currently is.


> However I wanted to double check whether this is actually the correct way
> to write this.
> I use this context manager in multiple files. Because of the `def`
> functions it appears like I can not place it in a pxd file and have to
> define it in each of the `pyx` files. Is this correct, or is there a way to
> define this once and import it from the other `pyx` files?

You can implement it in one pyx module, add a .pxd file to it that declares
and exports the context manager class, and then cimport it from other
modules. Those will then import the implementing module when they are
loaded and use the same class.
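
As a sketch of that layout (the module and attribute names here are hypothetical, and `void *` stands in for whatever pointer type is actually used):

```
# mem.pxd -- declares the extension type so other modules can cimport it
cdef class Manager:
    cdef void *data
    cdef void *__enter__(self)

# mem.pyx -- the single implementation
from libc.stdlib cimport malloc, free

cdef class Manager:
    def __cinit__(self, size_t size):
        self.data = malloc(size)
        if self.data == NULL:
            raise MemoryError()

    cdef void *__enter__(self):
        return self.data

    def __exit__(self, exc_type, exc_value, tb):
        free(self.data)
        self.data = NULL

# elsewhere.pyx -- any other module reuses the same class
from mem cimport Manager
```

Only the `cdef` attributes and `cdef` methods go into the .pxd; the `def __exit__` stays in the .pyx implementation.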

Stefan

Max Bachmann

Apr 24, 2021, 12:16:09 PM
to cython-users
I just rechecked the code and realised that the ContextManager I posted still generates a lot of Python calls (the context manager more than doubles the runtime of my functions).
So right now my only working solution is to manually write a ton of nested `try...finally` statements.

Max Bachmann

Apr 24, 2021, 1:56:31 PM
to cython-users
> Looks good to me. (I'd probably use adifferent names for the class and its
> attribute, e.g. "Malloc" and "data".)
The naming is purely for this example. My real implementation does some more things and has a better fitting name.

> There is a slight overhead in the "__exit__" method because it takes three
> Python object arguments, but it's not big.

Since my functions are pretty fast, this overhead is really substantial in my use case.
As a comparison, I benchmarked the calls per second with typical input:
try..finally:    ~6,000,000
context manager: ~2,600,000
So the context manager costs me about ~300 ns per call (~150 ns per context manager).

This impact is a lot worse in some other functions, which compare a single element against a list of elements (saving the overhead of multiple Python function calls) and can process up to 70 million elements per second. Due to the per-element overhead of the context manager, the runtime increases from ~14 ns per element to ~164 ns, a slowdown of more than 10x (for a similar reason I do not use `with nogil` in any functions that do not use multiprocessing themselves).

I guess I will write a jinja2 template to auto-generate the try..finally blocks in my pyx files, since I need this in a lot of places.
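
Such a generator need not even be jinja2; a plain-Python sketch (hypothetical, not from the thread) could emit the nested blocks directly:

```python
def nested_try_finally(names, body, indent="    "):
    """Emit nested malloc/try/finally boilerplate for the given
    variable names, with `body` in the innermost block."""
    lines = []
    # opening half: one malloc + try per variable, one level deeper each time
    for depth, name in enumerate(names):
        pad = indent * depth
        lines.append(f"{pad}{name} = malloc(...)")
        lines.append(f"{pad}try:")
    lines.append(f"{indent * len(names)}{body}")
    # closing half: finally/free in reverse order, unwinding the nesting
    for depth in reversed(range(len(names))):
        pad = indent * depth
        lines.append(f"{pad}finally:")
        lines.append(f"{pad}{indent}free({names[depth]})")
    return "\n".join(lines)

print(nested_try_finally(["var1", "var2"], "do_something(var1, var2)"))
```

For `["var1", "var2"]` this reproduces exactly the nested pattern from the first message.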

> You can implement it in one pyx module, add a .pxd file to it that declares
> and exports the context manager class, and then cimport it from other
> modules. Those will then import the implementing module when they are
> loaded and use the same class.
That's what I was searching for.

Max

Stefan Behnel

Apr 24, 2021, 2:09:52 PM
to cython...@googlegroups.com
'Max Bachmann' via cython-users wrote on 24.04.21 at 18:59:
>> Looks good to me. (I'd probably use adifferent names for the class and its
>> attribute, e.g. "Malloc" and "data".)
> The naming is purely for this example. My real implementation does some
> more things and has a better fitting name.
>
>> There is a slight overhead in the "__exit__" method because it takes three
>> Python object arguments, but it's not big.
>
> Since my functions are pretty fast this overhead is really substantial in
> my use case:
> As a comparision I benchmarked the calls per second with usual input:
> try..finally: ~6.000.000
> context manager: ~2.600.000
> So the context manager costs me about ~300ns per element (~150 per context
> manager)
>
> This impact is a lot worse in some other functions, which compares a single
> element to a list of elements (save the overhead of multiple Python
> function calls),
> which can process up to 70 million elements per second. So due to the
> overhead of the context manager for each element of the list, the runtime
> increases from
> ~14ns per element to ~164ns -> runtime increase of more than 10x (for a
> similar reason I do not use `with nogil` in any functions that do not use
> multiprocessing themselves).

In that case, I would actually consider doing away with the malloc() calls
altogether and using pre-allocated memory. Even if the sizes differ,
maybe you don't need that many different memory block sizes and can get
away with pre-allocating a bunch of different sizes (or doing it as needed),
and then collecting them in a list or dict. Or just overallocate and reuse
a single large chunk of memory. You can also use realloc() to grow that
chunk if you notice along the way that you need a larger one.
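
A minimal sketch of that last idea (names are hypothetical; `reserve()` grows the chunk only when the request exceeds the current capacity):

```
from libc.stdlib cimport realloc, free

cdef class ScratchBuffer:
    cdef void *data
    cdef size_t capacity

    def __cinit__(self):
        self.data = NULL
        self.capacity = 0

    cdef void *reserve(self, size_t size) except NULL:
        # realloc(NULL, size) behaves like malloc(size); the chunk is
        # reused across calls instead of a malloc/free pair per element
        cdef void *p
        if size > self.capacity:
            p = realloc(self.data, size)
            if p == NULL:
                raise MemoryError()
            self.data = p
            self.capacity = size
        return self.data

    def __dealloc__(self):
        free(self.data)
```

One such buffer per function (or per thread) removes the per-element allocation entirely; realloc() is only hit when a larger input shows up.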

Stefan

Max Bachmann

Apr 24, 2021, 4:15:08 PM
to cython-users
> In that case, I would actually consider doing away with the malloc() calls
> all together and using pre-allocated memory. Even if the sizes differ,
> maybe you don't need that many different memory block sizes and can get
> away with pre-allocating a bunch of different sizes (or doing it at need),
> and then collecting them in a list or dict. Or just overallocate and reuse
> a single large chunk of memory. You can also use realloc() to grow that
> chunk if you notice along the way that you need a larger one.


That's a good idea. I already avoid allocating memory for some object types:
- for strings I directly use the internal buffer (uint8_t*, uint16_t*, uint32_t*)
- for arrays I use the internal buffer on CPython (on PyPy I copy all elements)
- in the future it should use the internal buffer for numpy arrays as well

At least for the non-multiprocessed functions I could definitely reuse the buffer (the 14 ns is only for functions that do not need to allocate any memory, such as the string functions).
The size is not really known ahead of time, but I guess using realloc when needed is good enough. In general, anyone who wants good performance should use the functions with strings (or, on CPython, arrays) anyway; otherwise the function has to iterate over the Python object and copy all elements, which is slow.
When multiprocessing I have to allocate the memory for all strings ahead of time anyway, since I have to release the GIL.


Max