[Python-Dev] ctypes, memory mapped files and context manager

Hans-Peter Jansen

Jan 4, 2017, 7:43:03 PM
to pytho...@python.org
Hi,

first of all, sorry for being such a pest, but all earlier attempts to solve
this issue in other forums have ground to a halt.

In short: I am trying to combine a context manager with ctypes structures on
memory-mapped files in order to manage huge binary files. (This approach
performs well, is easy to use, and keeps resource usage low.)

FWIW, the code targets Linux environments running Python 3.

The smallest script demonstrating the issue (thanks to Peter Otten):

import ctypes
import mmap

from contextlib import contextmanager

class T(ctypes.Structure):
    _fields_ = [("foo", ctypes.c_uint32)]


@contextmanager
def map_struct(m, n):
    m.resize(n * mmap.PAGESIZE)
    yield T.from_buffer(m)

SIZE = mmap.PAGESIZE * 2
f = open("tmp.dat", "w+b")
f.write(b"\0" * SIZE)
f.seek(0)
m = mmap.mmap(f.fileno(), mmap.PAGESIZE)

with map_struct(m, 1) as a:
    a.foo = 1
with map_struct(m, 2) as b:
    b.foo = 2


resulting in:
$ python3 mmap_test.py
Traceback (most recent call last):
  File "mmap_test.py", line 23, in <module>
    with map_struct(m, 2) as b:
  File "/usr/lib64/python3.4/contextlib.py", line 59, in __enter__
    return next(self.gen)
  File "mmap_test.py", line 12, in map_struct
    m.resize(n * mmap.PAGESIZE)
BufferError: mmap can't resize with extant buffers exported.


Python 2 does not fail here, but that's a different story. What happens is:
the variable "a" bound by the first with statement keeps a reference to the
memory-mapped area alive after the block has exited, which leaves the mmap
unresizable and not properly closable.
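
To illustrate the point: the name bound by "as" is still alive after the with
block, and so is the buffer export it holds. This is only a minimal sketch,
assuming a bytearray as a stand-in for the mmap (both refuse to resize while
buffers are exported); Rec and view_of are names made up for this
illustration:

```python
import ctypes
from contextlib import contextmanager


class Rec(ctypes.Structure):  # illustrative stand-in for T
    _fields_ = [("foo", ctypes.c_uint32)]


@contextmanager
def view_of(buf):
    yield Rec.from_buffer(buf)


buf = bytearray(ctypes.sizeof(Rec))  # stand-in for the mmap (assumption)

with view_of(buf) as a:
    a.foo = 1

# The with block is over, but "a" is still bound and still exports buf.
try:
    buf.extend(b"\0")
    resize_blocked = False
except BufferError:
    resize_blocked = True

del a              # drop the last reference to the ctypes object
buf.extend(b"\0")  # the resize now succeeds (CPython frees the object at once)
```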

Right now, this rather ugly and error-prone workaround is needed, which
defeats the purpose of the context manager:

with map_struct(m, 1) as a:
    a.foo = 1
del a
with map_struct(m, 2) as b:
    b.foo = 2
del b

In order to get this working properly, the ctypes mapping needs a method to
free the mapping actively. E.g.:

@contextmanager
def map_struct(m, n):
    m.resize(n * mmap.PAGESIZE)
    yield T.from_buffer(m)
    T.unmap_buffer(m)

Other attempts with weakref and the like do not work due to the nature of the
ctypes types.

My own investigations in the _ctypes module were unsuccessful so far.

Hopefully somebody in the audience cares enough about this module to get
this fixed (or perhaps I'm missing something obvious).

Any ideas are very much appreciated.

Pete

Nick Coghlan

Jan 4, 2017, 9:39:05 PM
to Hans-Peter Jansen, pytho...@python.org
On 5 January 2017 at 10:28, Hans-Peter Jansen <h...@urpla.net> wrote:
> In order to get this working properly, the ctypes mapping needs a method to
> free the mapping actively. E.g.:
>
> @contextmanager
> def map_struct(m, n):
>     m.resize(n * mmap.PAGESIZE)
>     yield T.from_buffer(m)
>     T.unmap_buffer(m)
>
> Other attempts with weakref and the like do not work due to the nature of the
> ctypes types.

I don't know ctypes well enough myself to comment on the idea of
offering fully deterministic cleanup, but the closest you could get to
that without requiring a change to ctypes is to have the context
manager introduce a layer of indirection:

class _T_data(ctypes.Structure):
    _fields_ = [("foo", ctypes.c_uint32)]

class T:
    def __init__(self, buffer):
        self.data = _T_data.from_buffer(buffer)
    def close(self):
        self.data = None

@contextmanager
def map_struct(m, n):
    m.resize(n * mmap.PAGESIZE)
    mapped = T(m)
    try:
        yield mapped
    finally:
        mapped.close()

Client code would then need to consistently access the struct through
the data attribute:

with map_struct(m, 1) as a:
    a.data.foo = 1
with map_struct(m, 2) as b:
    b.data.foo = 2

Something like http://wrapt.readthedocs.io/en/latest/wrappers.html#object-proxy
would let you make the indirection to a contained object transparent,
but as far as I can tell, wrapt doesn't currently support "closing" a
proxy by replacing the reference to the internal object with a
reference to None (adding that might be possible, but I don't
personally know wrapt well enough to guess the feasibility of doing
so).

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Hans-Peter Jansen

Jan 5, 2017, 9:11:34 AM
to Nick Coghlan, pytho...@python.org
Hi Nick,
Thank you very much. Nice idea, indeed.

Here's a slightly more complex example incorporating your idea:
https://gist.github.com/frispete/97c27e24a0aae1bcaf1375e2e463d239#file-ctypes_mmap_ctx2-py

I'm not sure whether I like the resulting code any better than the dreaded
dels. Real code based on this approach tends to be much more complex, and
suffers accordingly.

Anyway, your solution is working fine, and provides a choice.
Much appreciated, Nick.

@ctypes developers: with an unmap operation available, we wouldn't need
to jump through these hoops, and the power of ctypes would become even more
accessible for such cool use cases ;)...

For now, I'm about to give up on using a context manager altogether, since
it uglifies the code one way or another without buying much.

Cheers,
Pete

eryk sun

Jan 5, 2017, 10:32:58 AM
to pytho...@python.org
On Thu, Jan 5, 2017 at 2:37 AM, Nick Coghlan <ncog...@gmail.com> wrote:
> On 5 January 2017 at 10:28, Hans-Peter Jansen <h...@urpla.net> wrote:
>> In order to get this working properly, the ctypes mapping needs a method to
>> free the mapping actively. E.g.:
>>
>> @contextmanager
>> def map_struct(m, n):
>>     m.resize(n * mmap.PAGESIZE)
>>     yield T.from_buffer(m)
>>     T.unmap_buffer(m)
>>
>> Other attempts with weakref and the like do not work due to the nature of the
>> ctypes types.
>
> I don't know ctypes well enough myself to comment on the idea of
> offering fully deterministic cleanup, but the closest you could get to
> that without requiring a change to ctypes is to have the context
> manager introduce a layer of indirection:

I think that's the best you can do with the current state of ctypes.

from_buffer was made safer in Python 3 by ensuring it keeps a
memoryview reference in the _objects attribute (i.e.
CDataObject.b_objects). Hans-Peter's problem is a consequence of this
reference. Simply calling release() on the underlying memoryview is
unsafe. For example:

>>> b = bytearray(2**20)
>>> a = ctypes.c_char.from_buffer(b)
>>> a._objects
<memory at 0x7f04283b8dc8>
>>> a._objects.release()
>>> del b
>>> a.value
Segmentation fault (core dumped)

A release() method on ctypes objects could release the memoryview and
also clear the CDataObject b_ptr field. In this case, any function
that accesses b_ptr would have to be modified to raise a ValueError
for a NULL value. Currently ctypes assumes b_ptr is valid, so this
would require adding a lot of checks.

On a related note, ctypes objects aren't tracking the number of
exported views like they should. resize() should raise a BufferError
in the following example:

>>> b = (ctypes.c_char * (2**20))(255)
>>> m = memoryview(b).cast('B')
>>> m[0]
255
>>> ctypes.resize(b, 2**22)
>>> m[0]
Segmentation fault (core dumped)

Hans-Peter Jansen

Jan 5, 2017, 6:30:21 PM
to pytho...@python.org
Hi Eryk,

On Thursday, 5 January 2017 15:30:33 eryk sun wrote:
> > manager introduce a layer of indirection:
>
> I think that's the best you can do with the current state of ctypes.
>
> from_buffer was made safer in Python 3 by ensuring it keeps a
> memoryview reference in the _objects attribute (i.e.
> CDataObject.b_objects). Hans-Peter's problem is a consequence of this
> reference. Simply calling release() on the underlying memoryview is
> unsafe. For example:
>
> >>> b = bytearray(2**20)
> >>> a = ctypes.c_char.from_buffer(b)
> >>> a._objects
> <memory at 0x7f04283b8dc8>
> >>> a._objects.release()
> >>> del b
> >>> a.value
> Segmentation fault (core dumped)

This is exactly what I was after:

@contextmanager
def cstructmap(cstruct, mm, offset = 0):
    # resize the mmap (and backing file), if structure exceeds mmap size
    # mmap size must be aligned to mmap.PAGESIZE
    cssize = ctypes.sizeof(cstruct)
    if offset + cssize > mm.size():
        newsize = align(offset + cssize, mmap.PAGESIZE)
        mm.resize(newsize)
    cmap = cstruct.from_buffer(mm, offset)
    try:
        yield cmap
    finally:
        for mv in cmap._objects.values():
            mv.release()

See also:
https://gist.github.com/frispete/97c27e24a0aae1bcaf1375e2e463d239#file-ctypes_mmap_ctx3-py
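
The align() helper used above is not shown in this message. A minimal sketch,
assuming it simply rounds a size up to the next multiple of the given boundary
(the version in the gist may differ):

```python
import mmap


def align(size, boundary):
    """Round size up to the next multiple of boundary (assumed behaviour)."""
    return (size + boundary - 1) // boundary * boundary


# One byte past the start of a page still needs the whole page.
new_size = align(1, mmap.PAGESIZE)
```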

While technically possible (which is a surprise to me in itself), nothing
should access the with variable after the block has finished. If that happens,
a segfault is exactly what it deserves IMHO.

Leaves the question, how stable this "interface" is?
Accessing _objects here belongs to voodoo programming practices of course, but
the magic is locally limited to just two lines of code, which is acceptable in
order to get this context manager working without messing with the rest of the
code.

Opinions?

Thanks,
Pete

Hans-Peter Jansen

Jan 5, 2017, 7:06:53 PM
to pytho...@python.org
On Friday, 6 January 2017 00:28:37 Hans-Peter Jansen wrote:
> Hi Eryk,
>
> This is exactly, what I was after:
>
> @contextmanager
> def cstructmap(cstruct, mm, offset = 0):
>     # resize the mmap (and backing file), if structure exceeds mmap size
>     # mmap size must be aligned to mmap.PAGESIZE
>     cssize = ctypes.sizeof(cstruct)
>     if offset + cssize > mm.size():
>         newsize = align(offset + cssize, mmap.PAGESIZE)
>         mm.resize(newsize)
>     cmap = cstruct.from_buffer(mm, offset)
>     try:
>         yield cmap
>     finally:
>         for mv in cmap._objects.values():
              if isinstance(mv, memoryview):
                  mv.release()

It turns out that _objects contains other objects as well...

Cheers,

eryk sun

Jan 5, 2017, 9:39:11 PM
to pytho...@python.org
On Thu, Jan 5, 2017 at 11:28 PM, Hans-Peter Jansen <h...@urpla.net> wrote:
> Leaves the question, how stable this "interface" is?
> Accessing _objects here belongs to voodoo programming practices of course, but
> the magic is locally limited to just two lines of code, which is acceptable in
> order to get this context manager working without messing with the rest of the
> code.

My intent was not to suggest that anyone directly use the _objects
value / dict in production code. It's a private implementation
detail. I was demonstrating the problem of simply releasing the buffer
and the large number of checks that would be required if b_ptr is
cleared. It would be simpler for a release() method to allocate new
memory for the object and set the b_needsfree flag, but this may hide
bugs. Operating on a released object should raise an exception.

Armin Rigo

Jan 8, 2017, 3:27:42 AM
to Hans-Peter Jansen, Python Dev
Hi Hans-Peter,

On 6 January 2017 at 00:28, Hans-Peter Jansen <h...@urpla.net> wrote:
> Leaves the question, how stable this "interface" is?

Another way to jump through hoops:

c_raw = ctypes.PYFUNCTYPE(ctypes.c_void_p, ctypes.c_void_p)(lambda p: p)

addr = c_raw(ctypes.pointer(T.from_buffer(m)))
b = ctypes.cast(addr, ctypes.POINTER(T)).contents

These lines give an object 'b' that is equivalent to
'T.from_buffer(m)', but doesn't hold any reference or any "opened
buffer" state to the original 'm'. Your context manager can yield
that. It should prevent all BufferErrors, at the price of segfaulting
if used incorrectly. This means in your case that ``with
map_struct(..) as a:`` should not continue to use ``a`` after the
``with`` statement, which is pretty natural anyway.

(The same issue occurs with cffi instead of ctypes, but in this case a
simple cast is enough to detach the memoryview, instead of the hack
above.)


A bientôt,

Armin.

eryk sun

Jan 8, 2017, 10:23:38 AM
to Python Dev
On Sun, Jan 8, 2017 at 8:25 AM, Armin Rigo <armin...@gmail.com> wrote:
>
> c_raw = ctypes.PYFUNCTYPE(ctypes.c_void_p, ctypes.c_void_p)(lambda p: p)

Use ctypes.addressof.

> addr = c_raw(ctypes.pointer(T.from_buffer(m)))
> b = ctypes.cast(addr, ctypes.POINTER(T)).contents

ctypes.cast uses an FFI call. In this case you can more simply use from_address:

b = T.from_address(ctypes.addressof(T.from_buffer(m)))

There's no supporting connection between b and m. If m was allocated
from a heap/pool/freelist, as opposed to a separate mmap
(VirtualAlloc) call, then you won't necessarily get a segfault (access
violation) if b is used after m has been deallocated or internally
realloc'd. It can lead to corrupt data and difficult to diagnose
errors. You're lucky if it segfaults.
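
Put into a context manager, the from_address approach might look like the
sketch below. It is only an illustration: map_struct_detached is a made-up
name, a bytearray stands in for the mmap, and, as noted above, nothing
protects against using the yielded object after the buffer has been resized
or freed:

```python
import ctypes
from contextlib import contextmanager


class T(ctypes.Structure):
    _fields_ = [("foo", ctypes.c_uint32)]


@contextmanager
def map_struct_detached(buf, offset=0):
    # The temporary from_buffer() object is dropped as soon as addressof()
    # returns, so no buffer export outlives this line (CPython refcounting).
    addr = ctypes.addressof(T.from_buffer(buf, offset))
    # from_address() keeps no reference to buf: no BufferError, no safety net.
    yield T.from_address(addr)


buf = bytearray(ctypes.sizeof(T))  # stand-in for the mmap (assumption)

with map_struct_detached(buf) as a:
    a.foo = 1
# Do not touch "a" from here on: buf may move when it is resized.
buf.extend(bytes(ctypes.sizeof(T)))  # no BufferError

with map_struct_detached(buf, ctypes.sizeof(T)) as b:
    b.foo = 2
```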