Accessing mmap object's buffer interface


pelletie...@gmail.com

Oct 13, 2013, 12:06:52 AM
to pytho...@googlegroups.com
Hello,

Thank you so much for cffi, it is a really amazing module and I'm having a lot of fun with it.

I have a question about accessing an mmap object's buffer interface.  I don't know much about this little-used corner of the language, but as I gather, mmap objects support the buffer interface, which allows the memory "backing" the mmap to be accessed directly without copying.  The zeromq library exposes a function called zmq_msg_init_data that accepts a pointer to an existing buffer, which is then used as the source of a message object to do "zero-copy" (really one-less-copy) messaging.  Here's my code:

    mapped = mmap.mmap(f.fileno(), 0)
    ...
    for i in range(numpoints):
        msg = ffi.new('zmq_msg_t*')
        pos = psize*i
        zmq.msg_init_data(
            msg,
            buffer(mapped, pos, psize),
            psize,
            free,
            ffi.NULL)
        zmq.sendmsg(sock, msg, 0)

The mapped file is a sequence of 'numpoints' data chunks, each 'psize' in length.  Unfortunately that doesn't work: cffi raises a TypeError saying that it needs a void* instead of a buffer.  If I do buffer(mapped)[pos:pos+psize] it works, but I suspect that's causing a copy operation, although I'm not certain.  It seems like the right thing to do would be to accept a buffer object in place of a cdata pointer, but again I have really limited knowledge of what the right approach here is.  I know what I want at a high level: the already mapped memory being used as the buffer for the message, with no copying in or out. I'm just not sure how to get there.

Any tips?

-Michel
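
A note on the copy question above: in Python 2, slicing a buffer object returns a new str, so buffer(mapped)[pos:pos+psize] does copy the chunk. The sketch below illustrates the difference between a view and a copy using Python 3's memoryview, not the Python 2 buffer() from the post. The file contents and the 4-byte psize are made up for illustration.

```python
import mmap
import tempfile

psize = 4  # hypothetical chunk size in bytes, for illustration only

with tempfile.TemporaryFile() as f:
    f.write(b"AAAABBBBCCCC")  # three made-up chunks of psize bytes each
    f.flush()
    mapped = mmap.mmap(f.fileno(), 0)

    whole = memoryview(mapped)
    chunk_view = whole[psize:2 * psize]   # a view onto the mapping, no copy
    chunk_copy = mapped[psize:2 * psize]  # slicing the mmap itself copies into bytes

    # Modify the mapping after both objects were created.
    mapped[psize:psize + 1] = b"x"

    view_sees_write = bytes(chunk_view) == b"xBBB"  # the view follows the mapping
    copy_sees_write = chunk_copy == b"xBBB"         # the copy is a frozen snapshot

    # Views must be released before the mapping can be closed.
    chunk_view.release()
    whole.release()
    mapped.close()
```

The view reflects the later write while the bytes copy does not, which is one way to check whether a given slicing operation copied the data.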

pelletie...@gmail.com

Oct 14, 2013, 3:10:35 AM
to pytho...@googlegroups.com
I think I've dug up enough to know what's going on here. From what I can reckon the Python 2 mmap module does not support the new Py_buffer that the Python 3 module supports, and in theory this would work in Python 3, but I decided to take another approach and just wrap mmap myself with cffi. :)  Now it works great!

-Michel

Armin Rigo

Oct 16, 2013, 11:04:50 AM
to pytho...@googlegroups.com
Hi Michel,

On Mon, Oct 14, 2013 at 9:10 AM, <pelletie...@gmail.com> wrote:
> I think I've dug up enough to know what's going on here. From what I can
> reckon the Python 2 mmap module does not support the new Py_buffer that the
> Python 3 module supports, and in theory this would work in Python 3, but I
> decided to take another approach and just wrap mmap myself with cffi. :)
> Now it works great!

:-)

I think at some point we'll need to add CFFI support for general
Python buffer objects, at least to turn them into "void *" cffi
pointers. So far I've been a bit reluctant to do it because PyPy
doesn't fully support it, but I think it's fine if we add it anyway,
together with sanity checks like "it's the buffer of a string, you
shouldn't use that function for that case".


A bientôt,

Armin.
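
For readers finding this thread later: releases of cffi after this discussion added ffi.from_buffer(), which returns a char[] cdata pointing at the memory of an object that supports the buffer interface. The following is a minimal sketch of how that might be applied to the mmap case, assuming Python 3 and a cffi version that provides from_buffer(). The file contents, the psize value, and the demo() helper are hypothetical, and the zeromq call is left out.

```python
import mmap
import tempfile

from cffi import FFI

ffi = FFI()
psize = 4  # hypothetical chunk size in bytes


def demo():
    with tempfile.TemporaryFile() as f:
        f.write(b"AAAABBBBCCCC")  # made-up data: three psize-byte chunks
        f.flush()
        mapped = mmap.mmap(f.fileno(), 0)  # the mapping outlives the file object

    # char[] cdata backed by the mapping itself; nothing is copied here.
    base = ffi.from_buffer(mapped)

    # Pointer to the second chunk. 'base' must stay alive while it is in use.
    chunk = ffi.cast("char *", base) + psize

    before = ffi.buffer(chunk, psize)[:]  # copy out only to inspect the bytes
    mapped[psize:psize + 1] = b"x"        # write through the mmap object
    after = ffi.buffer(chunk, psize)[:]   # the pointer sees the new byte
    return before, after


before, after = demo()
```

A pointer like chunk could then be passed where a void * argument is expected, as long as the mapping and the from_buffer() result stay alive for as long as the C side uses the memory.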

pelletie...@gmail.com

Oct 16, 2013, 4:20:05 PM
to pytho...@googlegroups.com, ar...@tunes.org
Hi Armin, thanks for the reply.

Here's a more detailed example of how I'm (not) using buffer:


it requires:



zmq_msg_init_data takes a void * and size of the message data to be sent.  I want to pass a pointer
into the mapped file's space and slice out 'psize' chunks over the file without copying.

The heart of it is around line 67.  mmap returns a void pointer, but I need to offset into that void*
in order to slice just one message out of the file.  I'm currently doing that with an ugly cast to 
int, increment, and cast back to void*.  But it has the positive attribute of seeming to
work correctly. :)

Armin Rigo

Oct 16, 2013, 5:12:14 PM
to pytho...@googlegroups.com
Hi Michel,

On Wed, Oct 16, 2013 at 10:20 PM, <pelletie...@gmail.com> wrote:
> The heart of it is around line 67. mmap returns a void pointer, but I need
> to offset into that void* in order to slice just one message out of the file.

The C type "char *" can be incremented in bytes, so you can do it more simply:

    ffi.cast("char *", p) + offset


A bientôt,

Armin.
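
The char * arithmetic suggested above can be tried without mmap or zeromq. This is a small sketch in which a cffi-allocated array stands in for the mapped region; the names storage, p, q and the 4-byte offset are made up for illustration.

```python
from cffi import FFI

ffi = FFI()

# A cffi-owned char array standing in for the mapped memory.
storage = ffi.new("char[]", b"AAAABBBBCCCC")

p = ffi.cast("void *", storage)  # what a wrapped mmap() call would hand back
offset = 4                       # hypothetical byte offset of the second chunk

# char * advances one byte per unit, so no round trip through an integer is needed.
q = ffi.cast("char *", p) + offset

second = ffi.buffer(q, 4)[:]  # copy out four bytes just to inspect them

# Cross-check against the integer-cast approach described earlier in the thread.
addr_delta = int(ffi.cast("intptr_t", q)) - int(ffi.cast("intptr_t", p))
```

The resulting char * can be passed directly to a function declared as taking void *, since cffi converts pointer types to void * arguments automatically.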

Simon Sapin

Oct 16, 2013, 11:12:55 AM
to pytho...@googlegroups.com
On 16/10/2013 16:04, Armin Rigo wrote:
Is it only 'str' objects that can move in memory? (Making any pointer
invalid.)

--
Simon Sapin

Armin Rigo

Oct 18, 2013, 12:44:21 PM
to pytho...@googlegroups.com
Hi Simon,

On Wed, Oct 16, 2013 at 5:12 PM, Simon Sapin <simon...@exyr.org> wrote:
> Is it only 'str' objects that can move in memory? (Making any pointer
> invalid.)

Strings and bytearrays are the most obvious problems for PyPy. For
the rest of the objects supporting the buffer API, like arrays and
mmap, they are implemented as regular pointers to fixed memory anyway
in PyPy too. So far it doesn't provide a "raw" buffer interface for
them ("give me a real pointer to your data"). But this can be added.


A bientôt,

Armin.