[Python-3000] Using memoryviews


M.-A. Lemburg

Nov 20, 2008, 5:12:49 AM
to Python 3000
I've had a look at the new memoryview and associated buffer API
and have a question: how is a C extension supposed to use the buffer
API without going directly into the C struct Py_buffer ?

I have not found any macros for accessing Py_buffer internals and
the docs mention the struct members directly (which is a bit unusual
for the Python C API).

Shouldn't there be a set of macros providing some form of abstraction
for the struct members ?

BTW: I was looking for a suitable replacement for the buffer object
which isn't available in Python 3 anymore.

Thanks,
--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Nov 20 2008)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
2008-11-12: Released mxODBC.Connect 0.9.3 http://python.egenix.com/

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::


eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611

Josiah Carlson

Nov 21, 2008, 11:30:30 AM
to M.-A. Lemburg, Python 3000
On Thu, Nov 20, 2008 at 2:12 AM, M.-A. Lemburg <m...@egenix.com> wrote:
> I've had a look at the new memoryview and associated buffer API
> and have a question: how is a C extension supposed to use the buffer
> API without going directly into the C struct Py_buffer ?
>
> I have not found any macros for accessing Py_buffer internals and
> the docs mention the struct members directly (which is a bit unusual
> for the Python C API).
>
> Shouldn't there be a set of macros providing some form of abstraction
> for the struct members ?
>
> BTW: I was looking for a suitable replacement for the buffer object
> which isn't available in Python 3 anymore.
>
> Thanks,
> --
> Marc-Andre Lemburg
> eGenix.com

From what I understand of the memoryview when I tried to do the same
thing a few months ago (use memoryview to replace buffer in
asyncore/asynchat), memoryview is incomplete. It didn't support
character buffer slicing (you know, the 'offset' and 'size' arguments
that were in buffer), and at least a handful of other things (that I
can't remember at the moment).

- Josiah

M.-A. Lemburg

Nov 21, 2008, 2:41:51 PM
to Josiah Carlson, Python 3000
On 2008-11-21 17:30, Josiah Carlson wrote:
> On Thu, Nov 20, 2008 at 2:12 AM, M.-A. Lemburg <m...@egenix.com> wrote:
>> I've had a look at the new memoryview and associated buffer API
>> and have a question: how is a C extension supposed to use the buffer
>> API without going directly into the C struct Py_buffer ?
>>
>> I have not found any macros for accessing Py_buffer internals and
>> the docs mention the struct members directly (which is a bit unusual
>> for the Python C API).
>>
>> Shouldn't there be a set of macros providing some form of abstraction
>> for the struct members ?
>>
>> BTW: I was looking for a suitable replacement for the buffer object
>> which isn't available in Python 3 anymore.
>>
>> Thanks,
>> --
>> Marc-Andre Lemburg
>> eGenix.com
>
> From what I understand of the memoryview when I tried to do the same
> thing a few months ago (use memoryview to replace buffer in
> asyncore/asynchat), memoryview is incomplete. It didn't support
> character buffer slicing (you know, the 'offset' and 'size' arguments
> that were in buffer), and at least a handful of other things (that I
> can't remember at the moment).

True, memoryview objects aren't as useful in Python as the underlying
Py_buffer "C" objects are in the C API.

But then I only need it to signal "this is binary data" for the purpose
of using the memoryview in DB-API extensions.

However, this would only be of effective use if there's a documented way
of accessing the actual C char* buffer behind the object, instead of
having to allocate a new buffer and copy the data over - only to reference
it like that.

In the past, we've always tried to provide abstract access methods to
C struct internals of Python objects and I wonder whether this was
deliberately not done for Py_buffer structs or simply not considered.

I don't think it's a good idea to use my_Py_buffer->buf in a C
extension and would rather like to write:

Py_Buffer_AS_BUFFER(my_Py_buffer)
Py_Buffer_GET_SIZE(my_Py_buffer)
Py_Buffer_GET_ITEM_SIZE(my_Py_buffer)
etc.

--
Marc-Andre Lemburg
eGenix.com


Benjamin Peterson

Nov 21, 2008, 4:34:29 PM
to M.-A. Lemburg, Python 3000
On Fri, Nov 21, 2008 at 1:41 PM, M.-A. Lemburg <m...@egenix.com> wrote:
>
> In the past, we've always tried to provide abstract access methods to
> C struct internals of Python objects and I wonder whether this was
> deliberately not done for Py_buffer structs or simply not considered.
>
> I don't think it's a good idea to use my_Py_buffer->buf in a C
> extension and would rather like to write:
>
> Py_Buffer_AS_BUFFER(my_Py_buffer)
> Py_Buffer_GET_SIZE(my_Py_buffer)
> Py_Buffer_GET_ITEM_SIZE(my_Py_buffer)
> etc.

I think that's a good idea, too, and we should get something like that
in for 3.1. I rather feel like the new buffer API slipped in without
any real review.

--
Cheers,
Benjamin Peterson
"There's nothing quite as beautiful as an oboe... except a chicken
stuck in a vacuum cleaner."

Nick Coghlan

Nov 21, 2008, 6:34:09 PM
to Benjamin Peterson, Python 3000
Benjamin Peterson wrote:
> On Fri, Nov 21, 2008 at 1:41 PM, M.-A. Lemburg <m...@egenix.com> wrote:
>> In the past, we've always tried to provide abstract access methods to
>> C struct internals of Python objects and I wonder whether this was
>> deliberately not done for Py_buffer structs or simply not considered.
>>
>> I don't think it's a good idea to use my_Py_buffer->buf in a C
>> extension and would rather like to write:
>>
>> Py_Buffer_AS_BUFFER(my_Py_buffer)
>> Py_Buffer_GET_SIZE(my_Py_buffer)
>> Py_Buffer_GET_ITEM_SIZE(my_Py_buffer)
>> etc.
>
> I think that's a good idea, too, and we should get something like that
> in for 3.1. I rather feel like the new buffer API slipped in without
> any real review.

The review that was done was actually quite extensive - see PEP 3118.
However:
1. There's a reason 3118 is still at accepted rather than final - the
major foundations (and the all-important underlying protocol) are in
place, but there are finishing touches still needed.
2. The review of the PEP focused on the power and capabilities of the
underlying protocol and less on the aesthetics of the C API.

The PEP was fairly explicit that the fields in the Py_buffer struct were
public and accessed directly via C syntax though, as are the current
docs (http://docs.python.org/dev/3.0/c-api/buffer.html).

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia
---------------------------------------------------------------

Benjamin Peterson

Nov 21, 2008, 6:52:20 PM
to Nick Coghlan, Python 3000
On Fri, Nov 21, 2008 at 5:34 PM, Nick Coghlan <ncog...@gmail.com> wrote:
> Benjamin Peterson wrote:
>> On Fri, Nov 21, 2008 at 1:41 PM, M.-A. Lemburg <m...@egenix.com> wrote:
>>> In the past, we've always tried to provide abstract access methods to
>>> C struct internals of Python objects and I wonder whether this was
>>> deliberately not done for Py_buffer structs or simply not considered.
>>>
>>> I don't think it's a good idea to use my_Py_buffer->buf in a C
>>> extension and would rather like to write:
>>>
>>> Py_Buffer_AS_BUFFER(my_Py_buffer)
>>> Py_Buffer_GET_SIZE(my_Py_buffer)
>>> Py_Buffer_GET_ITEM_SIZE(my_Py_buffer)
>>> etc.
>>
>> I think that's a good idea, too, and we should get something like that
>> in for 3.1. I rather feel like the new buffer API slipped in without
>> any real review.
>
> The review that was done was actually quite extensive - see PEP 3118.
> However:
> 1. There's a reason 3118 is still at accepted rather than final - the
> major foundations (and the all-important underlying protocol) are in
> place, but there are finishing touches still needed.
> 2. The review of the PEP focused on the power and capabilities of the
> underlying protocol and less on the aesthetics of the C API.

I'm not talking necessarily about the PEP and API. I find the
implementation confusing and contradictory in some places.

>
> The PEP was fairly explicit that the fields in the Py_buffer struct were
> public and accessed directly via C syntax though, as are the current
> docs (http://docs.python.org/dev/3.0/c-api/buffer.html).

Well, I wrote those based on the PEP. :)

--
Cheers,
Benjamin Peterson
"There's nothing quite as beautiful as an oboe... except a chicken
stuck in a vacuum cleaner."

Antoine Pitrou

Nov 22, 2008, 7:18:31 PM
to pytho...@python.org
Josiah Carlson <josiah.carlson <at> gmail.com> writes:
>
> From what I understand of the memoryview when I tried to do the same
> thing a few months ago (use memoryview to replace buffer in
> asyncore/asynchat), memoryview is incomplete. It didn't support
> character buffer slicing (you know, the 'offset' and 'size' arguments
> that were in buffer), and at least a handful of other things (that I
> can't remember at the moment).

You should try again, memoryview now supports slicing (with the usual Python
syntax, e.g. m[2:5]) as well as slice assignment (with the fairly sensible
limitation that you can't resize the underlying buffer). There's no real doc for
it, but you can look at test_memoryview.py in the Lib/test directory to have a
fairly comprehensive list of the things currently supported.
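
As a rough illustration of the slicing and slice-assignment behaviour described above, here is a minimal sketch. It assumes a current Python 3 interpreter and uses a made-up bytearray as the underlying buffer; it is not taken from test_memoryview.py.

```python
# Sketch: slicing and slice assignment on a memoryview over a bytearray.
data = bytearray(b"hello world")
m = memoryview(data)

# Slicing uses ordinary Python syntax and yields another memoryview.
part = m[2:5]
part_bytes = part.tobytes()

# Slice assignment writes through to the underlying bytearray,
# provided the replacement has exactly the same length.
m[0:5] = b"HELLO"

# An assignment that would change the length is rejected,
# because the underlying buffer cannot be resized through the view.
try:
    m[0:5] = b"too long for the slot"
    resized = True
except ValueError:
    resized = False

print(part_bytes, data, resized)
```

The length check is what keeps the view and the underlying object consistent: the view never owns the memory, so it has no way to grow or shrink it.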

I also support the addition of official functions or macros to access the
underlying fields of the Py_buffer struct, rather than access them directly from
3rd party code. Someone please open an issue for that in the tracker.

The big, big limitation of memoryviews right now is that they only support
one-dimensional byte buffers. The people interested in more complex arrangements
(that is, Scipy/Numpy people) have been completely absent from the python-dev
community for many months now, and I don't think anyone else cares enough to do
the job instead of them.

Regards

Antoine.

Josiah Carlson

Nov 23, 2008, 4:12:19 AM
to Antoine Pitrou, pytho...@python.org
On Sat, Nov 22, 2008 at 4:18 PM, Antoine Pitrou <soli...@pitrou.net> wrote:
> Josiah Carlson <josiah.carlson <at> gmail.com> writes:
>>
>> From what I understand of the memoryview when I tried to do the same
>> thing a few months ago (use memoryview to replace buffer in
>> asyncore/asynchat), memoryview is incomplete. It didn't support
>> character buffer slicing (you know, the 'offset' and 'size' arguments
>> that were in buffer), and at least a handful of other things (that I
>> can't remember at the moment).
>
> You should try again, memoryview now supports slicing (with the usual Python
> syntax, e.g. m[2:5]) as well as slice assignment (with the fairly sensible
> limitation that you can't resize the underlying buffer). There's no real doc for
> it, but you can look at test_memoryview.py in the Lib/test directory to have a
> fairly comprehensive list of the things currently supported.

I meant in the sense of X = memoryview(char_buffer, offset, length).
Post-facto slicing is nice, but a little more wasteful than necessary.

> I also support the addition of official functions or macros to access the
> underlying fields of the Py_buffer struct, rather than access them directly from
> 3rd party code. Someone please open an issue for that in the tracker.
>
> The big, big limitation of memoryviews right now is that they only support
> one-dimensional byte buffers. The people interested in more complex arrangements
> (that is, Scipy/Numpy people) have been completely absent from the python-dev
> community for many months now, and I don't think anyone else cares enough to do
> the job instead of them.

That's unfortunate, as they were the major pushers for memoryview as
it stands today. I'm still thinking about trying to convince people
to add string methods to them (you have your encoded email message in
memory, you chop it and slice it as necessary for viewing...all using
pointers to the one block of memory, which minimizes fragmentation,
memory copies, etc.).
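
The use case sketched here (one encoded message in memory, chopped into pieces that all point at the same block) can already be approximated with plain slicing. The following is only a sketch under assumptions: the message bytes are invented, and the searching is done on the bytes object because memoryview has no find() method of its own.

```python
# Sketch: split an encoded message into header and body views without copying.
raw = b"Subject: hello\r\nFrom: a@example.org\r\n\r\nBody text here."
view = memoryview(raw)

# Locate the blank line separating headers from body on the bytes object;
# memoryview itself offers no string-style search methods.
split = raw.find(b"\r\n\r\n")
headers = view[:split]
body = view[split + 4:]

# Each header line is again a view into the same block of memory.
header_lines = []
start = 0
while start < split:
    end = raw.find(b"\r\n", start, split)
    if end == -1:
        end = split
    header_lines.append(view[start:end])
    start = end + 2

# Bytes are only copied at the point where a real bytes object is needed.
print([bytes(line) for line in header_lines], bytes(body))
```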

- Josiah

Georg Brandl

Nov 23, 2008, 5:56:29 AM
to pytho...@python.org
Josiah Carlson schrieb:

> On Sat, Nov 22, 2008 at 4:18 PM, Antoine Pitrou <soli...@pitrou.net> wrote:
>> Josiah Carlson <josiah.carlson <at> gmail.com> writes:
>>>
>>> From what I understand of the memoryview when I tried to do the same
>>> thing a few months ago (use memoryview to replace buffer in
>>> asyncore/asynchat), memoryview is incomplete. It didn't support
>>> character buffer slicing (you know, the 'offset' and 'size' arguments
>>> that were in buffer), and at least a handful of other things (that I
>>> can't remember at the moment).
>>
>> You should try again, memoryview now supports slicing (with the usual Python
>> syntax, e.g. m[2:5]) as well as slice assignment (with the fairly sensible
>> limitation that you can't resize the underlying buffer). There's no real doc for
>> it, but you can look at test_memoryview.py in the Lib/test directory to have a
>> fairly comprehensive list of the things currently supported.
>
> I meant in the sense of X = memoryview(char_buffer, offset, length).
> Post-facto slicing is nice, but a little more wasteful than necessary.

Why? It's only a view, after all.
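
A small sketch of the point being made, assuming a current Python 3 interpreter: the offset/size window that buffer() took as arguments is written as a slice, and the slice still refers to the original memory. The names payload, offset and size are illustrative only.

```python
# Sketch: a sliced memoryview is a window onto the same memory, not a copy.
payload = bytearray(b"0123456789")

# The equivalent of an (offset, size) window, expressed as a slice.
offset, size = 3, 4
window = memoryview(payload)[offset:offset + size]

# A later change to the bytearray is visible through the window,
# which shows that no data was copied when slicing.
payload[3] = ord("X")
seen = window.tobytes()

# Slicing a bytes object, by contrast, produces an independent copy.
copied = bytes(payload)[offset:offset + size]
payload[4] = ord("Y")

print(seen, copied, window.tobytes())
```

The only extra cost of slicing after the fact is the short-lived intermediate view object; the underlying bytes are not duplicated.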

Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

Nick Coghlan

Nov 23, 2008, 6:37:16 AM
to Josiah Carlson, pytho...@python.org, Antoine Pitrou
Josiah Carlson wrote:
> On Sat, Nov 22, 2008 at 4:18 PM, Antoine Pitrou <soli...@pitrou.net> wrote:
>> The big, big limitation of memoryviews right now is that they only support
>> one-dimensional byte buffers. The people interested in more complex arrangements
>> (that is, Scipy/Numpy people) have been completely absent from the python-dev
>> community for many months now, and I don't think anyone else cares enough to do
>> the job instead of them.
>
> That's unfortunate, as they were the major pushers for memoryview as
> it stands today.

I believe the Scipy/Numpy folks mainly needed the underlying protocol
for describing and sharing chunks of memory (e.g. when mixing the use of
PIL and NumPy in a single program). The memoryview Python object just
provides a basic mechanism to access that protocol from pure Python
code. At this point in time, I would expect significant uses of the
protocol to be largely mediated by extension modules (either existing
ones or new ones) rather than via pure Python code.
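
To make "access that protocol from pure Python" a little more concrete, here is a hedged sketch of the metadata a memoryview reports about an exporter. It assumes a later Python 3 release in which memoryview has the attributes shown (they correspond to fields of the Py_buffer struct), and it uses array.array as a stand-in for an extension type.

```python
from array import array

# Sketch: inspect the buffer description that an exporter hands out.
values = array("i", [1, 2, 3, 4])
m = memoryview(values)

# These attributes mirror fields of the C-level Py_buffer struct.
info = {
    "format": m.format,      # struct-style type code of one item
    "itemsize": m.itemsize,  # size of one item in bytes
    "ndim": m.ndim,          # number of dimensions
    "shape": m.shape,        # items per dimension
    "strides": m.strides,    # bytes to step per dimension
    "nbytes": m.nbytes,      # total size in bytes
    "readonly": m.readonly,  # whether writes through the view are allowed
}

for name, value in info.items():
    print(name, value)
```

The item size of the "i" type code is platform dependent, so the sketch prints whatever the interpreter reports rather than assuming a value.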

I see it as similar to the way extended slicing was originally
introduced without significant support in the builtin types, but still
immediately solved a problem for the NumPy folks due to the existence of
the new protocol.

Cheers,
Nick.


--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia
---------------------------------------------------------------

M.-A. Lemburg

Nov 25, 2008, 11:56:49 AM
to Benjamin Peterson, Python 3000, Nick Coghlan

I find the implementation of the buffer protocol way too complicated.
One of the reasons why the buffer protocol in Python 2 never caught
on was that it was too complicated, and the Python 3 version is
even worse in this respect.

In practice you do want to have the ability to hook directly into the
data buffer of an object, but apart from some special needs that PIL
and the numeric folks may have, most users will just want to work
with a single contiguous chunk of memory and need a simple API to
do this - pass in an object, get a void* back.
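
For illustration only, the "pass in an object, get a pointer back" idea can be tried from Python with the standard ctypes module. This is a sketch of the concept, not the C API being asked for; it assumes a writable exporter such as bytearray, and the variable names are made up.

```python
import ctypes

# Sketch: obtain the raw address of an object's single contiguous buffer.
data = bytearray(b"binary payload")

# Borrow the bytearray's memory as a ctypes char array (no copy).
# from_buffer() requires a writable buffer exporter.
c_array = (ctypes.c_char * len(data)).from_buffer(data)

# The address of the first byte, i.e. what C code would see as a void*.
address = ctypes.addressof(c_array)

# Read the bytes back through the raw address.
roundtrip = ctypes.string_at(address, len(data))

# Writes through the ctypes array land in the original bytearray.
c_array[0] = b"B"

print(hex(address), roundtrip, data)
```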

With the new interface, programmers will have to deal with a
PyObject_GetBuffer() API that takes 17 modification flags in order
to handle many different corner cases and returns a Py_buffer
C struct with another 10 elements.

http://docs.python.org/dev/3.0/c-api/buffer.html#PyObject_GetBuffer

Can we please get something simple like PyObject_AsReadBuffer() back
into Python 3 ?

http://docs.python.org/c-api/objbuffer.html

(and ideally, this should also work on memoryview objects)

Thanks,
--
Marc-Andre Lemburg
eGenix.com


Stefan Behnel

Nov 27, 2008, 5:03:26 AM
to pytho...@python.org
M.-A. Lemburg wrote:
> I find the implementation of the buffer protocol way too complicated.
> One of the reasons why the buffer protocol in Python 2 never caught
> on was that it was too complicated, and the Python 3 version is
> even worse in this respect.
>
> In practice you do want to have the ability to hook directly into the
> data buffer of an object, but apart from some special needs that PIL
> and the numeric folks may have, most users will just want to work
> with a single contiguous chunk of memory and need a simple API to
> do this - pass in an object, get a void* back.

Cython makes it that easy to access a buffer (also in Python 2.3-2.5, BTW).
You only have to declare the type of a buffer variable.

http://wiki.cython.org/enhancements/buffer

According to what I hear, at least the NumPy developers make use of this
already. No idea how common it is in the PIL area, but it does work there, too.

Stefan
