how to get bytes from bytearray without copying

Juraj Ivančić

Mar 2, 2014, 7:07:27 PM
to pytho...@python.org
Is it possible to somehow 'steal' bytearray's buffer and make it a
read-only bytes? I failed to find a way to do this, and would like to
make sure.

My use case is, I would expect, fairly common. I read a certain
(potentially very large) amount of data from the network into a
pre-allocated bytearray. From that point on, this data is logically
read-only. To prevent making redundant copies, I wrap it in a
memoryview, and then slice and dice it. The problem with this memoryview
is that it, and its slices, are considered writable, and thus cannot be
hashed:

ValueError: cannot hash writable memoryview object

The only way (AFAICT) to make this work is to first create a bytes
object from bytearray, but this copies the data. I don't need this copy,
so I'd like to avoid it, because of both principle and performance reasons.
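
For reference, the behaviour described above can be reproduced in a few lines. This is only a minimal sketch, assuming CPython 3.3 or later; the names `buf`, `view` and `frozen` and the sample payload are illustrative and not taken from the original code.

```python
# Stand-in for data received from the network (assumed sample payload).
buf = bytearray(b"header:payload")

# A memoryview over a bytearray is writable, so hashing it is refused.
view = memoryview(buf)
try:
    hash(view[0:6])
    hash_failed = False
except ValueError:
    hash_failed = True

# Going through bytes() yields a read-only, hashable view,
# but bytes(buf) duplicates the whole buffer first.
frozen = memoryview(bytes(buf))
same_hash = hash(frozen[0:6]) == hash(b"header")
```

The copy made by `bytes(buf)` is the cost the question is trying to avoid.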

Is there any reason why bytearray isn't able to release/convert its
buffer to bytes? I see that it has a clear() method which... well...
clears it. The former would be much more useful.

I would also be content if there is some way of making memoryview
artificially read-only to avoid the above error.

Any help/thoughts/comments are highly appreciated.

Cameron Simpson

Mar 2, 2014, 7:44:58 PM
to pytho...@python.org
On 03Mar2014 01:07, Juraj Ivančić <juraj....@gmail.com> wrote:
> Is it possible to somehow 'steal' bytearray's buffer and make it a
> read-only bytes? I failed to find a way to do this, and would like
> to make sure.
>
> My use case is, I would expect, fairly common. I read a certain
> (potentially very large) amount of data from the network into a
> pre-allocated bytearray. From that point on, this data is logically
> read-only. To prevent making redundant copies, I wrap it in a
> memoryview, and then slice and dice it. The problem with this
> memoryview is that it, and its slices, are considered writable, and
> thus cannot be hashed:
>
> ValueError: cannot hash writable memoryview object

Have you considered subclassing memoryview and giving the subclass
a __hash__ method?

Cheers,
--
Cameron Simpson <c...@zip.com.au>

Mountain rescue teams insist that all climbers wear helmets, and fall headfirst.
They are then impacted into a small globular mass easily stowed in a rucsac.
- Tom Patey, who didn't, and wasn't

Mark Lawrence

Mar 2, 2014, 7:49:28 PM
to pytho...@python.org
On 03/03/2014 00:07, Juraj Ivančić wrote:
> Is it possible to somehow 'steal' bytearray's buffer and make it a
> read-only bytes? I failed to find a way to do this, and would like to
> make sure.
>
> My use case is, I would expect, fairly common. I read a certain
> (potentially very large) amount of data from the network into a
> pre-allocated bytearray. From that point on, this data is logically
> read-only. To prevent making redundant copies, I wrap it in a
> memoryview, and then slice and dice it. The problem with this memoryview
> is that it, and its slices, are considered writable, and thus cannot be
> hashed:
>
> ValueError: cannot hash writable memoryview object
>
> The only way (AFAICT) to make this work is to first create a bytes
> object from bytearray, but this copies the data. I don't need this copy,
> so I'd like to avoid it, because of both principle and performance reasons.
>
> Is there any reason why bytearray isn't able to release/convert its
> buffer to bytes? I see that it has a clear() method which... well...
> clears it. The former would be much more useful.
>
> I would also be content if there is some way of making memoryview
> artificially read-only to avoid the above error.
>
> Any help/thoughts/comments are highly appreciated.
>

If your data is readonly why can't you simply read it as bytes in the
first place? Failing that from
http://docs.python.org/3/library/stdtypes.html#memoryview

tobytes() - Return the data in the buffer as a bytestring. This is
equivalent to calling the bytes constructor on the memoryview.

>>> m = memoryview(b"abc")
>>> m.tobytes()
b'abc'
>>> bytes(m)
b'abc'

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



Ian Kelly

Mar 2, 2014, 7:55:48 PM
to Python
On Sun, Mar 2, 2014 at 5:44 PM, Cameron Simpson <c...@zip.com.au> wrote:
> Have you considered subclassing memoryview and giving the subclass
> a __hash__ method?

>>> class MyMemoryView(memoryview):
...     def __hash__(self): return 42
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: type 'memoryview' is not an acceptable base type

Ian Kelly

Mar 2, 2014, 8:27:40 PM
to Python
On Sun, Mar 2, 2014 at 5:07 PM, Juraj Ivančić <juraj....@gmail.com> wrote:
> Is it possible to somehow 'steal' bytearray's buffer and make it a read-only
> bytes? I failed to find a way to do this, and would like to make sure.
>
> My use case is, I would expect, fairly common. I read a certain (potentially
> very large) amount of data from the network into a pre-allocated bytearray.
> From that point on, this data is logically read-only. To prevent making
> redundant copies, I wrap it in a memoryview, and then slice and dice it. The
> problem with this memoryview is that it, and its slices, are considered
> writable, and thus cannot be hashed:
>
> ValueError: cannot hash writable memoryview object
>
> The only way (AFAICT) to make this work is to first create a bytes object
> from bytearray, but this copies the data. I don't need this copy, so I'd
> like to avoid it, because of both principle and performance reasons.
>
> Is there any reason why bytearray isn't able to release/convert its buffer
> to bytes? I see that it has a clear() method which... well... clears it. The
> former would be much more useful.
>
> I would also be content if there is some way of making memoryview
> artificially read-only to avoid the above error.

Python 3.3 has a C API function to create a memoryview for a char*,
that can be made read-only.

http://docs.python.org/3/c-api/memoryview.html#PyMemoryView_FromMemory

I don't see a way to do what you want in pure Python, apart from
perhaps writing an elaborate proxy class that would just be a poor
man's memoryview. Or you could bite the bullet and copy everything
once at the start to create a bytes object, and then never have to
worry about it again.

Cameron Simpson

Mar 2, 2014, 8:47:58 PM
to Python
Ah. The slices were going to be an issue too, anyway.

He could write a wrapper class with a __hash__ method, whose slices
themselves are also the wrapper class.

It raises the implementation bar only slightly.
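
One way such a wrapper might look is sketched below. This is an assumption-laden illustration, not code from the thread: `FrozenView` is a hypothetical class name, and using `zlib.crc32` as the hash is a swapped-in choice made here because it reads the buffer in place instead of building a bytes copy. It only handles contiguous slices (step 1), and nothing stops the underlying bytearray from being modified through another reference.

```python
import zlib


class FrozenView:
    """Hypothetical hashable wrapper around a memoryview (not a stdlib class)."""

    __slots__ = ("_view",)

    def __init__(self, data):
        # Accept either an existing memoryview or any buffer-providing object.
        self._view = data if isinstance(data, memoryview) else memoryview(data)

    def __len__(self):
        return len(self._view)

    def __getitem__(self, index):
        item = self._view[index]
        # Slices stay wrapped; a single index returns an int as usual.
        return FrozenView(item) if isinstance(index, slice) else item

    def __eq__(self, other):
        if isinstance(other, FrozenView):
            # memoryview comparison is by content, without copying.
            return self._view == other._view
        return NotImplemented

    def __hash__(self):
        # crc32 walks the buffer in place, so no bytes object is created.
        return zlib.crc32(self._view)
```

With this sketch, two slices holding the same bytes compare equal and hash the same, so they can serve as dict keys or set members, at the price of a pure-Python layer around every slice.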

Cheers,
--
Cameron Simpson <c...@zip.com.au>

Please do not send me Microsoft Word files.
http://en.nothingisreal.com/wiki/Please_don't_send_me_Microsoft_Word_documents

Juraj Ivančić

Mar 3, 2014, 3:15:00 AM
to pytho...@python.org
On 3.3.2014. 1:44, Cameron Simpson wrote:

>> ValueError: cannot hash writable memoryview object
>
> Have you considered subclassing memoryview and giving the subclass
> a __hash__ method?

I have, and then, when I failed to subclass it, I considered doing
aggregation and making it behave bytes-like. But how to implement the
overridden __hash__ method? It will still require at least *some*
redundant copying. And there is the slicing thing... the whole idea
started to feel like I was performing a tonsillectomy through the anal cavity.


Juraj Ivančić

Mar 3, 2014, 3:52:25 AM
to pytho...@python.org
On 3.3.2014. 1:49, Mark Lawrence wrote:

> If your data is readonly why can't you simply read it as bytes in the
> first place? Failing that from
> http://docs.python.org/3/library/stdtypes.html#memoryview
>
> tobytes() - Return the data in the buffer as a bytestring. This is
> equivalent to calling the bytes constructor on the memoryview.
>
> >>> m = memoryview(b"abc")
> >>> m.tobytes()
> b'abc'
> >>> bytes(m)
> b'abc'

Initially it has to be a bytearray because I read this data from a
socket. My point is that once I have a bytearray x, then

m = memoryview(bytes(x))

is a very expensive way to make a read-only memoryview, as opposed to

m = memoryview(x)

or (fictional)

m = memoryview(x, force_readonly=True)

especially if the x-es are many, large, and occur often.

I feel like memoryview's __hash__ is trying to be too smart for its own
good, and that it should just return the damn hash like its name
suggests, regardless of the value of the 'writable' flag.

Juraj Ivančić

Mar 3, 2014, 4:09:56 AM
to pytho...@python.org
On 3.3.2014. 2:27, Ian Kelly wrote:

> Python 3.3 has a C API function to create a memoryview for a char*,
> that can be made read-only.
>
> http://docs.python.org/3/c-api/memoryview.html#PyMemoryView_FromMemory

Yes, this is probably what I'll do in the absence of a pure Python solution.
Thanks for the tip.

> Or you could bite the bullet and copy everything
> once at the start to create a bytes object, and then never have to
> worry about it again.

That would be a surrender :-)

Cameron Simpson

Mar 3, 2014, 5:02:07 PM
to pytho...@python.org
Write a wrapper class instead and use:

    def __hash__(self):
        return id(self)

Simple and fast. Unless you need slices with the same content to
hash the same (eg storing them as dict keys, or in sets).

An alternative would be a simple hash of the first few bytes in
whatever slice you had.

Cheers,
--
Cameron Simpson <c...@zip.com.au>

Why is it so hard for people to simply leave people alone? But, the answer
comes to me: they are idiots and in a perfect world, I would be permitted to
kill them all. - Julie Rhodes <jk.r...@asacomp.com>

Juraj Ivančić

Mar 4, 2014, 10:23:32 AM
to pytho...@python.org
On 3.3.2014. 2:27, Ian Kelly wrote:

> Python 3.3 has a C API function to create a memoryview for a char*,
> that can be made read-only.
>
> http://docs.python.org/3/c-api/memoryview.html#PyMemoryView_FromMemory
>
> I don't see a way to do what you want in pure Python, apart from
> perhaps writing an elaborate proxy class that would just be a poor
> man's memoryview. Or you could bite the bullet and copy everything
> once at the start to create a bytes object, and then never have to
> worry about it again.

Just for reference, it is doable in pure Python, with ctypes help:

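import ctypes

PyBUF_READ = 0x100  # value of the C macro of the same name

# pythonapi keeps the GIL held; declare types so pointers are not truncated
_as_string = ctypes.pythonapi.PyByteArray_AsString
_as_string.argtypes = [ctypes.py_object]
_as_string.restype = ctypes.c_void_p
_from_memory = ctypes.pythonapi.PyMemoryView_FromMemory
_from_memory.argtypes = [ctypes.c_void_p, ctypes.c_ssize_t, ctypes.c_int]
_from_memory.restype = ctypes.py_object

def ro_memoryview_from_bytearray(buffer):
    assert isinstance(buffer, bytearray)
    # The returned view does not keep `buffer` alive or pin its size
    ptr = _as_string(buffer)
    return _from_memory(ptr, len(buffer), PyBUF_READ)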

Note that this is just the gist; in real code I added safeguards to
prevent misuse of the (temporary) memoryview.
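
The safeguards themselves are not shown in the thread. One possible approach, sketched here purely as an assumption, is to keep an ordinary memoryview of the bytearray next to the raw one: that ordinary export holds a reference to the bytearray and makes any attempt to resize it raise `BufferError`, so the raw pointer stays valid. `PinnedReadOnlyView` is a hypothetical name, and the sketch is CPython-specific.

```python
import ctypes

PyBUF_READ = 0x100  # value of the C macro PyBUF_READ in CPython's headers

# Declare argument and result types so pointers are passed at full width.
_as_string = ctypes.pythonapi.PyByteArray_AsString
_as_string.argtypes = [ctypes.py_object]
_as_string.restype = ctypes.c_void_p

_from_memory = ctypes.pythonapi.PyMemoryView_FromMemory
_from_memory.argtypes = [ctypes.c_void_p, ctypes.c_ssize_t, ctypes.c_int]
_from_memory.restype = ctypes.py_object


class PinnedReadOnlyView:
    """Hypothetical helper: a read-only view plus a pin on its bytearray."""

    def __init__(self, buffer):
        if not isinstance(buffer, bytearray):
            raise TypeError("expected a bytearray")
        # An ordinary export keeps the bytearray alive and blocks resizing
        # for as long as this helper object exists.
        self._pin = memoryview(buffer)
        # Raw read-only view over the same memory, created without a copy.
        self.view = _from_memory(_as_string(buffer), len(buffer), PyBUF_READ)
```

This is not airtight: slices taken from `.view` must not outlive the helper object, because they hold no reference to the bytearray themselves.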


Stefan Behnel

Mar 4, 2014, 11:19:52 AM
to pytho...@python.org
Juraj Ivančić, 04.03.2014 16:23:
> Just for reference, it is doable in pure Python, with ctypes help

For some questionable meaning of "pure".

Stefan

