Accessing PIL data efficiently in Cython

Joe Kilner

Feb 20, 2015, 7:23:51 AM
to cython...@googlegroups.com
Hi everyone,

I am 18 hours into exploring Cython and trying to use it to speed up some image processing I am doing. (Please note that I do not want to introduce a dependency on NumPy; the whole reason I am interested in Cython is that it avoids having to distribute NumPy alongside the tool I'm building.)

Basically, is there a nice way to get at the underlying buffer representation of an image from PIL in Cython that avoids copying memory and ideally gives me [i, j] indexing to get a pixel value?

I have two images - one input and one output. I had some code that looked something like this:

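    # Module-level imports this fragment relies on:
    from array import array
    from cpython cimport array as c_array

    # Copies band 0 of the image into a C int array.
    cdef c_array.array in_array = array('i', in_image.getdata(0))
    cdef int* in_val = in_array.data.as_ints
    cdef c_array.array out_array = array('i', [default] * len(in_array))
    cdef int* out = out_array.data.as_ints

    # Row-major lookup of the pixel at (x, y).
    pixel = in_val[x + (y * width)]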

However, it seems to me that this copies the data, and it also just looks like the wrong way to do things. I feel that something like the following should work:

    cdef int [:, :] in_memoryview = in_image
    cdef int [:, :] out_memoryview = out_image

as everything talking about the "buffer interface" says that "PIL uses it extensively", but without any associated code examples to show how you get to the buffer. I also found nothing on whether it is possible to get a writeable memoryview on an image, or whether I need to construct a writeable buffer and then map a PIL image onto that.
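
As a point of comparison for the copy concern above, here is a minimal pure-Python sketch (standard library only, no Pillow) of wrapping an existing array.array in a two-dimensional memoryview without copying. The sizes and variable names are made up for illustration, and it does not show how to obtain such a buffer from an image, which is the open question here.

```python
from array import array

width, height = 4, 3
pixels = array('i', range(width * height))   # stand-in for image data

# A memoryview shares memory with `pixels`; nothing is copied.
flat = memoryview(pixels)

# Reshaping goes through a byte view first, then back to ints with a shape.
grid = flat.cast('B').cast('i', shape=[height, width])

value = grid[1, 2]   # row 1, column 2, i.e. pixels[2 + 1 * width]
grid[1, 2] = -1      # writes through to the underlying array
```

The [row, column] indexing is the same pattern the typed memoryview declarations above are reaching for; only the source of the buffer differs.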

As I said, I'm just starting out with Cython, so maybe I've got my terminology wrong and am asking the wrong question. But does anyone know what specific piece of voodoo I am asking for and how to get it?

Thanks!

Cem Karan

Feb 20, 2015, 8:40:36 AM
to cython...@googlegroups.com
From your use of array above, I'm guessing you're using Python's array.array (https://docs.python.org/3/library/array.html), correct? Try taking a look at the buffer protocol (https://docs.python.org/3/c-api/buffer.html) which is what I suspect they are talking about.

...in fact, here is something that might be what you're looking for! https://docs.python.org/3/c-api/buffer.html#pil-style-shape-strides-and-suboffsets
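
To make the "shape, strides, and suboffsets" terminology in that section concrete, the following is a rough standard-library sketch of how a strided buffer locates an element. The helper name strided_offset is made up for this example. In a PIL-style buffer, a dimension with a non-negative suboffset stores pointers that must be dereferenced first; plain Python objects such as the bytearray here do not use that feature.

```python
def strided_offset(index, strides):
    """Byte offset of an element: the sum of index[k] * strides[k]."""
    return sum(i * s for i, s in zip(index, strides))

storage = bytearray(range(12))                       # 3 rows of 4 bytes
grid = memoryview(storage).cast('B', shape=[3, 4])   # no copy is made

# For this layout the strides are (4, 1): 4 bytes per row, 1 per column.
offset = strided_offset((2, 1), grid.strides)

# Indexing the view and indexing the raw storage reach the same byte.
same = grid[2, 1] == storage[offset]
```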

Hope that helps,
Cem Karan

Joe Kilner

Feb 20, 2015, 9:57:49 AM
to cython...@googlegroups.com
That is exactly what I'm getting at. I got it to work by using array.array as an intermediate step, but I can't find a way in PIL to get hold of an object that actually supports the buffer protocol, just code that tells me that it exists and that tells me what to do with it when I have it. I am sure that I've just not "got" something, but as there are no code examples that start with img = Image.open("x.png") and end with a buffer, I have no idea what the intermediate steps are. I have tried various sensible-looking constructs and searched the Pillow codebase for Py_buffer, and I can't find anything that even looks like the right place to start digging.
 

Cem Karan

Feb 20, 2015, 12:14:37 PM
to cython...@googlegroups.com
Have you tried asking this on any PIL-related mailing lists? At this point, I haven't a clue as to what to do either...

Thanks,
Cem Karan

Sturla Molden

Feb 20, 2015, 12:46:26 PM
to cython...@googlegroups.com
On 20/02/15 13:23, Joe Kilner wrote:

> cdef int [:, :] in_memoryview = in_image
> cdef int [:, :] out_memoryview = out_image
>
> as everything talking about the "buffer interface" says that "PIL uses
> it extensively" but without any associated code examples to show how you
> get to the buffer.

It is the old buffer interface. Try numpy.frombuffer.

PIL is basically abandonware. Use packages like scikit-image, tifffile,
OpenCV and imageio instead.


Sturla

Stefan Behnel

Feb 20, 2015, 12:52:38 PM
to cython...@googlegroups.com
Sturla Molden schrieb am 20.02.2015 um 18:46:
> PIL is basically abandonware.

Well, there's Pillow:

https://pypi.python.org/pypi/Pillow

Not sure if it supports the new buffer protocol, but it definitely runs on
Py3.x, which no longer has the old one. So it's worth at least a look.

Stefan

Sturla Molden

Feb 20, 2015, 2:22:35 PM
to cython...@googlegroups.com
On 20/02/15 18:52, Stefan Behnel wrote:
>> PIL is basically abandonware.
>
> Well, there's Pillow:
>
> https://pypi.python.org/pypi/Pillow
>
> Not sure if it supports the new buffer protocol, but it definitely runs on
> Py3.x, which no longer has the old one. So it's worth at least a look.

If it has the new buffer protocol it probably uses the suboffsets
feature, which was added to support PIL. I don't think suboffsets will
play very nicely with typed memoryviews. :-(


Sturla





Joe Kilner

Feb 21, 2015, 6:59:20 PM
to cython...@googlegroups.com


On Friday, February 20, 2015 at 7:22:35 PM UTC, Sturla Molden wrote:
> On 20/02/15 18:52, Stefan Behnel wrote:
> >> PIL is basically abandonware.
> >
> > Well, there's Pillow:
> >
> > https://pypi.python.org/pypi/Pillow
 
Thanks for all your replies. I think it is clear that what looked like it should be a simple and common operation is actually a bit of an edge case that no one else is doing, which I guess explains why there are no examples...

I should also have mentioned that I am actually using Pillow rather than PIL (which, as you say, seems to have been abandoned). Also, not depending on NumPy is a must (I already have a NumPy-based solution; I am looking for an alternative that avoids the distribution complications that NumPy introduces on Windows systems), and I am working mainly with PNGs and PSDs (so tifffile is not great).

For me the point of using Cython was that it looked like it would allow me to remove the dependency on NumPy/SciPy from my original code (which uses scipy.weave to implement the algorithm in C++). I guess a better tactic would be to look at the Pillow source, see how to use their C API, and see if I can access that directly from Cython. Or go with my original solution and live with the copying of the data.

I'll also try and see if there is a Pillow group to ask on, if I get any good answers I'll let you know.

Thanks again for the help.

Sturla Molden

Feb 22, 2015, 12:56:17 PM
to cython...@googlegroups.com
I took a quick look at the Pillow source code. The image class exports
an __array_interface__ to NumPy so you can do

from PIL import Image
import numpy as np

npim = np.array(Image.open("foobar.jpg"))

But the attribute lookup for __array_interface__ actually returns a copy
of the image by calling the .tobytes() method; it does not return the
internal PIL buffer. To further process the image with Pillow you must
use the PIL.Image.fromarray function to create a new image object.

You can access the NumPy array you create like you normally would in
Cython. Here is an example:

http://nbviewer.ipython.org/urls/dl.dropboxusercontent.com/u/12464039/lenna.ipynb


Sturla


Sturla Molden

Feb 22, 2015, 1:08:00 PM
to cython...@googlegroups.com
On 20/02/15 13:23, Joe Kilner wrote:


> Basically, is there a nice way to get at the underlying buffer
> representation of an image from PIL in Cython that avoids copying memory

From the Pillow sources it appears the answer is "no".


Sturla

Stefan Behnel

Feb 22, 2015, 1:21:52 PM
to cython...@googlegroups.com
Then I suggest that someone takes the bit of time to change that. Getting
something really simple running that just exports a 2D 32bit int buffer (or
whatever Pillow uses internally by default) should be doable within hours.

Stefan

Chris Barker - NOAA Federal

Feb 23, 2015, 8:33:27 PM
to cython...@googlegroups.com
You might look at ( or use ) the numpy asarray() functionality. I'm
pretty sure there is code in PIL that exposes an image as a buffer
numpy can use. You should be able to do the same thing without numpy.

-CHB

Chris Barker - NOAA Federal

Feb 23, 2015, 8:50:10 PM
to cython...@googlegroups.com
Sorry -- my email got out of sync -- clearly you've already explored that...

Sturla Molden

Feb 23, 2015, 9:19:55 PM
to cython...@googlegroups.com
On 24/02/15 02:33, Chris Barker - NOAA Federal wrote:

> You might look at ( or use ) the numpy asarray() functionality. I'm
> pretty sure there is code in PIL that exposes an image as a buffer
> numpy can use. You should be able to do the same thing without numpy.

Nope.

First Pillow (and PIL) makes a bytes object, then copies the internal
buffer into this bytes object. This bytes object is then exposed as a
read-only array-like object with NumPy's __array_interface__ protocol.

Since a bytes object is immutable, if you create a NumPy array with
numpy.asarray() it will be flagged read-only. Then Cython will raise a
ValueError if you try to create a typed memoryview from it. You must
therefore let NumPy take a copy of the buffer exposed by Pillow.

What Cython sees is hence a copy of a copy.
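
The read-only behaviour and the resulting double copy can be reproduced with the standard library alone. This is only a sketch: the bytearray named internal is a stand-in for Pillow's internal storage, not its real representation, and the other names are likewise made up.

```python
# Stand-in for the library's internal pixel storage.
internal = bytearray(b"\x00\x01\x02\x03")

# First copy: an immutable bytes object, as an export via bytes produces.
exported = bytes(internal)

# A view on bytes is read-only, so writes through it are rejected.
view = memoryview(exported)
readonly = view.readonly
try:
    view[0] = 255
    write_failed = False
except TypeError:
    write_failed = True

# Second copy: a writable buffer, now two steps removed from the original.
working = bytearray(exported)
memoryview(working)[0] = 255
```

After the final write, internal is unchanged, which is the practical cost of working on a copy of a copy.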

You can look in the Pillow source yourself if you don't believe me. It
is on lines 618 and 642:

https://github.com/python-pillow/Pillow/blob/master/PIL/Image.py

And if you want to confirm for yourself that Image.tobytes() actually
makes a copy of the internal buffer the machinery is in this C file:

https://github.com/python-pillow/Pillow/blob/master/libImaging/RawEncode.c

And on line 128 in this C file:

https://github.com/python-pillow/Pillow/blob/22862cee000e55a48fafc8dfbcd762fb423678a1/encode.c



Sturla

Sturla Molden

Feb 24, 2015, 10:39:05 AM
to cython...@googlegroups.com
On 22/02/15 19:21, Stefan Behnel wrote:

> Then I suggest that someone takes the bit of time to change that. Getting
> something really simple running that just exports a 2D 32bit int buffer (or
> whatever Pillow uses internally by default) should be doable within hours.

Since Image is a Python class one would have to make a C function in
libImaging that would create a memoryview instead of a copy as
RawEncode.c does. It would only be doable if Pillow internally stores
the image as a contiguous buffer (I am not sure). Otherwise one could
e.g. use a bytearray to skip the intermediate bytes object and at least
skip one of the copies. The Image class (as it is a Python class) would
need to have a method that would return this memoryview, as a memoryview
cannot be subclassed. Unfortunately it is not possible for a Python
class to implement the buffer protocol. The __array_interface__
attribute would also have to be changed to be filled in with data from
the memoryview.
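
The shape of that design can be sketched with the standard library alone. PixelStore and its view() method are hypothetical names, and a contiguous bytearray is assumed for storage, which, as noted above, may not match how Pillow actually lays out its memory.

```python
class PixelStore:
    """Hypothetical image-like class; not part of Pillow."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        # Contiguous, writable storage: one byte per pixel.
        self._data = bytearray(width * height)

    def view(self):
        """Return a writable 2D memoryview on the storage (no copy)."""
        return memoryview(self._data).cast(
            'B', shape=[self.height, self.width])


store = PixelStore(4, 3)
pixels = store.view()
pixels[2, 3] = 77          # last row, last column
last = store._data[-1]     # the write is visible in the owning object
```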

https://github.com/python-pillow/Pillow/blob/master/PIL/Image.py

https://github.com/python-pillow/Pillow/blob/22862cee000e55a48fafc8dfbcd762fb423678a1/encode.c


https://github.com/python-pillow/Pillow/blob/master/libImaging/RawEncode.c


Sturla


Stefan Behnel

Feb 24, 2015, 11:27:17 AM
to cython...@googlegroups.com
The internal memory layout is defined in the main header file of the C
library. If you can get hold of a "ImagingMemoryInstance" pointer somehow,
you can just wrap it in a memory view yourself, even from Cython code.

https://github.com/python-pillow/Pillow/blob/9b8202203aa5bd85ec89152c3d596ec8d8a56684/libImaging/Imaging.h#L78

Stefan

Joe Kilner

Feb 26, 2015, 3:16:38 PM
to cython...@googlegroups.com, stef...@behnel.de


On Tuesday, February 24, 2015 at 4:27:17 PM UTC, Stefan Behnel wrote:

> The internal memory layout is defined in the main header file of the C
> library. If you can get hold of a "ImagingMemoryInstance" pointer somehow,
> you can just wrap it in a memory view yourself, even from Cython code.
>
> https://github.com/python-pillow/Pillow/blob/9b8202203aa5bd85ec89152c3d596ec8d8a56684/libImaging/Imaging.h#L78
>
> Stefan


I managed to find a quick hack that does exactly what I need. Obviously I'd still prefer a nicer Cythonic way to do this, but as I've only just started using Cython and never touched any C extensions before there is a bit of learning for me to do before I can start writing my own plumbing to do this.

Anyway, the PyAccess module in Pillow uses CFFI to access the pixel buffers directly (apparently this is fast in PyPy). I just ported a part of what it does to Cython to directly access the buffers:

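    # Force Pillow to load the pixel data, then read the address of the
    # row-pointer table for 8-bit images.
    in_image.load()
    ptr_val = dict(in_image.im.unsafe_ptrs)['image8']
    # Cast via an integer type so that the address value itself, not the
    # Python object holding it, becomes the pointer.
    cdef unsigned char** in_val = <unsigned char**><size_t>ptr_val

    output = Image.new("L", image_size, 0)
    output.load()
    ptr_val = dict(output.im.unsafe_ptrs)['image8']
    cdef unsigned char** out_val = <unsigned char**><size_t>ptr_val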

Obviously this is unsafe pointer access and only works with images in "L" mode, but in my case that's fine. And it's worth it: my original Python implementation takes a couple of minutes to run, while the optimised, parallelised Cython code runs in less than a second!
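
For readers unfamiliar with the `unsigned char**` layout used in this hack, the following is a rough sketch using only the standard-library ctypes module; it does not touch Pillow. It builds a table of row pointers, passes its address around as a plain integer (standing in for the value read from unsafe_ptrs), and casts that integer back to a typed pointer indexed as [y][x]. All names and sizes are assumptions for illustration.

```python
import ctypes

WIDTH, HEIGHT = 4, 3

# One separately allocated buffer per row (rows need not be adjacent).
rows = [(ctypes.c_ubyte * WIDTH)() for _ in range(HEIGHT)]

# Table of row pointers, analogous to an `unsigned char**`.
RowPtr = ctypes.POINTER(ctypes.c_ubyte)
table = (RowPtr * HEIGHT)(*[ctypes.cast(r, RowPtr) for r in rows])

# The table's address as a plain integer.
address = ctypes.addressof(table)

# Turn the integer back into a typed pointer and index it as [y][x].
image8 = ctypes.cast(address, ctypes.POINTER(RowPtr))
image8[1][2] = 200     # write through the pointer
pixel = rows[1][2]     # the owning row buffer sees the write
```

As with the Cython version, nothing checks bounds or keeps the buffers alive: the sketch is only valid while rows and table are still referenced.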
 