Go vs PIL performance

Alexei Sholik

Sep 6, 2012, 8:21:46 AM
to golan...@googlegroups.com
Hi.

I've recently done a performance comparison between Go's image package and Python Imaging Library by porting a simple Python script that reads an RGBA image from PNG and converts it to grayscale.

PIL provides a higher level interface than Go's image package. In Python, the conversion is performed like this:

    image = Image.open(filename)
    assert image.mode == 'RGBA'
    new_image = image.convert('L')


In Go, I tried two alternative approaches:

    1) use image/draw.Draw()
    2) copy each pixel value in a loop

Here's a glimpse at the timings:

    $ python to_grayscale.py
    Image open took 2.87699699402 ms
    Converting the image took 3.58319282532 ms
    Image save took 24.7330665588 ms
    ---
    Total time: 31.1932563782 ms

    $ go run to_grayscale.go
    Image decode took 17 ms
    Creating an empty image took 0 ms
    Converting pixels took 21 ms
    Image save took 31 ms
    ---
    Total time: 69 ms
    
So far, the timings Go is showing are worse than I expected. Here's a repo with the code for both versions and a test image -> https://github.com/alco/go_pil

Is there a more performant way to do this in Go? I should say that the package docs badly lack examples. There was a post about image/draw[1], and it also used the Draw function from image/draw.

I'm not sure if Python's time() function offers enough precision (under OS X), but it's the best one I've found.

---

On a side note, there's been another thread on the list[2] where it turned out that image/png could be optimized quite easily, but no one had bothered to do this before.

I'm curious to hear your opinions about practicability of repeating the develop-debug-optimize cycle all over again for the functionality that already exists in other languages. PIL has been around since 1995, its authors had plenty of time to optimize it and implement most of the commonly used features. And for PNG decoding there's libpng and other more recent libraries. Both are written in C and have been field-tested for years.

What's the reasoning behind reimplementing this in Go? Language purity? Security? I'll be happy to hear your opinions, guys.

Thanks!

---

Rob Pike

Sep 6, 2012, 11:30:16 AM
to Alexei Sholik, golan...@googlegroups.com
Your code uses the Image interface and its At method, which is a
general-purpose interface with much overhead. For higher throughput,
it should use the implementation directly, if it's known, such as
NRGBA or whatever. That will make a big difference. Also, you're right
that there's much work to do in making the image libraries better.
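
A minimal sketch of the difference described above, assuming the decoded image turns out to be an *image.NRGBA. The function names and the sample image are hypothetical and are not taken from the go_pil repository. The generic version pays for a dynamic At call on every pixel; the direct version reads the Pix slice of the concrete type.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
)

// countOpaqueGeneric goes through the image.Image interface. Every At call
// returns a color.Color, so each pixel costs a dynamic method call.
func countOpaqueGeneric(m image.Image) int {
	n := 0
	b := m.Bounds()
	for y := b.Min.Y; y < b.Max.Y; y++ {
		for x := b.Min.X; x < b.Max.X; x++ {
			// RGBA returns 16-bit channels; 0xffff means fully opaque.
			if _, _, _, a := m.At(x, y).RGBA(); a == 0xffff {
				n++
			}
		}
	}
	return n
}

// countOpaqueDirect reads the alpha bytes straight out of Pix.
// NRGBA stores 4 bytes per pixel in R, G, B, A order.
func countOpaqueDirect(m *image.NRGBA) int {
	n := 0
	b := m.Bounds()
	for y := b.Min.Y; y < b.Max.Y; y++ {
		row := m.Pix[m.PixOffset(b.Min.X, y):]
		for x := 0; x < b.Dx(); x++ {
			if row[x*4+3] == 0xff {
				n++
			}
		}
	}
	return n
}

// countOpaque uses the direct path when the concrete type is known
// and falls back to the generic interface otherwise.
func countOpaque(m image.Image) int {
	if nrgba, ok := m.(*image.NRGBA); ok {
		return countOpaqueDirect(nrgba)
	}
	return countOpaqueGeneric(m)
}

// sampleNRGBA builds a tiny hypothetical test image: two opaque pixels,
// one half-transparent pixel, the rest fully transparent.
func sampleNRGBA() *image.NRGBA {
	m := image.NewNRGBA(image.Rect(0, 0, 4, 2))
	m.SetNRGBA(1, 0, color.NRGBA{R: 10, G: 20, B: 30, A: 255})
	m.SetNRGBA(3, 1, color.NRGBA{A: 255})
	m.SetNRGBA(2, 1, color.NRGBA{R: 1, G: 2, B: 3, A: 128})
	return m
}

func main() {
	m := sampleNRGBA()
	fmt.Println(countOpaqueGeneric(m), countOpaqueDirect(m)) // prints "2 2"
}
```

Both functions return the same answer; only the cost per pixel differs, which is why the type assertion in countOpaque is worth trying before falling back to At.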

Your "why bother?" question is a good one. There are many answers.

First, with cgo one could use PIL or some other library of course, and
there are situations where that approach is right. Go does not prevent
you using a library in another language.

The main reason for doing things again is to make them Go native,
native in the sense of fitting into the ecosystem well, building on
the libraries and interfaces that are already there, and feeding ideas
and insights back into the libraries and community.

But of course there's also the "why not?" retort: If one just accepts
that everything that needs to exist already does in its finest form,
there's no reason to do anything new (including designing a new
programming language). But in doing things again, one learns.
Designing a native package provides insight into how things should be
put together. For the particular problem you're working on,
implementing things that require performance encourages
performance-driven work. If every time a loop is slow we use an FFI
and the C compiler, we'll never find out where the Go compilers (and
libraries and runtime) could be made better in ways that would make
all programs faster.

Go's implementation is getting faster all the time. We've seen massive
improvements this year; relative to Go 1 the implementation at tip is
seeing benchmarks that are often 50% or more faster, sometimes much
more. We'll be releasing those improvements soon.

In the meantime, please keep pushing and poking at the code and giving
feedback and learning how to make things run fast. It benefits all of
us.

-rob

MarkM

Sep 6, 2012, 12:13:08 PM
to golan...@googlegroups.com
Well, on the "Why bother" front. I compiled the Go version of your converter on my Ubuntu box and uploaded it to our old POS RHEL4 box sitting in the corner. It ran fine.

But the python script:

Traceback (most recent call last):
  File "to_grayscale.py", line 4, in ?
    from PIL import Image
ImportError: No module named PIL

 
I'd really rather not have to try to install the needed deps on a machine that was built in 2008, isn't supported anymore, and no one at the company knows how anything built on it works. It's one of those: you back it up, don't touch it and hope it doesn't crash before it gets phased out things.

Deployment and running code in the wild is a big big problem for us system guys. Go is a dream for that. It's a daily pain for me that some of the software I depend on, like Puppet which is written in Ruby, I just can't use on certain systems because I can't screw around with adding the needed dependencies in fear of breaking something else.

And as a developer wouldn't it be nice to just be able to ship 32 bit / 64 bit binaries and not have to deal with "Oh, you need to install X, plus I'm not sure it'll work with Python version Y, and yeah on Linux version Z they split that library into A and B so you'll need both".

MarkM

Sep 6, 2012, 1:22:28 PM
to golan...@googlegroups.com
Eh, also the Python version on my machine isn't changing the image to grayscale.



So assuming it's working okay on your machine, then something is screwy on mine. Python version, image library version?? It doesn't throw any errors when it runs.

mmealman@marklaptop:~/src/go_pil$ python to_grayscale.py
Image open took 2.14600563049 ms
Converting the image took 4.40096855164 ms
Image save took 22.7069854736 ms
---
Total time: 29.2539596558 ms

mmealman@marklaptop:~/src/go_pil$ md5sum py_output.png
34bf25217bcb64bff2c68e14e23f4d1c  py_output.png

mmealman@marklaptop:~/src/go_pil$ ls -l *png
-rw-rw-r-- 1 mmealman mmealman 16323 Sep  6 13:08 go_output.png
-rw-rw-r-- 1 mmealman mmealman 38195 Sep  6 11:44 image.png
-rw-rw-r-- 1 mmealman mmealman 38332 Sep  6 13:18 py_output.png


This reminds me of the joke at my last job which was a Ruby shop. We'd always pass around the "Works on my machine" image when someone complained about something being broken.

Alexei Sholik

Sep 6, 2012, 2:56:57 PM
to MarkM, r...@golang.org, golan...@googlegroups.com
Thanks for sharing your thoughts, Rob. As for using direct pixel access instead of the At() function, I hadn't found a way to do that. But then, after reading your reply I realized that I could simply look at the image/draw code and see how it handles the fast paths. I'm guilty of not having thought about it before, but now I understand your point about learning and making the code better more clearly. Thanks.

Mark, it's a silly bug on my part. I was saving the original image instead of the grayscale one. Now image.save works even faster. Thanks for pointing this out.

I've updated the code and timings over at https://github.com/alco/go_pil. Here are the new timings from my machine. Now it is clear where Go performs well and where it still lacks proper optimizations. PIL doesn't start decoding the image until it needs pixel data, so I'd say that it still outperforms Go significantly when decoding and encoding PNG. But accessing pixel data in Go is now as fast as it should be.

    $ go run to_grayscale.go
    Image decode took 17.162 ms
    Creating an empty image took 0.049 ms
    Converting pixels took 0.609 ms
    Image save took 35.479 ms
    ---
    Total time: 53.299 ms

    $ python to_grayscale.py 
    Image open took 2.86508 ms
    Converting the image took 5.26881 ms
    Image save took 7.88808 ms
    ---
    Total time: 16.022 ms


--
Best regards
Alexei Sholik

Nigel Tao

Sep 6, 2012, 8:45:16 PM
to Alexei Sholik, golan...@googlegroups.com
On 6 September 2012 22:21, Alexei Sholik <alcos...@gmail.com> wrote:
> So far, the timings Go is showing are worse than I expected. Here's a repo
> with the code for both versions and a test image ->
> https://github.com/alco/go_pil

The PNG decoder is significantly faster in Go tip when compared to Go 1.0.2:
http://codereview.appspot.com/6127051
http://codereview.appspot.com/6127064/
http://codereview.appspot.com/6242056/
http://codereview.appspot.com/6251044/
I haven't measured the net effect but I imagine that it's around 2x
faster. There may be even more improvements possible. The next step is
probably to optimize Go's flate performance, whether in the
compress/flate/*.go code or in the compiler, but I didn't have time to
get around to doing either.

The PNG encoder has not been optimized yet, and I wouldn't be
surprised if there was similar low-hanging performance fruit.

As for writing the explicit loop versus using image/draw, I would
rather that you used image/draw. That package has some code fast-paths
for a number of common operations (such as blitting RGBAs over each
other), but it currently does not have a fast path to convert to an
image.Gray. That's simply a bug.
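
A minimal sketch of the image/draw route, for reference. The helper name toGray and the two-pixel sample image are hypothetical; draw.Draw, draw.Src and image.NewGray are the standard library calls. Whether this hits a fast path depends on the Go version, as noted above.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
	"image/draw"
)

// toGray converts any image.Image to an *image.Gray by letting
// image/draw do the color model conversion for every pixel.
func toGray(src image.Image) *image.Gray {
	b := src.Bounds()
	dst := image.NewGray(b)
	// draw.Src copies the source over the destination without blending.
	draw.Draw(dst, b, src, b.Min, draw.Src)
	return dst
}

// sampleNRGBA is a hypothetical 2x1 image: opaque white, then opaque black.
func sampleNRGBA() *image.NRGBA {
	m := image.NewNRGBA(image.Rect(0, 0, 2, 1))
	m.SetNRGBA(0, 0, color.NRGBA{R: 255, G: 255, B: 255, A: 255})
	m.SetNRGBA(1, 0, color.NRGBA{A: 255})
	return m
}

func main() {
	g := toGray(sampleNRGBA())
	fmt.Println(g.GrayAt(0, 0).Y, g.GrayAt(1, 0).Y) // prints "255 0"
}
```

The appeal of this version is that it works for any source type and picks up future fast paths in image/draw without code changes.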

BTW, your explicit loop has an image.Gray dst and image.NRGBA src and does:
dst.Pix[i] = rgba.Pix[i*4 + 3]
which sets the destination gray value to the source's alpha value.
This translates both fully opaque black and fully opaque white to 100%
white, which might 'work' for your test image, but seems incorrect in
general.

Your saveImage function writes directly to a file. You may or may not
get better numbers if you wrap that file in a bufio.Writer (don't
forget to Flush it), to avoid lots of little writes to the file
system. This is arguably a bug in package image/png.
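
A minimal sketch of the buffered save, assuming a saveImage helper that takes a path and an image (the real signature in the repository may differ). encodeBuffered, encodeToBytes and sampleGray are hypothetical helpers added for illustration.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"image"
	"image/png"
	"io"
	"os"
)

// encodeBuffered wraps w in a bufio.Writer so the encoder's many small
// writes are batched, then flushes whatever is still in the buffer.
func encodeBuffered(w io.Writer, m image.Image) error {
	bw := bufio.NewWriter(w)
	if err := png.Encode(bw, m); err != nil {
		return err
	}
	return bw.Flush() // without this the tail of the output is lost
}

// saveImage writes m to path as PNG through the buffered encoder.
// The name mirrors the function discussed above; the signature is assumed.
func saveImage(path string, m image.Image) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	if err := encodeBuffered(f, m); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// encodeToBytes exercises the same code path without touching the disk.
func encodeToBytes(m image.Image) ([]byte, error) {
	var buf bytes.Buffer
	if err := encodeBuffered(&buf, m); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// sampleGray is a hypothetical blank 8x8 grayscale image.
func sampleGray() *image.Gray {
	return image.NewGray(image.Rect(0, 0, 8, 8))
}

func main() {
	data, err := encodeToBytes(sampleGray())
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	// Bytes 1..3 of a PNG file spell "PNG".
	fmt.Println(len(data) > 0, string(data[1:4]))
}
```

Calling saveImage("out.png", img) would write the file the same way. Whether the buffering changes the timings depends on the OS and file system, so it is worth measuring both variants.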

As for why not use libpng, others have already given some reasons, but
another is that Go is a memory-safe language: slice accesses are
bounds-checked. A native Go PNG decoder is not as susceptible to
buffer overflow attacks as libpng. Even though libpng has had over 15
years of develop-debug-optimize cycles, I note that
http://libpng.org/pub/png/libpng.html has still issued two "serious
vulnerability" warnings in this year alone about "the possibility of
execution of hostile code".

Alexei Sholik

Sep 7, 2012, 1:37:16 AM
to Nigel Tao, je...@tomahawk-player.org, golan...@googlegroups.com
> Can you tell us which interpreter (Python) and compiler (Go) you used?

> For instance, Python via PyPy is likely to be significantly faster
> than Python via CPython.

> Likewise, Go compiled with gccgo is likely to be significantly faster
> than Go compiled with its own compiler.

Jeff, I don't think PyPy would have significant effect on the performance. Most of the Image module is C already. As for Go, I used its bundled compiler (I believe it's 6g), that's what most of the users should care about.

Nigel, thanks for your comments, I will also take a look at the tip for any future comparisons. You are right that my conversion algorithm works only for this specific case. Doing a proper rgba -> grayscale transformation is a little slower, but still an order of magnitude faster than image/draw. Although I would also like to use image/draw as a higher level approach, it can't have a fast path for every possible operation a user wants. If I used it blindly and didn't compare performance, it could bite me later, say in a running production system.

> Your saveImage function writes directly to a file. You may or may not
> get better numbers if you wrap that file in a bufio.Writer (don't
> forget to Flush it), to avoid lots of little writes to the file
> system. This is arguably a bug in package image/png.

This is an interesting point. Wrapping the file in a bufio.Writer doesn't improve the timings for me (even when initializing the writer outside of the timed section). I think this is a difficult thing to measure, because the HDD has a cache, the OS can have its own buffer, and the File object itself might have a buffer. The docs say "File represents an open file descriptor", and most OSes don't guarantee immediate writes to disk, so Go could as well choose to buffer writes on its side if it made any sense. All in all, this should have a net effect of far less performance hit than the encoding algorithm itself.

Alexei Sholik

Sep 7, 2012, 4:01:27 PM
to Matt Harden, golan...@googlegroups.com, MarkM, r...@golang.org
You are right, Matt. This has already been mentioned here, but thanks for bringing this up again. I have fixed the algorithm and run another benchmark using Go tip.

I've also mentioned that PIL doesn't decode the image data until it's needed. I have now updated the python version to benchmark the decode step separately. So now everything fits into a nice table:

+-------------------+------------+------------+------------+
|                   | Python, ms | Go, ms     | Go tip, ms |
+===================+============+============+============+
| Image decode      |       3.13 |      15.61 |       6.81 |
| RGBA -> Grayscale |       0.88 |       2.70 |       1.54 |
| Image save        |       6.55 |      33.89 |      24.26 |
+-------------------+------------+------------+------------+
| Total             |      10.56 |      57.50 |      32.65 |
+-------------------+------------+------------+------------+

See updated repository (and properly formatted table) at https://github.com/alco/go_pil

On 7 September 2012 17:00, Matt Harden <matt....@gmail.com> wrote:
> Your convertLoop is naively taking the alpha channel from the NRGBA pixels and setting that as the grayscale value. PIL and Go's draw.Draw use the ITU-R 601-2 luma transform:
>     L = R * 299/1000 + G * 587/1000 + B * 114/1000
> And since the RGB data is not alpha premultiplied you need to multiply by A as well. But even with that transform I suspect the Go code can be faster for the conversion step than PIL.
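
A minimal sketch of the transform described in the quoted message, written as a direct loop over Pix. The function name nrgbaToGray and the sample pixels are hypothetical, and the rounding is one possible choice: PIL and draw.Draw may differ by one gray level on some inputs.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
)

// nrgbaToGray applies L = (299*R + 587*G + 114*B) / 1000 to every pixel
// and then scales by alpha, because NRGBA is not alpha-premultiplied.
func nrgbaToGray(src *image.NRGBA) *image.Gray {
	b := src.Bounds()
	dst := image.NewGray(b)
	for y := 0; y < b.Dy(); y++ {
		// Pix is indexed relative to the image's own top-left corner.
		s := src.Pix[y*src.Stride : y*src.Stride+b.Dx()*4]
		d := dst.Pix[y*dst.Stride : y*dst.Stride+b.Dx()]
		for x := range d {
			r := uint32(s[x*4])
			g := uint32(s[x*4+1])
			bl := uint32(s[x*4+2])
			a := uint32(s[x*4+3])
			luma := (299*r + 587*g + 114*bl + 500) / 1000 // rounded, 0..255
			d[x] = uint8((luma*a + 127) / 255)            // multiply by alpha
		}
	}
	return dst
}

// sampleNRGBA is a hypothetical 4x1 image: opaque white, opaque black,
// opaque red, and white at alpha 128.
func sampleNRGBA() *image.NRGBA {
	m := image.NewNRGBA(image.Rect(0, 0, 4, 1))
	m.SetNRGBA(0, 0, color.NRGBA{R: 255, G: 255, B: 255, A: 255})
	m.SetNRGBA(1, 0, color.NRGBA{A: 255})
	m.SetNRGBA(2, 0, color.NRGBA{R: 255, A: 255})
	m.SetNRGBA(3, 0, color.NRGBA{R: 255, G: 255, B: 255, A: 128})
	return m
}

func main() {
	fmt.Println(nrgbaToGray(sampleNRGBA()).Pix) // prints "[255 0 76 128]"
}
```

Unlike a loop that copies only the alpha byte, opaque black maps to 0 here and opaque white to 255.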

Alexei Sholik

Sep 7, 2012, 4:53:58 PM
to Matt Harden, golan...@googlegroups.com, MarkM, r...@golang.org
The writer.go for image/png certainly has space for improvements. After a brief look at the code I found that it is using the slow At() method in its innermost loops.

Take a look at this loop http://golang.org/src/pkg/image/png/writer.go#L305. When I define the following variable above, before the outermost loop

    grayImg := m.(*image.Gray)

and replace the code at the linked line with

    cr[0][i] = grayImg.Pix[y * grayImg.Stride + x]

I get from 15 ms down to 0.5 ms on average for this particular section of code. The test image is the same one I've been using all this time. 

Obviously, implementing similar optimizations for all cases will make the code less general and more special-casey, but I think this is the way to go and it will have to happen eventually.
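
A minimal sketch of the special-casing idea, kept outside of image/png: a type switch takes the fast path for *image.Gray and falls back to At() for everything else. grayRow, hidden and sampleGray are hypothetical names; this is not the code from writer.go.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
)

// grayRow copies row y of m into dst as 8-bit gray values.
// dst must hold at least m.Bounds().Dx() bytes.
func grayRow(m image.Image, y int, dst []byte) {
	b := m.Bounds()
	switch img := m.(type) {
	case *image.Gray:
		// Fast path: the row is already a contiguous run of bytes in Pix.
		off := img.PixOffset(b.Min.X, y)
		copy(dst, img.Pix[off:off+b.Dx()])
	default:
		// General path: one At call and one color conversion per pixel.
		for x := b.Min.X; x < b.Max.X; x++ {
			dst[x-b.Min.X] = color.GrayModel.Convert(m.At(x, y)).(color.Gray).Y
		}
	}
}

// hidden wraps an image so the type switch no longer sees *image.Gray.
// It exists only to exercise the general path in this sketch.
type hidden struct{ image.Image }

// sampleGray is a hypothetical 4x2 image with pixel values 0, 30, 60, ...
func sampleGray() *image.Gray {
	g := image.NewGray(image.Rect(0, 0, 4, 2))
	for i := range g.Pix {
		g.Pix[i] = uint8(i * 30)
	}
	return g
}

func main() {
	g := sampleGray()
	fast := make([]byte, 4)
	slow := make([]byte, 4)
	grayRow(g, 1, fast)
	grayRow(hidden{g}, 1, slow)
	fmt.Println(fast, slow) // prints "[120 150 180 210] [120 150 180 210]"
}
```

The general path stays in place as the default case, so adding a fast path for one concrete type does not change behavior for the others.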

Nigel Tao

Sep 10, 2012, 1:42:32 AM
to Alexei Sholik, Matt Harden, golan...@googlegroups.com, MarkM, r...@golang.org
On 8 September 2012 06:53, Alexei Sholik <alcos...@gmail.com> wrote:
> I get from 15 ms down to 0.5 ms on average for this particular section of
> code. The test image is the same one I've been using all this time.

15ms down to 0.5ms sounds almost too good to be true. Does "go test
image/png" pass after that change?

Even so, it's easy to add an image.Gray fast-path to PNG encoding.
With http://codereview.appspot.com/6490099, I get

Before:
--------
$ go run to_grayscale.go
Image decode took 9.113 ms
Creating an empty image took 0.008 ms
Drawing the image took 33.127 ms
Image save took 26.65 ms
---
Total time: 68.898 ms

Image decode took 7.342 ms
Creating an empty image took 0.038 ms
Converting pixels took 1.804 ms
Image save took 27.393 ms
---
Total time: 36.577 ms
--------

After:
--------
$ go run to_grayscale.go
Image decode took 7.198 ms
Creating an empty image took 0.005 ms
Drawing the image took 32.623 ms
Image save took 12.071 ms
---
Total time: 51.897 ms

Image decode took 7.304 ms
Creating an empty image took 0.041 ms
Converting pixels took 1.804 ms
Image save took 12.366 ms
---
Total time: 21.515 ms
--------

"Image save" drops from 27ms to 12ms.