Re: Thank you for libyuv (and feature request)

Frank Barchard

Jul 22, 2014, 7:24:56 PM
to Till Varoquaux, discuss-libyuv
Hi,
Good to hear libyuv is working out for you, and thanks for the feedback.
There is a mailing list, which I've added to this thread, and feature requests generally go into 'issues'.

Sounds like 3 requests
1. planar RGB
2. build without jpeg
3. allow overread

1. planar rgb
For planar rgb, there is a ScalePlane you could call 3 times.
There are no planar to packed conversions.  They do implement nicely in SIMD.  I've only had minor need for such a function.  The simplest would be 4 planes to/from 4 bytes packed.  That would be 2 functions.  Would that suffice?
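
To make the "4 planes to/from 4 bytes packed" idea concrete, here is a minimal scalar sketch of what the two row functions might look like. The names merge_row_4planes and split_row_4planes are hypothetical and are not libyuv functions; a real version would presumably be written in SIMD and looped over the rows of an image.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name, not a libyuv function.
 * Interleaves one row of four 8-bit planes into 4-byte packed pixels,
 * laid out in memory as p0 p1 p2 p3 for each pixel. */
void merge_row_4planes(const uint8_t* p0, const uint8_t* p1,
                       const uint8_t* p2, const uint8_t* p3,
                       uint8_t* dst, int width) {
  assert(width >= 0);
  for (int x = 0; x < width; ++x) {
    dst[4 * x + 0] = p0[x];
    dst[4 * x + 1] = p1[x];
    dst[4 * x + 2] = p2[x];
    dst[4 * x + 3] = p3[x];
  }
}

/* Hypothetical name, not a libyuv function.
 * The inverse: splits one row of 4-byte packed pixels into four planes. */
void split_row_4planes(const uint8_t* src,
                       uint8_t* p0, uint8_t* p1,
                       uint8_t* p2, uint8_t* p3, int width) {
  assert(width >= 0);
  for (int x = 0; x < width; ++x) {
    p0[x] = src[4 * x + 0];
    p1[x] = src[4 * x + 1];
    p2[x] = src[4 * x + 2];
    p3[x] = src[4 * x + 3];
  }
}
```

A caller with only three planes could pass one of them again as the fourth input and ignore the resulting byte.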

2. jpeg
I think that could be done with a GYP_DEFINE.  Could you file an issue?

3. overread
I think as is you could lie about the image size?
Overread, and especially overwrite, are dangerous.
For some functions I achieved a similar win using row coalescing.  If the bytes of the rows are contiguous, it treats the image as one row.  But it won't work for scaling.
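
Row coalescing as described above can be sketched roughly as follows. This is an illustration under assumptions, not libyuv's actual code: copy_row and copy_plane are hypothetical names, and copy_plane returns the number of row-kernel calls only so that the effect is visible.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical row kernel: copies `width` bytes. */
void copy_row(const uint8_t* src, uint8_t* dst, int width) {
  memcpy(dst, src, (size_t)width);
}

/* Hypothetical plane function. Returns how many times the row kernel ran. */
int copy_plane(const uint8_t* src, int src_stride,
               uint8_t* dst, int dst_stride, int width, int height) {
  assert(width >= 0 && height >= 0);
  if (width == 0 || height == 0) {
    return 0;
  }
  /* Coalesce rows: if neither buffer has padding between rows, the whole
   * image is one contiguous run of bytes and can be handled as one row. */
  if (src_stride == width && dst_stride == width) {
    width *= height;
    height = 1;
    src_stride = dst_stride = 0;
  }
  int calls = 0;
  for (int y = 0; y < height; ++y) {
    copy_row(src, dst, width);
    src += src_stride;
    dst += dst_stride;
    ++calls;
  }
  return calls;
}
```

With one long row, any per-row remainder handling happens once per image instead of once per row. This only suits per-pixel operations that do not depend on which row a pixel is in, which is why it does not carry over to scaling.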
For 2 pass functions, which scaling often is, I'm able to do overread/write in temporary row buffers, and I sometimes have flexibility about doing vertical or horizontal operation first.
That could be extended to copy the image a row at a time into a temporary row buffer and then overread/write that.
But the main direction I plan to go is to implement 'any' functions that can handle any width.  Conversions are done this way and it's been clean and easy enough to maintain.
Most low levels are written with overread/write in mind.  e.g. a function does 16 at a time.  If you tell it 17, it will do 32.
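
One way an 'any' wrapper could sit on top of such a 16-at-a-time function is sketched below. This is an assumption about the general shape, not libyuv's implementation: invert_row_16 is a plain C stand-in for a SIMD kernel, and both names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { kStep = 16 }; /* bytes handled per kernel iteration */

/* Stand-in for a SIMD kernel: inverts bytes in whole blocks of 16.
 * A width of 17 makes it read and write 32 bytes. */
void invert_row_16(const uint8_t* src, uint8_t* dst, int width) {
  for (int x = 0; x < width; x += kStep) {
    for (int i = 0; i < kStep; ++i) {
      dst[x + i] = (uint8_t)(255 - src[x + i]);
    }
  }
}

/* 'Any'-style wrapper: correct for any width, and it never reads or
 * writes outside the first `width` bytes of the caller's buffers. */
void invert_row_any(const uint8_t* src, uint8_t* dst, int width) {
  assert(width >= 0);
  int body = width & ~(kStep - 1); /* largest multiple of 16 <= width */
  int tail = width - body;         /* 0..15 leftover bytes */
  if (body > 0) {
    invert_row_16(src, dst, body);
  }
  if (tail > 0) {
    uint8_t tmp_in[kStep] = {0};
    uint8_t tmp_out[kStep];
    memcpy(tmp_in, src + body, (size_t)tail);
    /* The kernel rounds up to 16 here, but only inside scratch buffers. */
    invert_row_16(tmp_in, tmp_out, tail);
    memcpy(dst + body, tmp_out, (size_t)tail);
  }
}
```

The overread and overwrite still happen, but they are confined to the temporary buffers, so the caller does not need padded images.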
In the past cpus were more efficient with aligned memory instructions - e.g. movdqa.  Newer cpus can use unaligned instructions and still benefit from aligned memory but do not require it.
I'll likely move toward removing aligned and unaligned versions of each function.
So the functions will only check the cpu and width.
For Neon and SSE2 most functions do 16 pixels at a time.
For AVX2 most do 32 pixels at a time.
With upcoming AVX512, I expect it'll be 64 pixels at a time.  So you may want to align even more with that in mind.
For webcams and screens, sizes are naturally aligned.  But if it's general purpose images, then perhaps keep it 32 for now.
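
On the caller's side, keeping strides at a multiple of 32 (or 64 later) comes down to rounding up. A small sketch, with hypothetical helper names that are not part of libyuv:

```c
#include <assert.h>
#include <stdint.h>

/* Rounds `value` up to the next multiple of `alignment`.
 * `alignment` must be a power of two (16, 32, 64, ...). */
int align_up(int value, int alignment) {
  assert(value >= 0);
  assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
  return (value + alignment - 1) & ~(alignment - 1);
}

/* Returns 1 if `ptr` is a multiple of `alignment` (a power of two). */
int is_aligned(const void* ptr, int alignment) {
  assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
  return ((uintptr_t)ptr & (uintptr_t)(alignment - 1)) == 0;
}
```

For example, a 100 byte wide plane with 32 byte alignment would get a stride of align_up(100, 32), which is 128. Moving to 64 later is then a one-constant change.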



On Tue, Jul 22, 2014 at 3:19 PM, Till Varoquaux <ti...@okcupid.com> wrote:
After trying ffmpeg, libav, pixman and libgd, libyuv stood out as both the fastest library and the simplest interface for our image resizing server (at okcupid).
Libyuv is one of the main reasons why our server is so fast. Thank you.

That being said there are a small number of things that could make our life easier. Normally I would send a patch but I am way out of my depth when it comes to SIMD ASM. The full list is (in decreasing order of usefulness):

- Planar RGB support. Libjpeg uses planar RGB internally (available through the read_raw and write_raw functions). Ideally we would have conversions from Planar_rgb to (and from): I420, RGBx, xRGB.
- A working RGBx box filter. We use FFMPEG for downscales of factors greater than 2 (those are pretty rare because jpegs can be downscaled when we load them).
- Ability to build out of the chromium tree without patching. Right now we use a small patch that comments out:
        [ 'OS != "ios"', {
          'defines': [
            'HAVE_JPEG'
          ],
          'conditions': [
            # Caveat system jpeg support may not support motion jpeg
            [ 'use_system_libjpeg == 1', {
              'dependencies': [
                 '<(DEPTH)/third_party/libjpeg/libjpeg.gyp:libjpeg',
              ],
            }, {
              'dependencies': [
                 '<(DEPTH)/third_party/libjpeg_turbo/libjpeg.gyp:libjpeg',
              ],
            }],
in the gyp file. I don't really know if there's a way to check whether a variable is defined in gyp.
- Ability to allow over-reading. All of our pointers are 32-byte aligned and so are the strides; the widths of the images, however, are not. It feels like we might be leaving some easy optimizations on the table here.

If I can do anything to help getting those features in please let me know. As said previously libyuv has been a great fit for our codebase and I don't want to seem ungrateful by asking for more.

Gratefully,
Till Varoquaux

Till Varoquaux

Jul 22, 2014, 8:41:55 PM
to Frank Barchard, discuss-libyuv
Thank you. I have added myself to the mailing list.

1) The packing functions would be the single most useful set of functions. I have more need for a 3 planes to 4 bytes packed function (with the last byte being garbage), but I could pass one of the 3 planes as the 4th input plane to a 4 planes to 4 bytes function.
I wonder how much overhead we would have when converting planar RGB to YUV (I420 in our case). We would have to go planar RGB -> packed RGBx -> I420. I suspect the I420 conversion functions are not easy to write, so this overhead is probably not a big deal.
2) I just filed a request for libjpegless builds (#346).
3) It's not hard for us to adjust the alignment based upon the cpuid, so when we get avx512 on our servers it'll be an easy transition. I see that there's also an open issue for overreading/overwriting. I'll keep an eye on it.
