Scaling YUV422 Pixel format

Srujan Sriram

Sep 14, 2016, 11:39:51 AM

to discuss-libyuv

Hello,

I was going through libyuv's scaling algorithms in scale.cc, and understand that it does per plane scaling. I had a couple of questions:

Can Y, U and V also be treated as planes just the same as R, G and B would be? Meaning, Y, U and V are difference channels - would the arithmetic of doing bilinear interpolation between luma and the two chroma channels work out alright? Are there any edge cases I need to be worried about?
If the format is pixel instead of planar (that is, if the luma and chroma channels are interleaved, for example YUV422), then is this bilinear interpolation algorithm correct?
TOP ROW: Y00, U00, Y01, V00, Y02, U01, Y03, V01,
BOTTOM ROW: Y10, U10, Y11, V10, Y12, U11, Y13, V11,
Result is the interpolation of: (Y00, Y01, Y10, Y11), (U00, U01, U10, U11), (Y02, Y03, Y12, Y13), (V00, V01, V10, V11). That forms my first two YUYV pixels, 32 bits in total.

Regards

Srujan

Frank Barchard

Sep 14, 2016, 1:21:37 PM

to discuss-libyuv

On Wed, Sep 14, 2016 at 8:39 AM, Srujan Sriram <srujan...@gmail.com> wrote:

Hello,

I was going through libyuv's scaling algorithms in scale.cc, and understand that it does per plane scaling. I had a couple of questions:
Can Y, U and V also be treated as planes just the same as R, G and B would be? Meaning, Y, U and V are difference channels - would the arithmetic of doing bilinear interpolation between luma and the two chroma channels work out alright? Are there any edge cases I need to be worried about?

Yes you can scale them independently. There may be a small shift in chroma if you scale chroma differently than luma.

If the format is pixel instead of planar (that is, if the luma and chroma channels are interleaved, for example YUV422), then is this bilinear interpolation algorithm correct?
TOP ROW: Y00, U00, Y01, V00, Y02, U01, Y03, V01,
BOTTOM ROW: Y10, U10, Y11, V10, Y12, U11, Y13, V11,
Result is the interpolation of: (Y00, Y01, Y10, Y11), (U00, U01, U10, U11), (Y02, Y03, Y12, Y13), (V00, V01, V10, V11). That forms my first two YUYV pixels, 32 bits in total.

No, that would not scale correctly as is. You'd first need to convert the YUY2 to planar YUV using YUY2ToI422, then scale that, and convert back if necessary.

The only exception is a vertical-only scale. If the width remains the same and only the height is scaled, you could use ScalePlane.
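To make the planar round trip concrete, here is a minimal sketch of what splitting a packed YUY2 row into planar Y, U and V rows (and packing it back) involves. The function names `unpack_yuy2_row` and `pack_yuy2_row` are hypothetical and are not libyuv APIs; libyuv's YUY2ToI422 and I422ToYUY2 do the same job over whole images with optimized row functions. The sketch assumes an even width.

```c
#include <assert.h>
#include <stdint.h>

/* Split one packed YUY2 row (Y0 U0 Y1 V0 Y2 U1 Y3 V1 ...) into planar rows.
   width is in pixels and is assumed to be even. Hypothetical helper. */
static void unpack_yuy2_row(const uint8_t* src, int width,
                            uint8_t* y, uint8_t* u, uint8_t* v) {
  assert(width > 0 && width % 2 == 0);
  for (int i = 0; i < width / 2; ++i) {
    y[2 * i]     = src[4 * i];      /* first luma of the pair  */
    u[i]         = src[4 * i + 1];  /* shared U sample         */
    y[2 * i + 1] = src[4 * i + 2];  /* second luma of the pair */
    v[i]         = src[4 * i + 3];  /* shared V sample         */
  }
}

/* Inverse of unpack_yuy2_row: interleave planar rows back into YUY2. */
static void pack_yuy2_row(const uint8_t* y, const uint8_t* u, const uint8_t* v,
                          int width, uint8_t* dst) {
  assert(width > 0 && width % 2 == 0);
  for (int i = 0; i < width / 2; ++i) {
    dst[4 * i]     = y[2 * i];
    dst[4 * i + 1] = u[i];
    dst[4 * i + 2] = y[2 * i + 1];
    dst[4 * i + 3] = v[i];
  }
}
```

Because the split only moves bytes around, the round trip is lossless; each of the three planar rows can then be handed to a 1-byte-per-pixel scaler.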

Regards
Srujan

--
You received this message because you are subscribed to the Google Groups "discuss-libyuv" group.
To unsubscribe from this group and stop receiving emails from it, send an email to discuss-libyuv+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Srujan Sriram

Sep 14, 2016, 3:19:43 PM

to discuss-libyuv

I am not sure I understand the reasoning for the second question. Whether you convert into planar format, perform the arithmetic and convert back to pixel format OR you pick the values at the right indices without doing a conversion and then perform the arithmetic - the result ought to be the same. What am I missing? Are the interpolation indices I chose in the example wrong?

The whole point of doing a direct YUYV conversion is to avoid any conversions. A few known recommendations were to convert to RGBA, scale and then scale back.

Regards
Srujan

Srujan Sriram

Sep 14, 2016, 3:22:59 PM

to discuss-libyuv

I meant - convert back and not scale back (at the end of my previous message).

Frank Barchard

Sep 15, 2016, 1:26:32 PM

to discuss-libyuv

On Wed, Sep 14, 2016 at 12:19 PM, Srujan Sriram <srujan...@gmail.com> wrote:

I am not sure I understand the reasoning for the second question. Whether you convert into planar format, perform the arithmetic and convert back to pixel format OR you pick the values at the right indices without doing a conversion and then perform the arithmetic - the result ought to be the same. What am I missing? Are the interpolation indices I chose in the example wrong?

Bilinear scaling using ScalePlane directly on YUY2 pixels won't work.

Bilinear steps through the source and filters between what it thinks are the closest 4 pixels. ScalePlane expects 1 byte per pixel, and bilinear interpolates between the 4 pixels of a 2x2 block, expecting them all to be from the same channel, e.g. Y. But in YUY2 you'd have a 2x2 that looks like this:

YU
YU

and it would interpolate between the Y and U channels.

In your example

TOP ROW: Y00, U00, Y01, V00, Y02, U01, Y03, V01,
BOTTOM ROW: Y10, U10, Y11, V10, Y12, U11, Y13, V11,

If scaling to half size with bilinear, you would get

(Y00+U00+Y10+U10)/4, (Y01+V00+Y11+V10)/4, (Y02+U01+Y12+U11)/4, (Y03+V01+Y13+V11)/4,

which is incorrect.
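The channel mixing can be reproduced with a few lines of C. This is only an illustrative sketch: `box_half_1bpp` is a hypothetical half-size 2x2 box filter that assumes 1 byte per pixel, not libyuv's actual row function, and it truncates rather than rounds. Feeding it two packed YUY2 rows shows each output byte averaging luma with chroma.

```c
#include <assert.h>
#include <stdint.h>

/* Half-size 2x2 box filter for a single 1-byte-per-pixel plane.
   src_width is the source row length in bytes and must be even.
   Hypothetical helper; the division truncates. */
static void box_half_1bpp(const uint8_t* top, const uint8_t* bottom,
                          int src_width, uint8_t* dst) {
  assert(src_width > 0 && src_width % 2 == 0);
  for (int i = 0; i < src_width / 2; ++i) {
    int sum = top[2 * i] + top[2 * i + 1] + bottom[2 * i] + bottom[2 * i + 1];
    dst[i] = (uint8_t)(sum / 4);
  }
}
```

With a top row of `10, 100, 20, 200, ...` (Y00, U00, Y01, V00, ...) and a bottom row of `12, 102, 22, 202, ...`, the first output byte is (10 + 100 + 12 + 102) / 4 = 56, which is neither a plausible luma nor a plausible chroma value.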

To make it work you'd need a different version of ScalePlane that expects the Y channel to have a pixel stride of 2 (2 bytes between adjacent Y pixels), and another for the U and V channels that expects a pixel stride of 4. This would be a non-trivial amount of work to implement.

The whole point of doing a direct YUYV conversion is to avoid any conversions. A few known recommendations were to convert to RGBA, scale and then scale back.

Converting with YUY2ToARGB is slow and lossy. It would be much faster to convert with YUY2ToI422 and back.

Benchmark on Sandy Bridge (HP Z620) for 1000 images of 1280x720

YUY2ToARGB_Opt (498 ms)

ARGBToYUY2_Opt (750 ms)

YUY2ToI422_Opt (293 ms)

I422ToYUY2_Opt (181 ms)

Srujan Sriram

Sep 15, 2016, 3:06:15 PM

to discuss...@googlegroups.com

Great, I think this is exactly what I was saying - that it is possible to perform bilinear interpolation by jumping to the next byte of the same channel. That is, if we are interpolating luma - then jump from Y00 to the next luma byte, which is Y01 - as I had depicted in my example - likewise with Chroma - jump to next relevant byte and interpolate. Agreed that a different algorithm needs to be written - I never intended for ScalePlane to already do this. The new algorithm seems straight-forward to implement, although to optimize (vectorize) would be difficult.

Thanks for the feedback, and keep the thoughts flowing...

Frank Barchard

Sep 15, 2016, 6:59:37 PM

to discuss-libyuv

On Thu, Sep 15, 2016 at 12:06 PM, Srujan Sriram <srujan...@gmail.com> wrote:

Great, I think this is exactly what I was saying - that it is possible to perform bilinear interpolation by jumping to the next byte of the same channel. That is, if we are interpolating luma - then jump from Y00 to the next luma byte, which is Y01 - as I had depicted in my example - likewise with Chroma - jump to next relevant byte and interpolate. Agreed that a different algorithm needs to be written - I never intended for ScalePlane to already do this. The new algorithm seems straight-forward to implement, although to optimize (vectorize) would be difficult.

Thanks for the feedback, and keep the thoughts flowing...

OK, then the way I've implemented bilinear is in 2 passes: columns and rows. This keeps the code simpler and in some ways faster. The column code is much slower, so depending on whether you are scaling up or down, I do one or the other first. When scaling up, I would scale the columns first, into a buffer.

Then scale vertically from that buffer and the previous buffer multiple times, until the next source row. E.g. if scaling up by 10x, ScaleFilterCols_C would be called once, and then InterpolateRow() called 10 times, once for each destination row, but using the same source row buffers.

The row code wouldn't need to change.
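One way to see why the row pass is format-agnostic: vertical interpolation blends byte i of one row with byte i of the next, and in YUY2 those two bytes always belong to the same channel. The sketch below uses a hypothetical `interpolate_rows` helper with an 8-bit fraction (0 to 256); libyuv's InterpolateRow has a different signature and its own rounding, so treat this only as an illustration of the idea.

```c
#include <assert.h>
#include <stdint.h>

/* Blend two rows byte by byte. frac is in 1/256 units:
   0 returns row0, 256 returns row1. Hypothetical helper. */
static void interpolate_rows(uint8_t* dst, const uint8_t* row0,
                             const uint8_t* row1, int width_bytes, int frac) {
  assert(frac >= 0 && frac <= 256);
  for (int i = 0; i < width_bytes; ++i) {
    /* Weighted average with rounding; the sum fits easily in an int. */
    dst[i] = (uint8_t)((row0[i] * (256 - frac) + row1[i] * frac + 128) >> 8);
  }
}
```

For a packed YUY2 row of `width` pixels, `width_bytes` would be `width * 2`. When scaling up by 10x, the same two row buffers would be passed ten times with `frac` stepping through 0, 25, 51, and so on (roughly `k * 256 / 10`).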

The column code looks like this:

void ScaleFilterCols_C(uint8* dst_ptr, const uint8* src_ptr,
                       int dst_width, int x, int dx) {
  int j;
  // x and dx are 16.16 fixed point: the high 16 bits select the source
  // pixel, the low 16 bits are the blend fraction passed to BLENDER.
  for (j = 0; j < dst_width - 1; j += 2) {
    int xi = x >> 16;
    int a = src_ptr[xi];
    int b = src_ptr[xi + 1];
    dst_ptr[0] = BLENDER(a, b, x & 0xffff);
    x += dx;
    xi = x >> 16;
    a = src_ptr[xi];
    b = src_ptr[xi + 1];
    dst_ptr[1] = BLENDER(a, b, x & 0xffff);
    x += dx;
    dst_ptr += 2;
  }
  // Handle the last pixel when dst_width is odd.
  if (dst_width & 1) {
    int xi = x >> 16;
    int a = src_ptr[xi];
    int b = src_ptr[xi + 1];
    dst_ptr[0] = BLENDER(a, b, x & 0xffff);
  }
}

Instead of +1 you would add +2 for luma or +4 for chroma, and either advance x differently or scale the index by 2. Let's say scale for now, so for luma it would look like this:

void ScaleFilterCols_Step2_C(uint8* dst_ptr, const uint8* src_ptr,
                             int dst_width, int x, int dx) {
  int j;
  // Same as ScaleFilterCols_C, but adjacent source samples are 2 bytes
  // apart, as luma is in YUY2.
  for (j = 0; j < dst_width - 1; j += 2) {
    int xi = (x >> 16) * 2;
    int a = src_ptr[xi];
    int b = src_ptr[xi + 2];
    dst_ptr[0] = BLENDER(a, b, x & 0xffff);
    x += dx;
    xi = (x >> 16) * 2;
    a = src_ptr[xi];
    b = src_ptr[xi + 2];
    dst_ptr[1] = BLENDER(a, b, x & 0xffff);
    x += dx;
    dst_ptr += 2;
  }
  // Handle the last pixel when dst_width is odd.
  if (dst_width & 1) {
    int xi = (x >> 16) * 2;
    int a = src_ptr[xi];
    int b = src_ptr[xi + 2];
    dst_ptr[0] = BLENDER(a, b, x & 0xffff);
  }
}
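The stride idea generalizes to one routine that takes the byte distance between samples as a parameter, so the same loop can read Y (stride 2) or U and V (stride 4) straight out of a packed row. The sketch below is an assumption-laden illustration, not libyuv code: `blend` and `scale_cols_strided` are hypothetical names, the blend formula stands in for BLENDER (whose definition is not shown in this thread), and the clamp on the second sample is added here so the last pixel never reads past the row.

```c
#include <assert.h>
#include <stdint.h>

/* Linear blend of a and b with a 16-bit fraction f (0..65535), rounded.
   Assumed stand-in for BLENDER. All terms are non-negative. */
static uint8_t blend(int a, int b, int f) {
  return (uint8_t)((a * (65536 - f) + b * f + 32768) >> 16);
}

/* Horizontal linear scale of one channel stored with a byte stride.
   src points at the first sample of the channel, src_count is the number
   of samples in that channel, x and dx are 16.16 fixed point. */
static void scale_cols_strided(uint8_t* dst, const uint8_t* src, int src_count,
                               int stride, int dst_count, int x, int dx) {
  assert(src_count > 0 && stride > 0 && x >= 0 && dx >= 0);
  for (int j = 0; j < dst_count; ++j) {
    int i0 = x >> 16;
    if (i0 > src_count - 1) i0 = src_count - 1;       /* clamp left sample  */
    int i1 = (i0 + 1 < src_count) ? i0 + 1 : i0;      /* clamp right sample */
    dst[j] = blend(src[i0 * stride], src[i1 * stride], x & 0xffff);
    x += dx;
  }
}
```

For a YUY2 row `row` of `width` pixels, the three calls would use `row` with stride 2 and `width` samples for Y, `row + 1` with stride 4 and `width / 2` samples for U, and `row + 3` with stride 4 and `width / 2` samples for V. The output here is planar; writing packed output would need a strided destination as well.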

Srujan Sriram

Sep 16, 2016, 12:35:56 PM

to discuss...@googlegroups.com

You had me convinced to do the conversion to YV12, scale and then back-convert, mostly because I already have super efficient implementations of scaling grayscale images - all that would mean is that I could simply use those existing implementations to scale Y,U,V independently.

The benefit I am seeing with scaling YUV422 directly, as you have shown, is that data needs to be brought into cache just once. Thanks for your feedback.

Frank Barchard

Sep 16, 2016, 2:02:55 PM

to discuss-libyuv

On Fri, Sep 16, 2016 at 9:35 AM, Srujan Sriram <srujan...@gmail.com> wrote:

You had me convinced to do the conversion to YV12, scale and then back-convert, mostly because I already have super efficient implementations of scaling grayscale images - all that would mean is that I could simply use those existing implementations to scale Y,U,V independently.

YUY2 is a 422 packed pixel format, so you might want to convert to YV16 (aka I422), which makes it planar losslessly, using YUY2ToI422().

ScalePlane is a grayscale function. It is highly optimized for bilinear, and especially for the specific scale-down factors of 3/4, 1/2, 3/8, and 1/4.

So give it a try and see if it's faster than your version.

The benefit I am seeing with scaling YUV422 directly, as you have shown, is that data needs to be brought into cache just once. Thanks for your feedback.

Srujan Sriram

Sep 16, 2016, 2:09:37 PM

to discuss...@googlegroups.com

Noted. Thanks, I will try out yours as well. I too have broken-down implementations for recognized scale factors, but do not use the column/row based approach; maybe yours is faster after all :)
