Failure on large images

Gordon

unread,

Dec 19, 2013, 6:47:30 PM12/19/13

to discuss...@googlegroups.com

I'm using libyuv for processing JPEG images and had some problems which I wanted to give some feedback on. I'm developing on an Samsung galaxy S3 (arm v7 with neon). I was using version 877 but I didn't see any changes in the code that was causing me problems.

1) For scaling and color conversion I was getting failures because the images I was using was exceeding the size limitation of kMaxStride (default of 1920 pixels). After checking the code I see the value is used to create a temp row array for processing. The array is getting declared as a local variable which is allocated from the stack. Increasing the size eventually led to stack overflows. I'm not sure why its trying to a create 2-4K byte array on stack so I switched the code to use malloc and it works OK for the functions I'm using. Example of the changes:

//SIMD_ALIGNED(uint16 row[kMaxStride]); // original code
#define ROUND_UP(x,y) ((x + y - 1) / y * y)
void *buffer = malloc(kMaxStride * sizeof(uint16) + 16); // allocate from heap and add extra 16 bytes to allow for alignment
uint16 *row = (uint16*)ROUND_UP((uint)buffer, 16 ); // align buffer to 16 byte address
.....
free(buffer);

Also, I think instead of kMaxStride it can use the actual stride to get an exact size buffer.

2) When I use RGB24ToI420() or RAWToI420() I was getting a color shift (appears as red tint). When I using the SW version of RGB to YUV (RGB24ToYRow_C & RGB24ToUVRow_C) the color shift didn't occur. I'm guessing the neon optimized code has a lot of rounding errors which is throwing off the results compared to the SW implementation. For the opposite conversion, I420 to RGB24, I did not notice any significant shift when using the neon optimized version.

Frank Barchard

unread,

Dec 23, 2013, 2:47:20 PM12/23/13

to discuss...@googlegroups.com

On Thursday, December 19, 2013 3:47:30 PM UTC-8, Gordon wrote:

I'm using libyuv for processing JPEG images and had some problems which I wanted to give some feedback on. I'm developing on an Samsung galaxy S3 (arm v7 with neon). I was using version 877 but I didn't see any changes in the code that was causing me problems.

Hi Gordon, thanks for the reports.

1) For scaling and color conversion I was getting failures because the images I was using was exceeding the size limitation of kMaxStride (default of 1920 pixels). After checking the code I see the value is used to create a temp row array for processing. The array is getting declared as a local variable which is allocated from the stack. Increasing the size eventually led to stack overflows. I'm not sure why its trying to a create 2-4K byte array on stack so I switched the code to use malloc and it works OK for the functions I'm using. Example of the changes:

//SIMD_ALIGNED(uint16 row[kMaxStride]); // original code
#define ROUND_UP(x,y) ((x + y - 1) / y * y)
void *buffer = malloc(kMaxStride * sizeof(uint16) + 16); // allocate from heap and add extra 16 bytes to allow for alignment
uint16 *row = (uint16*)ROUND_UP((uint)buffer, 16 ); // align buffer to 16 byte address
.....
free(buffer);

Also, I think instead of kMaxStride it can use the actual stride to get an exact size buffer.

The intent was webcams, which usually dont go beyond 1080p, but fair enough. I've had a todo in the row.h to remove kMaxStride restrictions.

Entered a bug https://code.google.com/p/libyuv/issues/detail?id=296

The example you show is in scale.cc? Do you have a list of function you need with reguards to jpeg? Seems like jpeg itself should work?

2) When I use RGB24ToI420() or RAWToI420() I was getting a color shift (appears as red tint). When I using the SW version of RGB to YUV (RGB24ToYRow_C & RGB24ToUVRow_C) the color shift didn't occur. I'm guessing the neon optimized code has a lot of rounding errors which is throwing off the results compared to the SW implementation. For the opposite conversion, I420 to RGB24, I did not notice any significant shift when using the neon optimized version.

What, no Neon code fix? :-)

Hmm... Looking at RGB24ToUV_NEON, I guess the / 4 I did on coefficients to save time on averaging was too much loss. I'd have to add an extra set of shifts for the averaging to make it full accuracy.

If you have a chance, you could confirm the RGB24ToUV_NEON is the problematic function. The RGB24ToY_NEON doesn't use that trick and should be ok.

Filed a bug

https://code.google.com/p/libyuv/issues/detail?id=297

Frank Barchard

unread,

Dec 26, 2013, 2:32:21 PM12/26/13

to discuss...@googlegroups.com

r920 should fix the RGB24ToI420. If youre currently using RGB24ToYRow_C RGB24ToUVRow_C try switching them to RGB24ToYRow_NEON RGB24ToUVRow_NEON.

Let me know if that works now.

For some of the conversions.. the ones that have 2x subsampling vertically, I think I could use a half add.

This would loss a slight amount of precision, but remove the shift instructions.

It would also be more similiar to the SSE2 version which does a vertical pavgb of rows.

I've put in a todo to consider that, but will need to set up hardware / benchmarks, unittests to evaluate if its a win.

The current shifts shouldnt hurt perf much, and still a huge win over C code.

C RGB24ToI420_Opt (13374 ms)

NEON RGB24ToI420_Opt (2487 ms)

Frank Barchard

unread,

Dec 26, 2013, 9:18:57 PM12/26/13

to discuss...@googlegroups.com

r929 uses malloc/free for row buffers.

This will hurt performance here and there, so let me know if you run into that.

I think I've done the allocator in row.h in a way that would allow it to use an array or alloca() instead.. or an alternative fast allocator.

I tested on Windows with

set LIBYUV_WIDTH=30000

out\release\libyuv_unittest

Scaling is limited to width of 32767 at the moment, due to math overflow, which I'll fix in near future.

Closing bugs. Please try the new versions and let me know how it works out.

Frank Barchard

unread,

Jan 2, 2014, 5:33:35 PM1/2/14

to discuss...@googlegroups.com

Scaling still has size restriction of 32767 pixel width. I've opened a bug

https://code.google.com/p/libyuv/issues/detail?id=302

and in r940 made an improvement to the math, but there are still limitations for the fixed point stepping thru source.

Frank Barchard

unread,

Jan 3, 2014, 1:45:04 PM1/3/14

to discuss...@googlegroups.com

r949 resolves all known width restrictions.

Reply all

Reply to author

Forward