I'm using libyuv to process JPEG images and wanted to give some feedback on a couple of problems I ran into. I'm developing on a Samsung Galaxy S3 (ARMv7 with NEON). I was testing against revision 877, but I didn't see any later changes to the code paths that were causing me problems.
1) For scaling and color conversion I was getting failures because my images exceeded the kMaxStride size limit (default of 1920 pixels). Looking at the code, that value is used to declare a temporary row array for processing. The array is declared as a local variable, so it is allocated on the stack, and increasing kMaxStride eventually led to stack overflows. I'm not sure why the code creates a 2-4 KB array on the stack, so I switched it to use malloc, and that works fine for the functions I'm using. Example of the change:
// SIMD_ALIGNED(uint16 row[kMaxStride]);  // original stack-based code
#define ROUND_UP(x, y) (((x) + (y) - 1) / (y) * (y))
void* buffer = malloc(kMaxStride * sizeof(uint16) + 16);  // allocate from the heap, with extra bytes so the pointer can be aligned
uint16* row = (uint16*)ROUND_UP((uintptr_t)buffer, 16);  // align to a 16-byte address (uintptr_t avoids truncating the pointer)
.....
free(buffer);
Also, instead of kMaxStride the code could use the actual stride to allocate an exactly-sized buffer.
2) When I use RGB24ToI420() or RAWToI420() I was getting a color shift (it appears as a red tint). When I used the SW (C) versions of the RGB-to-YUV rows (RGB24ToYRow_C and RGB24ToUVRow_C), the shift didn't occur. My guess is the NEON-optimized code has rounding errors that throw the results off compared to the C implementation. For the opposite conversion, I420 to RGB24, I did not notice any significant shift when using the NEON-optimized version.