Efficient RGB24 to YUV

Oliver Uvman

Jan 30, 2014, 7:22:30 AM
to apps-...@webmproject.org
Hello!

I'm working on screencasting software for my university. We're looking to effectively stream the screen from clients to a server. Right now we're looking to do this by taking screenshots with xlib's XGetImage which returns RGB24 formatted screens. We convert the screenshots to .ivf frames on the client using a modified version of simple_encoder.c found here: http://www.webmproject.org/docs/vp8-sdk/samples.html - we stream the .ivf frames to the server which wraps them in a .webm container using ffmpeg.

The simple_encoder.c takes images in YUV (i420) format, so before passing the screens off we do our own conversion. At the moment, we're capturing screens way too slowly. About 50% of the time is spent doing the RGB to YUV conversion. The conversion is done in the following manner: https://gist.github.com/8706203
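
For readers without access to the gist, here is a minimal scalar sketch of what an RGB24 to I420 conversion of this kind typically looks like. It is not the code from the gist. It assumes packed pixels in R, G, B memory order, even width and height, and the common BT.601 studio-range integer approximation; the function names are hypothetical. I420 stores a full-resolution Y plane followed by U and V planes subsampled 2x2, so each chroma sample here is computed from the average of four pixels.

```c
#include <stddef.h>
#include <stdint.h>

/* BT.601 studio-range integer approximation (an assumption, see above):
 * Y lands in [16, 235], U and V in [16, 240]. */
static uint8_t rgb_to_y(int r, int g, int b)
{
    return (uint8_t)(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);
}

static uint8_t rgb_to_u(int r, int g, int b)
{
    /* 32896 = 128 (rounding) + (128 << 8) (chroma offset); adding it before
     * the shift keeps the operand non-negative. */
    return (uint8_t)((-38 * r - 74 * g + 112 * b + 32896) >> 8);
}

static uint8_t rgb_to_v(int r, int g, int b)
{
    return (uint8_t)((112 * r - 94 * g - 18 * b + 32896) >> 8);
}

/* Converts a packed RGB image (R, G, B byte order, no row padding) to three
 * I420 planes. y_plane holds width*height bytes; u_plane and v_plane hold
 * (width/2)*(height/2) bytes each. width and height must be even. */
void rgb24_to_i420(const uint8_t *rgb, int width, int height,
                   uint8_t *y_plane, uint8_t *u_plane, uint8_t *v_plane)
{
    for (int j = 0; j < height; j += 2) {
        for (int i = 0; i < width; i += 2) {
            int sr = 0, sg = 0, sb = 0;

            /* One Y sample per pixel; accumulate RGB over the 2x2 block. */
            for (int dy = 0; dy < 2; dy++) {
                for (int dx = 0; dx < 2; dx++) {
                    size_t px = (size_t)(j + dy) * width + (size_t)(i + dx);
                    int r = rgb[px * 3 + 0];
                    int g = rgb[px * 3 + 1];
                    int b = rgb[px * 3 + 2];
                    y_plane[px] = rgb_to_y(r, g, b);
                    sr += r;
                    sg += g;
                    sb += b;
                }
            }

            /* One U and one V sample per 2x2 block, from the rounded average. */
            size_t c = (size_t)(j / 2) * (size_t)(width / 2) + (size_t)(i / 2);
            int ar = (sr + 2) >> 2, ag = (sg + 2) >> 2, ab = (sb + 2) >> 2;
            u_plane[c] = rgb_to_u(ar, ag, ab);
            v_plane[c] = rgb_to_v(ar, ag, ab);
        }
    }
}
```

A loop like this touches every pixel one at a time, which is why SIMD implementations that process many pixels per instruction tend to be much faster.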

I'm looking for advice on how to optimize this. I've found that libvpx uses libyuv, which contains seemingly applicable conversion functions optimized for SSSE3. I've also found that xiph-vp32 has some ColorSpaces libraries that can do the conversions I need, though optimized for Pentium 2 and 3. Of course, the fastest code is that which does not need to be run. Perhaps any of you know how to get YUV formatted data directly from X11, or know a way to encode .ivf frames directly from RGB24 formatted images.

Any advice greatly appreciated & Warm regards,
Oliver Uvman

James Zern

Jan 30, 2014, 2:32:45 PM
to Application Developers
Hi,
You have options as noted. With your current system libyuv should be
an easy fit, but note you can also use ffmpeg to encode using libvpx.
In addition, ffmpeg has swscale for color conversions. If you aren't
using the ffmpeg API, you can probably still achieve this with
-f rawvideo on the input -- have a look at -pix_fmts -- or by writing
the frames to an intermediate format.

Joshua Litt

Jan 30, 2014, 2:52:09 PM
to apps-...@webmproject.org
Depending on the GPU in your system, RGB to YUV might be a great candidate for a trivial cuda or opencl implementation as well.



--
You received this message because you are subscribed to the Google Groups "Application Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to apps-devel+...@webmproject.org.
To post to this group, send email to apps-...@webmproject.org.
Visit this group at http://groups.google.com/a/webmproject.org/group/apps-devel/.
For more options, visit https://groups.google.com/a/webmproject.org/groups/opt_out.

Oliver

Jan 31, 2014, 9:27:00 AM
to apps-...@webmproject.org
Thanks for the responses! We can't get ffmpeg onto the clients (client is a java applet) but might be able to do it with libav. Using libyuv seems a lot easier though, so I'm going to go with that for now and hope it helps.


Oliver

Jan 31, 2014, 2:07:23 PM
to apps-...@webmproject.org
Managed to switch out our home-rolled color conversion for the libyuv conversion. Reduced the amount of time spent color converting from 60% to 5%. Would call that a success. :)

Thanks again for the advice! Will also consider doing some gpu programming to reduce it further if it ever becomes a bottleneck again - would be very fun to try my hand at.

James Zern

Jan 31, 2014, 2:45:42 PM
to Application Developers
On Fri, Jan 31, 2014 at 11:07 AM, Oliver <embr...@gmail.com> wrote:
> Managed to switch out our home-rolled color conversion for the libyuv
> conversion. Reduced the amount of time spent color converting from 60% to
> 5%. Would call that a success. :)
>

Great to hear, glad it's working well for you.

Mark Pietras

Jan 31, 2014, 5:32:06 PM
to apps-devel
Was pseudo-following this thread and wanted to try out libyuv as a replacement for our C++ RGB to I420 conversion as well... I downloaded, installed depot_tools, compiled as per instructions, and found that it is a fair bit faster on x86 than ours.  I was excited that this would be a quick/easy improvement but soon discovered that it's actually substantially slower on x64.  Did you experience that as well?

Average results for a very large frame, in milliseconds:

x86:  17 ms
x64: 106 ms

As per their home page ("Optimized for SSE2/SSSE3/AVX2 on x86/x64"), libyuv supports SSE on x64, but without digging deeply I actually don't see evidence of that in the code.

What was your experience?  I suppose I'll ping them but didn't want to spend gobs of time on it...


Brendan Bolles

Feb 3, 2014, 5:02:01 PM
to apps-...@webmproject.org
On Jan 30, 2014, at 4:22 AM, Oliver Uvman wrote:

> The simple_encoder.c takes images in YUV (i420) format, so before passing the screens off we do our own conversion. At the moment, we're capturing screens way too slowly. About 50% of the time is spent doing the RGB to YUV conversion. The conversion is done in the following manner: https://gist.github.com/8706203


One thing I noticed about Oliver's code is that it appears to use the 601 coefficients for converting RGB to YUV. In his particular case, things are even more thorny than usual because his source isn't 601 or 709, but a computer monitor, presumably sRGB.

But anyway, what color space is WebM said to be in? Seems like Rec 709 would be the assumption these days, but if so I think you're supposed to use different YUV coefficients, at least according to this:

http://www.martinreddy.net/gfx/faqs/colorconv.faq
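
To make the difference concrete, here is a small sketch of the generic R'G'B' to Y'CbCr transform parameterized by the luma weights Kr and Kb. The weights are the ones published in the BT.601 and BT.709 recommendations; the struct and function names are hypothetical, inputs are assumed to be normalized to [0, 1], and the scaling to 8-bit studio or full range is left out.

```c
/* Luma weights for red and blue; the green weight is 1 - kr - kb. */
typedef struct {
    double kr;
    double kb;
} LumaCoeffs;

static const LumaCoeffs BT601 = {0.299, 0.114};
static const LumaCoeffs BT709 = {0.2126, 0.0722};

typedef struct {
    double y;   /* luma, in [0, 1] for inputs in [0, 1] */
    double cb;  /* blue-difference chroma, in [-0.5, 0.5] */
    double cr;  /* red-difference chroma, in [-0.5, 0.5] */
} YCbCr;

/* Generic transform: only the two weights differ between the standards. */
YCbCr rgb_to_ycbcr(double r, double g, double b, LumaCoeffs k)
{
    YCbCr out;
    out.y = k.kr * r + (1.0 - k.kr - k.kb) * g + k.kb * b;
    out.cb = (b - out.y) / (2.0 * (1.0 - k.kb));
    out.cr = (r - out.y) / (2.0 * (1.0 - k.kr));
    return out;
}
```

With these weights, pure green gets a luma of 0.587 under BT.601 and 0.7152 under BT.709, so a decoder that assumes the other matrix will show a visible shift on saturated colors.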


Brendan

Frank Barchard

Feb 3, 2014, 5:02:08 PM
to James Zern, Application Developers
Nice.
My concern with cuda or opencl is the cost of transferring to and from the gpu is high... at least as expensive as a memcpy for the cpu.

For screencasting, the preferred solution would be a shader that converts to YUV as part of the capture process, and stores the result directly to a frame accessible by the CPU.  But this isn't always possible.

Oliver

Feb 3, 2014, 5:53:26 PM
to apps-...@webmproject.org
Mark: I'm running on x64 (haven't tried on x86 machines yet). Gprof tells me SSE-optimized functions are being run. This is all substantially faster than our previous code, and I'm able to reach 21fps on 1680x1050. This includes encoding to .ivf after the color conversion, so I'm guaranteed to be faster than 50ms per frame for the color encoding.

By the way, if anyone knows off the top of their head whether XGetImage will always return RGBA images, or if this will differ from machine to machine, I'd appreciate a heads up.
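
One way to avoid depending on a fixed layout is to read the channel masks that Xlib reports with the image. The XImage structure carries red_mask, green_mask and blue_mask fields (along with bits_per_pixel and byte_order), and these follow the visual of the drawable, so they can differ between machines. The helper below is a hypothetical sketch that takes plain integer values rather than an XImage, so it builds without the X11 headers; in real code the pixel value would come from something like XGetPixel and the mask from the image itself.

```c
/* Extracts one color channel from a pixel value using its bit mask and
 * rescales it to 8 bits, so 5- or 6-bit channels (e.g. 16-bit visuals)
 * map onto the full 0..255 range. Assumes the mask bits are contiguous,
 * which is the usual case for TrueColor visuals. Returns 0 for an empty
 * mask. */
unsigned channel_from_mask(unsigned long pixel, unsigned long mask)
{
    if (mask == 0) {
        return 0;
    }

    /* Find the position of the lowest set bit in the mask. */
    unsigned shift = 0;
    while (((mask >> shift) & 1UL) == 0) {
        shift++;
    }

    unsigned long value = (pixel & mask) >> shift;
    unsigned long max = mask >> shift;  /* largest value the channel can hold */

    /* Rescale to 0..255 with rounding. */
    return (unsigned)((value * 255UL + max / 2) / max);
}
```

For a typical 24-bit TrueColor visual the masks are 0xFF0000, 0x00FF00 and 0x0000FF, in which case this reduces to a shift and a byte mask. A fast path for that layout plus a mask-driven fallback is a reasonable design if the client has to run on unknown hardware.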

Mark Pietras

Feb 4, 2014, 11:32:57 AM
to apps-devel
Oliver: thanks for the reply.  I'm compiling on Windows x86 and x64.  This morning I recompiled everything again with all defaults and ran their own test filters. Something is wrong: x64 is nearly 10x slower, which is about the same ratio I saw in my own benchmarking. I'll see if I can get anywhere on their project page.

x86 Note: Google Test filter = *I420ToARGB_Opt
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from libyuvTest
[ RUN      ] libyuvTest.I420ToARGB_Opt
[       OK ] libyuvTest.I420ToARGB_Opt (458 ms)
[----------] 1 test from libyuvTest (460 ms total)

x64 Note: Google Test filter = *I420ToARGB_Opt
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from libyuvTest
[ RUN      ] libyuvTest.I420ToARGB_Opt
[       OK ] libyuvTest.I420ToARGB_Opt (4228 ms)
[----------] 1 test from libyuvTest (4230 ms total)

Oliver

Feb 4, 2014, 11:59:16 AM
to apps-...@webmproject.org
Ah, should have said that I'm doing this on Linux. Please update here if you find out how to fix it; I'll probably need to use this library on Windows too at some point.

//Oliver

Mark Pietras

Feb 5, 2014, 8:07:45 AM
to apps-devel
Oliver: I got this reply from the development team so I'm out of luck for the time being:

This is a known issue.  Visual Studio does not allow 64-bit inline assembly.
This is still true as of VS2012, and I think VS2013.
Short term, alternative compilers seem most feasible.  I'm aware of 2 Visual C compatible compilers - clang-cl and icl.  We've tested/fixed clang-cl for 32 bit, but not tested the 64 bit version.
For Web Apps there is 64 bit NaCl on Windows, which uses a variation of gcc.

Oliver

Feb 5, 2014, 8:12:15 AM
to apps-...@webmproject.org
Thanks!

Mark Pietras

Feb 27, 2014, 9:47:41 AM
to apps-devel
FYI I was able to cross-compile/assemble just the one row_posix file in Linux to a Windows x64 target and link that in with the other libyuv objects compiled on x64 Windows with Visual Studio. A pain, but substantially improved performance.