On 12/18/18 10:53 AM, Ed wrote:
> I have an application that pulls images off a v4l2 camera in BGR3
> format, then (after some image processing) converts each image to JPEG
> for streaming. Using 'top' I see the CPU usage of the application, and
> for both versions (with or without turbo-jpeg) the usage is essentially
> the same.
That would be like comparing the speed of a Ferrari and a Yugo by
comparing the engine RPM. Ultimately, what matters is not how fast the
engine is running but how fast the car gets from Point A to Point B.
You want the CPU to be at 100% when generating JPEG images, because
otherwise the CPU is being under-utilized, and you aren't generating
those images as quickly as you could be. The idea is that both
libjpeg-turbo and libjpeg should "red-line" the CPU, but libjpeg-turbo
should get to the finish line much more quickly.
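
To make the "Point A to Point B" measurement concrete, here is a minimal sketch that times a batch of encodes and reports frames per second instead of looking at CPU usage. It is only an illustration: measure_throughput() is a hypothetical helper, and zlib.compress is a stand-in workload (it is not a JPEG encoder), so substitute whatever encode call your application actually makes.

```python
import time
import zlib


def measure_throughput(encode, frame, iterations=50):
    """Return frames per second for repeated calls to encode(frame)."""
    start = time.perf_counter()
    for _ in range(iterations):
        encode(frame)
    elapsed = time.perf_counter() - start
    return iterations / elapsed


# Placeholder input: one blank 640x480 frame with 3 bytes per pixel (e.g. BGR).
frame = bytes(640 * 480 * 3)

# zlib.compress is only a stand-in for the real JPEG encode call.
fps = measure_throughput(zlib.compress, frame)
print(f"{fps:.1f} frames/sec")
```

Running the same loop against both builds of the encoder, on the same frames, gives a number that can actually be compared between them.
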
Another thing to be aware of is Amdahl's Law. If, for instance, 95% of
the execution time is spent on I/O and 5% is spent on JPEG encoding,
then speeding up JPEG encoding by 2x will only speed up the overall
execution by 2-3%, which won't be noticeable. File this under: "Why the
'time' command is usually a poor way to measure the performance of
libjpeg-turbo."
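
The arithmetic behind that example can be checked with a few lines. This is just Amdahl's Law written out; overall_speedup() is a name made up for the illustration, and the 90% case is a hypothetical contrast, not a measurement.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's Law: overall speedup when `fraction` of the original
    execution time is accelerated by a factor of `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# 5% of the time in JPEG encoding, encoder made 2x faster:
print(f"{overall_speedup(0.05, 2.0):.3f}")  # 1.026, i.e. about 2.6% faster

# Hypothetical contrast: 90% of the time in JPEG encoding, same 2x encoder:
print(f"{overall_speedup(0.90, 2.0):.3f}")  # 1.818
```
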
Computer performance engineering (particularly understanding Amdahl's
Law, how to measure and report speedups, how to design benchmarks,
pipelining, parallel processing, Flynn's taxonomy, etc.) is a critical
skill for understanding libjpeg-turbo. Ultimately, every benchmark
number is just a measurement of how a specific application performs on a
specific system with a specific workload. Referring to
https://libjpeg-turbo.org/About/Performance, the "2-6x" speedup claim
comes from using tjbench, which measures the performance of
libjpeg-turbo in isolation. Due to Amdahl's Law, that speedup will be
realized less and less as libjpeg-turbo accounts for less and less of
the application's total execution time. But clever application design,
such as pipelining I/O with compute, eliminating unnecessary buffer
copies, etc., can allow the application to realize more of an overall
speedup from libjpeg-turbo.
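
As one possible shape for "pipelining I/O with compute", the sketch below captures frames on one thread while encoding them on another, with a bounded queue in between. Everything here is an assumption made for illustration: run_pipeline(), fake_capture(), and the queue depth are hypothetical, and zlib.compress again stands in for the JPEG encoder. In CPython, the two stages overlap only where the work releases the interpreter lock, as blocking I/O does.

```python
import queue
import threading
import zlib


def run_pipeline(capture, encode, frame_count, depth=4):
    """Overlap capture (I/O) with encoding (compute) via a bounded queue."""
    frames = queue.Queue(maxsize=depth)
    encoded = []

    def producer():
        for i in range(frame_count):
            frames.put(capture(i))  # blocks if the encoder falls behind
        frames.put(None)            # sentinel: no more frames

    worker = threading.Thread(target=producer)
    worker.start()
    while True:
        frame = frames.get()
        if frame is None:
            break
        encoded.append(encode(frame))
    worker.join()
    return encoded


def fake_capture(i):
    """Hypothetical stand-in for reading one frame from the camera."""
    return bytes([i % 256]) * 1024


result = run_pipeline(fake_capture, zlib.compress, frame_count=8)
print(len(result), "frames encoded")
```

The bounded queue keeps the capture side from running arbitrarily far ahead of the encoder, so memory use stays fixed while the encoder is kept busy.
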
For a price, I do corporate consulting in that regard (i.e., helping
companies achieve the best possible speedup from libjpeg-turbo by
restructuring their code so it doesn't get in the way of libjpeg-turbo).
>> Secondly, what type of CPU are you using?
>
> Right now I'm just using a Raspberry Pi
The ARM SIMD implementation in libjpeg-turbo is not as complete as the
x86/x86-64 SIMD implementation, but it should still give you about a
2-4x speedup for baseline images.