libjpeg-turbo slow on Windows? Where to start with investigation...

David H

Feb 16, 2021, 2:11:57 PM
to libjpeg-turbo User Discussion/Support
Hi all,

I'm writing some software which uses libjpeg-turbo to write its output file. I managed to build the turbojpeg-static project with Visual Studio C++ to create the turbojpeg-static.lib file and linked it to my program, also built with Visual Studio C++. So far so good.

In testing, writing a 14849 x 7444 JPEG takes 1.47 seconds.

However, when I compile the same program in my WSL Ubuntu environment running on the same laptop, linking to libjpeg (apt-get install libjpeg-dev), writing the JPEG only takes 0.72 seconds. The other parts of the program all vary slightly in speed, as you'd expect with different compilers, but none show such a huge disparity as JPEG output. PNG output is the same speed from both builds.

This seems like a pretty big difference to me, but I'm not sure where to start figuring it out. I'm pretty sure I have all the good optimisations turned on in the VC project, and I've tried it with /fp:fast, but it doesn't seem to make a difference.

Are there any known speed issues with libjpeg-turbo on Windows that would explain this, or can anyone suggest some things for me to check?

DRC

Feb 16, 2021, 3:33:43 PM
to libjpeg-t...@googlegroups.com
The quickest way to know whether libjpeg-turbo is at fault for the
performance difference is to run tjbench with the same input image and
settings on both platforms. For instance:

/opt/libjpeg-turbo/bin/tjbench image.ppm 80 -rgb -subsamp 420 -nowrite
or
c:\libjpeg-turbo64\bin\tjbench image.ppm 80 -rgb -subsamp 420 -nowrite

will test the raw compute performance of compressing the contents of
image.ppm from an RGB pixel buffer into a JPEG image with quality 80 and
4:2:0 subsampling.

That will also give you an idea of the performance ceiling, excluding
I/O time. I suspect that the difference you're observing is due to I/O
time, which is out of libjpeg-turbo's control (Windows I/O is just
slower than Linux I/O). However, here are some possible areas for
optimization:

-- If you can spare the memory, the most efficient way to compress a
JPEG image is to load the entire source image into memory and use the
in-memory destination manager. (That's what tjbench does.) However,
it's understandable if this is an untenable proposition for a
110-megapixel image.

-- If you have to use buffered I/O, then try increasing the size of your
buffer.

-- Check for any costly and unnecessary Extended-RGB-to-RGB color
conversion algorithms that could be replaced with the use of the
libjpeg-turbo colorspace extensions. I've seen older code, which was
written for libjpeg, perform really inefficient per-pixel RGBA-to-RGB or
BGRA-to-RGB conversion, and these algorithms are so slow that they
effectively hide any speedup from libjpeg-turbo.

I'm happy to review your JPEG compression kernel if you'll post a
snippet of code.

David Horman

Feb 16, 2021, 3:54:25 PM
to libjpeg-t...@googlegroups.com

Thanks for the suggestion. Here are my results (apt helpfully suggested that I install libjpeg-turbo-test):

Ubuntu (Windows Subsystem for Linux):

>>>>>  RGB (Top-down) <--> JPEG 4:2:0 Q80  <<<<<

Image size: 227 x 149
Compress      --> Frame rate:         7421.929573 fps
                  Output image size:  6068 bytes
                  Compression ratio:  16.721984:1
                  Throughput:         251.031924 Megapixels/sec
                  Output bit stream:  360.290149 Megabits/sec
Decompress    --> Frame rate:         9198.674991 fps
                  Throughput:         311.126784 Megapixels/sec

Windows 10 (same computer), x64:

>>>>>  RGB (Top-down) <--> JPEG 4:2:0 Q80  <<<<<

Image size: 227 x 149
Compress      --> Frame rate:         2274.411861 fps
                  Output image size:  6068 bytes
                  Compression ratio:  16.721984:1
                  Throughput:         76.927432 Megapixels/sec
                  Output bit stream:  110.409049 Megabits/sec
Decompress    --> Frame rate:         3659.631437 fps
                  Throughput:         123.779714 Megapixels/sec

As you can see, still quite a big difference! The x86 tjbench.exe was even slower, at 1660 fps.

I probably should have mentioned before, I'm using version 2.0.4.

As for my code, it prepares and writes 64 rows at a time using jpeg_write_scanlines. All the image data is already in RAM; I just prepare it in strips because that's what libtiff expects you to do. The program outputs TIFF, PNG, or JPEG using the same code, varying only in the call to the appropriate library once each strip is complete, and as noted before, PNG speed is the same on both Ubuntu and Windows. I also tried 1, 16, 512, and the full 7444 rows at a time, but it didn't make any difference.

David

DRC

Feb 16, 2021, 4:00:31 PM
to libjpeg-t...@googlegroups.com

Please test an image that is closer to the actual size you intend to compress in your application.  The performance of the 227x149 test image in libjpeg-turbo is going to depend too heavily on overhead to be a good comparison.  You want a much larger image so you can really test the maximum throughput.

--
You received this message because you are subscribed to the Google Groups "libjpeg-turbo User Discussion/Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to libjpeg-turbo-u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/libjpeg-turbo-users/0a9ce63c-c554-0ef0-c472-42ffaff69ba3%40gmail.com.

David Horman

Feb 16, 2021, 4:09:35 PM
to libjpeg-t...@googlegroups.com

Ubuntu (WSL):

>>>>>  RGB (Top-down) <--> JPEG 4:2:0 Q80  <<<<<

Image size: 15335 x 7991
Compress      --> Frame rate:         2.735202 fps
                  Output image size:  11710674 bytes
                  Compression ratio:  31.392382:1
                  Throughput:         335.177027 Megapixels/sec
                  Output bit stream:  256.248429 Megabits/sec
Decompress    --> Frame rate:         3.473690 fps
                  Throughput:         425.672917 Megapixels/sec

Windows 10 (x64):

>>>>>  RGB (Top-down) <--> JPEG 4:2:0 Q80  <<<<<

Image size: 15335 x 7991
Compress      --> Frame rate:         0.740540 fps
                  Output image size:  11710674 bytes
                  Compression ratio:  31.392382:1
                  Throughput:         90.747236 Megapixels/sec
                  Output bit stream:  69.377776 Megabits/sec
Decompress    --> Frame rate:         1.230370 fps
                  Throughput:         150.772003 Megapixels/sec

------------------------------------------------------------------

And for good measure, a medium-sized image:

------------------------------------------------------------------

Ubuntu (WSL):

>>>>>  RGB (Top-down) <--> JPEG 4:2:0 Q80  <<<<<

Image size: 1024 x 1024
Compress      --> Frame rate:         351.023714 fps
                  Output image size:  72448 bytes
                  Compression ratio:  43.420495:1
                  Throughput:         368.075042 Megapixels/sec
                  Output bit stream:  203.447728 Megabits/sec
Decompress    --> Frame rate:         430.199562 fps
                  Throughput:         451.096936 Megapixels/sec

Windows 10 (x64):

>>>>>  RGB (Top-down) <--> JPEG 4:2:0 Q80  <<<<<

Image size: 1024 x 1024
Compress      --> Frame rate:         101.308824 fps
                  Output image size:  72448 bytes
                  Compression ratio:  43.420495:1
                  Throughput:         106.230002 Megapixels/sec
                  Output bit stream:  58.716973 Megabits/sec
Decompress    --> Frame rate:         150.991629 fps
                  Throughput:         158.326198 Megapixels/sec

---------------------------------------------------------------------------

DRC

Feb 16, 2021, 4:25:27 PM
to libjpeg-t...@googlegroups.com

OK, so it's a legitimate slow-down, but unfortunately, I have no clue what could be causing it.  When I run Windows vs. Linux on the same hardware, I observe more like a 5% slow-down under Windows.

Are you trying to benchmark the Windows code while the Linux VM is running?  That might be the cause.  Maybe Hyper-V is giving a higher priority to the Linux guest than to user code running in the Windows host.

I'm also wondering if maybe CPU feature detection is borked somehow.  If you're comfortable building libjpeg-turbo from source, try adding a print statement at the end of init_simd() in simd/x86_64/jsimd.c and see if you get the same values for simd_support and simd_huffman on both O/S platforms.

David Horman

Feb 16, 2021, 4:43:16 PM
to libjpeg-t...@googlegroups.com

I just discovered the pre-built Windows binaries (not sure how I overlooked these before), so I got 2.0.6, and it gives me results comparable to Ubuntu. In fact it is faster, perhaps because I/O is more direct. I must have messed something up when compiling 2.0.4, although I don't know what that might have been. I think I may have followed this guide:

https://github.com/libjpeg-turbo/libjpeg-turbo/blob/master/BUILDING.md

to generate the project files (it looks like I did so twice, in separate directories, for x86 and x64), so maybe there was some flag I should have set that I didn't. I couldn't see anything amiss in the Project Properties though, and I made sure I was doing Release builds.

CPU detection certainly sounds like a plausible culprit - I'm still curious to find out what the problem was so I may give it a try another day, and if I do I'll report back.

Sorry if I've wasted your time, but thanks very much for your help!

(PS: To link to the pre-built 2.0.6 library I had to link in legacy_stdio_definitions.lib and add my own implementation of __iob_func. I think this is because it was compiled with VS 2015 and I'm using VS 2019, which has inlined and redefined some stdio stuff.)

David

DRC

Feb 16, 2021, 4:46:52 PM
to libjpeg-t...@googlegroups.com

Aha.  OK, then my suspicion is that your custom build isn't enabling the SIMD code at all for some reason.  Pass -DREQUIRE_SIMD=1 to cmake when configuring the build, and that will cause the configuration to fail if it can't enable SIMD instructions.  It could be something as simple as NASM not being installed and in your PATH.

2.0.6 was compiled with VS 2010.  I started using VS 2015 with libjpeg-turbo 2.1, so the 2.1 beta1 release should avoid that linkage issue.
