jpegtran - flipping problem

SFA

Oct 22, 2024, 12:20:17 PM
to libjpeg-turbo User Discussion/Support
Dear libjpeg-turbo team,

We are experiencing an issue when vertically flipping a JPEG image using jpegtran. Please find attached a ZIP file containing the input and output images.

The command used was: jpegtran -flip vertical imgflipv.jpeg > flipped.jpeg

It appears there may be a bug in the library causing the incorrect output. We hope this issue will be resolved in future versions of jpegtran.

Any potential solutions or workarounds would be greatly appreciated. Please feel free to ask for further information.

Sincerely,
SFA
imgflipv.zip

DRC

Oct 22, 2024, 2:34:56 PM
to libjpeg-t...@googlegroups.com

New users are moderated by default (unfortunately necessary because of spambots), so that's why your initial message did not appear to go through.  (Presumably that's why you made a duplicate post.)

jpegtran is working as designed.  It's just that the transform you are requesting is "imperfect."

To explain what an imperfect transform is, I first need to roughly explain the stages of JPEG compression:

1. Color Conversion: The packed source pixels are typically converted from RGB to YCbCr, which allows the "luminance" (brightness) to be separated from the "chrominance" (color.)  The Y, Cb, and Cr components are organized into component planes.

2. Chrominance Subsampling (AKA Downsampling): The chrominance (Cb, Cr) components are optionally "subsampled", which typically involves discarding every other component in either the horizontal or vertical direction or both.  (The human eye is more sensitive to spatial changes in brightness than spatial changes in color, so with a photograph or other image content that has gradual variations in color, you can discard 1/2 or even 3/4 of the color data without much if any perceptual quality loss.)  If necessary, the chrominance component planes are padded to the nearest multiple of 8 components in both directions, and the luminance component plane is padded to the nearest multiple of 8 * the horizontal or vertical subsampling factor (for instance, 8x8 in the case of no subsampling, 16x8 in the case of 2x1 chrominance subsampling, and 16x16 in the case of 2x2 chrominance subsampling.)  The padded area at the right of the plane is filled with the right-most component, and the padded area at the bottom of the plane is filled with the bottom-most component.

3. Forward DCT: Each 8x8 block of each component plane is processed with the discrete cosine transform (DCT) algorithm to produce an 8x8 block of DCT coefficients.  Within an 8x8 DCT coefficient block, the lowest-frequency coefficient (the "DC" coefficient) from the corresponding 8x8 component block is stored at the upper left, and the frequencies increase in a zigzag pattern toward the highest-frequency coefficient at the lower right.  The minimum set of DCT coefficients that includes an 8x8 coefficient block from all components is called an "iMCU" (interleaved minimum coded unit.)  For example, an iMCU in an image that uses 2x2 (AKA "4:2:0") subsampling contains four 8x8 coefficient blocks from the luminance plane and one 8x8 coefficient block from each chrominance plane.  If the image width or height isn't evenly divisible by the iMCU width or height, then the iMCUs at the right or bottom of the image are "partial" iMCUs, i.e. they don't correspond to full 8x8 blocks of components in the source image.

4. Quantization: The highest-frequency coefficients from each 8x8 coefficient block are removed, depending on the JPEG quality level.  (Quality 100 removes no coefficients.  Quality 0 removes all but the DC coefficient.)  This and chrominance subsampling are the two "lossy" parts of JPEG encoding.  (If you create a JPEG image with Quality 100 and no subsampling, then the only loss comes from round-off error.)  The idea is that, with a photograph or other image content that has gradual variations in color, you can discard the highest frequencies without much if any perceptual quality loss.  (MP3 compression is based on the same assumption with respect to audio.)

5. Entropy Encoding: The purpose of all of the aforementioned reorganization and conversion of data is to create long runs of zeroes that can compress well with a lossless codec, so the last step is losslessly compressing the quantized DCT coefficients using either Huffman coding or arithmetic coding.
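
As a rough illustration of the geometry in stages 2 and 3 above, the following Python sketch computes the iMCU size, the padded plane sizes, and whether an image has partial iMCUs.  It is not taken from the libjpeg-turbo source.  The function and key names are hypothetical, and it assumes a three-component YCbCr image in which only the chrominance planes are subsampled.

```python
def round_up(value, multiple):
    """Round value up to the nearest multiple."""
    return -(-value // multiple) * multiple


def jpeg_geometry(width, height, h_samp, v_samp):
    """Describe padding and iMCU layout for a YCbCr image.

    h_samp and v_samp are the horizontal and vertical chrominance
    subsampling factors (for example, 2 and 2 for 4:2:0).
    """
    imcu_w, imcu_h = 8 * h_samp, 8 * v_samp
    # Chrominance planes shrink by the subsampling factor (rounded up).
    chroma_w = -(-width // h_samp)
    chroma_h = -(-height // v_samp)
    return {
        "imcu_size": (imcu_w, imcu_h),
        # h_samp * v_samp luminance blocks plus one Cb and one Cr block
        "blocks_per_imcu": h_samp * v_samp + 2,
        "luma_padded": (round_up(width, imcu_w), round_up(height, imcu_h)),
        "chroma_padded": (round_up(chroma_w, 8), round_up(chroma_h, 8)),
        "partial_right": width % imcu_w != 0,
        "partial_bottom": height % imcu_h != 0,
    }


info = jpeg_geometry(100, 75, 2, 1)   # 100x75 image, 2x1 subsampling
print(info["imcu_size"])       # (16, 8)
print(info["luma_padded"])     # (112, 80)
print(info["chroma_padded"])   # (56, 80)
print(info["partial_right"], info["partial_bottom"])   # True True
```

With 2x2 subsampling the same function reports six blocks per iMCU, which matches the four luminance blocks plus one block from each chrominance plane described above.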

jpegtran works by performing entropy decoding on a JPEG image, rearranging the DCT coefficients, then performing entropy re-encoding on the rearranged DCT coefficients to produce a transformed JPEG image.  However, because the DCT coefficients are organized into 8x8 blocks based on frequency, there are limits to how they can be rearranged.  If you have a full iMCU, then you can always transform the corresponding pixels/components by changing the order of the DCT coefficients.  However, you can't always do that with a partial iMCU, because the components at the right or bottom of the corresponding component block are just padding.  In general, if the image width or height isn't evenly divisible by the iMCU width or height, then some transform operations are "imperfect", i.e. they cannot transform the partial iMCUs in the image:
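
To make the "rearranging the DCT coefficients" idea concrete, here is a small pure-Python sketch for a single 8x8 block.  It relies on a standard property of the DCT: flipping a block vertically in the spatial domain is equivalent to negating the coefficients whose vertical frequency index is odd.  This is only a demonstration of that property, not jpegtran's code.  The helper names are hypothetical, the DCT is left unnormalized for brevity, and a real vertical flip also has to reverse the order of the block rows in the image, which is the part that breaks down for partial iMCUs.

```python
import math

N = 8


def dct_2d(block):
    """Unnormalized 2-D DCT-II of an 8x8 block given as a list of rows."""
    out = [[0.0] * N for _ in range(N)]
    for v in range(N):        # vertical frequency index
        for u in range(N):    # horizontal frequency index
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N)))
            out[v][u] = total
    return out


def flip_v_coeffs(coeffs):
    """Vertical flip in the frequency domain: negate odd vertical frequencies."""
    return [[-c if v % 2 else c for c in row] for v, row in enumerate(coeffs)]


# Arbitrary sample values standing in for one 8x8 component block.
block = [[(3 * x + 5 * y * y + x * y) % 256 for x in range(N)] for y in range(N)]

direct = dct_2d(block[::-1])               # flip the samples, then transform
via_coeffs = flip_v_coeffs(dct_2d(block))  # transform, then flip the coefficients

max_err = max(abs(a - b)
              for row_a, row_b in zip(direct, via_coeffs)
              for a, b in zip(row_a, row_b))
print(max_err < 1e-6)   # True: both routes give the same coefficients
```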

- Horizontal flipping, transverse transposition, 180-degree rotation, and 270-degree rotation are imperfect if the image contains any partial iMCUs along its right edge.

- Vertical flipping, transverse transposition, 90-degree rotation, and 180-degree rotation are imperfect if the image contains any partial iMCUs along its bottom edge.

- Regular (non-transverse) transposition is always perfect.

To put this another way, if a coefficient block comes from a component block that was padded on the right side, then you can't transform the block in such a way that the right side would become the left or top side.  If a coefficient block comes from a component block that was padded on the bottom side, then you can't transform it in such a way that the bottom side would become the left or top side.
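
The rules above can be written down as a small lookup table.  The following Python sketch is just a restatement of the list, not a libjpeg-turbo API.  The transform names and the is_perfect() helper are hypothetical, and the iMCU size has to be supplied by the caller.

```python
RIGHT, BOTTOM = "right", "bottom"

# Edges whose partial iMCUs make each transform imperfect.
PROBLEM_EDGES = {
    "transpose": set(),
    "flip_horizontal": {RIGHT},
    "flip_vertical": {BOTTOM},
    "rotate_90": {BOTTOM},
    "rotate_180": {RIGHT, BOTTOM},
    "rotate_270": {RIGHT},
    "transverse": {RIGHT, BOTTOM},
}


def is_perfect(transform, width, height, imcu_w, imcu_h):
    """Return True if the transform can be applied to every iMCU."""
    partial = set()
    if width % imcu_w:
        partial.add(RIGHT)
    if height % imcu_h:
        partial.add(BOTTOM)
    return not (PROBLEM_EDGES[transform] & partial)


# A 640x470 image with a 16x16 iMCU has partial iMCUs only along the bottom.
print(is_perfect("flip_horizontal", 640, 470, 16, 16))   # True
print(is_perfect("flip_vertical", 640, 470, 16, 16))     # False
print(is_perfect("transpose", 640, 470, 16, 16))         # True
```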

The test image is 236x236 and uses 4:2:0 (2x2) subsampling, so the iMCU size is 16x16.  Since 236 isn't evenly divisible by 16, both the right and bottom edges contain partial iMCUs.  By default, jpegtran leaves partial iMCUs in place.  You can optionally discard partial iMCUs with the -trim option, but that's probably not what you want either.  Unfortunately, there is no other solution except to decompress the image and transform it in the spatial domain (which would incur generational loss if you recompressed it into a JPEG image.)  Reorganizing DCT coefficients in order to losslessly transform a JPEG image in the frequency domain is a neat trick that Tom Lane came up with, but per above, it is limited by the structure of the JPEG format.
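
For reference, the arithmetic for the test image works out as follows.  This is only a worked example of the numbers quoted above; the variable names are arbitrary.

```python
width = height = 236
imcu_w = imcu_h = 16   # 4:2:0 (2x2) subsampling

full_cols, leftover_cols = divmod(width, imcu_w)
full_rows, leftover_rows = divmod(height, imcu_h)

print(full_cols, leftover_cols)   # 14 12
print(full_rows, leftover_rows)   # 14 12

# Height covered by full iMCU rows only, i.e. what is left if the
# partial iMCU row at the bottom is discarded.
full_height = full_rows * imcu_h
print(full_height)                # 224
```

So there are 14 full iMCU rows, followed by one partial iMCU row that covers only 12 rows of real pixels.  That partial row is the part a vertical flip cannot move.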

BTW, all of this is documented in the jpegtran man page, as well as in usage.txt and wizard.txt.  Those are the official and vetted sources of documentation regarding this.  My summary above is quick & dirty and based on my own memory.  Thus, it may contain errors and should not be considered official or canonical documentation.

DRC
