libjpeg : storing similar images optimally

Joy Arulraj

Oct 30, 2012, 11:23:13 PM

Hi folks,

I am using the libjpeg library and trying to figure out how to save *multiple* similar images (like photos of a landscape with similar brightness and colors) together in a more space-efficient way. I tried 3 basic approaches:

a) Wrote the similar image files one after another into a single big image file: the entire first file, then the entire 2nd file, and so on. This gave pretty much the same file size as the sum of all the small file sizes.
b) Tried merging the first scanline (row) of all similar images together, then the 2nd row, and so on, into a single big file. This approach in fact *increased* the size of the big file by ~25%.
c) Instead of single rows, I tried combining blocks of rows from the 1st image, the 2nd image and so on, as before. When I used a larger blocking factor, I observed some 5% *reduction* in space. [5 similar 640x480 JPG files, each ~80KB]

I am wondering why the file size increased in case (b), as I expected row data similarity across images would help improve compression?
And can you give some suggestions for going forward from (c) [like maybe doing something similar across columns, or something different]?

Thanks,
Joy

Thomas Richter

Oct 31, 2012, 3:27:07 AM

On 31.10.2012 04:23, Joy Arulraj wrote:
> Hi folks,
>
> I am using libjpeg library and trying to figure out how to save *multiple* similar images (like photos of a landscape with similar brightness and colors) together in a more space efficient way. I tried 3 basic approaches :

Please note that there are two libraries that go by this name. One is the
libjpeg by the ISO JPEG committee (the standardization group), which you
find here:

https://github.com/thorfdbg/libjpeg

and which supports *all* of 10918-1. The other is the IJG jpeg implementation,
which supports only a small subset of our standard, plus some things that
are not even JPEG.

> a) wrote similar image files together in a large file : entire first file first, then entire 2nd file.. into a single big image file. This gave pretty much the same big file size as the sum of all small file sizes.

The only thing you are able to save this way is the overhead of the JPEG
tables (quantization and/or Huffman table). Did you use optimized
Huffman tables or just the default tables?

You also create edges at the last line of every image since you merge
there with the previous image, likely within a block row.

> b) tried merging the first row scanline of all similar images together, then the 2nd row... to a single big file. This approach in fact *increased* size of the big file by ~25%

That doesn't work very well because you now run a DCT over image lines
that do not come from the same image, and hence have artificial edges
that create amplitudes that need to be compressed.
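
To make the point about artificial edges concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a measurement on real photos: the two 8x8 patches are hypothetical flat regions that differ only in brightness, and the DCT is a plain textbook 8x8 type-II DCT rather than any library's implementation.

```python
import math

N = 8

def dct_2d(block):
    """Plain (unoptimized) orthonormal 8x8 type-II DCT."""
    def c(k):
        return math.sqrt(0.5) if k == 0 else 1.0
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / 16)
                          * math.cos((2 * x + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out

def ac_energy(block):
    """Sum of squared AC coefficients (everything except the DC term)."""
    coeffs = dct_2d(block)
    return sum(coeffs[u][v] ** 2
               for u in range(N) for v in range(N) if (u, v) != (0, 0))

# Hypothetical flat patches from two "similar" images (assumed values).
image_a = [[100] * N for _ in range(N)]
image_b = [[120] * N for _ in range(N)]

# Approach (b): alternate scanlines taken from image A and image B.
interleaved = [image_a[y] if y % 2 == 0 else image_b[y] for y in range(N)]

print("AC energy, rows from one image:", round(ac_energy(image_a), 3))
print("AC energy, interleaved rows   :", round(ac_energy(interleaved), 3))
```

A flat patch needs essentially only its DC coefficient, while the interleaved patch puts a large amount of energy into vertical AC frequencies that the entropy coder then has to spend bits on.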

> c) Instead of a row, I tried combining blocks of rows from 1st image,2nd image and so on as before. When I used a larger blocking factor, I observed some 5% *reduction* in space. [5 similar 640x480 JPG files, each ~80KB]

That at least avoids the edges of the b) approach as blocks are (except
for the DC coefficient) compressed independently. It helps compared to
a) because you now no longer have the problem of artificial edges in the
last block row of each image.

> I am wondering why in case (b), file size increased, as I expected row data similarity across images will help improve compression ?
> And can you give some suggestions to go forward from (c) [like maybe doing something similar across columns or something different] ?

Not really. JPEG compresses blocks almost independently, so there is
almost no gain from merging images. You can improve compression efficiency
by using a more efficient entropy coder, e.g. optimized Huffman coding
(the -h option of the JPEG committee's libjpeg). Another approach would be
to use the arithmetic coder (AC), but this has the drawback that most
10918-1 implementations would not understand the file, so that's likely
not an option for you. Or you could use a codec that compresses better
(JPEG 2000 would be an option). Or you could use a JPEG "post-compressor"
like the one StuffIt provides, or packJPG for an open source
implementation. The latter solutions all of course require you to "unpack"
the images whenever you want to view them.
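
As a rough illustration of why optimized Huffman tables help, the following Python sketch compares the bits spent with a table built for some other symbol statistics against a table built from the actual data. The symbol stream and the "generic" frequencies are made-up assumptions, and the sketch ignores JPEG's real constraints (such as the 16-bit code length limit), so it only shows the principle.

```python
import heapq
from collections import Counter

def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code built from freqs."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap entries: (weight, unique tie breaker, {symbol: depth so far})
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

def coded_bits(symbols, lengths):
    """Total number of bits needed to code the symbols with the given lengths."""
    return sum(lengths[s] for s in symbols)

# Hypothetical symbol stream of one image (heavily skewed towards symbol 0).
symbols = [0] * 60 + [1] * 20 + [2] * 10 + [3] * 6 + [4] * 3 + [5] * 1

# A "default" table tuned for different statistics (assumed frequencies).
generic_lengths = huffman_code_lengths({0: 10, 1: 30, 2: 30, 3: 15, 4: 10, 5: 5})
# A table optimized for the data that is actually being coded.
optimized_lengths = huffman_code_lengths(Counter(symbols))

print("bits with generic table  :", coded_bits(symbols, generic_lengths))
print("bits with optimized table:", coded_bits(symbols, optimized_lengths))
```

The optimized table can never be worse than the generic one for the data it was built from, which is the gain an "optimize Huffman" option is after.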

So long,
Thomas

Robert Wessel

Oct 31, 2012, 1:48:04 PM

Wouldn't video compression techniques be better suited to this
application?

prabh...@gmail.com

Oct 31, 2012, 2:39:19 PM

Yes, in some sense it is related to video compression. But, as far as I understand, unlike in a video, in my case the number of similar frames is smaller and the difference between images can be larger.

Does MJPEG use some interesting trick to store similar frames optimally that I can reuse for my use case?

Thanks !

prabh...@gmail.com

Oct 31, 2012, 2:49:22 PM

Hi Thomas,

Thanks for the detailed comments ! I now understand the problem of introducing more edges. As you had mentioned, JPEG compresses *blocks* almost independently, so there is almost no gain by merging images.

Will storing *similar* pixels from multiple images within a *single* block give better compression ?

I looked at packJPG and the other options you mentioned. packJPG especially seemed very interesting. But these schemes primarily target single-image compression. It's just that I am more interested in leveraging the similarity of *multiple* images to optimize their overall storage.

Do you have any suggestions for addressing this multiple-image problem (like rearranging the data smartly in some manner etc.) ?

Best,
Joy

Thomas Richter

Nov 1, 2012, 5:41:58 AM

On 31.10.2012 19:49, prabh...@gmail.com wrote:

> Will storing *similar* pixels from multiple images within a *single* block give better compression ?

How do you find similar pixels, and how do you encode which pixels
belong to which image?

There is not much to be gained, unless you have additional information
on the images. For example, if the two shots are from the same scene,
you could use motion compensation.

> Do you have any suggestions for addressing this multiple-image problem (like rearranging the data smartly in some manner etc.) ?

It all depends on how the images are related to each other. If they just
"look similar", I don't believe much can be gained. There have been a
couple of advanced object-oriented image compression schemes; this was a
hot research topic more than ten years ago, but it did not keep its promise
of providing better compression rates than the known transform-based
codecs (it just created stranger artifacts).

Transform coding in the Z direction (transforming the two images jointly;
a very simple example would be coding the difference image) also does
not provide much gain, unless the images are really correlated pixel by
pixel.
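
A small Python sketch of the difference-image idea, under loudly stated assumptions: the "images" are a single synthetic scanline (a sine pattern), the pixel-aligned second shot only adds a little noise, and the misaligned shot is the same scene shifted by 3 pixels. None of these numbers come from real photos.

```python
import math

def mean_abs(values):
    """Average magnitude of the values, a crude proxy for coding cost."""
    return sum(abs(v) for v in values) / len(values)

WIDTH = 1000

# Hypothetical scanline of the first shot (synthetic texture).
shot_1 = [round(128 + 100 * math.sin(i / 5)) for i in range(WIDTH + 3)]
reference = shot_1[:WIDTH]

# Second shot, pixel-aligned with the first, differing only by small noise.
aligned = [p + (i * 7 % 5) - 2 for i, p in enumerate(shot_1)]

# Second shot of the same scene, but shifted sideways by 3 pixels.
shifted = shot_1[3:]

diff_aligned = [a - b for a, b in zip(aligned, reference)]
diff_shifted = [a - b for a, b in zip(shifted, reference)]

print("mean |difference|, pixel-aligned:", mean_abs(diff_aligned))
print("mean |difference|, shifted by 3 :", round(mean_abs(diff_shifted), 1))
```

Only the pixel-aligned pair produces a difference image that is cheap to code; a shift of a few pixels already makes the difference nearly as "busy" as the image itself, unless the shift is modelled explicitly.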

Thus, in the end, it boils down to the question of finding a good model
for creating one image from the other. You did not mention how the
images are related, so I cannot really give any advice.

Greetings,
Thomas

Thomas Richter

Nov 1, 2012, 5:43:39 AM

On 31.10.2012 19:39, prabh...@gmail.com wrote:

> Does MJPEG use some interesting trick to store similar frames optimally that I can reuse for my usecase ?

No, it just compresses the video frame by frame with JPEG.

Joy James Prabhu

Nov 1, 2012, 3:50:20 PM

> > Will storing *similar* pixels from multiple images within a *single* block give better compression ?
>
> How do you find similar pixels, and how do you encode which pixels
> belong to which image?
>
> There is not much to be gained, unless you have additional information
> on the image. For example, if the two shots are from the same scene, so
> you could use motion compensation.

I too agree that merging pixels is not really going to give better compression.
To be clear, by similarity, I meant similarity of the DCT coefficients of the YCbCr components or something derived from that. This similarity metric makes at least some sense for larger subblocks (like 64x64 or 512x512 pixels) of images rather than at pixel granularity, but as JPEG primarily does 8x8 block compression and does not exploit similarity at larger granularity, I guess this will not really be a good fit for JPEG.

For encoding, I was thinking of using some simple mechanism like interleaving image blocks in some custom predefined order or using some hash function (so that no indexing data overhead is incurred).

I definitely cannot assume images are correlated pixel by pixel. At most, I can assume some larger blocks of these images are similar with respect to color and brightness, like multiple images of the same building or a group of people. I think other standard compression utilities like libgzip or libtar or packJPG(?) would themselves be a good fit for this use case.

But will rearranging the blocks of these images and then compressing them using these utilities make a difference? Also, can you suggest some utilities that are suitable for JPG images in particular?

Thanks,
Joy

glen herrmannsfeldt

Nov 1, 2012, 6:03:01 PM

The usual versions of MPEG do use frame-to-frame information.

Not only for parts that stay constant: they also do motion estimation, so
that if one part of a scene moves between frames they only need to store
which part moved and where it moved to.
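
The idea of motion estimation can be sketched in a few lines of Python. This is a toy exhaustive block-matching search using the sum of absolute differences (SAD), not the algorithm of any particular MPEG encoder; the frame contents, the block size and the search range are all assumptions made for the example.

```python
def sad(frame_a, frame_b, ax, ay, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(frame_a[ay + y][ax + x] - frame_b[by + y][bx + x])
    return total

def find_motion_vector(prev, cur, bx, by, size=8, search=4):
    """Find the offset (dx, dy) into `prev` that best predicts the block of `cur` at (bx, by)."""
    height, width = len(prev), len(prev[0])
    best = (0, 0)
    best_cost = sad(cur, prev, bx, by, bx, by, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            px, py = bx + dx, by + dy
            if px < 0 or py < 0 or px + size > width or py + size > height:
                continue  # candidate block would fall outside the frame
            cost = sad(cur, prev, bx, by, px, py, size)
            if cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return best, best_cost

# Synthetic previous frame (assumed pattern), and a current frame in which
# the whole content has moved 2 pixels right and 1 pixel down.
W, H = 32, 32
prev = [[(x * 13 + y * 29 + x * y) % 256 for x in range(W)] for y in range(H)]
cur = [[prev[(y - 1) % H][(x - 2) % W] for x in range(W)] for y in range(H)]

vector, cost = find_motion_vector(prev, cur, 8, 8)
print("motion vector:", vector, "residual cost:", cost)
```

When the search finds the matching block, the encoder only has to store the vector plus a residual that is (nearly) zero, instead of coding the block from scratch.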

-- glen

Thomas Richter

Nov 2, 2012, 3:26:51 PM

On 01.11.2012 20:50, Joy James Prabhu wrote:

> I too agree that merging pixels is not really going to give better compression.
> To be clear, by similarity, I meant similarity of the DCT coefficients of YCrCb components or something derived from that. This similarity metric makes atleast some sense for larger subblocks (like 64x64 or 512x512 pixels) of images rather than at pixel granularity, but as JPEG does 8x8 block compression primarily and does not exploit similarity at larger granularity, I guess this will not really be a good fit for JPEG.

The problem is that this doesn't buy you anything for regular JPEG. The
blocks are, as already said, compressed independently of each other,
thus whether two blocks look alike is irrelevant for the performance of
the compressor. If N blocks look the same, the rate is simply N times the
rate of a single block. JPEG is not very sophisticated: there is no
adaptation or compression across blocks.

> For encoding, I was thinking of using some simple mechanism like interleaving image blocks in some custom predefined order or using some hash function (so that no indexing data overhead is incurred.)

Again, this doesn't buy you anything. The only advantage you can make
use of is the similarity of the DC coefficients - and that is not much.
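
To show how small the DC effect is, here is a hedged Python sketch of JPEG-style differential DC coding. It only counts the magnitude bits of each DC difference (the Huffman-coded size category is left out), it uses a single running predictor, and the DC values are invented for the example, so treat the numbers as illustrative only.

```python
def magnitude_category(value):
    """JPEG-style size category: number of bits needed to represent |value|."""
    return abs(value).bit_length()

def dc_extra_bits(dc_values):
    """Total magnitude bits spent on DC differences (Huffman prefixes not counted)."""
    total = 0
    previous = 0  # predictor starts at zero
    for dc in dc_values:
        total += magnitude_category(dc - previous)
        previous = dc
    return total

# Hypothetical quantized DC values of four blocks from two similar images.
image_a = [50, 52, 51, 53]
image_b = [51, 50, 52, 51]

# One image after the other, versus blocks of both images interleaved.
sequential = image_a + image_b
interleaved = [dc for pair in zip(image_a, image_b) for dc in pair]

print("DC magnitude bits, sequential :", dc_extra_bits(sequential))
print("DC magnitude bits, interleaved:", dc_extra_bits(interleaved))
```

Even in this favourable made-up case the saving is a couple of bits per image, which is negligible next to the bits spent on AC coefficients.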

> I definitely cannot assume images are correlated pixel by pixel. At most, I can assume some larger blocks of these images are similar with respect to color and brightness like multiple images of the same building or a group of people. I think other standard compression utilities like libgzip or libtar or packJPG(?) would itself be good fit for this usecase.

packJPG will take advantage of this, but it does more than JPEG. The
output is not JPEG compliant, though, but losslessly decodable to JPEG.

> But, will rearranging the blocks of these images and then compressing them using these utilities make a difference ? Also, can you suggest some utilities that are suitable for JPG images in particular.

Yes, it will, for packJPG for example. zip will probably make a
difference if the blocks are really identical. Tar is actually not a
compressor at all.
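
A quick Python check of both statements, using only the standard library. The pseudo-random byte strings are an assumed stand-in for already entropy-coded JPEG data (which is close to incompressible), and zlib's DEFLATE stands in for zip; real JPEG payloads will behave similarly but not identically.

```python
import io
import random
import tarfile
import zlib

rng = random.Random(0)

def noise(size):
    """Deterministic pseudo-random bytes, a stand-in for entropy-coded JPEG data."""
    return bytes(rng.getrandbits(8) for _ in range(size))

block = noise(4096)
other = noise(4096)

# Two byte-identical blocks: the second one becomes back-references.
identical_pair = zlib.compress(block + block, 9)
# Two merely "similar looking" blocks: no repeated byte strings to exploit.
different_pair = zlib.compress(block + other, 9)

def tar_size(payload):
    """Size of an uncompressed tar archive holding the payload as one file."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        info = tarfile.TarInfo(name="images.bin")  # hypothetical file name
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return len(buffer.getvalue())

print("deflate, identical blocks:", len(identical_pair))
print("deflate, different blocks:", len(different_pair))
print("plain tar of 8192 bytes  :", tar_size(block + other))
```

The identical pair shrinks to roughly the size of one block, the different pair does not shrink at all, and the tar archive is larger than its input because it only adds headers and padding.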

Greetings,
Thomas

BGB

Nov 2, 2012, 6:51:03 PM

yeah, pretty much.

an MJPEG variant which did have something like this is a vaguely
tempting thought though (say, adding block-based motion compensation,
...), but wouldn't really be MJPEG anymore though.

say, several new markers are added, say:
Define Video Frame Header: DVFH, gives video-frame type, and may be
followed by Huffman-coded motion vectors.

or, for example:
SOI, ..., SOF0, DVFH, SOS <image data>, [ <motion vectors> ], EOI

or such...

Thomas Richter

Nov 2, 2012, 7:16:07 PM

On 02.11.2012 23:51, BGB wrote:

> an MJPEG variant which did have something like this is a vaguely
> tempting thought though (say, adding block-based motion compensation,
> ...), but wouldn't really be MJPEG anymore though.
>
> say, several new markers are added, say:
> Define Video Frame Header: DVFH, gives video-frame type, and may be
> followed by Huffman-coded motion vectors.

Then it is called H.261. Actually, H.261 is pretty much JPEG plus motion
compensation (plus loop filter).

BGB

Nov 2, 2012, 9:24:36 PM

skimmed H.261 spec, and it doesn't really look like JPEG to me.
(I don't see any of the usual markers or structures, ...).

it looks more like MPEG, basically with different markers and
fixed-format structures and codings (no DHT or DQT to be seen,
apparently no user-defined video resolution, ...).

I was thinking of basically just keeping pretty much everything from
JPEG intact (same headers, same marker structure, ...), apart from
adding in an additional header and some extra data.

this would make implementing the codec really easy, and also allow me to
keep my existing alpha-blending and normal-map extensions (basically,
just a minor tweak of my existing codec code).

as well, RCT and RDCT would remain as options (allowing for the
possibility of a lossless video coding as well).

current thinking (say, something like):
FF B0: DVFH, Define Video Frame Header
FF B1: SOMV, Start Of Motion Vectors

if DVFH is absent, the image is interpreted as a raw JPEG image.

so:
FF B0 xx xx (xx = length)
TfTd HmMt ...

Tf = 4-bit frame-type (0=I-Frame, 1=P-Frame, 2-15=Reserved)
Td = 4-bit delta type (0=None, 1=Subtract I-Frame YCbCr, 2-15=Reserved)
Hm = Huffman table for motion vectors.
Mt = Motion Vector Type (0=Default, 1-15=Reserved)
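
As a sketch of how such a header could be read, here is a small Python parser. It is purely hypothetical, since DVFH is only a proposal in this thread: it assumes the length field follows the usual JPEG convention (big-endian, counting its own two bytes), that Tf and Hm sit in the high nibbles of their bytes, and that Hm and Mt are 4 bits each like Tf and Td.

```python
import struct

DVFH_MARKER = 0xFFB0  # proposed marker from the post above, not part of any standard

def parse_dvfh(segment):
    """Parse a hypothetical DVFH segment: FF B0, length, TfTd, HmMt."""
    marker, length = struct.unpack(">HH", segment[:4])
    if marker != DVFH_MARKER:
        raise ValueError("not a DVFH segment")
    tf_td, hm_mt = segment[4], segment[5]
    return {
        "length": length,               # assumed to include the 2 length bytes
        "frame_type": tf_td >> 4,       # Tf: 0 = I-Frame, 1 = P-Frame
        "delta_type": tf_td & 0x0F,     # Td: 0 = None, 1 = subtract I-Frame YCbCr
        "huffman_table": hm_mt >> 4,    # Hm: table used for motion vectors
        "mv_type": hm_mt & 0x0F,        # Mt: 0 = default
    }

# Example segment: P-Frame, delta against the I-Frame, Huffman table 2.
example = bytes([0xFF, 0xB0, 0x00, 0x04, 0x11, 0x20])
header = parse_dvfh(example)
print(header)
```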

Motion Vectors:
Huffman coded values: ZcBc
Zc = zero count
Bc = bit-count

these are followed by the bits encoding the offset.

values are stored first as all the X offsets, then all the Y offsets,
which are stored as a delta from the prior value, and give the offsets
on a per-block basis (TBD: 8x8 or 16x16 blocks).

X0 dX1 dX2 ... Y0 dY1 dY2 ...
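
The layout above (all X offsets first, then all Y offsets, each stored as a delta from the prior value) can be sketched in Python as follows. This is only one reading of the proposal: it assumes the first value of each list is a delta against zero, and it leaves out the Huffman-coded ZcBc symbols entirely.

```python
def encode_offsets(vectors):
    """Lay out per-block (x, y) offsets as X0 dX1 dX2 ... Y0 dY1 dY2 ..."""
    def deltas(values):
        out, previous = [], 0
        for v in values:
            out.append(v - previous)
            previous = v
        return out
    xs = [x for x, _ in vectors]
    ys = [y for _, y in vectors]
    return deltas(xs) + deltas(ys)

def decode_offsets(stream):
    """Inverse of encode_offsets: rebuild the list of (x, y) offsets."""
    half = len(stream) // 2
    def undo(ds):
        out, acc = [], 0
        for d in ds:
            acc += d
            out.append(acc)
        return out
    return list(zip(undo(stream[:half]), undo(stream[half:])))

# Hypothetical per-block motion vectors; neighbouring blocks tend to agree.
vectors = [(2, -1), (2, -1), (3, -1), (3, 0), (0, 0)]
stream = encode_offsets(vectors)
print(stream)
print(decode_offsets(stream) == vectors)
```

Because neighbouring blocks usually move together, most deltas are zero or small, which is what a zero-count plus bit-count symbol scheme would exploit.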

then some logic is basically shoved in between the colorspace transforms
and DCT transforms.

also possible would be using APPn markers, but the issue is that these
images wouldn't decode correctly with a normal JPEG decoder, so there
isn't much to gain from using APPn markers.

or such...

Thomas Richter

Nov 3, 2012, 6:19:12 AM

On 03.11.2012 02:24, BGB wrote:

> skimmed H.261 spec, and it doesn't really look like JPEG to me.
> (I don't see any of the usual markers or structures, ...).

The codestream syntax is different, but the philosophy is pretty close
to JPEG plus motion compensation. It is a simple DCT based video codec.

> I was thinking of basically just keeping pretty much everything from
> JPEG intact (same headers, same marker structure, ...), apart from
> adding in an additional header and some extra data.

And what would be the use case for that, given that H.261 and an entire
family of considerably more powerful video codecs already exist?

BGB

Nov 3, 2012, 3:20:18 PM

On 11/3/2012 5:19 AM, Thomas Richter wrote:
> On 03.11.2012 02:24, BGB wrote:
>
>> skimmed H.261 spec, and it doesn't really look like JPEG to me.
>> (I don't see any of the usual markers or structures, ...).
>
> The codestream syntax is different, but the philosophy is pretty close
> to JPEG plus motion compensation. It is a simple DCT based video codec.
>

yes, but it is apparently confined to specific resolutions (CIF and QCIF).

it thus wouldn't allow defining a resolution like, say, 256x256 or 512x512.

or, IOW, something actually half-way useful for streaming video into a
texture.

nevermind that it would only do simple/flat RGB videos, with no layers
or transparency support (this is a problem with most of the "standard"
video codecs FWIW).

most tend to lack support for things like user-defined layers or
metadata, ...

>> I was thinking of basically just keeping pretty much everything from
>> JPEG intact (same headers, same marker structure, ...), apart from
>> adding in an additional header and some extra data.
>
> And what would be the use case for that, given that H.261 and an entire
> family of considerably more powerful video codecs already exist?

mostly for animated textures and similar.

potentially for in-game cutscenes or similar, but this is less certain.

I am also imagining if it were basically being used in a manner more
akin to a Flash-animation, basically with alpha-blended JPEG layers
serving as sprites subject to basic animation-control events.

for example, layers can be set to rotate or scroll, and possibly be
assigned to particular fragment shaders or have blending options set,
... (most of this is controlled by using APP markers to embed
shader-scripts).

in this case, any layer-images defined in a frame would be streamed into
the relevant textures, and any sub-images invoked as sprites within a
frame will be drawn at the indicated position, ...

this is also a reason for having separate "component layers" and
"tag-layers", where a component layer gives something like a normal-map
or bump-map, whereas a tag-layer gives an image which may be invoked as
a tag or sprite.

consider, for example, a person is making a fan:
they might have several layers:
TagLayer,"FanFrame"
CompLayer,"RGB" //RGB layer
CompLayer,"XYZ" //normal map
CompLayer,"DASe" //depth, alpha, specular-exponent
...
TagLayer,"FanBlades"
CompLayer,"RGB"
CompLayer,"XYZ"
CompLayer,"DASe"
...
ShaderInfo (text glob)
$flags alpha
$layer
$tag_image FanBlades
$rotate 0.5 0.5 720 //*1
$blend alpha
$layer
$tag_image FanFrame
$blend alpha

*1: rotate 720 degrees per second relative to image-center

currently, I am using JPEG and MJPEG here (with a somewhat tweaked JPEG
variant).

it is also possible to use a pile of PNGs and text-files driving control
(this is basically the form it is in before feeding it into the tool to
convert it into an MJPEG video, but this conversion results in a notable
reduction in storage space and disk-overhead).

currently, a standard MJPEG codec will only display the base-RGB layer,
which in this case, could probably be used for a "rendered down" version
of the video.

the main reason for adding motion compensation would be hopefully to be
able to make these sequences take up a little less space, and whether or
not it is applied to the Base-RGB layer can be left as an open question.

Thomas Richter

Nov 4, 2012, 7:10:35 AM

On 03.11.2012 20:20, BGB wrote:

> > The codestream syntax is different, but the philosophy is pretty close
> > to JPEG plus motion compensation. It is a simple DCT based video codec.
> >
>
> yes, but it is apparently confined to specific resolutions (CIF and QCIF).

Yes, that goes for many video codecs, but again, never mind about this.
I'm not talking about this specific codec, but the world of video
compression in general. HEVC, for example, is very flexible in the
resolutions it supports.

My question would really be: Why invest time into that given that many
video codecs already exist and are strong in the market?

> most tend to lack support for things like user-defined layers or
> metadata, ...

HEVC certainly does.

> > And what would be the use case for that, given that H.261 and an entire
> > family of considerably more powerful video codecs already exist?
>
> mostly for animated textures and similar.

First question: Why isn't that covered by the existing codecs?
Second question: If it is just texture, why a video codec?

> the main reason for adding motion compensation would be hopefully to be
> able to make these sequences take up a little less space, and whether or
> not it is applied to the Base-RGB layer can left as an open question.

What about "use a video codec"? I just don't see the problem you have
with existing technology?

BGB

Nov 4, 2012, 1:10:42 PM

On 11/4/2012 6:10 AM, Thomas Richter wrote:
> On 03.11.2012 20:20, BGB wrote:
>
>> > The codestream syntax is different, but the philosophy is pretty close
>> > to JPEG plus motion compensation. It is a simple DCT based video codec.
>> >
>>
>> yes, but it is apparently confined to specific resolutions (CIF and QCIF).
>
> Yes, that goes for many video codecs, but again, nevermind about this.
> I'm not talking about this specific codec, but the world of video
> compression in general. HEVC, for example, is very flexible in the
> resolutions it supports.
>
> My question would really be: Why invest time into that given that many
> video codecs already exist and are strong in the market?
>

basically, just hacking something onto JPEG requires writing less code,
and not depending on OS-specific codec mechanisms, and doesn't require
redesigning existing features to fit onto a new bitstream.

JPEG pretty much supports any resolution up to around 65535 x 65535
apparently, but for images this size and larger, it may well be more
efficient to split the image up into tiles or similar.

(meanwhile, my 3D engine doesn't currently support any textures larger
than around 4096x4096 anyways...).

it also potentially allows the same codec code for both still-image and
video compression.

also, as-is, AFAICT there shouldn't be any patent issues with this codec.

>> most tend to lack support for things like user-defined layers or
>> metadata, ...
>
> HEVC certainly does.
>

but, I meant simpler (and not patent-encumbered) ones, like MPEG-1 and
friends.

it is like, originally there was JPEG, and it was fairly straightforward
and generic, albeit maybe a little over-engineered in some areas (hence,
why there was JFIF).

in this case, the imagined codec is basically a JFIF-like subset of
JPEG, but with an optional feature to enable motion vectors and residual
encoding (at the cost of breaking decoder compatibility).

as-is, if residual encoding is used without motion vectors, it would
simply encode a delta of the images (and if residual encoding is not
enabled, it will simply store the frame as a raw image).
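
A minimal Python sketch of residual encoding without motion vectors, i.e. a plain per-pixel delta against the I-Frame. The tiny 4x4 frames are invented for the example, and the sketch works on raw sample values; where the subtraction would sit relative to the colorspace transform and DCT in the actual codec is not modelled here.

```python
def encode_residual(reference, frame):
    """Per-pixel delta of `frame` against the reference (I-Frame)."""
    return [[cur - ref for cur, ref in zip(frame_row, ref_row)]
            for frame_row, ref_row in zip(frame, reference)]

def decode_residual(reference, residual):
    """Rebuild the frame by adding the residual back onto the reference."""
    return [[ref + delta for ref, delta in zip(ref_row, res_row)]
            for ref_row, res_row in zip(reference, residual)]

# Hypothetical 4x4 I-Frame and a following frame in which two pixels changed.
i_frame = [[10, 10, 10, 10],
           [20, 20, 20, 20],
           [30, 30, 30, 30],
           [40, 40, 40, 40]]
p_frame = [row[:] for row in i_frame]
p_frame[1][2] = 25
p_frame[2][2] = 33

residual = encode_residual(i_frame, p_frame)
nonzero = sum(1 for row in residual for d in row if d != 0)

print("non-zero residual samples:", nonzero)
print("round trip ok:", decode_residual(i_frame, residual) == p_frame)
```

With lossy coding the decoder only has the decoded I-Frame, so an encoder would normally compute the residual against its own decoded copy of the reference to avoid drift.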

basically, as-is, most of the stuff about how the image is encoded
depends on various factors, namely which buffer-arrays and flags were
passed into the encoder, ...

say, buffer arrays:
RGBA;
Normal XYZ;
Luma, Specular, ...

some flags indicate the color-transform and whether or not to use RDCT,
or encode the quality-level for lossy images (the quality level is in
the low 8-bits, and the remaining bits are encoder flags).

probably another flag for "enable residual encoding" could exist
(probably controlled via a command-line option to the "AVI compiler").

things like whether or not to encode an alpha-channel are detected
automatically (based on whether or not all of the alpha values are 255,
in which case it will conclude that no alpha channel is needed).
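
The alpha check described here is simple enough to sketch directly. The pixel representation (a flat list of RGBA tuples) and the function name are assumptions for the example, not the encoder's actual interface.

```python
def needs_alpha_channel(rgba_pixels):
    """True if at least one pixel is not fully opaque (alpha != 255)."""
    return any(alpha != 255 for (_r, _g, _b, alpha) in rgba_pixels)

# Hypothetical pixel data: a fully opaque image and one with a translucent pixel.
opaque = [(255, 0, 0, 255), (0, 255, 0, 255)]
translucent = opaque + [(0, 0, 255, 128)]

print(needs_alpha_channel(opaque))
print(needs_alpha_channel(translucent))
```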

note that images would probably be delta'ed against other images in the
same layer, like it wouldn't make much sense to delta the normal-map
against the RGB layer or the RGB-layer by the normal map.

in the informal spec written thus far, the video-header adds a "layer
ID" field, which is basically an 8-bit value which indicates which
"layer" contains the relevant I-Frame (or, alternatively, where the
contents of the current I-Frame are supposed to go to). theoretically, I
could use a 16-bit value "just in case", but this would probably be
overkill (and I would likely restrict it to a subset of this range anyways).

any existing videos will be essentially "grandfathered in" in such a
system as well.

>> > And what would be the use case for that, given that H.261 and an entire
>> > family of considerably more powerful video codecs already exist?
>>
>> mostly for animated textures and similar.
>
> First question: Why isn't that covered by with the existing codecs?
> Second question: If it is just texture, why a video codec?
>

because it is an *animated* texture.

an animated texture may have multiple frames which have a fair amount of
nearly-identical areas, or border on being a short video sequence, so
there is something to gain from supporting residual encoding.

granted, since most of these sequences are typically very short, the
existence or absence of motion compensation is not really a deal-breaker.

but, in any case, with a several MB animated texture file, if a person
can shave it down by say, 500kB, this may still be a worthwhile gain.

an animated texture is sort of in the middle ground between a normal
texture-map and a video file, generally because:
it is mapped directly onto OpenGL textures;
it is typically decoded/streamed in real-time into said textures (and
generally looped).

(these textures are then generally/presumably used on pieces of scene
geometry).

likewise, for animated textures, compatibility with other applications
is a lower priority, partly as the programs which would actually be
convenient to be able to interact with (like GIMP or Paint.NET) don't
really understand animated images anyways, and apps which understand
video (like video-editors or media players) have little real use-case
for animated textures (they would only display a small amount of the
information anyways, just looking like a very short video clip).

otherwise, it would require a very specialized video player or editor to
make much use of the data contained in an animated texture anyways (so
basically, its main use-case is more for the original development to be
done using a bunch of PNGs and text-files, and then "compiling" these
into the animated texture file).

theoretically, there could be a case for "decompiling" an animated
texture into a collection of text files and PNGs again, but as-of-yet, I
haven't run into a use-case for needing to do this.

having a thumbnail could be helpful sometimes, but this isn't really
critical.

basically, it is sort of like the use-case for the id Software RoQ
format, except that this codec would support more layers (unlike RoQ
which was pretty much RGB only, and otherwise very limited).

I didn't use RoQ personally, as I didn't want to have to try to figure
it out enough to write my own decoder for it (the only decoders for it
commonly available are GPL, which is sort of a no-go in my case).

http://www.modwiki.net/wiki/ROQ_%28file_format%29

>> the main reason for adding motion compensation would be hopefully to be
>> able to make these sequences take up a little less space, and whether or
>> not it is applied to the Base-RGB layer can left as an open question.
>
> What about "use a video codec"? I just don't see the problem you have
> with existing technology?
>

the main issue is mostly not wanting to depend on other people's code,
or having to significantly change the existing code (because, well,
there is hardly an unlimited amount of time to invest into these things).

I also have a fair amount of code which works on a specific set of
extensions (basically, MJPEG AVIs with a bunch of extension features
into the image).

staying with the same basic technology avoids needing to significantly
alter the existing code (although, enabling the motion compensation may
imply also changing the AVI FOURCC to avoid confusing existing decoders).

(I may or may not consider eventually dropping the AVI containers, more
likely this would be for a more customized container format, possibly a
little more like SWF or similar).

note that the 3D engine also makes heavy use of PNG-based textures, but
these are generally loaded via an alternate code-path (and the use cases
for PNG, standard JPEG, and "BGBTech JPEG", are different...).

admittedly, yeah, some of this may be a bit jury-rigged, like the code
for decoding and playing back the videos is sort of tangled up with
OpenGL calls, and with the JPEG codec internals, but whatever sometimes...

sometimes, there are things to be said for having lower levels of
abstraction as well.