Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Re: Lossless & Non-Degrading Lossing DCT-Based Coding

17 views
Skip to first unread message

Mark

unread,
May 23, 2012, 1:33:43 PM5/23/12
to
From 2011 June 11
(http://groups.google.com/group/comp.compression/msg/c4d559fd5896f43c?
hl=en&dmode=source)

I meant to posted this response in its entirety last year, but didn't
get around to it. It includes some extra stuff about the music scale
and its relation to the DCT coefficients, the patent covering what I
mentioned, a novel typology of different kinds of "losslessness", the
relation of the DCT-8 to the Hopf fibration of the 7-sphere and an
interesting integer Haar wavelet transform.

From me:
> A patent search (and IEEE journal search on "Integer DCT") takes the
> winds out of the sails of all these developments, big time. This was
> going on in the early Millennial decade [1]. Nowadays, the IEEE
> Journals are filled with 3-D modelling papers and the like. Probably
> something driven by MPEG-4 and MPEG-7 (and to a lesser degree
> JPEG-2000).

From glen herrmannsfeldt:
> If you want an interesting case, search for "Simplified Integer Cosine Transform."
> It was developed for the case where the forward transform had to
> be done on a slow processor without a multiplier. (Specifically,
> the RCA CDP1802, in 1994.)
>
> The inverse transform could be done on a very fast processor.
>
> This ICT optimized for the minimum number if '1' bits in the
> coefficients to get the appropriate compression.
>
> -- glen

Interestingly, I found the following formulae which is suitable for
2x2 Haar wavelet transforms:

A = (a + b + c + d - e) div 2
B = (a - b + c - d + e) div 2
C = (a + b - c - d + e) div 2
D = (a - b - c - d + e) div 2
where
e = (a + b + c + d) mod 2.

The transform is its own inverse, with the roles (A, B, C, D) <-> (a,
b, c, d) swapped, and it involves no loss of precision.

Each iteration works on 2x2 blocks and proceeds by extracting out all
the B, C, D residuals, replacing each 2x2 blocks by its A-residual
(thus halving the resolution of the 2-D array) and reiterating the
process on it.

The patent I mentioned is US patent 6990506. This has coefficients for
DCT-4, DCT-8 and DCT-16; the DCT-8 coefficients being the ones I
listed (17, 24, 23, 20, 17, 12, 7, 6). For DCT-4, they list the
following sets: (13, 17, 13, 7), (17, 23, 17, 7) and (25, 17, 25, 31).
For DCT-16, they list (17, 22, 24, 28, 23, 12, 20, 20, 17, 12, 12, 16,
7, 8, 6, 6).

Regarding the other issues in my previous replies from last year:

I don't know if I posted this response in its entirety last year, but,
if so, I don't recall it included the extra stuff about the music
scale and its relation to the DCT coefficients. I also mention a novel
typology of different kinds of "losslessness" and the relation of
DCT-8 to the Hopf fibration of the 7-sphere.

Thomas Richter <t...@math.tu-berlin.de>:
> That's just the same transform applied twice, no problem. Possibly there is still some
> optimization potential for this, but the naive tensor product might do.

The optimizations are significant. The JPEG book I mentioned before
(JPEG: Still Image and Compression Standard; William B. Pennebaker and
Joan L. Mitchell) describes one method in detail.

There may be enough in our discussion to be suitable for publication,
regardless of the fact that "Integer DCT" is an old idea. The
important issue being raised here is the one in the subject header:
"*Non-Degrading* Lossing". More generally, the novel concept being
raised here is the classification of coding-decoding methods into 4
levels of "lossing" as follows:

(1) bijections ("lossless"), where the encoder E and decoder D satisfy
the identities ED = 1, DE = 1

(2) retractions ("non-degrading lossing"), where one has ED = 1, but
not DE = 1. Then DE becomes a one-time only "lossing" projection. Note
the idempotency property (DE)(DE) = D(ED)E = D1E = DE follows.

(3) generalized inverses ("non-drifting lossing"), where DED = D and
EDE = E. Then both DE and ED are projections; DE is the "lossing"
projection, and ED is the "degrading" projection.

(4) the general case ("drifting lossing"), where no special relation
exists between D and E. A further classification might be obtained by
requiring (DE)^{n+1} D = (DE)^n D and (ED)^{n+1} D = (ED)^n D for some
fixed n = 2, 3, 4, ...

> Thanks for posting a solution here (that is, a scaled DCT, but this is perfect). I afraid the option to re-define the specs isn't available, i.e. the integer solution has to fit to the DCT of the reference implementation.

There's always an option with replacing or changing "standards". Even
you mentioned "-2" in "JPEG-2"; nothing stays the same. More
generally, the entire industry of standards definitions is in a major
upswing over the past 10 years (just look at how many sections of
MPEG-4 there are!), and new standards come out on a regular basis now.

Even in such "established" standards, like MPEG, one has room for
development in the following ways (a) extensions like those seen in
JPEG-2 are allowed for because there always exist marker segments and
packets that are "reserved for future extensions", (b) standards like
MPEG-4 admit other standards, like Dirac (a wavelet based video coding
standard) to be registered for inclusion.

With a real-number based transform like DCT, one *might* be able to
accomplish conditions (2) or (3) with fixed point arithmetic, although
this does not seem to be the goal actually being stated in the
references you mentioned.

With an orthogonal integer transform one has a greater range of
freedom to achieve not just the conditions (2) and (3), but even the
condition (1). This brings us in line with the situation with wavelet
transforms, such as that used in JPEG-2000 and in Dirac.

It is the possibility of achieving an EXACT form of (2) or (3) with
integer arithmetic that would make the idea of new standards not only
possible but even necessary.

The special properties of the 7-sphere S_7 come to mind because of the
so-called "Hopf fibration" on S_7. This consists of a projection of
H1: S_7 -> S_3 and H2: S_3 -> S_1; S_3 being the 3-sphere (a.k.a. the
"hypersphere" or the space of "unit quaternions" or, equivalently, the
underlying space of the Lie group SU(2)) and S_1 the 1-sphere (circle,
or equivalently the underlying space of the Lie group U(1)). The
inverse image H1^{-1}(x) for each point x in S_3 is the "fiber" of x:
a copy of the 4-sphere S_4. The inverse image H2^{-1}(x), for each x
in S_1, is a copy of the 2-sphere S_2. In effect this is a reduction
S_7 -> S_3 x S_4 and S_3 -> S_1 x S_2. Correspondingly, one has the
decomposition of the DCT coefficients {G,F,E,D,C,B,A} -> {G,E,C,A} +
{B,D,F} and {B,D,F} -> {B,F} + {D}.

So, I suspect there's something going on with the special features of
the 7-sphere.

An orthogonal transform in 8-D is just a listing of 8 mutually
orthogonal vectors on the 7-sphere, the 7-sphere being taken as a
sphere of constant radius in 8-dimensions. Given set of 8 integer
vectors mutually orthogonal on a 7-sphere, you can always rotate them
so that one of the vectors is your DC vector {D,D,D,D,D,D,D,D} for
some scaling factor D. That's the only important criterion required in
coming up with an integer orthogonal transform -- that one of the
vectors be the DC vector.

The rotation can be done in such a way that rational numbers remain
rational. For instance, in 2-D a rotation (A,B) -> (cA + sB, cB - sA)
can be done such that cA + sB = cB - sA, or c(B-A) = s(A+B) which has
integer solutions. Since c^2 + s^2 need not be 1, this generally means
the rotation is onto a sphere of different radius (i.e. different
scaling factor).

For orthogonal transforms, the Lie group is O(8) has 28 degrees of
freedom. After rotating one of the vectors into the DC position, that
leaves you with 21 degrees of freedom left over to rotate the other 7
vectors. One should be able to use the extra degrees of freedom to get
the vectors over into the "standard positions" I laid out in my first
article; namely that one of them be in the position {D,-D,-D,D,D,-D,-
D,D}, another in the position {F,B,-B,-F,-F,-B,B,F} where B^2 + F^2 =
2D^2, a third in the position {B,-F,F,-B,-B,F,-F,B}, and the remaining
4 vectors in the positions I described before.

So, there may be no actual loss of generality in considering the
pattern of coefficients I described originally; other than a change in
scaling factors.

>> And thank you for your references -- that was (in fact) the implicit question in the article.
>
> No problem, I also kept looking, but haven't had the time to go into all the math yet.

I started to look at the second reference. They're using good old
fashioned Hungarian combinatorics for part of their paper. In fact,
what they show in the Appendix might already be covered by what's
known as Halmos's Theorem and they might have done nothing more than
re-state and re-prove that result.

I also started to look at the first paper. All I have are the PDF's
(barely readible); I can't access IEEE publications.

Making this consistent with the notation I devised A, ..., G = cos(n
pi/16), n = 7, ..., 1; one can actually write down a rational
approximation:
11707 * 1/2 (A, B, C, D, E, F, G) = (1142, 2240, 3252, 4139, 4867,
5407, 5741) + (-.038800, +.037471, +.030359, +.049544, +.007376, -.
071156, +.026639)
with an L^2 Difference of 0.110059 and a L^infinty Difference of
0.071156.

This is actually the ONLY rationalization with L^infinity under 0.1
whose divisor is 16-bit or less.

They don't actually use that anywhere, but it may be useful as a basis
for comparison.

They use the following inverse DCT diagram, written in equational form
as:
(F0', F2', F4', F6') = (A0 F0, A2 F2, A4 F4, A6 F6)
(F1', F3', F5', F7') = (A1 F1, A3 F3, A5 F5, A7 F7)
(a0, b0) = (F0' + F4', F0' - F4'),
(a1, b1) = (F1' + F7', F1' - F7'),
(a2, b2) = (beta1 (F2' + F6'), alpha1 (F0' - F4')),
(a3, b3) = (F5' + F3', F5' - F3'),
(e0, e1) = (alpha2 (b3 + a1), beta2 (b3 - a1)),
e2 = b2 - a2,
(e3, d1) = (delta a3 + gamma b1, delta b1 - gamma a3),
(d0, d2, d3) = (e3 - e1, e0 - e3, e1 + d1),
(c0, c3) = (a0 + a2, a0 - a2),
(c1, c2) = (b0 + e2, b0 - e2),
(f0, f7) = (c0 + d0, c0 - d0),
(f1, f6) = (c1 + d1, c1 - d1),
(f2, f5) = (c2 + d2, c2 - d2),
(f3, f4) = (c3 + d3, c3 - d3).
The requirements for this to match inverse DCT are that
A0 = D = A7,
A2 (beta1, alpha1) = (F, F+B),
A6 (beta1, alpha1) = (B, F-B),
A1 (delta, beta2, gamma, alpha2) = (E, E-A, G-E+A, A-E+G+C),
A7 (delta, beta2, gamma, alpha2) = (C, G-C, G-C-A, A+E+C-G),
A3 (gamma, beta2, delta, alpha2) = (A, C-A, E-C+A, G+C-E-A),
A5 (gamma, beta2, delta, alpha2) = (G, G+E, G+E+C, G+C+E+A).
In turn, this requires the following identities:
F(F-B) = B(F+B) (= D^2),
E(G-C) = C(E-A), E(A+E) = C(C+G),
A(G+E) = G(C-A), A(A+C) = G(G-E),
GE = GC + CA + AE.
The first of these identities -- as the Pythagoreans first discovered
-- has no integer solution since it is equivalent to (1 + B/F)^2 = 2.

Instead, they're only dealing with integer approximations to this
transform, but not exact integer transforms that satisfy all these
identities.

> Hehe. Actually, this is an arithmetic scale, quite unlike in musical scale which is (approximately) geometric. (-;

In general, the mathematical problem is the same. With music theory
the problem is to "rationalize" (i.e. approximate by rational numbers)
the series
{ 440 Hz * e^{kn}: n = 0, ..., 11 } = { A, A#, B, C, C#, D, D#, E, F,
F#, G, G# } where k = log_e(2)/12,
while for the discrete Fourier transforms, the series
{ e^{kn}: n = 0, ..., 7 } = { 1, G + iA, F + iB, E + iC, D + iD, C +
iE, B + iF, A + iG }, where k = i pi/16
is being rationalized.

As the Pythagoreans first discovered, the problem of finding EXACT
rational solutions is unsolvable, which forces a retreat to the lesser
problem of finding rational approximations.

In both cases, we're dealing with the problem of trying to convert a
logarithmic scale into something that works on a rational grid.

In a similar way, just as there are multiple integer solutions for the
orthogonal transform (like the ones I posted), one has multiple
rationalizations for the music scale
Unity = 1, m2, M2, m3, M3, P4, Tritone, P5, m6, M6, m7, M7, Octave =
2
in common use, such as the "Just" scale and "Pythagorean" scale:
Ratio Logarithmic Just Pythagorean
Unity 1 1 1
m2 2^{1/12} 16/15 256/243
M2 2^{1/6} 9/8 9/8
m3 2^{1/4} 6/5 32/27
M3 2^{1/3} 5/4 81/64
P4 2^{5/12} 4/3 4/3
Tritone 2^{1/2} 64/45 729/512
P5 2^{7/12} 3/2 3/2
m6 2^{2/3} 8/5 128/81
M6 2^{3/4} 5/3 27/16
m7 2^{5/6} 7/4 16/9
M7 2^{11/12} 15/8 243/128
Octave 2 2 2

Thomas Richter

unread,
May 24, 2012, 3:58:36 AM5/24/12
to
Hi Mark,

> I meant to posted this response in its entirety last year, but didn't
> get around to it. It includes some extra stuff about the music scale
> and its relation to the DCT coefficients, the patent covering what I
> mentioned, a novel typology of different kinds of "losslessness", the
> relation of the DCT-8 to the Hopf fibration of the 7-sphere and an
> interesting integer Haar wavelet transform.

> Interestingly, I found the following formulae which is suitable for
> 2x2 Haar wavelet transforms:
>
> A = (a + b + c + d - e) div 2
> B = (a - b + c - d + e) div 2
> C = (a + b - c - d + e) div 2
> D = (a - b - c - d + e) div 2
> where
> e = (a + b + c + d) mod 2.
>
> The transform is its own inverse, with the roles (A, B, C, D)<-> (a,
> b, c, d) swapped, and it involves no loss of precision.

Though this looks to me like the well-known Walsh-Hadamar transform, or
a variation thereof.

> The patent I mentioned is US patent 6990506. This has coefficients for
> DCT-4, DCT-8 and DCT-16; the DCT-8 coefficients being the ones I
> listed (17, 24, 23, 20, 17, 12, 7, 6). For DCT-4, they list the
> following sets: (13, 17, 13, 7), (17, 23, 17, 7) and (25, 17, 25, 31).
> For DCT-16, they list (17, 22, 24, 28, 23, 12, 20, 20, 17, 12, 12, 16,
> 7, 8, 6, 6).

Thanks, though I believe you then need to include a scaling stage, i.e.
you need to absorb some of the missing factors into the quantization.
This is suitable for idempotent transformations (i.e. for the
transformation S there is a T such that T S is a projection, i.e. there
is a pseudo-inverse), but what I need is a bijection, i.e. a precise
integer inverse.

> There may be enough in our discussion to be suitable for publication,
> regardless of the fact that "Integer DCT" is an old idea. The
> important issue being raised here is the one in the subject header:
> "*Non-Degrading* Lossing". More generally, the novel concept being
> raised here is the classification of coding-decoding methods into 4
> levels of "lossing" as follows:
>
> (1) bijections ("lossless"), where the encoder E and decoder D satisfy
> the identities ED = 1, DE = 1

Exactly, and this is what I'm looking for.

> (2) retractions ("non-degrading lossing"), where one has ED = 1, but
> not DE = 1. Then DE becomes a one-time only "lossing" projection. Note
> the idempotency property (DE)(DE) = D(ED)E = D1E = DE follows.

This is what I call an idempotent transformation, see above. This is
nice, but not quite what I'm looking for. Actually, if you have a
bijection and a quantization operation, then you trivially always have
an idempotent transformation (in the above sense) since quantization is
a projection.

JPEG XR without color transformation is idempotent in this sense for a
non-trivial quantization - which was one of the selling arguments.

> (4) the general case ("drifting lossing"), where no special relation
> exists between D and E. A further classification might be obtained by
> requiring (DE)^{n+1} D = (DE)^n D and (ED)^{n+1} D = (ED)^n D for some
> fixed n = 2, 3, 4, ...

Which is what we currently have in JPEG.

> There's always an option with replacing or changing "standards". Even
> you mentioned "-2" in "JPEG-2"; nothing stays the same. More
> generally, the entire industry of standards definitions is in a major
> upswing over the past 10 years (just look at how many sections of
> MPEG-4 there are!), and new standards come out on a regular basis now.

Let me state again the precise requirements: The requirements are that
I'm looking for a transformation S that is a bijection such that the
error to the DCT is minimal. I.e., I'm looking for backwards and
forwards compatibility: An image encoded in the new transformation shall
be reconstructable with old codecs (though not without loss, but with
minimal loss), and an image encoded with the old standard shall be
reconstructable with the new codec (though also with loss, though
minimal additional loss when replacing the IDCT with an integer DCT
approximation).

But anyhow, thanks for looking into this, I also did some research on
this and there is a work from around 2000 from the Peking University(?)
on so-called SERMS, a matrix factorization that lifts a generic matrix
into products of elementary row matrices (thus the name SERM = simple
elementary row matrices). They also applied this idea to the DCT and
came up with a lifting scheme for the DCT that is bijective and has a
very minor loss compared to the common DCT in JPEG. (depending on the
image >40dB).

Whether the JPEG picks this up I do not know yet, i.e. what I would need
is a strong requirement statement. IOW, if anyone reading this is
interested in such a JPEG extension, and you see some perspective in
actively marketing such a solution, I would be more than happy to send
you a questionnaire for the demand analysis for collecting feedback. The
more feedback I get from this the more likely it is that we (=the JPEG)
picks it up. Needless to say, I'm not the JPEG, just one of the members,
so I cannot make the decision just by myself.

There is also an implementation of all this, i.e. a verification
software. It is not yet quite clear whether or in which way I can make
this available.

I also implemented some other mechanisms for lossless DCT coding, one of
them is the idea to code the coding residuals in a side-channel. As
trivial as this idea may seem, it compresses better than the SERMs
approach for lossless, approximately as good as PNG, but is fully
backwards compatible to current 10918 codecs. The actual SERMs
transformation has the drawback that it is quite complex (i.e. the
number of multiplications is quite high compared to the known fixpoint
algorithms), a drawback that is avoided by the residual coder. Though
the latter of course also adds complexity, so in the end SERMs is still
faster.

> Even in such "established" standards, like MPEG, one has room for
> development in the following ways (a) extensions like those seen in
> JPEG-2 are allowed for because there always exist marker segments and
> packets that are "reserved for future extensions", (b) standards like
> MPEG-4 admit other standards, like Dirac (a wavelet based video coding
> standard) to be registered for inclusion.

Well, you can always do that. In fact, JPEG 2000 includes JPEG as
substandard (in 15444-2, part-2 extensions), and now - as I just
finished the write-up - also 29199-2, i.e. JPEG XR. However, while this
is certainly interesting from a "standards design perspective", it is
unrealistic that we'll get a JPEG codec in the field that reads 15444-2
file formats just to render a 10918 codestream. In that sense, I very
much propose the idea of forwards and backwards compatibility.
Everything else will likely not be accepted by the market. Look for
example at 10918-3 (the JPEG file format, SPIFF, plus other extensions):
It was incompatible which what was already there (JFIF) and thus became
nothing than a historic side remark.

Anyhow, if anyone is interested in this, I can provide a presentation I
made on this in the last JPEG meeting in San Jose, and I would also be
more than happy to provide the questionnaire for collecting feedback.

> With an orthogonal integer transform one has a greater range of
> freedom to achieve not just the conditions (2) and (3), but even the
> condition (1). This brings us in line with the situation with wavelet
> transforms, such as that used in JPEG-2000 and in Dirac.
>
> It is the possibility of achieving an EXACT form of (2) or (3) with
> integer arithmetic that would make the idea of new standards not only
> possible but even necessary.

As said, I only care about (1) in the moment.

> The special properties of the 7-sphere S_7 come to mind because of the
> so-called "Hopf fibration" on S_7. This consists of a projection of
> H1: S_7 -> S_3 and H2: S_3 -> S_1; S_3 being the 3-sphere (a.k.a. the
> "hypersphere" or the space of "unit quaternions" or, equivalently, the
> underlying space of the Lie group SU(2)) and S_1 the 1-sphere (circle,
> or equivalently the underlying space of the Lie group U(1)). The
> inverse image H1^{-1}(x) for each point x in S_3 is the "fiber" of x:
> a copy of the 4-sphere S_4. The inverse image H2^{-1}(x), for each x
> in S_1, is a copy of the 2-sphere S_2. In effect this is a reduction
> S_7 -> S_3 x S_4 and S_3 -> S_1 x S_2. Correspondingly, one has the
> decomposition of the DCT coefficients {G,F,E,D,C,B,A} -> {G,E,C,A} +
> {B,D,F} and {B,D,F} -> {B,F} + {D}.

You know what? Now that you mention it, I remember. I did a lot of
differential geometry quite a while back, though I'm probably a bit
rusty. I'll look this up - I'm not sure in how far it will help me with
my concrete problem at hand (for which I have one solution, as stated),
but it still looks interesting.

> An orthogonal transform in 8-D is just a listing of 8 mutually
> orthogonal vectors on the 7-sphere, the 7-sphere being taken as a
> sphere of constant radius in 8-dimensions. Given set of 8 integer
> vectors mutually orthogonal on a 7-sphere, you can always rotate them
> so that one of the vectors is your DC vector {D,D,D,D,D,D,D,D} for
> some scaling factor D. That's the only important criterion required in
> coming up with an integer orthogonal transform -- that one of the
> vectors be the DC vector.

As far as a general idempotent DCT is concerned, yes, but my solution
space is much more confined, see above. Not only do I need to map the DC
vector to the DC vector, but I also need bijection (not just
idempotency) and an approximation of all the other vectors as well.

> The rotation can be done in such a way that rational numbers remain
> rational. For instance, in 2-D a rotation (A,B) -> (cA + sB, cB - sA)
> can be done such that cA + sB = cB - sA, or c(B-A) = s(A+B) which has
> integer solutions. Since c^2 + s^2 need not be 1, this generally means
> the rotation is onto a sphere of different radius (i.e. different
> scaling factor).

Exactly, but this means that you do not have a bijection anymore. It is
acceptable for lossy transformations and satisfying from a mathematical
perspective.

> I started to look at the second reference. They're using good old
> fashioned Hungarian combinatorics for part of their paper. In fact,
> what they show in the Appendix might already be covered by what's
> known as Halmos's Theorem and they might have done nothing more than
> re-state and re-prove that result.

Halmos's theorem says nothing to me. Do you have a reference for that?

>> Hehe. Actually, this is an arithmetic scale, quite unlike in musical scale which is (approximately) geometric. (-;
>
> In general, the mathematical problem is the same. With music theory
> the problem is to "rationalize" (i.e. approximate by rational numbers)
> the series
> { 440 Hz * e^{kn}: n = 0, ..., 11 } = { A, A#, B, C, C#, D, D#, E, F,
> F#, G, G# } where k = log_e(2)/12,
> while for the discrete Fourier transforms, the series
> { e^{kn}: n = 0, ..., 7 } = { 1, G + iA, F + iB, E + iC, D + iD, C +
> iE, B + iF, A + iG }, where k = i pi/16
> is being rationalized.

Indeed, yes. (-;

So thanks for sharing - I've been busy with this as well, as you see,
and I hope this is moving forward a bit. Unfortunately, it does not only
depend on me.

Greetings,
Thomas
0 new messages