From 2011 June 11
(
http://groups.google.com/group/comp.compression/msg/c4d559fd5896f43c?
hl=en&dmode=source)
I meant to posted this response in its entirety last year, but didn't
get around to it. It includes some extra stuff about the music scale
and its relation to the DCT coefficients, the patent covering what I
mentioned, a novel typology of different kinds of "losslessness", the
relation of the DCT-8 to the Hopf fibration of the 7-sphere and an
interesting integer Haar wavelet transform.
From me:
> A patent search (and IEEE journal search on "Integer DCT") takes the
> winds out of the sails of all these developments, big time. This was
> going on in the early Millennial decade [1]. Nowadays, the IEEE
> Journals are filled with 3-D modelling papers and the like. Probably
> something driven by MPEG-4 and MPEG-7 (and to a lesser degree
> JPEG-2000).
From glen herrmannsfeldt:
> If you want an interesting case, search for "Simplified Integer Cosine Transform."
> It was developed for the case where the forward transform had to
> be done on a slow processor without a multiplier. (Specifically,
> the RCA CDP1802, in 1994.)
>
> The inverse transform could be done on a very fast processor.
>
> This ICT optimized for the minimum number if '1' bits in the
> coefficients to get the appropriate compression.
>
> -- glen
Interestingly, I found the following formulae which is suitable for
2x2 Haar wavelet transforms:
A = (a + b + c + d - e) div 2
B = (a - b + c - d + e) div 2
C = (a + b - c - d + e) div 2
D = (a - b - c - d + e) div 2
where
e = (a + b + c + d) mod 2.
The transform is its own inverse, with the roles (A, B, C, D) <-> (a,
b, c, d) swapped, and it involves no loss of precision.
Each iteration works on 2x2 blocks and proceeds by extracting out all
the B, C, D residuals, replacing each 2x2 blocks by its A-residual
(thus halving the resolution of the 2-D array) and reiterating the
process on it.
The patent I mentioned is US patent 6990506. This has coefficients for
DCT-4, DCT-8 and DCT-16; the DCT-8 coefficients being the ones I
listed (17, 24, 23, 20, 17, 12, 7, 6). For DCT-4, they list the
following sets: (13, 17, 13, 7), (17, 23, 17, 7) and (25, 17, 25, 31).
For DCT-16, they list (17, 22, 24, 28, 23, 12, 20, 20, 17, 12, 12, 16,
7, 8, 6, 6).
Regarding the other issues in my previous replies from last year:
I don't know if I posted this response in its entirety last year, but,
if so, I don't recall it included the extra stuff about the music
scale and its relation to the DCT coefficients. I also mention a novel
typology of different kinds of "losslessness" and the relation of
DCT-8 to the Hopf fibration of the 7-sphere.
Thomas Richter <
t...@math.tu-berlin.de>:
> That's just the same transform applied twice, no problem. Possibly there is still some
> optimization potential for this, but the naive tensor product might do.
The optimizations are significant. The JPEG book I mentioned before
(JPEG: Still Image and Compression Standard; William B. Pennebaker and
Joan L. Mitchell) describes one method in detail.
There may be enough in our discussion to be suitable for publication,
regardless of the fact that "Integer DCT" is an old idea. The
important issue being raised here is the one in the subject header:
"*Non-Degrading* Lossing". More generally, the novel concept being
raised here is the classification of coding-decoding methods into 4
levels of "lossing" as follows:
(1) bijections ("lossless"), where the encoder E and decoder D satisfy
the identities ED = 1, DE = 1
(2) retractions ("non-degrading lossing"), where one has ED = 1, but
not DE = 1. Then DE becomes a one-time only "lossing" projection. Note
the idempotency property (DE)(DE) = D(ED)E = D1E = DE follows.
(3) generalized inverses ("non-drifting lossing"), where DED = D and
EDE = E. Then both DE and ED are projections; DE is the "lossing"
projection, and ED is the "degrading" projection.
(4) the general case ("drifting lossing"), where no special relation
exists between D and E. A further classification might be obtained by
requiring (DE)^{n+1} D = (DE)^n D and (ED)^{n+1} D = (ED)^n D for some
fixed n = 2, 3, 4, ...
> Thanks for posting a solution here (that is, a scaled DCT, but this is perfect). I afraid the option to re-define the specs isn't available, i.e. the integer solution has to fit to the DCT of the reference implementation.
There's always an option with replacing or changing "standards". Even
you mentioned "-2" in "JPEG-2"; nothing stays the same. More
generally, the entire industry of standards definitions is in a major
upswing over the past 10 years (just look at how many sections of
MPEG-4 there are!), and new standards come out on a regular basis now.
Even in such "established" standards, like MPEG, one has room for
development in the following ways (a) extensions like those seen in
JPEG-2 are allowed for because there always exist marker segments and
packets that are "reserved for future extensions", (b) standards like
MPEG-4 admit other standards, like Dirac (a wavelet based video coding
standard) to be registered for inclusion.
With a real-number based transform like DCT, one *might* be able to
accomplish conditions (2) or (3) with fixed point arithmetic, although
this does not seem to be the goal actually being stated in the
references you mentioned.
With an orthogonal integer transform one has a greater range of
freedom to achieve not just the conditions (2) and (3), but even the
condition (1). This brings us in line with the situation with wavelet
transforms, such as that used in JPEG-2000 and in Dirac.
It is the possibility of achieving an EXACT form of (2) or (3) with
integer arithmetic that would make the idea of new standards not only
possible but even necessary.
The special properties of the 7-sphere S_7 come to mind because of the
so-called "Hopf fibration" on S_7. This consists of a projection of
H1: S_7 -> S_3 and H2: S_3 -> S_1; S_3 being the 3-sphere (a.k.a. the
"hypersphere" or the space of "unit quaternions" or, equivalently, the
underlying space of the Lie group SU(2)) and S_1 the 1-sphere (circle,
or equivalently the underlying space of the Lie group U(1)). The
inverse image H1^{-1}(x) for each point x in S_3 is the "fiber" of x:
a copy of the 4-sphere S_4. The inverse image H2^{-1}(x), for each x
in S_1, is a copy of the 2-sphere S_2. In effect this is a reduction
S_7 -> S_3 x S_4 and S_3 -> S_1 x S_2. Correspondingly, one has the
decomposition of the DCT coefficients {G,F,E,D,C,B,A} -> {G,E,C,A} +
{B,D,F} and {B,D,F} -> {B,F} + {D}.
So, I suspect there's something going on with the special features of
the 7-sphere.
An orthogonal transform in 8-D is just a listing of 8 mutually
orthogonal vectors on the 7-sphere, the 7-sphere being taken as a
sphere of constant radius in 8-dimensions. Given set of 8 integer
vectors mutually orthogonal on a 7-sphere, you can always rotate them
so that one of the vectors is your DC vector {D,D,D,D,D,D,D,D} for
some scaling factor D. That's the only important criterion required in
coming up with an integer orthogonal transform -- that one of the
vectors be the DC vector.
The rotation can be done in such a way that rational numbers remain
rational. For instance, in 2-D a rotation (A,B) -> (cA + sB, cB - sA)
can be done such that cA + sB = cB - sA, or c(B-A) = s(A+B) which has
integer solutions. Since c^2 + s^2 need not be 1, this generally means
the rotation is onto a sphere of different radius (i.e. different
scaling factor).
For orthogonal transforms, the Lie group is O(8) has 28 degrees of
freedom. After rotating one of the vectors into the DC position, that
leaves you with 21 degrees of freedom left over to rotate the other 7
vectors. One should be able to use the extra degrees of freedom to get
the vectors over into the "standard positions" I laid out in my first
article; namely that one of them be in the position {D,-D,-D,D,D,-D,-
D,D}, another in the position {F,B,-B,-F,-F,-B,B,F} where B^2 + F^2 =
2D^2, a third in the position {B,-F,F,-B,-B,F,-F,B}, and the remaining
4 vectors in the positions I described before.
So, there may be no actual loss of generality in considering the
pattern of coefficients I described originally; other than a change in
scaling factors.
>> And thank you for your references -- that was (in fact) the implicit question in the article.
>
> No problem, I also kept looking, but haven't had the time to go into all the math yet.
I started to look at the second reference. They're using good old
fashioned Hungarian combinatorics for part of their paper. In fact,
what they show in the Appendix might already be covered by what's
known as Halmos's Theorem and they might have done nothing more than
re-state and re-prove that result.
I also started to look at the first paper. All I have are the PDF's
(barely readible); I can't access IEEE publications.
Making this consistent with the notation I devised A, ..., G = cos(n
pi/16), n = 7, ..., 1; one can actually write down a rational
approximation:
11707 * 1/2 (A, B, C, D, E, F, G) = (1142, 2240, 3252, 4139, 4867,
5407, 5741) + (-.038800, +.037471, +.030359, +.049544, +.007376, -.
071156, +.026639)
with an L^2 Difference of 0.110059 and a L^infinty Difference of
0.071156.
This is actually the ONLY rationalization with L^infinity under 0.1
whose divisor is 16-bit or less.
They don't actually use that anywhere, but it may be useful as a basis
for comparison.
They use the following inverse DCT diagram, written in equational form
as:
(F0', F2', F4', F6') = (A0 F0, A2 F2, A4 F4, A6 F6)
(F1', F3', F5', F7') = (A1 F1, A3 F3, A5 F5, A7 F7)
(a0, b0) = (F0' + F4', F0' - F4'),
(a1, b1) = (F1' + F7', F1' - F7'),
(a2, b2) = (beta1 (F2' + F6'), alpha1 (F0' - F4')),
(a3, b3) = (F5' + F3', F5' - F3'),
(e0, e1) = (alpha2 (b3 + a1), beta2 (b3 - a1)),
e2 = b2 - a2,
(e3, d1) = (delta a3 + gamma b1, delta b1 - gamma a3),
(d0, d2, d3) = (e3 - e1, e0 - e3, e1 + d1),
(c0, c3) = (a0 + a2, a0 - a2),
(c1, c2) = (b0 + e2, b0 - e2),
(f0, f7) = (c0 + d0, c0 - d0),
(f1, f6) = (c1 + d1, c1 - d1),
(f2, f5) = (c2 + d2, c2 - d2),
(f3, f4) = (c3 + d3, c3 - d3).
The requirements for this to match inverse DCT are that
A0 = D = A7,
A2 (beta1, alpha1) = (F, F+B),
A6 (beta1, alpha1) = (B, F-B),
A1 (delta, beta2, gamma, alpha2) = (E, E-A, G-E+A, A-E+G+C),
A7 (delta, beta2, gamma, alpha2) = (C, G-C, G-C-A, A+E+C-G),
A3 (gamma, beta2, delta, alpha2) = (A, C-A, E-C+A, G+C-E-A),
A5 (gamma, beta2, delta, alpha2) = (G, G+E, G+E+C, G+C+E+A).
In turn, this requires the following identities:
F(F-B) = B(F+B) (= D^2),
E(G-C) = C(E-A), E(A+E) = C(C+G),
A(G+E) = G(C-A), A(A+C) = G(G-E),
GE = GC + CA + AE.
The first of these identities -- as the Pythagoreans first discovered
-- has no integer solution since it is equivalent to (1 + B/F)^2 = 2.
Instead, they're only dealing with integer approximations to this
transform, but not exact integer transforms that satisfy all these
identities.
> Hehe. Actually, this is an arithmetic scale, quite unlike in musical scale which is (approximately) geometric. (-;
In general, the mathematical problem is the same. With music theory
the problem is to "rationalize" (i.e. approximate by rational numbers)
the series
{ 440 Hz * e^{kn}: n = 0, ..., 11 } = { A, A#, B, C, C#, D, D#, E, F,
F#, G, G# } where k = log_e(2)/12,
while for the discrete Fourier transforms, the series
{ e^{kn}: n = 0, ..., 7 } = { 1, G + iA, F + iB, E + iC, D + iD, C +
iE, B + iF, A + iG }, where k = i pi/16
is being rationalized.
As the Pythagoreans first discovered, the problem of finding EXACT
rational solutions is unsolvable, which forces a retreat to the lesser
problem of finding rational approximations.
In both cases, we're dealing with the problem of trying to convert a
logarithmic scale into something that works on a rational grid.
In a similar way, just as there are multiple integer solutions for the
orthogonal transform (like the ones I posted), one has multiple
rationalizations for the music scale
Unity = 1, m2, M2, m3, M3, P4, Tritone, P5, m6, M6, m7, M7, Octave =
2
in common use, such as the "Just" scale and "Pythagorean" scale:
Ratio Logarithmic Just Pythagorean
Unity 1 1 1
m2 2^{1/12} 16/15 256/243
M2 2^{1/6} 9/8 9/8
m3 2^{1/4} 6/5 32/27
M3 2^{1/3} 5/4 81/64
P4 2^{5/12} 4/3 4/3
Tritone 2^{1/2} 64/45 729/512
P5 2^{7/12} 3/2 3/2
m6 2^{2/3} 8/5 128/81
M6 2^{3/4} 5/3 27/16
m7 2^{5/6} 7/4 16/9
M7 2^{11/12} 15/8 243/128
Octave 2 2 2