This leads to the question of whether something similar can actually
be done with DCT-based coding, after all; and (if so) what magic
numbers accomplish the trick.
A lossless approximation to 1-dimensional DCT-baaed coding should be
achievable by using this transformation matrix for the forward
transform:
17 17 17 17 17 17 17 17
24 20 12 6 -6 -12 -20 -24
23 7 -7 -23 -23 -7 7 23
20 -6 -24 -12 12 24 6 -20
17 -17 -17 17 17 -17 -17 17
12 -24 6 20 -20 -6 24 -12
7 -23 23 -7 -7 23 -23 7
6 -12 20 -24 24 -20 12 -6
and its transpose
17 24 23 20 17 12 7 6
17 20 7 -6 -17 -24 -23 -12
17 12 -7 -24 -17 6 23 20
17 6 -23 -12 17 20 -7 -24
17 -6 -23 12 17 -20 -7 24
17 -12 -7 24 -17 -6 23 -20
17 -20 7 6 -17 24 -23 12
17 -24 23 -20 17 -12 7 -6
for the inverse transform, in place of the forward and inverse DCT.
This should also be sufficient to ALSO provide the basis for a *non-
degrading* lossing transform; i.e., a transform where
Dequantization -> Inverse Transform -> Forward Transform ->
Quantization
yields the identity operator.
Both transforms are amenable to factoring. For instance, the forward
transform:
A0 = 17 a0 + 17 a1 + 17 a2 + 17 a3 + 17 a4 + 17 a5 + 17 a6 + 17 a7
A1 = 24 a0 + 20 a1 + 12 a2 + 6 a3 - 6 a4 - 12 a5 - 20 a6 - 24 a7
A2 = 23 a0 + 7 a1 - 7 a2 - 23 a3 - 23 a4 - 7 a5 + 7 a6 + 23 a7
A3 = 20 a0 - 6 a1 - 24 a2 - 12 a3 + 12 a4 + 24 a5 + 6 a6 - 20 a7
A4 = 17 a0 - 17 a1 - 17 a2 + 17 a3 + 17 a4 - 17 a5 - 17 a6 + 17 a7
A5 = 12 a0 - 24 a1 + 6 a2 + 20 a3 - 20 a4 - 6 a5 + 24 a6 - 12 a7
A6 = 7 a0 - 23 a1 + 23 a2 - 7 a3 - 7 a4 + 23 a5 - 23 a6 + 7 a7
A7 = 6 a0 - 12 a1 + 20 a2 - 24 a3 + 24 a4 - 20 a5 + 12 a6 - 6 a7
can be rewritten as
A0 = 17 (s0734 + s1625), A4 = 17 (s0734 - s1625)
A2 = 23 d0734 + 7 d1625, A6 = 7 d0734 - 23 d1625
A1 = 6 (4 d07 + d34) + 4 (5 d16 + 3 d25)
A3 = 4 (5 d07 - 3 d34) - 6 (d16 + 4 d25)
A5 = 4 (3 d07 + 5 d34) - 6 (4 d16 - d25)
A7 = 6 (d07 - 4 d34) - 4 (3 d16 - 5 d25)
where
s07 = a0 + a7, s16 = a1 + a6, s25 = a2 + a5, s34 = a3 + a4,
d07 = a0 - a7, d16 = a1 - a6, d25 = a2 - a5, d34 = a3 - a4,
s0734 = s07 + s34, s1625 = s16 + s25,
d0734 = s07 - s34, d1625 = s16 - s25.
Other more efficients methods may be devised to better handle A1, A3,
A5 and A7.
The matrices are both orthogonal (up to an overall factor); their
product is 2312 times the identity matrix. Their orthogonality (up to
rescaling) is easily verified, since the requirement reduces to the
conditions:
4*17^2 = 2*(23^2 + 7^2) = 24^2 + 20^2 + 12^2 + 6^2 = 2312/2
24*20 = 24*12 + 12*6 + 6*20
which are all integer identities. The matrix approximates 24 times the
DCT matrix, or 48 times the normalized DCT matrix.
It is the ONLY relatively prime integer solution that lies entirely in
the range -100 to 100 whose coefficients have the same pattern and
ordering relation as those of the DCT matrix. The only other
relatively prime integer solutions in the range -500 to 500 satisfying
the ordering property are the solutions with the following two sets of
coefficients
(116, 113, 96, 85, 78, 41, 12, -12, -41, -78, -85, -96, -113, -116)
(120, 113, 100, 85, 60, 41, 30, -30, -41, -60, -85, -100, -113, -120)
in place of
(24, 23, 20, 17, 12, 7, 6, -6, -7, -12, -17, -20, -23, -24)
both solutions having divisors apprxomately equal to 240 + 5/12
relative to the normalized DCT matrix.
There are only 5 other relatively prime solutions in the range -1000
to 1000 satisfying the ordering property. They are the ones whose
coefficients are drawn from the respective sets:
(618, 575, 428, 425, 396, 175, 24, -24, -175, -396, -425, -428, -575,
-618)
(660, 607, 606, 493, 384, 343, 148, -148, -343, -384, -493, -606,
-607, -660)
(696, 607, 580, 493, 348, 343, 174, -174, -343, -348, -493, -580,
-607, -696)
(816, 623, 600, 577, 552, 527, 34, -34, -527, -552, -577, -600, -623,
-816)
(912, 721, 660, 629, 556, 521, 78, -78, -521, -556, -629, -660, -721,
-912)
Their divisors are approximately 1202, 1394 + 5/12, 1394 + 5/12, 1632
and 1779 relative to the normalized DCT matrix.
Denoting the DCT coefficients in order by
1 > A > B > C > D > E > F > G > 0 > g > f > e > d > c > b > a > -1
the DCT matrix can be written compactly in the form
D D D D D D D D
A C E G g e c a
B F f b b f F B
C g a e E A G c
D d d D D d d D
E g A C c a G e
F b f B B f b F
G e C a A c E g
with the relations
(a, b, c, d, e, f, g) = (-A, -B, -C, -D, -E, -F, -G)
4 D^2 = 2 (B^2 + F^2) = A^2 + C^2 + E^2 + G^2 = 2
AC = AE + EG + GC
The normalized DCT matrix is 1/2 this matrix.
Let T, T' denote the integer matrix and its transpose; and let t, t'
denote the DCT matrix and its transpose. Then one has the properties:
T T' = 8 D^2 I = T' T.
t t' = 4 I = t' t.
T ~ 24 t and T' ~ 24 t' for the first integer solution listed.
The normalized DCT matrix is t/2 and its inverse is t'/2.
For the 2-dimensional DCT, the transformation matrix is the tensor
product
t (x) t
(using (x) to denote the tensor product symbol). For the integer
transform, one can replace this by any of the following tensor
products:
T (x) T, T' (x) T'
or
T (x) T', T' (x) T.
The latter two give a closer approximation to t (x) t, but lose the
vertical <-> horizontal symmetry.
Thanks for the work; please note that this is not the first
implementation of a lossless DCT approximation. In the standardization
of H.264, another approximation had been proposed, though for a 4x4
transform. I'm not clear about the 8x8 approximation used there, but it
seems to be described here:
Reznik, Y.A.; Hinds, A.T.; Rijavec, N.;
Low Complexity Fixed-Point Approximation of Inverse Discrete Cosine
Transform
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE
International Conference on
As I know Yuri (we met several times over the years), it's probably a
very careful analysis of all the possibilities how to do a DCT in
integer, so it is probably worth reading - though I unortunately I
haven't had the time to go through it.
Anyhow, back to your result:
What is of course interesting is how close this transformation
approximates a DCT; that is, given we forward transform with a DCT
approximation, then divide by the common scaling factor to get floating
point values, and then backwards transform with a regular floating point
implementation of a DCT, how much error does it pick up?
Especially, it would be interesting if that would satisfy the error
bounds of JPEG-2.
What would also be interesting is the complexity of the algorithm, that
is, how many multiplications and additions does it take? You usually
want to avoid a full 8x8 matrix multiplication, especially for hardware
implementations, so this figure is pretty important.
So sorry for not providing an immediate answer, but you probably got a
couple of interesting problems to work on, and to investigate the
problem further. As a suggestion, look into the above paper.
Greetings,
Thomas
Not the small integer solutions. They require *redefining* the
standard (or coming up with a better one) that replaces the ideal DCT
by the integer approximation to it.
But some of the large integer solutions may approximate JPEG (and even
JPEG-2) well enough to satisfy the requirements listed in the
compliance sections of the respective standards. The drawback is that
this will require 64 bit arithmetic or even higher precision.
> What would also be interesting is the complexity of the algorithm, that
> is, how many multiplications and additions does it take?
You're about to be in for a surprise. The complexity is the same as
algorithms published in the literature, because several of the key
trig properties also hold for the integer solutions. Those details are
listed below -- along with the algorithms (!)
And thank you for your references -- that was (in fact) the implicit
question in the article.
The letters A-G are meant to draw an analogy with Music Theory. The
central issue, there, is to find integer or ratio approximations of
the n/12 powers of 2 for n = 0, 1, 2, ..., 12; the notes A-G are
certain cardinal points on this scale. It also simplifies the
presentation to use the letter notation (the same numbers, in fact,
can also be used with the 32 element DFT).
Here, the analogy is trying to "rationalize" the ideal DCT into
something that works with integers, and as you were probably referring
to, the solutions fit fairly well to the ideal; especially the one
with G = 116.
It turns out that there are VERY interesting algorithms that make
everything work, both in the forward and reverse direction. The
numeric solutions were derived from a parametrization of the solutions
to the respective (quadratic) Diophantine equations. Those parameters
also play a key role in the algorithms and are indirectly responsible
for making the trig identities work in the integer domain (up to
rescaling).
I'll start by listing two of the real-number versions of the
algorithms for the ideal DCT and then show how they are each rendered
in integer form. The integer version is actually a bit cleaner, it
turns out, and can also be adapted to work with the real number
version.
In all cases, the transformation is listed with (f0, ..., f7) as
inputs, (F0, ..., F7) as outputs. These are just the 1-D algorithms.
If you want to mess around with tensor products, you may be able to
craft optimized forms of the 2-D algorithms from these. In that case,
the small numbers solutions are a practical necessity, since the
scaling factor goes up quadratically with 2 dimensions.
FORMULAE 1:
E. Feig and E. Linzer. Discrete Cosine Transform Algorithms for Image
Data Compression. _Proceedings Electronic Imaging '90 East_, pp.
84-87. Boston, MA (October 29 - November 1, 1990).
The data flow, written in equational form, is given by:
(a0, b0) = (f0 + f7, f0 - f7), (a1, b1) = (f1 + f6, f1 - f6),
(a2, b2) = (f2 + f5, f2 - f5), (a3, b3) = (f3 + f4, f3 - f4),
(c0, d0) = (a0 + a3, a0 - a3), (c1, d1) = (a1 + a2, a1 - a2)
2 (F0, F4) = D (c0 + c1, c0 - c1)
2 (F2, F6) = (F d0 + B d1, B d0 - F d1)
2 (F1, F7) = (G b0 + E b1 + C b2 + A b3, A b0 - C b1 + E b2 - G b3)
2 (F5, F3) = (C b0 - G b1 + A b2 + E b3, E b0 - A b1 - G b2 - C b3)
There is a scaling factor of 2 in the final results, if (A,...,G) are
defined as the cosines, respectively of 7/8 pi, ..., 1/8 pi.
For the inverse transform, write down the inverses of the respective
formulae. In detail, this can be done keeping the scaling factors
intact, with the following:
B0 = G F1 + E F3 + C F5 + A F7
B1 = E F1 - A F3 - G F5 - C F7
B2 = C F1 - G F3 + A F5 + E F7
B3 = A F1 - C F3 + E F5 - G F7
The ratios Bn/bn = (G^2 + E^2 + C^2 + A^2)/2 = 2 D^2, for n = 0, 1, 2,
3.
(C0, C1) = D (F0 + F4, F0 - F4)
(D0, D1) = (F F2 + B F6, B F2 - F F6)
Here, the ratios Dn/dn = (B^2 + F^2)/2 = D^2 and Cn/cn = D^2 for n =
0, 1.
(A0, A3) = (C0 + D0, C0 - D0), (A1, A2) = (C1 + D1, C1 - D1).
The ratios An/an = 2D^2 for n = 0, 1, 2, 3.
Finally,
4D^2 (f0, f7) = (A0 + B0, A0 - B0),
4D^2 (f1, f6) = (A1 + B1, A1 - B1),
4D^2 (f2, f5) = (A2 + B2, A2 - B2),
4D^2 (f3, f4) = (A3 + B3, A3 - B3).
The scaling factor 4D^2 for the inverse transform complements the
scaling factor of 2 for the forward transform, the two multiplying out
to 8D^2.
The integer coefficients (A,B,...,G) are meant to approximate 1/2 the
DCT coefficients (A,B,...,G), so 8D^2 corresponds to 1.
FORMULAE 2:
p. 43 [A. Ligtenberg, M. Vetterli, A Discrete Fourier-cosine transform
chip. _IEEE J. on Selected Areas in Commun._, SAC-449-61 (January
1986)]
Here, the data flow is given by:
(a0, b0) = (f0 + f7, f0 - f7), (a4, b4) = (f1 + f2, f1 - f2),
(a3, b3) = (f3 + f4, f3 - f4), (a5, b5) = (f5 + f6, f5 - f6),
(c0, d0) = (a0 + a3, a0 - a3),
(c1, Dd4) = (a4 + a5, D (a4 - a5)), (Dc4, d1) = (D (b4 + b5), b4 -
b5),
(y2, y3) = (b3 + Dc4, b3 - Dc4), (y4, y5) = (b0 + Dd4, b0 - Dd4),
2 (F0, F4) = D (c0 + c1, c0 - c1), 2 (F2, F6) = rot(B, F, d1, d0),
2 (F1, F7) = rot(A, G, y4, y2), 2 (F5, F3) = rot(E, C, y3, y5).
The "rotation" function is given by rot(C, S, x, y) = (Cx + Sy, Cy -
Sx). It can be optimized to:
rot(C, S, x, y) = (C(x + y) + (S - C)y, C(x + y) - (S + C)x).
This is not actually a "rotation" since no requirement is being
imposed that C^2 + S^2 = 1 here. But the function is useful, anyhow,
as is its optimization. Apparently, the cited reference unnecessarily
assumes C^2 + S^2 = 1, we won't, but will still keep the name "rot".
To derive the results above, the following identities are used:
D(G + A) = E, D(G - A) = C, D(E + C) = G, D(E - C) = A,
a4 + a5 = a1 + a2, a4 - a5 = b1 + b2,
b4 + b5 = b1 - b2, b4 - b5 = a1 - a2.
Formulae 1 and its inverse are both readily adapted to the integer
transform, as is. The only requirements are the Diophantine equations
4D^2 = 2(B^2 + F^2) = A^2 + C^2 + E^2 + G^2
GE = GC + CA + AE.
Formulae 2 can also be adapted to the integer transform, provided we
have the properties
E / (G + A) = C / (G - A), G / (E + C) = A / (E - C)
which both come out of the Diophantine equation GE = GC + CA + AE.
In particular, to establish these results, we make use of the
following decomposition
wxae = A = yz(c-d)h, yzdg = C = wx(b-a)f,
yzcg = E = wx(b+a)f, wxbe = G = yz(c+d)h.
and assume the following properties:
E = f/e (G + A), C = f/e (G - A),
G = h/g (E + C), A = h/g (E - C),
eg = 2fh,
wx (e^2 + 2 f^2) = 2L = yz (g^2 + 2 h^2),
xz (eh + fg) = L,
yz (c^2 + d^2) = 2 K^2 L = wx (a^2 + b^2),
D = KL.
Notice, by the way, that the decompositions of B and F don't play any
role here. The only requirement imposed on (B, F) is that B^2 + F^2 =
2 D^2.
For the solutions listed, this can be satisfied with the following
assignments:
( G, E, C, A) -> ( a, b, c, d, e, f, g, h, w, x,
y, z, K, L)
( 24, 20, 12, 6) -> ( 1, 4, 5, 3, 3, 2, 4, 3, 2, 1,
1, 1, 1, 17)
(116, 96, 78, 12) -> ( 3, 29, 16, 13, 4, 3, 3, 2, 1, 1,
2, 1, 5, 17)
(120, 100, 60, 30) -> ( 1, 4, 5, 3, 3, 2, 4, 3, 2z, x,
x, z, 1, 85), xz = 5
(618, 428, 396, 24) -> ( 4, 103, 107, 99, 3, 2, 4, 3, 2, 1,
1, 1, 25, 17)
(660, 606, 384, 148) -> (37, 165, 101, 64, 4, 3, 3, 2, 1, 1,
2, 1, 29, 17)
(696, 580, 348, 174) -> ( 1, 4, 5, 3, 3, 2, 4, 3, 2z, x,
x, z, 1, 493), xz = 29
(816, 600, 552, 34) -> ( 1, 24, 25, 23, 17, 12, 24, 17, 2, 1,
1, 1, 1, 577)
(912, 660, 556, 78) -> (13, 152, 165, 139, 3, 2, 4, 3, 2, 1,
1, 1, 37, 17)
The decompositions where x and z are indefinite work, independently of
how the factors (x,z) are chosen. For xz = 5, one can have either
(x,z) = (5,1) or (x,z) = (1,5). For xz = 29, similarly, one can have
either (x,z) = (29,1) or (x,z) = (1,29).
The transform for Formulae 2 is then given by:
(a0, b0) = (f0 + f7, f0 - f7), (a4, b4) = (f1 + f2, f1 - f2),
(a3, b3) = (f3 + f4, f3 - f4), (a5, b5) = (f5 + f6, f5 - f6),
(c0, d0) = (a0 + a3, a0 - a3),
(c1, d4) = (a4 + a5, a4 - a5), (c4, d1) = (b4 + b5, b4 - b5),
(x2, x3) = (x(e b3 + f c4), z(g b3 - h c4)),
(x4, x5) = (x(e b0 + f d4), z(g b0 - h d4)),
(F0, F4) = D (c0 + c1, c0 - c1), (F2, F6) = rot(B, F, d1, d0),
(F1, F7) = w rot(b, a, x4, x2), (F5, F3) = y rot(c, d, x3, x5),
and the inverse transform is then given by:
(C0, C1) = D (F0 + F4, F0 - F4), (D1, D0) = rot(B, F, F6, F2),
(X3, X5) = z rot(c, d, F3, F5), (X4, X2) = x rot(b, a, F7, F1),
(B3, C4) = (2 (zh X2 + xf X3), zg X2 - xe X3),
(B0, D4) = (2 (zh X4 + xf X5), zg X4 - xe X5),
(B4, B5) = (C4 + D1, C4 - D1), (A4, A5) = (C1 + D4, C1 - D4),
(A0, A3) = (C0 + D0, C0 - D0),
8 D^2 (f0, f7) = (A0 + B0, A0 - B0), 8 D^2 (f1, f2) = (A4 + B4, A4 -
B4),
8 D^2 (f3, f4) = (A3 + B3, A3 - B3), 8 D^2 (f5, f6) = (A5 + B5, A5 -
B5).
For the case (G, F, E, D, C, B, A) = (24, 23, 20, 17, 12, 7, 6), the
transform and inverse transform thus reduce respectively to:
(a0, b0) = (f0 + f7, f0 - f7), (a4, b4) = (f1 + f2, f1 - f2),
(a3, b3) = (f3 + f4, f3 - f4), (a5, b5) = (f5 + f6, f5 - f6),
(c0, d0) = (a0 + a3, a0 - a3),
(c1, d4) = (a4 + a5, a4 - a5), (c4, d1) = (b4 + b5, b4 - b5),
(x2, x3) = (3 b3 + 2 c4, 4 b3 - 3 c4),
(x4, x5) = (3 b0 + 2 d4, 4 b0 - 3 d4),
(F0, F4) = 17 (c0 + c1, c0 - c1), (F2, F6) = rot(7, 23, d1, d0),
(F1, F7) = 2 rot(1, 4, x4, x2), (F5, F3) = rot(5, 3, x3, x5),
and
(C0, C1) = 17 (F0 + F4, F0 - F4), (D1, D0) = rot(7, 23, F6, F2),
(X3, X5) = rot(5, 3, F3, F5), (X4, X2) = rot(4, 1, F7, F1),
(B3, C4) = (6 X2 + 4 X3, 4 X2 - 3 X3),
(B0, D4) = (6 X4 + 4 X5, 4 X4 - 3 X5),
(B4, B5) = (C4 + D1, C4 - D1), (A4, A5) = (C1 + D4, C1 - D4),
(A0, A3) = (C0 + D0, C0 - D0),
2312 (f0, f7) = (A0 + B0, A0 - B0), 2312 (f1, f2) = (A4 + B4, A4 -
B4),
2312 (f3, f4) = (A3 + B3, A3 - B3), 2312 (f5, f6) = (A5 + B5, A5 -
B5).
For the next smallest solution (G, F, E, D, C, B, A) = (116, 113, 96,
85, 78, 41, 12), the transform and inverse transform respectively
reduce to the following:
(a0, b0) = (f0 + f7, f0 - f7), (a4, b4) = (f1 + f2, f1 - f2),
(a3, b3) = (f3 + f4, f3 - f4), (a5, b5) = (f5 + f6, f5 - f6),
(c0, d0) = (a0 + a3, a0 - a3),
(c1, d4) = (a4 + a5, a4 - a5), (c4, d1) = (b4 + b5, b4 - b5),
(x2, x3) = (4 b3 + 3 c4, 3 b3 - 2 c4),
(x4, x5) = (4 b0 + 3 d4, 3 b0 - 2 d4),
(F0, F4) = 85 (c0 + c1, c0 - c1), (F2, F6) = rot(41, 113, d1, d0),
(F1, F7) = rot(29, 3, x4, x2), (F5, F3) = 2 rot(16, 13, x3, x5),
and
(C0, C1) = 85 (F0 + F4, F0 - F4), (D1, D0) = rot(41, 113, F6, F2),
(X3, X5) = rot(16, 13, F3, F5), (X4, X2) = rot(29, 3, F7, F1),
(B3, C4) = (4 X2 + 6 X3, 3 X2 - 4 X3),
(B0, D4) = (4 X4 + 6 X5, 3 X4 - 4 X5),
(B4, B5) = (C4 + D1, C4 - D1), (A4, A5) = (C1 + D4, C1 - D4),
(A0, A3) = (C0 + D0, C0 - D0),
57800 (f0, f7) = (A0 + B0, A0 - B0), 57800 (f1, f2) = (A4 + B4, A4 -
B4),
57800 (f3, f4) = (A3 + B3, A3 - B3), 57800 (f5, f6) = (A5 + B5, A5 -
B5).
The divisor 57800 = 25*2312 is 25 times larger than that for the
smallest solution.
I ended up changing the ordering of the letters in the second article
so that G > F > E > D > C > B > A. So, everything's backwards -- the
first article's (A,B,C,D,E,F,G) is equal to the second article's
(G,F,E,D,C,B,A). It just seems more natural to have G larger than A.
There is a third set of formulae that come out of:
A. Ligtenberg, R. H. Wright, and J. H. O' Neill. A VLSI Orthogonal
Transform Chip for Real-Time Image Compression. _Visual Communication
& Image Process._ II (October 1987)
who develop the general solution for coefficients matching certain
dataflow diagrams. In addition, the FORMULAE 2 can be readily adapted
to 2-D form; both are described in Chapter 4 of
JPEG: Sill Image and Compression Standard
William B. Pennebaker and Joan L. Mitchell
Van Nostrand Reinhold, New York
TA 1632.P45 1993
but I haven't gotten around to converting either of these to integer
form (nor their inverses).
Thanks for posting a solution here (that is, a scaled DCT, but this is
perfect). I afraid the option to re-define the specs isn't available,
i.e. the integer solution has to fit to the DCT of the reference
implementation.
> But some of the large integer solutions may approximate JPEG (and even
> JPEG-2) well enough to satisfy the requirements listed in the
> compliance sections of the respective standards. The drawback is that
> this will require 64 bit arithmetic or even higher precision.
That's probably the price, yes. It is interesting how compliance is
defined there, though, now that I went through JPEG-2. Apparently,
compliance is defined in terms of the encoder (which is not what
happened in later standards), and there with respect to a reference
implementation, and with respect to the *quantized* coefficients which
are allowed to have a maximum error of +/-1. That means that the maximum
allowable error is actually frequency dependent due to the way how the
quantization matrices used for testing are defined.
Anyhow, this procedure allows at least to get an l^infinity bound, which
again means that there is an l^1-bound on the rows of the matrix with
respect to the reference, so it's possible to proof whether such a
matrix is compliant or not.
>> What would also be interesting is the complexity of the algorithm, that
>> is, how many multiplications and additions does it take?
>
> You're about to be in for a surprise. The complexity is the same as
> algorithms published in the literature, because several of the key
> trig properties also hold for the integer solutions. Those details are
> listed below -- along with the algorithms (!)
Well, I'm not so much surprised, actually. (-; I would rather say that
you give an upper bound for the complexity since the presented
implementation (as nice as it is with its integer coefficients) uses the
standard algorithm; one can get away with lower complexity, here again
one from Yurij:
http://www.reznik.org/papers/SPIE07_MPEG-C_IDCT.pdf
Of course, what is quoted above is only a fixpoint implementation (or a
couple of them), and they are not precisely invertible, exactly due to
the three rotations within them that are not exactly invertible.
One can certainly improve the algorithm a bit by replacing some
multiplications with shifts and additions, though then potentially
loosing the point-symmetry of the overall transform. (Interestingly, IEC
23002-1 and Yurij call this "Linearity". What's meant is only "linear"
with respect to multiplication with -1, not the usual one).
> And thank you for your references -- that was (in fact) the implicit
> question in the article.
No problem, I also kept looking, but haven't had the time to go into all
the math yet.
> The letters A-G are meant to draw an analogy with Music Theory. The
> central issue, there, is to find integer or ratio approximations of
> the n/12 powers of 2 for n = 0, 1, 2, ..., 12; the notes A-G are
> certain cardinal points on this scale. It also simplifies the
> presentation to use the letter notation (the same numbers, in fact,
> can also be used with the 32 element DFT).
Hehe. Actually, this is an arithmetic scale, quite unlike in musical
scale which is (approximately) geometric. (-;
> Here, the analogy is trying to "rationalize" the ideal DCT into
> something that works with integers, and as you were probably referring
> to, the solutions fit fairly well to the ideal; especially the one
> with G = 116.
I'll try.
> It turns out that there are VERY interesting algorithms that make
> everything work, both in the forward and reverse direction. The
> numeric solutions were derived from a parametrization of the solutions
> to the respective (quadratic) Diophantine equations. Those parameters
> also play a key role in the algorithms and are indirectly responsible
> for making the trig identities work in the integer domain (up to
> rescaling).
>
> I'll start by listing two of the real-number versions of the
> algorithms for the ideal DCT and then show how they are each rendered
> in integer form. The integer version is actually a bit cleaner, it
> turns out, and can also be adapted to work with the real number
> version.
>
> In all cases, the transformation is listed with (f0, ..., f7) as
> inputs, (F0, ..., F7) as outputs. These are just the 1-D algorithms.
> If you want to mess around with tensor products, you may be able to
> craft optimized forms of the 2-D algorithms from these.
That's just the same transform applied twice, no problem. Possibly there
is still some optimization potential for this, but the naive tensor
product might do.
> In that case,
> the small numbers solutions are a practical necessity, since the
> scaling factor goes up quadratically with 2 dimensions.
Yup.
> The "rotation" function is given by rot(C, S, x, y) = (Cx + Sy, Cy -
> Sx). It can be optimized to:
> rot(C, S, x, y) = (C(x + y) + (S - C)y, C(x + y) - (S + C)x).
Yup, that's the crucial part. Everything else is lossless.
> Formulae 1 and its inverse are both readily adapted to the integer
> transform, as is. The only requirements are the Diophantine equations
> 4D^2 = 2(B^2 + F^2) = A^2 + C^2 + E^2 + G^2
> GE = GC + CA + AE.
>
> Formulae 2 can also be adapted to the integer transform, provided we
> have the properties
> E / (G + A) = C / (G - A), G / (E + C) = A / (E - C)
> which both come out of the Diophantine equation GE = GC + CA + AE.
>
> In particular, to establish these results, we make use of the
> following decomposition
> wxae = A = yz(c-d)h, yzdg = C = wx(b-a)f,
> yzcg = E = wx(b+a)f, wxbe = G = yz(c+d)h.
> and assume the following properties:
> E = f/e (G + A), C = f/e (G - A),
> G = h/g (E + C), A = h/g (E - C),
> eg = 2fh,
> wx (e^2 + 2 f^2) = 2L = yz (g^2 + 2 h^2),
> xz (eh + fg) = L,
> yz (c^2 + d^2) = 2 K^2 L = wx (a^2 + b^2),
> D = KL.
> Notice, by the way, that the decompositions of B and F don't play any
> role here. The only requirement imposed on (B, F) is that B^2 + F^2 =
> 2 D^2.
/* snip */
Nice! I guess I play a little with that.
> The divisor 57800 = 25*2312 is 25 times larger than that for the
> smallest solution.
Unfortunately already 16 bits "extra". That means, however, that for a
totally lossless implementation, these "extra bits" also need to be
stored away, which then make up more data than the actual bits "after
quantization", so to say. The approach is nice, but probably not ideal
from a data-compression perspective. A very minimalistic approach would
be to accept a loss in the DCT and store a residual separately. Not
nearly as elegant as your solution, but one might get away with shorter
streams.
Thanks & Greetings,
Thomas
There is always an important rule when doing stuff like this: if it's
obvious enough to think of, then there's 7 billion people on the
planet, which means probably about 1000 others are thinking of it at
the same time. Maybe 1 million who do DSP and EE, and 1 million who
work with number theory, take 1/1000 cross-section and you get around
1000 who do both.
A patent search (and IEEE journal search on "Integer DCT") takes the
winds out of the sails of all these developments, big time. This was
going on in the early Millennial decade [1]. Nowadays, the IEEE
Journals are filled with 3-D modelling papers and the like. Probably
something driven by MPEG-4 and MPEG-7 (and to a lesser degree
JPEG-2000).
I didn't put too much stock in research integer DCT, because of the
advent of integer wavelet transforms. But it's a good problem to look
at ... after the fact.
Technically, DCT is not the correct name for any of the transforms I
described, but more like "Scaled Integer Orthogonal Transformation
that is close to the DCT". So, we're looking for the intersection of
the group O(8) x [0,infinity) on the integer lattice. This is not too
far removed from the problem of finding the Platonic solids. I was
initially thinking that one might even resort to exploiting the
special properties of the 7-sphere and make use of Octonions. This
actually seems to be the case, but I haven't yet looked into it in any
detail and probably won't in the near future.
(Complex numbers are 2-D numbers, Quaternions 4-D, and Octonions 8-D.
After that there's nothing else for division rings; so 8x8 integer
transforms are probably the largest that admit elegant algebraic
treatments).
(snip)
> A patent search (and IEEE journal search on "Integer DCT") takes the
> winds out of the sails of all these developments, big time. This was
> going on in the early Millennial decade [1]. Nowadays, the IEEE
> Journals are filled with 3-D modelling papers and the like. Probably
> something driven by MPEG-4 and MPEG-7 (and to a lesser degree
> JPEG-2000).
If you want an interesting case, search for "Simplified
Integer Cosine Transform."
It was developed for the case where the forward transform had to
be done on a slow processor without a multiplier. (Specifically,
the RCA CDP1802, in 1994.)
The inverse transform could be done on a very fast processor.
This ICT optimized for the minimum number if '1' bits in the
coefficients to get the appropriate compression.
-- glen
I forgot to add in the following:
Note:
[1] The name belatedly coined (by me) for the otherwise as-of-yet
unnamed decade which ended in 2010.
(http://www.docstoc.com/docs/55568021/Space-Magazine, p. 11
"The End of the Millennial Decade. What's in Store for the 2010's")