Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Lossless & Non-Degrading Lossing DCT-Based Coding

34 views
Skip to first unread message

Rock Brentwood

unread,
Jun 3, 2011, 5:35:43 PM6/3/11
to
One of the main novelties of the (relatively) new JPEG 2000 standard,
in contrast to JPEG, is that it uses both a lossing and lossless
variety of the wavelet transform, the latter relying on certain magic
number properties.

This leads to the question of whether something similar can actually
be done with DCT-based coding, after all; and (if so) what magic
numbers accomplish the trick.

A lossless approximation to 1-dimensional DCT-baaed coding should be
achievable by using this transformation matrix for the forward
transform:

17 17 17 17 17 17 17 17
24 20 12 6 -6 -12 -20 -24
23 7 -7 -23 -23 -7 7 23
20 -6 -24 -12 12 24 6 -20
17 -17 -17 17 17 -17 -17 17
12 -24 6 20 -20 -6 24 -12
7 -23 23 -7 -7 23 -23 7
6 -12 20 -24 24 -20 12 -6

and its transpose

17 24 23 20 17 12 7 6
17 20 7 -6 -17 -24 -23 -12
17 12 -7 -24 -17 6 23 20
17 6 -23 -12 17 20 -7 -24
17 -6 -23 12 17 -20 -7 24
17 -12 -7 24 -17 -6 23 -20
17 -20 7 6 -17 24 -23 12
17 -24 23 -20 17 -12 7 -6

for the inverse transform, in place of the forward and inverse DCT.
This should also be sufficient to ALSO provide the basis for a *non-
degrading* lossing transform; i.e., a transform where
Dequantization -> Inverse Transform -> Forward Transform ->
Quantization
yields the identity operator.

Both transforms are amenable to factoring. For instance, the forward
transform:
A0 = 17 a0 + 17 a1 + 17 a2 + 17 a3 + 17 a4 + 17 a5 + 17 a6 + 17 a7
A1 = 24 a0 + 20 a1 + 12 a2 + 6 a3 - 6 a4 - 12 a5 - 20 a6 - 24 a7
A2 = 23 a0 + 7 a1 - 7 a2 - 23 a3 - 23 a4 - 7 a5 + 7 a6 + 23 a7
A3 = 20 a0 - 6 a1 - 24 a2 - 12 a3 + 12 a4 + 24 a5 + 6 a6 - 20 a7
A4 = 17 a0 - 17 a1 - 17 a2 + 17 a3 + 17 a4 - 17 a5 - 17 a6 + 17 a7
A5 = 12 a0 - 24 a1 + 6 a2 + 20 a3 - 20 a4 - 6 a5 + 24 a6 - 12 a7
A6 = 7 a0 - 23 a1 + 23 a2 - 7 a3 - 7 a4 + 23 a5 - 23 a6 + 7 a7
A7 = 6 a0 - 12 a1 + 20 a2 - 24 a3 + 24 a4 - 20 a5 + 12 a6 - 6 a7
can be rewritten as
A0 = 17 (s0734 + s1625), A4 = 17 (s0734 - s1625)
A2 = 23 d0734 + 7 d1625, A6 = 7 d0734 - 23 d1625
A1 = 6 (4 d07 + d34) + 4 (5 d16 + 3 d25)
A3 = 4 (5 d07 - 3 d34) - 6 (d16 + 4 d25)
A5 = 4 (3 d07 + 5 d34) - 6 (4 d16 - d25)
A7 = 6 (d07 - 4 d34) - 4 (3 d16 - 5 d25)
where
s07 = a0 + a7, s16 = a1 + a6, s25 = a2 + a5, s34 = a3 + a4,
d07 = a0 - a7, d16 = a1 - a6, d25 = a2 - a5, d34 = a3 - a4,
s0734 = s07 + s34, s1625 = s16 + s25,
d0734 = s07 - s34, d1625 = s16 - s25.
Other more efficients methods may be devised to better handle A1, A3,
A5 and A7.

The matrices are both orthogonal (up to an overall factor); their
product is 2312 times the identity matrix. Their orthogonality (up to
rescaling) is easily verified, since the requirement reduces to the
conditions:
4*17^2 = 2*(23^2 + 7^2) = 24^2 + 20^2 + 12^2 + 6^2 = 2312/2
24*20 = 24*12 + 12*6 + 6*20
which are all integer identities. The matrix approximates 24 times the
DCT matrix, or 48 times the normalized DCT matrix.

It is the ONLY relatively prime integer solution that lies entirely in
the range -100 to 100 whose coefficients have the same pattern and
ordering relation as those of the DCT matrix. The only other
relatively prime integer solutions in the range -500 to 500 satisfying
the ordering property are the solutions with the following two sets of
coefficients
(116, 113, 96, 85, 78, 41, 12, -12, -41, -78, -85, -96, -113, -116)
(120, 113, 100, 85, 60, 41, 30, -30, -41, -60, -85, -100, -113, -120)
in place of
(24, 23, 20, 17, 12, 7, 6, -6, -7, -12, -17, -20, -23, -24)
both solutions having divisors apprxomately equal to 240 + 5/12
relative to the normalized DCT matrix.

There are only 5 other relatively prime solutions in the range -1000
to 1000 satisfying the ordering property. They are the ones whose
coefficients are drawn from the respective sets:
(618, 575, 428, 425, 396, 175, 24, -24, -175, -396, -425, -428, -575,
-618)
(660, 607, 606, 493, 384, 343, 148, -148, -343, -384, -493, -606,
-607, -660)
(696, 607, 580, 493, 348, 343, 174, -174, -343, -348, -493, -580,
-607, -696)
(816, 623, 600, 577, 552, 527, 34, -34, -527, -552, -577, -600, -623,
-816)
(912, 721, 660, 629, 556, 521, 78, -78, -521, -556, -629, -660, -721,
-912)
Their divisors are approximately 1202, 1394 + 5/12, 1394 + 5/12, 1632
and 1779 relative to the normalized DCT matrix.

Denoting the DCT coefficients in order by
1 > A > B > C > D > E > F > G > 0 > g > f > e > d > c > b > a > -1
the DCT matrix can be written compactly in the form
D D D D D D D D
A C E G g e c a
B F f b b f F B
C g a e E A G c
D d d D D d d D
E g A C c a G e
F b f B B f b F
G e C a A c E g
with the relations
(a, b, c, d, e, f, g) = (-A, -B, -C, -D, -E, -F, -G)
4 D^2 = 2 (B^2 + F^2) = A^2 + C^2 + E^2 + G^2 = 2
AC = AE + EG + GC
The normalized DCT matrix is 1/2 this matrix.

Let T, T' denote the integer matrix and its transpose; and let t, t'
denote the DCT matrix and its transpose. Then one has the properties:
T T' = 8 D^2 I = T' T.
t t' = 4 I = t' t.
T ~ 24 t and T' ~ 24 t' for the first integer solution listed.
The normalized DCT matrix is t/2 and its inverse is t'/2.

For the 2-dimensional DCT, the transformation matrix is the tensor
product
t (x) t
(using (x) to denote the tensor product symbol). For the integer
transform, one can replace this by any of the following tensor
products:
T (x) T, T' (x) T'
or
T (x) T', T' (x) T.

The latter two give a closer approximation to t (x) t, but lose the
vertical <-> horizontal symmetry.

Thomas Richter

unread,
Jun 3, 2011, 7:18:57 PM6/3/11
to
On 03.06.2011 23:35, Rock Brentwood wrote:
> One of the main novelties of the (relatively) new JPEG 2000 standard,
> in contrast to JPEG, is that it uses both a lossing and lossless
> variety of the wavelet transform, the latter relying on certain magic
> number properties.
>
> This leads to the question of whether something similar can actually
> be done with DCT-based coding, after all; and (if so) what magic
> numbers accomplish the trick.

Thanks for the work; please note that this is not the first
implementation of a lossless DCT approximation. In the standardization
of H.264, another approximation had been proposed, though for a 4x4
transform. I'm not clear about the 8x8 approximation used there, but it
seems to be described here:

Reznik, Y.A.; Hinds, A.T.; Rijavec, N.;

Low Complexity Fixed-Point Approximation of Inverse Discrete Cosine
Transform

Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE
International Conference on

As I know Yuri (we met several times over the years), it's probably a
very careful analysis of all the possibilities how to do a DCT in
integer, so it is probably worth reading - though I unortunately I
haven't had the time to go through it.

Anyhow, back to your result:

What is of course interesting is how close this transformation
approximates a DCT; that is, given we forward transform with a DCT
approximation, then divide by the common scaling factor to get floating
point values, and then backwards transform with a regular floating point
implementation of a DCT, how much error does it pick up?

Especially, it would be interesting if that would satisfy the error
bounds of JPEG-2.

What would also be interesting is the complexity of the algorithm, that
is, how many multiplications and additions does it take? You usually
want to avoid a full 8x8 matrix multiplication, especially for hardware
implementations, so this figure is pretty important.

So sorry for not providing an immediate answer, but you probably got a
couple of interesting problems to work on, and to investigate the
problem further. As a suggestion, look into the above paper.

Greetings,
Thomas

Rock Brentwood

unread,
Jun 7, 2011, 8:42:08 PM6/7/11
to
On Jun 3, 6:18 pm, Thomas Richter <t...@math.tu-berlin.de> wrote:
> Thanks for the work; please note that this is not the first
> implementation of a lossless DCT approximation...

> Especially, it would be interesting if that would satisfy the error
> bounds of JPEG-2.

Not the small integer solutions. They require *redefining* the
standard (or coming up with a better one) that replaces the ideal DCT
by the integer approximation to it.

But some of the large integer solutions may approximate JPEG (and even
JPEG-2) well enough to satisfy the requirements listed in the
compliance sections of the respective standards. The drawback is that
this will require 64 bit arithmetic or even higher precision.

> What would also be interesting is the complexity of the algorithm, that
> is, how many multiplications and additions does it take?

You're about to be in for a surprise. The complexity is the same as
algorithms published in the literature, because several of the key
trig properties also hold for the integer solutions. Those details are
listed below -- along with the algorithms (!)

And thank you for your references -- that was (in fact) the implicit
question in the article.

The letters A-G are meant to draw an analogy with Music Theory. The
central issue, there, is to find integer or ratio approximations of
the n/12 powers of 2 for n = 0, 1, 2, ..., 12; the notes A-G are
certain cardinal points on this scale. It also simplifies the
presentation to use the letter notation (the same numbers, in fact,
can also be used with the 32 element DFT).

Here, the analogy is trying to "rationalize" the ideal DCT into
something that works with integers, and as you were probably referring
to, the solutions fit fairly well to the ideal; especially the one
with G = 116.

It turns out that there are VERY interesting algorithms that make
everything work, both in the forward and reverse direction. The
numeric solutions were derived from a parametrization of the solutions
to the respective (quadratic) Diophantine equations. Those parameters
also play a key role in the algorithms and are indirectly responsible
for making the trig identities work in the integer domain (up to
rescaling).

I'll start by listing two of the real-number versions of the
algorithms for the ideal DCT and then show how they are each rendered
in integer form. The integer version is actually a bit cleaner, it
turns out, and can also be adapted to work with the real number
version.

In all cases, the transformation is listed with (f0, ..., f7) as
inputs, (F0, ..., F7) as outputs. These are just the 1-D algorithms.
If you want to mess around with tensor products, you may be able to
craft optimized forms of the 2-D algorithms from these. In that case,
the small numbers solutions are a practical necessity, since the
scaling factor goes up quadratically with 2 dimensions.

FORMULAE 1:
E. Feig and E. Linzer. Discrete Cosine Transform Algorithms for Image
Data Compression. _Proceedings Electronic Imaging '90 East_, pp.
84-87. Boston, MA (October 29 - November 1, 1990).

The data flow, written in equational form, is given by:
(a0, b0) = (f0 + f7, f0 - f7), (a1, b1) = (f1 + f6, f1 - f6),
(a2, b2) = (f2 + f5, f2 - f5), (a3, b3) = (f3 + f4, f3 - f4),
(c0, d0) = (a0 + a3, a0 - a3), (c1, d1) = (a1 + a2, a1 - a2)

2 (F0, F4) = D (c0 + c1, c0 - c1)
2 (F2, F6) = (F d0 + B d1, B d0 - F d1)
2 (F1, F7) = (G b0 + E b1 + C b2 + A b3, A b0 - C b1 + E b2 - G b3)
2 (F5, F3) = (C b0 - G b1 + A b2 + E b3, E b0 - A b1 - G b2 - C b3)
There is a scaling factor of 2 in the final results, if (A,...,G) are
defined as the cosines, respectively of 7/8 pi, ..., 1/8 pi.

For the inverse transform, write down the inverses of the respective
formulae. In detail, this can be done keeping the scaling factors
intact, with the following:
B0 = G F1 + E F3 + C F5 + A F7
B1 = E F1 - A F3 - G F5 - C F7
B2 = C F1 - G F3 + A F5 + E F7
B3 = A F1 - C F3 + E F5 - G F7
The ratios Bn/bn = (G^2 + E^2 + C^2 + A^2)/2 = 2 D^2, for n = 0, 1, 2,
3.
(C0, C1) = D (F0 + F4, F0 - F4)
(D0, D1) = (F F2 + B F6, B F2 - F F6)
Here, the ratios Dn/dn = (B^2 + F^2)/2 = D^2 and Cn/cn = D^2 for n =
0, 1.
(A0, A3) = (C0 + D0, C0 - D0), (A1, A2) = (C1 + D1, C1 - D1).
The ratios An/an = 2D^2 for n = 0, 1, 2, 3.

Finally,
4D^2 (f0, f7) = (A0 + B0, A0 - B0),
4D^2 (f1, f6) = (A1 + B1, A1 - B1),
4D^2 (f2, f5) = (A2 + B2, A2 - B2),
4D^2 (f3, f4) = (A3 + B3, A3 - B3).
The scaling factor 4D^2 for the inverse transform complements the
scaling factor of 2 for the forward transform, the two multiplying out
to 8D^2.

The integer coefficients (A,B,...,G) are meant to approximate 1/2 the
DCT coefficients (A,B,...,G), so 8D^2 corresponds to 1.

FORMULAE 2:
p. 43 [A. Ligtenberg, M. Vetterli, A Discrete Fourier-cosine transform
chip. _IEEE J. on Selected Areas in Commun._, SAC-449-61 (January
1986)]

Here, the data flow is given by:
(a0, b0) = (f0 + f7, f0 - f7), (a4, b4) = (f1 + f2, f1 - f2),
(a3, b3) = (f3 + f4, f3 - f4), (a5, b5) = (f5 + f6, f5 - f6),

(c0, d0) = (a0 + a3, a0 - a3),
(c1, Dd4) = (a4 + a5, D (a4 - a5)), (Dc4, d1) = (D (b4 + b5), b4 -
b5),
(y2, y3) = (b3 + Dc4, b3 - Dc4), (y4, y5) = (b0 + Dd4, b0 - Dd4),

2 (F0, F4) = D (c0 + c1, c0 - c1), 2 (F2, F6) = rot(B, F, d1, d0),
2 (F1, F7) = rot(A, G, y4, y2), 2 (F5, F3) = rot(E, C, y3, y5).

The "rotation" function is given by rot(C, S, x, y) = (Cx + Sy, Cy -
Sx). It can be optimized to:
rot(C, S, x, y) = (C(x + y) + (S - C)y, C(x + y) - (S + C)x).
This is not actually a "rotation" since no requirement is being
imposed that C^2 + S^2 = 1 here. But the function is useful, anyhow,
as is its optimization. Apparently, the cited reference unnecessarily
assumes C^2 + S^2 = 1, we won't, but will still keep the name "rot".

To derive the results above, the following identities are used:
D(G + A) = E, D(G - A) = C, D(E + C) = G, D(E - C) = A,
a4 + a5 = a1 + a2, a4 - a5 = b1 + b2,
b4 + b5 = b1 - b2, b4 - b5 = a1 - a2.

Formulae 1 and its inverse are both readily adapted to the integer
transform, as is. The only requirements are the Diophantine equations
4D^2 = 2(B^2 + F^2) = A^2 + C^2 + E^2 + G^2
GE = GC + CA + AE.

Formulae 2 can also be adapted to the integer transform, provided we
have the properties
E / (G + A) = C / (G - A), G / (E + C) = A / (E - C)
which both come out of the Diophantine equation GE = GC + CA + AE.

In particular, to establish these results, we make use of the
following decomposition
wxae = A = yz(c-d)h, yzdg = C = wx(b-a)f,
yzcg = E = wx(b+a)f, wxbe = G = yz(c+d)h.
and assume the following properties:
E = f/e (G + A), C = f/e (G - A),
G = h/g (E + C), A = h/g (E - C),
eg = 2fh,
wx (e^2 + 2 f^2) = 2L = yz (g^2 + 2 h^2),
xz (eh + fg) = L,
yz (c^2 + d^2) = 2 K^2 L = wx (a^2 + b^2),
D = KL.
Notice, by the way, that the decompositions of B and F don't play any
role here. The only requirement imposed on (B, F) is that B^2 + F^2 =
2 D^2.

For the solutions listed, this can be satisfied with the following
assignments:
( G, E, C, A) -> ( a, b, c, d, e, f, g, h, w, x,
y, z, K, L)
( 24, 20, 12, 6) -> ( 1, 4, 5, 3, 3, 2, 4, 3, 2, 1,
1, 1, 1, 17)
(116, 96, 78, 12) -> ( 3, 29, 16, 13, 4, 3, 3, 2, 1, 1,
2, 1, 5, 17)
(120, 100, 60, 30) -> ( 1, 4, 5, 3, 3, 2, 4, 3, 2z, x,
x, z, 1, 85), xz = 5
(618, 428, 396, 24) -> ( 4, 103, 107, 99, 3, 2, 4, 3, 2, 1,
1, 1, 25, 17)
(660, 606, 384, 148) -> (37, 165, 101, 64, 4, 3, 3, 2, 1, 1,
2, 1, 29, 17)
(696, 580, 348, 174) -> ( 1, 4, 5, 3, 3, 2, 4, 3, 2z, x,
x, z, 1, 493), xz = 29
(816, 600, 552, 34) -> ( 1, 24, 25, 23, 17, 12, 24, 17, 2, 1,
1, 1, 1, 577)
(912, 660, 556, 78) -> (13, 152, 165, 139, 3, 2, 4, 3, 2, 1,
1, 1, 37, 17)
The decompositions where x and z are indefinite work, independently of
how the factors (x,z) are chosen. For xz = 5, one can have either
(x,z) = (5,1) or (x,z) = (1,5). For xz = 29, similarly, one can have
either (x,z) = (29,1) or (x,z) = (1,29).

The transform for Formulae 2 is then given by:
(a0, b0) = (f0 + f7, f0 - f7), (a4, b4) = (f1 + f2, f1 - f2),
(a3, b3) = (f3 + f4, f3 - f4), (a5, b5) = (f5 + f6, f5 - f6),

(c0, d0) = (a0 + a3, a0 - a3),
(c1, d4) = (a4 + a5, a4 - a5), (c4, d1) = (b4 + b5, b4 - b5),
(x2, x3) = (x(e b3 + f c4), z(g b3 - h c4)),
(x4, x5) = (x(e b0 + f d4), z(g b0 - h d4)),

(F0, F4) = D (c0 + c1, c0 - c1), (F2, F6) = rot(B, F, d1, d0),
(F1, F7) = w rot(b, a, x4, x2), (F5, F3) = y rot(c, d, x3, x5),
and the inverse transform is then given by:
(C0, C1) = D (F0 + F4, F0 - F4), (D1, D0) = rot(B, F, F6, F2),
(X3, X5) = z rot(c, d, F3, F5), (X4, X2) = x rot(b, a, F7, F1),

(B3, C4) = (2 (zh X2 + xf X3), zg X2 - xe X3),
(B0, D4) = (2 (zh X4 + xf X5), zg X4 - xe X5),
(B4, B5) = (C4 + D1, C4 - D1), (A4, A5) = (C1 + D4, C1 - D4),
(A0, A3) = (C0 + D0, C0 - D0),

8 D^2 (f0, f7) = (A0 + B0, A0 - B0), 8 D^2 (f1, f2) = (A4 + B4, A4 -
B4),
8 D^2 (f3, f4) = (A3 + B3, A3 - B3), 8 D^2 (f5, f6) = (A5 + B5, A5 -
B5).

For the case (G, F, E, D, C, B, A) = (24, 23, 20, 17, 12, 7, 6), the
transform and inverse transform thus reduce respectively to:
(a0, b0) = (f0 + f7, f0 - f7), (a4, b4) = (f1 + f2, f1 - f2),
(a3, b3) = (f3 + f4, f3 - f4), (a5, b5) = (f5 + f6, f5 - f6),

(c0, d0) = (a0 + a3, a0 - a3),
(c1, d4) = (a4 + a5, a4 - a5), (c4, d1) = (b4 + b5, b4 - b5),
(x2, x3) = (3 b3 + 2 c4, 4 b3 - 3 c4),
(x4, x5) = (3 b0 + 2 d4, 4 b0 - 3 d4),

(F0, F4) = 17 (c0 + c1, c0 - c1), (F2, F6) = rot(7, 23, d1, d0),
(F1, F7) = 2 rot(1, 4, x4, x2), (F5, F3) = rot(5, 3, x3, x5),

and
(C0, C1) = 17 (F0 + F4, F0 - F4), (D1, D0) = rot(7, 23, F6, F2),
(X3, X5) = rot(5, 3, F3, F5), (X4, X2) = rot(4, 1, F7, F1),

(B3, C4) = (6 X2 + 4 X3, 4 X2 - 3 X3),
(B0, D4) = (6 X4 + 4 X5, 4 X4 - 3 X5),
(B4, B5) = (C4 + D1, C4 - D1), (A4, A5) = (C1 + D4, C1 - D4),
(A0, A3) = (C0 + D0, C0 - D0),

2312 (f0, f7) = (A0 + B0, A0 - B0), 2312 (f1, f2) = (A4 + B4, A4 -
B4),
2312 (f3, f4) = (A3 + B3, A3 - B3), 2312 (f5, f6) = (A5 + B5, A5 -
B5).

For the next smallest solution (G, F, E, D, C, B, A) = (116, 113, 96,
85, 78, 41, 12), the transform and inverse transform respectively
reduce to the following:
(a0, b0) = (f0 + f7, f0 - f7), (a4, b4) = (f1 + f2, f1 - f2),
(a3, b3) = (f3 + f4, f3 - f4), (a5, b5) = (f5 + f6, f5 - f6),

(c0, d0) = (a0 + a3, a0 - a3),
(c1, d4) = (a4 + a5, a4 - a5), (c4, d1) = (b4 + b5, b4 - b5),
(x2, x3) = (4 b3 + 3 c4, 3 b3 - 2 c4),
(x4, x5) = (4 b0 + 3 d4, 3 b0 - 2 d4),

(F0, F4) = 85 (c0 + c1, c0 - c1), (F2, F6) = rot(41, 113, d1, d0),
(F1, F7) = rot(29, 3, x4, x2), (F5, F3) = 2 rot(16, 13, x3, x5),
and
(C0, C1) = 85 (F0 + F4, F0 - F4), (D1, D0) = rot(41, 113, F6, F2),
(X3, X5) = rot(16, 13, F3, F5), (X4, X2) = rot(29, 3, F7, F1),

(B3, C4) = (4 X2 + 6 X3, 3 X2 - 4 X3),
(B0, D4) = (4 X4 + 6 X5, 3 X4 - 4 X5),
(B4, B5) = (C4 + D1, C4 - D1), (A4, A5) = (C1 + D4, C1 - D4),
(A0, A3) = (C0 + D0, C0 - D0),

57800 (f0, f7) = (A0 + B0, A0 - B0), 57800 (f1, f2) = (A4 + B4, A4 -
B4),
57800 (f3, f4) = (A3 + B3, A3 - B3), 57800 (f5, f6) = (A5 + B5, A5 -
B5).
The divisor 57800 = 25*2312 is 25 times larger than that for the
smallest solution.

Rock Brentwood

unread,
Jun 7, 2011, 8:50:42 PM6/7/11
to
On Jun 7, 7:42 pm, Rock Brentwood <federation2...@netzero.com> wrote:
> The integer coefficients (A,B,...,G) are meant to approximate 1/2 the
> DCT coefficients (A,B,...,G), so 8D^2 corresponds to 1.

I ended up changing the ordering of the letters in the second article
so that G > F > E > D > C > B > A. So, everything's backwards -- the
first article's (A,B,C,D,E,F,G) is equal to the second article's
(G,F,E,D,C,B,A). It just seems more natural to have G larger than A.

There is a third set of formulae that come out of:
A. Ligtenberg, R. H. Wright, and J. H. O' Neill. A VLSI Orthogonal
Transform Chip for Real-Time Image Compression. _Visual Communication
& Image Process._ II (October 1987)

who develop the general solution for coefficients matching certain
dataflow diagrams. In addition, the FORMULAE 2 can be readily adapted
to 2-D form; both are described in Chapter 4 of
JPEG: Sill Image and Compression Standard
William B. Pennebaker and Joan L. Mitchell
Van Nostrand Reinhold, New York
TA 1632.P45 1993

but I haven't gotten around to converting either of these to integer
form (nor their inverses).

Thomas Richter

unread,
Jun 8, 2011, 4:26:30 PM6/8/11
to
Am 08.06.2011 02:42, schrieb Rock Brentwood:
> On Jun 3, 6:18 pm, Thomas Richter<t...@math.tu-berlin.de> wrote:
>> Thanks for the work; please note that this is not the first
>> implementation of a lossless DCT approximation...
>> Especially, it would be interesting if that would satisfy the error
>> bounds of JPEG-2.
>
> Not the small integer solutions. They require *redefining* the
> standard (or coming up with a better one) that replaces the ideal DCT
> by the integer approximation to it.

Thanks for posting a solution here (that is, a scaled DCT, but this is
perfect). I afraid the option to re-define the specs isn't available,
i.e. the integer solution has to fit to the DCT of the reference
implementation.

> But some of the large integer solutions may approximate JPEG (and even
> JPEG-2) well enough to satisfy the requirements listed in the
> compliance sections of the respective standards. The drawback is that
> this will require 64 bit arithmetic or even higher precision.

That's probably the price, yes. It is interesting how compliance is
defined there, though, now that I went through JPEG-2. Apparently,
compliance is defined in terms of the encoder (which is not what
happened in later standards), and there with respect to a reference
implementation, and with respect to the *quantized* coefficients which
are allowed to have a maximum error of +/-1. That means that the maximum
allowable error is actually frequency dependent due to the way how the
quantization matrices used for testing are defined.

Anyhow, this procedure allows at least to get an l^infinity bound, which
again means that there is an l^1-bound on the rows of the matrix with
respect to the reference, so it's possible to proof whether such a
matrix is compliant or not.

>> What would also be interesting is the complexity of the algorithm, that
>> is, how many multiplications and additions does it take?
>
> You're about to be in for a surprise. The complexity is the same as
> algorithms published in the literature, because several of the key
> trig properties also hold for the integer solutions. Those details are
> listed below -- along with the algorithms (!)

Well, I'm not so much surprised, actually. (-; I would rather say that
you give an upper bound for the complexity since the presented
implementation (as nice as it is with its integer coefficients) uses the
standard algorithm; one can get away with lower complexity, here again
one from Yurij:

http://www.reznik.org/papers/SPIE07_MPEG-C_IDCT.pdf

Of course, what is quoted above is only a fixpoint implementation (or a
couple of them), and they are not precisely invertible, exactly due to
the three rotations within them that are not exactly invertible.

One can certainly improve the algorithm a bit by replacing some
multiplications with shifts and additions, though then potentially
loosing the point-symmetry of the overall transform. (Interestingly, IEC
23002-1 and Yurij call this "Linearity". What's meant is only "linear"
with respect to multiplication with -1, not the usual one).

> And thank you for your references -- that was (in fact) the implicit
> question in the article.

No problem, I also kept looking, but haven't had the time to go into all
the math yet.

> The letters A-G are meant to draw an analogy with Music Theory. The
> central issue, there, is to find integer or ratio approximations of
> the n/12 powers of 2 for n = 0, 1, 2, ..., 12; the notes A-G are
> certain cardinal points on this scale. It also simplifies the
> presentation to use the letter notation (the same numbers, in fact,
> can also be used with the 32 element DFT).

Hehe. Actually, this is an arithmetic scale, quite unlike in musical
scale which is (approximately) geometric. (-;

> Here, the analogy is trying to "rationalize" the ideal DCT into
> something that works with integers, and as you were probably referring
> to, the solutions fit fairly well to the ideal; especially the one
> with G = 116.

I'll try.

> It turns out that there are VERY interesting algorithms that make
> everything work, both in the forward and reverse direction. The
> numeric solutions were derived from a parametrization of the solutions
> to the respective (quadratic) Diophantine equations. Those parameters
> also play a key role in the algorithms and are indirectly responsible
> for making the trig identities work in the integer domain (up to
> rescaling).
>
> I'll start by listing two of the real-number versions of the
> algorithms for the ideal DCT and then show how they are each rendered
> in integer form. The integer version is actually a bit cleaner, it
> turns out, and can also be adapted to work with the real number
> version.
>
> In all cases, the transformation is listed with (f0, ..., f7) as
> inputs, (F0, ..., F7) as outputs. These are just the 1-D algorithms.
> If you want to mess around with tensor products, you may be able to
> craft optimized forms of the 2-D algorithms from these.

That's just the same transform applied twice, no problem. Possibly there
is still some optimization potential for this, but the naive tensor
product might do.

> In that case,
> the small numbers solutions are a practical necessity, since the
> scaling factor goes up quadratically with 2 dimensions.

Yup.

> The "rotation" function is given by rot(C, S, x, y) = (Cx + Sy, Cy -
> Sx). It can be optimized to:
> rot(C, S, x, y) = (C(x + y) + (S - C)y, C(x + y) - (S + C)x).

Yup, that's the crucial part. Everything else is lossless.

> Formulae 1 and its inverse are both readily adapted to the integer
> transform, as is. The only requirements are the Diophantine equations
> 4D^2 = 2(B^2 + F^2) = A^2 + C^2 + E^2 + G^2
> GE = GC + CA + AE.
>
> Formulae 2 can also be adapted to the integer transform, provided we
> have the properties
> E / (G + A) = C / (G - A), G / (E + C) = A / (E - C)
> which both come out of the Diophantine equation GE = GC + CA + AE.
>
> In particular, to establish these results, we make use of the
> following decomposition
> wxae = A = yz(c-d)h, yzdg = C = wx(b-a)f,
> yzcg = E = wx(b+a)f, wxbe = G = yz(c+d)h.
> and assume the following properties:
> E = f/e (G + A), C = f/e (G - A),
> G = h/g (E + C), A = h/g (E - C),
> eg = 2fh,
> wx (e^2 + 2 f^2) = 2L = yz (g^2 + 2 h^2),
> xz (eh + fg) = L,
> yz (c^2 + d^2) = 2 K^2 L = wx (a^2 + b^2),
> D = KL.
> Notice, by the way, that the decompositions of B and F don't play any
> role here. The only requirement imposed on (B, F) is that B^2 + F^2 =
> 2 D^2.

/* snip */

Nice! I guess I play a little with that.

> The divisor 57800 = 25*2312 is 25 times larger than that for the
> smallest solution.

Unfortunately already 16 bits "extra". That means, however, that for a
totally lossless implementation, these "extra bits" also need to be
stored away, which then make up more data than the actual bits "after
quantization", so to say. The approach is nice, but probably not ideal
from a data-compression perspective. A very minimalistic approach would
be to accept a loss in the DCT and store a residual separately. Not
nearly as elegant as your solution, but one might get away with shorter
streams.

Thanks & Greetings,
Thomas


Rock Brentwood

unread,
Jun 10, 2011, 5:25:15 PM6/10/11
to
On Jun 8, 3:26 pm, Thomas Richter <t...@math.tu-berlin.de> wrote:
> Am 08.06.2011 02:42, schrieb Rock Brentwood:
[... other commentary deleted ...]

> The approach is nice, but probably not ideal
> from a data-compression perspective. A very minimalistic approach would
> be to accept a loss in the DCT and store a residual separately. Not
> nearly as elegant as your solution, but one might get away with shorter
> streams.

There is always an important rule when doing stuff like this: if it's
obvious enough to think of, then there's 7 billion people on the
planet, which means probably about 1000 others are thinking of it at
the same time. Maybe 1 million who do DSP and EE, and 1 million who
work with number theory, take 1/1000 cross-section and you get around
1000 who do both.

A patent search (and IEEE journal search on "Integer DCT") takes the
winds out of the sails of all these developments, big time. This was
going on in the early Millennial decade [1]. Nowadays, the IEEE
Journals are filled with 3-D modelling papers and the like. Probably
something driven by MPEG-4 and MPEG-7 (and to a lesser degree
JPEG-2000).

I didn't put too much stock in research integer DCT, because of the
advent of integer wavelet transforms. But it's a good problem to look
at ... after the fact.

Technically, DCT is not the correct name for any of the transforms I
described, but more like "Scaled Integer Orthogonal Transformation
that is close to the DCT". So, we're looking for the intersection of
the group O(8) x [0,infinity) on the integer lattice. This is not too
far removed from the problem of finding the Platonic solids. I was
initially thinking that one might even resort to exploiting the
special properties of the 7-sphere and make use of Octonions. This
actually seems to be the case, but I haven't yet looked into it in any
detail and probably won't in the near future.

(Complex numbers are 2-D numbers, Quaternions 4-D, and Octonions 8-D.
After that there's nothing else for division rings; so 8x8 integer
transforms are probably the largest that admit elegant algebraic
treatments).

glen herrmannsfeldt

unread,
Jun 10, 2011, 6:17:33 PM6/10/11
to
Rock Brentwood <federat...@netzero.com> wrote:

(snip)

> A patent search (and IEEE journal search on "Integer DCT") takes the
> winds out of the sails of all these developments, big time. This was
> going on in the early Millennial decade [1]. Nowadays, the IEEE
> Journals are filled with 3-D modelling papers and the like. Probably
> something driven by MPEG-4 and MPEG-7 (and to a lesser degree
> JPEG-2000).

If you want an interesting case, search for "Simplified
Integer Cosine Transform."

It was developed for the case where the forward transform had to
be done on a slow processor without a multiplier. (Specifically,
the RCA CDP1802, in 1994.)

The inverse transform could be done on a very fast processor.

This ICT optimized for the minimum number if '1' bits in the
coefficients to get the appropriate compression.

-- glen

Rock Brentwood

unread,
Jun 14, 2011, 4:25:04 PM6/14/11
to
On Jun 10, 5:17 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:

> Rock Brentwood <federation2...@netzero.com> wrote:
> > This was going on in the early Millennial decade [1].
>
> If you want an interesting case, search for "Simplified
> Integer Cosine Transform."
>
> It was developed for the case where the forward transform had to
> be done on a slow processor without a multiplier. (Specifically,
> the RCA CDP1802, in 1994.)

I forgot to add in the following:

Note:
[1] The name belatedly coined (by me) for the otherwise as-of-yet
unnamed decade which ended in 2010.
(http://www.docstoc.com/docs/55568021/Space-Magazine, p. 11
"The End of the Millennial Decade. What's in Store for the 2010's")

federat...@netzero.com

unread,
May 7, 2015, 8:52:50 PM5/7/15
to
From me on 2011 June 08 02:42:
> Not the small integer solutions. They require *redefining* the
> standard (or coming up with a better one) that replaces the ideal DCT
> by the integer approximation to it.

On Wednesday, June 8, 2011 at 3:26:30 PM UTC-5, Thomas Richter wrote:
> Thanks for posting a solution here (that is, a scaled DCT, but this is
> perfect). I afraid the option to re-define the specs isn't available,
> i.e. the integer solution has to fit to the DCT of the reference
> implementation.

Not so fast! Actually, I spoke too soon on this, too.

They do NOT require redefining the current JPEG standard. Both the forward transform and reverse transform produce results that are (at least to me) visually indistinguishable from the original. Some comparative code is provided below for those wishing to experiment.

And actually, the key point is not this, but that you can bridge from JPEG to the integer-based routines, where the forward and reverse "DCT" replacements produce 0 rounding error, with minimal disruption. The same standard can be used with only the DCT tabulation replaced.

There is still the residual JPG <-> Integer DCT error, but once you stay inside Integer DCT format, the sequence (Decode Encode) could potentially be designed so as to reduce to the identity, thereby limiting the degradation to only one level with the identity (Encode Decode)^n = (Encode Decode) for all n = 2, 3, 4, ... This is ideally suited for replacing the lossless mode, the replacement being this (a "semi-lossless") mode or "fully-lossless" mode. The latter is if (Encode Decode) = Identity, and would require keeping the AC coefficients intact.

The coefficients I'm using are those discussed in the 2011 article. Here is DCT with the JPEG coefficients (DeDCT converts an 8 x 8 block InB to ExB, just a straight algorithm for illustration's sake, none of the fancy optimizations mentioned in the 2011 article):

static void GlomTab(double Row[8], double A, double B, double C, double D, double E, double F, double G, double H) {
Row[0] = A, Row[1] = B, Row[2] = C, Row[3] = D,
Row[4] = E, Row[5] = F, Row[6] = G, Row[7] = H;
}

void DeDCT(int ExB[010][010], int InB[010][010]) {
static double Tab[010][010]; static long Visits = 0L;
if (Visits++ == 0L) {
// Do, Re, Mi, Fa, Sol, La, Ti(, Do).
// You don't need trigonometry because angle bisection
// is a compass and ruler construction.
double D = sqrt(2.0), B = sqrt(2.0 - D), F = sqrt(2.0 + D);
double A = sqrt(2.0 - F), C = sqrt(2.0 - B);
double E = sqrt(2.0 + B), G = sqrt(2.0 + F);
// Normalize.
A /= 4, B /= 4, C /= 4, D /= 4, E /= 4, F /= 4, G /= 4;
// The resulting table is orthonormal if and only if
// 8D^2 = 4(B^2 + F^2) = 2(A^2 + C^2 + E^2 + G^2) = 1 and GE = EA + AC + CG.
GlomTab(Tab[0], +D,+D,+D,+D,+D,+D,+D,+D);
GlomTab(Tab[1], +G,+E,+C,+A,-A,-C,-E,-G);
GlomTab(Tab[2], +F,+B,-B,-F,-F,-B,+B,+F);
GlomTab(Tab[3], +E,-A,-G,-C,+C,+G,+A,-E);
GlomTab(Tab[4], +D,-D,-D,+D,+D,-D,-D,+D);
GlomTab(Tab[5], +C,-G,+A,+E,-E,-A,+G,-C);
GlomTab(Tab[6], +B,-F,+F,-B,-B,+F,-F,+B);
GlomTab(Tab[7], +A,-C,+E,-G,+G,-E,+C,-A);
}
for (int X = 0; X < 8; X++) for (int Y = 0; Y < 8; Y++) {
float S = 0;
for (int U = 0; U < 8; U++) for (int V = 0; V < 8; V++)
S += Tab[U][X]*Tab[V][Y]*InB[U][V];
ExB[X][Y] = (int)S;
}
}

and here it is with the integer coefficients:

void DeDCT(int ExB[010][010], int InB[010][010]) {
static int Tab[010][010]; static long Visits = 0L;
const int Norm = 2312, HalfNorm = Norm/2;
// An EXACT transform with integer coefficients, normalized at 2312.
if (Visits++ == 0L) {
// Do, Re, Mi, Fa, Sol, La, Ti(, Do).
// You don't need angle bisection either.
int A = 6, B = 7, C = 12, D = 17, E = 20, F = 23, G = 24;
GlomTab(Tab[0], +D,+D,+D,+D,+D,+D,+D,+D);
GlomTab(Tab[1], +G,+E,+C,+A,-A,-C,-E,-G);
GlomTab(Tab[2], +F,+B,-B,-F,-F,-B,+B,+F);
GlomTab(Tab[3], +E,-A,-G,-C,+C,+G,+A,-E);
GlomTab(Tab[4], +D,-D,-D,+D,+D,-D,-D,+D);
GlomTab(Tab[5], +C,-G,+A,+E,-E,-A,+G,-C);
GlomTab(Tab[6], +B,-F,+F,-B,-B,+F,-F,+B);
GlomTab(Tab[7], +A,-C,+E,-G,+G,-E,+C,-A);
}
for (int X = 0; X < 8; X++) for (int Y = 0; Y < 8; Y++) {
int S = 0;
for (int U = 0; U < 8; U++) for (int V = 0; V < 8; V++)
S += Tab[U][X]*Tab[V][Y]*InB[U][V];
ExB[X][Y] = (int)(S + HalfNorm)/Norm;
}
}

Thomas Richter

unread,
May 8, 2015, 3:07:08 AM5/8/15
to
On 08.05.2015 02:52, federat...@netzero.com wrote:

>
> Not so fast! Actually, I spoke too soon on this, too.
>
> They do NOT require redefining the current JPEG standard. Both the forward transform and reverse transform produce results that are (at least to me) visually indistinguishable from the original. Some comparative code is provided below for those wishing to experiment.
>
> And actually, the key point is not this, but that you can bridge from JPEG to the integer-based routines, where the forward and reverse "DCT" replacements produce 0 rounding error, with minimal disruption. The same standard can be used with only the DCT tabulation replaced.
>
> There is still the residual JPG <-> Integer DCT error, but once you stay inside Integer DCT format, the sequence (Decode Encode) could potentially be designed so as to reduce to the identity, thereby limiting the degradation to only one level with the identity (Encode Decode)^n = (Encode Decode) for all n = 2, 3, 4, ... This is ideally suited for replacing the lossless mode, the replacement being this (a "semi-lossless") mode or "fully-lossless" mode. The latter is if (Encode Decode) = Identity, and would require keeping the AC coefficients intact.
>
> The coefficients I'm using are those discussed in the 2011 article. Here is DCT with the JPEG coefficients (DeDCT converts an 8 x 8 block InB to ExB, just a straight algorithm for illustration's sake, none of the fancy optimizations mentioned in the 2011 article):
>
>
> const int Norm = 2312, HalfNorm = Norm/2;

> ExB[X][Y] = (int)(S + HalfNorm)/Norm;
^^^^^^^^^^^^^^^^
The problem lies here.

This is because your DCT is not normalized, unlike the DCT required by
the standard. There are now two options: Either, you divide by
sqrt(Norm) on the encoder side, normalizing it. But this prohibits
lossless reconstruction because the division is not one to one. Or you
encode upscaled coefficients (i.e. "more bits") perform the division at
decoder side, exactly as you propose, but loose backwards compatibility.

Alternatively, if the scale factor would be a power of two (which it is
not, and cannot - there are no Pythagorean numbers that are all powers
of two), you could encode more bits, but hide the extra bits from the
legacy decoder ("Refinement scans" in the language of XT now).

Actually, if you take the integer approximation of the well-known
Loeffler-factorization of the DCT, and spend three additional fractional
bits, to be encoded separately, the resulting DCT "turns out to be" one
to one on the ring of integers [0...255]. So in that sense, there is an
easier solution.

However, this solution "sucks" because you need additional three bits
per pixel for lossless compression, and one can do a lot better than
that. Which is exactly why we never considered this as a valid solution
to the problem.

The current JPEG XT part 8 - lossless coding (see for example the
implementation you find on www.jpeg.org) offers two approaches for
lossless coding: Option one is simple closed-loop coding with residual
coding in the spatial domain, i.e. you encode and decode with a fully
defined integer to integer DCT (actually, the unscaled Loeffler
factorization), compute the error in the spatial domain at encoder side,
and encode this residual error in a side channel. Ugly, but works well.

The second option is a lifting of the same factorization (essentially,
the factorization described by Plonka & Tasche), where each of the
elementary rotations in the Loeffler factorization is replaced by three
lifting matrices (which you can always do). This is integer to integer
and one-to-one by design (each lifting step is), but causes a small
error, due to the integer approximations.

Now, it seems at first sight that the second implementation (with the
P&T DCT) is more elegant and simpler, but if you compute the complexity
of the two implementations, even at the encoder side, it turns out that
it is easier to turn through the Loeffler DCT twice (forwards,
backwards, compute error) than to perform a lifted integer to integer
DCT (the P&T DCT).

Greetings,
Thomas

0 new messages