
New combinatorial coder: tighter & faster than Arithmetic coding


nightlight

Nov 20, 2005, 9:34:06 AM
There is a new algorithm for entropy coding, constrained
coding and fast combinatorial enumeration described in the
arXiv preprint cs.IT/0511057 (also submitted to DCC 2006):

http://arxiv.org/abs/cs.IT/0511057

It is a coding method based on enumerative combinatorics (walks
on the Pascal triangle, counting of integer lattice paths).
The new algorithm significantly improves on the previously
known method, "enumerative coding" (which is the regular
un/ranking procedure in combinatorics), increasing its speed
and reducing its memory requirements by a factor of O(n), where n
is the number of input symbols. It is the algorithm Rissanen was
looking for when he found a partial solution, arithmetic coding, in 1976.

The new algorithm, even as an exploratory unoptimized
prototype, runs much faster (see the table below, copied
from the paper, with speedup factors over 200) and
always compresses better than the best of arithmetic
coders (such as Moffat et al. 1998). Besides its usefulness
in entropy coding (for all types of file, image, ...
compressors where Huffman, arithmetic or similar coding
is used in the final phase) and for encoding of complex objects
(e.g. permutations, trees, graphs, self-delimiting sequences,
etc.), the new algorithm significantly extends the domain of
enumerative combinatorics accessible to computer exploration.

Here is the abstract:
---------------------------------------------------------
Title: Quantized Indexing: Beyond Arithmetic Coding
Author: Ratko V. Tomic

Quantized Indexing is a fast and space-efficient form of
enumerative (combinatorial) coding, the strongest among
asymptotically optimal universal entropy coding algorithms.
The present advance in enumerative coding is similar to
that made by arithmetic coding with respect to its unlimited
precision predecessor, Elias coding.

The arithmetic precision, execution time, table sizes and
coding delay are all reduced by a factor O(n) at a
redundancy below log(e)/2^(g-1) bits/symbol (for n input
symbols and g-bit QI precision). Due to its tighter
enumeration, QI output redundancy is below that of
arithmetic coding (which can be derived as a lower
accuracy approximation of QI). The relative compression
gain vanishes in large n and in high entropy limits and
increases for shorter outputs and for less predictable data.

QI is significantly faster than the fastest arithmetic
coders, from factor 6 in high entropy limit to over 100
in low entropy limit (`typically' 10-20 times faster).
These speedups are the result of using only 3 adds, 1 shift
and 2 array lookups (all in 32-bit precision) per less
probable symbol, and no coding operations for the most
probable symbol.

Further, the exact enumeration algorithm is sharpened
and its lattice walks formulation is generalized.
A new numeric type with a broader applicability,
sliding window integer, is introduced.
---------------------------------------------------------

Below are a few test results of QI vs. AC (AC: Moffat et al. 1998,
ver. 3 from 2000, on its `best' settings, RAM-to-RAM coding,
all buffers preallocated, no slow streams; only the inner coding
loops were timed, via sub-microsecond timers) for input sizes N
(K = 1024 bits) and for a given number of 1's randomly distributed
(except for the input "Vary", which represents a non-stationary
source). Both coders tested were binary order-0 coders (i.e.
they treat the input data as a sequence of uncorrelated bits).
Each result is an average over 500 random inputs.

Column Speed: coding times ratio TimeAC/TimeQI
(e.g. QI is up to 247.5 times faster than AC, cf. top right)

Columns N:... : AC output size excess in percent, (A/Q - 1)*100,
where A and Q are the AC and QI output sizes
(e.g. AC output is up to 110% larger than QI's, cf. bottom left)

Input array Vary = array of int32 {...,-2,-1,0,+1,+2,...}.

---------------------------------------------------------------
#1's N:4K Speed N:8K Speed N:32K Speed N:128K Speed
---------------------------------------------------------------
8 6.846 68.3 6.421 112.8 5.447 199.6 5.966 247.5x
16 4.175 59.7 3.830 78.5 3.389 138.1 3.730 168.0
32 2.297 49.7 2.090 58.9 2.096 95.9 2.220 117.2
N/64 1.370 40.3 0.606 41.0 0.186 41.7 0.073 42.5
N/32 0.897 30.8 0.343 33.7 0.123 34.2 0.049 34.5
N/16 0.505 21.8 0.197 25.3 0.084 24.6 0.040 24.8
N/8 0.359 14.4 0.155 16.7 0.069 16.8 0.045 16.8
N/4 0.288 9.2 0.138 10.8 0.083 10.6 0.068 10.5
N/2 0.509 6.6 0.445 6.6 0.367 6.4 0.332 6.4
Vary 110.899 21.9 96.736 19.6 71.308 16.5 52.580 14.1
---------------------------------------------------------------

David A. Scott

Nov 20, 2005, 10:39:30 AM
"nightlight" <night...@omegapoint.com> wrote in
news:1132497246.4...@f14g2000cwb.googlegroups.com:

>
> The new algorithm, even as an exploratory unoptimized
> prototype, runs much faster (see the table below, copied
> from the paper, with speedup factors over 200) and
> always compresses better than the best of arithmetic
> coders (such as Moffat et al. 1998). Besides its usefulness
> in entropy coding (for all types of file, image, ...
> compressors where Huffman, arithmetic or similar coding
> is used in the final phase) and for encoding of complex objects
> (e.g. permutations, trees, graphs, self-delimiting sequences,
> etc.), the new algorithm significantly extends the domain of
> enumerative combinatorics accessible to computer exploration.
>
>

Sounds like hype. Especially the statement


"always compresses better than the best of arithmetic coders"

This is obviously a false statement. It is impossible
for any compressor to always compress better than every other
compressor. First of all, the best compressors would have to
be bijective, and then the best they can do is change the mappings
from one set to another. Take any compressor that could make some
stream or file smaller. Then the identity compressor could beat it
on the files that the previous compressor lengthened, since it's a
lose-lose situation: if you make something smaller or better
compressed, then something else has to become longer.

Second, if you actually took the time to test the online versions
of code that use Moffat-style coding, you would soon realize it is
not the best.


David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

nightlight

Nov 20, 2005, 11:48:15 AM
> Sounds like hype. Especially the statement "always compresses better
> than the best of arithmetic coders". This is obviously a false statement.
> It is impossible for any compressor to always compress better than
> every other compressor.

The two are not arbitrary unrelated algorithms -- arithmetic coding
(AC) and QI are both approximations of the exact enumerative coding
(the exact enumeration of messages). The difference is that AC first
approximates the enumeration problem itself (via the Stirling
approximation and by ignoring some factors, cf. the paper p. 2), then
it applies an additional precision reduction (truncation to r-bit
arithmetic). QI doesn't do the first AC approximation at all, i.e. it
works from the exact enumeration and performs only the second part of
the AC approximation, and even there it selects a precision truncation
which has 4 times smaller redundancy than AC's truncation. So AC is a
lower accuracy approximation of QI, hence AC always produces output
larger than, or at best equal to, QI's.

What you're referring to is the trivial observation that for any given
algorithm A, one can create another algorithm B which will compress a
given input X better than A. It is trivial since you can select a
message X for which A produces an output of more than one bit, then
simply create an algorithm B which outputs 0 for message X and outputs
1 followed by <output of A> otherwise.

The relation between AC and QI is not of this kind, though. The AC
message index is the same number that QI produces, but multiplied by
a number which is strictly greater than 1 (cf. the footnote on p. 2 on
the double index majorization by AC). This may or may not (depending
on how fractions round to the next whole bit) result in a greater
number of whole bits output by AC, but mathematically it can never
result in a smaller number of whole bits in the AC output.

When Rissanen was looking to solve the precision problem of the exact
enumeration (in 1976), he ran into a problem which seemed intractable
(due to a missing piece of formalism in the conventional EC, which the
paper above constructs as a natural setting for QI). Hence he
approximated the enumeration first (by majorizing the index), which
allowed him to construct a decodable index -- this was arithmetic
coding. Rissanen (who is the original creator of arithmetic coding and
of the MDL modeling principle) instantly recognized the solution in
this paper, his old nemesis conquered at last, commenting: "I
understand what you have done. Very clever!"

nightlight

Nov 20, 2005, 12:07:49 PM
> Second, if you actually took the time to test the online versions of code
> that use Moffat-style coding, you would soon realize it is not the best.

One can take the Moffat et al. code (version 3, 2000) and select some
other 'accuracy vs speed' tradeoffs. If you know of a version of that
coder which is faster and produces smaller output (the figures I listed
refer to a binary 0-order coder test), you are welcome to present the
comparison figures between the two and I will gladly check it out. Keep
in mind, the speed factors in the paper show QI is 200+ times faster
than AC on some inputs, and typically 10-20 times faster (this was a
generic QI coder, i.e. the same loop and the same steps coding the
dense and the sparse arrays, not something tuned to run fast for sparse
arrays, such as a run-length coder, which otherwise blows up in size
and speed for denser inputs). Quasi-arithmetic coders are a dime a
dozen; they sacrifice some compression quality (say another 5-10%
larger output) to gain some speed (a factor of 2-5).

The new algorithm does much less work (and simpler work: just an add
of the table addend, a machine word) per coding step and always does
fewer coding steps, thus it has a fundamental reason for much better
performance than AC.

Ben Rudiak-Gould

Nov 20, 2005, 12:14:50 PM
nightlight wrote:
> http://arxiv.org/abs/cs.IT/0511057

I looked quickly through the paper, and here's my impression of what it's about.

The idea of "enumerative coding" is that you divide the set of possible
unencoded messages into disjoint subsets with the property that all of the
messages in a particular subset are equally likely to occur (according to
some model). Then the coded message is a code identifying the subset which
contains the message, followed by a code identifying an element of that subset.

For example, suppose that your messages are the bit strings [01]*, and you
divide them into subsets according to how many zeroes and ones they contain:

{}, {0}, {1}, {00}, {01,10}, {11}, {000}, {001,010,100}, ...

If our message has K ones and L zeroes, we can encode the index of the
appropriate subset by encoding K and L at a cost of log K + log L bits.
Encoding the index of the appropriate element of that subset takes

log C(K+L,K) = log ((K+L)! / (K! L!))

bits.
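One standard way to compute that element index is the following minimal
sketch in C (binom() and ec_rank() are just illustrative names, not from
the paper, and the exact 64-bit binomials limit it to short strings):

#include <stdint.h>

/* exact binomial coefficient C(n,k); fine for small n, but overflows
   64 bits for large n -- which is the precision problem at issue */
static uint64_t binom(unsigned n, unsigned k)
{
    uint64_t c = 1;
    unsigned i;
    if (k > n) return 0;
    for (i = 1; i <= k; ++i)
        c = c * (n - k + i) / i;   /* division is exact at every step */
    return c;
}

/* rank of a bit string among all strings of the same length with the
   same number of ones: each 1 at 0-based offset n contributes C(n,k),
   where k is the count of 1's seen so far including this one */
static uint64_t ec_rank(const unsigned char *bits, unsigned len)
{
    uint64_t index = 0;
    unsigned n, k = 0;
    for (n = 0; n < len; ++n)
        if (bits[n]) {
            ++k;
            index += binom(n, k);
        }
    return index;   /* 0 <= index < C(len, k) */
}

The coded message is then (K+L, K) followed by this index, which costs
the log C(K+L,K) bits computed above.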

Compare this to arithmetic coding with a static order-0 model. Here we first
encode the length of the message (K+L) and p(one) = K/(K+L), which again
takes log K + log L bits. Then for each one bit in the message we encode log
((K+L)/K) bits, and for each zero bit we encode log ((K+L)/L) bits, for a
total of

K log (K+L)/K + L log (K+L)/L = log ((K+L)? / (K? L?))

bits, where I write N? to mean N^N.

Arithmetic coding is asymptotically as efficient as enumerative coding
(by Stirling's formula the two totals differ only by O(log N) bits),
but enumerative coding will always use fewer bits, since N! < N? for
interesting values of N.
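A quick numeric check of that comparison (a sketch only; the log
factorials are taken via lgamma(), and K = 8, L = 24 are arbitrary
example values):

#include <math.h>
#include <stdio.h>

int main(void)
{
    double K = 8, L = 24, N = K + L;

    /* static order-0 arithmetic coding: K log2(N/K) + L log2(N/L) bits */
    double ac_bits = K * log2(N / K) + L * log2(N / L);

    /* enumerative index: log2 C(N,K) = log2( N! / (K! L!) )             */
    double ec_bits = (lgamma(N + 1) - lgamma(K + 1) - lgamma(L + 1))
                     / log(2.0);

    printf("AC index: %.2f bits, EC index: %.2f bits\n", ac_bits, ec_bits);
    return 0;
}

The enumerative figure comes out a few bits smaller, and the per-symbol
gap shrinks as K+L grows.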

I think that this paper describes an efficient algorithm for enumerative
coding for the order-0 model I used above, but generalized to arbitrarily
many symbols. If so, it won't replace coders based on sophisticated models
with arithmetic-coding back ends, but it may find a niche in applications
that use static order-0 models, such as video compression.

-- Ben

Willem

Nov 20, 2005, 12:33:41 PM
nightlight wrote:
)> Sounds like hype. Especially the statement "always compresses better
)> than the best of arithmetic coders". This is obviously a false statement.
)> It is impossible for any compressor to always compress better than
)> every other compressor.
)
) The two are not arbitrary unrelated algorithms -- arithmetic coding
) (AC) and QI are both approximations of the exact enumerative coding
) (the exact enumeration of messages).

AC is not only enumerative coding. It is also used with adaptive models
and higher-order statistics.


SaSW, Willem
--

Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be

drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

nightlight

Nov 20, 2005, 1:04:48 PM
> If so, it won't replace coders based on sophisticated models
> with arithmetic-coding back ends,...

The AC modeling paradigm ('probability of next single symbol' modeling
interface bottleneck) was created by following the path of 'virtue out
of necessity', the necessity being the particular entanglement of the
innermost coding steps with the model parameters p(a). The EC/QI coders
can code the same way, too, except they can also code a much better way
(the Kolmogorov modeling paradigm). There is a bit on modeling on
page 9 of the paper and a much longer discussion in the tech report
cited:

TR05-0625 "Quantized indexing: Background information"
http://www.1stworks.com/ref/TR/tr05-0625a.pdf

As you may have noted in the row "Vary" of the test, which is a
non-probabilistic sequence for the order-0 coders, one can easily
provide inputs which will "surprise" catastrophically the arithmetic
0-order coder, but you can't provide inputs on which AC works well and
EC fails as dramatically (as does AC on the data "Vary"). The same
phenomenon exists for the higher order coders, as soon as the input
falls outside of the "predictable" at a given order. The adaptive AC
fails badly, while EC merely comes out slightly sub-optimally (since
the effect of a "surprise" is always much greater on the predictor
scheme than on the 'descriptor' scheme; with EC only the frequency
table encoding may turn out suboptimal, while the bulk of the output,
the index, is still optimal).

The essential difference between the adaptive AC and EC/QI is that AC
subjects the modeling engine to an artificial constraint (the
memoryless coder is allowed only a single symbol at a time, a
limitation largely not relevant in practical coding since Morse
telegraph days). EC can do that, too, but it normally doesn't.
Trying to predict the future, as the AC modeling paradigm insists, is
inherently harder than describing the present or the past. Even the
practical forms of AC modeling recognize the problem and allow a
backdoor for cheating against the ideal pure prediction (via the
fragile and ad hoc escape mechanisms, where the coder is "allowed" to
look ahead and clue the decoder in as to what is coming).

As another aspect of the distinction, note that the EC modeling engine
normally needs to know only whether, say, messages X and Y have the
same probabilities (whatever their values might be), while the AC
modeling engine needs to know what the actual values of the
probabilities are, which is a much harder task.

The upshot is that the native EC/QI modeling (the Kolmogorov/MDL
scheme) is much simpler and more robust than the predictive AC paradigm
(EC/QI can reuse the existing AC modeling engine, though, e.g. by
splitting the output into different probability classes, cf. p. 9, [N7]).

Willem

Nov 20, 2005, 1:17:57 PM
nightlight wrote:
)> If so, it won't replace coders based on sophisticated models
)> with arithmetic-coding back ends,...
)
) The AC modeling paradigm ('probability of next single symbol' modeling
) interface bottleneck) was created following path of 'virtue out of
) necessity'. The necessity being the particular entanglement of the
) ...
)
) As you may have noted in the row "Vary" of the test, which is a
) non-probabilistic sequence for the order-0 coders, one can easily
) provide inputs which will "surprise" catastrophically the arithmetic
) 0-order coder,

You don't seem to grasp the existence of higher-order models.
Would something like PPM be easy to implement using this QI coder ?

nightlight

Nov 20, 2005, 1:35:30 PM
> AC is not only enumerative coding. It is also used with adaptive models
> and higher-order statistics.

The coding aspects in both cases are message enumerations -- with EC in
the exact lists of messages (performed using the virtual dictionary of
combinations), with AC in the cumulative probability space (which only
asymptotically for large n approaches the exact EC coding, while being
uniformly suboptimal for any finite n). It is a historic fluke that EC
was presented in conjunction with the 'descriptive'/universal way of
modeling (MDL/Kolmogorov), while AC with the predictive way (such as
PPM).

Both coding methods can work with modeling engines either way, though.
The coding algorithm of AC does not separate cleanly the model
parameters from the coder arithmetic (by virtue of hardwiring
'probabilities of next single symbol' into its coding addends). Thus,
its native mode is the 'predictive' way, as a virtue out of necessity.

EC has a much cleaner separation between the modeling parameters and
the coding computations -- the modeling engine of EC defines the
enumerative classes (the subsets of "equiprobable messages"), while
coder computations deal only with the universal combinatorial
properties of sequences (belonging to a given class). The result is a
much faster operation, since the 'universal properties of sequences'
are independent of the particular source parameters, thus they can be
precomputed and optimized away, outside of the coding loop, as it were.
If AC were to code via precomputed values, it would need a separate
table for each set of source probabilities, which is impractical even
for a static binary source.

Willem

Nov 20, 2005, 1:53:20 PM
nightlight wrote:
)> AC is not only enumerative coding. It is also used with adaptive models
)> and higher-order statistics.
)
) The coding aspects in both cases are message enumerations -- with EC in
) the exact lists of messages (performed using the virtual dictionary of
) combinations), with AC in the cumulative probability space (which only
) asymptotically for large n approaches the exact EC coding, while being
) uniformly suboptimal for any finite n).

Not true. Adaptive coding is trivially capable of outperforming a model
that uses only the cumulative probability, in cases where there are local
variations in the probability distribution.

Also, adaptive coding works very well with higher-order models, where it
is intractable to transmit the model because of its exponential size.

Ben Rudiak-Gould

Nov 20, 2005, 2:19:21 PM
nightlight wrote:
>The AC modeling paradigm ('probability of next single symbol' modeling
>interface bottleneck) was created following path of 'virtue out of
>necessity'. [etc, etc, etc]

Look, I understand all of this. I understand the weaknesses of PPM and of
arithmetic coding, and I understand that an enumerative coder can do better
in principle. None of that matters. What matters is this: what is your
contribution? Can you, in practice, do a better job than state-of-the-art
compressors based on PPM+ari? Going on about theoretical benefits is a waste
of time. Theoretically, the best way to write a chess-playing program is to
use minimax on the complete game tree. Actual chess-playing algorithms are
gross hacks which attempt, in a highly inelegant way, to approximate that
ideal. Pointing out that that's true accomplishes nothing. Everyone versed
in the art knows it already.

>As you may have noted in the row "Vary" of the test, which is a
>non-probabilistic sequence for the order-0 coders, one can easily
>provide inputs which will "surprise" catastrophically the arithmetic
>0-order coder, but you can't provide inputs on which AC works well and
>EC fails as dramatically (as does AC on the data "Vary").

Two questions. First, do you understand the difference between adaptive
order-0 and static order-0 models? Second, how would the "QI" and "AC" in
your tests perform if the input consisted of N/2 zeroes followed by N/2 ones?

-- Ben

nightlight

Nov 20, 2005, 2:29:41 PM
>>) As you may have noted in the row "Vary" of the test, which is a
>> non-probabilistic sequence for the order-0 coders, one can easily
>> provide inputs which will "surprise" catastrophically the arithmetic
>> 0-order coder,

> You don't seem to grasp the existence of higher-order models.

That was a test of the same order-0 coders. At any order, you can
surprise the "predictive" scheme catastrophically (unless it cheats via
some ad hoc escape mechanism, allowing it to temporarily use the
universal way of coding), while you can't do so to a
descriptive/universal scheme.

> Would something like PPM be easy to implement using this QI coder ?

There are at least a couple of ways to do that. QI can code (via
lattice jumps) to the 'probability of the next single symbol' exactly
as AC does, but that costs it much of its speed advantage (since the
mantissa scaling used by the jumps requires multiplications and
divisions). But it can also code from the AC modeling engine's
instructions much faster by splitting the outputs into probability
classes (which also needs error estimates on the probabilities given to
the coder; the scheme grows the classes in the manner of the Context
Tree Weighting modeling scheme).

OTOH, it could do much better, with any given a priori knowledge about
the input that is fed to PPM, by coding the native EC way, which is by
splitting the input into proper enumerative classes. Such segmentation
can be done based on the last s symbols (e.g. for an order-s static
Markov source; a method similar to Context Tree Weighting to select the
output stream), or dynamically via BWT-like segmentation which doesn't
assume an a priori idealized finite-order Markov source (you can check
the discussion of this in the Ref [23] cited earlier). Even with the
kludgy ad hoc entropy coding schemes (such as the MTF variants which
crudely mangle the enumerative classes present in the final BWT
matrix), BWT is already surprisingly competitive with PPM in
compression quality, while executing much quicker. With proper handling
of these presently mangled BWT enumerative classes, which QI is
naturally suited to do quickly and optimally, BWT should consistently
outperform PPM not just in speed but in compression quality.

In short, AC is a particular historical approximation of QI, hence
anything AC can do, QI can do but generally much faster (or at least as
fast) and generally more accurately (or at least as accurately).
Similarly, the PPM or generally the predictive way of modeling via the
'probability of the next single symbol' is another historical accident
(tailored to a particular hardwired coding algorithm of AC), by no
means the only (and certainly not the best) way to model the properties
of finite sequences of symbols. The AC algorithm happens to require
that all properties be translated into exactly that form ('the
probabilities of the single next symbol') so it can encode them. It is
more of a handicap than a strength that the modeling engine has to bend
and distort everything else around the AC computational peculiarities,
translating and funneling all the conceivable properties of symbol
sequences available to the modeling engine through that particular
coder interface bottleneck.

So, instead of questioning anyone's grasp of 'higher order models', you
might wish first to reexamine how some manage to convince themselves
to perceive, in all sincerity, a plain, gross handicap as a great
strength.

nightlight

Nov 20, 2005, 3:39:00 PM
> Can you, in practice, do a better job than state-of-the-art compressors
> based on PPM+ari?

QI is a very new algorithm (and the first truly practical form of
genuine EC). A better BWT variant, without the mangling of its contexts
(as the MTF phase presently does), is certainly high on the list of
things to apply QI to. It is still only one among many interesting
possibilities opened by the existence of practical EC. The whole field
of practical EC modeling is presently in a very primitive form and most
people have many misconceptions about what it is (or about modeling
generally, identifying it entirely with the AC modeling paradigm).
Hopefully the publication of a clean, well-performing reference
implementation of QI (on which I am working, and which will perform
even better than the test results of the crude research prototype shown
in the preprint) in the forthcoming months will bring more people to
try out new ways of modeling and answer those questions.

> First, do you understand the difference between adaptive order-0 and
> static order-0 models? Second, how would the "QI" and "AC" in
> your tests perform if the input consisted of N/2 zeroes followed by N/2
> ones?

The sequence "Vary" was a sequence similar to that, since the negative
half of that sequence is mostly 1's and the positive part mostly 0's.
The rest of the answer obviously depends on how the QI segments the
sequence, what are the rules of the game. If you insist on requiring
that QI/EC models ...000111... sequence in a single block of size N,
then obviously it will produce output which is N + log*(N) + 1/2 log(N)
bits (since it needs to encode the length of a sequence in log*(N) =
log(log(...N) bits and the count of 1's in log(N) bits, while the
index is C(N,N/2) which is Catalan number with approx N - 1/2 log(N)
bits). Of course, if you wish to compare apples to apples, then if the
predictive AC coder is allowed some adaptation, one would need to allow
QI to also have some kind of corresponding capabilities (i.e. not to
function as a static order-0 coder), such as allowing the encoder to
select the input segmentation, in which case it would produce output
smaller than adaptive AC (as array "Vary" illustrates). Comparing
static vs adaptive order-0 coders is comparing apples and oranges.
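A quick check of that index estimate (just a sketch; it uses exact log
factorials via lgamma() rather than the quantized tables):

#include <math.h>
#include <stdio.h>

int main(void)
{
    double N = 8192.0;   /* block of N/2 zeroes followed by N/2 ones */

    /* log2 C(N, N/2) = log2( N! / ((N/2)! (N/2)!) ) */
    double idx = (lgamma(N + 1) - 2.0 * lgamma(N / 2 + 1)) / log(2.0);

    printf("log2 C(N,N/2) = %.1f bits,  N - log2(N)/2 = %.1f bits\n",
           idx, N - 0.5 * log2(N));
    return 0;
}

The two figures agree to within half a bit, confirming the approximate
N - 1/2 log(N) size of the index.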

There is no intrinsic algorithmic constraint for EC/QI to code the
static order-0 way (i.e. a la the Lynch-Davisson 1966 EC coder). If you
take AC and EC/QI and make both code the static order-0 way with the
same segmentation, EC/QI will give you slightly smaller output and code
much faster. Then if you make AC adaptive, with an adaptation rate
faster than N/2, you need to decide what the corresponding EC/QI mode
of coding is. Whatever it is, it certainly cannot be the same scheme
that was just tested as a comparison of the static order-0 coders,
which is what you seem to be hinting it ought to be. The most natural
EC/QI counterpart to AC adapting faster than N/2 is to allow the EC/QI
coder a segmentation choice of at least 2 parts, in which case it will
come out on top again (as the array "Vary" showed; even though the QI
coder tested was somewhat more primitive than that: it simply used
fixed 1 Kbit blocks).

In any case, this particular tangent is more about verbal pigeonholing,
rather than about the substance of the difference. The basic difference
between AC and QI coding algorithms is that AC enumerates messages via
cumulative probabilities and EC/QI via exact ranking of combinations
of symbol sequences. The latter way is more accurate and it is much
faster (since much of the coder work is with the universal sequence
properties, thus it is precomputed in the quantized binomial tables,
outside of the coding loop).

The modeling scheme is an entirely unrelated issue. You can select a
modeling scheme where both coders servicing that modeler perform
equally poorly (e.g. by forcing QI to do lattice jumps a la AC, in
which case it will have to work exactly as hard per symbol as AC; with
adaptive output splitting QI will compress as well as adaptive AC, but
code much faster). Generally, though, given the same modeling scheme,
QI will always perform at least as well as AC, in speed and in accuracy
(but mostly better, especially in speed). Anything else is comparing
apples and oranges (note that the test in the row "Vary" is of that
apple-vs-orange kind, except that QI was the one slightly handicapped,
since it wasn't really as adaptive as the EC counterpart of an adaptive
AC ought to be, i.e. free to select at least one input partition when
AC is allowed to adapt faster than N/2; QI/EC produces a zero-length
index for a sequence of all 1's or all 0's).

nightlight

Nov 20, 2005, 4:02:39 PM
> Adaptive coding is trivially capable of outperforming a
> model that uses only the cumulative probability, in cases
> where there are local variations in the probability distribution.

You are comparing different modeling schemes (adaptive vs static),
while I was comparing the bare coding algorithms (both can be attached
to the same model). At any given point AC gives a position (an
interval) on the cumulative probability line for the sequence seen so
far, while EC/QI gives the exact ordinal of the same sequence in the
same list of sequences (the list is what the model produces; we assume
the same model and look at the coder difference). The only coder-level
difference is that the cumulative probability index is an approximation
(a uniform majorization via the Stirling approximation of binomials and
the dropping of two factors < 1, cf. p. 2 and ref [23] for more
details) of the exact ordinal index, and that the QI algorithm can
compute it generally much faster than the AC algorithm can (and always
at least as fast). The reason for the speed difference is discussed in
ref [23] -- it is basically that the QI algorithm has a much cleaner
separation between the coding computations and the specific instance
properties of the source than the AC algorithm, hence QI can shift much
of the coding computation outside of the instance-specific coding loop.

> Also, adaptive coding works very well with higher-order models, where
> it is intractable to transmit the model because of its exponential size.

This is an interesting point, but outside of the comparison of QI and
AC coding algorithms.

Taken on its own, though, your statement is a bit self-contradictory,
since the model information obviously can be transmitted in
sub-exponential size (e.g. via an adaptive AC scheme). What you really
mean is that the first simple-minded encoding of the model parameters
that you can come up with, off the top of your head, is exponential.
Well, ok, so what? For example, BWT, which is a somewhat less obvious
way to extract and transmit a higher order model, is not exponential
in size (and it is also not an adaptive PPM scheme). In fact, even when
coupled with the crude MTF obliteration of the context boundaries, it
performs nearly as well as PPM in compression quality (while running
much faster).

nightlight

Nov 20, 2005, 5:41:51 PM
The most relevant papers for this discussion are available online at:

Quantized Indexing References
http://www.1stworks.com/ref/tr/tr05-0625r.htm

Specifically of interest here are the papers:

5. J. Rissanen Generalised Kraft inequality and arithmetic coding
IBM J. Res. Dev. 20, 198-203, 1976
http://www.research.ibm.com/journal/rd/203/ibmrd2003B.pdf

27. J. Rissanen Arithmetic codings as number representations
Acta Polyt. Scand., Math. & Comp. Sc. Vol. 31 pp 44-51, 1979
http://www.1stworks.com/ref/ariNumRepr.pdf

28. L.D. Davisson Universal noiseless coding
IEEE Trans. Inform. Theory IT-19 (6), 783-795, 1973

http://cg.ensmp.fr/%7Evert/proj/bibli/local/Davisson1973Universal.pdf

29. J. Rissanen, G.G. Langdon Universal Modeling and Coding
IEEE Trans. Inform. Theory IT-19 (1), 12-23, 1981

http://cg.ensmp.fr/%7Evert/proj/bibli/local/Rissanen1981Universal.pdf

34. J.G. Cleary, I.H. Witten A Comparison of Enumerative and Adaptive
Codes, IEEE Trans. Inform. Theory IT-30 (2), 306-315, 1984
http://www.1stworks.com/ref/Cleary84Enum.pdf

Phil Carmody

Nov 20, 2005, 5:58:02 PM
"nightlight" <night...@omegapoint.com> writes:
> the speed factors in the paper show QI is 200+ times faster than AC on

The question this raises for me is why AC implementations are 200x
slower than reading from RAM? That sounds a little slow. Computer
arithmetic is now incredibly fast, and RAM is slow in comparison.

/If/ AC is only 50x slower than reading from RAM, say, (a figure that
wouldn't surprise me), then there's no way on earth that QI could be
200 times faster in _any_ circumstances.

Phil
--
If a religion is defined to be a system of ideas that contains unprovable
statements, then Godel taught us that mathematics is not only a religion, it
is the only religion that can prove itself to be one. -- John Barrow

Phil Carmody

Nov 20, 2005, 6:04:06 PM
"nightlight" <night...@omegapoint.com> writes:
> > First, do you understand the difference between adaptive order-0 and
> > static order-0 models? Second, how would the "QI" and "AC" in
> > your tests perform if the input consisted of N/2 zeroes followed by N/2
> > ones?
>
> The sequence "Vary" was a sequence similar to that, since the negative
> half of that sequence is mostly 1's and the positive part mostly 0's.

What do you mean by "negative half" and "positive part"?
Applying the concept of sign to sequences of bits is, erm, unconventional.

Phil Carmody

Nov 20, 2005, 6:09:48 PM
Ben Rudiak-Gould <br276d...@cam.ac.uk> writes:
> X-Accept-Language: en-us, en

I'm confused - what on earth is the point of such a header in
news-posting context? And why is a cantabrigian prioritising
en-us above en?!?!?

Send my regards to the L&LL,

nightlight

Nov 20, 2005, 6:35:47 PM
> The question this raises for me is why AC implementations are 200x
> slower than reading from RAM?

The AC has to process and encode each bit, 32 per machine word, for
which QI needs only to establish whether there are any of the "less
frequent symbols" (e.g. 1's) in the word. Even though that AC
implementation uses large loop unrolling and other speed optimizations,
it needs only 8 times more work per bit than QI needs to fetch the
dword and answer "is there only the more frequent symbol in this word"
(e.g. w==0 or w==(-1)) to yield a speed factor of 256. If you look at
their code (binary coder, no contexts):

http://www.cs.mu.oz.au/%7Ealistair/arith_coder/

you will see that they certainly do that much more work per encoded
bit, in fact even more than the ratios show, if you count the
numbers of instructions. But, as you note, the fast processors with
lots of cache mask out some of the speed difference, penalizing more
the new memory reads, which both coders have to do. On an older 400 MHz
Celeron laptop the speed ratios are 1.5-2 times larger in favor of QI
than on the recent VAIO laptop where those figures in the preprint came
from.
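To make the word-level skip concrete, here is a rough sketch
(illustrative only; the prototype's actual scan differs in details such
as handling a final partial word):

#include <stdint.h>

/* Return how many whole 32-bit words at the front of 'in' contain only
   the more probable symbol (0 here); each such word costs a single
   compare for its 32 symbols, with no coding operations at all. */
static unsigned skip_mps_words(const uint32_t *in, unsigned nwords)
{
    unsigned w = 0;
    while (w < nwords && in[w] == 0)
        ++w;
    return w;   /* in[w] is the first word holding a less probable symbol */
}

An AC, by contrast, has to run its per-symbol coding step for every one
of those 32 bits.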

nightlight

Nov 20, 2005, 8:19:24 PM
>> The sequence "Vary" was a sequence similar to that, since the negative
>> half of that sequence is mostly 1's and the positive part mostly 0's.
>
> What do you mean by "negative half" and "positive part"?
> Applying the concept of sign to sequences of bits is, erm, unconventional.

A negative 32-bit integer, say -1, has 32 1's. The idea was to give
both coders (which were order-0 binary coders, thus they view the data
as a sequence of uncorrelated bits) some simple sequence which has a
regularity invisible at order 0 (but visible to some higher order
coder). It was meant to illustrate the robustness of the universal way
of coding of the EC/QI coder and the fragility of the
predictive/adaptive way of the AC coder.

The origin of the greater robustness of the "descriptive" way of
modeling is in the fact that the "surprise" affects chiefly
the count component of the output, which for the 1K input block is on
average a 10-bit number (e.g. specifying the # of 1's in the block).
The rest is a pure combinatorial index, giving the shortest possible
description, absent any further knowledge by the coder, of how that
number of bits is placed in that block. With the adaptive AC, each
coding step and thus the entire output is affected by the incorrectly
conjectured probabilities. Thus it produces as much as twice the
output size of QI.

Obviously, AC can be made to code the 'universal'/descriptive way,
in which case the error would be much smaller (similar to the rest of
the tables for the matching density of 1's and input size, i.e. a few
percent extra output), while the speed ratio would still remain
dramatically in QI's favor.

The example was meant to help demythologize a bit the
'adaptive/predictive coding mystique' that lots of people are
apparently mesmerized by (as the previous discussion in this thread
illustrates). The 'predictive coding', at any modeling order, is a
handicap (a historical fluke of bending the entire modeling engine to
the peculiarities of the particular interpretation of the AC
computational algorithm, the perceived need to have probability of next
symbol to perform the encoding), not a strength as often touted, and a
'descriptive'/universal type algorithm at the same order of modeling
will always come out ahead.

When AC is made to code the universal/descriptive way, the
"probabilities" used for the computation are simply interpreted as
normalized plain counts (instead of the presumed infinite-n limits of
such ratios, the 'probabilities', which is the adaptive AC
interpretation of the AC algorithm and which contains a much bigger
implicit assumption about the infinite rest of the sequence, one that
in the case of the array "Vary" takes it onto a catastrophically wrong
track). But in this universal mode, AC uses the plain normalized counts
in exactly the same way as the counts of EC/QI, slightly less
accurately (with 1/2 log(n) excess bits in the AC output, for a
sequence of length n) and encoded much more slowly. Rissanen's 1976
coder (which was the first finite precision arithmetic coder) was
coding this way. See the reference:

5. J. Rissanen Generalised Kraft inequality and arithmetic coding
IBM J. Res. Dev. 20, 198-203, 1976
http://www.research.ibm.com/journal/rd/203/ibmrd2003B.pdf

where the quantities Phi (in eq. 5) are merely the approximate binomial
coefficients of regular EC (see footnote on page 2 in the preprint and
ref [23] for details). Although Rissanen suggests in that paper a table
based faster coding alternative (which allows skipping of the most
frequent symbol as in QI), for the AC one needs different tables of
O(n^2) size for each source probability, while QI needs only a single
table of O(n^2), the quantized binomials C(n,k), which is valid for all
quasi-stationary sources and all probabilities.

That's why the AC algorithm was a doubly partial solution of the
precision problem of the exact enumeration -- it not only produced an
approximate (and larger) enumerative index, but it also left unsolved
the O(n^3) table size problem of the exact enumeration (which it then
worked around by trading the table size for quite a few extra CPU
cycles, as illustrated in the test chart). QI solves both problems
-- it takes the factor O(n) out of the exact EC computation speed, as
did AC, and it takes the factor O(n) out of the table sizes, which AC
missed (thus QI doesn't need to trade off coding speed to reduce
the table sizes as AC does; although QI can do that kind of tradeoff
too, cf. page 9, [N2] in the preprint, but from a much better starting
point than AC). Rissanen was pleasantly surprised to see that "great
difficulty" (see the preprint, pages 1-2, for citations) finally solved
optimally and commented on the solution "Very clever!"

Malcolm Taylor

Nov 20, 2005, 9:55:00 PM
How about comparing QI with a faster and more efficient binary
arithmetic coder such as the fpaq one (see
http://www.cs.fit.edu/~mmahoney/compression/ for more info).

Malcolm

David A. Scott

Nov 21, 2005, 12:58:28 AM
"nightlight" <night...@omegapoint.com> wrote in
news:1132514981.0...@g44g2000cwa.googlegroups.com:

>
> That was a test of the same order-0 coders. At any order, you can
> surprise the "predictive" scheme catastrophically (unless it cheats via
> some ad hoc escape mechanism, allowing it to temporarily use the
> universal way of coding), while you can't do so to a
> descriptive/universal scheme.
>
>

Look, is your so-called coder able to actually compress some files?
Do you have test files to test the order-0 coding, so one can compare
it to real-world order-0 encoders? Without real examples one can test,
I don't believe you.


David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **

Disclaimer:I am in no way responsible for any of the statements

David A. Scott

Nov 21, 2005, 1:05:44 AM
Phil Carmody <thefatphi...@yahoo.co.uk> wrote in
news:87wtj2n...@megaspaz.fatphil.org:

> "nightlight" <night...@omegapoint.com> writes:
>> > First, do you understand the difference between adaptive order-0
>> > and static order-0 models? Second, how would the "QI" and "AC" in
>> > your tests perform if the input consisted of N/2 zeroes followed by
>> > N/2 ones?
>>
>> The sequence "Vary" was a sequence similar to that, since the
>> negative half of that sequence is mostly 1's and the positive part
>> mostly 0's.
>
> What do you mean by "negative half" and "positive part"?
> Applying the concept of sign to sequences of bits is, erm,
> unconventional.
>
> Phil

I guess that means that if you're given, say, a file with one
thousand bytes of zeros followed by one thousand bytes of all
ones (0xFF), he has no idea how his compressor or some arbitrary
arithmetic coder would compress it. It really should not be that hard
to test. But maybe he can't test it with his method yet.

Willem

Nov 21, 2005, 4:45:20 AM
nightlight wrote:
) The negative 32 bit integer, say -1 has 32 1's. The idea was to give
) both coders (which were order 0 binary coders, thus they view data as
) sequence of uncorrelated bits) some simple sequence which has
) regularity invisible to order-0 (but visible to some higher order
) coder). It was meant to illustrate the robustness of the universal way
) of coding of the EC/QI coder and the fragility of the
) predictive/adaptive way of the AC coder.

If an adaptive order-0 AC coder performs badly on a sequence that has
varying statistics, then you wrote a shitty adaptive model.
A good adaptive model views symbols that are nearer with more weight.

It's no wonder that badly written code is outperformed.


SaSW, Willem
--

Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be

nightlight

Nov 21, 2005, 4:49:22 AM
> How about comparing QI with a faster and more efficient binary
> arithmetic coder such as the fpaq one (see
> http://www.cs.fit.edu/~mmahoney/compression/ for more info).

QI is a fundamental algorithm, not a program (although several research
variants of the algorithm are implemented). If your algorithm needs to
do more work than about 3 adds, 1 shift and two 1-dim array reads per
less probable symbol, and any operation at all (other than traversing
the array to examine what the symbols are) per more probable symbol,
then it is an inherently slower algorithm than QI. Any arithmetic coder
falls in this category, since a general-purpose AC needs to encode 0's
and 1's explicitly. Check the simplicity of the generic EC/QI coding
loop (from p. 8 in [23]):

23. TR05-0625 "Quantized indexing: Background information"
http://www.1stworks.com/ref/TR/tr05-0625a.pdf

I = 0;                       // init cumulative path index
for (k = n = 0; n < M; ++n)  // M is the number of input bits
{                            //   or a block size
    if (getbit(n) == 1)      // get bit at 0-based offset n
    {                        // skip on bit=0, add on bit=1
        ++k;                 // increment count of 1's found
        I = I + C(n,k);      // add binomial to the path index
    }
}                            // k is now the final count of 1's


The encoding step of the less frequent symbol (bit=1 by convention; for
the more frequent symbol only the outer loop executes) is simply an
add, I=I+C(n,k);, of a quantized binomial coefficient C(n,k), which is
a 32 bit integer (taken out of a 1-dim array as (row[n])[k], with
another array giving the row n subarray), to a large integer "I" (which
is the size of the entire output). The 32 bit int C(n,k) is always
added to the leading 32 bits of "I" (thus at most 1 carry can occur
into the leading 0's of "I"). The symbolic getbit(n) gets the next
unprocessed input bit, which is just a shift of a 32 bit word holding
the next set of 32 input bits. That is really as simple and as fast as
entropy coding can ever get. The reason for such extreme simplicity is
that the real calculation, the one of the quantized addends C(n,k), was
done outside of the coding loop.
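For concreteness, the big-integer add can be sketched like this (an
illustration only, not necessarily the prototype's routine; the limb
layout is an assumption):

#include <stdint.h>

/* Add a 32-bit addend C(n,k) into the current top limb of the index I,
   stored as 32-bit limbs, least significant limb first.  Because the
   limbs above 'top' are leading zeroes, at most one carry can ripple
   upward. */
static void add_to_index(uint32_t *I, unsigned top, uint32_t addend)
{
    uint64_t s = (uint64_t)I[top] + addend;
    I[top] = (uint32_t)s;
    while (s >> 32) {                /* propagate the (at most one) carry */
        s = (uint64_t)I[++top] + 1;
        I[top] = (uint32_t)s;
    }
}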

That's the basic beauty of EC/QI's much cleaner division of labor
than that of AC (see [23] pages 19-26, 30-32), where each addend is
inseparably entangled with the source instance properties (the
particular symbol probabilities at that point), hence the addends can't
be precomputed generically/universally for all sources and taken
outside of the AC coding loop. That gives QI a huge and fundamental
algorithmic edge which can't be compensated for by any special
programming tricks available uniquely to AC.

Note also that for the test we had stripped the Moffat et al. V3 binary
coder to its bare engine, replaced stream i/o with memory-to-memory
coding, did no allocations within the timed loop, took the model
decisions out (since only order 0 was tested) and hardwired as static
all dynamical coding options (alternatives that didn't matter), to
avoid wasting time in the test. So the arithmetic coder tested was
already quite a bit faster than the out-of-the-box Moffat et al. code.
We wanted it to take its best shot, the best it possibly could (since
QI didn't need any such implementation/code-level edge, being so much
more efficient at the fundamental, algorithmic level).

Any other arithmetic coder which improves on the Moffat coder in these
kinds of simple, common-sense but otherwise trivial ways (as
fpaq0.cpp seems to do), without providing any essential improvement
at the algorithmic level, will do roughly the same as the chart showed
for the streamlined Moffat coder.

Willem

Nov 21, 2005, 4:52:27 AM
nightlight wrote:
)> How about comparing QI with a faster and more efficient binary
)> arithmetic coder such as the fpaq one (see
)> http://www.cs.fit.edu/~mmahoney/compression/ for more info).
)
) QI is a fundamental algorithm, not a program (although several research
) variants of the algorithm are implemented). If your algorithm needs to
) do more work than about 3 adds, 1 shift and two 1-dim array reads per

How large are these arrays ?


SaSW, Willem
--

Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be

nightlight

Nov 21, 2005, 6:30:36 AM
> If an adaptive order-0 AC coder performs badly on a sequence
> that has varying statistics, then you wrote a shitty adaptive
> model.
> A good adaptive model views symbols that are nearer with more weight.
>

Making AC adapt faster to more variable inputs costs extra
redundancy on stationary inputs (which don't need the waste in code
space equivalent to an explicitly reserved escape code used to
change the statistics on the fly). You can't have it both ways, optimal
for a stationary or slowly varying source yet adapting fast to a
dynamic source, while coding within the adaptive AC scheme.

Faster AC adaptation implies fewer significant bits in the
probabilities p(a). You will have roughly 1/2 log(n) bits in p(a) if
you count exactly, with full weight, the n past bits, i.e. p(a) carries
only about half as many significant bits as the bit length of n. Any
forgetting of the older statistics (be it via a sharp cutoff or some
decaying law) simply shortens the effective n, and thus the precision
of p(a). The fast-changing AC thus comes down to a similar
approximation to the one quasi-arithmetic coders make when they
approximate p(a) with numbers which have only a few bits set, so they
can scale the AC interval faster via a few shifts and adds instead of
multiplies. Here it would make AC more robust to sudden drastic
changes of statistics, but at the cost of an extra 5-10 percentage
points of compression loss on stationary inputs.

Thus, this is a matter of a tradeoff you need to select. In the extreme
preference for robustness to input dynamics, you can easily
have AC code the 'universal'/descriptive way instead of the usual
adaptive/predictive way, i.e. make it code exactly as QI does, which
is extremely robust to statistical variability and which with QI is
optimal, while AC coding this way will still produce an excess of
O(log(n)) output bits (where n is the input size or indexing block
size; the regular adaptive AC has this fundamental excess vs QI as
well, cf. the preprint footnote on page 2). The O(log(n)) bits of AC
output excess become pretty significant compared to the output size if
the source entropy is low (see the top rows in the earlier chart) or if
n is small (which may be required for very dynamic sources, in order to
segment the input into quasi-static sections), as illustrated by the
leftmost column in the test chart.

Moffat et al. 1998, which is the AC we used, explain their particular
tradeoff in their paper (hyperlinked earlier). You can tell them what
you think of their order-0 binary coder if you think you have a case.
I don't think you do. The problem is inherent in the adaptive/
predictive coding scheme -- they could choose to look OK in the last
row of the chart, while looking much worse in the top rows (when just
a few bits are set), or they could choose what they did, look bad in
the last row and look OK in the top rows. On the other hand, there is
no input on which they would look OK while QI would look bad (not
just at order 0, but at any source order, provided both coders code
with the same order model, obviously).

It is a trivial observation (which you nevertheless keep bringing
up in different wordings) that you can make the two coders use an
inadequate order model for some source and have both look very bad on
such a source, compared to some other coder using a higher order model.
As long as you compare apples to apples and oranges to oranges, thus
letting them use the same order model (adequate or inadequate to the
order of the source), the adaptive AC will always do worse than QI, and
very much so for the very unpredictable sources. That is the
fundamental weakness or fragility of any predictive scheme when faced
with inputs that are unpredictable (for the model order that coder has).

The universal/descriptive schemes don't have such a fatal fragility,
which was the whole point of Davisson's minimax universal codes
(constructed using enumerative coding) -- they may code slightly
suboptimally when given an inadequate order model for a given source,
but their worst case is never farther than about O(log(n)/n) bits per
symbol from the output of the best specialized coder tuned
specifically to the particular source. (And when they do know or model
the proper source parameters, EC/QI code optimally.)

The point of the array "Vary" in the test was to illustrate how an
adaptive AC, when tuned to perform nearly optimally on stationary
inputs, and thus to appear fairly competitive with QI on such inputs
(with no more than about 6-7% worse output in the rest of the chart),
is very fragile and breaks down catastrophically if the statistics
suddenly change. The array "Vary" has only one drastic change of
statistics, in the very middle of the array where -1 goes to 0, hence
the negative effects on AC decrease as the array size increases. Had
the statistics changed again in the longer arrays, such as the 128K
input, the AC performance would have been reset down to the worse
performance figures from the shorter arrays.

nightlight

Nov 21, 2005, 7:10:23 AM
>> two 1-dim array reads, per
>
> How large are these arrays ?

In the code used for the test results in the preprint, the two arrays
were n^2/4 of 32 bit entries (just mantissa was stored; exponent was
computed cf. report page 9, [N2]) in the main quantized binomial table
and n pointers in the row pointer array (pointing into the rows of
binomial table). The n used was 1024, hence the big table was 256K
entries i.e. 1 Mbyte, which was about 0.05% of the memory (2G) in the
machine it was tested on. There would have been no visible difference
in the compression ratios for the range of the tests had only 16-bit
mantissas been used by QI, thus the table could have been 1/2
Mbyte. But the research prototype code wasn't really optimized; it
simply kept table space for the maximum mantissa it might ever use and
there was no need to waste time coding separate 16-bit mantissa
routines to save a bit of table memory.
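For illustration, a layout with one pointer per row, so that an addend
is fetched as row[n][k] with two 1-dim array reads, might look like the
sketch below. It builds a plain, exact Pascal triangle; the real tables
store only a g-bit mantissa per entry with a separately computed
exponent, since exact binomials overflow 64 bits long before n = 1024:

#include <stdint.h>
#include <stdlib.h>

/* Build rows 0..n of Pascal's triangle, one heap row each, so the
   coding loop can read an addend with two 1-dim array accesses.
   (No allocation error checks -- this is only a sketch.) */
static uint64_t **build_binomial_rows(unsigned n)
{
    uint64_t **row = malloc((n + 1) * sizeof *row);
    unsigned i, k;
    for (i = 0; i <= n; ++i) {
        row[i] = malloc((i + 1) * sizeof **row);
        row[i][0] = row[i][i] = 1;
        for (k = 1; k < i; ++k)
            row[i][k] = row[i-1][k-1] + row[i-1][k];  /* Pascal's identity */
    }
    return row;
}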

If one were aiming merely to match the AC on compression (and not to be
uniformly ahead), a much smaller table could have been used, e.g.
n=256 needs for the big table only 16K entries of 8 bits each to
produce an output excess below 1 bit per 256-bit input block (the QI
worst case redundancy is below log(e)/2^(g-1) bits/symbol for precision
g). One can also trade off the table size for some execution time (cf.
preprint, p. 9, [N2]) by skipping rows of binomials: e.g. by
skipping every 2nd row of the Pascal triangle, you need an extra add to
compute a missing binomial, hence on average 1/2 add per step
more work, but the big table size drops by half, e.g. to 8K
for the n=256 fixed block coder.
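That row-skipping reconstruction is just Pascal's identity, e.g.
(a sketch; 'row' here holds the stored row n-1, exact rather than
quantized):

#include <stdint.h>

/* Recover C(n,k) from the stored previous row C(n-1, 0..n-1) with one
   extra add:  C(n,k) = C(n-1,k-1) + C(n-1,k). */
static uint64_t binom_from_prev_row(const uint64_t *row, unsigned n,
                                    unsigned k)
{
    uint64_t a = (k > 0 && k <= n) ? row[k - 1] : 0;   /* C(n-1, k-1) */
    uint64_t b = (k < n) ? row[k] : 0;                 /* C(n-1, k)   */
    return a + b;
}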

Similarly, the same tables of log(x!) used for the exponent
computations (see p. 9, [N2]) are accurate enough to give you all but
the lowest 10-11 bits of the 32-bit mantissa for the quantized
binomials, although, again, this kind of saving (to cut down the big
table to 1/3rd) wasn't a great priority. For an embedded production
coder, yes, one would certainly take advantage of such savings. For a
desktop coder, e.g. in a video codec, it probably wouldn't matter much.

Thomas Richter

Nov 21, 2005, 7:17:59 AM
Hi,

> QI is a fundamental algoritm not a program (although several research
> variants of the algoritham are implemented). If your algorithm needs to
> do more work than about 3 adds, 1 shift and two 1-dim array reads, per
> less probably symbol, and any operation at all (other than traversing
> the array to examine what the symbols are) per more probable symbol,
> then it is inherently slower algorithm than QI. Any arithmetic coder
> falls in this category, since a general purpose AC needs to encode 0's
> and 1's explicitly. Check the simplicity of the generic EC/QI coding
> loop (from p. 8 in [23])

This is incorrect. Check the MQ and QM coder family. Typical
implementations have about the same complexity and are fine
implementations of arithmetic coding.

So long,
Thomas

Willem

unread,
Nov 21, 2005, 7:22:30 AM11/21/05
to
nightlight wrote:
)>> two 1-dim array reads, per
)>
)> How large are these arrays ?
)
) In the code used for the test results in the preprint, the two arrays
) were n^2/4 of 32 bit entries (just mantissa was stored; exponent was
) computed cf. report page 9, [N2]) in the main quantized binomial table
) and n pointers in the row pointer array (pointing into the rows of
) binomial table). The n used was 1024, hence the big table was 256K
) entries i.e. 1 Mbyte, which was about 0.05% of the memory (2G) in the
) machine it was tested on. ...

Are you familiar with the concepts of CPU caches ?
A cache miss on most CPUs is easily as costly as a multiply.
On modern CPUs it may get near the cost of a division.

nightlight

unread,
Nov 21, 2005, 8:41:31 AM11/21/05
to
> Are you familiar with the concepts of CPU caches ?
> A cache miss on most CPUs is easily as costly as a multiply.
> On modern CPUs it may get near the cost of a division.

Of course, that's why n=1K was used for the tests. Using larger n made
the active parts of the table (which are roughly the sqrt(n) wide band
around the straight line k = p*n for given probability of 1, p) fall
out of the cache on some machines, reducing the speed ratios against
AC. The 1K seems to be just about the right size for most of today's
PCs, laptops & desktops. On older CPUs, e.g. the 300-400 MHz Celerons
from 6-7 years ago, the cache is small, but the ratio of cache speed to
main RAM speed is small enough that the speed ratios in the chart (QI
vs AC) look even better on such machines, by about a factor of 1.5-2,
even with larger n (e.g. 2K or 4K). With larger n, the
compression is usually better for very low entropy, and it also runs
faster since the inter-block housekeeping may get costly if the blocks
are too small (due to the mixed radix coding used to eliminate bit
fraction losses at the block boundaries; in practice one may ignore
this tiny compression improvement, but it does matter for accuracy
benchmarks).

The production version of QI targeted to some CPU range could easily
optimize the tradeoff by using the right kind of row skipping &
mantissa interpolation, to reduce the table size whenever the cache gain
offsets the extra calculation for the interpolations. Another simple
improvement (which wasn't used in the tests, but which does work well
in another prototype coder) is to use a variable n for different sections
of the table. Normally, for fixed block coding, the table is triangular
(Pascal triangle). But one can get much more bang per KB of tables by
using a V-shaped table, which has a much longer n range near the lattice
axes (addends for sparse densities) and a shorter n near the center of
symmetry of the Pascal triangle (the high entropy limit). In the simple
fixed block coding used in the test, the accuracy was a large overkill
in the high entropy regions, and could easily (with more complicated
code) have been cut to 1/2 or 1/3 without affecting the compression
ratios. You can check the Lawrence and Tjalkens & Willems papers on
this kind of enumerative coding:

12. J. C. Lawrence "A new universal coding scheme for the binary
memoryless source" IEEE Trans. Inform. Theory IT-23 (4), 466-472,
1977
http://citeseer.ist.psu.edu/lawrence77new.html

13. T.J. Tjalkens, F.M.J. Willems "A universal variable-to-fixed
length source code based on Lawrence's algorithm" IEEE Trans.
Inform. Theory IT-38 (2), 247-253, 1992
http://citeseer.ist.psu.edu/585782.html


nightlight

unread,
Nov 21, 2005, 9:05:23 AM11/21/05
to
>> This is incorrect. Check the MQ and QM coder family. Typical
>> implementations have about the same complexity and are fine
>> implementations of arithmetic coding.

It is correct if you restrict the statement to full precision
arithmetic coders, as I did (thus the Moffat et al 1998 coder is the
best of the full AC implementations). The quasi-arithmetic coders which
sacrifice 5-10% of compression for speed are a dime a dozen. The Moffat98
paper shows some benchmarks against the best Q coder implementation
they found, and it runs only 2-3 times faster than their binary coder.
That is still up to 100 times slower than QI, at a great compression
cost in the low entropy range of inputs (it would do OK only
on medium to high entropy inputs, just a point or two below the full
AC).

But even a quasi-AC still needs to either code each symbol, 0 and 1,
or use an O(n^2) size table for each different probability (it
may use just a few such tables, with very coarse quantization of
probabilities) -- you can also check Rissanen's discussion of these two
tradeoffs, already outlined in his 1976 AC paper. An AC coder may
also use a multiplication table of O(n^2) size, and thus do no
multiplications, but it still needs to code each symbol 0 and 1. To
avoid coding each symbol, as QI does, there is no way for AC around having
a different O(n^2) table for each different probability (however many
quantized p() levels it picks).

If you think someone can do it, code via AC only the less frequent
symbol and just skip over the most frequent symbol (without
renormalization) and still use the single O(n^2) table for
all probabilities, post a link to a paper showing that.

Basically, the AC enumerative addends which avoid coding operations on the
more probable symbol are of the form A(n,k) = p^k q^(n-k) (where p=p(1)
and q=1-p = p(0) >= p(1), k=count of 1's, n=number of coded symbols so
far; you get these 1-step addends by expanding the regular AC scaling,
cf. [23] pp. 19-22). They have the probability p hardwired into the
addend. You can have a table of A(n,k) for AC, just as you have the
quantized binomial table C(n,k) for QI, and then each coder will do
exactly the same amount of work per coding step, but the entries in the
AC table work only for the given p & q used to compute that table,
while the entries in the single QI table work for all p & q. The AC is
simply an approximation (of QI), which in the approximation process
(cf. preprint footnote on p.2 ) has mangled the delicate combinatorics
of the binomial coefficients, entangling them with the p & q, and has
lost the addend universality.

When I post the reference implementation of QI on the coder page (in
the next 2-3 months), anyone is welcome to test it against different
kinds of quasi-arithmetic coders. I would be curious to see the
results.

[This may be a duplicate, since I already posted this reply but it didn't
show up somehow.]

Willem

unread,
Nov 21, 2005, 9:44:36 AM11/21/05
to
nightlight wrote:
) ...
) tradeoffs he already outlined in his 1976 AC paper. An AC coder may
) also use a multiplication table of O(n^2) size, thus do no
) multiplications, ...

Are you aware of the fact that a multiplication table is
a very stupid idea on any recent CPU ?

nightlight

unread,
Nov 21, 2005, 10:40:09 AM11/21/05
to
> Are you aware of the fact that a multiplication table is
> a very stupid idea on any recent CPU ?

Depends what you call "any recent CPU". For battery powered, embedded
processors (and there are more of those than desktops; and they're as
"recent"), the absence of multiplications and divisions in a coder is
quite an important element for stretching the battery life. What may
be a "stupid idea" for a desktop, may be a smart idea for a cell phone
or a camera. I don't wish to appear condescending or anything, but
scientific papers are written on this very topic, you know, and there
are people who specialize in how to transform the existent algorithms
so they consume less power. Multiplications & divisions are often the
key targets for elimination in these transformations. Since your
conversational manners in this thread haven't been exactly exemplary, I
will skip the relevant links and let you enlighten yourself on that
subject on your own.

David A. Scott

unread,
Nov 21, 2005, 10:54:22 AM11/21/05
to
"nightlight" <night...@omegapoint.com> wrote in
news:1132578897.5...@g14g2000cwa.googlegroups.com:

>
> If you think someone can do it, code via AC only the less frequent
> symbol and just skip over the most frequent symbol and still use the
> single O(n^2) table for all probabilities, post a link to a paper
> showing that and I will show you where the error in their proof was.
>

How about you stating what you feel a typical binary zero order
based arithmetic coder would code one thousand bytes of all zeros
followed by one thousand bytes of all ones. Actually a good zero
order binary bit compressor would as a limit compress any permutation
of the above bit string to the same length file, give or take one byte.
How well can your coder do? Or is it that you can't do something this
simple?


David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **

Disclaimer:I am in no way responsible for any of the statements

Willem

unread,
Nov 21, 2005, 11:20:53 AM11/21/05
to
nightlight wrote:
) <snip>
) ... Since your
) conversational manners in this thread haven't been exactly exemplary, I
) will skip the relevant links and let you enlighten yourself on that
) subject on your own.

I assume that answering simple questions with several paragraphs of
in-depth analysis of the QI coder, most of which is irrelevant to said
question, is exemplary conversational manner ? I do admit that politicians
do this all the time, but I personally would not take example from them.

nightlight

unread,
Nov 21, 2005, 11:22:28 AM11/21/05
to
> How about you stating what you feel a typical binary zero order
> based arthmetic coder would code one thousand bytes of all zero
> followed by one thousand bytes of all ones.

That question (besides being trivial) was already discussed in this
thread... see the post:

http://groups.google.com/group/sci.math/msg/a656f2937e768fa3

David A. Scott

unread,
Nov 21, 2005, 3:29:26 PM11/21/05
to
"nightlight" <night...@omegapoint.com> wrote in
news:1132590148.6...@g43g2000cwa.googlegroups.com:

Yes, the question was trivial on purpose, to see if you had code
to actually do something. I see by the answer it's not yet quite
there. If it ever gets working and debugged it would be interesting
to see how well it can compress certain files. If you're lucky it
might do better compression for a few files, but then again it
might not. The proof is in the actual results, not in your paper.

nightlight

unread,
Nov 21, 2005, 5:26:25 PM11/21/05
to
> Yes the question was trival on purpose to see if you had
> code to actually do something. I see by the anwser its
> not yet quite there.

Of course there is plenty of code which compresses and
expands correctly arbitrary data. Each of those test points
listed in the preprint has 500 runs over random inputs, with
decoding output compared to input. The last decoding glitch
(where decoder output didn't compare to input) was fixed nearly
a year ago.

The preprint should also be enough for anyone with a bit of math
knowledge to understand the constructive proof of decodability and
the worst case redundancy derivation. The preprint & the algorithm
have already been discussed via emails with a number of highly
competent people in the coding field (including Rissanen, the
inventor of arithmetic coding), and presentations, with a
demo coder and a DLL to keep and play with, were given to
several companies.

It is still a research project, too, in the most interesting areas,
such as the whole field of modeling the universal/descriptive way
(the practical forms of Kolmogorov/MDL approach to modeling).
There is also a video codec being developed, using the full power
of QI, not just the coding but the more powerful new modeling. At
the moment these two projects have higher priority than the
publicizing of the algorithm or releasing the reference source
(which are just sideway offshoots from the main projects, as
time allows).

> if it ever gets working and debuged it would be interesting
> to see how well it can compress certain files.

The reference implementation, which performs at least as well as
the test figures show for the (unoptimized research) prototype code,
will be released within the next 2-3 months on the QI page.

David A. Scott

unread,
Nov 22, 2005, 1:03:59 AM11/22/05
to
"nightlight" <night...@omegapoint.com> wrote in
news:1132611985.5...@z14g2000cwz.googlegroups.com:

>> if it ever gets working and debuged it would be interesting
>> to see how well it can compress certain files.
>
> The reference implementation, which performs at least as well as
> the test figures show for the (unoptimized research) prototype code,
> will be released within the next 2-3 months on the QI page.
>
>

In your paper you talk about a test input string of
00101001, since you seemed to pick it for an example.
What would this string compress to? That is, if it's not
too hard to solve. I am trying to see what you get.

nightlight

unread,
Nov 22, 2005, 3:57:33 AM11/22/05
to
> In your paper you talk about a test input string of
> 00101001 since you seemed to pick it for an example.
> What would this string compress to. That is if its not
> to hard solve. I am trying to see what you get.

The answer is on page 6, where it shows the string index
I(S8)=2+6+35=43 and how to calculate it (eq. 17). The actual size in
bits of the index is L=log(56)=5.807... bits since the valid values of
the index are 0..55 (there are 56 paths from A to B). The fractional
bits of L don't normally go to waste since they are coded in the mixed
radix with other items sent, usually with the fractional bits of
indices for other blocks (cf. p.9 [N4]). The encoder also sends the
count of 1's per block, which is 3 here, and the length of the entire
string, which is 8. The latter two items get coded in a variety of ways in
different variants and different parts of the coder/s and experimental
prototypes (cf. p.9 [N6]).
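
For anyone who wants to reproduce that number, here is a tiny added sketch
(using exact small binomials rather than the quantized SWI tables) of the
plain enumerative ranking being described, which yields I = 2 + 6 + 35 = 43
out of C(8,3) = 56 for S8 = 00101001:

#include <stdio.h>
#include <math.h>

static unsigned long C(unsigned n, unsigned k)   /* exact small binomial */
{
    unsigned long r = 1;
    if (k > n) return 0;
    for (unsigned i = 1; i <= k; ++i)
        r = r * (n - k + i) / i;                 /* stays integral */
    return r;
}

int main(void)
{
    const char *S = "00101001";
    unsigned long I = 0;
    unsigned k = 0, n;
    for (n = 0; S[n]; ++n)
        if (S[n] == '1') { ++k; I += C(n, k); }  /* add C(n,k) on each 1-bit */
    printf("index I = %lu of %lu, log2 = %.3f bits\n",
           I, C(n, k), log2((double)C(n, k)));
    return 0;
}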

David A. Scott

unread,
Nov 22, 2005, 9:37:06 AM11/22/05
to
"nightlight" <night...@omegapoint.com> wrote in
news:1132649853....@g14g2000cwa.googlegroups.com:

Again, that is not a complete answer!!!
I see the 43, so you are claiming that along with a 3 for the number of ones
and along with a length indicator 8 you could encode this
complete string. But given the task to write this out as a complete
compression, how would you do it? Do you write 3 universal numbers, such
as 011 for 3 and 0001000 for the 8 and 00000101011 for 43?
This is already longer than the string you started with.
A good arithmetic zero order string compressor easily beats this.
Could you at least finish your example to show how wonderful
your method is, or is the string you chose too hard to deal with?

Sachin Garg

unread,
Nov 22, 2005, 10:30:51 AM11/22/05
to

I feel the bijective winds flowing :-)

Are you not trying to twist this discussion towards how his scheme
should be bijective to save these few more bits?


Sachin Garg [India]
http://www.sachingarg.com

David A. Scott

unread,
Nov 22, 2005, 10:46:35 AM11/22/05
to
"Sachin Garg" <sch...@gmail.com> wrote in
news:1132673451.5...@f14g2000cwb.googlegroups.com:

Why would I do that? He claims it's already optimal ;-)
And he had experts look at it ;-)

>
>
> Sachin Garg [India]
> http://www.sachingarg.com

Hope things have been well with you. I have been sick a lot this
year; hope the next is better.

>
>

The guy stated it was better than arithmetic; I am trying to see
how it would compress even his own example. Getting it to 3
universal codes is not proof of anything. Besides
00101001, it should compress any string of 8 bits with 3 ones to
roughly the same size. Not sure it even does that. The point is
a properly coded arithmetic coder is hard to beat; there are no gaps or
unused codings. Not sure he wants to, or even can, completely code a
simple example that was of his own choosing.

nightlight

unread,
Nov 22, 2005, 12:21:06 PM11/22/05
to
> A good arhtimetic zero order string compress easly beats this.

Not really. The Moffat98 coder outputs for that string 13 bits (in
"frugal bits" mode, hardwirded binary coder, no contexts) which is the
best it can do. One can, of course, rig an arithmetic coder to produce
fewer bits for such very short inputs, but that will cost it on longer
inputs. (So you don't need to feel compelled to tell me about some
coder xyz which puts out 8 bits for S8 or some such.)

> Could you at least finish your example to show how wonderful
> your method is or is the string you choose to hard to deal with?

That has no more relation to "my method" than the variable names I used
in the code. Even the index 43 is in the domain of conventional
enumerative coding (which has been known since at least the 19th century), thus
it has nothing to do with QI. A variant of QI which uses Fenwick's PE
(Punctured Elias) codes to encode the total input length and tapered
Huffman codes for the counts of 1's outputs 1+3+6=10 bits. The first few
PE codes

[ see: http://citeseer.ist.psu.edu/fenwick96punctured.html ]

are listed below with their mapping to the total count N of input bytes
(this is the same input granularity, 1 byte, as used by the Moffat98
coder):
--------------------------
N Code
-------------------------
1 = 0
2 = 101
3 = 1001
4 = 11011
5 = 10001 ... etc.
------------------------

In this case N=1, and the code for the input byte count is a single
bit: 0. The count of 1's is coded as a tapered Huffman code (which
encodes a number k from 0..8 in two sets: 0..6 in 3 bits and 7, 8 in 4
bits), thus it uses 3 bits for k=3. Hence the total output is 1+3+6
bits. (There are, of course, many trivial ways to encode this S8 in
fewer bits, down to a single bit code.) In any case, none of this time
and bandwidth squandering chicken feed is "my method".
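
A quick arithmetic check of that 1+3+6 = 10 bit total (an added sketch; the
code lengths are the ones described above, with the index counted as whole
bits when its fraction isn't packed with anything else):

#include <stdio.h>
#include <math.h>

int main(void)
{
    int len_N = 1;                        /* PE code "0" for byte count N=1 */
    int k = 3;
    int len_k = (k <= 6) ? 3 : 4;         /* tapered Huffman: 0..6 -> 3 bits,
                                             7,8 -> 4 bits (Kraft 7/8+2/16=1) */
    int len_I = (int)ceil(log2(56.0));    /* index range 0..55 -> 6 whole bits */
    printf("total = %d + %d + %d = %d bits\n",
           len_N, len_k, len_I, len_N + len_k + len_I);
    return 0;
}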

Sachin Garg

unread,
Nov 22, 2005, 4:35:10 PM11/22/05
to

Best Wishes.
I hope it wasn't anything serious.

> The guy stated it was better than arithmetic I am trying to see
> how it would compress even his own example. Getting it to 3
> universal codes is not proof of anything. Besides
> 00101001 it should compress any string of 8 bits with 3 ones to
> roughly the same size. Not sure it even does that. The point is
> a properly coded arithmetic is hard to beat there are no gaps or
> unsed codings. Not sure he wants to or even can completely code a
> simple example that was based on his on choosing.

As far as compression ratios are concerned, with enough word-size AC
produces *very* close to the theoretical SUM_i(log2(1/P_i)) bits, which is
the best expected. I haven't yet gone into the details of what differences
it causes to how the model is managed or updated; there might be something
to lose or gain there.

But if this new scheme can speed things up as claimed, it will of course
be good. We will know when we see code; discussions never really end
:-)

I have made a post on this at http://www.c10n.info/archives/251
There are some more answers there.

nightlight

unread,
Nov 22, 2005, 5:47:08 PM11/22/05
to
> As far as compression ratios are concerned, with enough word-size
> AC produces *very* close to theoritical SUMi(log2(1/Pi)) bits,

The redundancy of AC is roughly: R = r/n + 1/2^r bits/symbol, see
[27] p. 51:

27. J. Rissanen "Arithmetic codings as number representations"
Acta Polyt. Scand., Math. & Comp. Sc. Vol. 31 pp 44-51, 1979
http://www.1stworks.com/ref/ariNumRepr.pdf

Thus increasing the coder precision r (in bits) decreases the 2nd term
but increases the 1st term (the overhead of termination). The optimum r
is around log(n), where n is the number of symbols in the input, which
gives an optimum redundancy of R = log(n)/n + 1/n.

The EC/QI redundancy is 1/2 log(n)/n + 1/n (which is also the
theoretical lower bound, i.e. no coder can do better than that), thus
the QI output is about 1/2 log(n) bits smaller than the AC output at the
same precision r. This is roughly the AC output excess which shows up
in our tests and which is mostly visible in the top rows, where the
total output is small (low entropy limit). The row "Vary" is an
entirely different effect -- the fragility of the predictive AC
approach when trying to predict what can't be predicted in order-0 vs.
the robustness of the descriptive EC/QI approach (in the same order-0
modeling).
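
Plugging a few block sizes into the formulas above gives a feel for the gap
(an added sketch, nothing more: it evaluates R = r/n + 1/2^r with r = log2(n)
for AC and 0.5*log2(n)/n + 1/n for EC/QI):

#include <stdio.h>
#include <math.h>

int main(void)
{
    int sizes[] = { 256, 1024, 16384, 1 << 20 };
    for (int i = 0; i < 4; ++i) {
        double n   = sizes[i];
        double r   = log2(n);                  /* near-optimal AC precision */
        double Rac = r / n + 1.0 / pow(2.0, r);
        double Rqi = 0.5 * log2(n) / n + 1.0 / n;
        printf("n=%8.0f  AC: %.6f  QI: %.6f  gap: %5.2f bits total\n",
               n, Rac, Rqi, (Rac - Rqi) * n);  /* gap = 0.5*log2(n) bits */
    }
    return 0;
}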

nightlight

unread,
Nov 23, 2005, 12:52:50 AM11/23/05
to
> K log (K+L)/K + L log (K+L)/L = log ((K+L)? / (K? L?))
>
> bits, where I write N? to mean N^N.
>
> Arithmetic coding is asymptotically as efficient as enumerative coding,
> since O(N?) = O(N!), but enumerative coding will always use fewer bits,
> since N! < N? for interesting values of N.

Roughly correct, except that it is not as bad as the difference N^N vs
N! ~ (N/e)^N, since the factor e in N! cancels out in the binomial
coefficients. The distinction left is the Stirling approximation itself
and the subsequent dropping of the combined sqrt() factors and the O(1+1/n)
factor, both smaller than 1, thus yielding a larger addend than the exact
EC/QI addend (and the final path count), with an excess of about 1/2
log(Pi N pq) bits (p and q are the probabilities of 1 and 0). You can see
exactly how these approximations work out and how to get AC by
approximating EC/QI in [23] pp. 19-26. Note that these AC
approximations are _before_ any precision reductions. The precision
reduction itself also works tighter with QI (because it is done bottom
up instead of top down as in AC), yielding 1/2^(r-1) which is half of
the corresponding redundancy for AC 1/2^(r-2).

The additional redundancy for AC (small when compared to the input size,
as is usually done, but possibly large when compared to the output size,
which is the relevant comparison, the one that matters) is only one among
multiple adverse side-effects of the AC approximation (cf. preprint,
footnotes on p. 2, and [23]).
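
As a numeric illustration of that excess (an added check: the standard
Stirling estimate puts the gap between the ideal order-0 AC output n*H(k/n)
and the exact enumerative log2 C(n,k) at about 0.5*log2(2*pi*n*p*q), in line
with the "about 1/2 log(Pi N pq)" figure above):

#include <stdio.h>
#include <math.h>

#define PI 3.14159265358979323846

static double lgC(int n, int k)                /* log2 C(n,k) via lgamma */
{
    return (lgamma(n + 1.0) - lgamma(k + 1.0) - lgamma(n - k + 1.0)) / log(2.0);
}

int main(void)
{
    int n = 1024, k = 256;
    double p = (double)k / n, q = 1.0 - p;
    double H = -p * log2(p) - q * log2(q);     /* order-0 entropy, bits/symbol */
    printf("n*H(k/n)    = %.2f bits\n", n * H);
    printf("log2 C(n,k) = %.2f bits\n", lgC(n, k));
    printf("difference  = %.2f bits (Stirling est. %.2f)\n",
           n * H - lgC(n, k), 0.5 * log2(2 * PI * n * p * q));
    return 0;
}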

David A. Scott

unread,
Nov 23, 2005, 12:54:22 AM11/23/05
to
"nightlight" <night...@omegapoint.com> wrote in
news:1132680066....@f14g2000cwb.googlegroups.com:

>> A good arhtimetic zero order string compress easly beats this.
>
> Not really. The Moffat98 coder outputs for that string 13 bits (in
> "frugal bits" mode, hardwirded binary coder, no contexts) which is the
> best it can do. One can, of course, rig an arithmetic coder to produce
> fewer bits for such very short inputs, but that will cost it on longer
> inputs. (So you don't need to feel compelled to tell me about some
> coder xyz which puts out 8 bits for S8 or some such.)
>
>

Well, the Moffat98 coder, which is most likely not optimised for use
in compressing strings, may not be a good source for the estimate of
13 bits. So you're wrong once again: a good zero order string compressor
can easily beat this. Using one that is really more for crypto than
for speed, one gets
00101001 mapping to 1100011100, which is 8 bits to 10 bits; still,
10 bits is significantly less than 13.
If you use a string of 8 with fewer ones, such as
00101000, it maps to 110010001, which is 9 bits;
if you drop down to 1 one, such as
00100000, it maps to 1100111, which gets us finally to 7 bits;
and if you change all the ones to zeros you get
00000000 to 1000, 4 bits.
This is not a rigged compressor; it is one based on 64 bit registers.
I think you lack an understanding of how arithmetic coding can really
work. But then I guess that was obvious in your first post when you said
"The new algorithm, even the exploratory unoptimized
prototype runs much faster (see the table below, copied
from the of paper, with speedup factors over 200) and
always compresses better than the best of arithmetic
coders"
It clearly does not always compress better than the
best. But I guess you have made up your mind and don't
want the facts to get in your way. I will look again if
you ever get a working model in the next few years.

nightlight

unread,
Nov 23, 2005, 2:22:11 AM11/23/05
to
> So your wrong once again a good zero order string compressor
> can easily beat this. Using one that is really more for crypto than
> for speed one gets 00101001 mapping to 1100011100 which is 8
> bits to 10 bits still 10 bits is significantly less than 13.

You're welcome to take Moffat98 code (use frugal bits and F=27 or 26
and make sure to extract their raw bit count of the output, not the
byte rounded figure) at:

http://www.cs.mu.oz.au/~alistair/arith_coder/

and report results for a bit wider sample of input sizes than the 1
instance of 1 byte input you seem to be focused on. Take the same sizes
& densities used in the test chart QI vs AC, use 500 random samples or
so per point, and report how well your coder does then. Taking 1
value of 1 byte as your compression sample is, to be frank, ridiculous
as a way to compare the compression algorithms (or to waste readers'
time or my time on such vacuous discussion).

> But then I guess that was obvious in you first post when you said
> "The new algorithm, even the exploratory unoptimized
> prototype runs much faster (see the table below, copied
> from the of paper, with speedup factors over 200) and
> always compresses better than the best of arithmetic
> coders"
> It clearly does not always compress better than the
> best.

The AC is an approximation of EC/QI, and it approximates it by uniformly
increasing the sizes of the coding addends (see the last two messages above
& cited refs). Hence, when all else is set the same (such as arithmetic
precision & model/side info available to the coders, encodings of data
sizes etc.), i.e. when apples are compared to apples, QI will always
produce smaller or at worst equal output to AC (where the "equal" case
occurs almost exclusively when one rounds the output to a whole
number of bits, in which case both round up to the same number of whole
bits).

This is not a matter of tests or benchmarks but of mathematical
properties of the two algorithms that are shown in the preprint (and in
the cited tech reports). The test charts were put in the preprint to
illustrate the point, not to prove it. If you wish to disprove
something there, you'll need to read and grasp that bit of elementary
math (which my 7th grade daughter was able to understand; well, she did
get a clean 800 on the math SAT she took a year after that).

If you think your coder is better than Moffat98 AC, show it. Present
your chart with the wider (than the ridiculous 1 instance of 1 byte)
input sample, at least the sizes & densities I listed in my chart, and
make sure _all is counted_ the same way as in the Moffat98 test, i.e. no size
or count information is passed to your coder without counting it
within your total output size -- all such 'side information' you may be
passing to your decoder has to be accounted for. The test loop simply
calls the expand loop with no sizes (expanded or compressed) given, but
with just a pointer to the compressed data (hence the data sizes
encoding must be a part of the compressed package & counted in the
compressed size to compare to Moffat98 AC).

Ignorant

unread,
Nov 23, 2005, 5:25:55 AM11/23/05
to

Last heard, Intel is looking for terribly resource-wasting AND ALSO
hungry, CPU-intensive algorithms for their next generation dies. They
surely will be interested in this new algorithm. Get someone gainfully
employed with INTEL to be interested in this %^&* new method.

David A. Scott

unread,
Nov 23, 2005, 9:24:47 AM11/23/05
to
"nightlight" <night...@omegapoint.com> wrote in
news:1132730530.9...@z14g2000cwz.googlegroups.com:

>> So your wrong once again a good zero order string compressor
>> can easily beat this. Using one that is really more for crypto than
>> for speed one gets 00101001 mapping to 1100011100 which is 8
>> bits to 10 bits still 10 bits is significantly less than 13.
>
> You're welcome to take Moffat98 code (use frugal bits and F=27 or 26
> and make sure to extract their raw bit count of the output, not the
> byte rounded figure) at:
>
> http://www.cs.mu.oz.au/~alistair/arith_coder/
>
> and report results for a bit wider sample of input sizes than the 1
> instance of 1 byte input you seem to be focused on. Take the same sizes
> & densities used in the test chart QI vs AC, use 500 random samples or
> so per point, and report how well does your coder do then. Taking 1
> value of 1 byte as your compression sample is, to be frank, ridiculous
> as a way to compare the compression algorithms (or to waste readers'
> time or my time on such vacuous discussion).

Frankly, taking your word on anything seems a bit of a waste of time.
I used YOUR EXAMPLE. It was for a binary 2 state coder. I showed what
a 2 state arithmetic coder could do to YOUR STRING even though it was
not designed for it. It did 23% better than what you stated the Moffat98
code would do. I am not going to write some crap for an article; any
idiot can do that if they get the herd to blindly approve of it.

I also showed what happened when the ones were replaced by zeros:
the output string got smaller. Again, it was YOUR EXAMPLE.

Secondly, the method's not for strings limited to 8 bits as you imply;
it works for any number of bits, from one to whatever bit size you can
count to.

What you fail to see is that this method is a complete String Space mapping
that is bijective: every String has a compressed 2 state arithmetic string
replacement and vice versa. Since it's a total mapping of the String Space,
every String is used. And every String can be thought of either as a
compressed string or not as a compressed string. Your methods, if you ever
get code for them (and that's a big if), can never compress every string
better. The best you could hope for, if you get your method to work, is that
it might lead to a different bijective mapping of the String Space to the
String Space. As far as I know the Moffat code was never intended to be
fully bijective, so without extreme modification it would not easily lead to
a full bijective mapping of the complete String Space to String Space, so I
see why you might be blinded into thinking otherwise; it happens all the
time. Good luck on attempting to get your method working, but my personal
feeling is that it will not replace arithmetic coding unless you can show
real results; hand waving will not do it.

If you use another sample of "one"/"zero" strings I could post back
the compressed string results, or I could give the code to Mark or Sachin;
then they could run hundreds of cases that are hundreds or even thousands
of bits long -- they don't have to be multiples of 8. That way you don't have
to worry about special modes for special cases.

Ben Rudiak-Gould

unread,
Nov 23, 2005, 10:49:29 AM11/23/05
to
Phil Carmody wrote:
> I'm confused - what on earth is the point of such a header in
> news-posting context?

I dunno. Ask Thunderbird. Just be glad it doesn't look like this one:

<http://groups.google.com/group/linux.debian.user/msg/00c02868c4898642?dmode=source>

> And why is a cantabrigian prioritising en-us above en?!?!?

That's what my laptop speaks. Like me, it's studying abroad.

-- Ben

Phil Carmody

unread,
Nov 23, 2005, 7:09:41 PM11/23/05
to
Willem <wil...@stack.nl> writes:
> Are you familiar with the concepts of CPU caches ?
> A cache miss on most CPUs is easily as costly as a multiply.
> On modern CPUs it may get near the cost of a division.

"as costly as a multiply" amuses me. One could execute 50 multiplies
in the space of a cache miss on several of the modern processors I
use for everyday use.
Hmm, about 200 madds on one of the architectures I'm familiar with.

Phil
(disclaimer - I work at a semiconductor company, not everything I'm
familiar with would be familiar to everyone)
--
If a religion is defined to be a system of ideas that contains unprovable
statements, then Godel taught us that mathematics is not only a religion, it
is the only religion that can prove itself to be one. -- John Barrow

Phil Carmody

unread,
Nov 23, 2005, 7:19:37 PM11/23/05
to
"nightlight" <night...@omegapoint.com> writes:
> > Are you aware of the fact that a multiplication table is
> > a very stupid idea on any recent CPU ?
>
> Depends what you call "any recent CPU". For battery powered, embedded
> processors (and there are more of those than deskotops; and they're as
> "recent"), the absence of multiplications and divisions in a coder is
> quite an important element for stretching the battery life. What may
> be a "stupid idea" for a desktop, may be a smart idea for a cell phone
> or a camera. I don't wish to appear condenscending or anything, but
> scientific papers are written on this very topic, you know, and there
> are people who specialize in how to transform the existent algorithms
> so they consume less power.

Answer yes or no -
Do you work for one of the leading providers of processors for
"cell phone or camera" applications?

If not - consider yourself fatally condescended.

You appear, quite frankly, to be just plain wrong in your knowledge
of what modern processors actually do.

Phil

Phil Carmody

unread,
Nov 23, 2005, 7:32:22 PM11/23/05
to
Willem <wil...@stack.nl> writes:
> nightlight wrote:
> ) <snip>
> ) ... Since your
> ) conversational manners in this thread haven't been exactly exemplary, I
> ) will skip the relevant links and let you enlighten yourself on that
> ) subject on your own.
>
> I assume that answering simple questions with several paragraphs of
> in-depth analysis of the QI coder, most of which is irrelevant to said
> question, is exemplary conversational manner ?

I'm glad you asked me that question. But before I answer it, I must
first say >sound of strangling<

Phil Carmody

unread,
Nov 23, 2005, 7:37:58 PM11/23/05
to
"nightlight" <night...@omegapoint.com> writes:

> >> This is incorrect. Check the MQ and QM coder family. Typical
> >> implementations have about the same complexity and are fine
> >> implementations of arithmetic coding.
>
> It is correect if you restrict the statement to full precision
> arithmetic coders, as I did, (thus the Moffat et al 1998 coder is the
> best of full AC implementations). The quasi-arithmetic coders which
> sacrifice 5-10% of compression for speed are dime a dozen.

But what about the quasi-arithmetic coders that only sacrifice 1%
or less of compression for speed? You seem to _conveniently_ ignore
anything which counters your claims. Sure, you throw in a few
references in order to /appear/ credible, but if you're ignoring
the real world then you're just a hollow vessel.

Phil

Phil Carmody

unread,
Nov 23, 2005, 7:39:28 PM11/23/05
to
"nightlight" <night...@omegapoint.com> writes:
> >> The sequence "Vary" was a sequence similar to that, since the negative
> >> half of that sequence is mostly 1's and the positive part mostly 0's.
> >
> > What do you mean by "negative half" and "positive part"?
> > Applying the concept of sign to sequences of bits is, erm, unconventional.
>
> The negative 32 bit integer, say -1 has 32 1's.

Can I carve that on your gravestone?

Soon. Please.

Phil
(I'm, nothing if not polite)

Matt Mahoney

unread,
Nov 23, 2005, 8:11:22 PM11/23/05
to
nightlight wrote:
> > How about comparing QI with a faster and more efficient binary
> > arithmetic coder such as the fpaq one (see
> > http://www.cs.fit.edu/~mmahoney/compression/ for more info).
>
> QI is a fundamental algoritm not a program (although several research
> variants of the algoritham are implemented). If your algorithm needs to
> do more work than about 3 adds, 1 shift and two 1-dim array reads, per
> less probably symbol, and any operation at all (other than traversing
> the array to examine what the symbols are) per more probable symbol,
> then it is inherently slower algorithm than QI. Any arithmetic coder
> falls in this category, since a general purpose AC needs to encode 0's
> and 1's explicitly. Check the simplicity of the generic EC/QI coding
> loop (from p. 8 in [23])

>
> 23. TR05-0625 "Quantized indexing: Background information"
> http://www.1stworks.com/ref/TR/tr05-0625a.pdf
>
> I=0; // init cumulative path index
> for(k=n=0; n<M; ++n) // M is the number of input bits
> { // or a block size
> if (getbit(n)==1) // get bit at 0-based offset n
> { // skip on bit=0, add on bit=1
> ++k; // increment count of 1's found
> I=I+C(n,k); // add binomial to the path index
> }
> } // k is now the final count of 1's
>

I think this is an interesting coding technique. QI and AC are both
very close to the Shannon limit in compression ratio so I think the
only real issue is speed. For QI, there are two issues. First, I
think looking up C(n,k) will be slow if the table is large due to cache
misses. I don't see how this can be avoided because each table entry
is used only once.

Second, you have to reduce the input to a static order 0 model. You
could do this, for example, with a binary BWT + MTF (which is just xor
with the previous bit of the transform). But in this form there is an
optimization for arithmetic coding that would make QI less competitive.
If multiplication is expensive, one speedup trick is to use table
lookup:

(low, high, p, bit) -> (low, high, output)

where p is the bit probability. Since there are 3 dimensions (not
counting the 2 values of bit) you have to use a very coarse
quantization to keep the state table size small, so you sacrifice
compression. But if p is fixed, you can create a table with 1 less
dimension and use a higher resolution for low and high.

Suppose instead you compute the range in high resolution without
tables:

mid = low + p*(high - low);
if (bit) low = mid+1; else high = mid;

then you can use the fact that p is fixed to replace the multiplcation
with shifts and adds.
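
For illustration, a sketch of that fixed-p trick (an added example with an
arbitrary p = 3/8; not code from any particular coder, and renormalization
is left out):

#include <stdio.h>
#include <stdint.h>

/* one coding step with p hardwired to 3/8, so p*(high-low) becomes
   (range>>2) + (range>>3); the update itself is the one quoted above */
static void encode_step_p38(uint32_t *low, uint32_t *high, int bit)
{
    uint32_t range = *high - *low;
    uint32_t mid   = *low + (range >> 2) + (range >> 3);
    if (bit) *low = mid + 1; else *high = mid;
    /* renormalization (shifting out the settled top bits) omitted */
}

int main(void)
{
    uint32_t low = 0, high = 0xFFFFFFFFu;
    int bits[] = { 0, 1, 1, 0 };
    for (int i = 0; i < 4; ++i) encode_step_p38(&low, &high, bits[i]);
    printf("low=%08x high=%08x\n", low, high);
    return 0;
}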

Notice also that the second statement can be coded with conditional
moves to avoid branch mispredictions, but that isn't possible (I think)
in the QI code. That's important because the branch is followed by a
long latency memory access. Even if the processor is good at branch
prediction it doesn't help because the bits are independent. The best
it can do is guess that the "if" is always false (since it is false > 50%
of the time), so every table lookup will have been mispredicted.

-- Matt Mahoney

Matt Mahoney

unread,
Nov 23, 2005, 9:02:45 PM11/23/05
to
David A. Scott wrote:
> Phil Carmody <thefatphi...@yahoo.co.uk> wrote in
> news:87wtj2n...@megaspaz.fatphil.org:
>
> > "nightlight" <night...@omegapoint.com> writes:
> >> > First, do you understand the difference between adaptive order-0
> >> > and static order-0 models? Second, how would the "QI" and "AC" in
> >> > your tests perform if the input consisted of N/2 zeroes followed by
> >> > N/2 ones?

> >>
> >> The sequence "Vary" was a sequence similar to that, since the
> >> negative half of that sequence is mostly 1's and the positive part
> >> mostly 0's.
> >
> > What do you mean by "negative half" and "positive part"?
> > Applying the concept of sign to sequences of bits is, erm,
> > unconventional.
> >
> > Phil
>
> I guess that means if your given say a file with say one
> thousand bytes of zero followed by say one thousand bytes of all
> ones 0xFF he has no idea how his compressor or some arbitrary
> arithmetic would compress it. It really should not be that hard
> to test. But maybe he can't test it with his method yet.
>
>
> David A. Scott

You compress it like this:
Input bit string is 0{8192}1{8192} (x{n} means repeat bit x n times)
After binary BWT (right context, no EOF): 10{8191}1{8191}0, start=16383
After MTF (xor with last bit): 110{8190}10{8190}1
So the QI code is n=16384,k=4,I, where I is one of C(n,k) codes. The
size of this code is log2(C(n,k)) = log2(16384!/(4!16380!)) = about 52
bits.
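
(A quick check of that figure, added here via lgamma; it comes out to about
51.4, i.e. roughly 52 bits as stated:)

#include <stdio.h>
#include <math.h>
int main(void)
{
    /* log2(16384!/(4!*16380!)) as a sanity check of the size quoted above */
    double bits = (lgamma(16385.0) - lgamma(5.0) - lgamma(16381.0)) / log(2.0);
    printf("log2 C(16384,4) = %.1f bits\n", bits);
    return 0;
}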

-- Matt Mahoney

David A. Scott

unread,
Nov 24, 2005, 12:38:14 AM11/24/05
to
"Matt Mahoney" <matma...@yahoo.com> wrote in
news:1132797765....@g47g2000cwa.googlegroups.com:

Matt, the number of bits you calculated is the number of bits to
represent the number of combinations of 4 ones and 16380 zeroes.
Unfortunately you would have to encode the 16383 for the BWT; also
you would have to encode the length 16384 and the 4 for the ones.
It appears that in the direction the guy is headed it would have to be
with some universal code, since the numbers are concatenated together,
so the 16383, 16384 and 4 would have to be coded with some universal
coder, and these would add directly to the 52 bits, so it's worse than
your estimate. The very fact that these numbers are separate adds to the
total overhead and would most likely make it worse than a proper
bijective combining of the numbers, something he is not likely to
understand or do.

Just look at the example he gave at the beginning: 00101001. There are
56 numbers; your method calls that 5.8 bits. I took 10. However, he needs
the 8 for the length and the 3 for the ones, so 8 is 1000, 4 bits, and 3
is 11, 2 bits; that is 6 more bits for a total of 11.8. But wait, it's
worse than that: these have to be universal numbers, so it's even more
bits. He could get clever and instead of the 8 he could decide to encode
the number of zeros as well as ones; then he could use 3 bits for the 5,
which would make a total of 10.8. But since he likes a special set of
universal numbers, the counts for both the 3 and 5 go up, so now we are
in the 12+ territory. The fact that you have to use universal numbers at
all adds to the length of the compressed data.

There are ways to add numbers that use less space than the wasteful
universal numbers. The example I like to give is: if you're using BWT for
a file of 256 bytes you might think, after adding a pointer, that the average
length would be 257 bytes or more; actually it's less than 257 bytes, since
you need on average less than 8 bits. You would have to look at
DSC to see that. Most people realize you need 8 bits for 256 values, and
more if you go off on the universal number tangent, which is where someone
not interested in squeezing the last bit out might go.

> -- Matt Mahoney

nightlight

unread,
Nov 24, 2005, 1:08:17 PM11/24/05
to
> QI and AC are both very close to the Shannon limit in
> compression ratio so I think the only real issue is speed.

The difference (for a (quasi-)static Bernoulli sequence, n symbols long)
is about 1/2*log(n) bits in favor of EC/QI. Hence, the significance of
the difference depends on the size of output. For short or low entropy
inputs the total output may be comparable to 1/2*log(n). You can notice
the effects of the smaller output sizes and the increasing QI gains vs
AC in the top rows and in the left columns of the test table. Such
situations are not as infrequent in coding as some might think, since
short inputs may occur in coding complex structures or database
records, with lots of small/unpredictable pieces, or even for longer
inputs which vary rapidly and not very predictably (the abrupt changes
effectively reset the n relevant here to 0) while the low entropy
inputs, such as very long but very sparse bit arrays, are used by
search engines & data warehouses (the 'bitmap indexing').

Many such cases are at present not compressed at all or are
compressed suboptimally using some ad hoc scheme jiggered for the very
specific problem. Consider sending alphabet mappings or color palettes
etc, where you need to encode permutation of n items, say n=256 one
byte alphabet. Just about any case I have seen over years would code
such mapping as a simple array of 256 bytes specifying the permutation.
The optimum solution, the factorial radix encoding, needs 1679 bits =
209.8 bytes which would save 46.2 bytes over the plain method. But
encoding such a number in factorial radix would need 1679-bit wide
arithmetic, which is too much trouble to create just for this. Using AC
would work, too, but I haven't seen one used for this kind of problem,
probably because it would require too much tweaking of some delicate
and often cryptic AC source code to make it work optimally for the
constrained codes. The QI coding is simple: instead of the n^2/4-size
binomial table, one uses here a table of n entries which are the factorials
1!..256! in the 'sliding window integer' (SWI) format, computed as
L[i]={L[i-1]*i} for i=1..256 and L[0]=1 (see [24] pp. 49-51). The QI
output is the enumerative index, which here means taking digits of
factorial radix Fi and computing the sum of products Fi*L[i], i=2..256,
where the multiplies are at most of 8 bit by 9 bit numbers (to get the
output which has the same number of integer bits, 1679, as the optimum
solution). Similarly, for a non-binary radix R one would use a table of
SWI powers {R^i} or for general mixed radix R[i], the n-1 entry table
has L[i]={L[i-1]*R[i]}. For coding trees, one would use a table of SWI
quantized 'ballot numbers' T(n,k) (see [26] p. 78; this is a very handy
reference which covers many other useful combinatorial objects and
their un/ranking, all of which can now be computed n times faster and
using n times smaller tables via QI than via conventional algorithms).
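
To make the factorial-radix (permutation) case concrete, here is a toy added
sketch (an n=4 permutation, plain exact arithmetic instead of the SWI
factorial table described above) of the Lehmer-code digits whose mixed-radix
value is the permutation's index:

#include <stdio.h>

/* d[i] = how many elements after position i are smaller than perm[i];
   d[i] is a digit of radix n-i, so the rank is sum d[i]*(n-1-i)!       */
static void lehmer_digits(const unsigned char *perm, int n, int *d)
{
    for (int i = 0; i < n; ++i) {
        d[i] = 0;
        for (int j = i + 1; j < n; ++j)
            if (perm[j] < perm[i]) ++d[i];
    }
}

int main(void)
{
    unsigned char perm[4] = { 2, 0, 3, 1 };      /* toy permutation of 0..3 */
    int d[4];
    lehmer_digits(perm, 4, d);
    int rank = d[0] * 6 + d[1] * 2 + d[2] * 1;   /* weights 3!,2!,1!; last digit is 0 */
    printf("digits %d %d %d %d -> rank %d of 24\n",
           d[0], d[1], d[2], d[3], rank);
    return 0;
}

For n=256 that rank is the ~1679-bit number mentioned above, which is exactly
what the SWI factorial table is meant to handle in limited precision.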

Complex data structures occur often in lossless coding of large,
complex images or in video codecs, and the conventional EC has been
used in these areas (see refs from [A] p. 2). QI itself clicked
together in the attempts to improve the EC performance in a video codec
project. An early precursor of QI for a highly special case goes back 7
years, to a fun problem of trying to code every legal chess positions
optimally, in 155 bits/position (that method managed to reach 157
bits/position). An unintended byproduct of that little fun project was
a rediscovery of EC, but in an entirely unconventional visual
formulation (which resembled lattice QCD simulations from my physics
grad school days) and with all the conventions picked backwards (such
as colex instead of lex ranking, bottom up instead of top down, low to
high instead of high to low addend accumulation, which turned out to be
the key ingredients in cracking the decades old EC precision problem).

In cases of very long and very sparse inputs, by virtue of not using
any coding operations for the most frequent symbol (other than skipping
over; unlike AC, which needs to code and renormalize the interval explicitly
for all symbols), QI runs at run-length coder speeds, while coding
tighter than RLC, and unlike RLC, QI is immune to any denser
clusters/segments which would clobber any plain RLC.

Another area where even the unlimited precision EC is advantageous over
AC or Huffman coding is "constrained coding" (used for low level media
codes), where EC has been used since the 1950s (see [A] p. 2).

The availability of a truly practical algorithm for EC (or enumerative
combinatorics generally) in my view opens the universal/descriptive
modeling frontier (the Kolmogorov, MDL way, which is at present largely
in the realm of academic theorizing) to a gold rush for a new kind of
highly practical and effective modeling patterns. BWT is an early,
accidental precursor (and only half complete: just the transform itself,
while the second BWT phase is still a kludge) of this new kind of
powerful algorithm. The new methods which will emerge here will turn
out far superior to the adaptive/predictive AC+PPM paradigm (aka
'probability of single next symbol' pinhole; cf. [23] pp. 26-36). As
always with these 'vision thingies', only time will tell.

In short, I think there is much more than just speed advantage.

> For QI, there are two issues. First, I think looking up C(n,k)
> will be slow if the table is large due to cache misses. I don't
> see how this can be avoided because each table entry is used
> only once.


For binary coding of quasi-stationary sources (which is just one of
many EC/QI areas of usefulness), the binomial table needs n^2/4
entries. It is sufficient to store only the mantissa for each entry, since
the exponents can be computed quickly from a much smaller table (via a couple
of adds from an array of size n containing log(k!); e.g. for a 32 bit SWI
mantissa, this small table gives correctly not just the exponents but
also all but the lowest 10 bits of the mantissa).

For example, an n=1024 bits/block table needs 1/2 MByte for a 17 bit
mantissa (16 bits stored, implicit leading 1 not stored), which will
produce a total output excess of less than 1 bit per 64 Kbit input (in
practice less than half of that very high redundancy estimate d(g) =
log(e)/2^(g-1), cf. [A] footnote on p. 8).

In many applications n=256 may be perfectly fine, which needs 32 KB
table for 17 bit mantissa, or 16 KB table for 9 bit mantissa (with
output excess below 1 bit per 256 bit input). Any of these errors can
be made several orders of magnitude smaller, without lengthening the
mantissa, by storing only the corrections for the log(k!) table
predictions, which, from the measurements, are well under half the
length of the full mantissa (for up to n=2^14, at least).

Further significant reduction in table sizes is done by interpolating
the addends L(M) after skipping one or more fronts (which are rows in the
Pascal triangle) after each stored front ([24] pp. 29-30). For example,
from the binomial relation C(2n+1,k) = C(2n,k) + C(2n,k-1), you can skip
every odd front, cutting the table size in half, at the cost of at
most one extra addition, or an average cost of 0.5 additions per less
frequent symbol. Similarly you can cut table sizes by a factor m by
storing only every m-th front and recalculating the skipped ones (m=3
or 4 are useful cases). Note that with these extra additions, one
doesn't need to do the SW rounding up of the interpolated addends -- it is
better to simply add, e.g., the two terms right into the output index
(eqs. (23), (24) in [A]) without any loss of precision (provided the
decoder follows the same rule, obviously). That is not only faster, but
it cuts down on the redundancy per symbol by a factor m, or
equivalently it reduces the required mantissa size by log(m) bits.
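
A small added sketch of the m = 2 case, using exact binomials just to show
the recurrence (in the real coder the two quantized table entries would be
added straight into the index, as noted above):

#include <stdio.h>

#define ROWS 32                        /* stored even fronts 0,2,...,62 */
static unsigned long long even_row[ROWS][2 * ROWS];  /* even_row[i][k] = C(2i,k) */

static void build(void)
{
    for (int i = 0; i < ROWS; ++i) {   /* fill each even row directly */
        unsigned long long c = 1;
        int n = 2 * i;
        for (int k = 0; k <= n; ++k) {
            even_row[i][k] = c;
            c = c * (n - k) / (k + 1); /* C(n,k+1) from C(n,k) */
        }
    }
}

static unsigned long long C_odd(int n, int k)   /* n odd: one extra add */
{
    unsigned long long a = (k <= n - 1) ? even_row[(n - 1) / 2][k] : 0;
    unsigned long long b = (k >= 1)     ? even_row[(n - 1) / 2][k - 1] : 0;
    return a + b;                      /* C(2m+1,k) = C(2m,k) + C(2m,k-1) */
}

int main(void)
{
    build();
    printf("C(9,4) = %llu (expect 126)\n", C_odd(9, 4));
    return 0;
}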

Finally, if you really, really want to reduce the table size, say to
zero, and you don't mind doing as much computation as an arithmetic coder,
you can compute the binomials (of the left neighbors, cf. [A] Fig. 1 on
p. 4) along the whole lattice path, using C(n,k) = (n/k)*C(n-1,k-1) for
vertical steps and C(n,k) = [n/(n-k)]*C(n-1,n-1-k) for horizontal
steps, using SWI rounding for the results (hence all factors are
mantissa size integers). If you have read the section on AC in [23] pp.
19-25, you realize by now that if you toss the factors (n/k) to the
other side of these relations (thus reverse the path and go from the
endpoint B to the origin A), then in the large n limit, where the
approximations in [23] p. 22 hold and where the probabilities are p=k/n
and q=(n-k)/n, you recover none other than the arithmetic coding
algorithm, as a just slightly less accurate (by 1/2 log(n)/n bits/symbol)
approximation of QI, which also computes the addends entirely on the fly
without any tables, at the cost of multiplying the approximate binomials
by p=k/n on each vertical step and by q=(n-k)/n on each horizontal step.
(Note that the AC interval zooming is interpreted here as simple
subtractions from the SWI exponents.)

That realization should immediately nip in the bud any hope of getting
the full AC to perform anywhere remotely close to the speeds of the
table driven QI, noticing also that the table driven QI does not need
any coding operations for the most frequent symbol and that AC
explicitly does need such operations (that is how you get from one
binomial to the next along the path, you can't skip that). Namely, no
matter how much you streamline and unroll the per symbol work of the AC
(techniques which one can also use in QI, although the QI research
prototype tested in [A] p. 10 didn't use any such, while the AC [22]
did), it can't ever get near QI's coding loop for the most frequent
symbol:

while (*++s==mfs);

which examines and "encodes" 32 or 64 copies of the most frequent symbol,
packaged as a machine-size word mfs, in each loop step (VC 6 generates
here 3 machine instructions; with inline asm a single instruction will
do: repz scasd; for fairness, the latter wasn't done in the test coder
in [A] p. 10, otherwise the top speed factors would have gone to 300+).


> Second, you have to reduce the input to a static order 0 model.
> You could do this, for example, with a binary BWT + MTF (which
> is just xor with the previous bit of the transform).

You are imposing on EC/QI an artificial constraint -- that the AC
modeling paradigm must be obeyed to the letter, i.e. that the modeling
engine must translate all it knows and all that it can extract about
the data in front of it, and convert it somehow, by hook or by crook
(with all the fragile zero frequency & escape probability adhockeries),
into the 'probability of the next single symbol', since that is how
under the conventional wisdom about AC interpretation one has to feed
the coder such info.

First, from the previous comments, it should be clear that the AC
algorithm need not be interpreted in the probabilistic language at all,
but one can simply view it as an approximation of the exact enumeration
that computes the enumerative addend tables on the fly. The values p &
q are, in this most direct interpretation, merely normalized plain
counts k/n and (n-k)/n of ones and zeros that the encoder simply
counted from the data and passed to the decoder in any suitable form
(with any encoding optimization assumed when prior information about
the counts is available). This direct interpretation of AC, in contrast
to the conventional wisdom, doesn't cast in stone any a priori
assumptions about what the infinite n limits of such normalized counts must
be, or even that such limits must exist. That is something a modeling
engine should read out of the data and not impose on the data, as the
catastrophic failure of the conventional wisdom constrained AC in the
row "Vary" ([A] p.10) illustrates.

The concept of "probability" as used in the conventional AC Modeling
Paradigm has built in much larger assumptions about the data than is
necessary or useful. The rest of the ACMP: "of the next single symbol"
is similarly a gratuitous and completely unnecessary constraint, still
propagating somehow by the conceptual inertia of the conventional wisdom,
and which hasn't been relevant in practical coding since the Morse telegraph
era and the coders with a 6 bit or some such memory buffer limitation.

With the needless modeling preconceptions and pigeonholes out of the
way, QI can code exactly as AC does under the ACMP: taking in a bit at a
time, with probabilities p & q given, and coding it into the appropriate
enumerative range. The procedure involves lattice jumps and rescaling
of the SWI mantissa on each jump (to the destination binomial), which
reduces QI speed to that of AC, hence it is not worth the trouble. A
much faster way, and one more suitable for QI, is to split the output into
separate streams so that the input symbols within a given probability
uncertainty range p +/- dp get enumerated into a common stream (since
the uncertainty dp drops as 1/sqrt(n), the ranges get refined as the
coding progresses, thus more classes get split off; this branching is
similar to Rissanen's Context algorithm or to the newer Context Tree
Weighting scheme).

For quasi-stationary Markov sources of order m, QI would use the last m
symbols as the output stream selector (with optional stream merging and
later splitting based on n if the Markov process is empirically derived
and not given a priori).
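
A toy added sketch of that stream-selector idea (it only tallies per-context
counts, with my own example values M=2 and an arbitrary bit string; the real
coder would accumulate the enumerative index of each stream as described
earlier):

#include <stdio.h>

#define M 2                                    /* context order (example) */

int main(void)
{
    const char *bits = "0010100111010001";
    unsigned ctx = 0;                          /* last M bits */
    int count[1 << M][2] = { { 0 } };          /* per-stream 0/1 counts */

    for (int i = 0; bits[i]; ++i) {
        int b = bits[i] - '0';
        count[ctx][b]++;                       /* here: just tally; a coder
                                                  would add C(n,k) terms into
                                                  this stream's index */
        ctx = ((ctx << 1) | b) & ((1 << M) - 1);
    }
    for (int c = 0; c < (1 << M); ++c)
        printf("ctx %d: %d zeros, %d ones\n", c, count[c][0], count[c][1]);
    return 0;
}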

For general sources the best modeling engine for EC/QI is presently
BWT -- the BW transform alone, without MTF or other such context-structure
and boundary obliterating adhockeries (which don't buy you
anything in enumerative coding anyway) -- using QI to first encode the
exact optimum segmentation of the BWT output column, then encode
each segment enumeratively (since it now belongs to a single approximate
enumerative class). Note that within the ECMP (cf. [23] pp. 26-36), the use
of BWT is not a reduction to an order 0 source (since no MTF is done), but
simply one among possible ways to empirically derive an input
segmentation (which is clearly not a literal contiguous segmentation of
the input sequence) into approximate enumerative classes. While there
is no such BWT based QI modeling engine functioning at present, the
work on it is ongoing (largely in service of the mentioned video codec
project, which has finally resumed after this QI discovery detour).

> Since there are 3 dimensions (not counting the 2 values of bit)
> you have to use a very coarse quantization to keep the state
> table size small, so you sacrifice compression....

Good observations there (and in the follow-up). It is indeed very
difficult to get rid of the hardwired p & q=1-p in the AC addends,
which are A(K,L) = p^K * q^L (see [A] pp. 19-26), and of the resulting
3 dimensions (p,K,L) for the tables.

If you keep optimizing the AC speed hard and long enough, though, the
omega point of such optimization is none other than the QI algorithm:
you finally recover the delicate binomial relations which were
originally obliterated by the AC large-n approximation, then you
reinterpret p & q as normalized counts, unwind the recursions and
rework them into the bottom-up recurrence form of eq. (22)-(23) in [A]
so that the optimum quantization, eq. (21), can be used, and you end up
working with the 2-dimensional (K,L) tables of eq. (21) in [A],
reinventing the QI algorithm in the process. As explained in [A]
(pp. 1-2, 6, 8), AC was an interim compromise solution of the EC
precision problem. Rissanen started it as his first shot at a practical
realization of Kolmogorov's algorithmic approach to coding (inspired by
Kolmogorov's visionary little gem [31], where the EC is described
[personal communication]; over the years he grew this idea into the MDL
principle [30],[32]), an effort which, after the initial partial
success, arithmetic coding, got stuck in a three-decade-long loop of
small tweaks, due to the unfortunate conventions of the original
enumerative coding formulation and the missing key piece of the EC
algorithm (eq. (5) or (21) in [A]).


References

[A] R. V. Tomic "Quantized Indexing: Beyond Arithmetic Coding"
    arXiv preprint: http://arxiv.org/abs/cs.IT/0511057

[23] R. V. Tomic "Quantized indexing: Background information"
     1stWorks TR05-0625, 1-39, 2005
     http://www.1stworks.com/ref/TR/tr05-0625a.pdf

[24] R. V. Tomic "Fast, optimal entropy coder"
     1stWorks TR04-0815, 1-52, 2004
     http://www.1stworks.com/ref/TR/tr04-0815b.pdf

[26] F. Ruskey "Combinatorial Generation"
     CSC-425/520, Univ. of Victoria CA, 2003
     http://www.1stworks.com/ref/RuskeyCombGen.pdf

[31] A. N. Kolmogorov "Three approaches to the quantitative
     definition of information" (in Russian)
     Problems of Inform. Transmission, 1(1), 1-7, 1965
     http://www.1stworks.com/ref/kolmogorov65.pdf

[30] A. Barron, J. Rissanen, Bin Yu "The minimum description length
     principle in coding and modeling"
     IEEE Trans. Inform. Theory IT-44 (6), 2743-2760, 1998
     http://cg.ensmp.fr/%7Evert/proj/bibli/local/Barron1998minimum.pdf
     MDL Central: http://www.mdl-research.org/
     Algorithmic Information Theory:
     http://www.clrc.rhul.ac.uk/publications/publicationsoverview.htm

[32] P. Grunwald, P. Vitanyi
     "Shannon Information and Kolmogorov Complexity"
     arXiv cs.IT/0410002, http://cul.arxiv.org/abs/cs.IT/0410002

For an extended reference list with papers & books available
online see: http://www.1stworks.com/ref/qi.htm

nightlight

unread,
Nov 24, 2005, 1:41:15 PM11/24/05
to
> QI and AC are both very close to the Shannon limit in
> compression ratio so I think the only real issue is speed.

The difference (for a (quasi-)static Bernoulli sequence, n symbols
long) is about 1/2*log(n) bits in favor of EC/QI.

> For QI, there are two issues. First, I think looking up C(n,k)
> will be slow if the table is large due to cache misses. I don't
> see how this can be avoided because each table entry is used
> only once.

while (*++s==mfs);  /* scan past a run of the most frequent symbol (mfs):
                       no table access and no coding step per symbol   */

> Second, you have to reduce the input to a static order 0 model.
> You could do this, for example, with a binary BWT + MTF (which
> is just xor with the previous bit of the transform).

You are imposing on EC/QI an artificial constraint -- that the AC
modeling paradigm must be obeyed to the letter, i.e., that the modeling
engine must translate all it knows and all that it can extract about
the data in front of it, and convert it somehow, by hook or by crook
(with all the fragile zero-frequency & escape-probability ad-hockeries),
into the 'probability of the next single symbol', since that is how,
under the conventional wisdom about AC interpretation, one has to feed
the coder such info.


David A. Scott

unread,
Nov 24, 2005, 2:56:28 PM11/24/05
to
"nightlight" <night...@omegapoint.com> wrote in
news:1132860704.6...@g47g2000cwa.googlegroups.com:


Well, in your first post: "always compresses better than the best of
arithmetic coders". That statement could be considered an insult to the
reader as well as a waste of time. Many people might not read past that
statement alone. But I was willing to give it a chance, and I see I was
wasting my time and yours. So until, or even if, you actually get
working code with real files, I see these exchanges as leading to
nothing productive. Your method will only have value if it can actually
do real-world compression. Sorry that trying to focus on some narrow
aspect caused you to be so upset. I only hope some day your stuff will
lead to something that can actually be tested. Testing with real code
beats hand waving any day.


nightlight

unread,
Nov 24, 2005, 3:24:48 PM11/24/05
to
> Well in your first post "always compresses better than the best of
> arithmetic coders".

That type of statement can be made about two compression algorithms A
and B if one can deduce A as an approximation of B and show that the
coding addends of A are mathematically always greater than or at best
equal to the addends of B, and similarly that B always needs fewer and
simpler arithmetic operations to encode a symbol (check ref. [23]
pp. 19-26 to see why that is so for the relation between AC and QI on
both these elements). Whether this or that executable program
implementing some form of A or B performs as well as the respective
algorithm mathematically allows is unrelated to the fundamental
mathematical relation between the two algorithms. For all I know,
someone could produce in a month a QI implementation that is twice as
fast and compresses better (e.g. due to better counts coding) than the
prototype I used for the tests.

That same post, which started this thread and which you quote above,
didn't begin with "There is a new compression program" but with "There
is a new algorithm for entropy coding...". When I put a reference
implementation on the QI page, you're welcome to object to and suggest
improvements to any particular encoding choice for the counts and
lengths, or, for that matter, to the variable and file names picked.

David A. Scott

unread,
Nov 24, 2005, 4:04:53 PM11/24/05
to
"nightlight" <night...@omegapoint.com> wrote in
news:1132863888.2...@g43g2000cwa.googlegroups.com:

>
> That same post, which started this thread and which you quote above,
> didn't begin with "There is new compression program" but with "There
> is a new algorithm for entropy coding...". When I put a reference
> implementation on the QI page, you're welcome to object to and suggest
> improvements to any particular encoding choice for the counts and
> lengths, or for that matter, to variable and file names picked.
>
>

I hope you don't mind if, after you get a test implementation of your
entropy coding working on files, someone shows you a similar arithmetic
coder that compresses all or most of your test files to a smaller size.
Of course yours will run faster, but I predict it will not compress
files as small as a good arithmetic coder does. That can only be tested
in the future, when you get some real working code.

Willem

unread,
Nov 24, 2005, 4:27:37 PM11/24/05
to
nightlight wrote:
)> Well in your first post "always compresses better than the best of
)> arithmetic coders ".
)
) That type of statement can be made about two compression algorithms A
) and B, if one can deduce A as an approximation of B and show that the
) coding addends of A are mathematically always greater or at best equal
) to the addends of B, ...

This is only true if you make the assumption that the model suits the data
perfectly. However, any model is only as good as the number of bits used
to transmit it, so this assumption is false in any practical application.
To most people, this is basic knowledge, and the phrase "always compresses
better" therefore means "I don't know what I'm talking about".


SaSW, Willem
--

Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be

drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT


nightlight

unread,
Nov 24, 2005, 9:40:10 PM11/24/05
to
> However, any model is only as good as the number of bits used
> to transmit it, so this assumption is false in any practical application.
> To most people, this is basic knowledge, and the phrase "always compresses
> better" therefore means "I don't know what I'm talking about".

No, it only means the obvious -- all other things set equal, the
algorithm QI will always compress better than the algorithm AC, due to
the transparently deterministic and uniform mathematical relation
between their coding addends. Anything else is comparing apples and
oranges or blowing smoke. Whether some arbitrary implementations of AC
and QI satisfy the same relation is irrelevant for the plain
mathematical validity of the original statement (since it is trivial to
write an implementation of any given algorithm that performs worse than
the existent implementation of another algorithm).

Note also that the exact EC is known (since Davisson in the West, and a
few years earlier in the Russian literature, mostly from the work of
the Kolmogorov school while the grandmaster, the big K, was still
alive) to produce the shortest encoding possible, at least for
quasi-stationary Bernoulli and finite-order Markov sources, which means
that QI, being the nearest finite-precision approximation of the exact
EC, is the lowest-redundancy coder one can have at any given finite
precision (assuming all else set equal, obviously). That is not a
matter of tests or benchmarks or programs but a simple mathematical
property of QI (cf. preprint pp. 7-8). As to the QI speed, even though
its comparison to the best full AC implementations is one-sided to the
point of being embarrassing to even report on that kind of no-contest
matchup, there is still plenty of room for improvement in the present
QI implementation's speed and memory use (e.g. better addend tables &
interpolation formulas, faster multi-alphabet coding, better codings
for counts & lengths, etc.). And of course, it needs a fully functional
general-purpose Kolmogorov/MDL style modeling engine (such as BWT).
These things are all ongoing work. With the public release of the
algorithm and the reference implementation coming out shortly, there
will be many more interesting and exciting advances on this new coding
& compression frontier from many talented programmers.

David A. Scott

unread,
Nov 25, 2005, 12:38:24 AM11/25/05
to
"nightlight" <night...@omegapoint.com> wrote in
news:1132886410.7...@o13g2000cwo.googlegroups.com:

>> However, any model is only as good as the number of bits used
>> to transmit it, so this assumption is false in any practical
>> application. To most people, this is basic knowledge, and the phrase
>> "always compresses better" therefore means "I don't know what I'm
>> talking about".
>
> No, it only means the obvious -- all other things set equal, the
> algorithm QI will always compress better than the algorithm AC, due to
> the transparently deterministic and uniform mathematical relation
> between their coding addends. Anything else is comparing apples and
> oranges or blowing smoke. Whether some arbitrary implementations of AC
> and QI satisfy the same relation is irrelevant for the plain
> mathematical validity of the original statement (since it is trivial
> to write an implementation of any given algorithm that performs worse
> than the existent implementation of another algorithm).
>

I understand that you think you're God, or close to it. But I for one
would like to conduct a poll.

How many here who read this group agree with Willem
that the statement
"always compresses better"

means A: "I don't know what I'm talking about",

or B: nightlight is correct in what he claims it means,

or C: they are both wrong?


I vote with Willem for A. Any others willing to vote, or are most
people bored with this since it was first stated that it always
compresses better?


David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **

Disclaimer:I am in no way responsible for any of the statements

Willem

unread,
Nov 25, 2005, 7:09:58 AM11/25/05
to
nightlight wrote:
)> However, any model is only as good as the number of bits used
)> to transmit it, so this assumption is false in any practical application.
)> To most people, this is basic knowledge, and the phrase "always compresses
)> better" therefore means "I don't know what I'm talking about".
)
) No, it only means the obvious -- all other things set equal, the
) algorithm QI will always compress better than the algorithm AC, due to
) the transparently deterministic and uniform mathematical relation
) between their coding addends.

It is trivially provable that if two algorithms map the same percentage of
their output space, then one cannot always compress better than the other.
The best of arith coders map 100% of their output space. Therefore your
statement is trivially false.

Ignorant

unread,
Nov 25, 2005, 9:00:51 AM11/25/05
to

Arithmetic coding works by range division, and the proof of the
approach to entropy is easy using induction over the Peano natural
numbers. Probability assumptions have to be made to derive even the
approximate excess code length at any step with a given range to
distribute. [A very low probability symbol has to have a code range of
at least 1 output symbol in the allowed range.]

Rissanen's original paper is so unreadable that it may be only for IBM
thin heads. The first 2 readable papers were by G.N.N. Martin and
C.B. Jones. CACM got the credit because it was widely circulated by the
community.

All said and done, how do you measure entropy for a given string in the
first place? It is dependent on the model ... and you can always blame
the model if the results do not match expectations.
So, what is the bandwidth being made noisy about?
Go ahead, give a working program to try.... It will be tried....


nightlight

unread,
Nov 25, 2005, 9:05:55 AM11/25/05
to
> It is trivially provable that if two algorithms map the same percentage of
> their output space, then one cannot always compress better than the other.

That is correct. But:

> The best of arith coders map 100% of their output space.

That is trivially false (provided that you haven't defined "the best of
arith coders" to be an empty set). Neither AC nor QI (nor any finite
precision coder) "maps 100% of their output space" (i.e. uses all of
the available output codes). The finite precision AC redundancy is
log(e)/2^(g-2) bits/symbol (for g bit arithmetic precision). The QI
precision redundancy is half that, R(g)=log(e)/2^(g-1) (see preprint,
p. 8).

With QI this precision redundancy appears as unused values at the ends
of the indexing intervals, due to the rounding up of the SW integers
(which are computed via eq. (21)). When compounded over multiple steps
of recurrence (21), these unused values at the ends of smaller
intervals end up scattered as dark points over the next composite
interval. After a larger number of steps, a large-n interval will have
these holes scattered in a fractal-like way. The same kind of holes in
the coding space occur with AC, except that the truncation is imposed
there top down (from larger to smaller indexing intervals), which
requires even more (twice as much as for QI) code space to be reserved
(since the higher-level intervals must provide for the worst-case
expansions of the smaller intervals composing them).

The precision redundancy is only one of the two main redundancy
components. The second component arises when the de/coder doesn't know
the source parameters. Even with a stationary Markov source of order 0
and a binary alphabet, this information costs adaptive (and also
static) AC 1/2*log(n)/n bits per symbol more than it costs QI. This
difference comes from the AC approximation of the binomials (unrelated
to the arithmetic precision g, hence present even with no truncation),
which is a majorization of the binomials, and thus always results in
larger addends than the original binomials (cf. [23] pp. 22-25). You
can see its effects in the test chart (p. 10) in the top rows (the low
entropy limit) and in the left columns (small n). There are also
additional, more complex but asymptotically smaller (in the large n
limit) components of the AC redundancy, due to the approximate
enumeration even at unlimited precision, which are not counted above
and which are absent in EC & QI (cf. M. Drmota, H.K. Hwang,
W. Szpankowski, "Precise Average Redundancy of an Idealized Arithmetic
Coding", DCC-2002, 222-231, Snowbird, 2002,
http://www.dmg.tuwien.ac.at/drmota/dcc.pdf ).

In both main components, AC has uniformly larger redundancy than QI.
Hence it will always produce larger, or at best equal, output to QI
(the equal case, aside from a few individual extreme points such as p=0
or 1, occurs only if you round up the output of both coders to the next
whole bit and both sizes fall onto the same higher bit boundary; if one
has multiple such pieces, one can reclaim these fractions using
mixed-radix codes, cf. preprint, p. 9, [N4]).

Note also that QI has all the precision-caused holes in the code space
readily accessible in the quantized addend tables, hence they can be
reused by the modeling engine or by the coder itself (cf. preprint,
bottom of p. 8). The QI redundancy R(g) gives the worst-case loss of
code space, i.e. it assumes that every interval has such a hole at the
end of the index interval (the last index value unused). In practice
(from measurements, cf. footnote on p. 8), only about 1/4 of the
lattice points have such a hole, and the most found on any path is
about 1/2 of the R(g) value.

Now, in order to reclaim these unused codes for modeling Esc codes or
for the coder's own uses, the consumer's task would be much simpler if
the pattern of the holes were more regular and predictable; e.g. it
would be best to have a guaranteed usable hole on every step. But since
R(g) already assumes such a case (that every point has 1 unused code at
the end of the index interval), one can construct the quantized table
via eq. (21) with purposefully inserted holes wherever they don't come
out of the quantization itself (which is about 3/4 of the time), and
R(g) will still remain the same (while the worst case would not be
affected by this hole insertion, the average redundancy would increase;
although for all practical purposes both types are invisible with QI's
32-bit mantissa).
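
To make the bottom-up quantization and the "holes" a bit more tangible,
here is a small toy in C. It is emphatically not eq. (21) of [A]; it
just builds a Pascal-triangle addend table bottom-up, rounds each entry
up so that only the top G significant bits are kept (a crude stand-in
for the sliding-window integers), and counts at how many lattice points
the rounding changed the value, i.e. where unused index values appear.
With the tiny G used below, rounding hits nearly every interior point;
the ~1/4 figure above refers to measurements at the coder's real,
much larger mantissa width.

#include <stdio.h>
#include <stdint.h>

#define N 40    /* table size, kept small so plain 64-bit integers suffice */
#define G 8     /* toy mantissa width in bits (the real coder uses e.g. 32) */

/* Round x up so that at most G significant bits remain set. */
static uint64_t round_up_g(uint64_t x)
{
    int top = 63;
    while (top > 0 && !(x >> top)) top--;        /* position of highest set bit */
    if (top < G) return x;                       /* already fits in G bits      */
    int shift = top - G + 1;
    uint64_t mask = ((uint64_t)1 << shift) - 1;
    if (x & mask) x += (uint64_t)1 << shift;     /* round up...                 */
    return x & ~mask;                            /* ...and clear the low bits   */
}

int main(void)
{
    static uint64_t q[N + 1][N + 1];             /* quantized "binomials"       */
    long points = 0, holes = 0;

    for (int n = 0; n <= N; n++) {
        q[n][0] = q[n][n] = 1;
        for (int k = 1; k < n; k++) {
            uint64_t sum = q[n - 1][k - 1] + q[n - 1][k];
            q[n][k] = round_up_g(sum);
            points++;
            if (q[n][k] != sum) holes++;         /* rounding left unused codes  */
        }
    }
    printf("interior lattice points: %ld, rounded up (holes): %ld (%.1f%%)\n",
           points, holes, 100.0 * holes / points);
    printf("q[%d][%d] = %llu vs exact C(%d,%d) = 137846528820\n",
           N, N / 2, (unsigned long long)q[N][N / 2], N, N / 2);
    return 0;
}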

David A. Scott

unread,
Nov 25, 2005, 9:18:02 AM11/25/05
to
"nightlight" <night...@omegapoint.com> wrote in
news:1132927554....@g43g2000cwa.googlegroups.com:

>> It is trivially provable that if two algorithms map the same
>> percentage of their output space, then one cannot always compress
>> better than the other.
>
> That is correct. But:
>
>> The best of arith coders map 100% of their output space.
>
> That is trivially false (provided that you haven't defined "the best
> of arith coders" to be an empty set). Neither AC nor QI (nor any
> finite precision coder) "maps 100% of their output space" (i.e. uses
> all of the available output codes). The finite precision AC redundancy
> is log(e)/2^(g-2) bits/symbol (for g bit arithmetic precision). The QI
> precision redundancy is half that, R(g)=log(e)/2^(g-1) (see preprint,
> p. 8).
>

Actually, good arithmetic coders do map to all of the output space.
Take files, for example. A good arithmetic coder can treat any file as
either an input or an output file; 100% of the space is used. Just
because you seem to think they don't doesn't make it so. You have a
very strange definition of "best" if you exclude bijective arithmetic
file compressors. You need to wake up and smell the coffee; there is
lots of stuff out there. For example, arb255.exe is a working example
of a completely bijective 256-symbol file compressor. You don't have to
wait the so-called months; it's already here.

David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **

Disclaimer:I am in no way responsible for any of the statements

David A. Scott

unread,
Nov 25, 2005, 9:21:09 AM11/25/05
to
"Ignorant" <shu...@vsnl.com> wrote in
news:1132927484.9...@g14g2000cwa.googlegroups.com:

>
> Dear light in the dark,
> can you send a copy of your paper ... I would scan it .... faster
> or not is irrelevant.
> I call even Huffman coding a very specialised arithmetic coder, so..
>
> Mahesh Naik
> Mumbai, India
>
>

Good Huffman coding is a form of arithmetic coding, and for some files
straight Huffman works best as an entropy coder, while for others
arithmetic works better.

nightlight

unread,
Nov 25, 2005, 9:43:01 AM11/25/05
to
> Rissanen's original paper is so unreadable that it may be only
> for IBM thin heads.

Rissanen has a very unconventional way of thinking and uses strange
notation, so his papers are indeed a bit harder to read than typical
information theory papers. On the other hand, they contain rich insight
(often stated extremely concisely) and are a gold mine for later
re-reading, after you have already struggled with the same problem for
a while. You will always find valuable bits you missed before.

> All said and done, how do you measure entropy for a given string in
> the first place? It is dependent on the model ... and you can always
> blame the model if the results do not match expectations.

To compare AC & QI coding efficiency, that doesn't matter, provided you
use the same model with each. In that case, when everything else is set
equal, the difference left is due to the pure coding efficiency
differences. In the QI & AC situation, the answer is even simpler since
the AC coding addends are a direct approximation of the QI's coding
addends, and they are always larger (equal for special extreme points
p=1, or 0) than the QI's addends (since both factors dropped by the AC
from the exact enumeration addend are smaller than one, leading always
to a larger AC addend, cf. preprint footnote on p. 2).


David A. Scott

unread,
Nov 25, 2005, 10:12:55 AM11/25/05
to
"nightlight" <night...@omegapoint.com> wrote in
news:1132930928....@g49g2000cwa.googlegroups.com:

>>> It is trivially provable that if two algorithms map the same
>>> percentage of their output space, then one cannot always compress
>>> better than the other.
>
>
> That is correct. But:
>
>
>>> The best of arith coders map 100% of their output space.
>
>
> That is trivially false (provided that you haven't defined "the best
> of arith coders" to be an empty set). Neither AC nor QI (nor any
> finite precision coder) "maps 100% of their output space" (i.e. uses
> all of the available output codes).


Actually, it appears you're too lazy to actually check the fact that
totally bijective file compressors exist and use 100% of the file
space. I think it's kind of strange that you make such statements when
there is working code that proves you wrong. What is it about working
code that you don't seem to understand? Again, arb255.exe is a fully
working arithmetic file compressor that uses the whole file space.

Yes, ignore the facts and quote some more old papers.

David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **

Disclaimer:I am in no way responsible for any of the statements

Willem

unread,
Nov 25, 2005, 11:34:05 AM11/25/05
to
nightlight wrote:
) To compare AC & QI coding efficiency, that doesn't matter, provided you
) use the same model with each. In that case, when everything else is set
) equal, the difference left is due to the pure coding efficiency
) differences. In the QI & AC situation, the answer is even simpler since
) the AC coding addends are a direct approximation of the QI's coding
) addends, and they are always larger (equal for special extreme points
) p=1, or 0) than the QI's addends (since both factors dropped by the AC
) from the exact enumeration addend are smaller than one, leading always
) to a larger AC addend, cf. preprint footnote on p. 2).

However, this will not always lead to AC generating more bits. If the
model does not perfectly fit the data, any bias inherent in an encoder
could happen to be such that it performs better, not worse.


SaSW, Willem
--

Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be

Willem

unread,
Nov 25, 2005, 11:44:34 AM11/25/05
to
nightlight wrote:
)> It is trivially provable that if two algorithms map the same percentage of
)> their output space, then one cannot always compress better than the other.
)
) That is correct. But:
)
)> The best of arith coders map 100% of their output space.
)
) That is trivially false (provided that you haven't defined "the best of
) arith coders" to be an empty set).

It is trivially true, because actual real-world arith encoders exist that
map 100% of their output space. There are also actual real-world huffman
encoders that map 100%, by the way.

nightlight

unread,
Nov 25, 2005, 12:08:13 PM11/25/05
to
The three papers are available (as pdf files) on the QI page:

http://www.1stworks.com/ref/qi.htm

Ben Rudiak-Gould

unread,
Nov 25, 2005, 12:14:17 PM11/25/05
to
Matt Mahoney wrote:
> Input bit string is 0{8192}1{8192} (x{n} means repeat bit x n times)
> After binary BWT (right context, no EOF): 10{8191}1{8191}0, start=16383
> After MTF (xor with last bit): 110{8190}10{8190}1
> So the QI code is n=16384,k=4,I, where I is one of C(n,k) codes. The
> size of this code is log2(C(n,k)) = log2(16384!/(4!16380!)) = about 52
> bits.

Thank you! I finally understand how his BWT modelling works.

-- Ben
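
For what it's worth, the size quoted above is easy to check with a few
throwaway lines of C (the logarithms are summed in double precision to
avoid overflow):

#include <stdio.h>
#include <math.h>

/* log2 of the binomial coefficient C(n,k), computed as a sum of logs. */
static double log2_binomial(unsigned n, unsigned k)
{
    double b = 0.0;
    for (unsigned i = 0; i < k; i++)
        b += log2((double)(n - i)) - log2((double)(i + 1));
    return b;
}

int main(void)
{
    double bits = log2_binomial(16384, 4);   /* the n=16384, k=4 example above */
    printf("log2 C(16384,4) = %.2f -> %d bits\n", bits, (int)ceil(bits));
    return 0;
}

This prints about 51.4, i.e. 52 bits once rounded up to a whole bit,
matching the figure quoted above.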

LosTFx

unread,
Nov 25, 2005, 1:49:41 PM11/25/05
to
> The difference is about 1/2*log(n) bits in favor of EC/QI

I think I'm missing some points. Are we saying that on a 4-GByte file
we can save 2 bytes?

I tried to see what the impact of a slow AC implementation is on a
standard compression program. I chose a paq6 variant, because it does 8
coding operations per byte and its AC also uses carry counting (not all
variants do, of course). I completely commented out the AC coding part
(modeling still runs, but no coding and no output).
On my Vaio laptop it saves about 0.1 sec every 2 minutes (500KB file).
I know that laptops have very slow RAM; I think that on a desktop it
can be twice as fast. I think that's why people keep their attention on
modeling instead of coding.

So it's my personal opinion that EC/QI is not a great improvement if we
use the old paradigm "modeling + coding". Maybe it can simplify and
speed up things in the MDL/Kolmogorov paradigm.
I must admit that I really don't know anything about MDL; I'm going to
read something. And I'll wait for some code to be given to humanity so
I can put my hands on it and experiment...

-Fabio

Matt Mahoney

unread,
Nov 25, 2005, 1:57:10 PM11/25/05
to
nightlight wrote:
> > QI and AC are both very close to the Shannon limit in
> > compression ratio so I think the only real issue is speed.
>
> The difference (for (quasi-)static Bernoulli sequence, n symbols long)
> is about 1/2*log(n) bits in favor of EC/QI.

I understand the extra 2*log(n) for AC comes from the need to encode
the length of the uncompressed string (using a universal integer code
such as an Elias code) or implicitly by modeling EOF. However, it is
also possible to use the length of the compressed string to encode
information about the original length and need only log(1/H) bits
more, where H is the compression ratio (e.g. 1 bit for a compression
ratio of 0.5). Here is one example of how to do this using a bijective
arithmetic coder: http://www.ics.uci.edu/~dan/pubs/DC-Sec3.html

Of course it is not usually worth the effort to save a few bytes unless
you have lots of small inputs. But if you have lots of small inputs
then maybe they are fixed length and you don't need to encode the
lengths at all.

In the PAQ coders it hardly matters. The arithmetic coder uses 1% to
0.1% of the CPU time depending on how much compression you want. The
rest is dominated by modeling, mostly cache misses. I measured the
compression ratio to be about 0.0001 bpc worse than ideal on typical
data using 32 bit arithmetic and 12 bits to represent probabilities.
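
As an aside for readers who haven't looked inside such a coder: the
inner loop really is tiny, which is why it barely registers in the
profile. Below is a generic carry-free binary arithmetic coder core in
that same style (32-bit registers, probabilities scaled to 12 bits).
It is only an illustrative sketch, not PAQ's actual source; p12 is
assumed to be the probability of bit==1 scaled to the range 1..4095.

#include <stdio.h>
#include <stdint.h>

typedef struct {
    uint32_t x1, x2;       /* current code range [x1, x2] */
    FILE *out;
} ArEnc;

static void ar_init(ArEnc *e, FILE *out)
{
    e->x1 = 0; e->x2 = 0xffffffffu; e->out = out;
}

/* Encode one bit with 12-bit probability p12 = P(bit==1)*4096, 1..4095. */
static void ar_encode(ArEnc *e, int bit, unsigned p12)
{
    uint32_t xmid = e->x1 + ((e->x2 - e->x1) >> 12) * p12;  /* split the range */
    if (bit) e->x2 = xmid;
    else     e->x1 = xmid + 1;

    /* emit leading bytes as soon as x1 and x2 agree on them (carry-free) */
    while (((e->x1 ^ e->x2) & 0xff000000u) == 0) {
        fputc((int)(e->x2 >> 24), e->out);
        e->x1 <<= 8;
        e->x2 = (e->x2 << 8) | 0xff;
    }
}

static void ar_flush(ArEnc *e)
{
    /* any value in [x1, x2] terminates the code; just spell out x1 */
    for (int i = 0; i < 4; i++) {
        fputc((int)(e->x1 >> 24), e->out);
        e->x1 <<= 8;
    }
}

int main(void)
{
    ArEnc e;
    ar_init(&e, stdout);
    int bits[] = {1,1,1,0,1,1,1,1,0,1,1,1};       /* toy input, mostly ones  */
    for (unsigned i = 0; i < sizeof bits / sizeof bits[0]; i++)
        ar_encode(&e, bits[i], 3500);             /* static P(1) ~ 3500/4096 */
    ar_flush(&e);
    return 0;
}

A decoder mirrors the same arithmetic on the byte stream; the per-bit
work above (one multiply, a compare and an occasional byte output) is
why the coding step is such a small fraction of the total time compared
to the modeling.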

Nevertheless I think you have an important contribution to coding
theory. It will be interesting to see if you can apply it to beat the
best BWT compressors like SBC and GRZipII in compression ratio and
speed.

-- Matt Mahoney

Ben Rudiak-Gould

unread,
Nov 25, 2005, 2:03:48 PM11/25/05
to
nightlight wrote:
> Roughly correct, except that it is not as bad as the difference N^N vs
> N! ~ (N/e)^N since factor e in N! cancels out in the binomial
> coefficients. The distinction left is the Stirling approximation itself
> and subsequent dropping of combined sqrt() factors and the O(1+1/n)
> factor, both smaller than 1, yielding thus larger addend than exact
> EC/QI addend (and the final path count), with excess of about 1/2
> log(Pi N pq) bits (p and q a probabilities of 1 and 0).

Indeed I was wrong, and I'm once again confused. If there are m zeroes and n
ones, then an ideal enumerative coder will divide the space of (m+n)!/m!n!
possibilities into pieces of size (m+n-1)!/(m-1)!n! and (m+n-1)!/m!(n-1)!
when encoding the first bit. But [(m+n-1)!/((m-1)!n!)] / [(m+n)!/(m!n!)] = m/(m+n).
Isn't this the same as an unlimited-precision adaptive order-0 arithmetic
coder running backwards, "downdating" its model until it ends up with m=n=0
at the end? If so, how could an EC outperform an adaptive order-0 AC acting
on the reversal of the input data?

Never mind, I just figured it out. The downdating AC is different from the
updating AC in that it effectively updates the model *before* encoding the
bit, instead of after. In particular, the first (last) input bit is always
predicted perfectly, and you can't surprise the model (you never have to
encode a zero bit with m=0 or a one bit with n=0). That's very cool.

But a disadvantage is that you have to send the final state of the model, at
a cost of log(N^2 pq) bits. For the updating AC you only need to encode the
input data length, at a cost of log(N) bits. So it's not immediately obvious
that EC comes out ahead for arbitrary input data. Still, I agree that it's
more elegant, and may well be much faster. And who cares about overhead
that's logarithmic in N, anyway?

-- Ben

David A. Scott

unread,
Nov 25, 2005, 3:36:59 PM11/25/05
to
"Matt Mahoney" <matma...@yahoo.com> wrote in
news:1132945030.8...@z14g2000cwz.googlegroups.com:

>
> I understand the extra 2*log(n) for AC comes from the need to encode
> the length of the uncompressed string (using a universal integer code
> such as an Elias code) or implicitly by modeing EOF. However it is
> also possible to use the length of the compressed string to encode
> information about the original length and need only log(1/H) bits more,
> where H is the compression ratio (e.g. 1 bit for a compression ratio of
> 0.5). Here is one example of how to do this using a bijective
> arithmetic coder. http://www.ics.uci.edu/~dan/pubs/DC-Sec3.html
>
>

Matt, I didn't see a bijective arithmetic coder there; are you sure
that's the right pointer? However, Matt Timmermann's page
http://www3.sympatico.ca/mt0000/biacode/biacode.html
does have the first true bijective arithmetic file encoder on the web.


David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **

Disclaimer:I am in no way responsible for any of the statements

Scott A Crosby

unread,
Nov 26, 2005, 2:35:00 AM11/26/05
to
On Fri, 25 Nov 2005 15:12:55 +0000 (UTC), "David A. Scott" <daVvid_...@email.com> writes:

>>>> The best of arith coders map 100% of their output space.
>>
>>
>> That is trivially false (provided that you haven't defined "the best
>> of arith coders" to be an empty set). Neither AC nor QI (nor any
>> finite precision coder) "maps 100% of their output space" (i.e. uses
>> all of the available output codes).
>
>
> Actually it appears your to lazy to actually check the fact is
> totally bijective file compressors exist and make 100% of file space.

So does the identity coder, e.g. 'cat'.

I don't see what 'map 100% of their output space' has to do with
coding efficiency.

Secondly, given a model of a set of data, it's perfectly reasonable to
say that one coder will always perform better than another.


How about proposing a concrete problem.

We have a file consisting of bits with a probability model $M$ of P(0
bit) = p, P(1 bit) = 1-p with no higher-order statistics, so all bits
are independent of each other. Assume that $p$ is already known and
that the length has been separately encoded.

Then, ignoring trivial cases, e.g. p=1, p=0, it's clear that Huffman
will never shrink the file under the assumption of bits coded one at a
time. Or, if Huffman is allowed to encode multiple bits at a time, it
still will usually do worse than ARI. Also, ARI will always shrink the
file if p != .5.

Ignoring these special cases, ARI will always do better than Huffman.
Also, ARI should come fairly close to the long-run entropy of the
file, $E = -(p*log(p) + (1-p)*log(1-p))$.

More formally, for every epsilon, there should exist constants A,B such
that for any file matching the model $M$ of length $n$ encoded to
length $len$, with probability 1-epsilon it should be true that:

len < A*n + B

From information theory it must be that $A>=E$, but it's believable
that A-E>0, because of round-off or quantization errors in any real
coder. It's also believable that B>0 because of round-off or
quantization errors near the end of the file. Does anyone know if
there's an arithmetic coder that, for epsilon < .5, has constants $A=E$
and $B=0$?

Unless I'm misunderstanding nightlight, all he's saying is that he's
shaving the constants A&B compared to the best known arithmetic coder
by using a new paradigm. If he has done that, then I have a few
questions:

Can the algorithm be easily generalized to models with more
symbols, e.g. bytes instead of bits? Can it be generalized to handle
data from a higher-order process, or do you need the statistics to
be simplified down to order-0 with something like BWT? Is it faster
in these circumstances than ARI? Can you compare the A&B constants
with an ARI?

Also, nightlight, remember that this is a tough audience that
frequently gets claims about new paradigms in compression. A simple
working program that can create, encode, and decode the above files
using both your algorithm and an arithmetic coder would be much more
convincing. I hope to see such an example.
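
In the spirit of that test, here is a throwaway C harness that writes n
ASCII '0'/'1' symbols from a Bernoulli(p) source and reports the
order-0 entropy of what was actually generated, i.e. the size in bits
that an ideal order-0 coder is up against for that particular file. The
RNG, the file name and the default parameters are arbitrary choices for
illustration only.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(int argc, char **argv)
{
    unsigned long n = (argc > 1) ? strtoul(argv[1], 0, 10) : 10000;
    double p0      = (argc > 2) ? atof(argv[2]) : 0.9;    /* P('0') */
    const char *fn = (argc > 3) ? argv[3] : "bits.txt";

    FILE *f = fopen(fn, "w");
    if (!f) { perror(fn); return 1; }

    unsigned long k0 = 0;                  /* count of '0' symbols emitted */
    srand(12345);                          /* fixed seed so runs repeat    */
    for (unsigned long i = 0; i < n; i++) {
        int zero = (rand() / (RAND_MAX + 1.0)) < p0;
        fputc(zero ? '0' : '1', f);
        k0 += zero;
    }
    fclose(f);

    double q0 = (double)k0 / n, q1 = 1.0 - q0, h = 0.0;
    if (q0 > 0) h -= q0 * log2(q0);
    if (q1 > 0) h -= q1 * log2(q1);
    printf("%s: %lu symbols, %lu zeros; empirical order-0 entropy %.4f bpc "
           "(~%.0f bits total)\n", fn, n, k0, h, h * n);
    return 0;
}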

Scott

Willem

unread,
Nov 26, 2005, 4:25:46 AM11/26/05
to
Scott wrote:
) On Fri, 25 Nov 2005 15:12:55 +0000 (UTC), "David A. Scott" <daVvid_...@email.com> writes:
)
)>>>> The best of arith coders map 100% of their output space.
)>>
)>>
)>> That is trivially false (provided that you haven't defined "the best
)>> of arith coders" to be an empty set). Neither AC nor QI (nor any
)>> finite precision coder) "maps 100% of their output space" (i.e. uses
)>> all of the available output codes).
)>
)>
)> Actually it appears your to lazy to actually check the fact is
)> totally bijective file compressors exist and make 100% of file space.
)
) So does the identity coder, e.g. 'cat'.

True. And there is no compressor that is 'always better' than cat.
I'll even go so far as to say that 'cat' is the best compressor for
random data.

) I don't see what 'map 100% of their output space' has to do with
) coding efficiency.

The only criterion by which you can absolutely claim that a coder is
'always better' than another coder is when it maps more of its output
space. Which is what this discussion is about.

) Secondly, given a model of a set of data, its perfectly resonable to
) say that one coder will always perform better than another.

Only under the assumption that the model is perfect. In practice, it is
never even near perfect. Even in theory, it is only as good as the number
of bits needed to transmit it. Therefore, it is not at all reasonable.

) How about proposing a concrete problem.
)
) We have a file consisting of bits with a probability model $M$ of P(0
) bit) = p, P(1 bit) = 1-p with no higher-order statistics, so all bits
) are independent of each other. Assume that $p$ is already known and
) that the length has been separately encoded.

I reject this problem, on the basis that it is constructed to make the
model fit the data perfectly.


SaSW, Willem
--

Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be

Malcolm Taylor

unread,
Nov 26, 2005, 4:38:58 AM11/26/05
to
> Can the algorithm be easily generalized to models with more
> symbols, e.g. bytes instead of bits? Can it be generalized to handle
> data from a higher-order process, or do you need the statistics to
> be simplified down to order-0 with something like BWT? Is it faster
> in these circumstances than ARI? Can you compare the A&B constants
> with an ARI?

It can be used for bytes by doing a binary decomposition.

It is my understanding that the enumerative coding method can only cope
with modelling switching sources. That is, the data must be split into
classes based on the statistics, and the classes are coded entirely
independently.
For example, you could split the data according to the context (e.g.
one class is all bits following '01', another all bits following '10');
this ironically leads to BWT, with the output segmented on context
boundaries (it can also be done on bytes).

The downside of the enumerative method is that it will never be able to
cope with complex statistical models that use mixtures or weighting
(e.g. paq or the more recent PPM variants). Still, it doesn't really
provide any advantages for them anyway, so it is not too much of a loss.

I do look forward to the BWT based on it, though. It seems ideally
suited, and if the claims of speed are correct, then we could have a
BWT with the speed of a Huffman version but the compression of the best
arithmetic versions. It could potentially provide a new efficiency
benchmark!

Malcolm

PS. David, please cut him some slack. He has only displayed the same
zeal for his ideas, that you yourself have displayed for bijective
coders. We should be applauding that, not trying to tear him down.

nightlight

unread,
Nov 26, 2005, 10:09:30 AM11/26/05
to
> It is trivially true, because actual real-world
> arith encoders exist that map 100% of their
> output space.

Before you dig yourself even deeper on this subtopic, you would be well
advised to study a bit what redundancy means, such as the AC precision
redundancy log(e)/2^(g-2), and how its presence affects the utilization
of the code space (read the rest of my earlier post for a hint, since
it is all much more obvious with the QI bottom-up SW integer rounding;
then note that the AC addends and code can be obtained by dividing all
EC addends by the final binomial on the lattice path, which merely
normalizes the EC index to 1.0; cf. [23] pp. 19-26). In particular,
compare the outputs of AC coders with different values of the precision
g, including the limits of infinite g and of g=1 or 2, then contrast
what you will realize there about the code space utilization of these
different coders with your present understanding, in which the
redundancy, such as that of any finite-precision AC, paradoxically has
no effect on, or apparently any relation at all to, the output code
space utilization. Consider a counter-example: code an alphabet of 128
letters, but using symbol codes having just 1 bit of redundancy, i.e.
you're coding with 8 bits per symbol; then look at what percentage of
these 8-bit codes you are using when coding these 128 letters (128 of
the 256 possible 8-bit codes, i.e. only half of the code space per
symbol, even though the code is a mere 1 bit/symbol away from optimal).

> There are also actual real-world huffman encoders
> that map 100%, by the way.

A Huffman code is 'more optimal' than finite-precision AC or EC (QI),
provided one can build a Huffman tree for an alphabet whose 'letters'
are the size of the entire input. Then Huffman is absolutely the
tightest code one can have. Short of that ideal, the regular 32-bit AC
(or QI) is more optimal than a Huffman coder limited to coding the
input in character-size chunks.


23. R.V. Tomic "Quantized indexing: Background information"
1stWorks TR05-0625, 39p, Jun 2005
http://www.1stworks.com/ref/TR/tr05-0625a.pdf

More at:

http://www.1stworks.com/ref/qi.htm
http://www.1stworks.com/ref/tr/tr05-0625r.htm

David A. Scott

unread,
Nov 26, 2005, 2:39:44 PM11/26/05
to
nightlight <nightlight...@and-this.omegapoint.com> wrote in
news:BaqdnUhjWdH...@rcn.net:

> > It is trivially true, because actual real-world
> > arith encoders exist that map 100% of their
> > output space.
>
> Before you dig yourself even deeper on this subtopic,

Do you even know what a bijective arithmetic file compressor is? They
actually exist, by the way. Or do you only quote papers written by
those who don't have real-world bijective file compressors and maybe
aren't even sure of their existence?

Can you actually answer a question without quoting a paper, especially
when working code exists?

What is it about the fact that it's easy to write bijective arithmetic
file compressors that you don't understand?

I will cut you some slack, as someone suggested, even though you seem
very hostile. I could make your combinatorial coder bijective to files,
and it would compress much better than where it seems to me you're
headed. In the end, would it be better than the arithmetic coder? The
answer is no. Each would compress some files better and some worse,
which is the best that can be done. Would it be faster? I don't know.

David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **

Disclaimer:I am in no way responsible for any of the statements

David A. Scott

unread,
Nov 26, 2005, 3:31:35 PM11/26/05
to
Willem <wil...@stack.nl> wrote in
news:slrndogagq...@toad.stack.nl:

I agree it's a bad test. A better test, if a trusted individual can be
found to run it, might be to produce random strings of bits from an
accepted source with no higher-order statistics, where p(0) = p could
vary. Note that almost none of the strings would actually show the
ratio unless they are very long. The strings could be from 1 bit to
10000 bits long. Each program would have to be able to use any number
of bits to compress a string. ASCII "1" and "0" could be used instead
of real bits to make the test more readable. The coders don't even have
to be bijective, though there should perhaps be a lower limit on the
string length, to give the other side half a chance.

The input could be a file of only ASCII ones and zeroes, and the same
for the output. The file names could be fixed and the test recreated on
any machine using C, so others could verify the results. The code would
have to be checked so that neither side relies on the hard limits: if,
say, 10000 bits is the stated maximum, what happens when 10001 are
used? The code should be such that the upper limits are at least in the
millions.

The result might be that one coder works better than the other for
various values of p or for various lengths of strings. Obviously, when
the numbers of ones and zeros are close to equal and randomly
distributed, the plain copy will be the best.

Also, once compressed, you have to show that it decompresses back to
the original string.

David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **

Disclaimer:I am in no way responsible for any of the statements
