Quantized Indexing Source Code Available


nightlight

Dec 24, 2005, 7:36:18 AM
Quantized Indexing is a new entropy coding algorithm
based on enumerative combinatorics of symbol sequences.
It typically codes 10-20 times faster than the fastest
arithmetic coders (over 200 times faster in the low entropy
limit and over 6 times faster in the high entropy limit),
while compressing more tightly across all input ranges.
The Quantized Indexing web page:

http://www.1stWorks.com/ref/qi.htm

contains preprints describing the algorithm and the C source code
for algorithms researchers. The source code contains several
variants of QI coding, such as generic binary, sparse binary, and mixed
radix (including factorial radix and an optimal permutation
encoder/decoder).

The sparse binary coder runs an additional 2-5 times faster on sparse
data than the generic QI coder (note that in our benchmarks against AC
we used only the generic QI coder). Sparse coding is useful for
compression of B&W images, outputs of other higher level coders (such
as BWT output), as well as for coding the frequency tables of the multi-
block version of QI (or of other entropy coders). It would be particularly
useful for encoding database and web search engine bitmap indexes
and keyword incidence maps, which, at least for the major engines
such as Google, Yahoo and MSN, are billions of bits long and extremely
sparse. With that type of data the generic QI, and especially the
sparse QI coder, runs at the speed of run-length coders (which is
2-3 orders of magnitude faster than the AC or LZ type coders that are
often used in this role), but without the fragility of RL or
adaptive AC coders when they hit small clusters of denser data.

The source also contains a little tutorial program for general
enumerative coding (which in enumerative combinatorics is called
ranking & unranking), showing step-by-step encoding & decoding of
user-entered short binary or hex inputs, with all bit strings and
indexes shown in full as they transform. Another small tutorial
feature provides details of permutation coding and factorial radix.
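
(As a quick worked example of ranking, using the colex convention of
the 32-bit routines posted in the next message: the 4-bit string with
ones at bit offsets 1 and 3 ranks to C(1,1)+C(3,2) = 1+3 = 4, one of
the C(4,2) = 6 possible index values 0..5 for 4-bit strings containing
two ones. The tutorial program's own display conventions may of course
differ in detail.)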

Another set of functions included shows in detail the properties of the
quantized binomial tables, including all gaps in the output code space
(this was a subject of some controversy in the earlier thread) and
various types of redundancies due to quantization.

A recent thread here has additional discussion & links about the QI
algorithm:

"New combinatorial coder: tighter & faster than Arithmetic coding"
http://groups.google.com/group/comp.compression/browse_thread/thread/ffc7f7ca84c76378

A few posts with additional info:

http://groups.google.com/group/comp.compression/msg/ae53ee708dae058c
http://groups.google.com/group/comp.compression/msg/6418d7524322c6c3
http://groups.google.com/group/comp.compression/msg/d601691935c4c116
http://groups.google.com/group/comp.compression/msg/a656f2937e768fa3
http://groups.google.com/group/comp.compression/msg/57984b354bb2ec1a
http://groups.google.com/group/comp.compression/msg/368b484e5c26dce3
http://groups.google.com/group/comp.compression/msg/b89ea4829c82659c

nightlight

Dec 24, 2005, 8:08:35 AM
As an illustration of how simple enumerative coding is, below is
the actual source code for exact EC (for 32-bit inputs only,
although one could extend it to 64 bits fairly easily). Use a
fixed-width font to display this post properly.

//-- Encode bit-string x to enum index I

dword enc32(dword x)
{ int k, n;
  dword I;
  k = 1; I = 0;
  while (x)             // assumes that unused/higher bits in x are 0
  {
    n = loBit(x);       // get bit offset of the lowest bit set to 1
    x &= x - 1;         // clear the lowest bit=1 in "buffer" x
    I += bc32(n, k++);  // add binomial C(n,k) to the index, k=# of ones
  }                     // increment count of ones, k
  return I;             // return enumerative index I
}

//-- Decode enum index I to bit-string

dword dec32(dword I, int n, int k)
{ dword x, b;
  x = 0;
  do {
    x <<= 1;            // fill in decoded bit as 0 (at position 0)
    b = bc32(n, k);     // find the largest binomial coefficient C(n,k)<=I
    if (I >= b)         // check if we can subtract b from I
    {                   // ==> yes, decoded bit is 1
      I -= b; ++x;      // reduce index I and set decoded bit=1
      if (!--k)         // decrement count of 1's and stop if no more 1's left
      {
        x <<= n;        // pad the rest of output with 0 bits (in the low bits)
        break;          // leave the decoding loop
      }
    }                   // ==> no, decoded bit is 0; try next smaller b(n,k)
  } while (--n >= 0);   // this loop can be made to go faster using
  return x;             // the binary (instead of sequential) search for n
}
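
To actually run the toy coder above you also need the two helpers it
assumes, bc32() and loBit(). Below is one possible minimal version of
them, plus a round-trip check, written just for this post: bc32() here
is an exact Pascal-triangle table, whereas the real QI source uses
quantized tables and faster bit-scan macros, so treat this strictly as
a sketch. The helper definitions go above enc32/dec32 in the file and
main() below them; the dec32() arguments follow my reading of the
routine: n is the highest bit position (length-1), k the count of ones.

#include <stdio.h>

typedef unsigned int dword;     // 32-bit unsigned assumed

static dword BC[33][33];        // BC[n][k] = C(n,k) for n,k <= 32

static void initBC(void)        // exact Pascal triangle (fits in 32 bits)
{ int n, k;
  for (n = 0; n <= 32; n++)
  { BC[n][0] = 1;
    for (k = 1; k <= n; k++)
      BC[n][k] = BC[n-1][k-1] + BC[n-1][k];
  }
}

static dword bc32(int n, int k) // C(n,k); 0 when k > n or n < 0
{ return (n < 0 || k < 0 || k > n) ? 0 : BC[n][k]; }

static int loBit(dword x)       // bit offset of the lowest 1 bit, x != 0
{ int n = 0;
  while (!(x & 1)) { x >>= 1; ++n; }
  return n;
}

int main(void)                  // round trip: x -> index -> x
{ dword x = 0x94, I, y;         // 10010100b: 8 bits, ones at offsets 2,4,7
  initBC();
  I = enc32(x);                 // C(2,1)+C(4,2)+C(7,3) = 2+6+35 = 43
  y = dec32(I, 8 - 1, 3);       // (index, highest bit position, # of ones)
  printf("x=%u index=%u decoded=%u\n", x, I, y);
  return 0;
}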

nightlight

Dec 29, 2005, 6:05:18 AM
I received some questions about the source and the algorithm via email,
and since the matter may be of interest to others who downloaded it,
I will reply here as well:

> Concerning the source, I am afraid it is of little use in this form.
> I don't run windows here, and I do not feel like I would really need it.
> Thus, if you'd like to present your work, it would be most useful to
> depend on ANSI-C only.

The main nonstandard VC6/win32 C use in the source has to do with the
nanosecond timer & thread control (in Qiutl.c). That was added in case
users wish to check the precise performance against other coders.
There are a few additional macros in Qitypes.h, such as hiBit(), loBit(),
and shifts & mul/div for various 32/64 bit mixes (which VC6 has implemented
very poorly and which are used in the mixed radix code). For the next
release I'll try to add some control macros in the top level
application header "Intro.h" which select how these currently
nonstandard C elements are implemented, with an ANSI C variant supported.
The matter of endianness and word size is a bit more involved to extend,
and any future generalization there depends on requests from people
experimenting with the code. Right now the coders assume a 32-bit little
endian architecture. Extension to 64 bits should be fairly simple (and
may be included within the next few revs), while big endian support, if it
comes up, will require more line-by-line work through the routines.
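
In the meantime, a portable fallback for the two bit-scan macros is
easy to write. The sketch below is just an illustration of the idea in
plain ANSI C, not the actual Qitypes.h macros (which use VC6 inline asm
and may differ in details such as behavior for a zero argument):

/* Illustrative ANSI C replacements for loBit()/hiBit(); slower than
   the inline-asm versions but portable. */
typedef unsigned int dword;      /* assumes a 32-bit unsigned int */

static int loBitC(dword x)       /* offset of the lowest set bit, x != 0 */
{ int n = 0;
  while (!(x & 1)) { x >>= 1; ++n; }
  return n;
}

static int hiBitC(dword x)       /* offset of the highest set bit, x != 0 */
{ int n = -1;
  while (x) { x >>= 1; ++n; }
  return n;
}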

> Concerning the algorithm itself: This pretty much reminds me
> of the ELS coder, and I think it might only be beneficial to check
> out some papers on this matter.

The ELS coder is a member of the AC family of algorithms. It is
essentially a rediscovered variant of Rissanen's original 1976 AC coder,
AC-76 (which presents that kind of additive integer AC in a mathematically
much cleaner and more rigorous way than the ELS paper from
Pegasus does for their version):

5. J. Rissanen, "Generalised Kraft inequality and arithmetic
coding", IBM J. Res. Dev. 20, 198-203, 1976
http://www.research.ibm.com/journal/rd/203/ibmrd2003B.pdf

The key differences between EC/QI and AC-76 (or ELS) coding
are discussed in [T2] pp. 19-25. The obvious similarity to EC/QI is
not accidental -- Rissanen was trying to solve the very same EC
precision problem that QI solves (this story is discussed in
more detail in Rissanen's 2nd AC paper:

27. J. Rissanen, "Arithmetic codings as number representations",
Acta Polyt. Scand., Math. & Comp. Sc. Vol. 31, pp. 44-51, 1979
http://www.1stworks.com/ref/ariNumRepr.pdf

as partially recounted in the intro section of [T3] and a chapter in
[T2]). As explained in [T2] pp. 22-23, he could not find the right
quantization for the exact enumeration. So he approximated the
unlimited precision enumeration first (Stirling, etc.), then found
how to quantize the latter. QI performs the quantization of exact
EC directly, without approximating the exact enumeration. The
benefits of the new procedure relative to AC-76 (or its rediscovery,
ELS) are described in detail in [T2] pp. 19-25. Both the coding
accuracy and the speed are improved by the new procedure (the speed
particularly dramatically).

Regarding the speed difference, you should note that QI performs
no coding operations on the most probable symbol, other than
skipping it (at the memory bus speed), while AC-76 & ELS do have
to update a much larger coder state (beyond just the bare count of
symbols processed). As explained in [T2], AC-76 (or ELS) could
construct tables which would allow such skipping, but they would
need a separate table for each source probability.

> It seems to me that your algorithm is a special form of ELS tuned
> to multinomials.

QI is not a special case of ELS or AC-76 -- it is a
more accurate and a much faster solution of the same
problem. AC-76 and the later ACs can be obtained
via approximation of QI (cf. [T2] pp. 22-23). Note also that
multinomials are only a special case of EC/QI addends --
they occur for an order-0 Markov source. E.g. Cover's paper
[1] shows an exact EC order-1 binary Markov enumerator.
Oktem's thesis [23] has several other types of addends,
some of which don't even have closed forms (or at least they
didn't check the combinatorial literature well enough).
Constrained coding (cf. [26] and other work by Immink)
also has numerous other types of addends.

In short, the multinomials are just one type of EC/QI
addend. Even the QI sample source includes several
non-multinomial types of addends, such as powers and
factorials for radix & permutation coding. Similarly, coding
of trees would use Ballot numbers (which reduce to Catalan
numbers on the main lattice diagonal x=y).

> Whether this can be exploited for general-purpose
> compression is not yet clear to me.

I think the difficulty that this and several other of your
questions show is a result of using the AC modeling paradigm
to try to figure out how one would model with EC/QI.
EC/QI doesn't model that way at all. This was discussed
at some length in [T2] pp. 26-35 (see especially pp. 30-31
for a direct comparison with the AC modeling paradigm/pattern).

Briefly, the division of labor between the coder and
modeler in EC/QI is different from that between AC & its modeler.
EC/QI expects the modeler to perform a particular kind
of partition of the input, which hands the coder instances of
equiprobable "messages" for enumeration (computation
of the index). The "partition" or "segmentation" is a literal
segmentation of the input sequence only in simple cases
(such as a quasi-stationary order-0 source). Generally,
this is not the case. E.g. for a known order-M
quasi-stationary Markov source, one would partition
EC output into M separate streams -- each symbol is
coded via EC/QI into a stream selected based on the
last M symbols. For an unknown source, the BWT algorithm
performs such a partition in its output column (cf. [T2]
pp 34-35; note that I use only the transform of BWT,
not the MTF or other entropy coding usually done in the
2nd phase of BWT). The BW transform can be used as a
general-purpose segmentation module of the EC modeling
engine. The second half of an earlier reply to Matt Mahoney
gives some more detail on this:

http://groups.google.com/group/comp.compression/msg/2c769e3a278a62f4?hl=en&

The AC and EC modeling schemes belong to different schools of
thought in information theory, AC to the Shannon school, EC to the
Kolmogorov school. The latter, due to the absence of a practical
coding algorithm, has been living in an undeserved shadow. I think
that its potential has yet to be tapped and that it will prove
itself a much more powerful modeling scheme than what is done
today using AC (which functions through a modeling bottleneck,
as argued in [T2]). The future will tell.

References
--------------------------------------------------------------------------
T1. R. V. Tomic "Fast, optimal entropy coder"
1stWorks TR04-0815, 52p, Aug 2004
http://www.1stworks.com/ref/TR/tr04-0815b.pdf

T2. R. V. Tomic "Quantized indexing: Background information"
1stWorks TR05-0625, 39p, Jun 2005
http://www.1stworks.com/ref/TR/tr05-0625a.pdf

T3. R. V. Tomic "Quantized Indexing: Beyond Arithmetic Coding"
arXiv cs.IT/0511057, 10p, Nov 2005 (also: 1stWorks TR05-1115)
http://arxiv.org/abs/cs.IT/0511057

Additional relevant papers are at:

Quantized Indexing RefLib page:
http://www.1stworks.com/ref/RefLib.htm

Quantized Indexing Home Page:
http://www.1stworks.com/ref/qi.htm

nightlight

Dec 29, 2005, 6:20:01 AM
> for a known order-M quasi-stationary Markov source,
> one would partition EC output into M separate streams
> -- each symbol is coded via EC/QI into a stream selected
> based on the last M symbols.

Ooops, there is a bad typo up there. The fragment:

"into M separate streams"

should say:

"into up to A^M separate streams (where A is alphabet size)"

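To make this stream-partition idea concrete, here is a rough sketch for
a binary alphabet (A=2) and a known order M, so up to A^M = 2^M buckets.
The bucket layout and names below are purely illustrative (the actual QI
modeling code may organize this quite differently); each bucket would
then be handed to the order-0 coder separately:

#define M     3
#define NCTX  (1u << M)               /* up to A^M contexts for A = 2      */
#define MAXN  4096

typedef struct { unsigned char bits[MAXN]; int len; } Bucket;

static Bucket bucket[NCTX];           /* one output stream per context     */

static void partition(const unsigned char *bit, int n)
{ unsigned ctx = 0;                   /* last M bits, initially all zero   */
  int i;
  for (i = 0; i < n; i++)
  { Bucket *b = &bucket[ctx];
    if (b->len < MAXN)
      b->bits[b->len++] = bit[i];     /* route symbol by its context       */
    ctx = ((ctx << 1) | bit[i]) & (NCTX - 1);   /* slide the context       */
  }
}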
nightlight

Dec 30, 2005, 9:52:42 PM
> Quantized Indexing RefLib page:
> http://www.1stworks.com/ref/RefLib.htm

There were two bad links in the RefLib.htm file. The missing files were
Ruskey's Combinatorial Generation [38] and Potapov's Theory of
Information [37] textbooks. Strangely, while there were many failed
attempts to download these files (404 errors), no one left any feedback
to fix the links. Anyway, the files should be OK now.

nightlight

Jan 4, 2006, 3:40:05 AM
There was a small update to the source. The main addition (requested
in several emails) was an option:

#define _ASM 1 // Set to 0 to disable inline asm

in Qitypes.h, which allows disabling the VC6 inline asm (used in
some macros). The speed of operations (mostly the radix codes decoder)
drops by a few percent without the inline asm. There were also some
minor compiler warnings that some people emailed about; these were
cleaned up in the latest code (I left the file name unchanged so the
old links still work):

http://www.1stworks.com/ref/C/QIC100.zip

There is also a thread about the algorithm's discovery in the Computer
Chess Club ( http://www.talkchess.com/ <-- signup screen), where the
seed for it was planted way back in the last century. Here is an
excerpt for those interested in that aspect:

------ Computer Chess Club thread excerpt (few typos fixed) ----------

"Number of positions in chess -- The rest of the story"
http://www.talkchess.com/forums/1/message.html?476509

Posted by Ratko V. Tomic on January 03, 2006 at 08:12:12:

> Uri Blass: There is a better upper bound see:
>
> http://chessprogramming.org/cccsearch/ccc.php?art_id=77068
>
> Uri

Hi Uri, that little exercise in enumeration you brought up
(and mentioned in the other thread) set me off back then to
try to make it work as a general purpose compression algorithm.
While a neat idea on paper, the problem was that the arithmetic
precision had to be of the size of the output.

After struggling for a while, I searched the literature and
it turned out such a compression algorithm, called "Enumerative
Coding", had already existed since the 1960s (first in the Russian
literature, from Kolmogorov and his disciples, then shortly
thereafter here in the USA, from Lynch and Davisson). And, as
in my version, the precision problem was still unsolved after
over four decades of various attempts to make the algorithm
practical.

Since I arrived at it on my own, my conventions for
enumeration happened to be backwards from those in
the literature (mine sorted right to left,
the so-called colex sorting of combinations, and built
up the enumerative addends bottom up, while the standard
scheme sorted lexicographically & worked recursively top
down, plus all my pictures were rotated 45 degrees from
theirs). Further, due to my playing with lattice methods
in QCD (in my physics graduate school days), I also had
my own visual representation of combinatorics as lattice
walks, which is a very intuitive, heuristically rewarding way
of looking at it, allowing one to see all of the combinatorial
identities at a glance (especially useful for tossing and
checking out algorithms in the head when going to sleep
or waking up, without a pencil and paper). The lattice
formulation turns out to have existed in the EC literature
as well (as Schalkwijk's Pascal triangle walks), although
not in as general or elegant a formalism as mine, lacking even
a notation for the lattice walks, key sets of paths, enumerative
classes, constraints... (stuff I worked out while doing physics).

Since that thread, I kept returning to the problem, on and
off, trying various ideas. Nothing worked. Then, in summer
2004, when my wife and kids went to a summer camp for
a week, I stayed home to work on a programming project
(a video codec). The first night home alone, at about 2AM,
while debugging the latest batch of code, out of nowhere
an idea popped into my head on that pesky enumeration
problem, something I hadn't yet tried. I quickly coded just a
toy version, allowing input buffers of 32 bits only, and by
dawn it worked -- a version using arithmetic precision of only
8 bits encoded & decoded correctly all 2^32 possible inputs.

That same early Sunday morning, it must have been around 6AM,
I called and woke up the company owner (I am the CTO & chief
scientist), and he, still half awake, yielded to my enthusiasm and
agreed to suspend the original project, so I could see whether the
idea worked on data of any size.

At the time I didn't have a proof, and wasn't sure even at the
heuristic level, that it could always be decoded. I also didn't
know what maximum or average redundancy would result
from the reduced precision. Within a week I had a simple
version of the code working on buffers up to 4 kilobits, using only
16-bit arithmetic precision (instead of 4 kilobit precision).
It worked again, and it was very fast, even in that crude version.
The redundancy due to the limited precision arithmetic was
measured, and it was on average about 0.05 bits (and always
below 0.07 bits) for the entire 4k block.

Over the next couple of months I extended the algorithm to any input
size and to a general alphabet (from the original binary alphabet).
I also found a proof of general decodability and an expression
for the max redundancy due to finite precision. The max
redundancy is always below log(e)/2^(g-1) for g-bit arithmetic
precision (I now use g=32 bit precision). The forty-year-old
puzzle was finally cracked. The accidental backwards conventions
of my initial approach turned out to be the critical element
exposing the key to the solution, which is virtually impossible
to spot from within the conventional enumerative coding scheme.

I also developed a new modeling scheme for combinatorial methods
of coding (such as the new algorithm) which is quite promising
on its own. It is basically a scheme along the lines of Kolmogorov's
algorithmic approach to information theory (in contrast to Shannon's
probabilistic approach, which is dominant at present, and in which
modeling for arithmetic coding consists of calculating the
probabilities of the next single symbol).

The algorithm, which I named "Quantized Indexing", turned out
pretty amazing. It codes always tighter than the present best
entropy coding algorithm, the Arithmetic Coding (which is only
a particular approximation of QI), yet it codes much faster than
AC due to using only a simple table add (of a machine size word)
for the less frequent symbol and no coding operations for the
most frequent symbol (AC needs coding operations for both types
of symbols, and more expensive operations at that, such as mul, div).

As a result, QI typically runs 10-20 times faster than the
fastest full Arithmetic Coder implementations (and always at
least 6 times faster, which occurs in the high entropy limit,
while for very sparse data, the low entropy limit, QI runs
well over 200 times faster).

Recently, I posted a preprint about the algorithm to arXiv:

http://arxiv.org/abs/cs.IT/0511057

and also created a web page with additional, more detailed
technical reports and the C source code:

http://www.1stworks.com/ref/qi.htm

A nice little surprise came when I emailed about the
preprint to Jorma Rissanen, the inventor of arithmetic coding
himself. He had struggled for several years with the very
same enumerative coding precision problem, inventing in the
process arithmetic coding as a compromise solution (in 1976,
while working for IBM). Although busy with a lecture tour,
he read the paper right away and was quite pleased to see
his old problem solved at last and a bit surprised at how
"very clever" the solution turned out to be.

So, that's the chain of unlikely events triggered by your
original code snippet for enumerating the chess positions.

David A. Scott

Jan 4, 2006, 6:17:17 AM
"nightlight" <night...@omegapoint.com> wrote in
news:1136364005.1...@g43g2000cwa.googlegroups.com:

>
> The algorithm, which I named "Quantized Indexing", turned out
> pretty amazing. It codes always tighter than the present best
> entropy coding algorithm, the Arithmetic Coding (which is only
> a particular approximation of QI), yet it codes much faster than
> AC due to using only a simple table add (of a machine size word)
> for the less frequent symbol and no coding operations for the
> most frequent symbol (AC needs coding operations for both types
> of symbols, and more expensive operations at that, such as mul, div).
>
>


I know you claim to have this great code. But others, such as Matt,
the inventor of the PAQ coders, at least wrote a simple arithmetic coder,
FPAQ0, to show how his methods would compete on files where zero order
arithmetic coding would come into play. Since your model is always
"tighter than the present best entropy coding algorithm", do you
actually have test code to compare with real files and test sets? Or is
the method not yet advanced enough to do real compression on files
yet?

Don't get me wrong, Matt is not saying FPAQ0 is the best entropy
coder; he knows it isn't that good. He is just showing how it works
for a certain model. The same sort of thing could be done with your
method. At least, that is, if your method is comparable at all with
real world entropy coders. And since it would use the same model as
most, one could calculate the true entropy versus standard test models.

If one does this, one can see Matt's code does not produce on average
the shortest file. Maybe yours could do better if you ever actually
get real working code. But I find it hard to believe it could compress
as well as some current entropy encoders.

Where Matt's code shines is in his use of various models and a slick
way to combine them, which makes his family of PAQ coders among the
best on the net. The point is, if your code is half as good as you
claim, then his simple 2-state entropy coder could be replaced by
your faster and tighter 2-state coder, which would bring your name fame.
But I won't hold my breath.


David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptographic
system is only as strong as its weakest link"

nightlight

Jan 4, 2006, 8:00:59 AM
> "tigther than the present best entropy coding algorithm" do you

> have actually test code to compare with real files and test sets.
> Or is the method not yet advanced enough to do real compression
> on files yet.

You can download the source code which shows the coding aspects where
QI is essentially different from existing entropy coding
algorithms. All that the source will show you is that QI is a more
accurate and much faster entropy coder than any AC you may have, tested
under the same conditions (e.g. same model, same coding of model
information, of data lengths, counts, output serialization or
packaging... etc). There are some quasi-arithmetic coders which run
faster than the full AC (paying for the speed in extra output redundancy),
but even these won't run 20 or 30 times faster, let alone 200 times
faster than the full AC (they're usually 2-3 times faster), as QI does.
But if you have such a coder, you're welcome to test it. I would be
curious to know.

The source code itself is not a file archiver or video codec or any
such higher level application. Any differences for these higher level
applications, however interesting they may be otherwise, are a simple
consequence of the fundamental differences:

a) AC always codes with greater redundancy than QI (under the same
coding conditions, obviously; this is the result of AC being a direct
approximation of QI, see [T2] pp. 19-25, the chapter on exactly that
difference, how large it is and for which data, with additional
details on AC in [40],[41],[41a],[41b]) and

b) AC codes much more slowly than QI due to:

.. b1) AC has to perform coding operations for all input symbols, while
QI can just skip over the most probable symbols at memory speed (you
can see the top of the coding loop in EncDec.c, where it merely scans
memory for the less probable symbol, 32 symbols per loop step, at
basically memory scan speed; a sketch of this kind of loop is given
right after b2 below), and

.. b2) AC performs more complex operations for the least probable
symbols (which QI also needs to encode explicitly), i.e. mul/div vs a
simple array lookup and add. This difference, which remains even for
incompressible data (where the least & most probable symbols are
approximately equally likely), allows QI to still code at least 6 times
faster than the full AC even in the high entropy limit.
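
(A rough sketch of the kind of loop meant in b1, not the actual
EncDec.c code: whole 32-bit words containing no 1s cost a single
compare, so long runs of the most probable symbol are skipped at
memory-scan speed, and work is done only where a 1 is found.)

typedef unsigned int dword;

static int lo_bit(dword w)              // offset of lowest set bit, w != 0
{ int n = 0;
  while (!(w & 1)) { w >>= 1; ++n; }
  return n;
}

static long count_coded(const dword *buf, int nwords)
{ long coded = 0;                       // symbols needing real coding work
  int i;
  for (i = 0; i < nwords; i++)
  { dword w = buf[i];                   // 32 input symbols examined per pass
    while (w)                           // all-zero words fall through at once
    { (void)lo_bit(w);                  // real coder: add table entry here
      w &= w - 1;                       // clear that 1, look for the next
      ++coded;
    }
  }
  return coded;
}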

All that is, of course, measurable using the source code provided
(which also includes very accurate timing functions). The above are not
religious claims or an invitation to believe or debate belief systems, but
a simple statement of easily verifiable empirical facts. If you need to
know also how it would do on a "real" file, and can't extrapolate from
how it does on memory buffers filled with arbitrary content, well, you
are welcome to add such file-related code and try it. Now, if you do
add file i/o which takes hundreds of times longer than the coding, I can
predict you will see almost no speed difference.

-- References ( http://www.1stworks.com/ref/RefLib.htm )

T2. R. V. Tomic "Quantized indexing: Background information" 1stWorks
TR05-0625, 39p, Jun 2005
http://www.1stworks.com/ref/TR/tr05-0625a.pdf

40. J.C. Kieffer "Second Order Analysis of Data Compression
Algorithms" (Preprint from J.C.K. lectures)
http://citeseer.ist.psu.edu/370131.html

41. M.Drmota, H-K. Hwang, W. Szpankowski "Precise Average Redundancy
of an Idealized Arithmetic Coding" DCC 2002, 222-231.
http://citeseer.ist.psu.edu/drmota02precise.html

41a. P.A.J. Volf "Weighting Techniques In Data Compression: Theory
and Algorithms" Ph.D. thesis, Eindhoven University of Technology, Dec
2002
http://alexandria.tue.nl/extra2/200213835.pdf

41b. B. Ryabko, A. Fionov "Fast and Space-Efficient Adaptive
Arithmetic Coding" Proc. 7th IMA Intern. Conf. on Cryptography and
Coding, 1999
http://www.1stworks.com/ref/RyabkoAri99.pdf

David A. Scott

Jan 4, 2006, 1:12:40 PM
"nightlight" <night...@omegapoint.com> wrote in
news:1136379659....@g44g2000cwa.googlegroups.com:

>> "tigther than the present best entropy coding algorithm" do you
>> have actually test code to compare with real files and test sets.
>> Or is the method not yet advanced enough to do real compression
>> on files yet.
>
> You can download the source code which shows the coding aspects where
> QI is essentially different from existing entropy coding
> algorithms. All that the source will show you is that QI is a more...

I guess that means you don't yet have code where you can compare
it to even simple arithmetic file coders. I don't have faith in your
work from your earlier posts.

A simple "No, you can't actually test it in any real applications yet"
would have been enough. Again, from the earlier thread it's not obvious
to me that you have a full understanding of arithmetic coding methods.


>
> All that is, of course, measurable using the source code provided
> (which also includes very accurate timing functions). The above are not
> religious claims or an invitation to believe or debate belief systems, but
> a simple statement of easily verifiable empirical facts. If you need to
> know also how it would do on a "real" file, and can't extrapolate from
> how it does on memory buffers filled with arbitrary content, well, you
> are welcome to add such file-related code and try it. Now, if you do
> add file i/o which takes hundreds of times longer than the coding, I can
> predict you will see almost no speed difference.


Very funny.
It's very strange that you make a big deal of claiming you compared it
against what you claim is a good entropy coder, Moffat's. Yet you don't
even test against it on a level playing field. You think that by modifying
Moffat's code you are giving an honest test. However, if you really wanted
an honest test, since you are the new kid on the block, you would think
you could easily convert your code to work on files like Moffat's. Or are
you afraid to test it on the same playing field Moffat and others have
picked so that various methods can be tested against yours?

I for one believe you shifted ground because you fear real arithmetic
coders, and people could take existing software without modification and
show you directly that your coder does not lead to shorter compressed output
than already existing coders. I suspect this since most would have tested
the Moffat code on the same playground instead of modding it to one of your
choice, where it's not easy to compare against any other standard codings.

Look, maybe you have something. If your method is any good at all,
surely you could easily add the stuff you stripped out of Moffat's
to your code so that you can compare the compression results. Or is
there some reason you can't? If you can't, then it would seem to be of
little use.

See:

http://groups.google.com/group/comp.compression/browse_frm/thread/ffc7f7ca84c76378/792884848866dc4c?q=moffat&rnum=4#792884848866dc4c

From the quote below, if it's true at all and if your code works at all, you
should have the courage of your convictions to test it against Moffat's and
others where they were designed to run. Doesn't yours work there? What's the
problem?

YOUR QUOTE
"Note also that for the test we had stripped the Moffat et al. V3 binary
coder to its bare engine, replaced stream i/o with memory to memory
coding, no allocations were done within the timed loop, model decisions
were taken out (since only order 0 was tested) and all dynamical coding
options (alternatives that didn't matter) were hardwired as static to
avoid time waste in the test. So the arithmetic coder tested was
already quite a bit faster than the out-of-a-box Moffat et al code. We
wanted it to take its best shot, the best it possibly could (since QI
didn't need any such implementation/code level edge, being so much more
efficient at the fundamental, algorithmic level)."

If you wanted to take the best shot (again, let me state it for the
dense): if you wanted to take the best shot, then test it like the others,
on files, where Moffat's was made to work. If you don't, one can only
wonder what you are hiding, especially since you claim this is so much
better than arithmetic.
I thought about downloading as you suggested and converting it to files
so it really can honestly be compared to Moffat. But I realize from your
post that you would claim I did a poor job of converting so I will not do
it. After all your the one making the cliam its better than Moffat. Your
the one saying you compared it to Moffat. Yet you really only compared it
to a modifed verision of Moffat that you yourself modified.

David A. Scott

Jan 4, 2006, 6:07:16 PM

NIGHTLIGHT, what is wrong with you?

You claim a coder better than the best arithmetic coders, yet
your only proof is your word, based on Moffat code you
modified yourself.

Look, people here are willing to help those trying to
learn. Yet you never seem to answer real questions.

Such as why you went to the trouble of modifying Moffat's
code yourself and then proclaiming your method is better
than the best present-day arithmetic encoders.

Please stop with the ridiculous posts of references that
have nothing to do with this, unless you're trying to pull
the wool over people's eyes.

We don't work for you and don't have to kiss
your ass for a job. If you really want to show it's better
and can compress to smaller sizes than Moffat's, then do
an honest test on unmodified code.

Why is it that you can change his code and put him down
with code that is not even completely his,
yet you seem unable to change your code to work with
the data his is designed for? Surely you have the
ability with your team of people to do that. Is it
that you're afraid other current arithmetic coders
can already compress most files better?

nightlight

Jan 4, 2006, 6:08:59 PM
My tests on my time, your tests on your time.

Ignorant

Jan 6, 2006, 6:53:03 AM
You must be damn rich to afford time ownership.
Can I share your secret to owning time?
No entropy ..please.

nightlight

Jan 6, 2006, 1:43:08 PM
> You claim a coder better than the best arithmetic coders, yet
> your only proof is your word, based on Moffat code you
> modified yourself.

That's not the "only" proof. Analysis of the algorithms,
as sketched below (and covered in more detail in the
preprint & the tech reports) will give you the same answer,
showing it even more clearly. The source code which is
available allows anyone to test as they wish. The source
also allows examination, so that one can explain the results
from the code itself, rather than from the mathematical
description of the algorithms.

> Such as why you went to the trouble of modifying Moffat's
> code yourself and then proclaiming your method is better
> than the best present-day arithmetic encoders.

The main mod on their code was to make it code from memory
to memory instead of using much slower C streams. The
alternative of adding stream in/out to the QI code would simply
add a large and variable term and uncertainty to both coding
times, making the tests measure mostly irrelevant aspects (such
as how much time the C stream library takes under all OS &
buffering fluctuations).

Hence, regarding the speed tests, the removal of their
stream i/o allowed much more accurate and reliable
measurement of the coding speed differences. If you wish
to measure the speed of your C stream i/o functions, with
some coding time added on, go ahead and test that. That
doesn't interest me.

The other mods were the selection of the coding mode and precision
via their own options (in the headers). The binary coder, in order-0
mode, was used. Their code for the higher order contexts was
commented out so the coder could run faster (instead of checking
higher order model variables inside the coding loop). Also, all
their memory allocs were moved outside of their coding loop (only
their coding loop was timed, using the hi-res win32 timer).

As with the speed tests, the restriction to an order-0 model was
made because that is where the coders differ (since one can
always use the same model and the same encoding of the model
parameters with both coders on higher order models). Hence, the
optimum way to measure the coding efficiency _difference_ was to
remove any extra variables in which they don't differ. Again, if
you are interested in the modeling quality of the Moffat98 AC
implementation beyond order 0, go test that.

Of course, both aspects, the speed and the coding accuracy
advantage of QI, are self-evident to anyone who understands the
mathematical description of the two coding algorithms. QI is
always more accurate since AC is its direct approximation
(assuming you use both under 'everything else set the same way',
such as model, model parameter encoding, etc). AC and QI use the
same type of addends, except that AC has different normalization,
in which it rescales what is in QI an exact integer "path count"
at some point (x,y), by dividing it with QI's total "path count"
(e.g. the path count at the path endpoint B in Fig 1, p. 4). This
AC scaling turns QI's integers into unlimited binary fractions,
which the AC then truncates to the given number of bits.

This truncation of the infinite fractions even for a small number
of symbols (which is absent in QI's integer format of addends),
is a loss of precision which leads to AC losing parts of its
coding interval in each step. If one were to use fractions of
infinite precision, all intervals would fit exactly next to each
other, without gaps. Since allowing intervals to overlap would
result in non-decodable output (by Kraft inequality), any loss in
precision for specifying interval boundaries must leave unused
gaps in the output code space.

The basic arithmetic difference in coding is this extra loss of
precision for AC. A rough analogy would be as if two of us are
balancing some expenses and I use the exact integer number of cents
from the receipts, while you take the integers from the receipts
and divide them by some large total; then, since you will
generally get an infinite decimal fraction, you truncate it to
some number of places. Hence you're making an error even before
the first add, while my integer scheme won't have any error at
all (until the sum reaches a certain magnitude).

The QI.exe included with the source has a command "cbr" which
lists all such code space gaps for any n-symbol input, as well as
the cumulative redundancy in bits resulting from the gaps.
Another command, "ct", lists various types of redundancies for
the entire tables, examining every quantized binomial
coefficient, for blocks of up to 2^20 bits. In QI the Kraft
inequality is eq. (20) on page 8, and QI's loss of accuracy is
due to rounding up the integer addends once their magnitude
exceeds the number of bits QI uses. As explained on p. 8, the
error of QI's addends is the smallest one satisfying both the
given finite precision of g bits and the Kraft inequality
(eq. (20)).
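
(A toy illustration of that rounding up, with a deliberately tiny
mantissa and names of my own; the real quantized binomial tables are
built bottom-up from already-quantized entries, so this shows only the
arithmetic idea of keeping g significant bits and never rounding down:)

#include <stdio.h>

#define G 8                              /* toy mantissa width; real QI uses g=32 */

static unsigned long round_up_g(unsigned long v)
{ unsigned long mask, t = v;
  int extra = 0;
  while (t >> G) { t >>= 1; ++extra; }   /* count bits beyond the G-bit mantissa */
  if (!extra) return v;                  /* already fits in G bits: kept exact   */
  mask = (1ul << extra) - 1;
  return ((v + mask) >> extra) << extra; /* round up, zero the low bits          */
}

int main(void)
{ unsigned long c = 601080390ul;         /* exact C(32,16) */
  printf("%lu -> %lu\n", c, round_up_g(c));  /* prints 601080390 -> 603979776 */
  return 0;
}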

Hence, I don't even need a test to know that, with all else (model
etc.) set the same, QI will code at least as tightly as anything
else you may have now or in the future. The test is useful to
find out exactly how much the difference in coding accuracy
amounts to against some specific coder. Note also that AC has an
additional coding redundancy of about 1-2 bits (max is 2, avg
1.5) on the total output even for the _infinite precision_ coders
(see [41] and [41a] mentioned before).

For finite precision coders, and if AC is set to code in
"decrementing mode" (described in [34]), which is its most
accurate coding mode, the additional redundancy vs QI will be
about 2-3 bits on the total output (that's the best case, when
using AC's "frugal bit" mode, which is how I have tested it).

For a non-decrementing finite precision AC, such as the usual
'adaptive AC', e.g. Moffat98, or for Rissanen's AC-76 (which
is a static AC), as shown in [T2] p. 22 for an order-0 source, there
is an additional AC redundancy which is approximately
1/2 log(2*Pi*p*q*n) bits, where p and q are the probabilities of
1's and 0's and n is the input size in bits. This redundancy is
due to AC approximating the enumeration itself (Stirling approx.
plus dropping of the square root factor), before any finite
precision is imposed. This is normally the dominant difference,
and it is what all but the last row of the table on p. 10 in
[T3] show. For stationary order-0 binary inputs, an adaptive AC can
code using the Krichevsky-Trofimov (KT) estimator (see [41a], [40]),
which removes this worst case O(1/2*log(n)) redundancy by
largely accounting for the square root factor. That results in
lowering the adaptive AC redundancy for the low entropy inputs
(where the relative error was the greatest) and increasing it for
the higher entropy inputs (where the relative error was the
smallest), hence the KT estimator is a tradeoff.

Note that the last row on p. 10 compares entirely different
aspects of redundancy, the distinction between predictive vs
descriptive methods of modeling (see [T2] pp. 30-31). Although
one might call it comparing apples and oranges, since it didn't
really compare the coders proper, the point was that each coder
was modeling the input as an order-0 source (while the input was more
complex), but each was modeling in its "native" mode -- QI in
descriptive and AC in predictive mode. The lesson of that
example, or any other with large and unpredictable changes in the
symbol frequencies, is that the predictive coders pay a much greater
price than the descriptive coders when the input is unpredictable
(relative to whatever model order they use, assuming they both
use the same order models).

We can similarly deduce the nature of the speed difference from
the mathematical descriptions of the algorithms. The QI index
recurrence is eq. (22), which for binary order-0 coding simplifies
to eq. (17), with quantized binomials C(n_j,j). Since QI keeps
these in a table, its coding via (17) consists of adding a table
value (a machine-size word, 32 bits in the C code provided) to the
output buffer for every "least probable symbol" (1 by
convention), and no coding operation for the most probable symbol
(0 by convention).

In this same coding setup, the AC's calculation does exactly the
same additions as (17), except that all its terms are rescaled
(normalized to total=1, as explained above, see [T2] pp. 22-23),
and that AC doesn't use a table to get its rescaled C(n,k) addends,
but computes them on the fly using multiplicative recurrences,
which in binary order-0 coding are of the form (these are regular
binomial identities):

C(n,k) = C(n+1,k+1) * (k+1)/(n+1) ...... (1)

when symbol 1 is encoded and:

C(n,k+1) = C(n+1,k+1) * (n-k)/(n+1) ...... (2)

when symbol 0 is encoded. The factor p=(k+1)/(n+1) in (1), which
is the ratio of the remaining count of ones, (k+1), to the total
remaining symbols, (n+1), is interpreted within AC as the probability
of ones, and the factor q=(n-k)/(n+1) in (2) as the probability of
zeros at that same place. As explained in [T2] pp. 19-25, AC has
no choice here, since its addends are dependent on probabilities,
hence it can't have a single table (as QI does) which applies to
all probabilities. If you wanted to make it code faster using
tables to skip the most probable symbol, you would need a
separate table for each source probability. That makes such a
scheme quite impractical for AC (see [T2] pp. 19-21).
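
(For readers who want to see the two recurrences and the factors p and
q in actual numbers, here is a small self-contained check; it is only
an illustration of the identities above, not a piece of either coder:)

#include <stdio.h>

static double C(int n, int k)            /* exact binomial, as a double */
{ double r = 1.0; int i;
  if (k < 0 || k > n) return 0.0;
  for (i = 1; i <= k; i++) r = r * (n - k + i) / i;
  return r;
}

int main(void)
{ int n = 10, k = 4;
  double p = (double)(k + 1) / (n + 1);  /* remaining 1s / remaining symbols */
  double q = (double)(n - k) / (n + 1);  /* remaining 0s / remaining symbols */
  printf("C(%d,%d)=%.0f  C(%d,%d)*p=%.0f\n",      /* identity (1): 210 = 210 */
         n, k, C(n, k), n + 1, k + 1, C(n + 1, k + 1) * p);
  printf("C(%d,%d)=%.0f  C(%d,%d)*q=%.0f\n",      /* identity (2): 252 = 252 */
         n, k + 1, C(n, k + 1), n + 1, k + 1, C(n + 1, k + 1) * q);
  return 0;
}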

Consequently, if both coders have processed n symbols, containing
k ones, and have 20 zeros followed by a 1 ahead, QI simply picks
the next binomial C(n+20,k+1) from the table and adds it to the
index. The AC has to compute all 20 intervening binomials (for
the 0's), using the multiplicative recurrence of type (2), to
calculate the final rescaled addend for the last bit=1. That's
the basic difference in work between the coders which follows
from the math of the algorithms. You don't need any test to see
that if there were 500 zeros followed by a 1, AC would have done
500 multiplications via eq. (2), while QI would have only formed
n+500 and picked the addend out of the table. The tests only tell
you how these obvious differences in the amounts of work translate
into differences in coding speed. The table on page 10 shows
that e.g. for a 128 Kbit input, with 8 ones, where you have on
average 16384 zeros between the ones, QI will execute 247
times faster (the result in the table is averaged over 500
inputs, each with random placements of the 8 ones). As mentioned in
the readme.txt that comes with the source code, in this test
the generic QI was used, not QI tuned for sparse coding mode
(which is provided in the C source), which would have given it
another factor of 2-5 on such very sparse and long inputs.

In short, my tests were focused on the aspects in which the
coders differ _inherently_, as described above (since everything
else can be set the same if one wishes). That's what I
measured and what I reported. If you're interested in something
else, such as the quality of the Moffat98 AC modeler at order 1, 2...,
or the speed of your C stream i/o library, you can write the code and
test that. Neither question interests me, though.

Regarding the QI higher order modeling, while one can use an AC
modeling engine, that is a sub-optimal modeling scheme for QI, as
explained in [T2] pp. 26-35. The optimum division of labor
between the modeler and the coder is quite different in QI from
the one in AC (see [T2] pp. 30-31). QI's general-purpose native
modeling engine is BWT (the bare BW transform, before MTF or
run-lengths or other entropy coding). That is all still research
in progress, so I'll leave it at that. No sense arguing
heuristics.

Willem

Jan 6, 2006, 3:01:39 PM
nightlight wrote:
) That's not the "only" proof. Analysis of the algorithms,
) as sketched below (and covered in more detail in the
) preprint & the tech reports) will give you the same answer,
) showing it even more clearly.

This analysis is no different from any other analysis in that
you have to make lots of assumptions. This means that if you use
such an analysis to make real-world predictions, then that depends
on how well your assumptions match the real world.

Because of the massive nature of your posts, I haven't been able to
find an answer to a few questions I have about the QI coder:

- if I read you correctly, it is not an adaptive coder,
so how do you transmit the model information for the QI coder ?

- how would you use a QI coder with an adaptive model ?

- assuming I have a stream of symbols, where at each position in
the stream, the probability distribution of the symbols is different,
then how does QI coder adapt itself to all those different distributions ?


(I have a few more questions, but stick to these for now.)


SaSW, Willem
--

Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be

drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

David A. Scott

Jan 6, 2006, 5:39:48 PM
"nightlight" <night...@omegapoint.com> wrote in
news:1136572988.6...@z14g2000cwz.googlegroups.com:

>
> Of course, both aspects, the speed and the coding accuracy
> advantage of QI, are self-evident to anyone who understands the
> mathematical description of the two coding algorithms. QI is
> always more accurate since AC is its direct approximation
>

I wish we had someone here who was an expert on both
algorithms, since your previous posts sure indicate
to me that you are no expert on arithmetic coding. I
admit I know very little about your QI; however, I know enough
about arithmetic coding to realize that proper optimal bijective
file coding is one of the best methods, and it is an optimal
method, something you don't seem to understand from your various
posts. Even if QI could be used to make some sort of optimal
file compressor, it could never compress all files as well
as an optimal arithmetic coder. Since you can't grasp that simple
fact, you can't be an expert in arithmetic coding, so I doubt what you
think is self-evident has any relationship to reality.


You seem to think your stuff is better than arithmetic when
used as an entropy encoder. Yet it appears this so-called neat
method of yours has yet to be tested in even simple code
where one is using an entropy coder. Why is that?

Even Matt did FPAQ0; can't you or your team do something
similar with QI, or is it too complex a task?


I know you couldn't care less what I think. But many people
here would like to see real results. We get plenty of people
who can quote and talk about how good their stuff is, and people
here realize talk is cheap. They want to see REAL RESULTS. Can
you do that, or do you even care what the average person here thinks
of your method?

As mentioned above, it appears you can modify Moffat, which
is not the best, but you picked what you wanted to pick and then
called it the best. I would continue asking new questions, but
for some reason you seem not to answer even the simplest
of questions.

Sachin Garg

Jan 6, 2006, 5:54:36 PM

Willem wrote:
> nightlight wrote:
> ) That's not the "only" proof. Analysis of the algorithms,
> ) as sketched below (and covered in more detail in the
> ) preprint & the tech reports) will give you the same answer,
> ) showing it even more clearly.
>
> This analysis is no different from any other analysis in that
> you have to make lots of assumptions. This means that if you use
> such an analysis to make real-world predictions, then that depends
> on how well your assumptions match the real world.
>
>
>
> Because of the massive nature of your posts, I haven't been able to
> find an answer to a few questions I have about the QI coder:

I agree that someone having enough time for reading his massive posts
could rather read the paper and find answers himself.

I will try to answer some questions on the basis of my understanding of
what he has posted here (I have not read his papers yet).

> - if I read you correctly, it is not an adaptive coder,
> so how do you transmit the model information for the QI coder ?

He says that QI is based on enumerative coding, where models are
conceptually different from PPM models which we are more familiar with.


So it will probably mean that we will need to study EC from ground up
and then see how QI fits in (at least I will have to as I am not
familiar with EC) and how it relates to PPM models.

Or if someone can summarize the difference here for the people like me,
please do so.

> - how would you use a QI coder with an adaptive model ?

He said that QI is the "natural" choice for BWT post-processing. This
probably means that QI itself can't be used for higher order adaptive
coding but by using BWT, the higher-order-adaptive-modelling problem
can be reduced into something which QI can handle.

> - assuming I have a stream of symbols, where at each position in
> the stream, the probability distribution of the symbols is different,
> then how does QI coder adapt itself to all those different distributions ?

I don't know the answer to this one.

I hope this helps (my apologies to nightlight if I am mistaken
somewhere, feel free to correct me).

As for nightlight's comments on QI's speed, I am afraid that as the
modelling scheme for QI is different from the modelling scheme for
ArithCoding, we will need to compare the speed of "QI + its modelling
code" with "AC + its modelling code", where both models are of the same
order, or chosen to give the same compression ratio. (My doubt here is
"what if QI just *shifts* the computation burden to the modelling code
instead of reducing it".)

Sachin Garg [India]
http://www.sachingarg.com

David A. Scott

Jan 6, 2006, 9:15:24 PM
"Sachin Garg" <sch...@gmail.com> wrote in
news:1136588076.7...@g14g2000cwa.googlegroups.com:

>
> Willem wrote:
>> nightlight wrote:
>> ) That's not the "only" proof. Analysis of the algorithms,
>> ) as sketched below (and covered in more detail in the
>> ) preprint & the tech reports) will give you the same answer,
>> ) showing it even more clearly.
>>
>> This analysis is no different from any other analysis in that
>> you have to make lots of assumptions. This means that if you use
>> such an analysis to make real-world predictions, then that depends
>> on how well your assumptions match the real world.
>>
>>
>>
>> Because of the massive nature of your posts, I haven't been able to
>> find an answer to a few questions I have about the QI coder:
>
> I agree that someone having enough time for reading his massive posts
> could rather read the paper and find answers himself.
>

If the paper is that small pdf file at his site, it does not answer
the simple questions being asked. So I don't see this as an answer.
And if it's some stuff where one would have to sign some nondisclosure
agreement, I think that too would be a waste of time, since the
questions asked of him are not that hard.

> I will try to answer some questions on the basis of my understanding
> of what he has posted here (I have not read his papers yet).
>
>> - if I read you correctly, it is not an adaptive coder,
>> so how do you transmit the model information for the QI coder ?
>
> He says that QI is based on enumerative coding, where models are
> conceptually different from PPM models which we are more familiar
> with.
>
>
> So it will probably mean that we will need to study EC from ground up
> and then see how QI fits in (at least I will have to as I am not
> familiar with EC) and how it relates to PPM models.
>
> Or if someone can summarize the difference here for the people like
> me, please do so.
>

He seems to believe arithmetic compression is a poor approximation
of EC compression. Here is a summary: take two symbols, one and zero;
suppose you have 2 ones and 2 zeros, then there are 4!/(2!*2!) or 6
combinations. His method would assign the 6 possible numbers to this
problem, so you could say it's exact, at least when the numbers are
small. But the paper really doesn't say how to do compression. You have
to read various other papers that show how to assign a number to a
combination.
All he seems to do is get a number for a combination.
He has in his paper the example of 00101001, for which he calculates
the value as 2+6+35 = 43; this was done in the long post, message
59, at

http://groups.google.com/group/comp.compression/browse_frm/thread/ffc7f7ca84c76378/30566aec6aa7d363?q=nightlight&rnum=1#30566aec6aa7d363

Big deal, that's not compression. He states there:

"The answer is on page 6, where it shows the string index
I(S8)=2+6+35=43 and how to calculate it (eq. 17). The actual size in
bits of the index is L=log(56)=5.807... bits since the valid values of
the index are 0..55 (there are 56 paths from A to B). The fractional
bits of L don't normally go to waste since they are coded in the mixed
radix with other items sent, usually with the fractional bits of
indices for other blocks (cf. p.9 [N4]). The encoder also sends the
count of 1's per block, which is 3 here and the length of the entire
string which is 8. The latter two items get coded in variety of ways in
different variants and different parts of the coder/s and experimental
prototypes (cf. p.9 [N6])."

This means he never compares it to an arithmetic compressor. He only
makes claims that it's better. I think if he could actually compress the
one example he gives in the paper, then he might be more believable. But he
wants to say it's exact, yet he can't be pinned down on any application
that compares it to an arithmetic coder. Not having any simple examples
done in a complete way saves him from being shown arithmetic methods that
beat his method. He leaves it to you to do the coding, at which point he
could claim you didn't do it the right way. Actually it's a clever way
to prevent it from being compared to any real world entropy compressor.

So the only thing I got out of the paper, besides the fact he never
completes anything, is that for a string made of only ones and zeros
the compressed encoded result is 3 items that are easy to combine
(just don't ask him how to combine them, since he seems not likely to do so):
1) the length of the entire string
2) the number of ones
3) the index value.

He doesn't risk actually combining these into an output string;
it's possible he does not want to risk being laughed at, or he
fears one could show how a simple bijective string compressor gets
better results. If you can think of any other reason no examples
like this are done to the end, please, Sachin, tell us what you think
it is.


>> - how would you use a QI coder with an adaptive model ?
>
> He said that QI is the "natural" choice for BWT post-processing. This
> probably means that QI itself can't be used for higher order adaptive
> coding but by using BWT, the higher-order-adaptive-modelling problem
> can be reduced into something which QI can handle.
>
>> - assuming I have a stream of symbols, where at each position in
>> the stream, the probability distribution of the symbols is different,
>> then how does QI coder adapt itself to all those different
>> distributions ?
>
> I don't know the answer to this one.
>

Maybe it depends on the definition of "natural"



> I hope this helps (my apologies to nightlight if I am mistaken
> somewhere, feel free to correct me).
>

I suspect he will paste in lots of links to various papers; that
seems to be his style. So don't hold your breath for useful answers.
They may or may not be related to your questions.



> As for nightlight's comments on QI's speed, I am afraid that as the
> modelling scheme for QI is different from the modelling scheme for
> ArithCoding, we will need to compare the speed of "QI + its modelling
> code" with "AC + its modelling code", where both models are of the same
> order, or chosen to give the same compression ratio. (My doubt here is
> "what if QI just *shifts* the computation burden to the modelling code
> instead of reducing it".)
>
> Sachin Garg [India]
> http://www.sachingarg.com
>

David A. Scott


--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **

Disclaimer:I am in no way responsible for any of the statements

nightlight

unread,
Jan 6, 2006, 3:05:15 PM1/6/06
to
-- Errata:

>> 128 Kbit input, with 8 ones, where you have on
>> average 16384 zeros between the ones

The figure 16384 should be replaced by 128*1024/9 = 14563.5... since the
8 ones produce 9 sections of zeros.

Sachin Garg

unread,
Jan 7, 2006, 3:04:48 AM1/7/06
to

Hi David,

How is your health these days? I hope you are doing better now.

> >> Because of the massive nature of your posts, I haven't been able to
> >> find an answer to a few questions I have about the QI coder:
> >
> > I agree that someone having enough time for reading his massive posts
> > could rather read the paper and find answers himself.
> >
>
> If the paper is that small pdf file at his site it does not answer
> the simple questions being asked. So I don't see this as an answer.
> And if it's some stuff where one would have to sign some nondisclosure
> agreement I think that too would be a waste of time, since the questions
> asked of him are not that hard.

Oh, I had presumed that hidden in all the papers he links to, there would be
answers to the questions here; I didn't realize that they have only
incomplete examples (maybe complete from the QI perspective, but incomplete
from the compression perspective, which makes them useless at least for us).


> >> - how would you use a QI coder with an adaptive model ?
> >
> > He said that QI is the "natural" choice for BWT post-processing. This
> > probably means that QI itself cant be used for higher order adaptive
> > coding but by using BWT, the higher-order-adaptive-modelling problem
> > can be reduced into something which QI can handle.
>

> Maybe it depends on the definition of "natural"

I guess what he meant was, more efficient than MTF etc... Anyway, we
can leave discussions on this, hopefully he will come up with a BWT
based compressor implementation to prove his point.


Sachin Garg [India]
http://www.sachingarg.com

nightlight

unread,
Jan 6, 2006, 1:34:26 PM1/6/06
to
> You claim a coder better than the best arithmetic,
> Yet your only proof is your word by using Moffat
> code you modified yourself.

That's not the "only" proof. Analysis of the algorithms,
as sketched below (and covered in more detail in the
preprint & the tech reports) will give you the same answer,
showing it even more clearly. The source code which is
available allows anyone to test as they wish. The source
also allows examination, so that one can explain the results
from the code itself, rather than from the mathematical
description of the algorithms.

> Such as why you went to the trouble to modify Moffats
> code yourself and then proclaim your method is better
> than the best present day arithmetic encoders.

The main mod on their code was to make it code from memory.
Of course, both aspects, the speed and the coding accuracy
advantage of QI, are self-evident to anyone who understands the
mathematical description of the two coding algorithms. QI is
always more accurate since AC is its direct approximation.

Matt Mahoney

unread,
Jan 6, 2006, 10:42:31 PM1/6/06
to
nightlight wrote:
> This truncation of the infinite fractions even for a small number
> of symbols (which is absent in QI's integer format of addends),
> is a loss of precision which leads to AC losing parts of its
> coding interval in each step. If one were to use fractions of
> infinite precision, all intervals would fit exactly next to each
> other, without gaps. Since allowing intervals to overlap would
> result in non-decodable output (by Kraft inequality), any loss in
> precision for specifying interval boundaries must leave unused
> gaps in the output code space.

Discarding part of the range is one way to deal with finite precision,
for example the carryless rangecoder in ppmd. However the various
coders in paq6, paq7, fpaq0, etc. do not have any gaps in the code
space. These are carryless binary arithmetic coders with 32 bits
precision and 12 bit representation of probabilities. They output a
byte at a time so most of the time the range is represented with 24
bits precision. Coding loss occurs due to rounding of the probability.
In the worst case the range is 2 and the probability is forced to 1/2,
but this is rare. The practical effect is to increase the compressed
size by 0.0001 bpc on typical inputs.
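
For illustration, here is a minimal sketch of one encode step of that kind
of carryless, byte-oriented coder (an illustration of the idea only, not
the actual paq/fpaq source): 32-bit bounds, a 12-bit probability, and a
byte emitted whenever the leading bytes of the two bounds agree.

#include <stdint.h>
#include <stdio.h>

static uint32_t x1 = 0, x2 = 0xFFFFFFFFu;   /* current coding range [x1,x2] */

/* encode one bit with P(bit=1) = p/4096, emitting settled bytes as we go */
static void encode_bit(int bit, uint32_t p, FILE *out)
{
    uint32_t xmid = x1 + (uint32_t)(((uint64_t)(x2 - x1) * p) >> 12);
    if (bit) x2 = xmid;                     /* '1' keeps the lower part */
    else     x1 = xmid + 1;                 /* '0' keeps the upper part */

    while ((x1 ^ x2) < (1u << 24)) {        /* leading byte is settled: */
        putc(x2 >> 24, out);                /* emit it and renormalize  */
        x1 <<= 8;
        x2 = (x2 << 8) | 0xFF;
    }
}
/* (a real coder also flushes a few final bytes of the range at the end) */

The only precision loss is the truncation in computing xmid, i.e. the
rounding of the probability; the two subintervals [x1,xmid] and [xmid+1,x2]
still tile the parent range exactly, with no gap between them.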

The coding error can be made arbitrarily small with little effort. In
paqar and pasqda the coder includes a carry counter and outputs a bit
at a time, so the range always has at least 30 bits of precision.
Redundancy due to rounding error e in the probability is O(e^2), or
about 2^-60. If this is still too big, you could go to 64 bit
arithmetic and reduce the redundancy to about 2^-124.

I am sure there are applications for QI, but in the PAQ series even a
perfect coder would have negligible effect on both compression ratio
and speed, since CPU time is dominated by modeling. Being restricted
to an order 0 model seems like a severe disadvantage. How would you
transform a context mixing model to order 0?

-- Matt Mahoney

Jasen Betts

unread,
Jan 7, 2006, 4:53:19 AM1/7/06
to
["Followup-To:" header set to comp.compression.]

On 2005-12-29, nightlight <night...@omegapoint.com> wrote:
> I received some questions about the source and the algorithm via email
> and since the matter may be of interest to others who downloaded it, I
> will
> reply here as well:
>
>> Concerning the source, I am afraid it is of little use in this form.
>> I don't run windows here, and I do not feel like I would really need it.
>> Thus, if you'd like to present your work, it would be most useful to
>> depend on ANSI-C only.
>
> The main nonstandard VC6/win32 C use in the source has to do with the
> nanosecond timer & thread control (in Qiutl.c).

The thread control functions are never called.

You have arrays with no length declared in the headers
(I'm guessing they should be declared extern).
The timer stuff I replaced with calls to the POSIX gettimeofday()
(I figure microsecond precision is close enough).

I turned off the ASM, renamed the files to match the names given in the
source, produced stubs for the conio calls you make, wrote a makefile,
and replaced the integer types with names from <stdint.h>. It compiles,
and does something.

What is all that output supposed to mean? Or, more to the point,
what do I do to get efficiency statistics, elapsed time, compressed size,
that sort of stuff?

Also I found some printf()s (for error conditions) with the wrong number
of arguments, and a few other weirdnesses.


Another problem with your code is that, as it stands, it seems to only
test your algorithm's ability to compress pseudo-random data...
pseudo-random data is theoretically extremely compressible.

Bye.
Jasen

David A. Scott

unread,
Jan 7, 2006, 9:19:31 AM1/7/06
to
"nightlight" <night...@omegapoint.com> wrote in
news:1136572988.6...@z14g2000cwz.googlegroups.com:


....

> infinite precision, all intervals would fit exactly next to each
> other, without gaps. Since allowing intervals to overlap would
> result in non-decodable output (by Kraft inequality), any loss in
> precision for specifying interval boundaries must leave unused
> gaps in the output code space.
>
> The basic arithmetic difference in coding is the extra loss of
> precision for AC. A rough analogy would be as if two of us are
> balancing some expenses and I use exact integer number of cents
> from the receipts, while you take integers from the receipts
> and divide them by some large total, then, since you will
> generally get an infinite decimal fraction, you terminate it to
> some number of places. Hence you're making an error even before
> the first add, while my integer scheme won't have any error at
> all (until the sum reaches certain magnitude).
>
> The QI.exe included with the source has a command "cbr" which
> lists all such code space gaps for any n-symbol input, as well as
> the cumulative redundancy in bits resulting from the gaps.
>

....


You don't seem to grasp the obvious. There are arithmetic coders
that have zero gaps. You don't know what you are talking about.
One such coder is arb255.exe. You seem to repeat useless stuff over
and over without actually thinking. You can't answer simple
questions or provide simple answers. Who are you kidding?

Willem

unread,
Jan 7, 2006, 9:39:04 AM1/7/06
to
nightlight wrote:
) This truncation of the infinite fractions even for a small number
) of symbols (which is absent in QI's integer format of addends),
) is a loss of precision which leads to AC losing parts of its
) coding interval in each step. If one were to use fractions of
) infinite precision, all intervals would fit exactly next to each
) other, without gaps. Since allowing intervals to overlap would
) result in non-decodable output (by Kraft inequality), any loss in
) precision for specifying interval boundaries must leave unused
) gaps in the output code space.

This paragraph clearly demonstrates that you do not understand well
enough how Arith Encoding works. Any decent AC does *not* lose parts
of its coding interval each step. Try to get that through your head.
It *is* possible (hell, it's quite easy) to get the intervals to line
up exactly without infinite precision.


SaSW, Willem
--

Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be

David A. Scott

unread,
Jan 7, 2006, 10:17:10 AM1/7/06
to
"Matt Mahoney" <matma...@yahoo.com> wrote in
news:1136605351.6...@g47g2000cwa.googlegroups.com:

> From: "Matt Mahoney" <matma...@yahoo.com>
> Newsgroups: comp.compression,sci.math
>
> nightlight wrote:
>> This truncation of the infinite fractions even for a small number
>> of symbols (which is absent in QI's integer format of addends),
>> is a loss of precision which leads to AC losing parts of its
>> coding interval in each step. If one were to use fractions of
>> infinite precision, all intervals would fit exactly next to each
>> other, without gaps. Since allowing intervals to overlap would
>> result in non-decodable output (by Kraft inequality), any loss in
>> precision for specifying interval boundaries must leave unused
>> gaps in the output code space.
>
> Discarding part of the range is one way to deal with finite precision,
> for example the carryless rangecoder in ppmd. However the various
> coders in paq6, paq7, fpaq0, etc. do not have any gaps in the code
> space. These are carryless binary arithmetic coders with 32 bits
> precision and 12 bit representation of probabilities.


Matt, I think you are correct in saying the coder used in fpaq0 has
no gaps. But the overall code it produces does have gaps due to the
modeling. It has nothing to do with the argument with nightlight,
since I am not sure he has a grasp of arithmetic at all. I am not just
talking about the file endings, I am talking about gaps that exist
throughout the whole output file because of the model.
In your model you use 9 bits for every 8 bits of data, where the starting
bit is a ZERO for each byte, and then for EOF you allow the starting
bit to be ONE and stop the compressor. This does not allow all the
code space to be used.
As a result of this modeling, every file compressed with FPAQ0 has
the first bit of the first byte set on output, so technically the first
bit out is a total waste of space. However it's not that bad as far
as the total number of extra bits goes. Most add the waste purely at the end,
or with a count field at the beginning. Without all the complications
of a true bijective compressor, you could drop back to 8 bits per symbol
and then, when marking the compression done, you just start with a
pretend bit for the EOF such that X2 changes, then flush as before.
On decompression you check for EOF at the start of each new byte if all
the bytes of the archive have been read. If not, just continue in the loop.
When checking you can calculate exactly when to stop; in fact you
can do a free check to see if the file is actually the result of the
compression. You don't have a bijective file compressor at this point,
but it's within a byte or two even for very long files.
To go the extra step for a full bijective file compressor, you
would have to do what arb255.exe did, which is basically to use the last
bit that is a one in the file as the EOF indicator, or, if the last byte
is all zeros or a zero followed by a tail of 100..'s, you add in bijectively
a last bit that is a one, which takes a lot of overhead for just 2 bytes
or so of saving, at a cost of time.

Let me state that your model does not cost more than what other people
do; in fact it is slick. However, the cost for an N byte file being
compressed is -lg(1/2) - lg(2/3) - ... - lg(N/(N+1)) bits for the zero codings,
plus the length for the 1 coding at the end, which is -lg(1/(N+2)) bits;
the zeroes add up to -lg(1/(N+1)) = lg(N+1) bits. For a file 100,000 bytes long
this is 16.6096549013 bits due to the zeroes
and 16.6096693280 bits due to the one,
for a total of 4 extra bytes that are not needed. The cost of the
zeros can be totally eliminated with very little change to FPAQ0;
the cost of the one would be the current cost of a one in the table,
and could be no greater than what you are currently using. This
method would allow dense packing and fully use the compressed
space without gaps, except for the very ending bytes.
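
As a quick check of those figures, the closed forms can be evaluated
directly (a standalone sketch, not fpaq0 code):

#include <math.h>
#include <stdio.h>

int main(void)
{
    const double N = 100000.0;             /* file length in bytes          */
    double zeros = log2(N + 1.0);          /* sum_{i=1..N} -lg(i/(i+1))     */
    double one   = log2(N + 2.0);          /* -lg(1/(N+2)) for the EOF flag */
    printf("zeros: %.7f bits\n", zeros);   /* ~16.60965                     */
    printf("one  : %.7f bits\n", one);     /* ~16.60967                     */
    printf("total: %.2f bytes\n", (zeros + one) / 8.0);   /* ~4 bytes       */
    return 0;
}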

David A. Scott

unread,
Jan 7, 2006, 10:25:13 AM1/7/06
to
Willem <wil...@stack.nl> wrote in
news:slrndrvkk7....@toad.stack.nl:

> nightlight wrote:
> ) This truncation of the infinite fractions even for a small number
> ) of symbols (which is absent in QI's integer format of addends),
> ) is a loss of precision which leads to AC losing parts of its
> ) coding interval in each step. If one were to use fractions of
> ) infinite precision, all intervals would fit exactly next to each
> ) other, without gaps. Since allowing intervals to overlap would
> ) result in non-decodable output (by Kraft inequality), any loss in
> ) precision for specifying interval boundaries must leave unused
> ) gaps in the output code space.
>
> This paragraph clearly demonstrates that you do not understand well
> enough how Arith Encoding works. Any decent AC does *not* lose parts
> of its coding interval each step. Try to get that through your head.
> It *is* possible (hell, it's quite easy) to get the intervals to line
> up exactly without infinite precision.
>
>
> SaSW, Willem

It is his inability to grasp this simple, well understood fact that
makes one wonder if he understands the subject field at all. Of course
you can expect his usual reply with lines and lines of quoted text
having nothing to do with this fact, which he either will not or
cannot seem to grasp. He reminds me of the current crop of students who
are never corrected when they make common errors, so they just go on
making ever larger errors while never having learned how to learn
from mistakes.


David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **

Disclaimer:I am in no way responsible for any of the statements

nightlight

unread,
Jan 7, 2006, 12:25:12 PM1/7/06
to
> This analysis is no different from any other analysis
> in that you have to make lots of assumptions. This
> means that if you use such an analysis to make
> real-world predictions, then that depends
> on how well your assumptions match the real world.

I see now where you're misreading my redundancy
statements in this and in the earlier thread. The
redundancy I was deriving in [T3] at end of page 8,
and talking about here, is simply the number of
excess bits that the finite precision arithmetic
used by QI will add to the unlimited precision
enumerative code. The paragraph on page 8 after all,
starts with "To obtain QI redundancy d(g) _due to
SW rounding_ in (21)..." The upper bound for
_this_ redundancy d(g) obtained there is:

d(g) < log(e)/2^(g-1) .... (1)

No model or source assumptions are needed for (1), and
none is reflected in (1). The only parameter in (1) is
the assumed arithmetic precision of g bits (the SW
mantissa length). The eq. (1) is simply an upper bound
on any additional bits relative to the exact EC, produced
by QI due to its use of SW rounding up in eq. (21). It has
nothing to do with how well EC or QI will code in any
given setup (model, source, parameter coding method, etc).
If the exact EC model is perfect in a given setup, than
(1) shows what is the maximum that QI can fall short of
that "perfect" output. If EC is coding using an imperfect
model, resulting in some redundancy of R bits per symbol
relative to the best model, (1) shows the maximum that
QI can add to R. But (1) doesn't care or predict what
R is. The two types of redundancy are completely unrelated.
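
To put (1) in perspective, the bound can be evaluated directly for a few
mantissa widths g; this is just the arithmetic of the bound itself, with
no source or model assumptions involved:

#include <math.h>
#include <stdio.h>

int main(void)
{
    int g;
    /* d(g) < log2(e)/2^(g-1), the SW rounding bound (1) above */
    for (g = 16; g <= 64; g += 16)
        printf("g = %2d:  d(g) < %.3e bits\n",
               g, log2(exp(1.0)) / ldexp(1.0, g - 1));
    return 0;
}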

The question of how well the exact EC and AC can code in
different coding setups, is an entirely different and a
much larger topic, well covered in the literature, starting
with Davisson's [28] and many other papers since.

Krichevsky-Trofimov's paper [33] provides a great many
bounds for a variety of coding setups. Some related later
results are in [40],[41],[41a]-[41d]. The basic result
is that even for the unlimited precision coders (and
coding under 'everything else set the same') the exact
EC has a slightly lower redundancy than the exact AC (by
approximately 1 bit for the entire input, for max & avg).
This is the same difference as between the Huffman and
the Shannon-Fano prefix codes. Even the origin of the
difference is the same: the bottom-up addend construction,
as done by QI and Huffman, is tighter than the top down
addend construction, as done by AC & Shannon-Fano.

Now to your main questions, starting with the last one,
which is the most specific:

> assuming I have a stream of symbols, where at each
> position in the stream, the probability distribution
> of the symbols is different, then how does QI coder
> adapt itself to all those different distributions ?

This is the scenario of AC modeling engine feeding QI,
which was sketched in note N7 on p. 9, [T3]. Two ways
are described there:

a) --- QI coding AC style

QI can code the AC style by performing "Lattice
Jumps". It is simplest to see how this is done by looking
at the decoder, arriving at some point B=(x,y) (see
Fig. 1, p. 4, [T3]). The path count at B is N(x,y)=56.
The index at point B can have values 0..N(x,y)-1, hence
the length of the index at B is log(N(x,y))=log(56) bits.
If bit=1 gets decoded (as shown by path on Fig 1), the
decoder moves up, to point BA=(x,y-1), which has the
path count 21, hence the index has length log(21) bits.
Hence, upon decoding bit=1 at B, the index length has
dropped by log(56)-log(21)=log(8/3) bits, which is
precisely the ideal code length log(1/p) for bit=1
at B, where p=3/8=probability of 1 at B. If bit=0
gets decoded at B, decoder moves to point BL=(x-1,y)
where path count is N(x-1,y)=35, hence the index
length is log(35). In this case the index length
drops by log(56)-log(35)=log(8/5) which is exactly
same as the ideal code length log(1/q), where q=5/8
is probability of 0 at B. It is easy to see from
multiplicative recurrences for binomial coefficients
(eq's (1) & (2) from the previous post here) that
this pattern always holds - after every decode step,
the index length drops by exactly log(1/P), where P
is the probability of the decoded symbol. Analogous
relation holds for each encode step, where the index
length increases by the ideal code length of the
encoded symbol at that point. Note also that due to
integer arithmetic, this is not an approximate
optimality (such as one would get using truncated
infinite fractions, as AC does). With QI/EC, this
coding optimality at every point is built into
the table entries. { You can check the quantization
errors using e.g. QI.exe cbr n36, which shows no
quantization errors for n=36 (or below), and with
n=37, the 1st error for k=16, of just +1 in the
SW mantissa which adds 4e-10 bits to the index.}
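
A small numeric companion to that walk, using exact (unquantized)
binomials; the labeling B=(x,y)=(5,3) is read off from the path counts
56, 21, 35 quoted above:

#include <math.h>
#include <stdio.h>

static double binom(int n, int k)            /* exact for these small n */
{
    double c = 1.0; int i;
    for (i = 1; i <= k; i++) c = c * (n - k + i) / i;
    return c;
}

int main(void)
{
    int x = 5, y = 3;                        /* point B: 5 zeros, 3 ones left */
    double NB  = binom(x + y, y);            /* 56 paths at B                 */
    double NBA = binom(x + y - 1, y - 1);    /* 21 paths at BA=(x,y-1)        */
    double NBL = binom(x + y - 1, y);        /* 35 paths at BL=(x-1,y)        */
    printf("decode 1: index shrinks %.4f bits (ideal log(8/3) = %.4f)\n",
           log2(NB / NBA), log2(8.0 / 3.0));
    printf("decode 0: index shrinks %.4f bits (ideal log(8/5) = %.4f)\n",
           log2(NB / NBL), log2(8.0 / 5.0));
    return 0;
}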

With QI, for a general point B=(x,y), the quantized
path count L(x,y) (computed via (21)) is an SW integer
with a g-bit mantissa w(x,y) and exponent e(x,y).
The ideal code lengths and ratios for the steps
from B described above still hold, but only within
the limits d(g). In particular, L(x,y-1)/L(x,y) is
approx. =p=y/(x+y) and L(x-1,y)/L(x,y)=q=x/(x+y).
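
A sketch of the SW rounding itself, i.e. keeping only a g-bit mantissa w
and an exponent e, rounding up so the quantized value never falls below
the exact one (an illustration of the concept only, not the actual
representation used in the QI source or in eq. (21)):

#include <stdint.h>
#include <stdio.h>

typedef struct { uint32_t w; int e; } sw_t;    /* value ~ w * 2^e, w < 2^g */

/* round v up to a g-bit mantissa by dropping low bits, rounding up each time */
static sw_t sw_round_up(uint64_t v, int g)
{
    sw_t s = { 0, 0 };
    while (v >> g) { v = (v + 1) >> 1; s.e++; }
    s.w = (uint32_t)v;
    return s;
}

int main(void)
{
    uint64_t v = (1ULL << 40) + 12345;         /* an arbitrary 41-bit value */
    sw_t s = sw_round_up(v, 32);
    printf("w=%lu e=%d: quantized %llu >= exact %llu\n",
           (unsigned long)s.w, s.e,
           (unsigned long long)s.w << s.e, (unsigned long long)v);
    return 0;
}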

The index at B will have for the leading g bits
at the bit offset e(x,y) some g-bit integer Iw(x,y)
which is in the interval [0,w(x,y)-1] (this is a
simple consequence of the index at any point
ranging from 0 to path count-1 and the fact that
quantized path count L(x,y) has trailing zeros
after the leading g bits given by w(x,y), hence
L(x,y)-1 will decrement w(x,y)). We can thus view
for any point B(x,y) and index I(x,y), the Iw(x,y)
as a digit in the radix w(x,y).

Suppose now the decoder at B=(x,y) gets from the modeler
some probabilities p' and q' different from p,q. To
continue decoding, the decoder makes a jump to another
lattice point B'=(x',y') where y'/(x'+y')=p' and
x'/(x'+y')=q'. One can use Farey fractions (see
[F]) to obtain the optimum such point for any
given binomial table size. Alternatively, one
can simply jump to another point on the same
front i.e. one would keep n fixed, x+y=n=x'+y'
and select point B' using y'=n*p'. The path
count at B' is L(x',y') with mantissa w(x',y')
and exponent e(x',y'), which are different from
w(x,y) and e(x,y). The exponent is easy to adjust:
you simply change the presumed position of the
least significant bit of the index I(x',y') (this
is the origin, A on Fig 1., toward which decoder
is heading, but hasn't reached yet since there
are more symbols to decode; in the QI source
code in file EncDec.c this presumed origin of the
index is given as argument "sb" to function qiDec()).

The main work is with the difference in the path
count mantissas w(x,y) and w(x',y') at B and B'.
Namely at B' the leading g bits of index Iw(x',y')
have to be a digit in the radix w'=w(x',y'). But we
only have a g-bit digit left over from B which is
in the radix w=w(x,y). So, the problem here is
that of radix conversion -- we have a digit Iw
in radix w and we need a digit Iw' in radix w'.
There are several ways to do this. A conceptually
simple one is as follows: decoder extracts the
digit Iw and encodes it as digit of some mixed radix
output integer M, which serves as an accumulator or
recycler for all such 'orphaned' Iw digits. The bits
of M (which are an arbitrary bit pattern, being a
binary form of a mixed radix integer) can simply be
reused, e.g. by growing M at the unprocessed end
of the compressed input (or just having M as separate
component). At this stage the encoder would have
done the opposite - it would have "decoded" (see
file Radix.c, function dec_radix()) the far end
of the compressed data (which was an arbitrary
binary pattern) into a digit Iw in radix w and
concatenated it to the leading end of the index.
There are other similar ways to perform this
radix conversion, all of them using amount of
processing per symbol very similar to the
conventional AC algorithm. They all also have
to perform explicit coding/decoding operations
(which include mul/div) for both, the most and
the least probable symbols, just as AC does.

The summary of this is that if you want the AC
modeling plus AC coding style, you get the AC
speed and the AC accuracy. The AC scheme, with its
'single next symbol probability' bottleneck interface
between the modeler & the coder (where the modeler
micro-manages the coder, symbol by symbol, and where
the whole coder+ modeler processing and interaction
is traversed from top to bottom on every symbol)
is simply intrinsically a poor division of labor
to allow for any high performance coding.

It is analogous to organizing car manufacturing,
and requiring that the next car can be started
only after the current car is complete and out
the door. That's a kind of conceptual constraint
imposed by the AC modeling "paradigm" as its
so-called "online" coding requirement. This
"online" is taken to mean some kind of analog,
memoryless, CPU-less, Morse telegraph device.

That has nothing to do with the actual online
as it is done, or any actual requirements or
inherent design constraints. One normally has
a fairly large buffer space and processor which
can access any of it, running programs of high
complexity. Internet would grind to a halt if
its protocols interpreted "online" as a constraint
to have to send or receive a single bit (or a
single symbol) at a time. Even the old style
point-to-point modems had several KB buffers
to accumulate, batch and compress the data.
And similarly for disk sectors & clusters.

The point of the above critique of the present
AC "paradigm" is that it is simply a gratuitous,
historically accidental conceptual bottleneck and
an intrinsic performance drain. Any algorithm that
follows its prescribed division of labor will bog
down. Once you snap out of its "online" spell,
many better possibilities open up. For example,
even retaining the AC modeling engine, with its
"probability of the next single symbol" bottleneck
parametrization of all the information about the
sequence being encoded, but just allowing coder to
ignore the imagined "online" constraint, one can
get much better performance with QI as follows:

b) --- QI Modeling AC style

QI breaks the probabilities into classes, so that
each class includes an interval of probabilities of
size 1/sqrt(n), where n is the size of data to be
encoded. Since coder doesn't assume any more the Morse
telegraph kind of "online", it doesn't assume n is 1
but some much larger number. The modeler is still left
to work under the old "online" spell and imagine that
it has to convert, by hook or crook, all it knows
or that it could know about the input sequence into
the probabilities for the next single symbol p(c).

Consider now a binary sequence of n symbols, for which
the modeler produces, symbol by symbol probabilities
p of bit=1, with p in some interval D=[a,b), of size
d=b-a. We divide D into s=sqrt(n) equal subintervals
of lengths d/s. Each input symbol is assigned to
one of s enumerative classes (thus enumerated by a
separate index) based on the subinterval in which
the modeler's p at that point falls in. Hence, we're
using quantized probabilities to classify the symbols
as "equiprobable". The excess output E in bits per
symbol due to this 'p quantization' is about
(cf. [41c], p. 8):

E = [dp^2/p + dq^2/q]*log(e)/2 .... (2)

where dp & dq are quantization errors of p and q. Since
dp & dq are =< 1/s, then E =< log(e)/2npq = O(1/n). Note
that since adaptive AC probability estimates have also
sampling error dp, dq of order 1/sqrt(n), this redundancy
is of the similar size as that of the adaptive AC. One can
further optimize this method (to reduce its worst case)
by selecting non-uniform partition of interval D, so that
the subintervals around smaller probabilities are shorter.
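
A minimal sketch of that bucketing step, assuming D=[0,1) and uniform
subintervals (the names here are illustrative only); symbols falling into
the same class are then enumerated together:

#include <math.h>
#include <stdio.h>

/* map a modeler's probability p, falling in D=[a,b), to one of s classes */
static int prob_class(double p, double a, double b, int s)
{
    int c = (int)((p - a) / (b - a) * s);
    if (c < 0) c = 0;
    if (c > s - 1) c = s - 1;                /* clamp the right endpoint */
    return c;
}

int main(void)
{
    int n = 1 << 20;                         /* symbols to be coded        */
    int s = (int)sqrt((double)n);            /* s = sqrt(n) = 1024 classes */
    printf("p=0.3137 over [0,1) goes to class %d of %d\n",
           prob_class(0.3137, 0.0, 1.0, s), s);
    return 0;
}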

In practical situations, the AC modeler would be producing
its predictions p(c) based on statistics from the processed
part of the input, hence its probabilities would already
have a built in sampling error interval (which decreases
as 1/sqrt(n)), which can be used by the QI coder as the
partition criteria for the enumerative classes (instead of
an ad hoc partition described above). Various existent
methods for growing and splitting the contexts based on
such past statistics, such as CTW or Rissanen's Context,
would transfer here as methods for generating enumerative
classes adaptively.

For the multi-alphabet case one would perform the decomposition
described in [T1], pp. 31-38, with the only difference that instead
of combining the symbol counts k(c) based on symbol's binary
code c, one would combine the modeler's probabilities p(c).

A special case of interest for this method are the finite
order Markov sources. Here, for order m, the probabilities
of the next symbol are defined by the m previous symbols.
For smaller m, one could simply bypass the computation
of probabilities (since QI doesn't need them) and simply
assign the enumerative class for the next input symbol
directly: using m previous symbols as the class tag
(hence there would be 2^m classes in binary case).

In this case we can notice another advantage of QI/EC
coder for modeling over the AC: to encode a symbol QI
needs to know only whether the symbol has the same or
different probabilities as some other symbols, but unlike
AC, QI doesn't also need to know what values these
probabilities have. Hence, QI places much lower demand
on the modeling engine, since the modeler here can simply
pass on to QI the context ID (the last m symbols) and
QI will code the symbol into the index for that ID,
whatever its probability may be.

In conclusion for this method (b), QI can use AC modeling
engine, with its full speed advantage over AC, and with
the redundancy being same as that of an adaptive AC in
the same setup.

> how would you use a QI coder with an adaptive model ?

Sections (a) and (b) above have couple answers.

> if I read you correctly, it is not an adaptive coder,
> so how do you transmit the model information for the
> QI coder ?

It is not "adaptive AC", but as illustrated in (a),(b)
above, it can function that way. The native QI modeling
is descriptive, in the sense of Rissanen's MDL. So the
QI model information is a much wider type of information
than just a list of probabilities (although it can be
that, too).

Consider an order-0 QI coding. The modeling analogous to
the order-0 adaptive AC, except more resilient against
the "surprises", becomes here the selection of the
segmentation of the input into contiguous sections,
based on measured symbol frequencies. Its resilience
to surprises is illustrated in the row "Vary" (cf. table
on page 10 in [T3]). The adaptive order-0 modeler
for QI has entire input sequence available and it
does not have to rely on a possibly false assumption
that the symbol frequencies in the initial parts of
the sequence are predictive of the frequencies in the
later parts, or gamble on which way they might
be predictive. While AC can code this way, too,
all it would achieve with that would be to advance
into a literal role of a less accurate and a lot
slower imitation of QI.

QI order-0 adaptive modeler identifies contiguous
quasi-stationary sections of the input sequence and
uses them as enumerative classes. There are many ways
to do such segmentation and even more ways to encode
it, along with the corresponding section counts, into
the output. Some of these methods, especially the
encoding aspect, were developed already for conventional
EC, such as described in [11]-[15], [23]. I have also
developed several for QI (some of which were touched
upon in [T2], where the general QI/EC modeling pattern
is presented, pp. 26-35).

Due to a lack of a practical EC coder and the exaggerated
dominance of the AC modeling paradigm (hypertrophied
to the point of pathology by the absence of practical
competition), this entire field of EC modeling is
highly under-explored. With the precision & performance
problems finally solved by QI, an algorithmic gold mine
has opened, where just about anything you do, and there
is more to do than an eye can see, is a new algorithm,
maybe a great new discovery to be taught to kids ever after.

T1-T3 are on http://www.1stworks.com/ref/qi.htm

28. L.D. Davisson "Universal noiseless coding"
IEEE Trans. Inform. Theory IT-19 (6), 783-795, 1973
http://cg.ensmp.fr/~vert/proj/bibli/local/Davisson1973Universal.pdf

33. R. Krichevsky, V. Trofimov
"The performance of universal encoding"
IEEE Trans. Inform. Theory IT-27 (2), 199-207, 1981
http://cg.ensmp.fr/~vert/proj/bibli/local/Krichevsky1981performance.pdf

41. M.Drmota, H-K. Hwang, W. Szpankowski
"Precise Average Redundancy of an Idealized
Arithmetic Coding" DCC 2002, 222-231.
http://citeseer.ist.psu.edu/drmota02precise.html

34. J.G. Cleary, I.H. Witten
"A Comparison of Enumerative and Adaptive Codes"
IEEE Trans. Inform. Theory IT-30 (2), 306-315, 1984
http://www.1stworks.com/ref/Cleary84Enum.pdf

F. Farey Fractions:
http://www.cut-the-knot.org/ctk/PickToFarey.shtml

41c. P.G. Howard, J.S. Vitter
"Practical Implementations of Arithmetic Coding"
Tech. Rep. No. 92-18, CS, Brown University, 1992
http://www.1stworks.com/ref/Howard92PractAri.pdf

Willem

unread,
Jan 7, 2006, 12:55:21 PM1/7/06
to
nightlight wrote:
) > if I read you correctly, it is not an adaptive coder,
) > so how do you transmit the model information for the
) > QI coder ?
)
) It is not "adaptive AC", but as illustrated in (a),(b)
) above, it can function that way. The native QI modeling
) is descriptive, in the sense of Rissanen's MDL. So the
) QI model information is a much wider type of information
) than just a list of probabilities (although it can be
) that, too).
)
) <snip theoretical discussion>

I have read through your entire description, and I haven't found even
a single hint to the practical question of what exactly the QI coder
needs as modeling information to do its job.

Can I assume that 'enumerative coder' means that you need the exact
symbol counts for each of the subclasses ? And if not, then what do
you need ? Please try to explain this as simply as possible, within
a single paragraph.

David A. Scott

unread,
Jan 7, 2006, 1:43:18 PM1/7/06
to
nightlight <nightlight...@and-this.omegapoint.com> wrote in
news:SY6dnUrMLIWIZyLe...@rcn.net:

> > This analysis is no different from any other analysis
> > in that you have to make lots of assumptions. This
> > means that if you use such an analysis to make
> > real-world predictions, then that depends
> > on how well your assumptions match the real world.
>
> I see now where you're misreading my redundancy
> statements in this and in the earlier thread.

Actually I think he is reading them correctly; he
has an understanding of arithmetic coding, it's not clear
you do.


...


> Now to your main questions, starting with the last one,
> which is the most specific:
>
> > assuming I have a stream of symbols, where at each
> > position in the stream, the probability distribution
> > of the symbols is different, then how does QI coder
> > adapt itself to all those different distributions ?
>
> This is the scenario of AC modeling engine feeding QI,
> which was sketched in note N7 on p. 9, [T3]. Two ways
> are described there:
>
> a) --- QI coding AC style

....

>
> The summary of this is that if you want the AC
> modeling plus AC coding style, you get the AC
> speed and the AC accuracy. The AC scheme, with its
> 'single next symbol probability' bottleneck interface
> between the modeler & the coder (where the modeler
> micro-manages the coder, symbol by symbol, and where
> the whole coder+ modeler processing and interaction
> is traversed from top to bottom on every symbol)
> is simply intrinsically a poor division of labor
> to allow for any high performance coding.
>

I guess that means it doesn't work so hot; took you
a long time to state it. So you are admitting the
arithmetic may beat the QI when you attempt to shoehorn
in your QI method where an arithmetic might be
a natural fit. No wonder you changed Moffat to whatever
problem you're doing instead of the other way around.


...

>
> b) --- QI Modeling AC style
>

...

> QI breaks the probabilities into classes, so that
> each class includes an interval of probabilities of
> size 1/sqrt(n), where n is the size of data to be
> encoded. Since coder doesn't assume any more the Morse
> telegraph kind of "online", it doesn't assume n is 1
> but some much larger number. The modeler is still left
> to work under the old "online" spell and imagine that
> it has to convert, by hook or crook, all it knows
> or that it could know about the input sequence into
> the probabilities for the next single symbol p(c).
>
> Consider now a binary sequence of n symbols, for which
> the modeler produces, symbol by symbol probabilities
> p of bit=1, with p in some interval D=[a,b), of size
> d=b-a. We divide D into s=sqrt(n) equal subintervals
> of lengths d/s. Each input symbol is assigned to
> one of s enumerative classes (thus enumerated by a
> separate index) based on the subinterval in which
> the modeler's p at that point falls in. Hence, we're
> using quantized probabilities to classify the symbols
> as "equiprobable". The excess output E in bits per
> symbol due to this 'p quantization' is about
> (cf. [41c], p. 8):

Well, it looks like you're having trouble since you need to
break it up into smaller chunks. That's too bad, since a real
entropy compressor would take a file of millions of symbols
and still compress it to roughly the same length no matter how
the symbols are arranged. From what you wrote it seems like you're
only doing local considerations.

...
...

Again this shows your lack of understanding of just
what an order-0 adaptive AC coder does. If a file is made up
of 2 symbol types and one wanted to compress it by
the above method, it would get the same length no matter
how things are ordered. You could have all the zeros then the ones,
or any combination, and you would get roughly the same length file.
In fact, even in your own examples of small strings, you use that
fact to get a combination number for each arrangement based
entirely on the length of the string and the number of ones. That is
what the adaptive order-0 arithmetic coder would do. If you
have to segment the file then local effects take over and you
would not get the same compression for different combinations
of the ones and zeros. The order of the symbols makes no
difference.

...


Here is a question I doubt you will answer, since you really
don't seem to answer anything. But I leave it so others can see
just how your code can't solve simple problems.

I go back to your own example with a string where you use QI
in lattice jumps to get an index. This truly is about as
easy as it gets, something even you could put your fingers around.
You claim you need only three things:
a) the combination index
b) the number of ones in the string
c) the length of the string

How would you combine this information into a string so decompression
could be done? This is something even you should be able to do.
Yet you will not. You may give several pages of references, but
you will not give one complete example.


David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **

Disclaimer:I am in no way responsible for any of the statements

nightlight

unread,
Jan 7, 2006, 2:32:11 PM1/7/06
to
> Any decent AC does *not* lose parts of its coding
> interval each step. Try to get that through your
> head. It *is* possible (hell, it's quite easy) to
> get the intervals to line up exactly without
> infinite precision.

It seems you and a few others here are either restricting
the noun "gap" to a subset of cases where gaps do occur,
or you are making certain implicit assumptions about
the source and the sequences it produces, without being
aware of making them.

In order to clarify which is the case here, why don't
you demonstrate your easy and gapless finite precision
binary AC coding for the finite input sequences produced
by a binary source S3, which satisfy the following
conditions:

a) Each sequence from S3 is exactly 3 bits long,

b) each sequence has exactly two bits=0 and one bit=1 and

c) each sequence is equally likely as any other.

What does your easy gapless AC coding for all possible
input sequences from S3 look like? Since the inputs
are pretty short, you should be able to show it all
using an AC working in 4 bit precision. But if you
need more, well use more (as long as the bit strings
fit on a 72 char line).

NOTE 1: The S3 output needs to be coded using a binary
order-0 AC in incrementing or decrementing or static mode,
which codes bit by bit as if the sequences have arbitrary
lengths. Namely, the short sequences are given above
only to allow you to show explicitly the coding, not
to allow switching over to let you construct the small
Huffman tree for the entire input (or some equivalent
ad hoc codes).

For example, the exactly same coding algorithm should be
demonstrable (e.g. as a little sample exe coder that
will work on a regular PC with, say, 1G of RAM) with
inputs of, say, exactly 30 million bits long with exactly
10 million bits=1 set. In this case you couldn't code it
via the multi-alphabet AC which treats the entire input
as a single character of a gigantic alphabet. Hence,
you can't use any such methods for S3 either. So,
just show it for a plain bitwise binary AC.

NOTE 2: You can't feed your AC input sequences with
0, 2 or 3 ones in order to fill the gaps and declare
them as "used". The S3 specified does not produce
any such sequences.


nightlight

unread,
Jan 7, 2006, 3:57:34 PM1/7/06
to
Can you email me a more detailed list of
changes (or headers etc) and fixes so I
can fix the copy on the web site?

Microsecond timer ought to be Ok (the windows
hi res is about 1/4 microsecond). Those test
loops need to be decoupled to measure separately
encode and decode times, and do the correctness
check outside of that, resetting the generator
to run 3 times with the same seed (it already
has a spare copy of original table & function
to do it). That way, even the time() would allow
accurate speed with enough iterations.

> what is all that output supposed to mean,
> or more to the point, what do I do to get
> efficiency statistics, elapsed time,
> compressed size, that sort of stuff.

The readme.txt explains the main information commands,
such as ct (for tables statistics), cbr (binomial
rows full detail, code space gaps, redundancies, roundoffs,
mantissa excess, etc). The binary coding commands ci
and cc have very similar output, so only cc is described
in detail in the readme. The radix codes (fixed, mixed
and factorial) commands cr, cf, cm, show the same info
as cc, except that the count of 1's is replaced by the radix.
All times are given at the end of each line as ns/sym,
separately for encoder and decoder.


> another problem with your code is that as it stands it
> seems that it only tests your algorithm's ability
> to compress pseudo-random data...
> pseudo-random data is theoretically extremely
> compressible.

The coders don't assume or look for a way to model the
random generator. If this were just an executable demo,
then one could think of some cheating like that. But
with source, it should be clear that the coders are
just plain 0-order coders and nothing more.

Adding a modeling engine to do the general file
compression or such is one of the items that will
get in eventually. The present source is only a
research code for the few folks interested in
exploring the algorithm and EC coding/modeling
potential. Having played quite a bit with the
algorithm and seen the advantages over the
alternatives, and the possibilities it opens in coding
and modeling, I think that in a few years this
algorithm and its offshoots will be running in
every portable device and serve as the bitmap
index and keyword incidence map coder in all the
search engines and data-warehouses. At the moment,
though, the number of people who share this view
fits in about 2.3219... bits.


Willem

unread,
Jan 7, 2006, 4:24:11 PM1/7/06
to
nightlight wrote:
) It seems you and few others here are either restricting
) the noun "gap" to a subset of cases where gaps do occur,
) or you are making certain implicit assumptions about
) the source and the sequences it produces, without being
) aware of making them.

Neither. See below.

) In order to clarify which is the case here, why don't
) you demonstrate your easy and gapless finite precision
) binary AC coding for the finite input sequences produced
) by a binary source S3, which satisfy the following
) conditions:
)
) a) Each sequence from S3 is exactly 3 bits long,
)
) b) each sequence has exactly two bits=0 and one bit=1 and
)
) c) each sequence is equally likely as any other.
)
) How does your easy gapless AC coding for all possible
) input sequences from S3 look like? Since the inputs
) are pretty short, you should be able to show it all
) using an AC working in 4 bit precision. But if you
) need more, well use more (as long as the bit strings
) fit on a 72 char line).
)
) NOTE 1: The S3 output needs to be coded using a binary
) order-0 AC in incrementing or decrementing or static mode,
) which codes bit by bit as if the sequences have arbitrary
) lengths. Namely, the short sequences are given above
) only to allow you to show explicitly the coding, not
) to allow switching over to let you construct the small
) Huffman tree for the entire input (or some equivalent
) ad hoc codes).
)
) For example, the exactly same coding algorithm should be
) demonstrable (e.g. as a little sample exe coder that
) will work on a regular PC with, say, 1G of RAM) with
) inputs of, say, exactly 30 million bits long with exactly
) 10 million bits=1 set. In this case you couldn't code it
) via the multi-alphabet AC which treats the entire input
) as a single character of a gigantic alphabet. Hence,
) you can't use any such methods for S3 either. So,
) just show it for a plain bitwise binary AC.
)
) NOTE 2: You can't feed your AC input sequences with
) 0, 2 or 3 ones in order to fill the gaps and declare
) them as "used". The S3 specified does not produce
) any such sequences.

From this note, I assume the model is allowed to 'know'
that the sequence will have a single '1' and two '0' bits,
and update its probabilities accordingly.


I'll sketch out an AC encoding, that should be enough:

The starting range is [0..256)

The steps for encoding the sequence '100' are:

Step 1: Encode the symbol '1'.
The probabilities are 1/3 for a '1' and 2/3 for a '0'.
Therefore, the range is subdivided as follows:
[0..170) for a '0', and [170..255) for a '1'.
Thus, the range is reduced to [170..255)

Step 2: Encode the symbol '0'.
The probabilities are 0 for a '1' and 1 for a '0'.
Therefore, the range is subdivided as follows:
[170..255) for a '0', and [255..255) for a '1'.
Thus, the range is reduced to [170..255)

Step 3: Encode the symbol '0'.
The probabilities are 0 for a '1' and 1 for a '0'.
Therefore, the range is subdivided as follows:
[170..255) for a '0', and [255..255) for a '1'.
Thus, the range is reduced to [170..255)

The steps for encoding the sequence '010' are:

Step 1: Encode the symbol '0'.
The probabilities are 1/3 for a '1' and 2/3 for a '0'.
Therefore, the range is subdivided as follows:
[0..170) for a '0', and [170..255) for a '1'.
Thus, the range is reduced to [0..170)

Step 2: Encode the symbol '1'.
The probabilities are 1/2 for a '1' and 1/2 for a '0'.
Therefore, the range is subdivided as follows:
[0..85) for a '0', and [85..170) for a '1'.
Thus, the range is reduced to [85..170)

Step 3: Encode the symbol '0'.
The probabilities are 0 for a '1' and 1 for a '0'.
Therefore, the range is subdivided as follows:
[85..170) for a '0', and [170..170) for a '1'.
Thus, the range is reduced to [85..170)


The steps for encoding the sequence '001' are:

Step 1: Encode the symbol '0'.
The probabilities are 1/3 for a '1' and 2/3 for a '0'.
Therefore, the range is subdivided as follows:
[0..170) for a '0', and [170..255) for a '1'.
Thus, the range is reduced to [0..170)

Step 2: Encode the symbol '0'.
The probabilities are 1/2 for a '1' and 1/2 for a '0'.
Therefore, the range is subdivided as follows:
[0..85) for a '0', and [85..170) for a '1'.
Thus, the range is reduced to [0..85)

Step 3: Encode the symbol '1'.
The probabilities are 1 for a '1' and 0 for a '0'.
Therefore, the range is subdivided as follows:
[0..0) for a '0', and [0..85) for a '1'.
Thus, the range is reduced to [0..85)


As you can see, the three possible sequences each lead to a range of
approximately the same size, and the ranges of all possible sequences,
when put together, form the starting range without any gaps.

This scheme easily scales up to sequences with millions of bits.


There is a coding gap caused by having to terminate the encoder, but that
is a fixed cost of two or three bits, not a loss at each step as you have
claimed. Furthermore, this termination cost can be avoided as well, with
careful handling.

I hope to have cleared up your misgivings about Arith Coding with this.

Phil Carmody

unread,
Jan 7, 2006, 5:18:07 PM1/7/06
to
Willem <wil...@stack.nl> writes:
> ) a) Each sequence from S3 is exactly 3 bits long,

> There is a coding gap caused by having to terminate the encoder,

Not in this particular case.

Phil
--
What is it: is man only a blunder of God, or God only a blunder of man?
-- Friedrich Nietzsche (1844-1900), The Twilight of the Gods

David A. Scott

unread,
Jan 7, 2006, 6:19:59 PM1/7/06
to
nightlight <nightlight...@and-this.omegapoint.com> wrote in
news:HaudnSFNItB...@rcn.net:

If you have a sequence that is defined by 3-bit strings, you in
effect have 3 input symbols. With an adaptive coder you don't
need to know in advance that each of the 3 symbols is equally likely,
but what the hell, let's play the game and compress with a coder the
likes of arb2x or arb255, the only difference being you need two cells:
one cell is split equally in half, the other is split 1 to 2. Note
this split is not perfect, but when one is using 64-bit registers
the split is damn close, and there are no gaps.

To explain what is happening, let's use 1-bit registers: the first cell
splits so that a one is output when you have the 100 token,
the next is split 50-50, so the mappings would be
100 goes to 1
010 goes to 01
001 goes to 00
which is in fact what you would get in the Huffman case. And you can see
that, since it's a complete tree, any sequence of 1's and 0's on
decompression gives you one of the 3 symbol sequences; no gaps.

You might complain that if you had 30 million bits from
the source you're compressing, the compressed length would
vary: as short as 10 million bits in the unlikely event you have 10
million 100's being output, even though it should only
occur roughly 1/3 of the time. The compressed length
could also be as long as 20 million bits if the 100 never occurred
at all, which is highly unlikely. However, using the Huffman approximation,
the average compressed length would be 16.666... million bits.

When you go to more bits it compresses better, again with no gaps.
The ideal length for the 30 million bit sequence is
actually 10 million times -lg(1/3), which is roughly
15849625.0072115618145373894394782 bits for the 10 million symbols,
while if using a 64-bit register
it is 10 million times -lg(6148914691236517204/18446744073709551616) for the 100,
which is roughly
15849625.0072115618176657356346099 bits, and slightly less
for the other 2 symbols. In short, there would be no way
to tell you're not actually using 1/3, and there would be no
gaps, since you would need 15849625 bits, give or take a
bit, to express the compression.
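
A quick check of those figures, comparing the exact 1/3 split with the
64-bit split quoted above (a standalone sketch):

#include <math.h>
#include <stdio.h>

int main(void)
{
    double n = 10e6;                               /* 10 million source symbols */
    double ideal  = n * (log(3.0) / log(2.0));     /* n * -lg(1/3)              */
    double p64    = 6148914691236517204.0 / 18446744073709551616.0;
    double approx = n * -(log(p64) / log(2.0));    /* differs from ideal only far
                                                      below double precision    */
    double huff   = n * (1 + 2 + 2) / 3.0;         /* codes 1, 01, 00           */
    printf("ideal  : %.4f bits\n", ideal);         /* ~15849625.0072            */
    printf("64-bit : %.4f bits\n", approx);        /* ~15849625.0072            */
    printf("huffman: %.0f bits on average\n", huff);  /* ~16666667              */
    return 0;
}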

If you have high enough integer precision math you could
calculate the index. However, sooner or later when you do compression
you have to write that number out as ones and zeros. You gain
nothing in space savings by having the exact number, since the
arithmetic compression has no gaps. Sure, you might be one bit
shorter for some combinations, but then you will be a bit longer
for others. It's just a plain fact that the bijective arithmetic
can do this kind of compression in an optimal way. It's not
clear you understand how to do this with your method in general.


David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **

Disclaimer:I am in no way responsible for any of the statements

Jasen Betts

unread,
Jan 7, 2006, 11:23:27 PM1/7/06
to
["Followup-To:" header set to comp.compression.]
On 2006-01-07, nightlight <nightlight...@and-this.omegapoint.com> wrote:
> Can you email me a more detailed list of
> changes (or headers etc) and fixes so I
> can fix the copy on the web site?

I'm not sure that it's actually working, but I'll email
you a diff in two formats (machine readable and human
readable, in case the tools you have can't handle the
former).

I haven't implemented kbhit(), so that bit isn't working yet.
Is it used inside any timing loops, or can I do an inefficient
implementation?

> Microsecond timer ought to be Ok (the windows
> hi res is about 1/4 microsecond). Those test
> loops need to be decoupled to measure separately
> encode and decode times, and do the correctness
> check outside of that, reseting the generator
> to run 3 times with the same seed (it already
> has a spare copy of original table & function
> to do it). That way, even the time() would allow
> accurate speed with enough iterations.

You didn't consider wasting a little memory by decoding
into a third buffer?

> > what is all that output supposed to mean,
> > or more to the point, what do I do to get
> > efficiency statistics, elapsed time,
> > compressed size, that sort of stuff.
>
> The readme.txt explains the main information commands,

not in terms that I understand.

> Adding a modeling engine to do the general file
> compression or such is one of the items that will
> get in eventually.

I feel that it would make it easier to understand, but
I haven't tried to understand the code much more than
to get it to compile.

> I think that in few years this
> algorithm and its ofshoots will be running in
> every portable device and serve as the bitmap
> index and keyword incidence map coder in all the
> search engines and data-warehouses. At the moment,
> though, the number people who share this view
> fits in about 2.3219... bits.

:)

Bye.
Jasen

David A. Scott

unread,
Jan 8, 2006, 4:20:37 PM1/8/06
to

> In order to clarify which is the case here, why don't

Nightlight, here is what I am willing to do so others,
and maybe even you, can see the difference. This is
basically your example, but as usual you are not clear
enough. I would like anybody to be able to test the code.

1) So first of all, the input for the compressor has to be
files of ASCII characters in multiples of 3, namely
100, 010 or 001; that way the file will be exactly a multiple
of 3 bytes where only those 3 combinations are allowed. So the file
can be as short as 3 bytes or millions of bytes long, but always
a multiple of 3.

2) I will map that bijectively to a packed binary file. You
really don't need the format here, but it's for the arithmetic
coder; I don't wish to mod arb255 very much, so that it is easy
for you and others to follow the changes I put in.

3) The bijective arithmetic coder will compress with no gaps, using
fixed and unchanging weights as close to 1/3 as it can
get. (One could design a custom way, but hey, powers of 2 are good enough.)
The output will be a binary file where each of the 3 sequences maps
to roughly 1.5849625 bits per use.

4) The reverse is also true. Take any file and do the reverse of
3, 2 and 1 above and you get a unique sequence of type 1, such that
if X is an ASCII file as in the beginning, then steps 2 and 3 would be
compression, and the reverse of 3 and the reverse of 2 would be uncompression.
If Y is any file, period, then:
compress( uncompress (Y)) = Y
and
uncompress( compress(X)) = X

Even though this is your example, I am not sure you can do this with your
method. It is not even ideal for the arithmetic, but even so it would be
trivial to mod arb255 to do this.


David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **

Disclaimer:I am in no way responsible for any of the statements

nightlight

unread,
Jan 8, 2006, 4:27:25 PM1/8/06
to
First, a few minor problems with your answer:

> The starting range is [0..256) ...

> Therefore, the range is subdivided as follows:
> [0..170) for a '0', and [170..255) for a '1'. ...

You start with the full "range" [0..256), but then you switch
the endpoints from there on to [*..255). You also meld together
two mutually exclusive notations, the Pascal range A..B
and the semi-open interval math notation [A,B), for a "range".
The Pascal "range" A..B includes both endpoints A and B in the
interval, while the semi-open interval [A,B) doesn't include B.
That notational meld of two mutually exclusive prescriptions
was probably what led to your endpoint mixup, 256 vs. 255. The fact that
255 is divisible by 3, while 256 is not, gave it an extra wishful nudge
to go with the 255. I will switch to a single coherent notation, [A,B),
hence use e.g. [*,256).

> There is a coding gap caused by having to terminate
> the encoder, but that is a fixed cost of two or three
> bits, not a loss at each step as you have claimed.

The AC doesn't normally build the complete, ready to go output code on
each step. Part of the output code is implicit in the coder's state
variables and the implementation conventions. If at any point the coder
is told to wrap it up and provide output code, the excess of 1-2 bits on
that total output code will be in that output in the case of an infinite
precision AC coder (ACX). This excess will be 2-3 bits in the case of a
finite precision AC coder (ACF). Hence, these 1-2 or 2-3 excess bits on
the total output are part of the AC output throughout -- they are only
represented differently at different stages (implicit in the algorithm
conventions and the internal state variables until the last step, at
which point the output code creation is finalized and they become
explicitly part of that output). Hence, at best your argument above is
vacuous (arguing about ambiguities in verbal conventions), and at worst
it is incorrect. I will grant you the benefit of the best case.

Well, on the positive side, at least you recognize that these excess
bits result in a "coding gap", which they do, of course.

There is an additional and unrelated misconception revealed in the last
part of that sentence:

> ... but that is a fixed cost of two or three


> bits, not a loss at each step as you have claimed.

An infinite number of additive contributions can easily produce a sum
with fixed upper bound, or within some fixed range, such as 1-2 excess
for ACX or 2-3 for ACF. For example 1 + 1/3 + 1/3^2 +... = 1.5 and the
partial sums after 2 terms are between 1.33 and 1.5. Hence, that the
cost is "fixed" (or rather the cost interval is fixed) gives you no
information on how many steps contributed to the cost.

> Furthermore, this termination cost can be avoided as
> well, with careful handling.

Not in the general AC coding setup. In special cases, e.g. when the
number of symbols is prescribed and known upfront to encoder and
decoder, you can avoid at least part of it. In a general AC setup,
without any such "side information" provided, the 1-2 bit excess for ACX
(or 2-3 for ACF) is a part of the standard ACX code length (needed to
select unambiguously the final codeword interval):

L = ceiling(log(1/Pc)) + 1 .... (1)

whole bits (see [41a], pp. 13-14), where Pc is the 'coding probability'
(the probabilities you compute with in AC) of the complete message. For
a stationary order-0 AC model, Pc is simply p^k * (1-p)^(n-k), where
p=AC's probability of 1's, k=count of 1's, n=count of all symbols.
Hence, the upper bound for ACX redundancy is 2 bits (unlike the Huffman
or exact EC code, where the +1 is absent and where the upper bound is 1,
resulting solely from rounding up of the log(1/Pc) to the next whole bit).
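
As a quick illustration of (1), a minimal C sketch (my own, with
arbitrary example values of p, n, k) that evaluates the order-0 ACX
code length:

/* evaluates eq. (1): L = ceiling(log2(1/Pc)) + 1 whole bits, with
   Pc = p^k * (1-p)^(n-k) for a stationary order-0 model              */
#include <math.h>
#include <stdio.h>

int acx_code_len(double p, long n, long k)
{
    double bits = -(k * log2(p) + (n - k) * log2(1.0 - p)); /* log(1/Pc) */
    return (int)ceil(bits) + 1;                             /* eq. (1)   */
}

int main(void)
{
    printf("L = %d bits\n", acx_code_len(1.0 / 3.0, 12, 4)); /* example  */
    return 0;
}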

Now to the fundamental problem of your answer -- the complete and
hopeless conceptual meld of the three distinct concepts of "interval"
relevant in arithmetic coding. Hence, before we can head toward any
approximation of a coherent discussion, we need a finer resolution view
of these three kinds of "intervals":

1. IVF ==> the coder's internal variables in finite precision specifying
finite precision intervals (these are your [0,85),
[85,170),... etc. variables).

2. IVX ==> the coder's internal variable specifying an infinite
precision interval (this type is implicit in your description, when you
are referring to your intervals as approximate). While one can't store
infinite precision binary fraction in an internal variable, one can
easily store an equivalent rational fractions with unlimited precision
integers for numerators and denominators (e.g. as done in [41a] pp.
13-14, 46).

3. CWI ==> Codeword interval - this is the interval defined by the AC
codeword (the explicit codeword ready for output or the implicit
codeword which could be produced at each step). The AC codeword, implicit
or explicit, is a binary string C of finite length L (for ACX L is given
via (1)). The usual AC convention is to interpret C as representing
fractional binary digits of some rational number Z, defined as: Z = 0.C.
For example, if the AC codeword is C=1010 then L=4, Z=0.1010 =
1/2+0/4+1/8+0/16 = 5/8 = 0.625. The mapping of codewords C to intervals
CWI is used more generally than just for AC analysis e.g. in the Kraft
inequality proofs (cf. Cover & Thomas IT textbook [24]). The common
convention for this mapping is as follows ([24] eq. (24) which I use
below as (2); also [41a] pp. 13-14 & eq. (1.7)):

CWI = [Z, Z+G) .... (2)

where the "granularity" G (=the interval length) is defined as:

G=1/2^L .... (3)

Some examples of codewords C and their corresponding intervals CWI are
shown below:

C ... CWI
-------------------------
0 ... [0, 0.5)
1 ... [0.5, 1)
00 ... [0, 0.25)
01 ... [0.25, 0.5)
10 ... [0.5, 0.75)
11 ... [0.75, 1) ... etc.
-------------------------
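
In code form, the mapping (2)-(3) from a codeword C to its interval CWI
is simply (a small sketch of my own, using doubles just for readability):

/* maps a binary codeword string C to CWI = [Z, Z+G), Z = 0.C, G = 1/2^L */
#include <stdio.h>
#include <string.h>

void cwi(const char *C, double *Z, double *G)
{
    double z = 0.0, g = 1.0;
    for (size_t i = 0; i < strlen(C); i++) {
        g *= 0.5;                   /* each digit halves the granularity  */
        if (C[i] == '1') z += g;    /* add that digit's fractional value  */
    }
    *Z = z; *G = g;
}

int main(void)
{
    double Z, G;
    cwi("1010", &Z, &G);
    printf("CWI = [%g, %g)\n", Z, Z + G);   /* prints [0.625, 0.6875)     */
    return 0;
}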

The soft & fuzzy pedagogical descriptions of the AC encoding algorithm
usually describe construction of the nested IVX or IVF intervals from
the input sequence, then state that the coder transmits binary digits of
some point Z from within the final IVX or IVF interval as a
specification for that interval and they'll prescribe that the number of
digits should be given via (1) (sometimes omitting the +1 term). But,
the point Z given with L fractional bits C, can't all by itself specify
even the IVF, let alone IVX.

The only interval that Z can specify unambiguously, and only if further
supplemented with a precise convention, such as eqs. (2) and (3), is the
interval CWI. The higher resolution descriptions (such as [41a] pp.
13-14, 41-48) explain and derive code length (1) -- its origin is in the
requirement (in the CWI terms used above) that the interval CWI has to fit
within the coder's final "internal variable" interval IVX or IVF (cf.
eq. (1.7) in [41a]).

We can look now at a set {C} = {C1,C2,...} containing all possible
codewords that an AC can produce from a given source (this may be an
infinite set). The corresponding intervals CWI(C1), CWI(C2),... form a
set {CWI(C)} of non-overlapping intervals (consequence of the AC output
decodability). Since
all CWI intervals fit within [0, 1), the non-overlapping property
implies that their lengths G(C1), G(C2),... can add up to at most 1, i.e.

G(C1)+G(C2)+G(C3)+... =< 1 ... (4)

The eq. (4), recalling the definition (3) of G's, is the Kraft
inequality which must hold for any uniquely decodable codes (cf. [24] p.
90, McMillan theorem), including our set {C}.
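
For a concrete feel of (4), here is a tiny sketch (my own toy example
set {C}, not an AC output) that computes the Kraft sum; a compact code
sums to exactly 1, and any shortfall below 1 is unused code space:

/* Kraft sum (4) over a small codeword set; G = 1/2^L per codeword */
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *set[] = { "0", "10", "110", "111" };  /* example set {C}  */
    double sum = 0.0;
    for (int i = 0; i < 4; i++)
        sum += 1.0 / (double)(1ul << strlen(set[i])); /* add G(Ci)        */
    printf("Kraft sum = %g\n", sum);   /* 1 here, i.e. no coding gaps     */
    return 0;
}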

The concept of "coding gaps" (in code space or in code intervals), which
is our topic here, is normally used in the context of Kraft inequality
(4) as follows: The codes for which (4) holds as the exact equality are
called "compact" codes and such codes don't have "coding gaps". The
codes for which (4) holds only as the "<" part of the "=<" inequality
have "coding space gaps" or "coding gaps" (unused codes). The term
"interval gap" simply refers to these same gaps in terms of the CWI
intervals whose lenghts L(C1), L(C2)... are used in (4).

Even in the low res analysis you apparently did manage somehow to catch
a vague glimpse, however briefly, of what these gaps refer to, since
you said:

> There is a coding gap caused by having to terminate
> the encoder, but that is a fixed cost of two or three

> bits, ...

Clearly, you are realizing above that adding 2 or 3 bits as a
termination cost to the codewords does create gaps somehow and
somewhere. But since that didn't seem to fit into your
conceptual meld in which all three interval types are just the
"interval", and where you can make your internal variables interval IVF
fit together as tightly as you wish, you couldn't quite see how
the "interval^3" could have this gap. Now, with the three higher res interval
concepts in front of you, you can look at (3) and see that adding 2 or 3
to L will make the corresponding CWI interval length G smaller by 4 to 8
times, hence a pretty big gap will be opened in the CWI coverage (4) of
the interval [0,1) (since 75-87 percent of that CWI will turn into the
unused code space, the interval gap).

But seeing it in the low res back then, you concluded:

>... but that is a fixed cost of two or three


> bits, not a loss at each step as you have claimed.

Now look at (1) which shows how the L is computed -- the smaller the
probability Pc of the whole symbol sequence, the larger the L in (1)
will be. But Pc in (1) is the coding probability for the particular
sequence e.g. for order 0 model (as we used), Pc is a series of
truncated products of probabilities p & q of individual symbols
encountered. Now what happens if you have the actual p=1/3, which is an
infinite binary fraction 1/3=0.01010101... and use instead p'=0.010 for
AC coding? As you probably know from basic coding theory, the greater
the deviation of p' from the actual probability p, the longer on average
the codeword length L. In turn, the increase in avg. L shortens the
average CWI interval length G via (3), contributing to the increase in
CWI interval gap (which is the shortfall from 1.0 in Kraft inequality
(4)). Note also that the average increase in L is the result of an
accumulation of contributions along the way, one from each coding step's
deviation of the coding probability from the actual probability at that step.
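
To put a number on that p=1/3 vs p'=0.010 example, a quick sketch (my
own; it just evaluates the standard per-symbol excess
p*log(p/p') + q*log(q/q') for an order-0 model):

/* average excess bits per symbol when coding with p'=0.010b (=1/4)
   while the actual probability of 1's is p=1/3                      */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double p = 1.0 / 3.0, pc = 0.25;          /* actual vs coding prob.  */
    double q = 1.0 - p,   qc = 1.0 - pc;
    double excess = p * log2(p / pc) + q * log2(q / qc);
    printf("%.5f bits/symbol excess\n", excess);   /* ~0.025 bits/symbol */
    return 0;
}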

The exactly same coding gaps happen with the QI coder due to its finite
precision. Since QI's construction of interval lengths (eq. (21) p. 8 in
[T3]) is bottom up, the accumulation is much more transparent here. You
can even see it as it accumulates step-by-step using the QI.exe program,
option "cbr" (to show complete binomial row; set also option n50 since
default is n=1024 which would scroll off). The QI errors are due to
rounding up to g (g=32) bit SWI mantissa in (21). The "cbr" command
shows all quantized SWI binomials in row n, with asterisk next to those
which were rounded up when computed via recurrence (21), which here, for
the binary coder, is simply the Pascal triangle recurrence: C(n,k) =
C(n-1,k) + C(n-1,k-1). The columns labeled dm16 and dm10 show how much
these increments of the mantissa have accumulated over all such adds (for all
prior n). The "Extra Bits" column shows how much these dm contribute to
the excess bits (usually 1/10^9). If the mantissa were shorter, or if n were
very large, these accumulated increments would eventually cause the
mantissa to overflow, which would then increase the SWI exponent by 1,
hence lengthening the output by 1 bit.
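
To make the rounding-up effect concrete, here is a toy C sketch of a
quantized Pascal triangle row (only my illustration of the idea behind
eq. (21), not the QI.exe SWI code; a deliberately tiny 8-bit mantissa
is used so the rounding shows up already at small n):

/* toy quantized binomials: value = m * 2^e with an 8-bit mantissa m,
   all roundings done upward, so quantized C(n,k) >= exact C(n,k)      */
#include <stdio.h>

#define G 8                                   /* mantissa bits (toy)    */

typedef struct { unsigned long m; int e; } swi;

static swi round_up(unsigned long long v, int e)
{
    while (v >> G) { v = (v + 1) >> 1; e++; } /* drop bits, rounding up  */
    swi r = { (unsigned long)v, e };
    return r;
}

static swi add(swi a, swi b)   /* quantized C(n,k) = C(n-1,k)+C(n-1,k-1) */
{
    while (a.e > b.e) { b.m = (b.m + 1) >> 1; b.e++; }  /* align, rounding up */
    while (b.e > a.e) { a.m = (a.m + 1) >> 1; a.e++; }
    return round_up((unsigned long long)a.m + b.m, a.e);
}

int main(void)
{
    enum { N = 12 };
    swi row[N + 1];
    row[0].m = 1; row[0].e = 0;
    for (int n = 1; n <= N; n++) {
        row[n].m = 1; row[n].e = 0;                     /* C(n,n) = 1      */
        for (int k = n - 1; k > 0; k--)
            row[k] = add(row[k], row[k - 1]);           /* in-place update */
    }
    printf("quantized C(12,4) = %lu * 2^%d = %lu (exact 495)\n",
           row[4].m, row[4].e, row[4].m << row[4].e);
    return 0;
}

With these toy settings the rounding up makes the quantized C(12,4)
come out one count above the exact value, which is the kind of
increment the dm columns of the "cbr" command tally up.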

Unlike QI, AC uses multiplicative binomial recurrences, described in
earlier post [P1], eq (1), (2). Even more unlike QI, which does this
only during the table construction (which is a universal table, the same
table for all source probabilities, hence it can be premade & loaded
from a file when the program starts), AC computes them from scratch on
every coding task and for every symbol along the way. The final
difference from QI, is that AC works top down, from the largest binomial
or addend (point B on Fig 1 in [T3], see also [T2] pp. 19), which in AC
is scaled to 1, and all other addends are smaller than this since
they're all divided by this largest binomial (which is at the endpoint
of the path, cf [T2] pp. 19-25 for details on this very close
correspondence between the two coders). Hence, the rounding down
arithmetic of AC works by fixing the starting binomial to 1 (or an
addend generally) and reducing the size of the lower order ones,
resulting at the end in smaller Pc in (1), thus longer output L. With QI
which computes from lower to higher binomials, the higher binomials are
incremented when rounding (since SWI rounding in (21) is up). So, while
QI's rounding up errors accumulate toward the larger addends making them
larger, the AC's rounding down errors accumulate toward the smaller
addends making them smaller, which it eventually pays for via (1) in 1-2
excess bits for ACX or 2-3 excess bits for ACF (note that for ACF, the
excess can go beyond 2-3 if the input gets too long for the frequency
counters).

As to your IVF intervals fitting perfectly together, that is irrelevant
for the gaps in the CWI intervals (which are the intervals defined by
the codeword C, the quantity that is transmitted and to which (4), the
gaps, the unused codes & the redundancy apply). IVFs are your internal
variables and you're managing them as you wish (e.g. what to do with
their fit when you need to drop their digits in finite precision). If
you want to fit the IVFs together, all tight and snug, that's fine
and you are free to do so, but that won't make the gaps in the CWI go
away or the average L become shorter. For example, you could reduce your AC
precision to 4 bits and your avg. excess bits, the unused code space and
the CWI interval gaps will balloon (see table 3.4 p. 52 in [41a] for
the effects of 4 bit AC precision), yet all your internal variable IVFs
are still happily covering the entire interval [0,1), as snugly as
ever, with no gaps in sight.

You're welcome, of course, to write out all the codes AC will produce on
that example (under conditions described i.e. no special ad hoc code
tweaks for the 3 short strings; or just run some AC, in decrementing
mode as you did in your description) and show how the CWI intervals for
them fit gaplessly or equivalently, show how these actual AC codes use
entire output code space without "coding gaps" (and, obviously, without
contradicting your earlier statement on "coding gaps" due to 2-3 bit
termination overhead).

> I hope to have cleared up your misgivings about Arith
> Coding with this.

Well, I hope yours will clear up, too :)

41a. P.A.J. Volf "Weighting Techniques In Data Compression: Theory and
Algorithms" Ph.D. thesis, Eindhoven University of Technology, Dec 2002
http://alexandria.tue.nl/extra2/200213835.pdf

24. T.M. Cover, J.A. Thomas
"Elements of Information Theory" Wiley 1991

P1. Earlier post on AC multiplicative recurrences:

http://groups.google.com/group/comp.compression/msg/71847a32daa9a571

David A. Scott

unread,
Jan 8, 2006, 4:55:55 PM1/8/06
to
nightlight <nightlight...@and-this.omegapoint.com> wrote in
news:yIGdndvhfe_GGVze...@rcn.net:


First, some minor problems with your whole line of thought.
We can easily code this, so most of your answer has nothing
to do with reality. Second, just what you want coded is unclear;
I have tried to ask for enough details so others can look at
real test results instead of the long rants. Third, are you
going to write code that would work on real files for this
example, or is that too hard even though you designed it
yourself to make AC look bad and your stuff look good?

> > There is a coding gap caused by having to terminate
> > the encoder, but that is a fixed cost of two or three
> > bits, not a loss at each step as you have claimed.
>
> The AC doesn't normally build the complete, ready to go output code on
> each step. Part of the output code is implicit in the coder's state
> variables and the implementation conventions. If at any point the
> coder is told to wrap it up and provide output code, the excess of 1-2
> bits on that total output code will be in that output in the case of
> an infinite precision AC coder (ACX). This excess will be 2-3 bits in
> the case of a finite precision AC coder (ACF). Hence, these 1-2 or 2-3
> excess bits on the total output are part of the AC output throughout
> -- they are only represented differently at different stages (implicit
> in the algorithm conventions and the internal state variables until
> the last step, at which point the output code creation is finalized
> and they become explicitly part of that output). Hence, at best your
> argument above is vacuous (arguing about ambiguities in verbal
> conventions), and at worst it is incorrect. I will grant you the
> benefit of the best case.
>
> Well, on the positive side, at least you recognize that these excess
> bits result in a "coding gap", which they do, of course.
>

Actually that's not entirely true. When one does a full bijective
arithmetic file encoder for this, writing out only what is done while
compressing, you realize that for each of the 3 possible outputs you need
roughly 1.58 bits. So naturally the coder is carrying the excess around
till the input terminates. On termination sometimes you round up,
sometimes you round down. The interesting thing is this.
Take the output file from a long set of inputs but change the last
byte of the compressed file. In fact change it so that you have 256
files, each with a different last value. In turn each of these files
will decompress back to a valid input that, if recompressed, goes back
to the same unique compressed file. THERE ARE NO GAPS. Can you do this with
your method? I don't think so.


REST OF USELESS RANT CUT


David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **

Disclaimer:I am in no way responsible for any of the statements

nightlight

unread,
Jan 8, 2006, 4:56:53 PM1/8/06
to
--- Errata:

The citation:

> (cf. Cover & Thomas IT textbook [24]). The
> common convention for this mapping is as follows
> ([24] eq. (24)

should have in the last line:

([24] eq. 5.11 p. 84

Willem

unread,
Jan 8, 2006, 6:18:30 PM1/8/06
to
nightlight wrote:
) First, few minor problems with your answer:
)
) > The starting range is [0..256) ...
) > Therefore, the range is subdivided as follows:
) > [0..170) for a '0', and [170..255) for a '1'. ...
)
) You start with full "range" [0..256), but then you switch
) the endpoints from there on to [*..255). You also meld together
) two mutually exclusive notations, the Pascal range A..B
) with the semi-open interval math notation [A,B) for a "range".

Semantics, and a typo. I trust it was obvious what I meant.

) > There is a coding gap caused by having to terminate
) > the encoder, but that is a fixed cost of two or three
) > bits, not a loss at each step as you have claimed.
)
) The AC doesn't normally build the complete, ready to go output code on
) each step. Part of the output code is implicit in the coder's state
) variables and the implementation conventions. If at any point the coder
) is told to wrap it up and provide output code, the excess of 1-2 bits on
) that total output code will be in that output in the case of an infinite
) precision AC coder (ACX). This excess will be 2-3 bits in the case of a
) finite precision AC coder (ACF). Hence, these 1-2 or 2-3 excess bits on
) the total output are part of the AC output throughout

This does not follow. As I see it, there are no excess bits in an Arith
Coder, until the coder is told to wrap it up. Can you give some clear
arguments why you claim that the bits added by termination are present
before termination ?

To any sane person, the phrase 'a loss at each step' implies that this
loss will grow as the number of steps increases.


) Well, on the positive side, at least you recognize that these excess
) bits result in a "coding gap", which they do, of course.

A coding gap of a whopping two bits. If your claim is that AC is worse
than QI because of those two bits, I laugh in your face.

I have snipped your whole discussion on those two or three bits because
they are irrelevant, and your claim that a cost that does not grow as
the number of steps increases is nevertheless incurred at each step is
ridiculous and arbitrary.

) Now to the fundamental problem of your answer -- the complete and
) hopeless conceptual meld of the three distinct concepts of "interval"
) relevant in arithmetic coding. Hence, before we can head toward any
) approximation of a coherent discussion, we need a finer resolution vew
) of these three kinds of "intervals":

I have snipped your entire discussion below, because in the end it still
boils down to the fixed cost of two or three bits of excess.

) You're welcome, of course, to write out all the codes AC will produce on
) that example (under conditions described i.e. no special ad hoc code
) tweaks for the 3 short strings; or just run some AC, in decrementing
) mode as you did in your description) and show how the CWI intervals for
) them fit gaplessly or equivalently, show how these actual AC codes use
) entire output code space without "coding gaps" (and, obviously, without
) contradicting your earlier statement on "coding gaps" due to 2-3 bit
) termination overhead).

As evident from your conclusion. As I said before, I will now laugh
in your face. You have written page upon page of explanation, all of
which in the end revolved around a fixed coding excess of a few bits.

nightlight

unread,
Jan 9, 2006, 12:56:27 AM1/9/06
to
> This analysis is no different from any other analysis
> in that you have to make lots of assumptions. This
> means that if you use such an analysis to make
> real-world predictions, then that depends
> on how well your assumptions match the real world.

I see now where you're misreading my redundancy bound:

d(g) < log(e)/2^(g-1) .... (1)

Now to your main questions, starting with the last one,


which is the most specific:

> assuming I have a stream of symbols, where at each
> position in the stream, the probability distribution
> of the symbols is different, then how does QI coder
> adapt itself to all those different distributions ?

This is the scenario of AC modeling engine feeding QI,
which was sketched in note N7 on p. 9, [T3]. Two ways
are described there:

a) --- QI coding AC style

QI can code the AC style by performing "Lattice

The summary of this is that if you want the AC


modeling plus AC coding style, you get the AC
speed and the AC accuracy. The AC scheme, with its
'single next symbol probability' bottleneck interface
between the modeler & the coder (where the modeler
micro-manages the coder, symbol by symbol, and where
the whole coder+ modeler processing and interaction
is traversed from top to bottom on every symbol)
is simply intrinsically a poor division of labor
to allow for any high performance coding.

It is analogous to organizing car manufacturing,

b) --- QI Modeling AC style

QI breaks the probabilities into classes, so that


each class includes an interval of probabilities of
size 1/sqrt(n), where n is the size of data to be
encoded. Since the coder no longer assumes the Morse
telegraph kind of "online", it doesn't assume n is 1
but some much larger number. The modeler is still left
to work under the old "online" spell and imagine that
it has to convert, by hook or crook, all it knows
or that it could know about the input sequence into
the probabilities for the next single symbol p(c).

Consider now a binary sequence of n symbols, for which
the modeler produces, symbol by symbol probabilities
p of bit=1, with p in some interval D=[a,b), of size
d=b-a. We divide D into s=sqrt(n) equal subintervals
of lengths d/s. Each input symbol is assigned to
one of s enumerative classes (thus enumerated by a
separate index) based on the subinterval in which
the modeler's p at that point falls in. Hence, we're
using quantized probabilities to classify the symbols
as "equiprobable". The excess output E in bits per
symbol due to this 'p quantization' is about
(cf. [41c], p. 8):

E = (dp^2/p + dq^2/q) * log(e)/2 .... (2)
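
A rough sketch of that classification step in C (the names and the
clamping convention are my own, this is not from the QI source):

/* bin a modeler probability p from D=[a,b) into one of s=sqrt(n)
   classes, and evaluate the per-symbol excess (2) for a given dp     */
#include <math.h>
#include <stdio.h>

int prob_class(double p, double a, double b, long n)
{
    int s = (int)ceil(sqrt((double)n));      /* number of classes       */
    int c = (int)((p - a) / (b - a) * s);    /* subinterval index of p  */
    if (c < 0)     c = 0;                    /* clamp to [0, s-1]       */
    if (c > s - 1) c = s - 1;
    return c;
}

double class_excess(double p, double dp)     /* eq. (2), bits/symbol    */
{
    double q = 1.0 - p, dq = -dp;            /* dq = -dp since p+q=1    */
    return (dp * dp / p + dq * dq / q) * 1.4426950408889634 / 2.0;
}

int main(void)
{
    printf("class %d, excess %.2e bits/symbol\n",
           prob_class(0.31, 0.0, 1.0, 1024),         /* s = 32 classes  */
           class_excess(0.31, 1.0 / 64));            /* |dp| <= 1/64    */
    return 0;
}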

> how would you use a QI coder with an adaptive model ?

QI order-0 adaptive modeler identifies contiguous


quasi-stationary sections of the input sequence and
uses them as enumerative classes. There are many ways
to do such segmentation and even more ways to encode
it, along with the corresponding section counts, into
the output. Some of these methods, especially the
encoding aspect, were developed already for conventional
EC, such as described in [11]-[15], [23]. I have also
developed several for QI (some of which were touched
upon in [T2], where the general QI/EC modeling pattern
is presented, pp. 26-35).

Due to the lack of a practical EC coder and the exaggerated
dominance of the AC modeling paradigm (hypertrophied
to the point of pathology by the absence of practical
competition), this entire field of EC modeling is
highly under-explored. With the precision & performance
problems finally solved by QI, an algorithmic gold mine
has opened, where just about anything you do, and there
is more to do than an eye can see, is a new algorithm,
maybe a great new discovery to be taught to kids ever after.

-- References ( http://www.1stworks.com/ref/RefLib.htm )

T1-T3 are on http://www.1stworks.com/ref/qi.htm

Matt Mahoney

unread,
Jan 8, 2006, 12:57:59 PM1/8/06
to
David A. Scott wrote:
> "Matt Mahoney" <matma...@yahoo.com> wrote in
> news:1136605351.6...@g47g2000cwa.googlegroups.com:
>
> In your model you use 9 bits for every 8 bits of data where starting
> bit is a ZERO for each byte and then for EOF you allow the starting
> bit to be ONE and stop the compressor. This does not allow all the
> code space to be used.
> As a result of this modeling every compressed file with FPAQ0 has
> the first bit of the fist byte set on output so technically the first
> bit out is a total waste of space.

Actually all the code space is used because the first bit is 0 for an
empty file.

Of course it could be improved by making it bijective instead of using
2*log2(n) bits to encode the length in bytes.

-- Matt Mahoney

nightlight

unread,
Jan 9, 2006, 8:38:45 AM1/9/06
to
> You have written page upon page of explanation,
> all of which in the end revolved around a fixed
> coding excess of a few bits.

The topic being discussed in this sub-thread was the difference in
the quantization errors and the resulting gaps. For ACX (infinite
precision AC), this excess is 1-2 bits in total. For ACF (finite
precision AC) the error is 2-3 bits, provided AC is within its initial
bounds for counts (usually 2^24-2^30 or similar). Once it gets beyond
those, unlike ACX, the error does grow beyond 2-3 bits, and becomes
dependent on how this case is handled.

In any case, this is only one component of the ACF excess, which was
the particular subtopic we were discussing. You can make ACF perform at
about that level of excess by using decrementing AC (see [34]). In that
role, though, while ACF will code nearly as optimally as QI (within
these few bits, which grow very slowly with N), all it achieves is to
become a slightly less accurate and much slower exact work-alike of QI.

For the regular adaptive or static ACs that one normally finds in
practical implementations, there will be an additional redundancy
relative to QI which, in the case of order-0 stationary sources, can be
derived exactly (as shown in [T2] pp. 20-24). That redundancy is in the
leading order 1/2 log(2*Pi*n*p*q) for binary (for a general alphabet of size
A, there are A-1 such terms which are summed, resulting in the approx 1/2
changing to (A-1)/2). As shown in [T2], this error is due to the
approximate enumeration of AC, where the Stirling approximation and the
subsequent dropping of the sqrt() & other factors (each being <1)
cause AC addends to increase relative to QI, leading to an excess of
O(log(n)) bits on the total output.
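
For a sense of scale, a one-line sketch evaluating that leading-order
term (my own example numbers; the 2*Pi*n*p*q form is the one stated
above):

/* leading-order excess of order-0 AC over exact enumeration, in bits */
#include <math.h>
#include <stdio.h>

int main(void)
{
    long n = 4096;
    double p = 0.1, q = 1.0 - p;
    double excess = 0.5 * log2(2.0 * 3.14159265358979 * n * p * q);
    printf("~%.2f bits on an n=%ld block\n", excess, n);   /* ~5.6 bits */
    return 0;
}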

This O(log(n)) order redundancy was the dominant effect shown in the
test table (p.10 in [T3]) for all but the last row (which illustrates
different effect). Depending of course on what the total output size is
(which depends on input size and density of 1's), this may amount to a
tiny fraction of 1% or to 5-6% for the parameter ranges shown in the
table.

Some superficial, low res, short fuse mind will stop right here,
concluding: 'a few percent at best, it seems; what's the big deal, I can
do 35% better than jpeg with my new xyz coder for images,'... or some
such.

To realize what's the big deal, you need to follow the 'white rabbit' a
bit further.

Since the enumeration excess of O(log(n)), or even the 2-3 bit
quantization excess, can be as large or even larger than the output
size (e.g. in low entropy limit or for short enough inputs of any
entropy per symbol), this can be 50% or 100% or _any_ ratio you want
provided you work with exact fractional bit sizes. You can encode to
such fractional sizes if you have several items of any kind (which may
belong to entirely different sources, different parts of the program or
different programs) to encode and all have some fractional bit sizes --
the mixed radix codes allow combining of these fractions so that the
only one whole bit rounding error is paid for the total. Hence, the
potential compression ratio in favor of QI is in principle unlimited.

The two kinds of "small" differences in redundancy are indeed small for
the usual things one codes with AC (or Huffman). The reason for that is
not because the differences are universally small, but because one
doesn't code with AC anything where it would be very poor. So, the
apparent "smallness" is a mere reflection of the narrowing of the
domain surveyed to what AC or Huffman algorithm code reasonably well.

But, there is a whole realm of unexplored algorithms, which would
require coding of many separate items, each of which amounts to a
tiny bit fraction, and which are ignored because AC won't do well here,
due to those "tiny" overheads of O(log(n)) or the fixed 2-3 bits per coding
task. Consider, for example, the BW-Transform output column R (Fig 5, p. 34,
[T2]). If you look at long enough contexts, the column R will fragment
into many small pieces. To account for longer contexts, one would want
to select for coding as fine fragmentation of R as practical. The
adaptive AC was found to perform very poorly here, even worse than
various ad hoc schemes that are in use now. { The row "Vary" in table
on p.10 in [T3] illustrates this effect, where on a 4K input, adaptive
order-0 AC outputs more than twice as much as the descriptive order-0
QI.} The general coding for long contexts is not the only area with
lots of highly fragmented small inputs. Differential frame updates for
video & audio codecs generate similar data as well.

In addition to domain of highly fragmented data, there is an area where
the ultra-low entropy limit effects become significant. These are the
database or data-warehouse bitmap indexes and keyword incidence bitmaps
for search engines. The big search engines will have such bitmaps of 8+
billion bits (or whatever the latest Google vs Yahoo vs MSN figures are
these days). The compression methods in these fields are either ad hoc
methods or runlengths, both of which break down on highly clustered and
very low average entropy data.

In relation to search engine & database domains (and a few others),
another unique aspect of QI becomes highly relevant:

a) the compressed output size is available without decompressing the
data -- for example, you know the code size, essentially the bit size
of the binomial coefficient C(n,k), from the count of 1's alone (see
the sketch after item (b) below).

b) For a fixed input entropy rate QI codes at _precisely_ the fixed
number of bits (which is also within log(e)/2^(g-1) bits from the
entropy, at precision g and the model used), e.g. all n! permutations of n
items will always be encoded into _precisely_ the same size (you
can test that with QI.exe or see it in the source).
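
A small sketch of property (a) (my own illustration; it just evaluates
log2 C(n,k) with lgamma(), it is not how QI computes its quantized
sizes):

/* the enumerative index for an n-bit block with k ones takes about
   ceiling(log2 C(n,k)) bits, known from (n,k) without any decoding    */
#include <math.h>
#include <stdio.h>

double log2_binomial(long n, long k)
{
    return (lgamma(n + 1.0) - lgamma(k + 1.0) - lgamma(n - k + 1.0))
           / log(2.0);
}

int main(void)
{
    long n = 1024, k = 100;
    printf("index size ~ %.0f bits for n=%ld, k=%ld\n",
           ceil(log2_binomial(n, k)), n, k);
    return 0;
}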

The Huffman and AC don't have these properties. Huffman will have
fairly large variations (easily as much as 10-15%) in the output size,
AC less than Huffman, but still of the order of O(log(n)) bits even at
fixed input entropy rate. Hence, if one wishes to store the compressed
items into a designated space, known in advance, with AC and Huffman,
one would have to reserve space for the worst case of variation,
increasing their redundancy.

Similarly, if one is coding complex data structures, with lots of
compressed pieces that need later to be traversed quickly and accessed
randomly, it is essential that the sizes of compressed pieces can be
known exactly, without expanding the data, which QI property (a)
assures. With Huffman & AC, you either need to pad the space to cover
the worst case compressed size or include the size as a separate item.
Either way wastes additional space.

Although the above types of setups occur in many fields, the complex
data structure aspects are especially relevant when coding and
traversing large networks or graphs, the most popular example of these
being the web links. But the network modeling paradigm has been gaining
in recent years in many other fields, such as computational biology,
where ever more complex webs of bio-chemical reactions are being
constructed on top of the growing DNA maps & genetic databases. Another
field with mathematically similar problems is ad hoc routing
for wireless networks.

Finally, there is a domain of constrained coding (used for the low
level recording media codes and channel coding), where the exact EC,
however cumbersome and inefficient in the conventional form, is still
sufficiently advantageous over the Huffman & AC that they have been
using it here since the 1950s (see Immink's work). The QI improves speed
and memory requirements vs exact EC by a huge factor O(n), so these
areas can benefit directly and right away.

In conclusion, the availability of an extremely fast and highly
accurate (optimal for any given arithmetic precision) coding algorithm to
perform such coding tasks opens an entire realm of unexplored
compression algorithms. The compression gains potentially available to
such algorithms are not limited at all to the few percent one can see
on the subset of coding tasks where Huffman or AC code well (which to
those over-conditioned to seeing only that domain, appears to be all
there is to compress).

T1-T3 are on http://www.1stworks.com/ref/qi.htm

34. J.G. Cleary, I.H. Witten "A Comparison of Enumerative and Adaptive

Willem

unread,
Jan 9, 2006, 8:57:04 AM1/9/06
to
nightlight wrote:
) In any case, this is only one component of the ACF excess, which was
) the particular subtopic we were discussing. You can make ACF perform at
) about that level of excess by using decrementing AC (see [34]). In that
) role, though, while ACF will code nearly as optimally as QI (within
) these few bits, which grow very slowly with N), all it achieves is to
) become a slightly less accurate and much slower exact work-alike of QI.
)
) The regular adaptive or static AC's that one normally finds in
) practical implementations, there will be an additional redundancy
) relative to QI which, in case of order-0 stationary sources can be
) derived exactly, as shown in [T2] pp. 20-24). That redundancy is in the
) leading order 1/2 log(2P npq) for binary (for general alphabet of size
) A, there are A-1 such terms which are summed, resulting in approx 1/2
) changing to (A-1)/2). As shown in [T2], this error is due to the
) approximate enumeration of AC, where the Stirling approximation and the
) subsequent dropping of the sqrt() & other factors (each being <1),
) causes AC addends to increase relative to QI, leading to an excess of
) O(log(n)) bits on the total output.

This redundancy is offset by the need to transmit extra bits for the
modeling information, which is something you have neatly excluded in
your calculations.

) This O(log(n)) order redundancy was the dominant effect shown in the
) test table (p.10 in [T3]) for all but the last row (which illustrates
) different effect). Depending of course on what the total output size is
) (which depends on input size and density of 1's), this may amount to a
) tiny fraction of 1% or to 5-6% for the parameter ranges shown in the
) table.

I assume your tests only counted the actual bits output by the encoder,
and not the bits needed for transmitting the model ? That makes it
an unfair comparison.

David A. Scott

unread,
Jan 9, 2006, 9:42:10 AM1/9/06
to
"Matt Mahoney" <matma...@yahoo.com> wrote in
news:1136681911.7...@g47g2000cwa.googlegroups.com:

> David A. Scott wrote:
>> "Matt Mahoney" <matma...@yahoo.com> wrote in
>> news:1136605351.6...@g47g2000cwa.googlegroups.com:
>>
>> In your model you use 9 bits for every 8 bits of data where starting
>> bit is a ZERO for each byte and then for EOF you allow the starting
>> bit to be ONE and stop the compressor. This does not allow all the
>> code space to be used.
>> As a result of this modeling every compressed file with FPAQ0 has
>> the first bit of the fist byte set on output so technically the first
>> bit out is a total waste of space.
>
> Actually all the code space is used because the first bit is 0 for an
> empty file.
>

You could leave the empty file empty.

And yes, one could think of the code space there as being used when
one looks at compression. I guess I was also looking at decompression
of a long file and wondering what happens to the rest of the file
to be decompressed if what is returned is marked as an EOF, while the
uncompressed file has much wasted file space since the following bytes
will not be looked at.


> Of course it could be improved by making it bijective instead of using
> 2*log2(n) bits to encode the length in bytes.
>

You don't have to go to that extreme and make it hard for most to
follow. You could use just log2(n) bits to encode the length. It still
would not be bijective but it would not complicate the code that much
more and a rough file integrity check could be done during
decompression without any additional output to the compressed file.

Matt, if I can get nightlight to commit to coding his example of
the 3 symbol types, I would like to play again with fpaq0, to see
how much better it can be made with as little change as possible.
I like your style but I don't think I will go to the wall and make
it bijective. But the nine bits for each eight can be changed to
eight for eight, with a ninth only needed for the last byte.


But it looks like nightlight is all talk and will not even
attempt to code his simple example. Maybe he has looked at the
example he himself made up and realizes that he can't beat a
real arithmetic coder that can be written to do his own example
where QI was supposed to shine and arithmetic fail.

> -- Matt Mahoney
>

Again, your code is really neat for the other methods. Where this
model code is different is that in your other coding you have the
length field in front of the file, so you never add this extra cost
throughout the file. I think you can add the cost at the back end
and save a little extra space with very little change in the source
code.

nightlight

unread,
Jan 9, 2006, 9:53:18 AM1/9/06
to
> This redundancy is offset by the need to transmit extra bits for
> the modeling information, which is something you have neatly
> excluded in your calculations.

Not quite. The Stirling approx. & dropping of the sqrt(n) factor is
purely an arithmetic error (which increases the AC addends) that you
can avoid in AC only by coding in decrementing mode, in which case both
coders have exactly the same parameter encoding. As explained a few
messages ago, in some special cases (such as a stationary Bernoulli
source) you can avoid the leading order of the sqrt() error via the KT
estimator (which is not available generally, and even when available
includes tradeoffs, as explained earlier in this thread). A simple intro
to the 2nd order AC redundancy is in ref. [40].

> I assume your tests only counted the actual bits output by the encoder,
> and not the bits needed for transmitting the model ? That makes
> it an unfair comparison.

Of course not. Each coder had to produce entirely self-contained
output, with data sizes and all counts packaged in. AC needed only to
encode its input size (which Moffat98 does using AC itself, as an outer
loop, called on every 8 bits of payload), while QI had several more
items. I already explained that in [T3] p. 9, other posts & the source
code package. I coded the counts using a small Huffman code table
(prebuilt for the binomial distribution, just a few KB in size for n=1024; I
also have hypergeometric distrib. tables, but these were not used for
the tests in [T3]), with the total k sent separately (in log n bits since n is
known at that point; the n itself was coded using a two-part
self-delimiting code, a Huffman code prefix specifying the length
of the rest & plain binary for the rest). The total k was used to calculate the
running block-to-block average (which selects the Huffman subtable for
that average) and also to extract the exact last block count as the
leftover k at that point. I also coded the leading 16 bits of mantissa
for each output block using the mixed radix codes (these can also use
tapered Huffman codes if one doesn't care about a 2-3 percent loss on
these 16 bits or about the artificial output size variability for fixed
entropy inputs, see items (a) & (b) in the previous post).
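
For readers unfamiliar with two-part self-delimiting codes, here is a
generic sketch (a unary, Elias-gamma style prefix stands in for the
Huffman prefix mentioned above; this is not the QI.exe format):

/* two-part code for n >= 1: a prefix giving the bit length L of n
   (here L-1 zeros), followed by the L bits of n itself, MSB first     */
#include <stdio.h>

static void put_bit(int b) { putchar(b ? '1' : '0'); } /* stand-in sink */

void encode_two_part(unsigned long n)
{
    int L = 0;
    for (unsigned long t = n; t; t >>= 1) L++;     /* bit length of n   */
    for (int i = 0; i < L - 1; i++) put_bit(0);    /* length prefix     */
    for (int i = L - 1; i >= 0; i--)               /* then n in binary  */
        put_bit((int)((n >> i) & 1));
}

int main(void) { encode_two_part(1024); putchar('\n'); return 0; }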

The source is out there, along with tips in the readme.txt on how to
compare compression effectiveness & speed vs AC or other coders. This
is not some secret compression algorithm with magic powers to which no
one is allowed access. The math alone should be enough for those
familiar enough with the relevant topics to know how the compression
will come out, even without any source or tests. You're welcome to
experiment with the source and report any aberrations.

David A. Scott

unread,
Jan 9, 2006, 9:58:29 AM1/9/06
to
Willem <wil...@stack.nl> wrote in news:slrnds4qtg...@toad.stack.nl:

> nightlight wrote:
> ) In any case, this is only one component of the ACF excess, which was
> ) the particular subtopic we were discussing. You can make ACF perform at
> ) about that level of excess by using decrementing AC (see [34]). In that
> ) role, though, while ACF will code nearly as optimally as QI (within
> ) these few bits, which grow very slowly with N), all it achieves is to
> ) become a slightly less accurate and much slower exact work-alike of QI.
> )
> ) The regular adaptive or static AC's that one normally finds in
> ) practical implementations, there will be an additional redundancy
> ) relative to QI which, in case of order-0 stationary sources can be
> ) derived exactly, as shown in [T2] pp. 20-24). That redundancy is in the
> ) leading order 1/2 log(2P npq) for binary (for general alphabet of size
> ) A, there are A-1 such terms which are summed, resulting in approx 1/2
> ) changing to (A-1)/2). As shown in [T2], this error is due to the
> ) approximate enumeration of AC, where the Stirling approximation and the
> ) subsequent dropping of the sqrt() & other factors (each being <1),
> ) causes AC addends to increase relative to QI, leading to an excess of
> ) O(log(n)) bits on the total output.
>
> This redundancy is offset by the need to transmit extra bits for the
> modeling information, which is something you have neatly excluded in
> your calculations.
>

Would you really expect him to actually do the example he proposed?
This guy seems all talk and no action, even on the very example he proposed
to code.

> ) This O(log(n)) order redundancy was the dominant effect shown in the
> ) test table (p.10 in [T3]) for all but the last row (which illustrates
> ) different effect). Depending of course on what the total output size is
> ) (which depends on input size and density of 1's), this may amount to a
> ) tiny fraction of 1% or to 5-6% for the parameter ranges shown in the
> ) table.
>
> I assume your tests only counted the actual bits output by the encoder,
> and not the bits needed for transmitting the model ? That makes it
> an unfair comparison.
>

Of course he tried to make an unfair comparison. He wants his method to
look best. However, I am sure I can change my code to do the example he
proposed in a bijective way, so go ahead and press him to write code that
anyone can check with files of their own. He will not do it, since I don't
think he has the ability to follow through even on his own example. He can
quote text that he seems not to understand when it comes to arithmetic, but
can he write real code for even the simple example he proposed? Don't
hold your breath; I think we see his kind here every year. So let's
cut to the chase and see if he will actually do anything.


NIGHTLIGHT, where is your code? Let's nail the example down to files,
where one has to compress and then has to decompress back, since it's
not a real test unless one can go both ways and users of the group can
actually test it.

WE ARE WAITING!!!!

>
> SaSW, Willem

David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **

Disclaimer:I am in no way responsible for any of the statements

David A. Scott

unread,
Jan 9, 2006, 10:02:23 AM1/9/06
to
"nightlight" <night...@omegapoint.com> wrote in
news:1136818397.9...@z14g2000cwz.googlegroups.com:

>
> Of course, not. Each coder had to produce entirely self-contained
> output, with data sizes and all counts packaged in. AC needed only to
> encode its input size (which Moffat98 does using AC itself, as an outer
> loop, called on every 8 bits of payload), while QI had several more
> items. I already explained that in [T3] p. 9, other posts & the source
> code package. I coded the counts using a small Huffman code table
> (prebuilt for the binomial distribution, just few KB size for n=1024; I
> also have hypergeometric distrib. tables, but these were not used for
> tests in [T3]), using total k sent separately (in log n bits since n is
> known at that point; the n itself was coded using two part
> self-delimiting codes, Huffman code for prefix specifyingt the length
> of rest & binary for the rest). The total k was used to calculate the
> running block to block average (which selects the Hufman subtable for
> that average) and also to extracting the exact last block count as the
> leftover k at that point. I also coded the leading 16 bits of mantissa
> for each output block using the mixed radix codes (these can also use
> tapered huffman codes if one doesn't care about 2-3 percent loss on
> these 16 bits or about the artificial output size variability for fixed
> entropy inputs, see item (a) & (b) in previous post).
>
>

I guess this is your way of saying you can't even code your own
example. What are you afraid of? I think you fear losing your own
contest??


David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **

Disclaimer:I am in no way responsible for any of the statements

nightlight

unread,
Jan 9, 2006, 10:35:29 AM1/9/06
to
> Discarding part of the range is one way to deal with finite
> precision, for example the carryless rangecoder in ppmd.
> However the various coders in paq6, paq7, fpaq0, etc. do
> not have any gaps in the code space. These are carryless
> binary arithmetic coders with 32 bits precision and 12
> bit representation of probabilities.

I responded to a similar objection from Willem in another post [M1] in
more detail, so I will just supplement the differential aspect here,
referring to the formulas & ref's there.

The AC carry problem which is due to the usual FIFO coding mode can be
avoided by either using LIFO mode (as Rissanen's AC-76) or buffering
the whole data and propagating the carry. But the carry redundancy or
any explicit gaps in the IVF (see [M1]) intervals are a only part of
the quantization errors.

The unavoidable fundamental quantization bit excess comes from the
reduction in size of the Pc (see eq. (1) in [M1]), which is the coding
probability of the complete message. This reduction in the size of Pc as
it gets computed along the string (e.g. by multiplying with the single
symbol probabilities along the way) is intrinsic to the truncating
arithmetic of AC -- there is no way around it since one can't just
round the product down along one branch and round it up along the
alternative branch, due to the Kraft inequality constraint in the
current step and all the steps upstream (the parts already encoded).

This exact same effect, which is a purely arithmetic rounding
phenomenon, is much more obvious with QI, where the Kraft inequality
constraint is given in eq. (20) ([T3] p. 8). The basic QI quantization
recurrence is given in eq. (21). Any time the rounding up occurs in
(21), the next higher order addends increase slightly (the upper bound
on that excess is log(e)/2^(g-1) bits per symbol). The AC addends are
exactly the same addends, except that they are rescaled for each coding
task spearately, so that the largest addend is always exactly 1
(interpreted in AC as total probability=1, cf. [T2] pp. 19-20, Figs. 3
& 4). All other AC adends are simply the QI addends divided by this
largest addend (which is selected individually for each input).
Similarly, the AC equivalent of eq (20), the Kraft inequality
constraint, is divided by the largest addend, so it appears as a
constraint on the probabilities, but it is still the same constraint.
Hence, the exactly same relation between the higher and lower order
addends is maintained by AC and QI, where in both cases the
quantization enlarges the ratios between the higher order and lower
order addends. The difference is that QI quantization, which is bottom
up, is optimal for any given precision, while AC's, which is top down,
is not (this is the same difference as between the bottom-up Huffman
codes, which are optimal, and top-down Shannon-Fano codes, which are
not).

As explained in another message I posted today [M2], this particular
difference (which is unavoidable for the AC) is small for the tasks for
which AC & Huffman are typically used for. As explained in [M2] in more
detail, that is almost a tautological observation, since AC & Huffman
are used only where they do a reasonably good job. But there are coding
domains (along with the corresponding largely unexplored higher level
algorithms) beyond this one, in which the compression ratios could be
arbitrarily large, solely based on these "small" differences (the
quantization & enumeration errors; the latter, of O(log(n)), are
avoidable by AC, see [M2]).


> Being restricted to an order 0 model seems like a
> severe disadvantage. How would you transform a
> context mixing model to order 0?

This was explained in detail in another message, [M3]. The appearance
of this limitation (which many people have expressed) is due to
viewing it through the AC modeling paradigm (ACMP), especially through
the so-called "online" constraint (which hasn't been a real-life
requirement since the days when the Morse telegraph was the main
communication technology), where one has to process the entire
sequence symbol by symbol and output with only log(n) latency. Also,
the modeling is a much larger field than
calculating probabilities of the next single symbol (note, though, that
method (b) in [M3] can handle arbitrary AC models, while still
retaining the full QI speed advantage).

M1. Post on quantization errors in AC & QI:

http://groups.google.com/group/comp.compression/msg/b157e1aa25e598d8

M2. Post on the apparent "smallness" of compression differences:

http://groups.google.com/group/comp.compression/msg/6ebbc078012c215c

M3. Post on QI modeling:

http://groups.google.com/group/comp.compression/msg/1314ff87da597fad

Thomas Richter

unread,
Jan 9, 2006, 12:27:27 PM1/9/06
to
Hi,

> The AC carry problem which is due to the usual FIFO coding mode can be
> avoided by either using LIFO mode (as Rissanen's AC-76) or buffering
> the whole data and propagating the carry. But the carry redundancy or
> any explicit gaps in the IVF (see [M1]) intervals are a only part of
> the quantization errors.

Ehem, small correction: You don't have to buffer the whole data, see
for example Nelson's or Moffat's implementation. All you need to do is
to count how many times you have "forgotten" to carry over, and
resolve this as soon as the carry is resolved by propagating it
thru all the "counted" (but never buffered) data.

Other methods use "bitstuffing", i.e. insert a redundant bit to
allow the carry to propagate in, but this is completely off-topic
here since it pessimises the code, and we're currently in "bit juggling"
mode here. (-;

> This was explained in detail in another message, [M3]. The appearance
> of this limitation (which many people have expressed) is due to
> viewing it through the AC modeling paradigm (ACMP), especially through
> the so-called "online" constraint (which hasn't been a real-life
> requirement since the days when the Morse telegraph was the main
> communication technology), where one has to process the entire
> sequence symbol by symbol and output with only log(n) latency.

I'm afraid this is pretty much an existing problem. This "online" constraint
is often a limitation imposed by limited memory of telecommunications
hardware, and in this context called the "latency" of the coding model.
Consider for example an audio compression over an IP network, using
a hardware based coder. A long memory in the device would mean that
the receiver has to wait long (in principle infinitely long)
before it might be able to play the sound. Unacceptable for real-life
applications, and this is why folks play the "bitstuffing" game I mentioned
above, even though it reduces the coder performance. Thus, there *are*
definitely areas where one wants to have an online-algorithm. However,
I would not consider this problem on-topic here right now.

I don't think that this latency question is right now
relevant for your work; getting past AC is interesting enough by itself,
let it be on-line or off-line.

So long,
Thomas

nightlight

unread,
Jan 9, 2006, 1:10:43 PM1/9/06
to
> Consider for example an audio compression over an IP network, using
> a hardware based coder. A long memory in the device would mean that
> the receiver has to wait long (in principle infinitely long)
> before it might be able to play the sound. Unacceptable for real-life
> applications, and this is why folks play the "bitstuffing" game I mentioned
> above, even though it reduces the coder performance.

I did a few of these, all mods of existing audio codecs, for our
current communication product ( http://www.hotComm.com ). Even here,
though, take the most basic low quality voice with 8000 samples/sec,
and take low precision samples of just 1 byte per sample (this is not
what we normally use). The segments we typically get are 20ms of audio
data. (Note that one could easily go to 60-80 ms blocks without a human
listener noticing any difference.) With 20ms, 1 byte/sample, 8
samples/ms you get a block of data to encode that is 1280 bits long.
That is three orders of magnitude larger than the "online" constraint
in the AC modeling/coding paradigm lingo. Any higher quality, and you
are several times larger. Another real time app which doesn't tolerate
latency would be a video codec, and even the differential frame data is
almost two orders of magnitude larger than audio. Although the entropy
coder won't get the raw samples but some outputs from the transforms &
filtering, the above was reduced in all parameters to a bare minimum, so
in a more realistic case one gets at least as much data even for
the entropy coder.
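
Just to spell out the arithmetic behind those block sizes (a trivial C
sketch, nothing codec-specific; the parameter names are made up):

#include <stdio.h>

int main(void)
{
    const int samples_per_ms   = 8;   /* 8000 samples/sec              */
    const int bytes_per_sample = 1;   /* the low-precision case above  */
    const int segment_ms[3] = { 20, 60, 80 };
    for (int i = 0; i < 3; i++) {
        int bits = segment_ms[i] * samples_per_ms * bytes_per_sample * 8;
        printf("%2d ms segment -> %d bits per block\n", segment_ms[i], bits);
    }
    return 0;
}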

Even just being able to have a few hundred symbols at a time, vs just a
single symbol (or just a few, for AC latency), adds a great deal of
flexibility and opens space for new algorithms, for the modeling engine
and the coder, as BWT (or the so-called "offline"
dictionary methods) illustrates for the modeler, or QI for the coder.

> Ehem, small correction: You don't have to buffer the whole data, see
> for example Nelson's or Moffat's implementation. All you need to do is
> to count how many times you have "forgotten" to carry over, and
> resolve this as soon as the carry is resolved by propagating it
> thru all the "counted" (but never buffered) data.

I think that depends on the constraints of the task. The AC is adding to a
common sum numbers which decrease in size. Hence, there is no way,
even in principle, to send data out from the higher digits if they are
still within carry propagation range, let the decoder decode it
incorrectly (and possibly take actions based on it), and then issue a carry
signal to undo that decode. That is what I meant above -- the encoder
either has to keep all the data that a carry can propagate into in the worst
case, or use blocking mechanisms (which add redundancy), or
use LIFO mode (where everything is held until the whole buffer is
encoded).

Note also that one pays the cost of FIFO coding not just in the cases when
one has to propagate a carry, or in small extra redundancy, but also in that the
coder is burdened with checking & branching for such cases inside its
coding loop, so there is a coding speed penalty for the AC "online"
constraint, with no practical benefit at all.

nightlight

unread,
Jan 9, 2006, 4:37:26 PM1/9/06
to
>> the probability distribution of the symbols is different,
>> then how does QI coder adapt itself to all those different distributions ?
>
> I dont know the answer to this one.

There is a more detailed answer to this question in an another post
[M1], see section (b), which is QI style coding using AC modeling
engine, which retains the QI speed advantage while coding at the
compression efficiency of the AC. That section fleshes out in more
detail a shorter comment about this method from the note N7 on p. 9 in
[T3]. Much more detailed description of EC/QI modeling is given in
[T2], pp. 26-35.


> As for nightlights's comments on QI's speed, I am afraid
> that as the modelling scheme for QI is different from
> modelling scheme for ArithCoding, we will need to compare
> speed of "QI+its modelling code" with "AC+its modelling code".
> Where both models should be of same order, or chosen to give
> same compression ratio. (My doubt here is "what if QI just
> *shifts* computation burden to modelling code instead
> of reducing it".)

Well, you already have results for QI+Model and AC+Model for the
simplest case of modeling, order-0 adaptive AC vs order-0 "adaptive" QI
(where QI simply counts the bits and selects 0 or 1 based addends to
use, as shown in the source). The AC uses its probability adaptation
algorithm, which is quite a bit more work (since it counts bits as
well). You can also look at [M1], method (b) and see that any model
made for AC, which feeds probabilities to the coder can be used with QI
and coded with the full coding speed advantage. In that case the
modeling engine is left as is, hence there is _no cost shifting_.

The QI simply codes faster because it has a much better division of
labor within the coder itself than AC (see [T2] pp. 19-21). You can see
that by comparing the AC coding loop with QI coding loop. For example,
the computation of all enumerative addends is done separately, by
specialized code which does only that and saves the results in a
universal table (e.g. quantized binomials for Bernoulli source,
factorials for permutations, powers & products for mixed radix codes).

The tables are universal in the sense that they encapsulate the generic
combinatorial properties of symbol sequences; hence, no matter what the
source probabilities are, only a single table is needed. The AC algorithm,
due to an intricate entanglement of source probabilities with its
coding arithmetic, cannot have such universal tables (see [T2] pp.
19-21) -- it would need a separate table for each source probability.
Instead AC's coding loop computes essentially the same addends
(rescaled so that max addend is always 1, see [T2] p. 19) using
multiplicative recurrences for the addends. In the binary order-0 case,
these are simply the binomial coefficients C(n,k) recurrences:

C(n,k) = C(n+1,k+1) * (k+1)/(n+1) ...... (1)

when symbol 1 is encoded and:

C(n,k+1) = C(n+1,k+1) * (n-k)/(n+1) ...... (2)

when symbol 0 is encoded. The factor p=(k+1)/(n+1) in (1), which
is a ratio of the remaining counts of ones, (k+1), and the total
remaining symbols (n+1), is interpreted within AC as probability
of ones and the factor q=(n-k)/(n+1) in (2) as probability of
zeros at that same place. The AC also uses a common scaling factor,
where all the coefficients in (1) & (2) are divided by C(N,K), where N=total
bits and K=total 1's. This division by C(N,K) is implicit in AC, i.e.
AC starts coding at what in QI space is the path endpoint (N,K), and
sets its addend value to 1 (interpreted in AC as the starting
probability of an empty message), then using (1) & (2) it computes next
lower coefficients C(N-1,K-1) if 1 is encountered 1st or C(N-1,K) if 0
is encountered 1st, thus it doesn't need to know explicitly the
absolute value of C(N,K). Whenever it encounters 1 (less frequent
symbol by convention), it adds the current addend (the interval
belonging to 0) to the index (which is the same as what QI does with its
integer addends). The main difference is that AC needs to do
multiplications in (1) or (2) and do it on every symbol. QI already has
all the addends in its table and it doesn't have to do anything for the
most frequent symbol.
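
To see the relation between the two sets of addends in a runnable form,
here is a small C sketch (not taken from either coder; the decrementing
order-0 model and all names are assumptions for illustration). It ranks
one binary string with exact integer binomial addends, and in parallel
accumulates the AC-style interval widths from the multiplicative updates;
multiplying the latter by C(N,K) reproduces the same integer index:

#include <stdio.h>

#define N 12

int main(void)
{
    const char *msg = "001010011010";      /* example input, K ones        */
    double C[N + 1][N + 1] = { {0} };      /* exact Pascal triangle        */
    for (int n = 0; n <= N; n++) {
        C[n][0] = 1;
        for (int k = 1; k <= n; k++)
            C[n][k] = C[n - 1][k - 1] + C[n - 1][k];
    }
    int K = 0;
    for (int i = 0; i < N; i++) K += (msg[i] == '1');

    double index = 0;                      /* integer QI/EC index            */
    double cum = 0, range = 1.0;           /* AC-style cumulative & range    */
    int k = K;                             /* ones remaining (incl. current) */
    for (int i = 0; i < N; i++) {
        int m = N - i;                                   /* symbols remaining */
        double zero_width = range * (double)(m - k) / m; /* interval of '0'   */
        if (msg[i] == '1') {
            index += C[m - 1][k];          /* integer addend C(m-1,k)        */
            cum   += zero_width;           /* same addend divided by C(N,K)  */
            range *= (double)k / m;        /* p(1) = k/m                     */
            k--;
        } else {
            range  = zero_width;           /* p(0) = (m-k)/m                 */
        }
    }
    printf("integer index          = %.0f\n", index);
    printf("AC cumulative * C(N,K) = %.0f\n", cum * C[N][K]);
    return 0;
}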

As you can see (or also check in the source), at the coder level
itself, QI has the labor divided much more cleanly -- all calculations
of the universal properties of the symbol sequences are outside of its
coding loop. They're done once and for all. The only thing done in the QI
coding loop is specific to the given sequence - the particular
placement of 1's for that sequence. The AC carries all the
calculations, for the universal and for the particular sequence
properties in its innermost coding loop, since the two are irreversibly
entangled in the AC scheme.

The same distortions in the division of labor propagate outwardly as
you wrap more layers around the coder, forcing the modeling engine to
deform its division of labor around the inefficient logistics inside
(see more discussion on this point in [T2] pp. 30-31).

The QI modeling is much simpler. The coder does not force the modeler
to convert all it knows or what it can find out about the sequence into
the probabilities of the next symbol, as AC does with its modeling
engine. Since QI doesn't need probabilities to perform coding (but, as
shown in [M1], part (b), it can use them just fine if that is what
modeler outputs), the QI modeling engine is less constrained, more free
to chose from a much larger space of possible algorithms. In essence,
what QI's modeler has to do for QI is to tell it which messages are
equiprobable with which other messages (so they can be enumerated in
the same enumerative class). QI coder doesn't care what the actual
probabilities of the messages are, but only whether P(M1)==P(M2), which
is a much weaker load on the modeler than asking it what is the value
P(M1) and what is the value P(M2), as AC does.

Another essential difference between the two modeling schemes is that
QI's modeler is trying to describe the sequence, while the AC's modeler
is trying to predict the next symbol in the sequence (i.e. calculate
all the possible odds for the next symbol). Paraphrasing Yogi Berra, it
is much easier to predict the past than to predict the future. Hence,
the QI's modeler does have a much easier job at the fundamental level.

The general QI/EC modeling pattern is described in [T2] p. 26, with
details fleshed out on pp. 27-35.

QI source & tech. reports:
http://www.1stworks.com/ref/qi.htm

M1. Post on QI modeling:

http://groups.google.com/group/comp.compression/msg/1314ff87da597fad

T3. R.V. Tomic "Quantized Indexing: Beyond Arithmetic Coding"
arXiv cs.IT/0511057, 10p, Nov 2005, 1stWorks TR05-1115
http://arxiv.org/abs/cs.IT/0511057

T2. R.V. Tomic "Quantized indexing: Background information"
1stWorks TR05-0625, 39p, Jun 2005
http://www.1stworks.com/ref/TR/tr05-0625a.pdf

Matt Mahoney

unread,
Jan 9, 2006, 8:17:52 PM1/9/06
to
David A. Scott wrote:
> Matt if I can get nightlight to commit to coding his example of
> the 3 symols types. I would like to play again with fpaq0. To see
> how much better it can be made with as little change as possible.
> I like your style but I don't think I will go to the wall and make
> it bijective. But the nine times for each eight can be changed to
> eight for eight with a ninth only needed for the last byte.

There is some room for improvement. I tried compressing 10,000,000
bytes of random characters A, B, C. fpaq0 compresses it to 1,982,988
bytes. The theoretical limit is (lg 3)/8 bytes per byte, i.e. 1,981,203
bytes, a difference of
1785 bytes. For 1,000,000 bytes it compresses to 198,322 bytes, a
difference of 201.7 bytes.
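
(For reference, a few-line C check of those limits -- just the
arithmetic, assuming an ideal order-0 code with known, equal
probabilities:)

#include <stdio.h>
#include <math.h>

int main(void)
{
    long sizes[2] = { 10000000L, 1000000L };
    for (int i = 0; i < 2; i++) {
        double limit = sizes[i] * log2(3.0) / 8.0;   /* n * lg(3) / 8 bytes */
        printf("%ld bytes -> limit %.1f bytes\n", sizes[i], limit);
    }
    return 0;
}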

-- Matt Mahoney

Sportman

unread,
Jan 9, 2006, 9:46:34 PM1/9/06
to

Matt Mahoney wrote:
> There is some room for improvement. I tried compressing 10,000,000
> bytes of random characters A, B, C. fpaq0 compresses it to 1,982,988
> bytes. The theoretical limit is (lg 3)/8 bytes per byte, i.e. 1,981,203 bytes, a difference of
> 1785 bytes. For 1,000,000 bytes it compresses to 198,322 bytes, a
> difference of 201.7 bytes.

Funny test, I repeated it quickly with 10 compressors already installed
on my PC:

Input (random A,B,C):
0.97 MB (1,024,000 bytes)

Output (compressed):
198 KB (203,401 bytes) PAQ 7
199 KB (204,731 bytes) WinRK beta 3.0 build 2
200 KB (205,119 bytes) WinUHA 2.0
202 KB (207,127 bytes) SBC 0.970
202 KB (207,355 bytes) Slim 0.021
206 KB (211,485 bytes) WinRAR 3.51
206 KB (211,632 bytes) Stuffit 9.0.0.21
216 KB (222,042 bytes) 7-ZIP 4.32
229 KB (234,886 bytes) WinZip 9.0
231 KB (237,390 bytes) BZIP2 1.0.2

David A. Scott

unread,
Jan 9, 2006, 9:46:42 PM1/9/06
to
"Matt Mahoney" <matma...@yahoo.com> wrote in
news:1136855872....@g44g2000cwa.googlegroups.com:

That's interesting, but I suspect that even though arb255.exe would compress
it to a much smaller amount, it would not hit the limit, since raw arb255
is for all possible 256 symbols instead of 3. Yet I suspect it will be
much closer to the limit. Do you have a place to get some of the zipped
test files, so I can test arb255.exe? I think you would be surprised at the
difference.


Your code does not use the full count -- you have a limit at 65534 -- so if
the data is mixed well you start to approach what I am assuming are the
numbers you got. However, in your code, if there are roughly equal numbers
of A, B and C, your code will compress smaller than what you're calculating
as the theoretical limit if all the A's are followed by all the B's, followed
by all the C's. My code, since it's a truer arithmetic coder, does not have these
reset points. So you should get roughly the same length, to within 2 bytes, no
matter how the bytes occur. My code does sort of what QI promises as far
as compression amount, but with real files.

Both your code and mine are basically using the same tree. Yours is laid
out as 512 cells, mine is laid out as 255, but you are really only using half
the cells since the first bit is a constant except at the EOF.

David A. Scott

unread,
Jan 9, 2006, 9:55:41 PM1/9/06
to
"Sportman" <spor...@gmail.com> wrote in
news:1136861194.8...@f14g2000cwb.googlegroups.com:

Could you test with arb255.exe
http://bijective.dogma.net/arb255.zip

Or do you have a test file? Again, arb255 is not tuned for
just 3 characters but would work well if the A, B, C are
truly in a random order.

Sportman

unread,
Jan 9, 2006, 11:43:39 PM1/9/06
to
David A. Scott wrote:
> Could you test with arb255.exe
> http://bijective.dogma.net/arb255.zip

Do you have a readme file with command line instructions?

> Or do you have a test file? Again, arb255 is not tuned for
> just 3 characters but would work well if the A, B, C are
> truly in a random order.

The test file was sent by email, did you receive it?

I tested the same 10 compressors with a more real life file to compare
the rankings; note WinRK took around 1.5 hours and PAQ around 30 minutes
to compress the test file on a Pentium M single core.

Input (DB dump structure with mix between HTML tags, text and data):
32.4 MB (34,017,118 bytes)

Output (compressed):
2.54 MB (2,674,010 bytes) WinRK beta 3.0 build 2
2.56 MB (2,686,923 bytes) PAQ 7
2.81 MB (2,948,566 bytes) Slim 0.021
3.22 MB (3,379,942 bytes) WinUHA 2.0
3.52 MB (3,699,846 bytes) SBC 0.970
3.54 MB (3,723,094 bytes) 7-ZIP 4.32
3.62 MB (3,806,862 bytes) Stuffit 9.0.0.21
3.72 MB (3,910,233 bytes) WinRAR 3.51
4.03 MB (4,231,646 bytes) BZIP2 1.0.2
4.84 MB (5,082,559 bytes) WinZip 9.0

David A. Scott

unread,
Jan 9, 2006, 11:53:20 PM1/9/06
to
"Matt Mahoney" <matma...@yahoo.com> wrote in
news:1136855872....@g44g2000cwa.googlegroups.com:

Matt AEB255 got 1,981,227 both for a random file of 10,000,000
and a file where first 3,333,334 symbols of A followed by 3,333,333 of
B followed by 3,333,333 of C. Which is only 24 bytes different than
what you call optimal. But that optimal value you quoted was assuming
the coder knows that only the 3 symbols used are A B C and that each
occur equally likely. Which is not what our models assume so the
correct value is larger. See the work of Paul Howard and Jeffrey
Scott Vitter.


Since we are using general arithmetic compressors not operating with fixed
probabilities, there is a cost associated with that in the data. With a slight
change to either FPAQ0 or ARB255 you will see the excess drop a lot.
So let's see if NIGHTLIGHT ever gets working code for what was really
the contest he wanted. The value you quote is what a modified arb should
get, or Nightlight's code, where the code knows that it's only 3 symbols each
occurring at an equal rate, but don't hold your breath.

David A. Scott

unread,
Jan 10, 2006, 12:06:49 AM1/10/06
to
"Sportman" <spor...@gmail.com> wrote in
news:1136868219....@z14g2000cwz.googlegroups.com:

> David A. Scott wrote:
>> Could you test with arb255.exe
>> http://bijective.dogma.net/arb255.zip
>
> Do you have a readme file with command line instructions?
>

I thought there was one in it but to compress
arb255.exe file.in file.out

to decompress
unarb255.exe file.in file.out

Note this is very slow code, it writes stuff to the screen.
It's made to do pure bijective arithmetic file compression.

>> Or do you have a test file. Again arb255 is not tuned for
>> just 3 characters but would work well if the A B C are
>> truely in a random order.
>
> The test file is send by email did you received it?


Yes I got it, but I don't have code on this machine to
uncompress rar files. Yes I know it's common, could you send
a zip file?


>
> I tested the same 10 compressors with a more real life file to compare
> the rankings; note WinRK took around 1.5 hours and PAQ around 30 minutes
> to compress the test file on a Pentium M single core.
>
> Input (DB dump structure with mix between HTML tags, text and data):
> 32.4 MB (34,017,118 bytes)
>
> Output (compressed):
> 2.54 MB (2,674,010 bytes) WinRK beta 3.0 build 2
> 2.56 MB (2,686,923 bytes) PAQ 7
> 2.81 MB (2,948,566 bytes) Slim 0.021
> 3.22 MB (3,379,942 bytes) WinUHA 2.0
> 3.52 MB (3,699,846 bytes) SBC 0.970
> 3.54 MB (3,723,094 bytes) 7-ZIP 4.32
> 3.62 MB (3,806,862 bytes) Stuffit 9.0.0.21
> 3.72 MB (3,910,233 bytes) WinRAR 3.51
> 4.03 MB (4,231,646 bytes) BZIP2 1.0.2
> 4.84 MB (5,082,559 bytes) WinZip 9.0
>
>

David A. Scott

David A. Scott

unread,
Jan 10, 2006, 12:10:19 AM1/10/06
to
"David A. Scott" <daVvid_...@email.com> wrote in
news:Xns9746DEE5BBC64H1...@213.155.197.138:

> "Matt Mahoney" <matma...@yahoo.com> wrote in
> news:1136855872....@g44g2000cwa.googlegroups.com:
>
>> David A. Scott wrote:
>>> Matt if I can get nightlight to commit to coding his example of
>>> the 3 symbol types. I would like to play again with fpaq0. To see
>>> how much better it can be made with as little change as possible.
>>> I like your style but I don't think I will go to the wall and make
>>> it bijective. But the nine times for each eight can be changed to
>>> eight for eight with a ninth only needed for the last byte.
>>
>> There is some room for improvement. I tried compressing 10,000,000
>> bytes of random characters A, B, C. fpaq0 compresses it to 1,982,988
>> bytes. The theoretical limit is (lg 3)/8 bytes per byte, i.e. 1,981,203 bytes, a difference
>> of 1785 bytes. For 1,000,000 bytes it compresses to 198,322 bytes, a
>> difference of 201.7 bytes.
>>
>> -- Matt Mahoney
>>
>>
>
> Matt AEB255 got 1,981,227 both for a random file of 10,000,000

I meant ARB255 the other was a typo

> and a file where first 3,333,334 symbols of A followed by 3,333,333 of
> B followed by 3,333,333 of C. Which is only 24 bytes different than
> what you call optimal. But that optimal value you quoted was assuming
> the coder knows that only the 3 symbols used are A B C and that each
> occur equally likely. Which is not what our models assume so the
> correct value is larger. See the work of Paul Howard and Jeffrey
> Scott Vitter.
>
>
> Since we are using general arithmetic compressors not operating with
> fixed probabilities, there is a cost associated with that in the data. With a
> slight change to either FPAQ0 or ARB255 you will see the excess drop a
> lot. So let's see if NIGHTLIGHT ever gets working code for what was
> really the contest he wanted. The value you quote is what a modified
> arb should get, or Nightlight's code, where the code knows that it's only 3
> symbols each occurring at an equal rate, but don't hold your breath.
>
>
>
> David A. Scott

David A. Scott

Sportman

unread,
Jan 10, 2006, 12:16:56 AM1/10/06
to
David A. Scott wrote:

> Yes I got it, but I don't have code on this machine to
> uncompress rar files. Yes I know it's common, could you send
> a zip file?

Done

David A. Scott

unread,
Jan 10, 2006, 12:23:10 AM1/10/06
to
"Sportman" <spor...@gmail.com> wrote in
news:1136870215.9...@o13g2000cwo.googlegroups.com:

Your file 1,024,000 bytes
compressed to 202,894 bytes

Sportman

unread,
Jan 10, 2006, 12:27:42 AM1/10/06
to
David A. Scott wrote:
> > Could you test with arb255.exe
> > http://bijective.dogma.net/arb255.zip
> >
> > Do you have a readme file with command line instructions?
> >
>
> I thought there was one in it but to compress
> arb255.exe file.in file.out
>
> to decompress
> unarb255.exe file.in file.out

Thanks this helped:

Result test file 1:
198 KB (202,894 bytes)

Result test file 2:
22.4 MB (23,500,615 bytes) Did I do something wrong?

David A. Scott

unread,
Jan 10, 2006, 12:34:57 AM1/10/06
to
"Sportman" <spor...@gmail.com> wrote in
news:1136870862.0...@g44g2000cwa.googlegroups.com:

Maybe don't try unix style with the > or <, you
have to type in the whole line.
Here is what I do, to compress a file a.txt:
arb255 a.txt a.255
unarb255 a.255 a.a25
fc /b a.txt a.a25

the a.255 is the compressed a.txt,
a.a25 is the uncompressed file, and fc checks
to see if they are the same.

Since it's bijective, if you run unarb255 first it will
uncompress to another file that is usually longer,
but when you compress that it comes back. The code
is completely bijective, but don't tell nightlight --
not sure he could take it coming from a simple arithmetic
coder.

Sportman

unread,
Jan 10, 2006, 12:54:30 AM1/10/06
to
David A. Scott wrote:
> Maybe don't try unix style with the > or <, you
> have to type in the whole line.
> Here is what I do, to compress a file a.txt:
> arb255 a.txt a.255
> unarb255 a.255 a.a25
> fc /b a.txt a.a25
>
> the a.255 is the compressed a.txt,
> a.a25 is the uncompressed file, and fc checks
> to see if they are the same.
>
> Since it's bijective, if you run unarb255 first it will
> uncompress to another file that is usually longer,
> but when you compress that it comes back. The code
> is completely bijective, but don't tell nightlight --
> not sure he could take it coming from a simple arithmetic
> coder.

I did the test again with the same result; it looks like arb255 has
difficulty compressing that second real life test file to as high a
compression ratio as the other compressors I tested. The unarb255 and file
compare results were ok.

Jasen Betts

unread,
Jan 10, 2006, 12:16:36 AM1/10/06
to

In comp.compression, you wrote:
>> You have written page upon page of explanation,
>> all of which in the end revolved around a fixed
>> coding excess of a few bits.
>
> The topic being discussed in this sub-thread was about difference in
> the quantization errors and the resulting gaps. For ACX (infinite
> precision AC), this excess is 1-2 bits on total. For ACF (finite
> precision AC) the error is 2-3 bits, provided AC is within its initial
> bounds for counts (usually 2^24-2^30 or similar). Once it gets beyond,
> unlike ACX, the error does grow beyond 2-3 bits, which becomes
> dependent on how this case is handled.

What is a wasted bit?

Why are there wasted bits?

AFAICT all the bits seem to be serving some purpose.

> In any case, this is only one component of the ACF excess, which was
> the particular subtopic we were discussing. You can make ACF perform at
> about that level of excess by using decrementing AC (see [34]). In that
> role, though, while ACF will code nearly as optimally as QI (within
> these few bits, which grow very slowly with N), all it achieves is to
> become a slightly less accurate and much slower exact work-alike of QI.

How can it be an exact work-alike if it produces different output?

Bye.
Jasen

Willem

unread,
Jan 10, 2006, 3:23:35 AM1/10/06
to
nightlight wrote:
) The regular adaptive or static AC's that one normally finds in
) practical implementations, there will be an additional redundancy
) relative to QI which, in case of order-0 stationary sources can be
) derived exactly, as shown in [T2] pp. 20-24). That redundancy is in the
) leading order 1/2 log(2 Pi npq) for binary (for general alphabet of size
) A, there are A-1 such terms which are summed, resulting in approx 1/2
) changing to (A-1)/2). As shown in [T2], this error is due to the
) approximate enumeration of AC, where the Stirling approximation and the
) subsequent dropping of the sqrt() & other factors (each being <1),
) causes AC addends to increase relative to QI, leading to an excess of
) O(log(n)) bits on the total output.

By the way, this excess you are talking about is *not* present as coding
gaps. It is present as an imperfect mapping to the output domain,
resulting in some files compressing larger, but others compressing
*smaller* than they should, according to their probability distribution.

) a) the compressed output size is available without decompressing the
) data -- for example you know exactly the size of binomial coefficient
) C(n,k) from count of 1's.
)
) b) For fixed input entropy rate QI codes at _precisely_ the fixed
) number of bits (which is also within the log(e)/2^(g-1) bits from the
) entropy, at precision g and model used) e.g. all n! permutations of n
) items will always be encoded into _precisely_ the same size (you
) can test that with QI.exe or see it in the source).

If the data fits these restrictions, then that is additional modeling
information that should be present in the model, and if so, AC will
take advantage of it as well as QI.

) In conclusion, the availability of an extremely fast and a highly
) accurate (optimal for any given arithmetic precision) coding algorithm to
) perform such coding tasks, opens the entire realm of unexplored
) compression algorithms. The compression gains potentially available to
) such algorithms are not limited at all to the few percent one can see
) on the subset of coding tasks where Huffman or AC code well (which to
) those over-conditioned to seeing only that domain, appears to be all
) there is to compress).

I wasn't debating that. I was debating your absolute 'always better'
claim, and your 'holy grail' attitude.


SaSW, Willem
--

Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be

Thomas Richter

unread,
Jan 10, 2006, 4:13:05 AM1/10/06
to
Hi,

> I did a few of these, all mods of existing audio codecs, for our
> current communication product ( http://www.hotComm.com ). Even here,
> though, take the most basic low quality voice with 8000 samples/sec,
> and take low precision samples of just 1 byte per sample (this is not
> what we normally use). The segments we typically get are 20ms of audio
> data. (Note that one could easily go to 60-80 ms blocks without a human
> listener noticing any difference.) With 20ms, 1 byte/sample, 8
> samples/ms you get a block of data to encode that is 1280 bits long.
> That is three orders of magnitude larger than the "online" constraint
> in the AC modeling/coding paradigm lingo. Any higher quality, and you
> are several times larger. Another real time app which doesn't tolerate
> latency would be a video codec, and even the differential frame data is
> almost two orders of magnitude larger than audio. Although the entropy
> coder won't get the raw samples but some outputs from the transforms &
> filtering, the above was reduced in all parameters to a bare minimum, so
> in a more realistic case one gets at least as much data even for
> the entropy coder.

> Even just being able to have a few hundred symbols at a time, vs just a
> single symbol (or just a few, for AC latency), adds a great deal of
> flexibility and opens space for new algorithms, for the modeling engine
> and the coder, as BWT (or the so-called "offline"
> dictionary methods) illustrates for the modeler, or QI for the coder.

A war-time story from my side: JPEG-1 has an arithmetic coder option,
the QM-coder, which does have a latency problem because the coder can,
in principle, delay the output arbitrarily long, depending on the
carry-over resolution. There was a long discussion over this issue in
the group because this actually disallows efficient hardware
designs. Thus, it *is* a problem, at least for some people.

> > Ehem, small correction: You don't have to buffer the whole data, see
> > for example Nelson's or Moffat's implementation. All you need to do is
> > to count how many times you have "forgotten" to carry over, and
> > resolve this as soon as the carry is resolved by propagating it
> > thru all the "counted" (but never buffered) data.

> I think that depends on the constraints of the task. The AC is adding to a
> common sum numbers which decrease in size. Hence, there is no way,
> even in principle, to send data out from the higher digits if they are
> still within carry propagation range, let the decoder decode it
> incorrectly (and possibly take actions based on it), and then issue a carry
> signal to undo that decode. That is what I meant above -- the encoder
> either has to keep all the data that a carry can propagate into in the worst
> case, or use blocking mechanisms (which add redundancy), or
> use LIFO mode (where everything is held until the whole buffer is
> encoded).

But that's not what you've written. Clearly, you cannot send the data
until the carry propagation has been resolved (unless you sacrifice
some coder efficiency, that is) but that's not "keeping" the data, as
I would call it. It is represented in the encoder, sure, but not "as
is", but as a "carry-over" count. Thus, one doesn't require an
arbitrarily long buffer for that. Just an arbitrarily large counter. (-;

> Note also that one pays the cost of FIFO coding not just in the cases when
> one has to propagate a carry, or in small extra redundancy, but also in that the
> coder is burdened with checking & branching for such cases inside its
> coding loop, so there is a coding speed penalty for the AC "online"
> constraint, with no practical benefit at all.

I don't understand what you mean by "no practical benefit". AC online
coders without the carry-over problem (i.e. bitstuffing) are *very*
practical, and they are all realistically used in real applications,
i.e. JBIG, JPEG2000, etc... and that, *because* the hardware folks
cannot get away with arbitrary long delays. The intermediate
communication protocols don't allow it.

However, note that this drifts the communication away from the initial
discussion, thus we stop here.

So long,
Thomas

David A. Scott

unread,
Jan 10, 2006, 7:05:03 AM1/10/06
to
"Sportman" <spor...@gmail.com> wrote in
news:1136872470....@g44g2000cwa.googlegroups.com:

If you could zip and send the test file, I would like to check it.
I tested hundreds of files when I wrote the code. However, none was
over 2 megs, so it's possible there is a problem with large files. It's
also possible you stumbled onto another error. I am not perfect but this
is code that I would like to see correct.

nightlight

unread,
Jan 10, 2006, 7:31:29 AM1/10/06
to
> As I see it, there are no excess bits in an Arith
> Coder, until the coder is told to wrap it up. Can
> you give some clear arguments why you claim that
> the bits added by termination are present before
> termination ?

I thought the explanation was pretty clear. Well, perhaps a numeric
example showing the accumulation of errors may help.

I will use below results from my previous message [M1]. The ACX
(infinite precision AC) redundancy formula (from [M1]) is:

L = ceiling(log(1/Pc)) + 1 .... (1)

which gives the length L of the ACX output in terms of Pc, which is the
"coding probability" for the entire message. AC computes Pc by
multiplying coding probabilities of all symbols encountered along the
way. With ACX, the products are exact, while with ACF (finite precision
AC), the products are approximate, which will add 2-3 bits to L in (1)
(before the ACF counts overflow; i.e., generally, ACF adds excess bits at the
approximate rate of log(e)/2^(g-2) bits per symbol). Note also that Pc
is always a product of probabilities generated by the AC model along
the way. For order-0 stationary models the factors used are constants,
but in a general model the factors used may vary from step to step.

Let's now take a finite precision AC and watch its computation of Pc
used in (1). For simplicity we'll calculate in decimal notation (we'll
still use base 2 log in (1) to count bits). We'll use static binary
source with probabilities p(1)=0.3 and p(0)=0.7 and we will set ACF
precision to 1 decimal digit. Let's look at what happens to Pc when
coding string 00100:

0. Initial Pc=1.0 (prob=1 for empty message) ------- GAPS
1. In C=0 => Pc = 1.0 * 0.7 = 0.7 => Pc = 0.7 e+0 G=0.000
2. In C=0 => Pc = 0.7 * 0.7 = 0.49 => Pc = 0.4 e+0 G=0.090
3. In C=1 => Pc = 0.4 * 0.3 = 0.12 => Pc = 0.1 e+0 G=0.020
4. In C=0 => Pc = 0.1 * 0.7 = 0.07 => Pc = 0.7 e-1 G=0.000
5. In C=0 => Pc = 0.07* 0.7 = 0.049 => Pc = 0.4 e-1 G=0.009

The ACX's exact Pc is Pcx = 0.7^4 * 0.3 = 0.07203, while the ACF's
approximate Pc is 0.04 which is smaller than Pcx. The output sizes
(rounded to whole bits) are: L(Pcx)=[3.795]+1=5 bits, L(Pc)=[4.644]+1=6
bits. The gaps created in the Pc (or in the "range" in ACF's
terminology) are shown next to each truncation.

Note, for example in step 2, the product 0.49 was truncated to 0.4 and
0.09 gap was created in Pc, which via (1), but without rounding to the
next whole bit since we're not yet outputting anything, results in the
extra log(0.49/0.40) = 0.29 bits accumulated in the ACF's L compared to
the ACX's L at that point.

Important element to observe in step 2 is that the gap of 0.09 in the
Pc (or equiv. in ACF's range), arising when encoding symbol '0' in step
2 did not get reassigned to symbol '1', since if '1' were the second
symbol, its product would have been truncated as well, as shown below
in alternate step 2a:

2a. In C=1 => Pc = 0.7 * 0.3 = 0.21 => Pc = 0.2e+0 G=0.01

Hence, either symbol in step 2 would have created a gap in the Pc
(range), i.e. the part of the Pc coverage of the full interval [0,1)
for all messages is simply wasted. For example, all 2 symbol messages
are 00, 01, 10, 11. Their Pc values, as computed by ACF, add up to: 0.4
+ 0.2 + 0.2 + 0.9e-1 = 0.89 leaving the total gap of 0.11 in the
coverage of the interval [0,1) (or shortfall from 1 in the Kraft
inequality (4) in [M1]). The exact Pcx for these 4 messages always add
up to exactly 1 and thus have no this gap. Note that ACX does introduce
a gap in [0,1) when truncating the final Pcx to a finite fraction in
order to obtain a finite length codeword, which results in the baseline
1-2 bit redundancy it creates (see CWI discussion in the earlier post
[M1]). But the extra gaps & the corresponding extra redundancy added by
ACF, accumulate from symbol by symbol truncation errors, step by step,
as shown above.

If you have time and want to play with this more, you should be able to
replicate these same conclusions from this little '1 decimal digit ACF'
in an actual ACF by making it output Pc for all input messages of
a certain length (note that the usual ACF's "range" is an integer, a
mantissa of Pc, and ACF doesn't keep track of its exponent explicitly,
so you will need to keep track of that yourself to display the Pc's as
decimal numbers; the QI's SW formalism is a much cleaner and tidier
way to do these kinds of computations; the ACF in SW formulation flows
much more coherently than the usual handwaving at its little pictures).
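
For anyone who wants to replay the 1-digit decimal walk-through above
without writing a full coder, here is a tiny C sketch (illustrative only;
it just mimics the truncation of Pc to one decimal digit, it is not an
actual ACF):

#include <stdio.h>
#include <math.h>

int main(void)
{
    const char *msg = "00100";
    int m = 1, e = 0;                      /* Pc = m * 10^e, 1-digit mantissa */
    double exact = 1.0, total_gap = 0.0;
    for (int i = 0; msg[i]; i++) {
        int pt = (msg[i] == '0') ? 7 : 3;  /* symbol probability in tenths */
        int prod = m * pt;                 /* exact product, at scale 10^(e-1) */
        double gap;
        if (prod >= 10) {                  /* keep one digit, drop the rest */
            gap = (prod % 10) * pow(10.0, e - 1);
            m = prod / 10;                 /* exponent stays at e */
        } else {
            gap = 0.0;
            m = prod; e--;
        }
        exact *= pt / 10.0;
        total_gap += gap;
        printf("step %d: sym %c  Pc = %d e%+d   gap = %.3f\n",
               i + 1, msg[i], m, e, gap);
    }
    double Pc = m * pow(10.0, e);
    printf("exact Pcx = %.5f -> L = %d bits\n",
           exact, (int)ceil(log2(1.0 / exact)) + 1);
    printf("trunc Pc  = %.5f -> L = %d bits (total gap %.3f)\n",
           Pc, (int)ceil(log2(1.0 / Pc)) + 1, total_gap);
    return 0;
}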

(Note: before you lose the context of the post again and bolt off onto
that tangent 'who cares, it's so tiny', as you did in the previous
post, recall what this was replying to: you were insisting on teaching me
about the AC gaps and asserting that there are none.)

T1-T3 are on http://www.1stworks.com/ref/qi.htm

M1. Post on quantization errors in AC & QI:

http://groups.google.com/group/comp.compression/msg/b157e1aa25e598d8

41a. P.A.J. Volf "Weighting Techniques In Data Compression: Theory and
Algorithms" Ph.D. thesis, Eindhoven University of Technology, Dec 2002
http://alexandria.tue.nl/extra2/200213835.pdf

Thomas Richter

unread,
Jan 10, 2006, 8:25:21 AM1/10/06
to
Hi Again,

> Let's now take a finite precision AC and watch its computation of Pc
> used in (1). For simplicity we'll calculate in decimal notation (we'll
> still use base 2 log in (1) to count bits). We'll use static binary
> source with probabilities p(1)=0.3 and p(0)=0.7 and we will set ACF
> precision to 1 decimal digit. Let's look at what happens to Pc when
> coding string 00100:

> 0. Initial Pc=1.0 (prob=1 for empty message) ------- GAPS
> 1. In C=0 => Pc = 1.0 * 0.7 = 0.7 => Pc = 0.7 e+0 G=0.000
> 2. In C=0 => Pc = 0.7 * 0.7 = 0.49 => Pc = 0.4 e+0 G=0.090
> 3. In C=1 => Pc = 0.4 * 0.3 = 0.12 => Pc = 0.1 e+0 G=0.020
> 4. In C=0 => Pc = 0.1 * 0.7 = 0.07 => Pc = 0.7 e-1 G=0.000
> 5. In C=0 => Pc = 0.07* 0.7 = 0.049 => Pc = 0.4 e-1 G=0.009

> The ACX's exact Pc is Pcx = 0.7^4 * 0.3 = 0.07203, while the ACF's
> approximate Pc is 0.04 which is smaller than Pcx. The output sizes
> (rounded to whole bits) are: L(Pcx)=[3.795]+1=5 bits, L(Pc)=[4.644]+1=6
> bits. The gaps created in the Pc (or in the "range" in ACF's
> terminology) are shown next to each truncation.

> Note, for example in step 2, the product 0.49 was truncated to 0.4 and
> 0.09 gap was created in Pc, which via (1), but without rounding to the
> next whole bit since we're not yet outputting anything, results in the
> extra log(0.49/0.40) = 0.29 bits accumulated in the ACF's L compared to
> the ACX's L at that point.

And that's where the computation starts getting incorrect. You do
not

It might
be that Pc gets rounded to 0.4, but that also means that the interval for
the other symbol gets larger, thus there is no gap. This means, then,
that the probabilities that are imposed by the AC coder no longer fit
the source (true), but it does not mean that there are gaps. The 0.090
here is the deviation between the "idealized" probability and a quantized
one, Qc, that differs from Pc due to the finite precision.

Note that this is different from ELS, where we do have gaps, indeed.

> Important element to observe in step 2 is that the gap of 0.09 in the
> Pc (or equiv. in ACF's range), arising when encoding symbol '0' in step
> 2 did not get reassigned to symbol '1', since if '1' were the second
> symbol, its product would have been truncated as well, as shown below
> in alternate step 2a:

> 2a. In C=1 => Pc = 0.7 * 0.3 = 0.21 => Pc = 0.2e+0 G=0.01

No, Pc = 0.3, G = -0.090, resulting in a total "gap" of zero. The
confusion arises because you use only the code-word where AC in
fact uses an interval I to represent the values.

0. Initial I = [0,1) = [0,0.7) U [0.7,1)
1. In C=0 => [0,0.7). At this time, the interval cannot yet be rescaled since
the first digit (7) is not yet fixed,
2. In C=0 => [0,0.4). No digit written either.

Instead, with 2a:

2a. In C=1 => [0.4,0.7). No digit written either, Pc = 0.3, Gap = -0.090

Rescaling doesn't happen in your example, because the end-points of
the interval are not yet "fixed enough" to write out data.
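
A minimal sketch of that point in C (a toy subdivision, not any particular
coder's code; the counts and range size are made up): when the rounded-down
boundaries are used and the top symbol's interval ends exactly at the old
high end, the sub-interval widths still add up to the full range, so no
code space is lost:

#include <stdio.h>

int main(void)
{
    unsigned range = 1000;                    /* current range (toy size)  */
    unsigned freq[3] = { 3, 3, 1 }, tot = 7;  /* counts for symbols 0,1,2  */
    unsigned cum = 0, sum = 0;
    for (int s = 0; s < 3; s++) {
        unsigned lo = range * cum / tot;             /* rounded-down boundary */
        unsigned hi = (s == 2) ? range               /* top symbol absorbs it */
                               : range * (cum + freq[s]) / tot;
        printf("symbol %d: [%u, %u), width %u\n", s, lo, hi, hi - lo);
        sum += hi - lo;
        cum += freq[s];
    }
    printf("sum of widths = %u (= range %u, no gap)\n", sum, range);
    return 0;
}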

> If you have time and want to play with this more, you should be able to

> replicate these same conclusions from this little '1 decimal digit ACF'.

Yes, please play with the above a bit more. (-;

So long,
Thomas

Willem

unread,
Jan 10, 2006, 8:32:01 AM1/10/06
to
nightlight wrote:
)> As I see it, there are no excess bits in an Arith
)> Coder, until the coder is told to wrap it up. Can
)> you give some clear arguments why you claim that
)> the bits added by termination are present before
)> termination ?
)
) I thought the explanation was pretty clear. Well, perhaps a numeric
) example showing the accumulation of errors may help.

Read my sentence again. 'the bits added at termination'.
I was talking about the fixed cost.

) Let's now take a finite precision AC and watch its computation of Pc
) used in (1). For simplicity we'll calculate in decimal notation (we'll
) still use base 2 log in (1) to count bits). We'll use static binary
) source with probabilities p(1)=0.3 and p(0)=0.7 and we will set ACF
) precision to 1 decimal digit. Let's look at what happens to Pc when
) coding string 00100:
)
) 0. Initial Pc=1.0 (prob=1 for empty message) ------- GAPS
) 1. In C=0 => Pc = 1.0 * 0.7 = 0.7 => Pc = 0.7 e+0 G=0.000
) 2. In C=0 => Pc = 0.7 * 0.7 = 0.49 => Pc = 0.4 e+0 G=0.090
) 3. In C=1 => Pc = 0.4 * 0.3 = 0.12 => Pc = 0.1 e+0 G=0.020
) 4. In C=0 => Pc = 0.1 * 0.7 = 0.07 => Pc = 0.7 e-1 G=0.000
) 5. In C=0 => Pc = 0.07* 0.7 = 0.049 => Pc = 0.4 e-2 G=0.009

Why are you rounding down all the time ?

In AC, some of the symbols probabilities are rounded up,
in such a way that the total always adds up to the total range.
In other words, the 'gap' for one of the symbols gets added to
the code space for the other symbol.


) (Note: before you lose the context of the post again and bolt off onto
) that tangent 'who cares, it's so tiny', as you did in the previous
) post, recall what this was replying to: you were insisting to teach me
) on the AC gaps and asserting that there are none.)

You were asserting that there are gaps _at each step_, which is false.

It then looked as if you were suddenly talking about only the gaps
that occur at termination, which I discarded as irrelevant, but
which you jumped on with glee to point out I was wrong.


SaSW, Willem
--

Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be

nightlight

unread,
Jan 10, 2006, 8:51:49 AM1/10/06
to
The QI.exe file which you may already have (from the source; current
source version is 1.03) has a command line option to test it on that
same input (which is a high entropy limit for multi-alphabet coding,
and which I call radix codes):

QI cr3 n1000000 i100

which tells it to code inputs in radix 3 (this can be any 32 bit value
above 2), to use input of 1 million symbols (there is a constant
MAXRDIG in Intro.h which limits the input size to max 2^20 or 1M
digits, you can change that to allow larger sizes e.g. to 16 MEG) and
to run the test 100 times on 100 random inputs (i100 for 100
iterations). The size it produces is 1584962.50... bits, which compared
to the exact N*log(3) entropy has an excess of 1.62 e-04 bits on the
total of 10^6 symbols (i.e. the excess per symbol is 1.6e-10 bits).

To compare that with AC output size, one option is to make AC work in
static mode without adapting to probabilities and make it not count the
transmission of frequency table or number of symbols n (which is the
same condition that the QI.exe figure applies to).

Alternatively, you can add to QI's output the size to transmit N, A and
the frequency table. QI.exe has a command QI cl<int> which computes
self-delimiting size for <int>, or just "QI cl" to list a table for
common values. There you get for N=10^6 its self-delimiting length
L(N)=27.543 bits and L(A)=2.49 bits. The cost for the frequency table with
QI/EC is the log of the binomial C(N+A-1,A-1), for N=10^6 and A=3,
which is log(C(1000002,2))=38.863 bits, which totals (each rounded
separately, which they don't need to) 28+3+39=70 bits to be added to
QI's output to match the adaptive AC's coding conditions. Since the
QI's output was essentially the entropy, the QI's total is at most 70
whole bits above the "entropy" (note the "entropy" N*log(3) didn't
include N; also, in the high entropy limit QI doesn't need to transmit the
freq. table, but one would need to modify AC to work in the high entropy
limit, so I added the table to QI, which distorts the comparison to the
entropy H a bit).
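
(A quick cross-check of those overhead figures -- a sketch in C, where the
self-delimiting lengths L(N) and L(A) are simply taken as quoted from
QI.exe rather than recomputed:)

#include <stdio.h>
#include <math.h>

int main(void)
{
    double N = 1e6, A = 3;
    double entropy = N * log2(3.0);                     /* N*log(3), high entropy limit */
    double table = log2((N + A - 1) * (N + A - 2) / 2); /* log2 C(N+A-1,A-1) for A=3    */
    printf("N*log2(3)        = %.2f bits\n", entropy);
    printf("freq table (A=3) = %.3f bits\n", table);
    printf("total overhead   = %d bits (28 + 3 + 39)\n", 28 + 3 + 39);
    return 0;
}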

Now, running the Moffat98 coder (in 8 bit max symbol coding mode &
frugal bits enabled), it outputs: 1588435.52 bits (this is avg. over
100 iterations), which is 3473 bits above the entropy, or 3403 bits
above the comparable QI output size. (Note that the Moffat98 coder has
generally a slight bias to code worse on the max entropy inputs, but
it gains in return on very low entropy inputs.)

To compare speeds properly, one would need to modify QI's radix coder
to use 8 bit alphabet size limit, instead of 32 bit limit, otherwise QI
pays a cost for accessing 4 times larger memory (and a few other related
costs in the code, such as 32 bit multiplies or padding 4 times larger
output buffers, etc). Without adjusting the max alphabet size, QI (on a
2G P4 laptop) codes at 22 ns/sym while Moffat98 at 96 ns/sym which is a
ratio 4.36. That is a smaller ratio vs Mofat98 than the for the binary
coder high entropy limit vs Moffat98, which is about 6. I think that
when both coders are normalized to the same max alphabet (thus to use
the same buffers they for input & width of multiplies), it would
probably come out the same high entropy ratio of 6 as in the binary
case.

nightlight

unread,
Jan 10, 2006, 9:11:03 AM1/10/06
to
> Why are you rounding down all the time ?

> In AC, some of the symbols probabilities are rounded up,
> in such a way that the total always adds up to the total range.

Those are not probabilities being rounded down but the update to the
range size (which is the mantissa of the Pc, the coding probability of
the entire message up to that point). Check any AC source and watch div
in the coder to calc the new range size (which discards the remainder), or
take a look in (41a) p. 48, formulas for WNC coder (see the 1st line
with asterisk in pseudo code). That is always rounded down.

> It then looked as if you were suddenly talking about only the gaps
> that occur at termination, which I discarded as irrelevant, but
> which you jumped on with glee to point out I was wrong.

The +1 in eq. (1) is the ACX (exact AC) truncation error cost for the
last interval, when it does need to produce a finite fraction. That last
interval does have an interval gap as well; that is what those CWI
intervals were about. The excess above that +1 bit is the accumulated
error due to rounding along the way. The ACF error produces a total of
2-3 bits on top of the +1. You can watch it accumulate if you force
output at any point and subtract the last +1 (which belongs, by
definition, to the last CWI interval specification). Anything above +1
at any intermediate stage is the accumulated excess. Note that the total
ACF output has a latency of approx. g bits, so until you make it clear
up its internal state (and also adjust for the +1, since that one is by
definition on the last interval only) you can't see what the excess is.

nightlight

unread,
Jan 10, 2006, 9:52:32 AM1/10/06
to
> look in (41a) p. 48, formulas for WNC coder (see the
> 1st line with asterisk in pseudo code).

That should say:

> 3rd line with asterisk..

Willem

unread,
Jan 10, 2006, 9:59:23 AM1/10/06
to
nightlight wrote:
)> Why are you rounding down all the time ?
)
)> In AC, some of the symbols probabilities are rounded up,
)> in such a way that the total always adds up to the total range.
)
) Those are not probabilities being rounded down but the update to the
) range size (which is the mantissa of the Pc, the coding probability of
) the entire message up to that point). Check any AC source and watch div
) in the coder to calc the new range size (which discards the remainder), or
) take a look in (41a) p. 48, formulas for WNC coder (see the 1st line
) with asterisk in pseudo code). That is always rounded down.

If the AC source you checked rounds down to calc the new range size,
then you have a very poorly written AC source. No wonder you are
misguided about gaps.

Matt Mahoney

unread,
Jan 10, 2006, 10:14:31 AM1/10/06
to

I posted fpaq1.cpp to
http://www2.cs.fit.edu/~mmahoney/compression/#fpaq0
It is an improved order 0 arithmetic coder using 64 bit arithmetic. On
a 10MB file which repeats "ABC" it is 25 bytes over the theoretical
limit, and I believe most of this is due to approximations made by the
model early in compression.

-- Matt Mahoney

nightlight

unread,
Jan 10, 2006, 10:15:28 AM1/10/06
to
> And that's where the computation starts getting incorrect. You do not
> It might be that Pc gets rounded to 0.4, but that also means that the
> interval for the other symbol gets larger, thus there is no gap.

The Pc calculation, which is the total message "coding probability"
(meaning it uses AC's model probabilities for its computation as a
product for all p's of the symbols encountered along the way), rounds
down on every symbol. Check any AC source code and look at the integer
division when they update the total range size. The formulas in [41a]
p. 48, show how this is done in the WNC87 coder (look at the 3rd line
with the asterisk in the pseudo code). The same goes for Moffat98,
except here they divide first, then multiply. In all cases, though, the
result is rounded down (the integer "div" discards the remainder). There
is no check for the symbol value and a branch for one or the other symbol
value. It is an unconditional loss of the remainder -- the remainder gets
discarded on every symbol, every time the new range is calculated.
These are pure gaps in the coding interval (or shortfalls from 1 in the
Kraft inequality). You can easily see them if you make AC output
codewords for all possible inputs M of a given length and add up the Pc(M),
which are the total message probabilities, over all those inputs.
The sum will come out smaller than 1, just as the numeric
examples show.

What you, Willem, are talking about is the division of the range among
the different symbols, in which case your compensation does apply. But the
total interval has shrunk during the total interval update (when the
integer div rounds down on every symbol).

Note also that the index computed by ACF is not rounded itself. This works
exactly the same way as QI's arithmetic, where eq. (21) computes
path counts for a given point using rounding-up arithmetic, while the
index formulas (22)-(23) use exact arithmetic, with no rounding. With
QI these two calculations are separate; the rounding is done only
for the tables, to obtain the quantized binomials.

Thomas Richter

unread,
Jan 10, 2006, 11:14:08 AM1/10/06
to
Jo.

> The Pc calculation, which is the total message "coding probability"

Yes.

> (meaning it uses AC's model probabilities for its computation as a
> product for all p's of the symbols encountered along the way), rounds
> down on every symbol.

No. Definitely. Not.

> Check any AC source code and look the integer divison when they update
> the total range size.

You typically do not update the total range size, but rather the
coding interval low and high, and you have to do this consistently.
If you always round here to the same direction, some intervals will get
larger, and some smaller. I placed an example on top which is pretty
much realistic, except that in real applications one rounds to powers of
two instead of the powers of ten used here. Otherwise, that's the code, sorry.

> The formulas in [41a]
> p. 48, show how this is done in the WNC87 coder (look at the 3rd line
> with the asterisk in the pseudo code). The same goes for Moffat98,

No. You are confused. If you always round down *interval boundaries*,
you do not round down *interval sizes* because the topmost interval
then gets larger. Pick for example Nelson's coder from the net (just
google for Nelson, Arithmetic Coding), and see for yourself.

> What you Willem are talking about is the division of the range among
> different symbols, in which case your compensation does apply. But the
> total interval has shrunk during the total interval update (when the
> integer div rounds down on every symbol).

No, see the example, compute yourself. The *top* interval gets larger
than it should be if you round down its lower boundary. It is really
that simple. AC coding does not have coding gaps. ELS does. AC coding
has a quantization of probabilities, though.

So long,
Thomas

nightlight

unread,
Jan 10, 2006, 11:17:12 AM1/10/06
to
>> in the coder to calc new range size (which discards
>> the remainder), or take a look in (41a) p. 48, formulas
>> for WNC coder (see the 3rd line with asterisk in pseudo

>> code). That is always rounded down.
>
> If the AC source you checked rounds down to calc the
> new range size, then you have a very poorly written
> AC source. No wonder you are misguided about gaps.

I gave you above the places to check, which are not some "poorly
written AC source" but the well known reference implementations of the
AC coders. So, look at the Moffat98 or WNC87 source, or the reference [41a]
above, which shows quite clearly how the Pc is updated.

You are welcome to show a coder which doesn't round down the size of
the updated total range on every symbol. (It still has to be able to
decode, though.)

--- References:

Sportman

unread,
Jan 10, 2006, 11:20:37 AM1/10/06
to
David A. Scott wrote:
> If you could zip and send the test file, I would like to check it.
> I tested hundreds of files when I wrote the code. However, none was
> over 2 megs, so it's possible there is a problem with large files. It's
> also possible you stumbled onto another error. I am not perfect but this
> is code that I would like to see correct.
The creator of that real life test file was kind enough to make a
download link: http://www.majestic12.co.uk/files/other/input2.zip. Alex
created a World Wide Web distributed search engine,
http://www.majestic12.co.uk, and to save bandwidth and storage space he
can use good compression with reasonable compression/decompression
speed.

David A. Scott

unread,
Jan 10, 2006, 11:53:33 AM1/10/06
to
"Matt Mahoney" <matma...@yahoo.com> wrote in
news:1136906071.4...@g49g2000cwa.googlegroups.com:


I find this very interesting. The fact that you're coding 10,000,000
zeros should add at least roughly 5.8 bytes to my answer. Instead they
differ by 1 byte, since I got 24 bytes. And I got 1,981,227 for what
you're claiming is optimal at 1,981,203.


Just compiled and ran your code on a 10,000,000-byte file that was
ABCABC...ABCA (note the one extra A).

My code compresses it to 1,981,227 versus what you said was optimal,
1,981,203.

Your code compressed it to 1,981,232, which is only 5 bytes longer than
mine. So both, I hope, are doing what they should. I don't see how you
say this new one is 25 bytes over; isn't it 29 bytes over?


David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **

Disclaimer:I am in no way responsible for any of the statements

David A. Scott

unread,
Jan 10, 2006, 12:01:37 PM1/10/06
to
"Sportman" <spor...@gmail.com> wrote in
news:1136910037....@o13g2000cwo.googlegroups.com:

I downloaded the file. It's not a random file of THREE characters only;
arb255 will not do well on this file.

nightlight

unread,
Jan 10, 2006, 12:20:59 PM1/10/06
to
>> The formulas in [41a]
>> p. 48, show how this is done in the WNC87 coder
>> (look at the 3rd line with the asterisk in the
>> pseudo code). The same goes for Moffat98,
>
> No. You are confused. If you always round down *interval
> boundaries*, you do not round down *interval sizes*
> because the topmost interval then gets larger.

Sorry, I gave you the wrong page & the wrong line number (that was for
similar code in the decoder).

Take a look in (41a) p. 47, at the formula in the 1st line with an
asterisk, which shows the interval size update. Here he updates (in
simplified notation) the integer quantity P(n-1), which is his integer
range for n-1 symbols, using (in abbreviated notation, see the paper):

P(n) = floor[p * P(n-1)] .... (1)

where p is the coding probability of the n-th symbol xn, which is the
symbol now being encoded (for p he uses the conditional probability
that the new symbol xn, just found, occurs after the given n-1 previous
symbols). There is no other statement that assigns a value to (updates)
the quantity P(n).

Note that at the top he initializes the value P(0) to 2^F (where F is
the mantissa precision in bits), i.e. the quantity P is the mantissa of
the actual Pc, meaning the initial width is 1.00.... The fraction dropped
in (1) is gone from the P(n) which is to be used in the next step. There
are no other updates of P(n). It simply loses the fraction at its edges
irreversibly: no condition, no compensation for it anywhere else.

The compensation occurs in the next line, where he calculates the new
lower boundary point of the interval, Q(n) as (again streamlined, see
paper):

Q(n) = Q(n-1) + floor [p(x<xn) * P(n-1)] .... (2)

where now p(x<xn) is the conditional cumulative probability for symbols
ahead of xn. For example (this is a binary coder), if xn=0 then (2)
doesn't add anything, since no other symbol x is smaller than xn. If
xn=1 then (2) adds to the lower boundary Q(n-1) the product
p(0)*P(n-1), where p(0) is the probability of 0.

The same conclusion follows from the basic accounting of the precisions
and buffers used by AC: there are two real numbers < 1.0 in the coding
loop whose precision grows indefinitely for the unlimited-precision AC:

a) the Pc, which is the product of all probabilities of the symbols
encountered in the input, and

b) the cumulative probability Qc (which is the exact full-precision
rational number < 1.0; note that the Q(n) in (2) is just an integer
representing the trailing bits of Qc as seen from the current AC
window).

The Qc is the content of the AC's encoded output. The precision of that
number does grow indefinitely, since it is the output itself. The other
number, also large with an unlimited-precision AC, the message
probability Pc, does not grow indefinitely. Its precision is reduced in
each step in (1). Its fraction beyond the F-bit precision is discarded
unconditionally and irreversibly after each new symbol -- in (1) we
multiply the previous Pc by the probability of the new symbol and
truncate the result to F significant bits. If you are saying that at
the end the AC has somehow also computed the product Pc in full
precision, then where is it stored? That number is _independent_ of the
value Qc, hence you would need a second number of indefinite length,
besides Qc, which would be some kind of second work buffer of the size
of the output. There is no such buffer in AC.
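
In compact form, the update described by formulas (1) and (2) looks as
follows, written directly from the notation above rather than from the
WNC87 source (the fixed-point convention, the static counts and the
variable names are illustrative assumptions, not the actual coder):

/* Sketch of the update in formulas (1) and (2) above.  P is the
 * integer range mantissa, Q the integer lower boundary.  Static
 * binary model with counts c0 : c1 = 1 : 2; renormalization omitted. */
#include <stdio.h>

#define F 16                          /* mantissa precision in bits */

int main(void)
{
    unsigned long P = 1ul << F;       /* P(0) = 2^F, i.e. width 1.00... */
    unsigned long Q = 0;              /* lower boundary (trailing bits) */
    unsigned long c0 = 1, c1 = 2, total = 3;
    const char *msg = "1011";         /* symbols to encode              */

    for (const char *s = msg; *s; s++) {
        unsigned long below = (*s == '0') ? 0 : c0;   /* p(x < xn)      */
        unsigned long count = (*s == '0') ? c0 : c1;  /* p(xn)          */

        Q += P * below / total;       /* formula (2): boundary, floored */
        P  = P * count / total;       /* formula (1): range,    floored */
        printf("sym %c: P = %lu, Q = %lu\n", *s, P, Q);
    }
    /* In a real coder P and Q would also be renormalized (shifted) to
     * keep them within F bits; that step is left out here. */
    return 0;
}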

nightlight

unread,
Jan 10, 2006, 12:28:33 PM1/10/06
to
I gave you the incorrect page & line number to look at (that was for
similar code in the decoder, which still does the same, but it may
confuse the discussion).

The proper page for the encoder is 47 in [41a], the 1st line with
asterisk. See the details on that code in the post below to Thomas
Richter (you're both wrong the same way):

http://groups.google.com/group/comp.compression/msg/39c25c38b882532e

David A. Scott

unread,
Jan 10, 2006, 12:47:39 PM1/10/06
to
"nightlight" <night...@omegapoint.com> wrote in
news:1136914113....@f14g2000cwb.googlegroups.com:

>
> I gave you the incorrect page & line number to look at (that was for a
> similar code in the decoder, which still does the same, but it may
> confuse the discussion).
>
> The proper page for the encoder is 47 in [41a], the 1st line with
> asterisk. See the details on that code in the post below to Thomas
> Richter (you're both wrong the same way):
>
>


Actually you have yet to give a correct page and line number to
show what you claim. All you do is further show people how well read
you are, yet how little you seem to comprehend. Why is that?
You have proposed a test for an arithmetic coder to put it in a bad
light, yet even this simple test you can't seem to do with a coder of
your own design. Why is that?

The fact that there are simple arithmetic coders that contain no gaps
at the end of the file or throughout the compression shows that you know
nothing about arithmetic compression. Quoting or misquoting a paper has
little to do with reality when working code exists that shows you're
wrong. What is it about simple real-world working code that you can't
comprehend?


David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **

Disclaimer:I am in no way responsible for any of the statements

nightlight

unread,
Jan 10, 2006, 1:19:50 PM1/10/06
to
> Even this simple test you can't seem to do with a coder of
> your own design. Why is that?

The QI source code (which has been public for a while now) and the
compiled executable QI.exe which comes with it compress the 1,000,000
symbol input, alphabet size A=3, to a smaller size than any AC you can
ever have (even in principle). I already posted on that a few messages
earlier. You can run the program with the command line given there and
see for yourself (and then look in the source and see how it did it). If
you want a 10,000,000 symbol input, you need to change the upper limit
in the header to 10,000,000, as described in that post (the default
limit for the max number of symbols in the source was arbitrarily set
to 1M symbols). If you want it to read/write a file, you can play with
that, too.

That's why the code was released in the first place -- so that anyone
who is interested in specific questions that would require work on my
part which isn't already on my list of things to do can do it himself
and find out the answer. There is also the 1stWorks Corp. contact email
on the QI web page where you can send any other requests for
consideration.

--------
Earlier post on QI.exe coding performance on 1,000,000 symbol (A=3)
input:
http://groups.google.com/group/comp.compression/msg/ff1ee67d18b63f5a

David A. Scott

unread,
Jan 10, 2006, 2:31:52 PM1/10/06
to
"nightlight" <night...@omegapoint.com> wrote in
news:1136917190....@g47g2000cwa.googlegroups.com:

>
> The QI source code (which was public for a while now) and the compiled
> executable QI.exe which comes with it, compress the 1,000,000 symbol
> input, alphabet size A=3, to smaller size than any AC you can ever have
> (even in principle). I already posted on that few messages earlier. You
> can run the program with the command line given there and see for
> yourself (and then look in the source and see how it did it). If you
> want a 10,000,000 symbol input, you need to change the upper limit in
> the header to 10,000,000, as described in that post (the default limit
> for max number of symbols in the source was arbitrarily set to 1M
> symbols). If you want it to read/write to a file, you can play with
> that, too.
>
>

Again you show your incredible lack of intelligence. I write
bijective arithmetic coders. Even if your QI were any good (and the
more useless rants you give, the more likely it seems that it's not
very good), you don't have a basic understanding of arithmetic
compression.


This has been pointed out to you several times. Let me try to
explain it to you one more time. BIJECTIVE ARITHMETIC FILE CODERS
EXIST. Therefore the set of compressed files is nothing but a reordering
of every possible input file. There are no gaps, period.
Again, since you seem not to comprehend the obvious: THERE ARE NO GAPS.

Suppose the problem is such that you make the BIJECTIVE ARITHMETIC
CODER work only on files of type X; in this case, to be in X a file
has to contain only the letters A, B, C. Let the output set be any
possible file; call that set Y. Then let x be any element of X,
let y be any element of Y, and let the coding be bijective.

That is,
compress( uncompress( y ) ) = y
and
uncompress( compress( x ) ) = x

You have an optimal compressor by definition if the above holds for
every possible x and y. Can you do this with a QI file compressor? The
more you stall, the less likely it is that you can. The point is, even
if you could write an optimal compressor for this class of files, at
best it's a slightly different reordering, and that's only if you could
write such a compressor. But in terms of compression it will never
always beat the equivalent bijective file compressor. It's not rocket
science; I am sure that if you have normal people working for you, even
they could tell you where you have gone wrong. Unless you would fire
them for daring to question your great knowledge.
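
To make the two identities concrete, here is a toy round-trip check;
the "coder" below is just a byte-wise bijection (add/subtract 1 mod
256) standing in for a real bijective file coder, so it illustrates
the definition only, not any compression method:

/* Toy round-trip check of the identities above.  toy_compress and
 * toy_uncompress are inverse byte-wise bijections; a real bijective
 * file coder would be a bijection on the set of all files. */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

static void toy_compress(unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) buf[i] = (unsigned char)(buf[i] + 1);
}

static void toy_uncompress(unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) buf[i] = (unsigned char)(buf[i] - 1);
}

int main(void)
{
    unsigned char x[] = "ABCABCA", y[] = "any bytes at all";
    unsigned char x0[sizeof x], y0[sizeof y];
    memcpy(x0, x, sizeof x);
    memcpy(y0, y, sizeof y);

    toy_compress(x, sizeof x);   toy_uncompress(x, sizeof x);
    toy_uncompress(y, sizeof y); toy_compress(y, sizeof y);

    assert(memcmp(x, x0, sizeof x) == 0); /* uncompress(compress(x)) = x */
    assert(memcmp(y, y0, sizeof y) == 0); /* compress(uncompress(y)) = y */
    puts("both identities hold for this toy bijection");
    return 0;
}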


Again, this is not a real-world test and you know it. But just
for laughs, how small does it compress a 1,000,000 symbol input
where A = 3? Is this zero-order static compression or not? You do
know the difference, don't you? And if you are only calculating the
size of the index, it's not a fair comparison, since you yourself state
you need two other numbers to go along with it: ONE, a count field for
the number of inputs; TWO, the number of inputs where a one is used. So
that's 3 fields. Do you skip this important info in your compression
comparison with the highly modified arithmetic coder you chose to use?

Look, you modified a general arithmetic coder to color your comparisons.
You stated a test for a real arithmetic file coder. You have been wrong
about what arithmetic compression does, so are you now saying you don't
have the time to do the test you yourself proposed? Do you really think
most people here believe anything you're saying when you will not do a
real test?

You must really think people are stupid if you claim you have test code
yet it somehow seems incapable of using real-world files. How would you
feel if we wrote the equivalent and changed your code to fit our needs,
and then people bad-mouthed your code? I can tell you now you would not
like it. I don't like it when you bad-mouth arithmetic coding and then
refuse to do a real file test. What are you hiding? Are you admitting in
your own way that it does not compress very well? I would mod your code
to work with real files, but then when it sucks you will scream, as we
are screaming, that it was not done correctly.

Do you comprehend anything being said here, or is your mind so closed
that you will not take the time to back your own claims?

You can be glad I am not reviewing your stuff, since the way you present
it, without a real test, I would reject it. You might have something,
you might not, but it's clear you have not really tested it in a
real-world way.

Matt Mahoney

unread,
Jan 10, 2006, 2:36:32 PM1/10/06
to

Oops, you're right, it's 29 bytes over. I also get 1981232 bytes.

For 1,000,000 bytes I get 198,145 bytes which is 25 bytes over.
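
The baselines in this exchange appear to be the zero-order entropy of
an i.i.d. three-symbol source, about n*log2(3)/8 bytes; a quick check
(the interpretation of the quoted figures is an assumption on my part):

/* Zero-order entropy bound for an i.i.d. 3-symbol source:
 * log2(3) ~ 1.58496 bits per symbol. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double bits_per_sym = log(3.0) / log(2.0);
    printf("10^6 symbols: %.1f bytes\n", 1e6 * bits_per_sym / 8.0);
    /* ~198,120   -> the reported 198,145 is ~25 bytes over   */
    printf("10^7 symbols: %.1f bytes\n", 1e7 * bits_per_sym / 8.0);
    /* ~1,981,203 -> the reported 1,981,232 is ~29 bytes over */
    return 0;
}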

fpaq1 compresses files of all zero bytes as follows:
0 -> 1
1 -> 2
10 -> 5
100 -> 9
1000 -> 13
10^4 -> 17
10^5 -> 21
10^6 -> 25
10^7 -> 29
10^8 -> 34

Here is a hex dump of the 34 byte compressed file. I'm not sure where
the small inefficiency is.

FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
FF FF FF FF FF FF FF FF FF FF FF FF FF FE 09 DE
BD F7


fpaq1 is actually worse than fpaq0 on 10^6 zero bytes:
fpaq0: 10^6 -> 17
fpaq1: 10^6 -> 25

but better on all 1 bits (10^6 FF bytes):
fpaq0: 10^6 -> 446
fpaq1: 10^6 -> 25

Also, Fabio Buffoni posted a version of fpaq0b that uses the 30 bit
precision coder from paqar/paq6fb (carry counter and 1 bit at a time
I/O). It also improves on fpaq0 using only 32 bit arithmetic.

-- Matt Mahoney

niels.f...@seies.de

unread,
Jan 10, 2006, 2:41:19 PM1/10/06
to
Hi nightlight;

> ...


>
> P(n) = floor[p * P(n-1)] .... (1)
>

> ...


>
> Q(n) = Q(n-1) + floor [p(x<xn) * P(n-1)] .... (2)

My, is this a nitpicking discussion. I think after this post
I got your idea, so I made this picture with an (im)possible
development of the probabilities.

I think you mean that the limited-precision arithmetic coder
does not code exactly at the probability of the given source
symbol.
You also say that, because there is no backfeed correction of
the modified probability, this quantization noise adds up as a
result.

I agree with that; in the picture I hope I got it right. The
blue one is the backfeed quantizer, the green the unlimited-
precision quantizer, the red the AC, and the yellow the QI.
I didn't read your paper, so I don't know whether yours might
in fact be the blue one.

I agree with that (completely) only in the case of static
modeling! In the case of adaptive modeling it is in fact
not true that the exact probability is the best predictor.
The difference between adaptive and static modeling is that
static knows it all and adaptive modeling is guessing. The
fixed-precision quantized _guess_ in the adaptive case may
be(come) _better_ than the infinite-precision _guess_.

So there are sources for an adaptively modeled AC where rounding
down produces _smaller_ output, for example when you always
underestimate the wrongly predicted MPS.

Ciao
Niels

P.S.: the pic is in http://www.paradice-insight.us/pics/AC.png

David A. Scott

unread,
Jan 10, 2006, 3:18:01 PM1/10/06
to
"Matt Mahoney" <matma...@yahoo.com> wrote in
news:1136921792.7...@g43g2000cwa.googlegroups.com:

First of all, this may be totally wrong, but it's my gut feeling.

In your coder, X1 and X2 have to have the same bit patterns before
you dump them out. Look at these two sets of data using 32-bit
registers.

X1 0x7FFFFFFF X2 0x80000000 difference 1. and nothing written out

while

X1 0x12345678 X2 0x12345679 difference 1 and 3 bytes written out.

I am sure that in practice it's not usually that bad, and for the
case where you did the three symbols nothing like the above popped up.

But when you try all zeros or all ones it may pop up, and then all of
a sudden you're actually using a lot less than 32 bits for the state
registers.
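
A small sketch of that situation, assuming the usual rule that a bit
can only be flushed once the top bits of the two registers agree (the
register values are the ones from the example above):

/* With the common renormalization rule, a bit is flushed only when
 * the MSBs of low and high agree.  Both pairs below have width 1,
 * but only the second pair lets bits be written out. */
#include <stdio.h>

static void check(unsigned x1, unsigned x2)
{
    unsigned msb_mask = 0x80000000u;
    int can_flush = ((x1 & msb_mask) == (x2 & msb_mask));
    printf("low=0x%08X high=0x%08X width=%u -> %s\n",
           x1, x2, x2 - x1,
           can_flush ? "bits can be flushed" : "nothing is written");
}

int main(void)
{
    check(0x7FFFFFFFu, 0x80000000u); /* straddles the midpoint        */
    check(0x12345678u, 0x12345679u); /* many common top bits to flush */
    return 0;
}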

David A. Scott

unread,
Jan 10, 2006, 4:23:01 PM1/10/06
to
"Matt Mahoney" <matma...@yahoo.com> wrote in
news:1136921792.7...@g43g2000cwa.googlegroups.com:

Here is the result of arb255, first on a file of
all zeros, then on a file of all 0xFF, each of
which is 1,000,000 bytes long:

0000 E0 2C 30 99 A0 52 8F ED 1A 14 41 67 B1 4C 1B B5 *.,0..R....Ag.L..*
0010 EC 4A E7 25 C2 D8 60 . . . . . . . . . *.J.%..`*
number of bytes is 23
0000 1F 2C 30 99 A0 52 8F ED 1A 14 41 67 B1 4C 1B B5 *.,0..R....Ag.L..*
0010 EC 4A E7 25 C2 C7 A0 . . . . . . . . . *.J.%...*
number of bytes is 23


Note that only the first byte and the last few are different.
The lack of strings of ones and zeros in the output is due to how
I do the I/O: there is a mapping, with the hope of making it
more stable, so you will not see strings of FF or 00
for long repeats. Also, as an unexpected side effect, it
would be better for a last compression pass that is getting
ready for an encryption pass.


In this case only your leading 0 (of the 9 bits used for every 8)
caused the expansion, since in this case the free end was so large it's
as if we both wrote out your last 1 for the count. At least in
the large-register cases.

Note that even if alternate I/O is used, in most code people either
break the intervals so that a "one" is high or low,
or they break the interval so that the most probable symbol is either
high or low.

For some reason I chose to do it totally differently again, so
that it would compress better and, as a side effect, be better
for the file if it is the last compression pass before the
encryption pass.


I feel strongly that 23 is most likely optimal and that you should,
within a byte or so, get the same length from fpaq0 whether it's all
zeros or all ones. In your case it's as if it used too low a
probability when doing the all ones and too high when doing the all
zeros. It must have something to do with the fact that you don't carry,
and that for this case 32 bits without carry is not enough.

nightlight

unread,
Jan 11, 2006, 3:12:00 AM1/11/06
to
>-- Willem wrote:
>
> Well, yeah. It is the 'always compresses better' that
> he keeps harping on about that just simply isn't true.
> And in his discussions, he keeps on demonstrating
> this misunderstanding of his.
> http://groups.google.com/group/comp.compression/msg/0f830d20dcd0ee50

There is no misunderstanding here. There are a few frivolous ways in
which "always compresses better" cannot obviously be true and which are
not worth cluttering a discussion with by guarding against when talking
with presumably informed participants. Hence, a statement "always
compresses better" made in this newsgroup, where one can assume
informed readers, should naturally be understood to include implicit
allowances such as "excluding these frivolous cases (a), (b), (c)... to
which the statement does not apply".

It appears that, having no argument against the substance of the claim,
you have fallen back on clinging to some of these frivolous ways of
circumventing "always". Let me review a few of these.

Say we have agreed on a data set of 256 input samples that we wish to
use to test the QI claims against. You can always "prove" the QI claims
false using any of the following "compression techniques" or
"enhancements" of AC:

a) You can write a QI work-alike (or if need be, just a clone, since
the QI source is publicly available), which will create exactly the
same output size as QI, at least on the given test. Hence it trivially
cannot be true that QI "always compresses better" than any other
entropy coder.

b) You may insist that the sample inputs must be fed to the coders as
"files", so that the OS will keep track of their lengths. Then you can
"enhance" any existing AC (or generally any other compressor which
assumes that its output must be a self-terminating code) by providing
the AC decoder with the compressed length you obtained from the OS,
which allows you to save approximately log(H) bits on the AC decode
termination cost. If by chance the competitor doesn't follow suit and
"enhance" his coder similarly, or if he is entirely unaware that
you are using this age-old kind of "compressor helper", you've just
pocketed a log(H)-bit edge.

c) You can compile the test set into your coder and decoder executables
and then "compress" any of the 256 test inputs by simply transmitting
to the decoder the 8 bit number telling it which of the samples to
retrieve from the built in list.

d) You can refine technique (c) so that it is not as blatantly
obvious and doesn't take as much room. Instead of including all
of the 256 test files in your coder, you just select one and make an
"enhancement" on top of a regular AC, so that for that selected input
the "enhancement" outputs a single bit 0, which the decoder uses to
retrieve the built-in pre-selected sample, and for the remaining 255
samples it outputs 1 followed by the original output of the AC. You can
further refine this, so it is harder to detect, by not storing the
entire input sample but just some section of it, and also by not going
all the way down to a single bit, but maybe a few bits or some such.
The AC with this "enhancement" will come out worse off by 1 bit on
average, but at least it will "prove" the QI claim wrong. What's 1
little bit compared to the great "victory"?

e) But even the "technology" (d), however subtle with its refinements
compared to (c), still leaves some vulnerabilities. It still kind of
looks like cheating, and one can get caught since there as an 'intenet'
hiding in there. Plus it breaks down if the data set changes and you
don't get chance to recompile for the new set. The breaktrough of the
present method is to do a randomized version of (d), where you
"enhance" the AC by letting a random number generator help select which
of the input message patterns, which may be just small bit-string
sections bound to occur by chance in any nontrivial test set, will get
shorter codewords and by how much. As in (d), at some cost to the
average performance, this kind of "compression technology" can "prove"
the QI claim wrong, although it may take much longer time than (d) to
find an actual example "disproving" the claim. But, in return for the
longer wait (possibly astronomically long) compared to (d), this
"compression technology" doesn't require any test set to be given
upfront and it is "good" not just against QI but against any other
genuine advance in the future, since it always can "beat them" at least
on some inputs. And you virtually cannot get caught, since the
deterministic intent of (d) has become a random fluctuation beyond
human will or control.

From your latest line of arguments on 'random quantization' it would
seem you have taken your final fallback position -- the "compression
technology" of class (e), which is the hardened version of technique
(d), which in turn was a subtler variant of method (c). The random
generator is simply the set of discarded fractional parts of the AC
'range', which in turn can always be used to select a small, random
fluctuation in the codeword lengths (at some cost to the average output
size), and hence to implement "compression technology" (e). Well,
you're welcome to continue clinging to that.

Willem

unread,
Jan 11, 2006, 3:31:09 AM1/11/06
to
nightlight wrote:
)>-- Willem wrote:
)>
)> Well, yeah. It is the 'always compresses better' that
)> he keeps harping on about that just simply isn't true.
)> And in his discussions, he keeps on demonstrating
)> this misunderstanding of his.
)> http://groups.google.com/group/comp.compression/msg/0f830d20dcd0ee50
)
) There is no misunderstanding here. There are few frivolous ways in
) which "always compresses better" cannot obviously be true and which are
) not worth cluttering a discussion with by guarding against when talking
) with presumably informed participants. Hence, a statement "always
) compresses better" made in this newsgroup, where one can assume
) informed readers, should naturally be understood to include implicitly
) allowances such as "excluding these frivolus cases (a),(b),(c)... to
) which the statement does not apply".

The statement 'always compresses better' is simply false as such, and
should be replaced by 'compresses better on average'. I have made clear
from the start that this was my argument. Your statement was and is based
on a misunderstanding of how AC works. It's as simple as that.

If you had read this newsgroup for longer instead of immediately starting
to post, as is described in the netiquette guidelines, you would have
realised that 'no compressor always outperforms another' is one of the
basics here, and as this is what your statement blatantly goes against,
this caused everyone to fall over you.

The wise move would be to learn from this, instead of clinging to your
claim and going up against everyone.


SaSW, Willem
--

Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be

nightlight

unread,
Jan 11, 2006, 4:13:26 AM1/11/06
to
> The statement 'always compresses better' is simply false as such,
> and should be replaced by 'compresses better on average'.

The statement "better on average" does not capture the essential
relation between the two coders. They are almost twins regarding the
coding efficiency and the method of enumerating messages, at least
when everything else is set the same (such as coding in decerementing
AC mode), except that QI quantization is optimal, while AC
quantization is sub-optimal. Hence, aside for the few frivolous
loopholes, the factual implication is that QI will always compress
better than AC simply because its addends, which are the same between
the two except for scaling factor, expand slower. The AC's truncated
infinite fractions (along with the top-down enforcement of the Kraft
inequality constraints), which don't exist in the integer EC/QI
formulation { whether you leave them as code space gaps arising from
the excess contraction of Pc or shift the burden randomly to the cost
of skewed coding probabilites, using the discarded fractions as the
random number generator to fluctuate probabilities, losing on average
but getting around "always"} make the relation in compression
efficiencies entirely one sided (not to mention speed, where the main
practical difference is). When AC is set to code optimally on average,
QI will always produce smaller output. That's why "always" [except the
trivial] is a proper characterization. That you can trade AC
optimality to get around "always" is a frivolous observation as
illustrated in the listing of few such "techniques".

Thomas Richter

unread,
Jan 11, 2006, 4:16:09 AM1/11/06
to
Hi Again,

> > No. You are confused. If you always round down *interval
> > boundaries*, you do not round down *interval sizes*
> > because the topmost interval then gets larger.

> Sorry, I gave you the wrong page & the wrong line number (that was for
> similar code in the decoder).

> Take a look in (41a) p. 47, the formula in the 1st line with astersk.

Look, I prefer to look at working code, and I know what it does.

> Note that at the top he initializes value P(0) to 2^F (where F is the
> mantissa precision in bits), i.e. the quantity P is the mantissa of the
> actual Pc, meaning the initial width is 1.00.... The fraction dropped
> in (1) is gone from P(n) which is to be used on next step. There are no
> other updates of P(n). It simply lost the fraction at its edges
> irreversibly, no condition, no compensation for it anywhere else.

You are confused. The value P(n) is *nowhere* stored in a sane AC
encoder; it is implicit in the size of the interval used to encode
symbol n.

> The compensation occurs in the next line, where he calculates the new
> lower boundary point of the interval, Q(n) as (again streamlined, see
> paper):

> Q(n) = Q(n-1) + floor [p(x<xn) * P(n-1)] .... (2)

> where now p(x<xn) is conditional cumulative probability for symbols
> ahead of xn. For example (this is binary coder) if xn=0, then (2)
> doesn't add anything since no other symbol x is smaller than xn. If
> xn=1 then (2) adds to the lower boundary Q(n-1), the product:
> p(0)*P(n-1), where p(0) is probability of 0.

> The same conclusion follows from the basic accounting of the precisions
> and buffers used by AC: there are two real numbers < 1.0 in the coding
> loop whose precision grows indefinitely for the unlimited precsion AC:

> a) the Pc, which is the product of all probabilities of the symbols
> encounterd in the input,

No. Pc is the result of applying some kind of modelling to the input.
This *need not* relate to the relative frequencies of the symbols
found in an actual sequence. A static model would have Pc *fixed* once
and for all.

> b) the cummulative probability Qc (which is the exact full precision
> rational number < 1.0; note that the Q(n) in (2) is just an integer
> representing the trailing bits of Qc as seen from the current AC
> window).

What is typically kept as a model is Qc, and not Pc. If you cannot
keep Qc to full precision, you clearly lose coding efficiency, because
the "model" implied by the Qc no longer fits the model you
intended (i.e. the coder no longer encodes the proposed model optimally,
but rather another model). But nevertheless, there are no gaps. If
you follow formula (2) closely, you'd see that for the "topmost" symbol
the update rule says that the upper boundary of the coding interval
stays constant, whereas the lower boundary is updated and "rounded
down", making the interval larger than it should be, and thus making Pc
larger than it should be. This means a discrepancy between model and
implementation, but no gaps.

> The Qc is the content of the AC's encoded output. The precision of that
> number does grow indefinitely, since it is the output itself. The other
> number, also large with unlimited precsion AC, the message probability
> Pc does not grow indefinitely.

Neither Qc nor Pc has infinite precision in a realistic implementation.

> Its precision is reduced in each step in
> (1). Its fraction beyond the F bits precision is discarded
> unconditionally and irreversibly after each new symbol -- in (1) we
> multiply previous Pc with the probability of the new symbol,

No, *that* is a model update. You *can* do that, but there's no need
to drive the model like this.

> and
> truncate the result to F significant bits. If you are saying that at
> the end the AC has somehow also computed the product Pc in the full
> precision, then where is it stored?

I never stated that Pc is kept in infinite precision. I stated that
there are no gaps. In fact, Pc is *nowhere* stored. Instead, high and
low interval counts are stored.

> That number is _independent_ of the
> value Qc, hence you would need a second number of indefinite length,
> besides Qc, which would be some kind of second work buffer of the size
> of output. There is no such buffer in AC.

So then, this doesn't prove that there are gaps. It only proves that
AC cannot implement all possible models. That is true in the first
place since the Qcs (and thus the Pcs) are quantized by the precision
limitation.

So long,
Thomas

Thomas Richter

unread,
Jan 11, 2006, 4:26:34 AM1/11/06
to
Hi again,

> I gave you above the places to check, which are not some "poorly
> written AC source" but the well known reference implementations of the
> AC coders. So, look Moffat98 or WNC87 source or the reference [41a]
> above which shows it quite clearly how the Pc is updated.

Apparently, you don't read the sources correctly.

> You are welcome to show a coder which doesn't round down the size of
> the updated total range on every symbol. (It still has to be able to
> decode, though.)

Oh my, that should be your job. Ok, so here we go. The following is
code from a real, existing arithmetic coder that students of mine wrote.
It works, is decodable, has no gaps, and it "doesn't round Pc down".

I added comments:

void ArithCod::Encode(UWORD low_count,
                      UWORD high_count,
                      UWORD total)
//
// This encodes a symbol i where high_count is the (scaled) probability
// of finding a symbol with index smaller or equal than the current one,
// and low_count is the (scaled) probability of finding a symbol whose index
// is exactly smaller than the current. total is the scaling factor.
// Specifically, low_count/total = Q[n-1], high_count/total = Q[n] in
// your notation. low_count for the lowest symbol is therefore zero,
// high_count for the topmost (last) symbol in the alphabet equals total.
// m_Low and m_High are the (scaled) borders of the coding interval.
{
  // compute scaling factor
  ULONG step = (m_High - m_Low + 1) / total;

  // scale upper and lower interval borders
  m_High = m_Low + step * high_count - 1;
  m_Low  = m_Low + step * low_count;

  // This is the update step. Now what happens for the first symbol
  // of the alphabet: low remains constant, high is scaled and
  // due to the finite precision of "step" rounded down.
  // For the last symbol, step * high_count = m_High - m_Low + 1
  // by a simple identity, thus m_High stays constant and m_Low
  // is rounded down. -> The implied probability grows.
  // For all other symbols between, m_Low of the symbol n scales
  // as m_High of the symbol n-1. (Compute!)
  // Thus, no coding gaps and the claim that Pc is always rounded
  // down is refuted.

  //
  // ensure that m_High and m_Low are not in the same half
  // nb: here we generate the output bits!
  while ((m_High & m_Half) == (m_Low & m_Half)) {
    m_Stream.Put(m_High & m_Half);   // argument cast to bool
    while (m_UnderflowBits > 0) {
      // output saved underflow bits
      m_UnderflowBits--;
      m_Stream.Put(~m_High & m_Half);
    }

    // before scaling  | after scaling | output bit
    // ================+===============+============
    // m_Low:  00xy... | 0xy...        | 0
    // m_High: 00ab... | 0ab..1        |
    // or
    // m_Low:  01xy... | 0xy...        | 1
    // m_High: 01ab... | 0ab..1        |

    // m_Half is the representation of 0.5 in the precision
    // of the coder, namely 0x80000000 in the current implementation
    m_Low  &= ~m_Half;   // strip off 2nd MSB (we use only 31 bits!)
    m_Low <<= 1;

    m_High  &= ~m_Half;  // strip off 2nd MSB (we use only 31 bits!)
    m_High <<= 1;
    m_High  |= 1;

    // Here low and high are updated and scaled.
  }

  // prevent underflow if m_Low and m_High are near to m_Half
  // This is the resolution of the carry-over problem.
  while ((m_Low & m_Quarter) && !(m_High & m_Quarter)) {
    m_UnderflowBits++;

    // before scaling   | after scaling
    // =================+==============
    // m_Low:  001xy... | 00xy...
    // m_High: 010ab... | 01ab..1

    m_Low  &= ~m_Quarter;
    m_Low <<= 1;

    m_High  &= ~m_Half;  // strip off 2nd MSB (we use only 31 bits!)
    m_High <<= 1;
    m_High  |= 1 | m_Half;
  }
}

So, that's it. Now, where is the "always rounds down" part you claim
to have, and where are the coding gaps?

So long,
Thomas
