Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.

Dismiss

QI and MQ Coder: First real-life experiences

17 views

Skip to first unread message

Thomas Richter

unread,

Jan 28, 2006, 5:30:20 PM1/28/06

Hi group,

concerning the ongoing discussion of the QI coder vs. the arithmetic
coding, I played now a bit with a real life example, namely audio
compression.

The test application is a mildly drilled-up QI coder source from 1st
Works, as found on their web-page, ported for Unix (thanks Jasen Betts!)
and equipped with a very simplistic audio codec front-end, consisting of
a linear predictor and a bitplane context modelling. QI was used in the
largest possible block size (16K bits), MQ coding with its default
context update tables. The code for QI/MQ is found below for anyone who
feels like reproducing my results; *NOTE THAT I CANNOT GRANT YOU THE
RIGHTS TO USE THESE CODES FOR ANYTHING BUT DEMONSTRATION/RESEARCH*.

Both codes were compiled with gcc-3.4.4, -O2 optimization for QI,
default options for MQ (nothing but default optimization with -O). Codes
read from stdin, create data on stdout. I didn't try to speedup the IO
by blocking or other nice tricks on either code. Test data was the first
movement of Brahms 1st Symphony, conducted by Leonard Bernstein, Wiener
Phillharmoniker, Deutsche Gramophon recording. It is an approximately
15MB large audio track, encoded as a 16bpp wav file. That was the
topmost on the stack of CDs here, no special reason to pick it. The
system is a 1.4Ghz Athlon XP machine running Debian 3.1 "Sarge" with
512MB main memory. Thus, coding results are possibly not representative,
I just had to make a test.

Some results:

1) It is pretty hard to get the data into the right shape for QI. This
also implies that it is nearly impossible - or at least, for me, right
now, pretty hard - to get it into a really decent/usable context
modelling. Thus, something very simple had to be set up. Creating a
simple context model for MQ was a "piece of cake" on the other hand. See
also below.

2) Running time:

thor@skywise:~/src/qic> time QI cc </tmp/track1.wav >/tmp/track1.qi

real 4m12.484s
user 4m3.640s
sys 0m4.340s

thor@skywise:~/src/audiocompress> time arthdeco_wide </tmp/track1.wav
>/tmp/track1.qm

real 2m25.064s
user 2m4.880s
sys 0m1.860s

Thus, MQ shows a clear advantage here (about a factor of two). The
reason is possibly to a major degree due to the difficulty to get the
data into the right shape for QI, thus a good deal of time is spend in
just juggling bits. This is overall running time.

3) Memory footprint:

The MQ coder requires almost no memory except its tables, the code
doesn't allocate memory at all. QI eats with 16K blocks about half of my
system's main memory (something along 300MB estimated).

4) Coding efficiency:

-rw-r--r-- 1 thor thor 148708208 2006-01-28 21:35 track1.mq
-rw-r--r-- 1 thor thor 159764576 2006-01-28 22:49 track1.qi

Topmost is the MQ coder output, bottommost the QI coder output. MQ shows
a clear advantage here, though the comparison is not quite fair. I added
a simpler, slightly more sophisticated context model to the MQ coder. I
haven't been able to add a similar context model to the QI coder. Thus,
I removed the more advanced context model from the MQ coder and played
the game again, the result is as such for the MQ coder:

rw-r--r-- 1 thor thor 154397815 2006-01-28 23:11 track1.qm

This above is the MQ coder with the *identical* context model to the QI,
thus even here it encodes better. Wierd?

Node that all these tests are pretty much preliminary, thus anyone is
invited to try his/hers luck to improve the code and play with it.

5) Efficiency analysis: I think the current optimality analysis might
have missed a term: The QI coder requires the side-information of how
many LPS-bits have been encoded, and whether the LPS is 1 or 0. In the
above coding analysis, this is makes up about 200K, thus even removing
this side-information (thus making the stream non-decodable) does not
yet make QI better than MQ. Node also that I had byte-padding at
block-boundaries. This is included in the 200K estimated: I wrote an
additional 2 bytes per block, and came out with 200K more.

6) If anyone feels like reproducing, please add the following code to
QI, and drill up the main program apropriately to use 16K block sizes,
and bypass the random-bit-test here:

/* snip */

/*
** Push the indicated bit buffer to stdout.
** Arguments are the bit buffer, the total number of bits
** in this buffer and the number of one-bits, as required for
** QI. This is not yet fully optimal as the encoder gets
** flushed to byte-boundaries for simplicity reasons.
*/
void push_buffer(unsigned char *bits,int totalbits,int onebits)
{
int exchange = 0;
int code = 0;
unsigned char outbuffer[2048+4];
/*
** Encode the number of bits. Two bytes suffer. Also encode
** whether we have the exchange case with more one-bits than
** zero bits
*/
if (onebits > (totalbits >> 1)) {
int j;
exchange = 1; /* conditional exchange */
for(j = 0;j < ((totalbits + 7) >> 3);j++) {
bits[j] ^= 0xff;
}
onebits = totalbits - onebits;
}
if (totalbits < (2048 << 3)) {
putchar((totalbits >> 8) | (exchange << 7));
putchar(totalbits & 0xff);
putchar((exchange << 7) | (onebits >> 8));
putchar(onebits & 0xff);
} else {
/* full buffer case. Avoid the overhead */
putchar((exchange << 7) | (1<<6) | (onebits >> 8));
putchar(onebits & 0xff);
}
memset(outbuffer,0,sizeof(outbuffer));
/* Get the number of output bits.
** Need not to be encoded, the decoder can reconstruct
** this from the number of one-bits. This is the number
** of bytes to write. Could be made better by not enforcing
** byte-padding here.
** Unfortunately, QI doesn't seem to be able to handle an
** all-zero buffer. Since this is encoded in the above with
** the number of one-bits, bypass in that case.
*/
code = (bt.len[onebits] + 7) >> 3;
/*
** Run now the encoder
*/
if (onebits) {
qiEnc(bits,0,onebits,outbuffer,0);
fwrite(outbuffer,1,code,stdout);
}
}

/*
** Simple and stupid encoding for audio compression
*/
void compress(void)
{
unsigned char bitbuffer[16][2048];
unsigned int bitpos[16];
unsigned int bitcnt[16];
unsigned int last = 0,lastlast = 0;
int pred,j;

memset(bitpos,0,sizeof(bitpos));
memset(bitcnt,0,sizeof(bitcnt));

do {
int c1,c2;
// Try to fill the input buffer by continuously reading from the
input stream.
c1 = getchar();
c2 = getchar();
if (c2 >= 0) {
unsigned int c = ((c1 << 8) | c2);
unsigned int code;
/*
** Make unsigned, level shift
*/
c = (c + (1<<15)) & 0xffff;
/*
** Simple linear prediction,
** Form code, grey - coding for
** bitplane decorrelation.
*/
pred = (last << 1) - lastlast;
code = (c - pred) & 0xffff;
code ^= (code >> 1);
/*
** Sort into the contexts.
*/
for(j = 0;j < 16;j++) {
int pos = bitpos[j];
if (code & (1 << j)) {
/* The bit is one */
bitbuffer[j][pos >> 3] |= 1UL << (pos & 0x07);
bitcnt[j]++;
} else {
bitbuffer[j][pos >> 3] &= ~(1UL << (pos & 07));
}
bitpos[j]++;
//
// Check whether the bit position is out of range. If
// so, empty the buffer by writing it out via QI.
if (bitpos[j] >= (2048<<3)) {
push_buffer(bitbuffer[j],bitpos[j],bitcnt[j]);
bitcnt[j] = 0;
bitpos[j] = 0;
}
}
/*
** Update the predictor.
*/
lastlast = last;
last = c;
} else break;
} while(1);
/*
** flush the remaining parts of the buffer
*/
for(j=0;j < 16;j++) {
push_buffer(bitbuffer[j],bitpos[j],bitcnt[j]);
bitcnt[j] = 0;
}
}

/* snip */

Thus, if anyone finds a bug in the above that might have had negative
impact on the coding effiency, please let me know. Also, if anyone feels
like integrating this better into the program, let me know. I rather
hacked it in.

Finally, here's the similar MQ coder code. Note that the MQ coder is
covered by patents (IBM and Mitsubishi and maybe others), thus you
cannot use it "as is" in products, and I do not give any guarantee
concerning the usability, reliablity and similar of the code. Please use
this only for research purposes. The context modelling is commented out
here, and the MQ coder is *not* implemented optimally (one can do better
than this!)

/* snip */

/*
** Simple predictive arithmetic audio coding, (c) 2004 Thomas Richter,
** THOR Software.
** $Id: arthdeco_wide.c,v 1.5 2004/10/10 20:06:33 thor Exp $
**
*/

#include <stdio.h>
#include <string.h>

/*
** context of the arithmetic coder: index into the probability table,
** most probable symbol
*/
struct MQContext {
unsigned char index;
unsigned char mp;
};

/*
** Arithmetic coder structure.
** The MQ coder is patented software, IBM/Mitsubishi.
*/
struct MQCoder {
unsigned long a; /* the inverval */
unsigned long c; /* computation register */
int ct; /* bit counter */
int flag;
unsigned char b; /* b output register */
};

/*
** MQ coder lookup tables
*/
const unsigned short Qe_Value[] = {
0x5601,0x3401,0x1801,0x0ac1,0x0521,0x0221,0x5601,0x5401,0x4801,0x3801,
0x3001,0x2401,0x1c01,0x1601,0x5601,0x54ff,0x5401,0x527d,0x5101,0x4c5f,
0x4801,0x3f80,0x3801,0x35f7,0x3401,0x31f6,0x3001,0x2801,0x2401,0x2201,
0x1c01,0x1801,0x1601,0x1401,0x1201,0x1101,0x0ac1,0x09c1,0x08a1,0x0521,
0x0441,0x02a1,0x0221,0x0141,0x0111,0x0085,0x0049,0x0025,0x0015,0x0009,
0x0005,0x0001,0x5601
};

/*
** MSB/LSB switch flags
*/
const unsigned char Qe_Switch[] = {
1,0,0,0,0,0,1,0,0,0,
0,0,0,0,1,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,
0,0,0
};

/*
** Next state on MPS coding
*/
const unsigned char Qe_NMPS[] = {
1,2,3,4,5,44,7,8,9,10,
11,12,13,35,15,16,17,18,19,20,
21,22,23,24,25,26,27,28,29,30,
31,32,33,34,35,36,37,38,39,40,
41,42,43,44,45,45,47,48,49,50,
51,51,52
};

/*
** Next state on LPS coding
*/
const unsigned char Qe_NLPS[] = {
1,6,9,12,35,39,6,14,14,14,
20,22,25,27,14,14,14,15,16,17,
18,19,20,21,22,23,24,25,26,27,
28,29,30,31,32,33,34,35,36,37,
38,39,40,41,42,43,44,45,46,47,
48,49,52
};

static unsigned int getwchar(void)
{
int c1,c2;

c1 = getchar();
c2 = getchar();

/* This is big-endian */

return ((c1 << 8) | c2);
}

static void putwchar(unsigned int c)
{
putchar(c >> 8);
putchar(c & 0xff);
}

/*
** Open the MQ coder for writing
*/
static void OpenForWrite(struct MQCoder *self)
{
self->a = 0x8000;
self->c = 0;
self->ct = 12;
self->b = 0;
self->flag = 0;
}

static void OpenForRead(struct MQCoder *self)
{
unsigned char t;

self->b = getchar();
self->c = self->b << 16;
t = getchar();
self->ct = 8;
if (self->b == 0xff) {
if (t < 0x90) {
self->c += t << 8;
self->ct--;
}
}
self->c += t << 8;
self->b = t;
self->c <<= 7;
self->ct -= 7;
self->a = 0x8000;
}

static int GetBit(struct MQCoder *self,struct MQContext *ctx)
{
unsigned long q = Qe_Value[ctx->index];
unsigned char t;
int d;

self->a -= q;
if ((self->c >> 16) >= q) {
/* MPS case */
self->c -= q << 16;
if (self->a & 0x8000) {
/* short MPS case, return immediately. */
return ctx->mp;
}
/* MPS exchange here */
d = (self->a < q); /* d == 1 on LPS */
} else {
/* LPS exchange here */
d = (self->a >= q); /* d == 1 on LPS */
self->a = q;
}
if (d) {
/* LPS decoding. Check for MPS/LPS exchange, adjust index */
d ^= ctx->mp;
if (Qe_Switch[ctx->index])
ctx->mp = d;
ctx->index = Qe_NLPS[ctx->index];
} else {
/* MPS decoding */
d = ctx->mp;
ctx->index = Qe_NMPS[ctx->index];
}
/* Now renormalize */
do {
if (self->ct == 0) {
t = getchar();
self->ct = 8;
if (self->b == 0xff) {
if (t < 0x90) {
self->c += t << 8;
self->ct--;
}
}
self->c += t << 8;
self->b = t;
}
self->a <<= 1;
self->c <<= 1;
self->ct--;
} while((self->a & 0x8000) == 0);

return d;
}

/*
** Put out a bit thru the MQ coder.
*/
static void PutBit(struct MQCoder *self,struct MQContext *ctx,char bit)
{
unsigned long q = Qe_Value[ctx->index];

self->a -= q;
/*
** Check for MPS/LPS coding
*/
if (bit == ctx->mp) {
/*
** MPS coding
*/
if (self->a & 0x8000) {
/* Short MPS case, no renormalization */
self->c += q;
return;
} else {
/* context change */
if (self->a < q) {
/* MPS/LPS exchange case */
self->a = q;
} else {
self->c += q;
}
ctx->index = Qe_NMPS[ctx->index];
}
} else {
/* LPS coding here */
if (self->a < q) {
self->c += q;
} else {
self->a = q;
}
ctx->mp ^= Qe_Switch[ctx->index];
ctx->index = Qe_NLPS[ctx->index];
}
/*
** Renormalize now.
*/
do {
self->a <<= 1;
self->c <<= 1;
if (--self->ct == 0) {
if (self->b < 0xff) {
if (self->c & 0x8000000) {
/* Overflow into the b register, remove
** carry bit and go on */
self->b++;
self->c &= 0x7ffffff;
}
}
if (self->b == 0xff) {
/* We either have an 0xff here, or generated one due to carry.
** in either case, must have buffered something or the overflow
** could not have happened.
*/
putchar(0xff);
self->b = self->c >> 20;
self->c &= 0xfffff;
self->ct = 7;
} else {
if (self->flag)
putchar(self->b);
self->b = self->c >> 19;
self->c &= 0x7ffff;
self->ct = 8;
}
self->flag = 1;
}
} while((self->a & 0x8000) == 0);
}

/*
** Flush out the remaining bits
*/
static void Flush(struct MQCoder *self)
{
int k;
/*
** Number of bits to push out is then in k.
*/
self->c <<= self->ct;
for(k = 12 - self->ct;k > 0;k -= self->ct,self->c <<= self->ct) {
if (self->b < 0xff) {
if (self->c & 0x8000000) {
self->b++;
self->c &= 0x7ffffff;
}
}
if (self->b == 0xff) {
putchar(0xff);
self->b = self->c >> 20;
self->c &= 0xfffff;
self->ct = 7;
} else {
if (self->flag)
putchar(self->b);
self->b = self->c >> 19;
self->c &= 0x7ffff;
self->ct = 8;
}
self->flag = 1;
}
if (self->b < 0xff) {
if (self->c & 0x8000000) {
self->b++;
}
}
if (self->b != 0xff && self->flag)
putchar(self->b);
}

int compress(void)
{
struct MQContext ctxt[4096*16];
struct MQContext *cx;
struct MQCoder coder;
unsigned int c,code,i;
unsigned int last = 0,lastlast = 0;
int pred;
unsigned int lb = 0;
unsigned int ci;

/* reset all contexts */
memset(&ctxt,0,sizeof(ctxt));
OpenForWrite(&coder);

do {
/* Get input data, level shift to make unsigned
** to avoid overflows which would destroy the
** statistics
*/
c = (getwchar() + (1<<15)) & 0xffff;
if (feof(stdin))
break;
/*
** Form the context. This is here simply
** the last code and thus depends on the
** last three encoded words.
*/
ci = lb >> 4;
/*
** Make a prediction. This is just a linear
** predictor, i.e. a linear slope would be
** predicted to zero.
*/
pred = (last << 1) - lastlast;
/*
** Get the prediction error, possibly including
** wrap-arounds.
*/
code = (c - pred) & 0xffff;
/*
** Build up the context. This context is formed
** by the bit position (16 contexts) and the
** selected context above.
*/
cx = ctxt /*+ (ci << 4)*/;
/*
** keep the last code for the context
*/
lb = code;
/*
** bitplane decorrelation by grey-coding
*/
code ^= (code >> 1);
/*
** Now encode all bitplanes
*/
for(i = 0;i < 16;i++) {
PutBit(&coder,cx,(code & (1 << i))?1:0);
cx++;
}
/*
** keep history for prediction
*/
lastlast = last;
last = c;
} while(1);

Flush(&coder);

return 0;
}

/* snip */

Greetings,
Thomas

David A. Scott

unread,

Jan 28, 2006, 6:14:59 PM1/28/06

Thomas Richter <th...@math.tu-berlin.de> wrote in
news:drgraj$s7p$1...@mamenchi.zrz.TU-Berlin.DE:

>
> Thus, MQ shows a clear advantage here (about a factor of two). The
> reason is possibly to a major degree due to the difficulty to get the
> data into the right shape for QI, thus a good deal of time is spend in
> just juggling bits. This is overall running time.
>
> 3) Memory footprint:
>
> The MQ coder requires almost no memory except its tables, the code
> doesn't allocate memory at all. QI eats with 16K blocks about half of my
> system's main memory (something along 300MB estimated).
>
> 4) Coding efficiency:
>
> -rw-r--r-- 1 thor thor 148708208 2006-01-28 21:35 track1.mq
> -rw-r--r-- 1 thor thor 159764576 2006-01-28 22:49 track1.qi
>
> Topmost is the MQ coder output, bottommost the QI coder output. MQ shows
> a clear advantage here, though the comparison is not quite fair. I added
> a simpler, slightly more sophisticated context model to the MQ coder. I
> haven't been able to add a similar context model to the QI coder. Thus,
> I removed the more advanced context model from the MQ coder and played
> the game again, the result is as such for the MQ coder:
>
> rw-r--r-- 1 thor thor 154397815 2006-01-28 23:11 track1.qm
>
> This above is the MQ coder with the *identical* context model to the QI,
> thus even here it encodes better. Wierd?
>

No this is not wierd. Thank about it when nightlight claimed how bad
arithmetic compressors are yet he failed to understand just what a simple
static 0-order arithmetic file compress does. When my simple example
beats what can be done for the simple case of 3 symbols it became more
clear to me that he doesn't really understand even simple arithmetic
compression.

Also note he was able to modify Moffat to meet his needs but has yet
to replace even a simple arithmetic coder. So what on earth makes you
think this is wierd?

SEE
http://bijective.dogma.net/nitelite.zip

Just wait I expect him to write at least 100 lines why what you did was
impossible and that you did not give it a fair test. So lets wait for
his verbal jujutsu attack. It will be fun to watch.

David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

nightlight

unread,

Jan 28, 2006, 10:47:42 PM1/28/06

> The test application is a mildly drilled-up QI coder
> source from 1st Works, as found on their web-page,
> ported for Unix (thanks Jasen Betts!)

Glad too hear it can run now on your system. Thanks Jasen,
from this side, too.

Looking at the test code and the description of the test setup,
though, was like watching Lockheed SR-71 Blackbird (a Mach 3+
aircraft), transported via time machine into ancient Greece to
be tested against their chariots, and then watching them harness
four horses to pull and try dragging the poor Blackbird through
some narrow road weaving through a dense forrest. Hardly a test
of SR-71 performance.

{That image, which kept popping into my head as I read through,
was not meant to criticize your test code on its own merit,
since that is how one would test one AC against another AC, one
chariot against another chariot. And I do appreciate very much
that you took time and effort to give it this intial try. In a
year or two, at most five, you may be teaching QI to students in
the information theory or coding classes, as the latest marvel
and the most advanced among the entropy coding algorithms.}

Developing a type of code, to work properly along the lines of
context splitting of method (b) in [M1]:

M1. AC Context coding with QI, method (b):
http://groups.google.com/group/comp.compression/msg/1314ff87da597fad

M2. Bit fractions packaging and mixed radix codes
http://groups.google.com/group/comp.compression/msg/1e015f38d228969e

and then to properly encode the higher model information itself
(note the contexts are not mutually independent for a Markov
source), then to package all such information, including the
exact bit fractions (which QI always produces, [M2]) between the
components, would require a great deal of work. The compression
gains, even if one were coding the AC's contexts optimally with
QI, would still be no better than what some mixed average of the
order-0 inputs shows [T3] vs Moffat98, up to 6%, and a little
bit more for MQ.

To get a more realistic idea how is your MQ doing against QI,
when both are functioning in their native modes, you could plug
into your test program the Moffat98 code:

http://www.cs.mu.oz.au/~alistair/arith_coder/

and see how it does against MQ, speed and compression for a
basic spectrum of order 0 inputs (sizes and densities). That
would give us all a better idea how QI would do when adapted
properly for that application.

Or do a pure order-0 tests, using QI's bit counting and flagging
1's or 0's type coding, as shown in Test.c, e.g. in function
qbc_iter(), where it codes one block as if it were in a
multiblock loop (that was mostly cloned from a multiblock loop).
The AC style bit by bit fiddling doesn't use the QI's skipping
of the MPS at all. Also the bit flipping should be done on 32
bits at a time and while encoding that group of 32 bits (e.g.
as shown in function qiEnc0() in EncDec.c, as called from
qbc_iter() in Test.c).

Or measure nanosec/sym on your MQ coder and compare to similar
test QI.exe on your system shows for the same order-0 input.

If you have an MQ coder C source as distributed by original
authors (which includes binary & at least high entropy limit
multi-alphabet coding, so both could be compared) and provide
link for download, I could set up a fair test for the two (I
didn't plan it, but now that you've gotten this far, I should
probably include an MQ or similar quasi-AC coder test in the
QIC, provided its source can be distributed as a separate
lib/dll linkable to QIC.exe and without any interaction between
the licenses). To test speeds, I would change MQ to code memory
to memory. I would then move all memory allocs outside of the
timed sections (so it works with preallocated buffers in the
timed sections). Then I would feed both coders pre-built large
arrays of order-0 stationary data (which would be randmized for
each iteration). For QI, when competing against MQ, which is
less accurate than Moffat98, I would use a very simple method
(a) (see below) for block boundary coding. For QI all other data
is reduced to this form, hence the result on the arbitrary order
inputs would be some composition of the elmental inputs results,
both for speed and size. This way one would be testing the raw
coding speeds and precision of the coders proper, with all the
noise from the factors outside of the intrinsic difference in
the coding algorithms removed (otherwise the programming task is
much more complex to adjust all other external factors to
equivalent due to fundamental differences in the approach to
coding & modeling).

Regarding the cache, 16K blocks will use 256MB table, which is
well outside of the cache limits on anything even when
everything else is done right. But, due to the serial and up &
down nature of your block to block processing (which are just
2KB per block), with file i/o, modeler and every other compnent
getting in, all cached entries from one block were lost.
Normally, in multi-block coding, with say table limit NT=1K bits
(which is just 128 bytes blocks) one would have many such blocks
belonging to the same enumerative class (fixed 'p' & 'q'). That
way, going from block to block traverses the same swath of the
1MB table of width sqrt(NT)=32 bits, which is about NT^(3/2)
bytes of the table or 32KB, well within cache and one time slice
for the task to remain in cache. E.g. a bit takes about 0.2-20ns
to process (on a 4 years old 2Ghz machine I am typing this on),
which is 1-20 microsec per 1K block (time per bit drops with
size, leveling after 60K to 1MB inputs depending on data
entropy), say 10 microseconds per block on avg, which is 1000
blocks, or 1Mbit, in 10 milliseconds of coding, which is well
below time slice (e.g. 20ms). After the first 2-3 dozen blocks,
the coding largely runs out of the cache.

To get a proper speed & compression test, one would hence use 1K
blocks table limit (which works fine with cache on most
machines), then read in as much data as possible e.g. 4+ MB, the
more the better, then run the modeler on the whole data
available and obtain all of the split contexts as a set of C
separate order-0 segments to encode (BWT can do that in quite a
generic way, even for unknown source). Each segment c=1..C is
bit array of Nc bits (Nc could be several megabits or as large
as practical for given amount of usable RAM and vary between
segments).

On each of these C segments, say a segment of size N bits, one
would run QI, with table set to NT bit blocks. Hence one would
get NB blocks with NB=N/NT for this segment.

Each block j would produce soume count of 1's Kj (which could be
> or < than NT/2, that serves as a flag for 1/0 type of coding),
plus index Ij. All NB blocks would have total count of K ones.

The counts Kj can be coded either as flat 10 bits (in tapered
Huffman code since they would 0..1024, hence 2 entries would
have 11 bits, usually the central ones), or if the source
is approximatelly stationary Bernoulli, via fixed Huffman
codes, from a table precomputed once (and loaded from disk
at progam load time) for binomial distribution.

The indices Ij have length Lj determined in whole bits by
the quantized binomial coefficient C(NB,Kj), as done by
the macro SWLENC(SWI_arg) in Qi.h. The Lj-g least
significant bits of each index are a tail packaged in whole
namber of bits, while the top g bits (g=32 in QIC), call
them Dj, would be coded using the mantissa Mj of C(NB,Kj)
in one of the two ways:

a) In tapered (flat) Huffman code, where numbers Dj
0...Mj-1 are coded in c or c+1 bits, where
c=hiBit(Mj)

b) Mixed radix code, with radices Mj, where digits
are Dj. QIC has this kind of mixed radix coder which
can handle any array {Mj,j=1..NB} 32 bit radices
(see mradix_iter() in Test.c).

Method (a) is faster (this encoding being done once every 1024
bits, the speed difference between (a) & (b) is usually
negligible), although it will lose on average up to 0.086 bits
per digit Dj. The method (b) will lose less than 2.5e-10 bits
per digit Dj, hence it will remain under 1 bit for N up to
2^(10+32) bits.

So, for each "large" order-0 segment, you will get the encoded
array CK of all Kj's, plus the NB of (Lj-g) bit tails of indices
Ij (this are blocks of whole bits), plus the mixed radix number
R (or Rc for a segment with index c), which packs top g bits for
all the Ij's in this segment.

The same is done for all C order-0 segments. The g top bits bits
of the mixed radix numbers Rc for c=1..C, are then themselves
packed into the common mixed radix number S, thus no bit
fractions are lost even if you have a very large number of
smaller order-0 segments.

Regarding counts Kj or their blocks KC(c) for segment c, their
statistics would be maintained from one set of C cyles to the
next. Normally for given segment S(c), one would get binomial
distribnution of the total K(c) per whole segment. Further, for
a Markov process of C contexts, there is normally a dependence
of one context to the next, which would need to be accounted for
in a manner similar to the QI multi- alphabet coder (where these
C contexts correspond to the A-1 bit plane fragments of the
multi-alphabet coder, cf. [T1], pp. 31-38). In particular, as
you go down the bit planes they become increasingly fragmented,
and one should not meld distinct fragments from the same bit
plane into a single coding block. With QI this kind of precise
bit plane fragmenting can be coded with no bit fraction losses
(beyond the usual 2*log(e)/2^g bits per symbol due to
quantization of binomials or radix powers)

Setting up the coding (and decoding) to AC modeler for larger
inputs, as sketched above, is quite a big job, probably weeks of
focused work, even when you know exactly what is needed for each
piece and have lot more of QI support code. The presently
released kit [QIC] lacks all the multi-block and general multi-alphabet
support (which the company had decided not to release yet; it
would also take a lot of my time to repackage, if not rewrite,
the vast quantities of research code into the more generic
elemental functions, a la those in EncDec.c and Radix.c). The
[QIC] does have quite usable binary block, sparse block, radix
and permutation codes. Except for radix & permutation codes,
which are quite prectical as implemented in [QIC], the binary &
sparse coder are only the building blocks for multi-alphabet and
multi- block coders.

As suggested earlier, the [QIC] kit is a reasearch kit, and it
was definitely not meant to be 'plug & go' into an AC modeler of
any order. At the level of functionality released at present,
the only fair test with QIC 'out of a box' is on the order-0
stationary data (but not hooked to AC order-0 adaptive modeler
or some such, or fed bit by bit, or with AC style division
of labor etc) within the max block size set at init time.
Hopefully, depending on my time available to port that stuff,
the QIC source release will include in the near future some AC
with at least the multi-block binary benchmarks, with the simple
descriptive order-0 modeler and QI's division of labor.

------------------------------------------------------------
I have one minor question on the size figures:

>... 15MB large audio track, encoded as a 16bpp wav file

but then compressed file sizes (in bytes I assume) are
about 10 times larger.

David A. Scott

unread,

Jan 29, 2006, 12:33:17 AM1/29/06

"nightlight" <night...@omegapoint.com> wrote in
news:1138506462.7...@g49g2000cwa.googlegroups.com:

>> The test application is a mildly drilled-up QI coder
>> source from 1st Works, as found on their web-page,
>> ported for Unix (thanks Jasen Betts!)
>
> Glad too hear it can run now on your system. Thanks Jasen,
> from this side, too.
>
> Looking at the test code and the description of the test setup,
> though, was like watching Lockheed SR-71 Blackbird (a Mach 3+
> aircraft), transported via time machine into ancient Greece to
> be tested against their chariots, and then watching them harness
> four horses to pull and try dragging the poor Blackbird through
> some narrow road weaving through a dense forrest. Hardly a test
> of SR-71 performance.
>

The SR-71 is old leaky workhorse. But then I am not suprised
you might think its a modern aircraft. I do have to admit its very
pretty. I use to work at edwards it was a fantastic old plane even
its touch but there are only a few left NASA has them they have
long left the air force. But I am not surprised you would think
otherwise. It was developed in the early 60's.

Lets see another test done I guess QI is not what you claim
and what do you do but post a 233 lines of what? You claimed it
was hot but hot at what. Is there code except the code you modifed
of Moffats to attempt to make it look good. It looks bad compared
to my old static zero order arithmetic compressor and I guess it
looks bad compared to MQ. Funny you want him to test Moffats code
why should he.

Which brings me back to a very basic question which it seems
you have yet to anwser in a correct fashion. Just what do you think
a static 0-order coder should do. It's obvious to me and I am sure
others that you have yet to understand the basic concepts. Maybe
someday some one can find a use for it some where. But it looks like
its not going to be a drop in replacement for a good arithmetic
entropy coder assuming Thomas has his act together.

> {That image, which kept popping into my head as I read through,
> was not meant to criticize your test code on its own merit,
> since that is how one would test one AC against another AC, one
> chariot against another chariot. And I do appreciate very much
> that you took time and effort to give it this intial try. In a
> year or two, at most five, you may be teaching QI to students in
> the information theory or coding classes, as the latest marvel
> and the most advanced among the entropy coding algorithms.}
>

Funny you kept saying QI was better than arithmetic no matter
how one did and that its easy to "drop in". You have yet to show
that. But I do agree it could be used as an example that one has
to be very careful when reading what others write. The good thing
is in compression no matter how one gloats about how good something
is one can write a real test with real code that uses files of ones
choice. So yes I could see where one could use this in a class room.

nightlight

unread,

Jan 29, 2006, 1:05:35 AM1/29/06

I did compile QM from your post (after renaming get/putwchar which
conflicted with MSVC 6 own version & fixing few mismatched size
warnings). To test something, I would need a basic QM coder, with
either static or adaptive coding & decoding mode (so I can verify that
I didn't break something when changing it to memory to memory coding,
and also to test decode times), and with no audio modeling and i/o
hardwired in. Do you have such QM source?

Looking at the QI adaptation code, I wonder how long it would run if
you don't call QI encode at all, but just your model adaptation code,
the bit stuff, and then instead of calling qiEnc just copy data as is.
That would give some measure of overhead of the rest. With, say using
1K blocks, one may gat 10ns/bit for qiEnc with all the block bit
counting & qiEnc set-up time. Hence one can do about 100Mbits per
second for encoder. I am still unclear about your actual input sizes
(15 MB or 15 Mega samples? yet the output is ~150 MB??). In any case,
we're looking at maybe 2-5 seconds, or something similar, of QI coding
time, on the top of everything else, which seems to be couple order of
magnitude larger (4+ minutes). Hence it looks the timing part is simply
a reflection of bitwise modeler adaptation from the original linear
predictor. QI wouldn't code this as bitwise but as 16-bit symbols,
where the linear prediction could work nicely (the time for
multi-alphabet of size scales as log(A) from the binary coder with the
same number of symbols, since the LPS & MPS distinction and advantages
carry over through all A-1 internal nodes of binary decomposition, cf.
[T1]).

> Unfortunately, QI doesn't seem to be able to handle an

> all-zero buffer.

The uniform buffer generates no index, only count of 1's is saved
(which is 0 or BlockSize). Hence qiEnc/qiDec functions should never be
called with such counts.The QI.exe doesn't allow (at the higher level
of checking command line parameters) setting count of 1's to 0% or
100%. The regular QI multi-block coder in such cases only stores the
count of 1's (which may be smaller or greater than BlockSize/2) and
moves on to the next block.

> 5) Efficiency analysis: I think the current optimality analysis might
> have missed a term: The QI coder requires the side-information of how
> many LPS-bits have been encoded, and whether the LPS is 1 or 0.

Of course, it wasn't missed. Consider block size B=1024 and count of
1's K=0..B and single block coding. If count is assumed uniformly
distributed (a composite source), one would code K in 10 bits (with 2
values of K using 11 bits if using tapered Huffman, or as log(1025)
within 2*log(e)/2^g bits if coded with QI radix codes, with A=1025).
Since C(B,K) is smaller than absolute block entropy H in bits (H(p) for
the actual p, which was unknown to coder coding composite source) by
about 1/2 log(B) bits. With the log(B+1) from K, that ends up coding
the block at approx. H + 1/2log(B), which is the optimum (the
theoretical lower bound) for composite source.

If you have a source with known p (which doesn't need to be
transmitted, which is what H(p) bound assumes to be the case), the
plain Huffman will encode K in 1/2 log(B) bits on avg, with error
below 0.086 bits. That, added to log(C(B,K)) then yields exactly the H
(within O(1) for entire B bits). For multiple blocks, the stats from
previous blocks give an estimate so that later blocks get coded at H
with O(log(N)) bits excess, which you can verify after playing a bit
with Stirling expansion of binomials (or see Krichevsky, Trofimov paper
[33] for the very detailed story). The exact EC, coding in multi-block
mode and packing all bit fractions and counts optimally (using exact
mixed radix codes for block boundaries and multi-alphabet codes for
binomial distributions of counts), codes at the theoretical lower
bound, exactly as if it were coding in the single large block mode with
a single total count of 1's, for the standard source classes
(quasi-stationary Markov of finite order and composite sources; see
also [T2] p. 27 on how that output size invariance comes out in EC for
general enumerative classes; using "entropy pump" described there you
can code to the lower bound even with block size of 2 bits, although
that would take log(N) iterations, while still remaining within O(N) in
computations complexity for total work). QI at fixed precision g codes
only 2*log(e)/2^g bits/symbol above the exact EC, which for any input
size within machine memory will produce less than 1 bit in total excess
(for the entire input). QI.exe with "ct" command shows the quantized
binomial table stats, so you can see various redundancies of interest
compared to exact EC (which is computed in QI.exe using 53 bit double
and FP log(x!) functions).

David A. Scott

unread,

Jan 29, 2006, 1:12:24 AM1/29/06

"nightlight" <night...@omegapoint.com> wrote in
news:1138506462.7...@g49g2000cwa.googlegroups.com:

....

>
> As suggested earlier, the [QIC] kit is a reasearch kit, and it
> was definitely not meant to be 'plug & go' into an AC modeler of
> any order. At the level of functionality released at present,
> the only fair test with QIC 'out of a box' is on the order-0
> stationary data (but not hooked to AC order-0 adaptive modeler
> or some such, or fed bit by bit, or with AC style division
> of labor etc) within the max block size set at init time.
> Hopefully, depending on my time available to port that stuff,
> the QIC source release will include in the near future some AC
> with at least the multi-block binary benchmarks, with the simple
> descriptive order-0 modeler and QI's division of labor.

...

I missed this last time I read it. You write so much and say
so little I wonder what it means. But at least you can write.

So we go full circle again "the only fair test" is on oder-0
stationary data.

It would be nice if you attempted to explain that since
clearly for the 3 symbol example I did beat your numbers for QI
so I guess QI can't even win in the only fair test you currently
think is possible. Like I said in other threads. Even if you
add the 3 bytes for the buffer since files confuse you it still
compresses smaller than what you claim is possible for QI could
you attempt to give an honest explanation why this is so. Or
are you going to post hundreds of more lines trying to obscure
the obvious.

nightlight

unread,

Jan 29, 2006, 1:51:47 AM1/29/06

> The SR-71 is old leaky workhorse. But then I am not suprised
> you might think its a modern aircraft.

I don't think its speed has been beaten yet. As to NASA, it's only a
shadow today from the Wernher von Braun's NASA. When you just think
what kind of "computers" they used to get to the Mars and put humans on
the Moon and get them back. Or the Pioneers...

> It looks bad compared to my old static zero order arithmetic compressor ?

You haven't given as yet your size figures for unbiased random sample
on A=3 N=10^6, (good enough that ZIP ought to come much worse on it).
Give it 10000 random inputs, filled with using rand()%3, and leave
out encoding of N (or add the same figure for all coders and to
N*log(A)) and see if you can get to within 2e-7 bits from the N*log(A)
on the total size (that's how much room QI leaves between itself and
the N*log(A)). You can set your AC to 128 bit, it won't do any good.
Even the unlimited precision AC can't get to N*log(A) to within 1-2
bits, which for N=10^6 is too far from QI's distance. That's all basic
math (number of sequences of that size), you can as well try to make
2+2 come out 3.

> But it looks like its not going to be a drop in replacement for a good
> arithmetic entropy coder assuming Thomas has his act together.

And I am not even working on 'drop in' replacement for adaptive AC
modeler (unless a client pays for the work, in which case we'll do it,
of course, as they request). Keep in mind that the QI coding time, when
set-up with proper block size such as 1K or even 1/2K on lower end
machines (instead of using 256MB tables), runs those kinds of sizes in
2-5 seconds, while the times he measured were over 4 minutes. It
probably would barely show (for the 1K or 1/2K block coder), if at all,
if he were to take the QI coder out of the loop altogether and just
call the plain 'store_bits()', with no encoding at all.

> Funny you kept saying QI was better than arithmetic no matter
> how one did and that its easy to "drop in".

You can't 'drop in' Huffman into the AC either. Nor AC into the QI's
parametrization and modeling. All would take some work, to avoid big
penalties in size and/or speed. With QI source released to public, the
binary coders as provided will do only order-0 stationary inputs and
only for 1 block. That's enough to measure compression ratios and speed
in nano-seconds/symbol for the binary inputs. But not for real life
applications or any kind of 'drop in' for AC. The mixed radix &
permutation coders, on the other hand, are quite usable as is, even for
real life applications, as 'drop in' or anything else. No AC can code
close to those, especially if you go to A>3, say some 32 bit value
(have you tried the test I posted on that case yet?).

> one can write a real test with real code that uses files

You haven't explained as yet how does use of files change compression
ratios shown with buffers you can fill with any data you wish
(including data from files, if that is what you want to put in)? How
does it change compression ratios? (As for speed and file i/o, that
merely adds much bigger noise than the signal itself.)

nightlight

unread,

Jan 29, 2006, 2:00:55 AM1/29/06

Typo:

> and see if you can get to within 2e-7 bits from the N*log(A)
> on the total size

The figure "2e-7 bits" above, should be 1.62e-4 bits on the total size
of 10^6 symbols i.e. per symbol excess from log(A) was 1.62e-10
bits/symbol.

David A. Scott

unread,

Jan 29, 2006, 2:35:44 AM1/29/06

"nightlight" <night...@omegapoint.com> wrote in
news:1138517507.3...@g47g2000cwa.googlegroups.com:

>
>> It looks bad compared to my old static zero order arithmetic
>> compressor ?
>
> You haven't given as yet your size figures for unbiased random sample
> on A=3 N=10^6, (good enough that ZIP ought to come much worse on it).
> Give it 10000 random inputs,

Actaully the compressed size stays the same. What is it that you
don't seem to understand. It does better than what you claimed yours
could do.

It seems you still lack the ability to realize what my code does.
It compresses all files of A=3 N=10^6 to the same length. I picked the
extreme cases to include in the zip. Yes zip compresses them much smaller
since zip is not using the fact of each of the 3 symbols being
equally likely. I find it hard to belive you can't grasp the simple
nature of the problem. I did not include random files since random
is poorly defined and one tends to pick special cases. But the simple
fact is for the length in question it will compress all to the same
size. Any one including you can pick the random files you want.
That's why people with nothing to hide tend to use files.

I fear you still don't understand basic static 0-order compression
why is that?

http://bijective.dogma.net/nitelite.zip

What I did to compare to you QI in a buffer was to place
a length field of 3 bytes in front of file that way its self
terminating and can be comparted directly to yours since it seems
you can't write files. Even though the 3 bytes not optimal by any
sense it was still shorter than what you claimed was the fixed
length you get.

I suspect when you realize your is longer even on the buffer
or memory array or what ever you wish to call it. You will suddenly
either claim you made a slight miscalulation or that you will claim
I don't have an understanding of the problem. I wish others would
check thes claims out.

I like the idea of your 3 symbols each being random for tests
since first of all for a given length all outputs roughly the same
and second becuase like you mentioned it's the only current fair way
to compare QI. But yet I suspect this only current fair way would take
an extreme amount of time to actually get QI to work on a file.

Thomas Richter

unread,

Jan 29, 2006, 1:21:29 PM1/29/06

nightlight wrote:

> Looking at the test code and the description of the test setup,
> though, was like watching Lockheed SR-71 Blackbird (a Mach 3+
> aircraft), transported via time machine into ancient Greece to
> be tested against their chariots, and then watching them harness
> four horses to pull and try dragging the poor Blackbird through
> some narrow road weaving through a dense forrest. Hardly a test
> of SR-71 performance.

This is a very simple test case, and I'm totally aware of that. However,
the general approach used here is typical for my applications for
entropy coder back-ends: Decorrelation, context modelling and entropy
coding. Typically, the decorrelation stage is powerful enough to remove
most correlation so a simple back-end is enough.

> Developing a type of code, to work properly along the lines of
> context splitting of method (b) in [M1]:

Why, instead of posting a reference, don't you just demonstrate that by
providing code? (-: You're invited to drill up my code and post it, or
use it within the legal limitations (note the MQ coder patent).

> To get a more realistic idea how is your MQ doing against QI,
> when both are functioning in their native modes, you could plug
> into your test program the Moffat98 code

I've a Moffat98 handy, though thanks for the offer. It should perform a
little bit better concerning the entropy coding (MQ has definitely
entropy loss), but possibly a bit worse when adapting to the statistics.
The MQ coder tables are setup for typical data in this area. Given that,
it is not exactly a zero-order coder anymore since it has an adaption
strategy.

> Or do a pure order-0 tests, using QI's bit counting and flagging
> 1's or 0's type coding, as shown in Test.c, e.g. in function
> qbc_iter(), where it codes one block as if it were in a
> multiblock loop (that was mostly cloned from a multiblock loop).

The point of the test was to test it on *realisitic* data. That means
that other features of a coder play a more important role, namely how
well it is able to adapt to statistics, i.e. how to deal with
non-stationary sources, and how easy it is to have it context-sensitive.

> Or measure nanosec/sym on your MQ coder and compare to similar
> test QI.exe on your system shows for the same order-0 input.

Again, I couldn't care less, sorry. (-; *If* I want to measure speed, I
need the speed of the full compression engine chain, and that implies
the overhead required to get the context modelling right, and to get the
data into the right shape. After all, that's what a customer would
finally see.

> If you have an MQ coder C source as distributed by original
> authors (which includes binary & at least high entropy limit
> multi-alphabet coding, so both could be compared) and provide
> link for download, I could set up a fair test for the two

You have one. It was attached to my post, the second source. There is no
"original" MQ coder source because it is published as a flow-chart; you
have one implementation here. It is not optimal (i.e. one can do
faster), but it serves the purpose. If you want, please use the source,
but note the patent situation.

> (I didn't plan it, but now that you've gotten this far, I should
> probably include an MQ or similar quasi-AC coder test in the
> QIC, provided its source can be distributed as a separate
> lib/dll linkable to QIC.exe and without any interaction between
> the licenses). To test speeds, I would change MQ to code memory
> to memory.

Please don't run it on random data. As stated, it is important to have
the coder running on *realistic* data with a realisistic data modelling
front-end. This simplistic audio codec is the easiest one can come up in
a couple of hours and it shows how the game is typically played. It is
definitely not the best audio coder scheme one could come up - obvious.

> I would then move all memory allocs outside of the
> timed sections (so it works with preallocated buffers in the
> timed sections).

Irrelevant. This is a constant term, and it took around 20 seconds for
the test to setup the tables. Don't invest time, I've enough *long*
audio samples where this doesn't play a role.

> Then I would feed both coders pre-built large
> arrays of order-0 stationary data

I *heavily* disagree. I need to see a real-life test.

>(which would be randmized for
> each iteration). For QI, when competing against MQ, which is
> less accurate than Moffat98, I would use a very simple method
> (a) (see below) for block boundary coding. For QI all other data
> is reduced to this form, hence the result on the arbitrary order
> inputs would be some composition of the elmental inputs results,
> both for speed and size. This way one would be testing the raw
> coding speeds and precision of the coders proper, with all the
> noise from the factors outside of the intrinsic difference in
> the coding algorithms removed

There are none. The front-end is the same for both codecs here.

> Regarding the cache, 16K blocks will use 256MB table, which is
> well outside of the cache limits on anything even when
> everything else is done right. But, due to the serial and up &
> down nature of your block to block processing (which are just
> 2KB per block), with file i/o, modeler and every other compnent
> getting in, all cached entries from one block were lost.

Ok, please adjust the block size so it works optimal. Please *do not*
change the overall architecture of the test - it is pretty much
realistic, and quite similar to what would happen in a real code: Yes,
if you *need* large tables, and you *need* to do file-IO, the data
*would* run out of cache. There's no way around that in a realistic setup.

> To get a proper speed & compression test, one would hence use 1K
> blocks table limit (which works fine with cache on most
> machines), then read in as much data as possible e.g. 4+ MB, the
> more the better,

This is unrealistic. You don't have this as an option in a full coder
engine. The entropy coder is often only a minor part in a large codec,
and you cannot just arrange data like you want.

> As suggested earlier, the [QIC] kit is a reasearch kit, and it
> was definitely not meant to be 'plug & go' into an AC modeler of
> any order. At the level of functionality released at present,
> the only fair test with QIC 'out of a box' is on the order-0
> stationary data (but not hooked to AC order-0 adaptive modeler
> or some such, or fed bit by bit, or with AC style division
> of labor etc) within the max block size set at init time.

Not quite; the test was partially done to get a feeling on how to
integrate QI into such more "sophisticated" (not that this example is
extremly sophisticated, but it's a start of how things look like in
practical applications) setups. That is, it would be interesting to see
how to do this more efficently with QI; you know, it is *also* important
to know how easy or hard it is to fit an entropy coder to the desired
target application.

> Hopefully, depending on my time available to port that stuff,
> the QIC source release will include in the near future some AC
> with at least the multi-block binary benchmarks, with the simple
> descriptive order-0 modeler and QI's division of labor.
>
> ------------------------------------------------------------
> I have one minor question on the size figures:
>
>
>>... 15MB large audio track, encoded as a 16bpp wav file
>
>
> but then compressed file sizes (in bytes I assume) are
> about 10 times larger.
>
>
>>4) Coding efficiency:
>>
>>-rw-r--r-- 1 thor thor 148708208 2006-01-28 21:35 track1.mq
>>-rw-r--r-- 1 thor thor 159764576 2006-01-28 22:49 track1.qi

Sorry, it was late... The original is *190MB* large, not 15MB. (Would
definitely too short for Brahm's symphony anyhow...)

So long,
Thomas

Thomas Richter

unread,

Jan 29, 2006, 1:35:32 PM1/29/06

nightlight wrote:
> I did compile QM from your post (after renaming get/putwchar which
> conflicted with MSVC 6 own version & fixing few mismatched size
> warnings). To test something, I would need a basic QM coder, with
> either static or adaptive coding & decoding mode (so I can verify that
> I didn't break something when changing it to memory to memory coding,
> and also to test decode times), and with no audio modeling and i/o
> hardwired in. Do you have such QM source?

The MQ coder part consist of OpenForRead/OpenForWrite(), GetBit() and
PutBit(). To make it non-adaptive, you need to go into the Qe_Value
table, and into the state-update tables Qe_NMPS and Qe_NLPS. The former
contains the estimated probability of the next LPS scaled apropriately:
Divide them by (4/3) * 0x8000 to get the probability. Qe_NMPS/Qe_NLPS
define the path of the coder within its state, namely they contain the
next state after having coded a MPS or LPS, respectively. Qe_Switch
contains a one whenever encoding an LPS interchanges MPS and LPS. Thus,
use only one state, setup NMPS and LMPS to point to this state.

However, this test wouldn't really show what I would be interested in. I
would rather care a lot more on how to setup an adaptive QI.

> Looking at the QI adaptation code, I wonder how long it would run if
> you don't call QI encode at all, but just your model adaptation code,
> the bit stuff, and then instead of calling qiEnc just copy data as is.
> That would give some measure of overhead of the rest.

I haven't really measured this yet, sorry. I would believe that the
overhead of organizing the data contributes to about one half of the
overall computation time, ignoring the contribution of IO which should
dominate all others. However, this is just fair because this is exactly
what I would need in a realistic setup. IOW, if you find a method how to
feed QI such that it uses the same or a similar bitplane modelling as
MQ, it would be more interested.

> With, say using
> 1K blocks, one may gat 10ns/bit for qiEnc with all the block bit
> counting & qiEnc set-up time. Hence one can do about 100Mbits per
> second for encoder. I am still unclear about your actual input sizes
> (15 MB or 15 Mega samples? yet the output is ~150 MB??).

190MB, sorry.

> QI wouldn't code this as bitwise but as 16-bit symbols,
> where the linear prediction could work nicely (the time for
> multi-alphabet of size scales as log(A) from the binary coder with the
> same number of symbols, since the LPS & MPS distinction and advantages
> carry over through all A-1 internal nodes of binary decomposition, cf.
> [T1]).

Well, that wouldn't quite fit to the MQ coder then anymore, but fine
with me. I'm also willing to plug in a Moffat back-end and avoid the
bitplane modelling at all.

>>Unfortunately, QI doesn't seem to be able to handle an
>>all-zero buffer.
>
>
> The uniform buffer generates no index, only count of 1's is saved
> (which is 0 or BlockSize). Hence qiEnc/qiDec functions should never be
> called with such counts.

Yes, I figured this. It is avoided in the given code - it doesn't cause
trouble because the encoder receives the same information over the
side-channel. Similar, whether you transmit the full count of ones or a
separate flag and the number of LPS doesn't matter. It's equivalent
information.

So long,
Thomas

nightlight

unread,

Jan 29, 2006, 6:14:39 PM1/29/06

> However, the general approach used here is typical
> for my applications for entropy coder back-ends:
> Decorrelation, context modelling and entropy coding.

Having worked with audio and video codecs, as well as on
numerous other compression and coding projects, I realize
that is how it is done. And I do appreciate great deal
that you took the trouble to adapt the [QIC] code for the
audio data tests (I'll probably end up playing with it
too, now that you have opened up the topic).

While the coder is nominally a small part of the whole task,
the larger pattern itself has grown around the AC way of
coding (its probabilistic parametrization and interface,
predictive modeling constraint, its one dimensional, one
symbol at a time per full coder+modeler cycle...), wrapping
and shaping its every new layer closely to the one below,
all faithfully propagating the genetic imprint of the
little seed pattern at their core.

Maybe I didn't communicate the central point well enough
in the summary ([M1] or [N7] p.9 [T3]), but QI is an
entirely different seed pattern, almost an antipode
of the AC pattern on every key element, [M1]. { The
two coders are, of course, not the real 'master genes'
of their respective schemes. The original AC idea was
suggested first by Shannon [2] and enumerative
coding, hence its offspring QI, by Kolmogorov [31].
Each algorithm carries an implicit blueprint for
the larger pattern of one of the two kinds of
information theory frameworks, which it was meant
to unfold and in which it can be at its best.}

The present release of the QI source kit [QIC] is just
the core seed pattern which has plenty of growth and
unfolding ahead before it bears all the 'marvelous
fruits' painted in [M1], on the application levels.
The functions included in [QIC] are merely the initial
form of the core building blocks which were meant
to fit and work well within a descriptive coding and
modeling pattern. Thus trying to fit them into an
existent AC way of coding+modeling would be quite a
challenge, having to go against the grain on both
ends. And they're certainly not aiming to be a 'drop
in' replacement for AC within its conventional setups.

But they are sufficient to demonstrate the core QI
strengths (A1)-(A4), in the low noise, 'clean room'
environment, where these unique purely coding aspects
can be measured and analyzed with high precision,
in isolation from any extraneous factors, however
important and large part of the application these
other components may be. Once the core strengths
are explored, quantified and charted, one gets the
basic assurance and a much clearer idea what are
the outer bounds, the potentials in different fields,
and what needs to be built and how, on top and around
to realize these potentials in specialized applications.

While much of the present coding and modeling is
done within the AC's predictive and probabilistic
pattern, there are only the isolated elements of
the QI's modeling superstucture above, such as BW
transform, along with the vastly rich substratum
below, consisting of the highest quality mathematical
results and algorithms of enumerative combinatorics,
extremal and general finite set theory and other
areas of finite/discrete math almost entirely
untapped in this context. Hence, there will be
many pretty blossoms from this seed, before the
season for fruit arrives.

-- References ( http://www.1stworks.com/ref/RefLib.htm )

M1. -- Summary of QI algorithmic advances (A1)-(A4)
http://groups.google.com/group/comp.compression/msg/27c6f329038a1bdc

2. C. E. Shannon "A mathematical theory of communication"
Bell Syst. Tech. J. 27, 79-423, 1948
http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf

31. A. N. Kolmogorov "Three approaches to the quantitative
definition of information" (in Russian)
Problems of Inform.Transmission, 1(1), 1-7, 1965
http://www.1stworks.com/ref/kolmogorov65.pdf

QIC. -- QI source code, tech. reports [T1]-[T3], references
http://www.1stworks.com/ref/qi.htm

nightlight

unread,

Jan 29, 2006, 6:54:34 PM1/29/06

>> The uniform buffer generates no index, only count of 1's is
>> saved (which is 0 or BlockSize). Hence qiEnc/qiDec functions
>> should never be called with such counts.
>
> Yes, I figured this. It is avoided in the given code - it
> doesn't cause trouble because the encoder receives the same
> information over the side-channel. Similar, whether you
> transmit the full count of ones or a separate flag and
> the number of LPS doesn't matter. It's equivalent information.

It is equivalent only on the first block coded. It becomes
increasingly less equivalent for the subsequent blocks, where
the previous blocks serve as a 'sample' for the later blocks.
Hence, the set of all 0/1 bit flatly coded flags may end up
using hundreds times more space than the information they
actually transmit to the decoder. There are many details of that
sort in EC/QI type descriptivce coding where the problems are
not apparent when viewed from the AC perspective (e.g. where
this flat 0/1 bit, equivalent at the logical level to the count
of ones relation to N/2, would get stuffed into the AC, amid all
of the other data bits, to be squashed to its more proper size).

Tight packaging of many distinct pieces of the description at
different levels, down to a 1/2^g bit fraction, is an essential
element of the descriptive modeling scheme. This kind of 1/2^g
level bit-fraction packaging is vital for AC as well, as the
Rissanen's 1st AC paper [5], on generalized Kraft inequality
emphasized. But with AC it is done all at a single flat level,
the one symbol codeword. Anything else that needs this kind of
bit fraction packaging in AC is first converted to a symbol and
then funneled down to meld in the middle the regular symbol
stream. This gives rise to the strange problems and their ad hoc
solutions, all peculiar to the AC way of coding, such as zero
frequency problem and escape probabilities estimations.

With QI/EC bit fraction packaging is done at the symbol level,
too (except with the four times smaller quantization bit leak)
and then it is propagated to the higher levels explictly. This
is the result of QI's much cleaner division of labor, where each
component does its complete specialized job entirely separately
and on the entire input, from start to end, when it tidies it
all up. Only then the next component takes over and does all of
its work for all of the data and then tidies its output up.

Jasen Betts

unread,

Jan 29, 2006, 11:44:52 PM1/29/06

On 2006-01-29, nightlight <night...@omegapoint.com> wrote:

>> It looks bad compared to my old static zero order arithmetic compressor ?
>
> You haven't given as yet your size figures for unbiased random sample
> on A=3 N=10^6, (good enough that ZIP ought to come much worse on it).
> Give it 10000 random inputs, filled with using rand()%3, and leave
> out encoding of N (or add the same figure for all coders and to
> N*log(A))

DYM N*log2(A) ?

>> But it looks like its not going to be a drop in replacement for a good
>> arithmetic entropy coder assuming Thomas has his act together.
>
> And I am not even working on 'drop in' replacement for adaptive AC
> modeler (unless a client pays for the work, in which case we'll do it,
> of course, as they request). Keep in mind that the QI coding time, when
> set-up with proper block size such as 1K or even 1/2K on lower end
> machines (instead of using 256MB tables), runs those kinds of sizes in
> 2-5 seconds, while the times he measured were over 4 minutes.

he wasn't testing speed. speed is often not a big issue.

> You haven't explained as yet how does use of files change compression
> ratios

It doesn't it just makes them more easily measurable.

Bye.
Jasen

nightlight

unread,

Jan 30, 2006, 6:03:52 AM1/30/06

> > out encoding of N (or add the same figure for all coders and to
>> N*log(A))
>
> DYM N*log2(A) ?

Yep. Virtually all logs in compression are log2(). The number N*log(A)
is the log() of the number A^N, which is the number of distinct strings
of N symbols in alphabet of size A, which was our test set (call it
Set(A,N)) of 'messages'. Hence A*log(N) is the lower bound on length
(in bits) on any unique index for the elements of Set(A,N).

Unlimited precision enumerative coding (EC) achieves, of course, this
bound, which is simply the length in bits of a binary form for N digit
numbers in radix A (since EC or QI do this type of encoding as a
conversion from radix A to binary). The QI, limited to g bits precision
(the mantissa size for SW integers, in QIC g=32) produces output of
length which is on average (over Set(A,N)) longer by a D(g,A,N) bits
from N*log(A), where D(g,A,N) < N*2*log(e)/2^g bits. An arithmetic
coder (AC) limited to the same precision g will produce excess of
Da(g,A,N), which is on average 2*A times larger than QI's quantization
excess D(g,A,N) due to its less accurate quantization of codewords (the
enumerative addends).

On the Set(A,N), for A=3 and N=10^6, which is the most favorable A>2
value for AC, the QI's output size is only 1.6e-4 bits above the lower
bound N*log(A). As shown on p. 8 in [T3], this is, purely
mathematically, the closest that any coder using codewords (addends
with QI and AC) limited to length g=32 bits can get, there was no
results I was waiting for to hear (one might as well wait to see
whether someone can "beat" 2+2 and claim their "adder" can get 3 for
it). Since anyone can run the test with QI.exe provided, as described
in the posts:

M1. --- QI test results for coding on Set(A,N) for A=3 N=10^6
http://groups.google.com/group/comp.compression/msg/ff1ee67d18b63f5a

M2. --- Explanation for the bit fractions shown by QI
http://groups.google.com/group/comp.compression/msg/1e015f38d228969e

M3. Answers to further questions on test reported in [M3]
http://groups.google.com/group/comp.compression/msg/508e96ebcb4577f1

to verify the claim and check how it is computed and displayed in the
source code (or read [T1] on QI mixed radix coding), the matter is
closed at a rational level (the alleged "controversy" is being stirred
at the other level).

> he wasn't testing speed. speed is often not a big issue.

Tested or not, the execution times were reported and commented on in
their own little section.

>> You haven't explained as yet how does use of files change compression
>> ratios
>
> It doesn't it just makes them more easily measurable.

The point above was a response to the claims that compression ratios
obtained on memory buffers tests are not valid (how?) and that only
file tests will provide the "real" compression ratios (why?) of the
two entropy coding algorithms for which the _source is publicly
available_. The underlying theme of all that seemingly pointless song
and dance, the persistent assertions (without being able to explain
why) that only files somehow give the "real" compression ratios while
the buffers give the "unreal" ratios, is described in the earlier post:

http://groups.google.com/group/comp.compression/msg/8d62e372056d9d53

as method (b) (of the five frivolous ways to "win" a compression
contest against another coder).

David A. Scott

unread,

Jan 30, 2006, 9:04:11 AM1/30/06

"nightlight" <night...@omegapoint.com> wrote in
news:1138619032....@g49g2000cwa.googlegroups.com:

>
> The point above was a response to the claims that compression ratios
> obtained on memory buffers tests are not valid (how?) and that only
> file tests will provide the "real" compression ratios (why?) of the
> two entropy coding algorithms for which the _source is publicly
> available_. The underlying theme of all that seemingly pointless song
> and dance, the persistent assertions (without being able to explain
> why) that only files somehow give the "real" compression ratios while
> the buffers give the "unreal" ratios, is described in the earlier post:
>
>

I think your are the one doing the song and dance. You have mentioned
how only the fair way to compare with QI is to do a static zero order
test. You mentioned the 3 symbols of equal likelyhood. I write such code
for such a test. Even when the data is on a buffer it beats what you
claim QI will do. The fact you keep denying the obvious makes it more
clear you really don't want to face reality. QI is not as good as you
claim. Real tests show its weaknesses. An honest person would want to
understand way. A con man would want to just sweep the results under
the table. Are you interested in learning nightlight or do you think
you know everything and just want to push QI on people without actually
taking the time to understand basic compression?

I showed you clearly even in a nonoptimal way how to but the
compressed file in the memory buffer such that even with the nonoptimal
way of expressing length would be part of the data this was still shorter
than what you claimed was the fixed length your compressor would compress
to assuming you ever get it working to buffers. Why can't you face facts?

I can explain why the arithmetic is better but it obvious you are
not interested in why. You only want to claim QI is better which it
obviously even to the casual observer is not. Quoting thousands of
lines of peoples papers which it seems you don't fully understand
is not going to change the fact that QI doesn't seem to compress
smaller than a simple arithmetic compress for long files even in the
one case you claim would be a fair test. Why is that?

Randy McNabb

unread,

Jan 30, 2006, 11:34:24 AM1/30/06

"nightlight" <night...@omegapoint.com> wrote in message
news:1138619032....@g49g2000cwa.googlegroups.com...

> > > out encoding of N (or add the same figure for all coders and to
> >> N*log(A))
> >
> > DYM N*log2(A) ?
>
> Yep. Virtually all logs in compression are log2(). The number N*log(A)
> is the log() of the number A^N, which is the number of distinct strings
> of N symbols in alphabet of size A, which was our test set (call it
> Set(A,N)) of 'messages'. Hence A*log(N) is the lower bound on length
> (in bits) on any unique index for the elements of Set(A,N).
>
> Unlimited precision enumerative coding (EC) achieves, of course, this
> bound, which is simply the length in bits of a binary form for N digit
> numbers in radix A (since EC or QI do this type of encoding as a
> conversion from radix A to binary). The QI, limited to g bits precision
> (the mantissa size for SW integers, in QIC g=32) produces output of
> length which is on average (over Set(A,N)) longer by a D(g,A,N) bits
> from N*log(A), where D(g,A,N) < N*2*log(e)/2^g bits. An arithmetic
> coder (AC) limited to the same precision g will produce excess of
> Da(g,A,N), which is on average 2*A times larger than QI's quantization
> excess D(g,A,N) due to its less accurate quantization of codewords (the
> enumerative addends).

Your SW integers are no more than a floating point number amounting to 64
bit math.

Message has been deleted

nightlight

unread,

Jan 30, 2006, 12:29:35 PM1/30/06

TYPO correction:

> QI invokes rounding operator only when computing the
> quantized binomial table (eq. (21) in [T3]), but not for the
> index computation (eq. (21)).

The last "eq. (21)" above should say "eq. (22)".

Thomas Richter

unread,

Jan 30, 2006, 1:08:56 PM1/30/06

Hi,

> Maybe I didn't communicate the central point well enough
> in the summary ([M1] or [N7] p.9 [T3]), but QI is an
> entirely different seed pattern, almost an antipode
> of the AC pattern on every key element, [M1].

Partially. QI is currently "nothing but" a smart way how to encode a
permutation of k LPS within n symbols. The question now is, how can
this be turned into a good model for something. While this is
particulary easy for AC and its coding methology, it seems
particularly hard for QI right now. For example, to make this a good
model, one could choose block sizes dynamically, but then would need
to find a smart way how to find block starts and ends that make sense
taking the side-channel (the number k) into account thus to allow
an overall coding gain.

The question is now how to find good models of audio or visual
data that match this QI coding paradigm. For AC, the model is
quite clear: Markov chains are mathematically well understood,
and there is a toolchain of mathematics how to deal with them.

Or to put my request in different terms:

Why is the QI coding paradigm a natural one for the mentioned
data sources?

It is not so that I don't want to understand that your beast has a
different approach. I just wanted - in my provocative nature - an
information how to make use of this approach for realistic data
sets, and the best way how to get it is to state a real contest.

> While much of the present coding and modeling is
> done within the AC's predictive and probabilistic
> pattern, there are only the isolated elements of
> the QI's modeling superstucture above, such as BW
> transform, along with the vastly rich substratum
> below, consisting of the highest quality mathematical
> results and algorithms of enumerative combinatorics,
> extremal and general finite set theory and other
> areas of finite/discrete math almost entirely
> untapped in this context.

BW is a pretty nice algorithm, but it is also pretty useless
for my field. (-; It doesn't fit to the data sources I care
about, and is a bad model for them. It might, or might not be
that this is about the same situation for QI. But then again,
this would contradicte your "is always better than AC" statement,
so what is it? (-:

So long,
Thomas

Message has been deleted

nightlight

unread,

Jan 30, 2006, 6:21:51 PM1/30/06

[reposted with better formatted line breaks]

> typedef union _swi { // SW Integer
> struct {
> dword m; // mantissa
> int32 e; // exponent
> };
> qword q; // alias for 64-bit compares
> } SWI;

> That is a specialized floating number. What information
> is used to align the mantissa?

Regarding the 64 bit "precision", there are two "precision"
conventions or concepts, (a) and (b) described below, that are
being mixed up in these questions.

a) If you look at SWI as a floating point number, than the SWI
structure does use 64 bits, which would be equivalent to a 64
bit FP number (with some non-standard partition of bits between
mantissa and exponent). So for that concept of "precision", call
it "precision-fp", the 32-bit SWI "precision-fp" is the same as
for 64 bit FP numbers.

b) Coding "precision", "precision-c" -- This is convention that
is used to describe AC (or Huffman code) precision, where the
"precision" means the width of "addend" (which in QI is called
SW mantissa and in AC the "range"). With a g-bit Huffman coding,
these addends are just the Huffman codes with max length of g
bits per code, and these codes are added to the output code on
exact bit boundaries i.e. the Huffman addends don't overlap with
the current sum or the encoded ouput (hence this "addition" in
Huffman coding is equivalent to concatenation of addends to the
current encoded output). With QI and AC, each working in 32 bit
"precision-c", which code symbols with bit fraction accuracy,
the addends are 32 bit values and these addends do overlap with
the current sum (the cumulative probability in AC or enumerative
index with QI).

The convention I follow when saying that QI uses "precision" g=
32 bits is (b), since we are comparing QI to AC with the same
"precision-c" (of its mantissa, the "range").

> That is a specialized floating number.

Structurally, yes. The FP and SW numeric types are not specified
by their structure alone, but also by the set of operators and
how these operators work. The operators work differently on SW
operands than on FP operands. That should be also obvious if you
check [T3] p. 7, the definition (i) which defines how the
operators work.

If you wished to perform that kind of operation on existent FP
numbers (hence reusing the structural similarity), you would
need to implement it on your own, since the FP operations would
not work the way (i) specifies. The principal difference is that
SW arithmetic operators are decoupled from rounding operator,
while with FP numbers the rounding is an integral part of the
arithmetic operators.

Note that the [QIC] source kit includes conversion functions
between SW and FP numbers in Swi.c, functions: fp2swi(),
swi2fp() as well as for logs of fp number to swi of the fp
number itself: lfp2swi() and lfp2swx(). All these are used in
QIC only for various stats, where logs of exact binomials &
factorials are computed via FP's 53 bit mantissa, in order to
measure quantization effects on the QI's quantized binomials,
powers, factorials, etc.

> What information is used to align the mantissa?

The SW exponent, of course, as shown e.g. in EncDec.c, function
qiEnc(). Note that AC, due to the historical accidents of its
evolution, never developed similar clean formalism as SW for
that type of numbers and their operations, even though it does
arithmetically the same thing as QI with SW integers (AC's are
g-bit SW fractions).

The AC's mantissa is the "range" and its exponent is -(# of
"range" scaling steps), i.e. AC exponent is a negative number
(scaling step is a way of looking at SW fraction multiplication
with 2, and forgetting of exponent). The kludgy way of handling
the arithmetic of, what amounts to simple SW multiplications of
32 bit mantissas for SW fractions (which are similar to FP
multiplications of 32 bit mantissas, except that the full 64 bit
product of two mantissas is not truncated to 32 bits, as FP
"multiply" operator would do), results in the AC's arithmetic
leaving the randomly fluctuating number of significant bits in
its mantissa ("range") and thus to further precision loss.

The other precision loss (also absent with QI) is the loss of
"range" precision after multiplication of the "range" with p=
Probability(x) where x is the current symbol, which for
unlimited precision AC would gain the number of significant bits
from p, but which are not kept in a finite precision value for
"range" of a regular 32 bit AC.

Hence, even reformulating the AC arithmetic via SW fractions,
with explictly maintained exponent, which would help keep its
mantissa precision more stable (using up its max value of g bits
throughout), would not solve its entire excess redundancy vs
QI's integer based arithmetic and index (with AC the "index" is
its "cummulative probability", hence a number smaller than 1.0).
But it would certainly simplify and streamline its rounding,
zooming and scaling logistic (which amounts to a "poor man's SW
arithmetic") making it much more transparent and mathematically
more sound.

nightlight

unread,

Jan 30, 2006, 6:27:39 PM1/30/06

[reposted with properly formatted line breaks]

>> Your SW integers are no more than a floating point
>> number amounting to 64 bit math.

The QI mantissa in the source code has 32 bits not 64 bits. I
don't know where did you get the 64 bits of precision from? All
the multiplies are 32x32 bit, of 2 mantissas (e.g. see Radix.c,
function new_radix(), where it calls function swuMul() in Swi.c,
which does this multiplication to obtain quantized powers of the
radix. Similarly, when adding binomials into the output index,
the addend is a 32 bit mantissa of the table binomial obtained
via qbc() function. See e.g. in EncDec.c, function qiEnc() where
the 32 bit mantissa is aligned and added into the index (which
may or may not cross the 32 bit boundary).

The difference between SWI and FP is that SWI has arithmetic
operations formally decoupled from rounding. The rounding in SWI
is a separate operator, unrelated to arithmetic operators, and
which the program invokes as any other arithmetic operator, as
the algorithm specifies. Hence if X and Y are two SWI integers,
X+Y will generally not be an SWI integer but an integer of a
larger precision. With FP variables X and Y, the sum X+Y is
another FP number, i.e. the FP rounding is _entangled_ with the
FP arithmetic operations +,-,*,/ . Hence with FP, if X and Y
exponents are different, the sum X+Y will generally lose some
bits (the lowest bits of the smaller of the two numbers) while
with SWI numbers X and Y, the X+Y does not lose any (which is
how QI functions when computing index (see eq. (22),(23) p. 8
in [T3]). QI invokes rounding operator only when computing the

David A. Scott

unread,

Jan 30, 2006, 9:47:21 PM1/30/06

Thomas Richter <th...@mersenne.math.TU-Berlin.DE> wrote in
news:drlkno$ohd$1...@mamenchi.zrz.TU-Berlin.DE:

>
> Partially. QI is currently "nothing but" a smart way how to encode a
> permutation of k LPS within n symbols. The question now is, how can
> this be turned into a good model for something. While this is
> particulary easy for AC and its coding methology, it seems
> particularly hard for QI right now.

Yes I agree its nothing but another way to encode a permutation
index. Something that is actaully very easy to do with a proper
arithmetic coder. I notice it looks slick for small strings as
in the example in his paper. However there is still a problem
when one has a large index and one is using it for compression.
You still would have to express that in bits. He keeps stressing
about how it gives the exact answer to some infinite decimal fraction.
Of course this is nonsense. If QI or BS or whatever you want to call it
calculates an index with say 23.72314... bits when you actaully code
the data for compression you would write 24 bits sometimes and
23 bits other times. He really doesn't seem to grasp this simple fact.
The nice thing about using arithmetic codeing to get the index is
that it would automatically give you 23 bits sometimes and 24 bits
other times. No need to know what the fraction is. And once you have
the index at that point if its useful for compression then it could
have been done with arithmetic in the first place.

He is trying hard to find a way to use the index in entropy
compression where arithmetic is king. What he misses is getting
the index itself is no problem one could just use an arithmetic
coder to do that in the first place.

So in the few cases if there is a value for using this index.
The real question one might ask is this. Does his QI work any
better for large strings at getting the index then common ways
that use arithmetic means to get the index. If the anwser is no
then the method is of little use. But even if the anwser is
yes it is a better way to get the index even for large files
you still have the problem of finding a use for it in compression,

nightlight

unread,

Jan 31, 2006, 4:46:36 AM1/31/06

>-- Thomas Richter:

> Partially. QI is currently "nothing but" a
> smart way how to encode a permutation of
> k LPS within n symbols.

Even the released [QIC] source includes additional types of
coders, such as fixed radix, mixed radix and permutation coders
which are entirely different from the binary coder, in types of
addends and in kinds of sets being enumerated. Even in the
single block mode, due to the linearity of their table sizes
which are of size N SWI entries (instead N^/4 for binomial
tables), these additional coders are quite practical and useful
as implemented in [QIC] (you can also change "#define MAXRDIG
1*MEG" in Intro.h, which sets the upper limit on N for these
codes, to a much larger values if needed). They are used by all
QI variants for all bit fraction packaging needs within other
coders. Note that these coders, in addition to being faster than
AC, are also much tighter, especially for very large alphabets,
than anything you can do with AC, since their quantization
excess for N symbols grows as 2*N*log(e)/2^g for _any_ alphabet
size A, while the AC's excess for N symbols in alphabet A grows
as 4*N*A*log(e)/2^g. Further, they produce for a given N,
precisely fixed output size, down to the exact bit fraction, and
this size is available from the quantized powers tables, without
having to decode the index (both these qualities are absent in
AC of Huffman's variant of such codes). These properties are
especially helpful for hierarchical forms of component
packaging.

If you skim over some enumerative combinatorics textbook, such
as Ruskey's [38], you will find great variety of different
un/ranking algorithms for any conceivable combinatorial objects,
all of which can be converted to QI form, by simply taking the
recurrences for their enumerative addends, the path counts in
lattice formulation, and enclosing them into SWI rounding
operator (this procedure is shown in general form in eq. (21)
as QI version of non-quantized eq (5)). Immink's work [26],
contains additional variants which illustrate complex
constraints on the lattice paths.

I have experimented with some of these, and have found tree
coding (see [38], Ruskey thesis & textbook pp. 73-78), quite
useful in encoding complex fragments layouts, especially
those with strong clustering properties. This method is
based on the well known (see misc. Kraft inequality proofs)
1-1 mapping between the sub-intervals of an interval
[0,2^n) and the n leaves of a compact binary tree.
Similarly, for coding fragments in 2D, one would code
quadtrees (these and higher dimensions, can be done
by reusing quantized tables for binary tree addends,
which are quantized ballot numbers, with Catalan numbers
on the diagonal). And, as you probably know, trees,
and Catalan numbers especially, have vast numbers of
other enumerative interpretations.

In short, the possibility " 'nothing but' a smart way how to
encode a permutation of k LPS within n symbols" that you
noticed, is more like "nothing but" the tip of the tip of the
iceberg. If there is one problem at all with the QI and its
form of descriptive modeling, it would be the shear multitude
of different ways to approach and tackle any given problem.
I think this "problem", which is of 'too much of a good thing'
kind, will resolve itself as simple, routine ways emerge for
various common types of problems, so that the people who do
not care to make a career out of enumerative combinatorics
and finite set theory can apply it mechanically. Of course,
there are no 'routine ways' for the first ones through.
Luckily, they're are the type who just enjoy these kinds of
little explored problems where combinatorics and programming
overlap and for them the absence of 'routine ways' is not a
flaw but an attraction.

> For example, to make this a good model, one could choose
> block sizes dynamically, but then would need to find a

> smart way how to find block starts and ends ...

That's exactly what one needs for the quasi-stationary order-0
sequences, the proper contiguous segments. One very simple and
quick approach mentioned earlier is to split the block in
half if there is a "gain", down to some minimum block size. To
compute the "gain", consider a block of length B which uses for
K, the counts of ones, flat LK=log(B+1) bit codes (in
fractional bits if B+1 is not power of 2). { The flat length
codes are useful for very dynamical densities of ones. For
more stationary sequences one can use Huffman codes for binomial
distribution.} In any case, the LK is the length of code for K.
The index for B is in the interval [0,C(B,K)), where C(B,K) is
the quantized binomial. Hence the index length (in fractional
bits) is LB=log(C(B,K)). The total one piece length is then
L=LB+LK (in fractional bits, as always).

One similarly computes the lengths of the two halves L1=LB1+LK1
and L2=LB2+LK2, when coded separately, and the cost S of
indicating the split. In the binary tree form of coding
segmentation, the split S costs only 2 bits (but it requires
2n-1 bits for tree n leaves). A simpler method is to allocate a
split code among the Huffman codes for K, placed usually right
after the sqrt(B) values of K codes, which then yields
aproximately S= 1/2*log(B) bits or log(B) for flat K codes. The
decision to split is then made if L1+L2+S < L.

One weakness of the top-down splitting is that it may get stuck
in a local minima and stop splitting in two, even though a
subsequent split, in 3 or 4 pieces would produce gain. A bottom
up block merge algorithm is immune to this problem. It starts
with some minumum block sizes (e.g. 32 or 64 bits) and
merges the adjacent blocks while L < L1+L2+S holds.

In addition to tree representation (which is useful for
clustered layouts) and Esc codes (useful for very rare splits),
one can use a bitmap (coded via binary coder) to represent start
of each block with bit=1 in the array of 0's (each representing
minimum block size units, which may be as small as 32 bits, a
convenient minimum for speed and simplicity). This is the most
flexible method since one can represent easily arbitrary
segmentation. To determine segments in this scheme, one can
processes the whole array of N bits and slide, step by step
around N, a window of N/2 bits (aligned to block granularity)
and select window position in which the difference between the
K inside and K outside window is maximum. One would then recurse
down into each half, until all segments are within the quantized
addend table size limits NT. Each of these splits on a segment
of size C granularity units requires log(C) bits to encode. Once
the table limits NT are satisfied, the further recursive splits
via sliding window are decided on via the "gain" calculations.

Note that with BWT based segmentation, the contexts which BWT
produces, automatically specify possible boundaries for splits
(or merges in the bottom up approach). But BWT isn't as useful
for order-0 (or any low order) quasi-stationary sequences.

There are many other ways, of course, and exploring them is
quite fun. I will include few segmentation functions when the
multi-block coders get ported into the [QIC] kit (in the near
future).

Note also that these kinds of segmentation criteria and
decisions are implicit in the various adaptations schemes used for
AC. That has been worked out for different kinds of sources over
decades and applying them is usually a routine mechanical work,
so one doesn't notice them any more. Hence, QI doesn't introduce
here some new need but only a finer grained parametrization for
such criteria. If one wants to model dynamical sequences, those
criteria and decisions have to be devised for different kinds of
sources and put into the modeler in some form. There is no way
around it in AC or QI.

But, the existent AC adaptation schemes in various domains can
help short-circuit the whole invention and development process
for QI's descriptive modeler. Since 'adaptation' and
'prediction' are generally a round about, proxied way of
describing the sequence/source dynamics, their translation to
the descriptive language should generally increase the model
accuracy and simultaneously represent a simplification, a
more direct and sharper way of saying the same things. Of
course, anyone doing it needs to understand the particular
source dynamics on its own and the motivation for the related
AC adaptation procedures.

> The question is now how to find good models of audio or
> visual data that match this QI coding paradigm.

At present I am only working on the QI application for
video codec, and only for the differential frame update
data (not the full image) plus the related encoding of
the complex regions.

As to your audio data, the mechanical bit by bit conversion
of AC model is problematic, both for speed and for missed
opportunities offered by more accurate, low noise coder.

For example, the bit planes would be coded much better if one
were coding separate fragments in each bit plane, conditioned on
the bit values above (in the higher level bit planes), which
QI's accuracy allows (provided you don't round up any bit sizes
to whole bits, let alone to whole bytes). If the actual alphabet
contains A distinct symbols, the top plane will have 1 fragment,
the next plane 2 interleaved fragments (determined by the bits 0
or 1 in the plane above)... with grand total of A-1 interleaved
fragments over the log(A) bit planes (see [T1] pp. 32-38). Note
that the fragment sizes need not be transmitted separately since
they can be computed from the symbol frequency table and the
symbol codes. Additionally, the symbol frequency table stats
would be carried from one data batch to the next, which would
cut the table sizes in half.

With your audio coding you do have temporal variation of
the symbol statistics, in addition to the skewed distribution
overall. The data is essentially the second differences
of the original 16 bit audio sample values. The second
differences should be most often close (+ or -) to 0. For
bit plane coding of second differences, one would use
either centered representation instead of signed one,
or a sign bit and the absolute value. Due to non-stationarity,
the key problem is how to get a good segmentation. Once
that is solved (the solution may already exist for some
other purpose, e.g. within voice recognition algorithms),
the encoding of segments could be done via bitmap or
tree method (if the clustering is significant). I would
guess the segmentation could be a fairly complex task to
discover from scratch (recalling some audio filtering
algorithms I played with few years ago).

> For AC, the model is quite clear: Markov chains are
> mathematically well understood, and there is a
> toolchain of mathematics how to deal with them.
>
> Or to put my request in different terms:
>
> Why is the QI coding paradigm a natural one for
> the mentioned data sources?

It is AC that is more natural for those models since they are
_idealized_ models, using infinite sequence limits as
parameters, the same parameters that AC uses in its coding.
QI/EC modeling, which uses finite sequences enumerative
parameters, can at best approximate "infinite sequence limits".

Hence QI modeling is more natural for sequences that one
actually codes, the finite sequences, but not for the idealized
infinite sequences. The latter are in the AC's native domain.
Each coder sees the other domain in lower resolution than the
coder native to that domain does.

While there have been various EC models for Markov chains (from
Cover [1] & Davisson [28], through Tjalkens thesis [36] and
Oktem [35]), they were too cumbersome for higher orders, or too
inaccurate for finite sequences (e.g. Davvisson's "histogram"
method [28]). For practical sequences either BWT (for sequences
with little a priori information) or the context splitting
method sketched earlier as 'method (b)' for using AC modeler,
with proper encoding of context dependencies (similar to the
QI's multi-alphabet decomposition in [T1]), and of course, tight
packaging throughout, should work better (after some practical
tuning).

-- References ( http://www.1stworks.com/ref/RefLib.htm )

QIC. QI C Source code.
http://www.1stworks.com/ref/qi.htm

T3. R.V. Tomic "Quantized Indexing: Beyond Arithmetic Coding"
arXiv cs.IT/0511057, 10p, Nov 2005
http://arxiv.org/abs/cs.IT/0511057

38. F. Ruskey "Combinatorial Generation" (textbook)
CSC-425/520 Univ. of Victoria CA,2003
http://www.1stworks.com/ref/RuskeyCombGen.pdf

Ruskey's publications:
http://www.cs.uvic.ca/~ruskey/Publications/

Ruskey's thesis excerpt on lattice enumeration for trees:
http://www.cs.uvic.ca/~ruskey/Publications/Thesis/Thesis.html

26. K.A.S. Immink "A Practical Method for Approaching the
Channel Capacity of Constrained Channels"
IEEE Trans. Inform. Theory IT-43 (5), 1389-1399, 1997
http://www.exp-math.uni-essen.de/~immink/pdf/ultra.pdf

David A. Scott

unread,

Jan 31, 2006, 7:57:35 AM1/31/06

"nightlight" <night...@omegapoint.com> wrote in
news:1138700796....@g14g2000cwa.googlegroups.com:

>>-- Thomas Richter:
>> Partially. QI is currently "nothing but" a
>> smart way how to encode a permutation of
>> k LPS within n symbols.
>
> Even the released [QIC] source includes additional types of
> coders, such as fixed radix, mixed radix and permutation coders
> which are entirely different from the binary coder,

Buy yet in the two cases its been tested it appears to come
up short. You can quote hundreds of lines of text but Richter
is correct its so far "nothing but" what he stated. Sure you
have secret code that does all kinds of things sure you do.

....

> codes, to a much larger values if needed). They are used by all
> QI variants for all bit fraction packaging needs within other
> coders. Note that these coders, in addition to being faster than
> AC, are also much tighter, especially for very large alphabets,
> than anything you can do with AC,

...

Yeah right so far real world test done by Richter show its not
only slower it doesn't compress as well. My test which is not meant
for speed shows that at your A=3 equiprobabil model it falls short
of an arithmetic. What is this bit frcation packagaing you keep
talking about. Why can't you discuss honest simple real world results.
Is it you don't care about reality that you have to keep writting
hundreds of lines of text as if that means anything.

I think the only thing you may have learned so far is that you
are wrong in your A=3 case so now your claiming its going to be better
for very large alphabets. Why should anyone belive you especially
if you can't write code where the common user can use files to test it.

Do you still think this forum is only for your PR and that everybody
should bow dowm and kiss your feet without even testing it on real data?

Bottom line what happened to the so called only fair way you mentioned
to compare QI to arithmetic. Don't you want to be fair? Or do you think
most will tire of this thread like the last one which is nothing more than
self promotiom of your method with out actually testing it or anwsering
any questions people bring up.

Thomas Richter

unread,

Jan 31, 2006, 8:03:40 AM1/31/06

Hi,

> > Partially. QI is currently "nothing but" a
> > smart way how to encode a permutation of
> > k LPS within n symbols.

> Even the released [QIC] source includes additional types of
> coders, such as fixed radix, mixed radix and permutation coders
> which are entirely different from the binary coder, in types of
> addends and in kinds of sets being enumerated.

/* snip */

Look, I don't want to dimish your work, and that's why this
"nothing but" is in quotes. However, that's the QI coder model,
and the question remains why this is a good model. Or for what it is
a good model.

For example, it is a good model for the situation where you draw white
and black balls from an urn, with a total number n = blocksize balls
inside and k white balls. Question is now: Why is this a good model
for something in coding, i.e. realistic audio or video data. It only
gets a zero-order Markov source for block size to infinity, which is
something the codec is not able to.

/* snipped */

You still make nice words, but there's still no hint why this is
a good thing to do.

> > For example, to make this a good model, one could choose
> > block sizes dynamically, but then would need to find a
> > smart way how to find block starts and ends ...

> That's exactly what one needs for the quasi-stationary order-0
> sequences, the proper contiguous segments.

/* snip */

But then again, would it be faster than MQ with that complicated
algorithm? Or, would it be better? There is no realistic test yet,
and thus I would prefer to delay discussions of speed until one
can really see a *realistic* implementation. Which is exactly why
I wouldn't care for speed right now. What you did is to test it in a
specific situation that is optimized for its coding paradigm, but
not in a situation where it can be realistically applied. Thus, that
is why I'm careful.

> If one wants to model dynamical sequences, those
> criteria and decisions have to be devised for different kinds of
> sources and put into the modeler in some form. There is no way
> around it in AC or QI.

Yes, except that it is extremly easy to get a successful scheme
for AC.

> > The question is now how to find good models of audio or
> > visual data that match this QI coding paradigm.

> At present I am only working on the QI application for
> video codec, and only for the differential frame update
> data (not the full image) plus the related encoding of
> the complex regions.

> As to your audio data, the mechanical bit by bit conversion
> of AC model is problematic, both for speed and for missed
> opportunities offered by more accurate, low noise coder.

> For example, the bit planes would be coded much better if one
> were coding separate fragments in each bit plane, conditioned on
> the bit values above (in the higher level bit planes), which
> QI's accuracy allows (provided you don't round up any bit sizes
> to whole bits, let alone to whole bytes).

Aparantly, you haven't yet read thru the code completely since this
is exactly what happens for the MQ *and* the QI implementation. Grey
coding is exactly that: It "predicts" the value of the current bitplane
with that of the bitplane above. The coding step to do that looks
extremly innocent, but it is quite powerful for slowly varying
signals.

> With your audio coding you do have temporal variation of
> the symbol statistics, in addition to the skewed distribution
> overall.

Yes, sure.

> The data is essentially the second differences
> of the original 16 bit audio sample values.

Yes, sure.

> The second
> differences should be most often close (+ or -) to 0. For
> bit plane coding of second differences, one would use
> either centered representation instead of signed one,
> or a sign bit and the absolute value.

Yes, sure. That's why the grey-coding step is done. The topmost
bit is the sign bit, the rest is magnitude. (Think about how
Grey-coding is applied.)

> Due to non-stationarity,
> the key problem is how to get a good segmentation.

Yes, sure. For MQ, this is not a problem since it is an adaptive
AC coding algorithm: It "finds" the statistics by slowly adapting
into it, and by using a "fast attack" entry cycle to it is able to
find the initial distribution quickly.

> Once
> that is solved (the solution may already exist for some
> other purpose, e.g. within voice recognition algorithms),
> the encoding of segments could be done via bitmap or
> tree method (if the clustering is significant). I would
> guess the segmentation could be a fairly complex task to
> discover from scratch (recalling some audio filtering
> algorithms I played with few years ago).

Yes, sure. That's why adaptive AC coding schemes are so successful,
they "segment" implicitly by switching the prediction from the
relative symbol frequency. That's extremly easy to do within AC,
and extremly hard to do for QI.

> > Why is the QI coding paradigm a natural one for
> > the mentioned data sources?

> It is AC that is more natural for those models since they are
> _idealized_ models, using infinite sequence limits as
> parameters, the same parameters that AC uses in its coding.

Yes, sure.

> QI/EC modeling, which uses finite sequences enumerative
> parameters, can at best approximate "infinite sequence limits".

Yes, sure. But then, to restate my question differently: Why are
finite sequence enumerations good models for audio data prediction
residuals? I can find a couple of reasons why Markov models might
be a nice thing to do.

> Hence QI modeling is more natural for sequences that one
> actually codes, the finite sequences, but not for the idealized
> infinite sequences.

Well, my audio sequences are potentially infinite, and at least
a lot larger than your block size, so they are "infinite enough".

> The latter are in the AC's native domain.
> Each coder sees the other domain in lower resolution than the
> coder native to that domain does.

Yes, sure. But then again, "why is the QI coder domain natural
to these audio residuals."

> While there have been various EC models for Markov chains (from
> Cover [1] & Davisson [28], through Tjalkens thesis [36] and
> Oktem [35]), they were too cumbersome for higher orders, or too
> inaccurate for finite sequences (e.g. Davvisson's "histogram"
> method [28]). For practical sequences either BWT (for sequences
> with little a priori information) or the context splitting
> method sketched earlier as 'method (b)' for using AC modeler,
> with proper encoding of context dependencies (similar to the
> QI's multi-alphabet decomposition in [T1]), and of course, tight
> packaging throughout, should work better (after some practical
> tuning).

That's then a different coding domain. Higher order *can* be done
within AC (by using more contexts, quite easily though), just built
the context from the previous symbol and you have a first order Markov
encoder (kinna like what the "context enhanced audio codec" does, it
is a first-order Markov model here). When encoding textual data (as
this posting), we are in a different domain - whether Markov is good
here is a question I cannot answer.

So long,
Thomas

nightlight

unread,

Jan 31, 2006, 4:56:02 PM1/31/06

> Look, I don't want to diminish your work, and that's why this

> "nothing but" is in quotes.

I am only concerned about misunderstanding. It tells me that,
however unlikely:), I might not be explaining it well (that is
indeed a complaint from almost everyone except for the few who
have spent lots of time and effort on the very same circle of
the EC problems, the coding or modeling aspects). In a way it
is a good sign, usually indicating some distance from the
ordinary. (Of course, there are two directions away from
the middle.)

> However, that's the QI coder model

ECMP: The QI "model" (modeling pattern, [T2] p.27) is to
partition (which is some form of decomposition) of the
input sequence into elements of some "enumerative classes"
(sets with ranking functions), followed by ranking of
the elements within their classes and encoding of the
class identifiers (class tags). The latter portion of
the output, can invoke [ECMP] with 'class tags' as
a new and separate input (iterative scheme called
"entropy pump" in [T2]).

> and the question remains why this is a good model.

Assuming the correct "this", the [ECMP], Davisson has answered
[31] this question over three decades ago: It is because
enumeration is "minimax universal". More specifically, consider
a set of all (decodable:) codes C(n,z) (where z labels different
code sets, or coders which compute them) for sequences limited
to n symbols, and where these n-sequences are emitted by some
parametrized class of sources S(n,t) (t=souce parameters). Now
you obtain the average redundancy for each coder 'z' for some
parameters t, the value R(n,z,t) in bits per symbol, then find
the source parameters value tz for which the average redundancy
R(n,z,tz)=[def]=R(n,z) is the largest for that coder (the "max"
part of minimax) i.e. R(n,z) is the worst average redundancy for
a given coder z over all source parameters t. Now you pick the
coder zm with the lowest R(n,z) among all possible coders z,
R(n,zm) =< R(n,z) for all z (the "min" part of minimax). If
R(n,zm) then goes to 0 when n->infinity, then this coder zm (or,
equivalently, its codes) is called minimax universal. And this
zm, the "most desirable" among them all, is the exact enumerative
coder, for variety of source types, including the arbitrary
finite order Markov sources ([28],[33]).

One can thus say that EC is the most resilient against
uncertainties among the coders (the unlimited precision AC, the
ACX, is merely an approximation of EC, hence EC _always_, for
any t, codes tighter than ACX, not just for the single tz value
that Davisson's minimax criterium requires). The QI, being the
_closest finite precision_ coder to the EC, is thus the most
resilient among the finite precision coders. And similarly to
their infinite precision counterparts, regarding the finite
precision AC, QI _always_, for any t, codes tighter than AC i.e.
it has lower redundancy R(n,z,t) for any t & n. In addition to
inheriting the EC vs ACX relation, due to the QI's optimal
quantization (with 4 or more times smaller 'quantization leak'
than AC), QI is doubly 'always tighter' than AC, at any given
arithmetic precision.

That's all without even getting to the QI's advantages in
speed and power consumption (A1), in output stability and
availability (A4), in richer and finer grained parametrization
of finite sequences (A2) (cf. [M1] for A1-A4).

At a more fundamental level, the reason behind the greater
'resilience' (the mimimax universality) of enumerative approach,
for the "real life" finite sequences, is the natural tendency
toward the economy in number of choices (standardization,
uniformization) that characterizes any "natural system" seeking
its optimum (homeostasis, omega point). Namely, the "natural
systems" (from genetic networks, cells, to human immune systems
and brains, to social networks, whole societies and ecosystems,
languages, formalisms and other networks in abstract realms)
are intelligent networks or 'complex systems' [SF], which model
their environments. These models include a model of "self",
which the network runs forward in a what-if game, like a chess
player looking at the moves ahead, enumerating the branching
tree of variants, pruning some and evaluating others, and
selecting the one which yields the optimum "punishments/rewards".
Reducing its cost for these computations, reducing the number
of possible variants that need to be examined, hence the time
and resources needed to compute the steps of its model, is a
very powerful omnipresent 'reward', hence all such systems
seek states with fewer variants to examine, which leads to
standardization and uniformization, and on individual "cell"
level to specializations and preference for routine, rhythms,
periodicity... etc.

The enumerative approach to modeling (ECMP) seeks to
identify the boundaries of these reduced choice spaces,
within which the number of choices is much smaller than
all that is allowed by the blind chance at that level.

The core premise of ECMP is that such boundaries and
the reduced choice sets (enumerative classes) enclosed
inside do exist. And the above is the fundamental reason
for that to be so. That is why the QI way of modeling is
a fundamentally sound method for modeling "real life"
sequences, regardless of particular technical definitions
of redundancy, resilence to "surprises"... etc.

>> For example, the bit planes would be coded much
>> better if one were coding separate fragments in
>> each bit plane, conditioned on the bit values above

>> (in the higher level bit planes),...
>
> Apparently, you haven't yet read thru the code completely

> since this is exactly what happens for the MQ *and* the
> QI implementation. Grey coding is exactly that: It
> "predicts" the value of the current bitplane with that
> of the bitplane above.

You didn't exactly hide it in the code, which says so
plainly:

** bitplane decorrelation by grey-coding
*/
code ^= (code >> 1);
/*

The problem is that Gray code is only a heuristic which may or
may not work, even as just an approximate decorrelator. And
it certainly is not the exact decorrelator as the A-1 node full
tree decomposition to bit planes and their precise plane
fragments (cf [T1]) that I was suggesting. For AC, and
especially for MQ, the coder noise and excess, along with the
coarse grained parametrization, for the small outputs makes the
separate plane fragment coding out of reach. Hence, the whole
bitplane is melded into one coding input, with a little bit of
Gray code on top, as a 'lucky charm'.

Just consider what the "melding" of the two plane fragments
(these are normally interleaved) F0=000...0 and F1=111..1 of n
bits each does to the output size: the 2*log(n) bit output blows
up into a 2*n bit output (which is the phenomenon illustrated in
the row "Vary" in [T4] and p.10 [T3]). These kinds of problems
with "melding" of exact bit plane fragments into a single bit
plane are even more pronounced in the image coding.

As pointed out in (A2), the high precision, low noise coder
allows one to code optimally these kinds of small (which are
also often interleaved) bit plane fragments, provided one does
no rounding up of the encoded fragments to the next whole bit,
or god forbid, to the next whole byte. These AC rounding up
habits are mostly automated unconscious actions since the AC
index normalization to 1.0 obliterates the index 'bit fraction',
hence the concept of 'index bit fraction' is completely foreign
to most AC programmers (unless they read Rissanen's early AC
papers [5],[27], where the codeword bit fractions and codeword
overlaps have a key part in the story on generalized Kraft
inequality).

The neat part with the enumerative coding of bit plane fragments
is that the original symbol counts give the exact bit fragments,
their uncompressed interleaved layouts, their 'counts of ones'
and their precise compressed sizes (so one can e.g. code them in
one pass into precomputed tightly packed back-to-back slots
ready to ship).

>> Due to non-stationarity, the key problem is how to get
>> a good segmentation.
>
> Yes, sure. For MQ, this is not a problem since it is
> an adaptive AC coding algorithm: It "finds" the
> statistics by slowly adapting into it, and by using
> a "fast attack" entry cycle to it is able to find the
> initial distribution quickly.

There is no basis to claim that "adaptive" way of determining
the segment boundaries is the "best" or even a "good" way to do
it, in any sense, let alone in every sense. It is just one way,
a round about and imprecise, done in half hearted nudges (it's
also done bit by bit in the middle of the coding loop, hence
dragging the speed down), to state a very simple fact such as:
this is where segment S0 ends and where segment S1 starts, which
is how descriptive modeler would say it, once it determined the
segment boundaries in a straightforward manner (e.g. via the
methods suggested earlier).

Finding it and transmitting it 'adaptively' is "simple" because
it is so coarse and leaky and, even more importantly, because
someone had already thought it up decades ago and it was
implemented in the coders, taught to students and it comes ready
to use, so one doesn't need to invent it. In every other
respect, it is inferior to directly measuring the property of
interest in a way suited to the property itself, and then
plainly reporting the precise results in terms closest to the
quantity measured.

-- References ( http://www.1stworks.com/ref/RefLib.htm )

28. L.D. Davisson "Universal noiseless coding"
IEEE Trans. Inform. Theory IT-19 (6), 783-795, 1973
http://cg.ensmp.fr/~vert/proj/bibli/local/Davisson1973Universal.pdf

33. R. Krichevsky, V. Trofimov
"The performance of universal encoding"
IEEE Trans. Inform. Theory IT-27 (2), 199-207, 1981

http://cg.ensmp.fr/~vert/proj/bibli/local/Krichevsky1981performance.pdf

12. J.C. Lawrence "A new universal coding scheme for the
binary memoryless source"
IEEE Trans. Inform. Theory IT-23 (4), 466-472, 1977
http://citeseer.ist.psu.edu/lawrence77new.html

T1. R.V. Tomic "Fast, optimal entropy coder"
1stWorks TR04-0815, 52p, Aug 2004
http://www.1stworks.com/ref/TR/tr04-0815b.pdf

T2. R.V. Tomic "Quantized indexing: Background information"
1stWorks TR05-0625, 39p, Jun 2005
http://www.1stworks.com/ref/TR/tr05-0625a.pdf

T3. R.V. Tomic "Quantized Indexing: Beyond Arithmetic Coding"
arXiv cs.IT/0511057, 10p, Nov 2005
http://arxiv.org/abs/cs.IT/0511057

T4. One page summary of [T3]
http://www.1stworks.com/ref/TR/DCC2006.pdf

M1. -- Summary of QI algorithmic advances (A1)-(A4)
http://groups.google.com/group/comp.compression/msg/27c6f329038a1bdc

SF. Santa Fe Institute (of "Complexity science")
http://www.santafe.edu/

Thomas Richter

unread,

Feb 1, 2006, 10:24:34 AM2/1/06

Hi,

> > Look, I don't want to diminish your work, and that's why this
> > "nothing but" is in quotes.

> I am only concerned about misunderstanding. It tells me that,
> however unlikely:), I might not be explaining it well (that is
> indeed a complaint from almost everyone except for the few who
> have spent lots of time and effort on the very same circle of
> the EC problems, the coding or modeling aspects). In a way it
> is a good sign, usually indicating some distance from the
> ordinary. (Of course, there are two directions away from
> the middle.)

I know I'm slow. Don't worry. Up to now, all information comes thru
to the inner parts of my brain. It just requires a bit to diffuse
there.

> > However, that's the QI coder model

> ECMP: The QI "model" (modeling pattern, [T2] p.27) is to
> partition (which is some form of decomposition) of the
> input sequence into elements of some "enumerative classes"
> (sets with ranking functions), followed by ranking of
> the elements within their classes and encoding of the
> class identifiers (class tags). The latter portion of
> the output, can invoke [ECMP] with 'class tags' as
> a new and separate input (iterative scheme called
> "entropy pump" in [T2]).

Yes, understood.

> > and the question remains why this is a good model.

> Assuming the correct "this", the [ECMP], Davisson has answered
> [31] this question over three decades ago: It is because
> enumeration is "minimax universal". More specifically, consider
> a set of all (decodable:) codes C(n,z) (where z labels different
> code sets, or coders which compute them) for sequences limited
> to n symbols, and where these n-sequences are emitted by some
> parametrized class of sources S(n,t) (t=souce parameters). Now
> you obtain the average redundancy for each coder 'z' for some
> parameters t, the value R(n,z,t) in bits per symbol, then find
> the source parameters value tz for which the average redundancy
> R(n,z,tz)=[def]=R(n,z) is the largest for that coder (the "max"
> part of minimax) i.e. R(n,z) is the worst average redundancy for
> a given coder z over all source parameters t. Now you pick the
> coder zm with the lowest R(n,z) among all possible coders z,
> R(n,zm) =< R(n,z) for all z (the "min" part of minimax). If
> R(n,zm) then goes to 0 when n->infinity, then this coder zm (or,
> equivalently, its codes) is called minimax universal. And this
> zm, the "most desirable" among them all, is the exact enumerative
> coder, for variety of source types, including the arbitrary
> finite order Markov sources ([28],[33]).

There is one thing in this argumentation chain I don't understand:
The minimax optimizer (the enumerative coder) talks about encoding
messages of size n, and as such, is able to encode messages of size
n. In which sense is it optimal for a *random source* where the
characteristics of the latter is only found for n->infinity? Do
you mean "the expectation value of the redundancy" then?

What are the "source parameter" here for the Markov setting? The
conditioned probabilities of the Markov chain p(x=X_n|X_n-1,...X_n-k)?
If so, then enumerative coders will provide optimal redundancy for the
"worst possible" Markov source (likely: No correlation?) If so, then
what is the power of the theorem, because it doesn't tell me that
enumerative coding is best for "interesting" sources.

> That's all without even getting to the QI's advantages in
> speed and power consumption (A1), in output stability and
> availability (A4), in richer and finer grained parametrization
> of finite sequences (A2) (cf. [M1] for A1-A4).

Again, don't argue with speed, because you're arguably wrong here, and
I think the problem has been demonstrated: It requires, too, time to
setup the enumerative classes for the QI, and to make use of the QI
enumerative coding for a compression gain, you *do* need to do
that. Thus, if there is a speed benefit, it is "eaten up" by the
modelling cost. So please, I think that a) it is at this time completely
unnecessary to argue about speed, because QI isn't yet evolved enough
to make this an interesting argument, and b) because in realistic
scenarious, it is *not* faster, as demonstrated, because more brain
power has to go to other directions.

/* snip */

> The enumerative approach to modeling (ECMP) seeks to
> identify the boundaries of these reduced choice spaces,
> within which the number of choices is much smaller than
> all that is allowed by the blind chance at that level.

It remains to be answered why then this is a good paradigm
for encoding of realistic sources. I have seen your Markov
chain argument, though, but I do not quite understand in which
sense does hold. And, *if* one accepts that non-stationary
random sources are a model for realistic sequences in audio or
video coding, then, what carries over?

> The core premise of ECMP is that such boundaries and
> the reduced choice sets (enumerative classes) enclosed
> inside do exist. And the above is the fundamental reason
> for that to be so. That is why the QI way of modeling is
> a fundamentally sound method for modeling "real life"
> sequences, regardless of particular technical definitions
> of redundancy, resilence to "surprises"... etc.

Is it? I do not feel convinced; for example, if I have an audio
signal, it is likely much more convincing to think of parts of the
signal generated by white noise. And even *if* I want to think in
enumerative classes, one major problem remains, namely: To identify
these classes. (The major QI problem, in fact.) It is not effective
(as demonstrated) that classical modelling techniques (context
modelling, prediction,...) are useful here. You proposed a couple of
examples how to find such classes (e.g. split the sequence by gradient
methods,...) but all of them are either algorithmically hard, or only
approximations (heuristics). Then, the question is again: In which sense
is such a combined method optimal? I might find optimal, but NP-complete
versions (thus, not really usable in practical applications), or
polynomial algorithms that then do not work much better than AC.

> > Apparently, you haven't yet read thru the code completely
> > since this is exactly what happens for the MQ *and* the
> > QI implementation. Grey coding is exactly that: It
> > "predicts" the value of the current bitplane with that
> > of the bitplane above.

> You didn't exactly hide it in the code, which says so
> plainly:

> ** bitplane decorrelation by grey-coding
> */
> code ^= (code >> 1);
> /*

Not "hidden", not fully commented out, though. (-: Primitive, sure.
But it is in so far effective as for "continuous" behaivour of
the input, only one bit changes for an increment of +/-1 of
the source.

> The problem is that Gray code is only a heuristic which may or
> may not work, even as just an approximate decorrelator.

Definitely. The code is not claiming to be optimal as it stands.
It is claiming to be simple enough.

> And it certainly is not the exact decorrelator as the A-1 node full
> tree decomposition to bit planes and their precise plane
> fragments (cf [T1]) that I was suggesting.

Ok, so the question would rather be: Given that this is implemented,
I still do not see why QI should perform better. If I *remove* the
bitplane dependency in the context modelling with MQ, I still get
better results. /-:

> For AC, and
> especially for MQ, the coder noise and excess, along with the
> coarse grained parametrization, for the small outputs makes the
> separate plane fragment coding out of reach. Hence, the whole
> bitplane is melded into one coding input, with a little bit of
> Gray code on top, as a 'lucky charm'.

Pretty much "ad hoc". Sure.

> Just consider what the "melding" of the two plane fragments
> (these are normally interleaved) F0=000...0 and F1=111..1 of n
> bits each does to the output size: the 2*log(n) bit output blows
> up into a 2*n bit output (which is the phenomenon illustrated in
> the row "Vary" in [T4] and p.10 [T3]). These kinds of problems
> with "melding" of exact bit plane fragments into a single bit
> plane are even more pronounced in the image coding.

I don't understand. What is "n"?

> >> Due to non-stationarity, the key problem is how to get
> >> a good segmentation.
> >
> > Yes, sure. For MQ, this is not a problem since it is
> > an adaptive AC coding algorithm: It "finds" the
> > statistics by slowly adapting into it, and by using
> > a "fast attack" entry cycle to it is able to find the
> > initial distribution quickly.

> There is no basis to claim that "adaptive" way of determining
> the segment boundaries is the "best" or even a "good" way to do
> it, in any sense, let alone in every sense. It is just one way,
> a round about and imprecise, done in half hearted nudges (it's
> also done bit by bit in the middle of the coding loop, hence
> dragging the speed down), to state a very simple fact such as:
> this is where segment S0 ends and where segment S1 starts, which
> is how descriptive modeler would say it, once it determined the
> segment boundaries in a straightforward manner (e.g. via the
> methods suggested earlier).

No, it is not precise, of course. It is not provably optimal in
any sense (except for some artificial sexamples maybe), but
it is nevertheless a very practical thing to do. Getting *exact*
segment boundaries and finding the optimum there is a hard
problem. Thus, to argue here about coding speed that is dragged
by the context adaption goes completely in the wrong direction: Sure
it drags a bit, but not even closely as much as identifying the
segments would. So we're down at the "useful but not provable" and
"provably optimum" discussion, and *at that level*, please do not
argue about speed; it's the wrong game for that.

> Finding it and transmitting it 'adaptively' is "simple" because
> it is so coarse and leaky and, even more importantly, because
> someone had already thought it up decades ago and it was
> implemented in the coders, taught to students and it comes ready
> to use, so one doesn't need to invent it. In every other
> respect, it is inferior to directly measuring the property of
> interest in a way suited to the property itself, and then
> plainly reporting the precise results in terms closest to the
> quantity measured.

But *obtaining* the precise results is, then again, a hard thing to
do. It sounds so simple if you say you just "plainly" need get the
results. (Aka Monty Python's instructions how to play a flute: Blow
in at one end, move fingers over the holes: Correct, but useless.)

Thus, I afraid, if you want to convince anyone here that you can play
a flute - err, compress messages - you definitely would need to
demonstrate with non-artificial, real-life sequences because *much more*
problems come to play here.

So long,
Thomas

nightlight

unread,

Feb 2, 2006, 3:32:37 AM2/2/06

> The minimax optimizer (the enumerative coder) talks
> about encoding messages of size n, and as such, is
> able to encode messages of size n. In which sense
> is it optimal for a *random source* where the
> characteristics of the latter is only found
> for n->infinity?

The largest source class for which Davisson [28] shows
EC to be minimax universal is "composite source", which
is a source that "randomly" selects probability p (unknown
to coders) and remains at that p for n->inf. All values
of p are considered equally likely. Hence his result is
relatively narrow. The KT81 paper [33] covers more source
classes (and coding methods), including stationary finite
order of Markov sources for which Davisson [28] shows
only weaker types of minimax universality via the
"histogram method" of EC coding. { That method is a
variant of an earlier (1966) Fitingoff algorithm and
its improvements by Babkin (cf. [37] p. 51-52). }

> MODELING SPEED:... problem has been demonstrated:

> It requires, too, time to setup the enumerative classes
> for the QI, and to make use of the QI enumerative coding
> for a compression gain, you *do* need to do that. Thus,
> if there is a speed benefit, it is "eaten up" by the
> modelling cost.

Well, what was demonstrated is that the partition into
enumerative classes for various types of sources, especially
non-stationary ones, is an open research problem. There is
no a priori reason that a segmentation at least as good as
AC's (implicit in its adaptation algorithm) for the type of
audio data you tested, should be any slower than the AC
modeler.

In general, though, all one could say in advance is that QI
modeling should be faster due to QI's cleaner division of
labor between coder & modeler. Namely, the QI modeler is
_not constrained_ to produce its output for the coder symbol
by symbol, as it is with AC modeler. Since the less
constraint implies more freedom for the modeling algorithm,
on that basis alone the QI modeler should do the same job
faster if one takes advantage of that freedom (and the
resulting economies of scale). Consider for example the BWT
method of separation of contexts of unlimited order vs
unlimited order PPM. BWT will do it much faster since it is
not constrained to produce probability distribution for all
possible symbols on each step or to produce any probability
at all (or to deal with zero frequency or esc probabilities
problems either), But it will produce segmentation, exactly
of the kind on which QI can work directly (after it decides,
via the segment merge or split iteration, which of the
segment boundaries provided by BWT to keep and which to
remove; that is similar to non-stationary source
segmentation discussed, except that here QI doesn't need
to compute available boundaries).

But the EC/QI native segmentation algorithm, in the style of
BWT, for that kind of source still needs to be invented. It
may be something similar to the segmentation variants I
experimented with, as sketched in the earlier response. Or
it could be something much better (in output quality &
speed), provided one takes advantage of the nature of audio
data i.e. that you have a "small" number of harmonic
oscillator sources, which are activated quasi-peridically
(or in a poissonian manner), by being "slowly" phased in and
out, superposing their outputs with some low amplitude
gaussian noise.

> I have seen your Markov chain argument, though, but I do
> not quite understand in which sense does hold.

The form described earlier was only for stationary & "known"
Markov source (i.e. no sampling is needed to determine order
or to establish stationarity; it may but need not know what
the stationary probabilities are). For that type of source
it should produce the optimum output, since that is same
as the multi-alphabet case, except that instead of code
tree for A symbols it has a transition graph with M=A^m
nodes (for order m source, alphabet size A), each node
with A "in" and A "out" branches. Each node is outputing
to its probability class index. Multiple nodes may output
to the same index if probabilities are close enough (e.g.
within 1/sqrt(n) distance for node input of n symbols). If
the probabilities are not known (still assumed to be
stationary), but sampled instead, the number of
probability classes may start with a single class and
increase as the probabilities for different states
become statistically distinguishable.

In practical implementation one could use fixed depth
variant of BWT which should work much faster than processing
contexts symbol by symbol, outputing to different indices
and jumping on each symbol to the next state. One would only
make sure that nodes in the same probability classes are
assigned BWT alphabet values next to each other. The encoder
can establish this in the 1st pass, when counting context
occurrences, then sort the empirical probabilities, then
decide how many classes it will use and which are the
contexts in each class, then assign BWT alphabet so that
contexts in the same class get consecutive BWT symbol values.

Note also that as far as index computations are concerned,
each set of nodes in the same probability class, is a
single stationary source. Hence, all the regular QI
coding economies apply, no coding operations on MPS and
simpler coding operations on LPS.

> And, *if* one accepts that non-stationary random sources
> are a model for realistic sequences in audio or video
> coding, then, what carries over?

The non-stationary Markov source problem is similar to
a set of non-stationary order-0 sources. The non-stationarity
will normally double the size of model information (class
tags, i.e. the counts), as it does with order-0 sources
(e.g. where counts of 1's get coded into flat log(n+1) bits).

> And even *if* I want to think in enumerative classes, one
> major problem remains, namely: To identify these classes.
> (The major QI problem, in fact.) It is not effective
> (as demonstrated) that classical modelling techniques
> (context modelling, prediction,...) are useful here.

The identification of classes for non-stationary source
is the segmentation problem already discussed. There
is nothing magical (or optimal in any sense) about the
adaptive way to identify such segments. The adaptive
method in effect draws some curve with small absolute
value of the slope as a boundary. That is a very coarse
kind of boundary (as test array "Vary" illustrates).
A descriptive modeler can obtain such quality boundary
as cheaply as adaptive, probably cheaper. But it also
has other options available (as sketched before) to
get a much more faithful boundary.

> but all of them are either algorithmically hard,
> or only approximations (heuristics).

The boundaries implicit in the adaptive algorithm for non-
stationary sources are much more approximate than even the
simplest quick n' dirty split methods sketched earlier (as
illustrated in the row "Vary", [T4]). The only way for
adaptive algorithm to avoid such drastic failures is to at
least partially switch to descriptive mode (via some ad hoc
Esc probabilities). If one had to pick just one advantage of
descriptive over adaptive modeling, it would be precisely
for non-stationary sources. Regardless of various technical
definitions of minimax universality or any other kind of
resilience, the basic reason for such advantage is that the
effects of "surprise" (the segment boundaries) are sharply
delimited to the model information which is generally of
O(log(N)) size instead of affecting the main O(N) output.
With iterative modeling method ("entropy pump" in [T2] p.
27), the model information itself is modeled if it is
large enough, further limiting the effects of the
"surprise" to log(log(N)) ... etc.

> Ok, so the question would rather be: Given that this
> is implemented, I still do not see why QI should
> perform better. If I *remove* the bitplane dependency
> in the context modelling with MQ, I still get better
> results. /-:

To see what you would get with proper coding of bit
plane fragments, you simply need to compute the exact
multi-alphabet entropy on the second differences
of the raw audio samples (which is equivalent to
your linear interpolation). Non-stationarity
complicates computation of such entropy. One
would need at least some crude form of segmenting
instead of using fixed blocks, e.g. just using
some sliding average (of say, 32 symbols, or an
average with exponential decay, for example:
Avg[i+1]=x[i]+Avg[i]/4) and a few thresholds (e.g.
16 to match the 16 bit planes) to set the boundary
line whenever threshold is crossed. Note also that
this segmenting would work on full (16+2) bit
alphabet values (which are 2nd difference
values, hence they may need 2 more bits if the
raw samples span the full 16 bits range), not
on bits or bit planes. The entropies for 16 classes
of fragments would be computed separately. One
would also add the costs for 16 frequency tables
and for the segment boundaries.

>> Just consider what the "melding" of the two
>> plane fragments (these are normally interleaved)
>> F0=000...0 and F1=111..1 of n bits each does to
>> the output size: the 2*log(n) bit output blows

>> up into a 2*n bit output...

>
> I don't understand. What is "n"?

That is just a 'local variable', a symbol for that
paragraph, denoting some length of each fragment,
set to same value n for F0 & F1 to get a simple
expression on what the mixing entropy would look like.

>> to use, so one doesn't need to invent it. In every
>> other respect, it is inferior to directly measuring
>> the property of interest in a way suited to the
>> property itself, and then plainly reporting the
>> precise results in terms closest to the quantity
>> measured.
>
> But *obtaining* the precise results is, then again,
> a hard thing to do. It sounds so simple if you
> say you just "plainly" need get the results.

Obtaining a provably optimal segmentation may be hard. But
to beat the adaptive segmentation, even the crudest direct
methods (e.g. even just an option of deciding on 0-3 splits
of a default block size of say 1K; in real application all
such calculations can work from a small table of precomputed
entropies for the 4 block sizes and all valid counts for
each) will already work better than adaptive method.

You can compute the effect of that simple segmentation
without dealing with coders, by first computing sum of
L=log(C(n,k)) for the blocks selected (there is a function
logbc() in Swi.c). A rough decision to split a block in half
could use "if (L > L1+L2+log(n)) => split". During that
pass, which is on the whole bitplane data, you also collect
counts of values k (which are the counts of 1's in each
block), separately for the 4 block sizes allowed, and
similarly counts of block size indicators which have values
0-3. These last 5 frequency tables will give you 5
entropies whose sum is the cost of the model info, which
gets added to the binomial entropies computed earlier.
Hence after 1 pass over the main data, plus one pass over
the 5 tables you can obtain the effectiveness of that
kind of very simple direct segmentation for your data.

> you definitely would need to demonstrate with
> non-artificial, real-life sequences because
> *much more* problems come to play here.

I agree that this is important, especially for non-
stationary sources. But from several different angles,
be it from artificial samples and various entropy
measurements on artificial and some real data (from
video frame differencing), or minimax universality
results or general heuristic considerations and
estimates, all indicators are that the 'descriptive'
modeling (as described in [T3]), in combination with
a high precision coder such as QI, will work much
better (something like row "Vary") on the "real"
(less predictable) data than AC + adaptive methods.

-- References ( http://www.1stworks.com/ref/RefLib.htm )

28. L.D. Davisson "Universal noiseless coding"
IEEE Trans. Inform. Theory IT-19 (6), 783-795, 1973
http://cg.ensmp.fr/~vert/proj/bibli/local/Davisson1973Universal.pdf

33. R. Krichevsky, V. Trofimov
"The performance of universal encoding"
IEEE Trans. Inform. Theory IT-27 (2), 199-207, 1981

http://cg.ensmp.fr/~vert/proj/bibli/local/Krichevsky1981performance.pdf

37. V.N. Potapov "Theory of Information"
(in Russian), Novosibirsk Univ. 1999
http://www.1stworks.com/ref/posob.pdf

nightlight

unread,

Feb 2, 2006, 3:49:15 AM2/2/06

-- Typo correction:

> (of say, 32 symbols, or an average with exponential decay,
> for example: Avg[i+1]=x[i]+Avg[i]/4)

The last expression should say:

Avg=x[i]+Avg*3/4

or more generally (for quickly computable exponential decay averages):

Avg=x[i]+Avg*(2^s-1)/2^s

for s=1,2,... (the longer s the longer the memory of Avg). For
stationary aray x[i]=x, the Avg stabilizes as Avg -> x*2^s.

Nicolas Le Gland

unread,

Feb 2, 2006, 6:08:16 AM2/2/06

So far, nightlight has posted 93 messages for a total amount of 624 495
bytes (headers included). KZip was able to compress it down to a 302 948
bytes archive, WinRAR achieved 136 062 bytes, whereas 7-Zip using PPMd went
down to 114 571. From the 696 320 bytes TAR archive, arb255.exe was able to
code all this prattling in 448 924 bytes. Impressive.

What I don't understand is why does he spends so many time quoting thousands
of lines of text and why I can't download a simple "nightlight approved"
test executable I could run on my files, like "qienc.exe news.tar
news.tar.qi" and "qidec.exe news.tar.qi news.tar". I couldn't care less for
compression speed, all I would have liked to was being able to feel it's
"always better than arithmetic" compression.

Nicolas Le Gland
- Not being a native speaker, but a natural born punk, all mistakes and
typos were intended to serve general amusing.

nightlight

unread,

Feb 2, 2006, 11:43:32 AM2/2/06

> So far, nightlight has posted 93 messages for a total amount
> of 624 495 bytes (headers included). KZip was able to compress
> it down to a 302 948 bytes archive, WinRAR achieved 136 062 bytes,

> whereas 7-Zip using PPMd went...

Our wise elders would say:
http://www.google.com/search?hl=en&q=%22margaritas+ante+porcos%22&btnG=Google+Search

> why I can't download a simple "nightlight approved"
> test executable I could run on my files, like "qienc.exe news.tar
> news.tar.qi" and "qidec.exe news.tar.qi news.tar".
> I couldn't care less for compression speed, all I
> would have liked to was being able to feel it's
> "always better than arithmetic" compression.

What the [QIC] kit is meant to show is described here:
http://groups.google.com/group/comp.compression/msg/27c6f329038a1bdc

To whom and why:
http://groups.google.com/group/comp.compression/msg/6d8dcafd8b947ea1

> all I would have liked to was being able to feel it's
> "always better than arithmetic" compression.

For that, you don't even need the source or executable test. Just a bit
of math to work through [T3], especially p. 8, for QI and Volf's or
Tjalken's thesis for in depth AC formulation, plus Ryabko for finite
precision AC redundancy. In [T2] pp. 19-25, you can see that QI and AC
are calculating exactly the same codewords (addends), except that QI
does most of the work outside of the coding loop, where it can afford
the time to construct the shortest codewords that exist (and that are
still decodable, eq. (20) [T3], [T4]) at given finite precision.

36. T.J. Tjalkens "Efficient and fast data compression codes for
discrete sources with memory" Ph.D. thesis, Eindhoven University of
Technology, Sep 1987
http://alexandria.tue.nl/extra3/proefschrift/PRF5B/8709277.pdf

41a. P.A.J. Volf "Weighting Techniques In Data Compression: Theory
and Algorithms" Ph.D. thesis, Eindhoven University of Technology, Dec
2002
http://alexandria.tue.nl/extra2/200213835.pdf

41b. B. Ryabko, A. Fionov "Fast and Space-Efficient Adaptive
Arithmetic Coding"
Proc. 7th IMA Intern. Conf. on Cryptography and Coding, 1999

T3. R. V. Tomic "Quantized Indexing: Beyond Arithmetic Coding"

arXiv cs.IT/0511057, 10p, Nov 2005
http://arxiv.org/abs/cs.IT/0511057

T4. One page summary of [T3]
http://www.1stworks.com/ref/TR/DCC2006.pdf

T2. R. V. Tomic "Quantized indexing: Background information"

Sachin Garg

unread,

Feb 2, 2006, 12:46:23 PM2/2/06

nightlight wrote:
<ramaining post snipped>

> In practical implementation one could use fixed depth
> variant of BWT which should work much faster than processing
> contexts symbol by symbol, outputing to different indices
> and jumping on each symbol to the next state. One would only
> make sure that nodes in the same probability classes are
> assigned BWT alphabet values next to each other. The encoder
> can establish this in the 1st pass, when counting context
> occurrences, then sort the empirical probabilities, then
> decide how many classes it will use and which are the
> contexts in each class, then assign BWT alphabet so that
> contexts in the same class get consecutive BWT symbol values.

I only knew of one BWT, which is used for data compression. What do you
mean by "fixed depth variant"?

btw, once you mentioned QI being a natural backend for BWT. If that is
correct, then BWT+QI can be very useful for pratical purposes and also
prove practical usefulness of QI. (I presume by natural you meant
better and faster compression)

Why not just implement that... is there any part of it which you havent
figured out yet?

Sachin Garg [India]
http://www.sachingarg.com

nightlight

unread,

Feb 2, 2006, 1:35:12 PM2/2/06

> I only knew of one BWT, which is used for data compression.
> What do you mean by "fixed depth variant"?

That is Michael Schindler's BWT variant, where only the first k columns
are sorted (where k is much smaller than the array size n). See the
refs below:

http://www.compressconsult.com/st/
http://www.google.com/search?hl=en&q=schindler+bwt&btnG=Google+Search

> Why not just implement that... is there any part of it which
> you havent figured out yet?

If I had only QI to work on over the last year and half, that would
have been done. But with all the other demands on my time, other than
some experiments to measure potential gains, the real work on it will
have to wait, unless someone else takes interest in that in the
meantime. That's why the source & tech reports were released and ideas
shared here and elsewhere -- there are so many interesting unexplored
paths opened up by the arrival of high precision, low noise coding
algorithm and the associated 'descriptive' modeling pattern, that no
single or few persons, even working on it full time, could even begin
to cover.

David A. Scott

unread,

Feb 2, 2006, 10:10:58 PM2/2/06

"nightlight" <night...@omegapoint.com> wrote in
news:1138898612.6...@o13g2000cwo.googlegroups.com:

>> all I would have liked to was being able to feel it's
>> "always better than arithmetic" compression.
>
> For that, you don't even need the source or executable test. Just a
> bit of math to work through [T3], especially p. 8, for QI and Volf's
> or Tjalken's thesis for in depth AC formulation, plus Ryabko for
> finite precision AC redundancy.

You know maybe if it wasn't so easy to write an arithmetic compressor
that actaully beats what nightlight claimed was the most fair test.
One might think there is some gems in his long posts. But from the fact
that it fails to do as claimed its no wonder nightligt is trying to
steer you away from scource code or executables where one can run ones
own tests on files and see the facts themselves. Nightlight does not
really want you to think for yourself. He is more than willing to
do that for you.

So the "always better than arithmetic" compression is like
a lot of his logic it's just plain wrong. He has quoted so many texts
over and over written before his QI I am surprised he hasn't quote
the Holy Books to support his postions I am waiting to see which
Holy Book he first uses to support his claims anyone have a guess?

nightlight

unread,

Feb 3, 2006, 7:09:29 AM2/3/06

> You know maybe if it wasn't so easy to write an arithmetic
> compressor that actaully beats what nightlight claimed was
> the most fair test.

I may have missed that post that reported some AC (on
that test A=3, N=10^6) get better figure than the QI.exe
from the QI source kit. Can you tell me which AC did
that and how many bits did it produce on average for
this set?

To remind you, QI figure on the set of all strings of N=10^6
symbols in alphabet size A=3, is 1.6163e-4 bits above
the lower bound N*log(3) bits, for _every_ string in the set.
The QI.exe source which produces this result is publicly
available at:

http://www.1stworks.com/ref/qi.htm

and the QI.exe command line which runs the test on 1000
random samples from the set of all strings A=3, N=1e6 is:

QI.exe i1000 n1e6 cr3

Since any result with average on this set below N*log(3)
bits would not be decodable, the margin for any coder to
beat QI is that little gap of 1.614e-4 bits between the QI
result and the N*log(3) lower bound.

Note that this lower bound N*log(3) does not include the
cost of coding N and A values i.e. it is a lower bound for
all coders which are given these parameters (which define
the set of all possible messages in the test) as the input
on encode & decode (along with the raw or compressed
data). To normalize the test for different coding conditions
e.g. where coders don't receive N and A as input parameters,
one would need to add the same cost of N and A to the
coders' output sizes (if it doesn't include it) _and_ to the
lower bound N*log(3) (since it does not include it).

Any AC (or any other coder) with claimed smaller output
needs to show a better average on at least 1000 randomly
selected inputs (with random seed for the sample generator
allowed to vary from test to test) from that set and come with
source that backs up the claim. I would surely be curious
to see an AC, or any other coder which can do it.

Note though that even the infinite precision AC will produce
an average which is between 1 and 2 bits above N*log(3),
hence for N=10^6, the exact AC can't get closer than QI
(it would take a string more than 6 thousand times longer
than N=10^6 symbols before the QI redundancy will reach
1 bit above the N*log(3) limit, at which point the infinite
precision AC will a theoretical chance, albeit astronimically
small, to break even with QI).

The exact enumerative coder would produce a smaller
output than QI, but it would need to do O(N) multiplications
with arithmetic precision of over 1.5 million bits (QI codes it
using O(N) multiplications in 32 bit precision, which is of
course what the key advance was).

As to the test QI vs MQ reported in this thread, however valid
the idea of trying to test on real world problem, in its present
realization it is equivalent of having a race where each racer is
sent off from their home in the morning to go run a mile somewhere,
and the referees with stop watches drive off in the evening, after
their work, to visit each racer's home and check whether the racer
is back yet so they can press the stop buttons. In other words, the
noise in the test was far greater than the signal being measured
(which is the coder differences in speed and coding efficiency).

Message has been deleted

Nicolas Le Gland

unread,

Feb 3, 2006, 8:24:10 AM2/3/06

> To remind you, QI figure on the set of all strings of N=10^6
> symbols in alphabet size A=3, is 1.6163e-4 bits above
> the lower bound N*log(3) bits

To remind you, arb255 on the set of all your posts of N=696320 bytes in TAR
size is 448924 bytes, but I can't find a way to test QI on that specific
input and decode it back.

> In other words, the noise in the test was far greater than the signal
> being measured (which is the coder differences in speed and
> coding efficiency).

In other words, the noise of quoted text in your posts is far greater than
the signal of working compressor and decompressor we could try (which is the
end of any fun made of you and recognition for your work).

nightlight

unread,

Feb 3, 2006, 8:38:38 AM2/3/06

Non sequitur.

David A. Scott

unread,

Feb 3, 2006, 9:10:11 AM2/3/06

nightlight <nightlight...@and-this.omegapoint.com> wrote in
news:OJidnQMA864y1X7e...@rcn.net:

> > You know maybe if it wasn't so easy to write an arithmetic
> > compressor that actaully beats what nightlight claimed was
> > the most fair test.
>
> I may have missed that post that reported some AC (on
> that test A=3, N=10^6) get better figure than the QI.exe
> from the QI source kit. Can you tell me which AC did
> that and how many bits did it produce on average for
> this set?

I really don't think you missed it. I supplied code
that will compress any file of the stated three symbols
bijectively without gaps to another file. I realize that
QI is so inflexable that you could not port your QI to actually
work with test files that anyone could use. After all it would
take you a very very long time for such a task. I am quite sure
you have looked at my code. You have alluded to the challange
or test in some of your voluminous posts. So you know it exists.

You give a figure for what you cliam it would take to exactly
code this for the the case of N=10^6 Its quite possible you write
so much you forget this exact number. You may have brushed it
aside since the whole concept of gapless arithmetic compression
appears to be something you don't see as possible. Even though
I have working code. But I admit I am not god nor do I have the
same faith in code you have. Murphy has bit me in the ass before.
I do feel confident that you have people pouring over my code
maybe you found a file that take one byte more its sure is nice
to have code that works with files so anyone can test it. Have
your people found such a file if so could I or others here see it?

But I know you seem able to compress data to buffers of some
fixed length. So I added the three bytes needed to do a poor
nonoptimal conversion to the length field so one would see how
much space the data could take in a buffer.

One possible partly relavent post of your is
http://groups.google.com/group/comp.compression/browse_frm/thread/7053c23c0
d01c81c/1e015f38d228969e?lnk=st&q=1584993&rnum=1&hl=en#1e015f38d228969e

QUOTE ON
Hence the total decodable output for N=10^6 symbols, A=3 is:

DECODABLE OUTPUT = 1584993 bits = 198124.125 bytes

That size is fixed, the same for all inputs.
QUOTE OFF

My post is
http://groups.google.com/group/comp.compression/browse_frm/thread/7053c23c0
d01c81c/299934202277c192?q=198%2C124&rnum=1#299934202277c192

QUOTE ON
http://bijective.dogma.net/nitelite.zip

is way off base and doesn't seem to compress to
1584992 bits or 198124 bytes. That is if I put it
to a buffer and included in a poor way the length
field. Actully the compressed file length was
198121 bytes.
QUOTE OFF

That was actually my last post on that long thread. I give up
on trying to get you to actually think in that thread. You only
seemed interesting in going off on tangents Maybe now you can more
open. It was truely depressing to see you arguing about the fact that
the test cases I choose which where designed to stress the coder and to
ensure that it was actaully compressing each sumbol to log 3 bits.
Which is what it was all about. You seemed to mock it for not compressing
as small as the zip file that was supplied. I think many people may have
been laughing at you. Since the code actually compresses all such files
the same lenght. Something zip could not do. In fact since by the rules
you laid down for this case all such files should compress to roughly the
same size. All random ones would. Why you made such an error is really
hard to understand maybe your under to much stress and should take it
easy for awhile.

> Any AC (or any other coder) with claimed smaller output
> needs to show a better average on at least 1000 randomly
> selected inputs (with random seed for the sample generator
> allowed to vary from test to test) from that set and come with
> source that backs up the claim. I would surely be curious
> to see an AC, or any other coder which can do it.

You have seen such a coder. It also worrys me that from this post
it seems appearant you haven't had to do much real testing. I have it
was my job. Frequently we got code from contractors that went on
and on how much random testing they did. And they went on an on
about how good it was. Well I was damn good at testing code I
could usually write a test quickly that would show just how bad the
contractor code was. It seems you fall for the same trap as those
with little experience. You need to test extremes. Try to take
a deep breath and relax for a second. First of all random is very hard.
But you pick one sequence from your so called random output. That's a
unique file since each symbol independent of each other and each
is equally likely is as likely as a file of the same length where
each symbol is the same. If you can't see that for this case
I can see why you have trouble testing code.

>
> Note though that even the infinite precision AC will produce
> an average which is between 1 and 2 bits above N*log(3),
> hence for N=10^6, the exact AC can't get closer than QI
> (it would take a string more than 6 thousand times longer
> than N=10^6 symbols before the QI redundancy will reach
> 1 bit above the N*log(3) limit, at which point the infinite
> precision AC will a theoretical chance, albeit astronimically
> small, to break even with QI).

Please give it a rest. For a file 6000 times longer than
N = 10^6 my arithmetic would hardly burp. It would just take
6000 times as long. But could QI even start to attack such
a case. How much memory would it take just to store the lattice
point values. Do you actually try to think these thing out.

>
> The exact enumerative coder would produce a smaller
> output than QI, but it would need to do O(N) multiplications
> with arithmetic precision of over 1.5 million bits (QI codes it
> using O(N) multiplications in 32 bit precision, which is of
> course what the key advance was).

Haven't your ever coded an arithemetic to get the index number.
You can easily do it with very low pecision airthmetic coders.
Getting the index in a useful form is much easier than what you
suggest. Since most times the index is not a power of 2 you write
out either an n or n+1 bit string. It would actually represent a
complete huffman tree. Many programers have mostly likely created
such code when ever a simple index of a combination is needed.
QI seems able to get this same index at least maybe for small
files faster than arithmetic but I don't see it getting a usable
index that is the n or n+1 bit string in a large set. And this
fact alone that arithmetic can get a useful index makes me really
wonder why you seem so taken with the index. Why would anyone use
QI to get the index when a simple arithmetic can do the same and
get it directly in a usable form. Take your own example where
you get the index of an 8 bit string. you get an index that is
a number bewteen 1 and 56 so what? Say I get 19 you can call it
exact but you still have to write the bits out arithmetic does this
without all the fuss.

nightlight

unread,

Feb 3, 2006, 1:23:06 PM2/3/06

(decodable under the test conditions that the calling parameters
of decode() do not include N, A)

>> DECODABLE OUTPUT = 1584993 bits = 198124.125 bytes
>

> is way off base and doesn't seem to compress to

> 1584992 bits or 198124 bytes. is way off base and

> doesn't seem to compress to 1584992 bits

> ,,, So I added the three bytes needed to do a poor

> nonoptimal conversion to the length field so one would
> see how much space the data could take in a buffer.

1. Those figures had different costs of N and A added. The 1st one had
30 bits added to index size, the 2nd one 24 bits. For the tenth time,
the decodable QI output size for the test in which decoder receives the
N and A values is 1.6163e-4 bits above the N*log(3). That's the gap you
need to fit in (bear in mind that not even the infinite precision AC
can do it for N=10^6).

2. More importantly the 2nd figure doesn't refer to the random sample
of inputs but to your five cherry picked files. Even zip squashes those
five files to over 200 times smaller size than entropy coder, which it
should not be able to do on any unbiased sample from our test set. As
you ought to know, any coder can be "enhanced" to compress your 5 files
in 3 bits each and still code the rest with only a 3 bit excess from
the "non-enhanced" version.

When you have a code that will compress any randomly selected ubiased
sample of 1000 or so inputs from the test set A=3 N=10^6, that comes on
average over such sample to within less than, say 20 bits, of N*log(3)
or QI (the two differ only in 1.6e-4 bits) you can report it and I will
take look at the code.

Also of interest would be figures for the min and max size in bits on
the sample, to give idea of variation and how much it would cost to be
able to traverse compressed components without decompressing or how
much padding one would need to reserve to fit in a fixed size record.
With QI the size on each input from the set is exactly the same, down
to the exact bit fraction (given via log(Q[N]), where Q[N] is the
mantissa of the last quantized power, 3^N).

> Please give it a rest. For a file 6000 times longer than
> N = 10^6 my arithmetic would hardly burp. It would just take
> 6000 times as long. But could QI even start to attack such
> a case. How much memory would it take just to store the
> lattice point values.

QI doesn't need to do it in one block, since the block boundary costs
it only log(e)/*2^31 bits per boundary (coded using the same mixed
radix codes from Radix.c as the original test, just with a different
radix which is the mantissas of the quantized power 3^B for the block
size B). With a slightly different outer loop than the one shown in the
QIC test function for fixed radix codes, you can code the N=10^6 test
with a 1000 entry table (each entry is an SW integer i.e. it uses 32
bit mantissa and 32 bit exponent), and the resulting 10^3 block
boundaries will add only about 6.7e-7 bits on the total output, which
is well below the 1.6e-4 bit redundancy (from the quantization of
alphabet size powers). As explained in the earlier post to Thomas
Richter on asymptotic optimality, using 2 or more blocks of size B
simply multiplies the redundancy of an equivalent single block code by
a factor (1+1/B).

I will include this multi-block variant of radix codes in the next
source release.

> Getting the index in a useful form is much easier than what you
> suggest. Since most times the index is not a power of 2 you write
> out either an n or n+1 bit string. It would actually represent a
> complete huffman tree.

Of course, Huffman can do it. Its average size for the test will be
1,666,667 bits which is 81704 bits more than QI (about 5.2% more than
QI).

Further, the Huffman output size will fluctuate between 10^6 and 2*10^6
bits. If you need to store it in a fixed record you need to reserve
2*10^6 bits, which is the same as uncompressed size. Similarly if you
need to package it with other compressed data and need to traverse or
skip over the compressed components without expansion, you need to
store compressed length for particular instance of the input. The AC
will have both of these problems as Huffman, albeit not as severe. But
getting the mathematically guaranteed maximum size AC will produce is
not simple.

With QI the output length is not just smaller than either, but it is
exactly the same for _every_ input from the "N=10^6 A=3" set, which
makes it easy to skip over or traverse compressed components or store
in the fixed size records without waste.

Note also that for the alphabet size much larger than 3, the AC will do
increasingly worse since its quantization redundancy grows
proportionately to A*N while QI's redundancy grows proportionately to N
only. With a 32-bit alphabet a 32-bit AC will do about as poorly as if
it never bothered to compress. I gave you the example earlier (which
runs with QI.exe already out there):

http://groups.google.com/group/comp.compression/msg/66aee67eefbd6753

---- Excerpt

Take some alphabet size A such that 2^31 < A < 2^32, for our N=
10^6 and run your AC on it. For example, I just ran QI.exe using
the command line (you are welcome to verify this):

QI cr3133061822 n1e6 i10

which is a test for A=3,133,061,822, N=10^6 and i=10 iterations
with randomly generated inputs (the iter doesn't matter for QI
since, as pointed out in (A4), QI produces always exactly the
same size, including the max index value which in this case will
always have the top 32 bits smaller than 0x8865BAF8) and the QI
output size for the index is:

Q = 31544926 . 09167... bits

which is about 2.422e-4 bits above the lower bound N*log(A) for
this index. That amounts to QI excess of 2.422e-10 bits per
symbol. You can run your AC on this A & N and tell me what you
get. Measure the total size and the maximum variation in size ...
...
Recall also that just storing 32 bits/symbol for N=10^6 will use
455,073 bits _more_ than the QI's output 31544927 in whole bits.
------

Try that (if your AC can handle 32 bit alphabets).

> Take your own example where you get the index of an 8 bit string.
> you get an index that is a number bewteen 1 and 56 so what?
> Say I get 19 you can call it exact but you still have to write the
> bits out arithmetic does this without all the fuss.

If you have just one component in the whole package that you have to
ship each index, one by one, in the whole number of bits each, then you
can still cut down on the average size. For example, for the index
0..55 you don't need to spend 6 bits on average. Instead you can
package the leading bits of QI mantissa (32 bits) in tapered Huffman
code, which will here use 5 bit codes for the first 8 values and 6 bit
codes for the remaining 48 values, hence yielding an average of 5.857
bits per index. Such packaging will give an average which is always
less than 0.086 bits from the exact fractional size of the full index.

If you have more than one component to package (as is often the case
with the decriptive modeling), then all the bit fractions will package
tightly (at less than 2e-10 bits cost per boundary) and only the total
needs to be rounded to the next whole bit via tapered Huffman code of
the final mantissa (of the combined mixed radix number).

Thomas Richter

unread,

Feb 3, 2006, 3:53:43 PM2/3/06

nightlight wrote:
>>>The uniform buffer generates no index, only count of 1's is
>>>saved (which is 0 or BlockSize). Hence qiEnc/qiDec functions
>>>should never be called with such counts.
>>
>>Yes, I figured this. It is avoided in the given code - it
>>doesn't cause trouble because the encoder receives the same
>>information over the side-channel. Similar, whether you
>>transmit the full count of ones or a separate flag and
>>the number of LPS doesn't matter. It's equivalent information.
>
>
> It is equivalent only on the first block coded. It becomes
> increasingly less equivalent for the subsequent blocks, where
> the previous blocks serve as a 'sample' for the later blocks.

All true, but counts only in so far as it says that the side-information
is compressible. Sure it is, but even if I remove this information, I
would still be longer with QI than with MQ. Sure that is because MQ is
an adaptive coder, whereas the block length of QI is not adaptive.
However, making AC adaptive is considerably easier than making QI adaptive.

> Tight packaging of many distinct pieces of the description at
> different levels, down to a 1/2^g bit fraction, is an essential
> element of the descriptive modeling scheme.

In which relation is this to the above?

> This kind of 1/2^g
> level bit-fraction packaging is vital for AC as well, as the
> Rissanen's 1st AC paper [5], on generalized Kraft inequality
> emphasized. But with AC it is done all at a single flat level,
> the one symbol codeword. Anything else that needs this kind of
> bit fraction packaging in AC is first converted to a symbol and
> then funneled down to meld in the middle the regular symbol
> stream. This gives rise to the strange problems and their ad hoc
> solutions, all peculiar to the AC way of coding, such as zero
> frequency problem and escape probabilities estimations.
>
> With QI/EC bit fraction packaging is done at the symbol level,
> too (except with the four times smaller quantization bit leak)
> and then it is propagated to the higher levels explictly. This
> is the result of QI's much cleaner division of labor, where each
> component does its complete specialized job entirely separately
> and on the entire input, from start to end, when it tidies it
> all up. Only then the next component takes over and does all of
> its work for all of the data and then tidies its output up.

I fail to see the "cleaner" part. In order to make this working, you
need to have a message of symbols from one single alphabet of one
length that is to be given to the decoder separately. It then finds
a short representation of the symbol permutation at hand, though it
neglects that certain permutations might be more probable than others -
by just encoding a description of a permutation, this kind of
statistical information is lost. AC has a much cleaner modelling
by allowing the explicit entries of the transition matrix of a Markov
chain (of higher order) within its encoding, and allows to combine or
merge two or more alphabets in one stage. I believe that first looking
for separation points in a message to split it into permutations to
encode these is a much worse division of labor if you assume to encode
Markovian sources.

So long,
Thomas

nightlight

unread,

Feb 3, 2006, 6:03:51 PM2/3/06

>> It is equivalent only on the first block
>> coded. It becomes increasingly less
>> equivalent for the subsequent blocks,
>> where the previous blocks serve as a
>> 'sample' for the later blocks.
>
> All true, but counts only in so far as
> it says that the side-information
> is compressible.

Well, that was the point. The difference
is that with AC, almost everything is melded
into single one dimensional stream, one bit at
a time, thus there is very little "side
information" and any consideration of its
compressibility is often neglected.

With predictive scheme the output has a multicomponent
structure and a hierarchy. The advantages are
flexibility, thus resilience (due to precise
boundaries for the side-effects of a "surprise"
to the small parts of output) and a finer grained
modeling, as well as the easier access to the
individual components or to their properties (such
as compressed size) without having to decompress
entire data. With the single melded monolith it is
all a black box until you expand the whole thing.

> Sure it is, but even if I remove this
> information, I would still be longer
> with QI than with MQ.

With the 16K blocks used for QI on very non-stationary
data and then rounding bit fractions to the next
whole byte it's a miracle it didn't come out larger
than the uncompressed data.

> Sure that is because MQ is an adaptive coder,
> whereas the block length of QI is not adaptive.

QI block length is adaptive if you adapt it. You're
welcome to test your input and MQ against the simplest
imaginable such adaptation, with just starting with 1K
block as a default and then selecting via simple
comparison (via logbq()) whether to halve the next
block, from 0 (no halving) to max 3 times.

To get an idea what difference it would make you
can just measure the entropies of the pieces as
described in the previous post. You don't even need
to encode or write to a file, to evaluate this,
since you can call QI's logbq(n,k) function which
will give you the length of quantized binomial
(bit fractions returned should not be rounded,
since mixed radix codes can pack them to fractional
size). See the previous post on calculating the
cost of "side information" (which can be very
easily coded to such size).

Or just take a look at row "Vary" in [T3], which can
be any non-stationary data with drastic changes in
statistics, to see how much better even the simplest
block size variation does than AC style adaptation
on non-stationary data. You can also check KT81
paper on performance of variants with variable
and fixed blocks.

> However, making AC adaptive is considerably
> easier than making QI adaptive.

That's like back in old days, a USSR citizen
arriving to New York, a bit disoriented with all
the choices available, declaring that it is
so much easier to select what car to buy back
home (since there is just one kind, if he could
get any at all). The descriptive modeling is
similarly much richer and more expressive
language for models than the 'one shoe fits
all' one (or rather 1/2) dimensional adaptive
method.

As array "Vary" illustrates (and you can measure
similar effects on your data as suggested),
the greater expressive power and finer grained
modeling can return much more than the few
percent one can get on the adaptive AC friendly
data.

>> This is the result of QI's much cleaner division
>> of labor, where each component does its complete
>> specialized job entirely separately and on the
>> entire input, from start to end, when it tidies it
>> all up. Only then the next component takes over
>> and does all of its work for all of the data and
>> then tidies its output up.
>
> I fail to see the "cleaner" part. In order to make
> this working, you need to have a message of symbols
> from one single alphabet of one length that is to
> be given to the decoder separately. It then finds
> a short representation of the symbol permutation
> at hand, though it neglects that certain permutations
> might be more probable than others - by just encoding
> a description of a permutation, this kind of
> statistical information is lost.

That is the case only for simple contiguous segmenting
of the input sequence. The "messages" that modeler
passes to coder need not have such simple relation
to the input sequence segments (or even its alphabet),
as for example the BWT segments don't have and earlier
description of finite order Markov coding.

The objection above is of the same kind as if some
objected to Fourier transform coding, declaring that
it can't model the real life sound since that sound
is not a simple harmonic exp(ikx) wave. As with the
transform coding, the QI modeler+coder decompose
the input into a superposition of much simpler
combinatorial patterns (enumerative classes), where
the QI class tag serves as the Fourier transform
wave vector k and the QI index for that class tag
as the Fourier coefficient value for that k.

You can look both as a modeling via large virtual
dictionary (or codebook) with Fourier transform a
dictionary of wave patterns and with QI a dictionary
of combinatorial patterns of symbol sequences.

The basic premise of descriptive modeling is
that a decomposition exists for which the
number of relevant enumerative classes is
restricted enough (so that class tags don't
cost very much to encode), just as for Fourier
transform to work well for compression one
has to assume that there are few important
harmonics k.

Otherwise, with QI you are trying to compress
arbitrary random data or with Fourier transforms
some random noise -- neither case will result in
compression. Note that with probabilistic
modeling you are making the same kind of
assumption about the space of sequences,
e.g. when saying that 1 has probability p in
this context, except that the boundaries you
are drawing in the space of sequences are much
fuzzier, less precise (due to the coarse grained
parametrization taken from the infinite sequences
limits), than when saying 1 occurs exactly k
times in this enumerative class.

> AC has a much cleaner modelling by allowing the
> explicit entries of the transition matrix of
> a Markov chain (of higher order) within its
> encoding, and allows to combine or merge two or
> more alphabets in one stage.

The above uses "cleaner" to describe more homogeneous,
less differentiated operation. The "cleaner" _division
of labor_, which is what I was talking about, means
more specialized, more differentiated operation.
Your "cleaner" refers to the appearance of code
implementing a model, which will look "cleaner" since
all goes into one pot to be mangled the same uniform
way, whether suitable or not.

> I believe that first looking for separation
> points in a message to split it into permutations
> to encode these is a much worse division of labor
> if you assume to encode Markovian sources.

Well consider a streamlined division of labor in
a car factory, where each production unit does
its own specialized element for all cars, then
the other separate unit does a different element
also for all cars, etc. This is exactly how QI
modeler divides labor. Your division of labor
is to have one uniform unit doing all the work
on one car until it is complete, only then to
start on the next car. The design or planning
of the latter scheme may be simpler, but its
perofrmance will be much worse. The clean division
of labor means more separate specialized units
each working on one aspect for the entire input,
with interaction and dependencies between the
units minimized.

nightlight

unread,

Feb 3, 2006, 11:12:22 PM2/3/06

In case you do measure entropies for various kinds
of modeling, try with and without Gray code. In the
early days of QI, while experimenting with various encodings
of counts array, I have found that replacing plain counts
with their Gray codes does more harm than good.

On the other hand, another alphabet mapping, which
minimizes the total number of 1's in the symbols (i.e. the
symbols are sorted by frequency then mapped to an array
of symbol codes sorted by their counts of 1's) does improve
the speed (not the output size since all 1-1 symbol mappings,
including Gray codes, yield the same multi-alphabet entropy,
which can be obtained exactly when encoding the A-1
bit-plane fragments separately).

Randy McNabb

unread,

Feb 4, 2006, 1:41:00 AM2/4/06

"nightlight" <night...@omegapoint.com> wrote in message
news:1139026342.8...@f14g2000cwb.googlegroups.com...

I thought a modified move to front like this might do well after the BWT.
Would it be better to encode it this way on a binary basis or stick with a
multi-symbol alphabet?

Randy McNabb

nightlight

unread,

Feb 4, 2006, 2:52:46 AM2/4/06

>> symbols are sorted by frequency then mapped to an array
>> of symbol codes sorted by their counts of 1's) does improve
>> the speed (not the output size

> I thought a modified move to front like this might do well after the BWT.

This is only done for speed in multi-alphabet coding since it minimizes
the number of binomial table accesses (see [T1] 32-33).

For BWT, which is a key component for descriptive modeling
(see [T2], 27-35), QI uses explicit segmentation of the BW
transformed output column R (see fFig. 5 p. 34 in [T2]).

QI doesn't do any MTF since that obliterates context boundaries
which BW transform produces. Instead, after the transform, QI
starts with longest contexts, thus the smallest fragments in
the column R (see fig. 5), within some limit on context length set
as coding parameter. Then it merges adjacent fragments, say F1
and F2 into concatenated fragment F=F1+F2, provided the entropy
(which includes encoding of counts plus index length in fractional
bits) of F is smaller than the sum of entropies of F1 and F2, i.e. it
is a greedy bottom-up segment construction. For example if F1 and
F2 have "similar" densities of 1's then it is beneficial to merge them
(since only one count of 1's for F needs to be encoded). But if
F1 and F2 have "very different" densities of 1's, then it is not
beneficial (since the index length increases exponentially in
the counts of 1's). The precise meaning of "similar" and "very
different" densities is determined by the comparison of entropies
At present, though, this method and its variations are only
at a level of rough experiments and simulations, not as a usable
BWT modeler. But even at that level of exploration, all indicators
are very positive for BWT and for another dictionary based
modeler.

> Would it be better to encode it this way on a binary basis or
> stick with a multi-symbol alphabet?

Because of excessive table sizes for general multi-alphabet
enumeration, QI uses binary decomposition described in [T1],
pp. 30-38. That allows it to reuse binomial tables and it produces
exactly the same entropy as the original multi-alphabet sequence
had. The use of binary coding at each node of the alphabet
code tree (there are A-1 nodes) retains the usual QI speed
advantages, no coding on MPS and just a table add on LPS,
for all A-1 nodes. Note that this is not the same as bit-plane
coding that was used in the audio code at the top of this thread.
(The audio bit-plane coding is only a rough approximation of the
QI multi-alphabet scheme.)

Another advantage of this multi-alphabet scheme is that its
quantization redundancy grows as N*log(A)/2^g bits for N symbols in
alphabet of size A and QI arithmetic precision of g bits. In contrast,
the AC redundancy grows as 4*N*A/2^g, which becomes more significant as
N and A increase.

For high entropy multi-alphabet coding (such as mixed radix codes &
permutations coding) QI uses even more precise method for which
redundancy is independent of A, and it grows only as N/2^g. The
source code for this method is included in the [QIC] source kit. This
method also requires much smaller codeword tables (of N entries)
than the binary and general multi-alphabet coders (which needs
N^4/4 entries for max block size N; N is usually set to 1K for these
tables, which uses around 1MB of memory).

-- References ( http://www.1stworks.com/ref/RefLib.htm )

T1. R.V. Tomic "Fast, optimal entropy coder"

1stWorks TR04-0815, 52p, Aug 2004
http://www.1stworks.com/ref/TR/tr04-0815b.pdf

T2. R.V. Tomic "Quantized indexing: Background information"

Jasen Betts

unread,

Feb 4, 2006, 2:00:58 AM2/4/06

On 2006-02-03, David A. Scott <daVvid_...@email.com> wrote:
> "nightlight" <night...@omegapoint.com> wrote in
> news:1138898612.6...@o13g2000cwo.googlegroups.com:
>
>
>>> all I would have liked to was being able to feel it's
>>> "always better than arithmetic" compression.
>>
>> For that, you don't even need the source or executable test. Just a
>> bit of math to work through [T3], especially p. 8, for QI and Volf's
>> or Tjalken's thesis for in depth AC formulation, plus Ryabko for
>> finite precision AC redundancy.
>
> You know maybe if it wasn't so easy to write an arithmetic compressor
> that actaully beats what nightlight claimed was the most fair test.
> One might think there is some gems in his long posts. But from the fact
> that it fails to do as claimed its no wonder nightligt is trying to
> steer you away from scource code or executables where one can run ones
> own tests on files and see the facts themselves. Nightlight does not
> really want you to think for yourself. He is more than willing to
> do that for you.
>
> So the "always better than arithmetic" compression is like
> a lot of his logic it's just plain wrong. He has quoted so many texts
> over and over written before his QI I am surprised he hasn't quote
> the Holy Books to support his postions I am waiting to see which
> Holy Book he first uses to support his claims anyone have a guess?
>

Nightlights coder claims to be able to compress any 998358 of these
trinary symbols into 197795 bytes (it claims 1582359.99 bits)

That's about as close to ideal as anyone's likely to get.

has anyone confirmed that it's capable of this?

Bye.
Jasen

davvid_...@email.com

unread,

Feb 4, 2006, 12:43:21 PM2/4/06

Jasen my newsreader died so going through google

How can anyone confirm anything nightlight says. The
code can't work with files and its hard to get a straight
anwser from him with out his quoting hundreds of lines
of text that usually has nothing to do with the question
asked.

I did a test for the 1,000,000 symbol case and based
on what little I could squeeze out of him his coder fails
to compress as well. The problem is the side infromation
he needs to use. He will wave his hands and who knows
what he will claim as the final anwser.

But I like the number you choose since
998358 * log3 = 1582359.99229497204300379190460185...
I have not tested my code it really was a fast mod to arb255
however I suspect it would compress some strings to
197795 and some to 197796 bytes. Since this is close
to straddling the byte boundary. If I was writting to a bit
string instead of bytes. I think I could get it to code to
1582360 bits. But please check the string of 5 case.

Notice 5 * log3 = 7.92481250360578090726869471973908
I would like to see since you seem to be using his test code.
what does he spit out for this case. The reason I ask is
this 3 + 9 + 27 + 81 + 243 = 363 this is sum of the number
of possible strings of 3 symbols types up to a length of 5.
Notice the 363 is bigger thatn 256 the number of values for a
sinlge byte. It would be sweet if his code put out the number
7.92 when a reality check shows it has to exceed 8 bits.
The simple facts are more than 8 bits needed for this case.
I hope his code is smart enough to get a more correct anwser
but I fear the worse since it seems simple facts often get in
his way so he glosses over them.

So please tell what it gets for the extremely simple 5 symbol
case.

nightlight

unread,

Feb 4, 2006, 12:54:30 PM2/4/06

> Nightlights coder claims to be able to
> compress any 998358 of these trinary
> symbols into 197795 bytes (it claims
> 1582359.99 bits) That's about as close
> to ideal as anyone's likely to get.

QI using precision greater than 32 bits would
get closer to N*log(3) than the version in the
source kit [QIC]. But for any given precision
of g bits, QI's index is the closest number to
the ideal value _that exists_ for the codewords
(addends) of max g bit precision, and which
still satisfy Kraft inequality (that is a
decodability condition, which is the eq. (20)
in [T3]).

Note that even unlimited precision AC, call it
ACX, cannot beat 32 bit QI on this index (or
similarly on any other alphabet A>3) since its
redundancy is 1 to 2 bits (avg 1.5; this is due
to termination cost of unlimited fraction for
cumulative probability fraction). Since QI's
redundancy (for g=32) on this (N,A) is 1.6e-4 bits
for entire input, the input size would need to be
about 6000 times longer before QI's quantization
redundancy reaches 1 bit, at which point ACX would
get a theoretical chance (still astronomically
small) to break even.

Note also that A=3 is the best case of A>2 for AC.
For alphabet size A>3, the QI's quantization
redundancy for these codes is still always below

RQ(g)=2*log(e)/2^g bits/symbol

(where g=precision, g=32 in [QIC] and log(e)=1.442695...).
In contrast, AC max redundancy is:

RA(g)=2*A*log(e)/2^g bits/symbol,

i.e. in addition for being 4 times larger at A=2, it
grows linearly with alphabet size (due to AC's
quantization error when its interval of max size 2^g
is being divided between the A symbols), while RQ stays
fixed with increase in alphabet size. In an earlier post
[M2] I suggested a test with larger A which anyone can
verify with released QI.exe, on which a g=32 bit AC would
work as bad as no compression at all, while QI still
stays as close to the ideal value as for A=3. Here is
the excerpt from [M2] on this test:

{ --- Excerpt from [M2] on large alphabet size A test ----

Take some alphabet size A such that 2^31 < A < 2^32, for our

N=10^6 and run your AC on it. For example, I just ran QI.exe

using the command line (you are welcome to verify this):

QI cr3133061822 n1e6 i10

which is a test for A=3,133,061,822 N=10^6 and i=10 iterations

with randomly generated inputs (the iter doesn't matter for QI
since, as pointed out in (A4), QI produces always exactly the
same size, including the max index value which in this case will
always have the top 32 bits smaller than 0x8865BAF8) and the QI
output size for the index is:

Q = 31,544,926 . 09167... bits

(with AC you get only whole bits since its exact upper bound on
the index is obliterated due to normalization of index to 1.0,
which in turn is done to comply with AC's coarse-grained
probabilistic parametrization for finite sequences, enumeration
them on the cumulative probability scale, as explained in (A2)).

Recall also that just storing 32 bits/symbol for N=10^6 will use

455073 bits _more_ than the QI's output 31544927 in whole bits.

--- End of excerpt from [M2] ----- }

The meaning of the fractional bit sizes and the operation of
the QI radix coder (which is simpler than general QI entropy
coder) was explained in an earlier post [M1], with several
Q&A in the follow up posts. More detailed explanation of these
types of codes (high entropy limit codes) along with a convenient
heuristic on how to derive extensions to other problems, is
given in the preprint [T1] pp. 46-51.

> has anyone confirmed that it's capable
> of this?

You can easily verify that in the source code you have. The
command line option used for A=3, N=998358 and i=10 iterations:

QI.exe cr3 i10 n998358

The "cr" command is done by function radix_iter() in Test.c
which calls new_radix() in Radix.c (to create quantized
powers of 3 in struct QRAD.pa[] array). The encoding of
any sequence in given A is done in the call to enc_radix()
in Radix.c called by radix_iter() after the input instance
is generated via gen_radix() call. The output size is determined
by the (quantized) value equal A^N, which here is sliding window
integer M=(w,s)=w*2^s which is stored in QRAD.pa[N]. This value
is in this case M = 0xFEAA3722 * 2^1582328. All index values I
(which are binary values of radix 3 number) computed by QI for
these inputs are limited as: 0 <= I < M. Hence the length in
bits of any index is below

log(M) = log(w)+s = log(0xFEAA3722) + 1582328 bits ... (1)

This is computed in radix_iter() via call to radix_szf() in
Radix.c which gives exact value of log(w)=31.99245634... bits
in (1), which together with s=1582328 bits in (1) gives output
size (QI entropy qe in radix_iter()) as:

qe=1582359.99245634... bits ... (2)

which is only 1.6137e-4 bits above the ideal value for this N,A
(result (2) and this difference 1.6137e-4 is shown in QI.exe
output line as E+1.6137e-4). To verify that this is the exact
size needed, you can overwrite the compressed value beyond
these bits right before the call to decoder dec_radix().
The compressed data is stored in the struct QRENC.out[] array
(which is in little endian form, i.e. the last bit of the
array is the most significant bit of the last element).

You can also note that decoder in dec_radix() uses this same
upper bound log(M) on its input size, as shown in Radix.c:

1. SWI pw,*pa; // SWI variables
......
2. pw=pa[n]; // get leading quantized power A^N
3. tcb=SWLENC(pw); // tcb=total bits in this encoded data
4. getswx(s,sb,&tcb,&dtx);// get top 64 bits from the encoded buf

The line #2 gets SWI M (the same one used in (1), you can
printf() it to verify), then line #3 calculates the max length
tcb in whole number of bits (which adds s + width(w) in macro
SWLENC). The line #4 extracts 64 leading bits from the index by
calling function getswx() in Swi.c, by giving it base pointer
s[], starting bit offset for index "sb", total bits in the index
"tcb" (which is the value (1) in whole number of bits) and the
resulting 64 bit SWI is stored into "dtx".

Hence to test that only log(M) bits of the output array are used
for decoding, you can alter any bits in dword array s[] after
"tcb" bits from line #3. Verifying that only the fractional part
from (1) and (2) are needed, simple padding wouldn't do. A simple
test for bit fraction would be to run a large number of iterations
(command line option "i<max_iter>" and each time check that the
top 32 of the index are always below the max value w=0xFEAA3722
from (1), which is a fixed upper bound for all instance of the
inputs. As long as the index is smaller than w, you can use
mixed radix codes to package it into log(w) bits (the same way
we package all ternary digits in log(3) bits). If you have
just a single item to output in whole number of bits (hence
you can't use mixed radix code to package bit fraction of w),
you can encode the top 32 bits in tapered Huffman code i.e.
you encode the top 32 bits of the index (I will denote this
int as "I32") in 31 or 32 bits, where n31 = 2^32-w =
2^32-0xFEAA3722 = 0x155C8DE = 22,399,1298 lowest values for
I32 are coded in 31 bits, and remaining values in 32 bits.
This will yield an average length of I32: 1582359.9947574..
bits per whole index, which is 0.002301... bits longer than
the optimum QI length (2) (reachable via the mixed radix
codes when you have more than one item to ship). Generally,
the average for the tapered Huffman encoding of I32 will
always be less than:

1+log(log(e))-log(e) = 0.08607... bits ... (3)

above the ideal value (that mixed radix codes can produce).
This is usually good enough for most practical purposes when
you have just few such items. But this approximation will
have excess of up to 1 bit on every 11.6 items coded this
way, which can add up if lots of such items are coded.

-- References ( http://www.1stworks.com/ref/RefLib.htm )

QIC. QI C source code (+tech reports, news...)
http://www.1stworks.com/ref/qi.htm

T1. R.V. Tomic "Fast, optimal entropy coder"
1stWorks TR04-0815, 52p, Aug 2004
http://www.1stworks.com/ref/TR/tr04-0815b.pdf

T3. R.V. Tomic "Quantized Indexing: Beyond Arithmetic Coding"

arXiv cs.IT/0511057, 10p, Nov 2005
http://arxiv.org/abs/cs.IT/0511057

T4. One page summary and guide to [T3]
http://www.1stworks.com/ref/TR/DCC2006.pdf

M1. --- QI Bit Fractions and Mixed Radix Codes
http://groups.google.com/group/comp.compression/msg/1e015f38d228969e

M2. --- Test for larger alphabet size
http://groups.google.com/group/comp.compression/msg/66aee67eefbd6753

M3. -- Summary of QI algorithmic advances (A1)-(A4)
http://groups.google.com/group/comp.compression/msg/27c6f329038a1bdc

nightlight

unread,

Feb 4, 2006, 2:10:42 PM2/4/06

> I did a test for the 1,000,000 symbol case and based
> on what little I could squeeze out of him his coder fails
> to compress as well.

There is no mystery left there at all. Your hardwired _64_ bit
precision AC produces 198121 bytes which is 1584968 bits.
All of your bits are used (since altering any bit in the last byte
breaks the decode to correct input). Also, you're not passing value
N=10^6 (your decoder gets it from OS via file size) or A=3 (it is
hardwired in you encoder & decoder).

Under these _same conditions_ for the N and A availability to
decoder, the _32_ bit QI produces output of 1584962.50... bits
(the tapered Huffman code for the top 32 bits of index reaches
on average to within 0.0016890... bits from the exact fraction).
That is only 1.6e-4 bits above the ideal N*log(3) value.

Now, if you want some other coding conditions, e.g. to include
encoding of N and A into the coded output (as one might need
e.g. if the compressed data is a part of larger data, hence it has
no OS to keep track for it of the compressed file size to give to
decoder), then you need to add size of such code for N and A
to the ideal value N*log(3), to the QI's size and to any other
coder's size tested. That, of course still leaves QI at the same
distance 1.6e-4 bits from the ideal size i.e. it doesn't help you
or anyone else at all. That's the gap that a competing coder has
to enter to get closer to the ideal size (for any coding conditions,
provided they are set the same for all) than QI.

But as explained many times earlier, for any given max codeword
precision of g bits, and requiring that the codewords satisfy Kraft
inequality (decodability condition, cf. eq. (20) in [T3]), the QI's
index
is the closest number to the ideal value that satisfies these two
constraints. There is mathematically not a single other number in
between the ideal value and the QI index range (at any given given
coder precision g) just as there is not a single integer in between
3 and 4.

Hence, even rounding QI to the next whole bit, yields 5 bits
shorter output than your hardwired multipass coder. Note that your
coder is what Cleary84 (see [34] in QI RefLib) calls "Enumerative
Coder" (implemented as AC with "faithful probabilities" which your
coder uses for our test set), i.e. it is the most precise form of AC
coding, in addition to using 64 bit mantissa (while QI in [QIC] used
only g=32 bits), which for our "small" N=10^6 and A=3 is for all
practical purposes what the ininite precsion AC would get.

The QI output size is also mathematically guaranteed to always fit
the size shown (which is determined soleley from N and A via QI
powers table). QI will also code much faster than your 64 bit coder
(even when changed to memory to memory coding), or any other
64 or 32 bit AC (take a look at your multipass computation and
compare to QI's enc_radix() function in Radix.c) . As explained
in yesterday's post to you, QI also doesn't need to use table of N
entries for this type of coding e.g. a very simple change (a 2 level
hierarchy as described before) could use 2 tables each with
sqrt(N)=1000 entries (or any q tables each of of size q_th root (N),
for a q-level hierarchy) with only 0.1% increas of the total excess
of 1.6e-4 bits for the single table case shown in [QIC] source.

In short, smaller output and much faster codinng than AC, as
stated before.

Of course, this A=3 is the best case for AC. You can try
the test suggested earlier for a much larger A, as described
in the post above (to Jasen). Your quantization error will grow
proportionately to A, while QI's remains the same for any A
(the QIC source limits A to A< 2^32, although one could easily
extend it to 2^64 using the SWX type in Qi.h for quantized powers).

> I hope his code is smart enough to get a more correct
> anwser but I fear the worse since it seems simple facts
> often get in his way so he glosses over them.

I described how to verify the QI output size for this in the reply
to Jasen. You can check it out if you don't believe (change
any bits beyond the claimed size to verify the whole bit part,
or check the bit fraction as described in that post above).

> So please tell what it gets for the extremely simple 5
> symbol case.

You can test that with the released copy of QI.exe using
the command line:

QI.exe cr3 i10 n5

which is for g=32 precision the exact enumerative coding (the only
error shown for above command is 1.001e-12 bits which is within
the MSVC floating point error on double for their logs base 2). The
mantissa value for 3^5 is 0xF3=243 (while exponent is 0) i.e. it is
identical to the exact binary code for radix 3 numbers, i.e. 7.9248...
bits per index. The tapered Huffman code (which one would use
if a single item is being shipped as a whole number of bits) would
code 13 lowest values 0..12 of the index mantissa in 7 bits, and
230 values in 8 bits, which would yield an average of 7.94650... bits
per index. That is 0.02169... bits above the ideal size of 5*log(3)
which the QI mixed radix would achive (available when you ship
more than one fractional bit item of any kinds, such as N, A or
any other part of the total output for a more complex coding task).

David A. Scott

unread,

Feb 4, 2006, 2:47:51 PM2/4/06

"nightlight" <night...@omegapoint.com> wrote in
news:1139080242.2...@f14g2000cwb.googlegroups.com:

> > I did a test for the 1,000,000 symbol case and based
>> on what little I could squeeze out of him his coder fails
>> to compress as well.
>
> There is no mystery left there at all. Your hardwired _64_ bit
> precision AC produces

Actaully I told you before that I like 64 if you don't to bad.
You actaully get the same size for this case when you change the
mask to use only 32 bits. It seems like you never learn.

Wake up the test was for a file compressor. Something which
it appears is not possible with QI. Yes you can modify Moffat
to meet your needs but if you can't compress files with what you
claimed is the only fair way stop the hype either you can write
a file compressor using QI to do this or the task is to hard
or your to busy. Just ignore those laughing as you continue
to change your posts. The facts are by you own numbers you could
not compress as small. If now you suddenly find something different
why should anyone belive you?

The only mystery really left is why this drop-in replacement for
an entropy coder can't actually me made to work on files. In the
case you claimed was once the most fair test WHY IS THAT??

Do you yet admit that you are wrong about arithmetic compression.
It can compress without gaps. Which is quite different than what you
have been saying. Do you admit that or is the thought to hard to
grasp.

That fact a lone makes me doubt you other claims. You may think
we are like your managers who may belive what you say since its
obvious you cam write and managers like long papers so do shools.
But here peoplw are interested in real results which is I guess
something you never had to do for the people you work for.

So once again do you have time to actaully do this simple
test with files so people here can test the files of thier choice
or is the task to hard to time consuming or what. I think I could
have converted your code in half the time it took you to write your
massive posts. But if you can't get it to work as good as a
simple aritmetic compressor and not willing to face the chance of
being wrong. I guess you have no choice but to quote long texts
in hopes people will forget about its short comming in real tests
where people can use there own files.

And no I don't read all the way through your long posts they
seldom seem to lead to anything other than showing your cut and
paste skills from other texts one only has so much time.

Phil Carmody

unread,

Feb 4, 2006, 8:00:07 PM2/4/06

"nightlight" <night...@omegapoint.com> writes:
> There is no mystery left there at all. Your hardwired _64_ bit
> precision AC produces 198121 bytes which is 1584968 bits.

...

> Under these _same conditions_ for the N and A availability to
> decoder, the _32_ bit QI produces output of 1584962.50... bits

(1584968-1584962)/1584962 = 0.00000378557971736862

I hope those outside Finland can appreciate what we mean by the term
"comma fucker" now.

Phil
--
What is it: is man only a blunder of God, or God only a blunder of man?
-- Friedrich Nietzsche (1844-1900), The Twilight of the Gods

nightlight

unread,

Feb 4, 2006, 9:48:45 PM2/4/06

> (1584968-1584962)/1584962 = 0.00000378557971736862
>
> I hope those outside Finland can appreciate what we mean by the term
> "comma fucker" now.

Although I didn't hear the term before you mentioned it recently, the
phenomenon it reflects is much older. I would imagine, say, when the
first microscope was shown at some court in Netherlands, the similar
conventional wisdom of their era, 16th century, was expressed by the
court physicians and astrologers -- who needs to see all that tiny tiny
stuff, what could it possibly matter for. I would say that the
conventional wisdom expressed in your phrase is as shortsighted today
as it was throughout the ages and as it always will be.

As usually with the folk wisdom, for every proverb or a bit of wisdom,
there is another one suggesting exactly the opposite. The opposing bit
of folk wisdom to your c.f. bit, would be a much more ancient one:
Margaritas ante porcos.

These tiny differences on 1584962 bit output yield quite a bit more
noticable effects on inputs in the top rows of the compression ratios
table in [T4], where the output is small, be it because of the small
inputs or because of the low entropy. It is even more dramatically
noticable in the last row labeled "Vary", where the "unpredictable"
(for the order 0 coders being tested) changes in the statistics occur
throughout, the smallest ones on the 32 bit spans, then larger ones on
power of two array indices, and finally the largest one at the array
index 0. Each unpredictable and drastic change effectively resets the
"N" (or useful memory) of the AC "predictor" back to 0, or worse, while
it affects very little the QI's "descriptive" modeling (it affects with
QI the "side information" coding only which is of the log(N) size and
which the descriptive modeler keeps cleanly separated form the O(N)
bulk of the output), resulting in the actual outputs of AC being over
twice as large as those of QI.

Any input with similar drastic unpredictable changes in statistics
yields similar results (the array "Vary" was picked only because I
could describe it unambiguously in less than half a line of space I had
left at the end of the preprint). Several possibilities for exploring
these kinds of effects are sketched in an earlier post, section (A3):

http://groups.google.com/group/comp.compression/msg/27c6f329038a1bdc

-- References:

T3. R.V. Tomic "Quantized Indexing: Beyond Arithmetic Coding"

arXiv cs.IT/0511057, 10p, Nov 2005
http://arxiv.org/abs/cs.IT/0511057

T4. One page summary and guide to [T3]
http://www.1stworks.com/ref/TR/DCC2006.pdf

Jasen Betts

unread,

Feb 4, 2006, 8:04:00 PM2/4/06

yeah, I did up a spreadsheet and searched for a big number that came to just
under a whole number of bytes, and confirmed that his buffer-based test code
also claimed a result that was clearly within that nuumber of bits.

> I have not tested my code it really was a fast mod to arb255
> however I suspect it would compress some strings to
> 197795 and some to 197796 bytes. Since this is close
> to straddling the byte boundary. If I was writting to a bit
> string instead of bytes. I think I could get it to code to
> 1582360 bits. But please check the string of 5 case.
>
> Notice 5 * log3 = 7.92481250360578090726869471973908
> I would like to see since you seem to be using his test code.
> what does he spit out for this case. The reason I ask is
> this 3 + 9 + 27 + 81 + 243 = 363 this is sum of the number
> of possible strings of 3 symbols types up to a length of 5.

No, that's the number of strings of 5 symbols or fewer ( well, almost - you
missed the case of a zero-lenghth string), his code seems to be built
for the case where the length of the original is known beforehand rather
than being encoded in the data.

There are circumstances where this is the case, one that he mentioned was the
hash-bitmaps used to transmit queries in p2p networks, another could be video
compresion.

> Notice the 363 is bigger thatn 256 the number of values for a
> sinlge byte. It would be sweet if his code put out the number
> 7.92 when a reality check shows it has to exceed 8 bits.

No. the range 0..242 is sufficient to encode the content of strings of 5
symbols if length is not also being encoded.

Bye.
Jasen

davvid_...@email.com

unread,

Feb 5, 2006, 12:08:28 AM2/5/06

You are correct. I see why now his numbers are so small.
I forget he has to have all these exra fields around for the
side information. But it shows that all his numbers have to
be taken with a grain of salt. Which confirms my first gut
feelings that until he can make a version that uses files it
will not be clear what his code can actually do. Or even what
his numbers really mean. I still don't have much faith or trust
in what he says since he is so often wrong about basic
airthmetic compression facts. But its understandable he
wants QI to look good so maybe his mind automatically
distorts things in favor of QI far more than what is really
possible.

In fact the difference my be so small that compressing to
files may not show a difference in compressed length. He
could for test purposes code to an ASCII strings of ZEROES
and ONES. Which then would go to an ASCII file. This may
be easier for him to code since bytes seem to confuse him
and it my be easier to code than files. Of cource if we do the
same test I would have to change my code to match. But
even this simple change I fear is something he would not
do since he does not want the user to be able to test files
of ones choice. To bad it would be fun.

Back to the 5 symbol case. My file compressor to bytes
would use 1 byte for some strings and go to the second byte
for other 5 symbol strings. That is because the other values
of the first byte have the smaller strings. However if the model
was modifed so that a 5 symbol case was part of the model it
would code it in 1 byte. Which makes sense
Thanks For the INFO

nightlight

unread,

Feb 5, 2006, 12:51:51 AM2/5/06

> You are correct. I see why now his numbers are so small.
> I forget he has to have all these exra fields around for the
> side information.

I always compare side information according to what the other
coder does with it. The Moffat98 includes lengths of data in its
output, hence QI's figures include that cost as well when comparing
to Moffat98 AC, as my original post on N=10^6, A=3 did, where I
compared QI's output to their output. Since they also use adaptive
mode, which is not necessary and hurts their result here, I even
included that cost in the comparison (the 39 bits estimate for
frequency table). You can see all that in the original post which
reported those results:

http://groups.google.com/group/comp.compression/msg/ff1ee67d18b63f5a

But if your coder doesn't include N and A into the compressed
output (but relies instead on OS to remember N via file size
and on hardwired context tables & Huffman coder to remember
alphabet size A), then I don't add those figures to the QI's output
size either. It is you who kept repeatedly quoting my figures
stated explicitly for different coding conditions than yours (where
N and A are not given to the decoder as parameters but must
be included into the compressed data) so you can claim
to do better.

And now, after blowing the smoke for several threads, conveniently
mixing up the coding results from different coding conventions
throughout (despite being explicitly reminded numerous times of
what each figure represents), you declare your sincere concern
that it was me who is using such infantile gimmicks. Whoah Nelly.

Randy McNabb

unread,

Feb 5, 2006, 1:35:32 AM2/5/06

"nightlight" <night...@omegapoint.com> wrote in message

news:1139118711....@g43g2000cwa.googlegroups.com...

> > You are correct. I see why now his numbers are so small.
> > I forget he has to have all these exra fields around for the
> > side information.
>
> I always compare side information according to what the other
> coder does with it. The Moffat98 includes lengths of data in its
> output, hence QI's figures include that cost as well when comparing
> to Moffat98 AC, as my original post on N=10^6, A=3 did, where I
> compared QI's output to their output. Since they also use adaptive
> mode, which is not necessary and hurts their result here, I even
> included that cost in the comparison (the 39 bits estimate for
> frequency table). You can see all that in the original post which
> reported those results:
>

I have been steping through the code to find out how it works. I have been
using the command line
qi cc n4096 i2
All the numbers I seem to be getting are pretty close to what they should
be (log2 of the probabilities) and very much in line with my AC coder. The
only place I see any differance is that the QI numbers are calculated from
the probabilities which is the size they should be able to be encoded to.
During the encoding process QI calculates the index for the encoding but
does not do anything with it. AC pays for the ending of the encoded but QI
is not ending. The index can only be encoded close to
log2(number_of_possibilities) on average. When the encodeing is through, be
it after one block or many the encoded sequence must stop. You either have
to write a bit or not.

Randy McNabb

unread,

Feb 5, 2006, 1:53:55 AM2/5/06

"Randy McNabb" <the_fr...@hotmail.com> wrote in message
news:11ub75e...@corp.supernews.com...

My apologies inline with my Ac coder in size alone.

davvid_...@email.com

unread,

Feb 5, 2006, 2:45:14 AM2/5/06

nightlight wrote:
> > You are correct. I see why now his numbers are so small.
> > I forget he has to have all these exra fields around for the
> > side information.
>
> I always compare side information according to what the other
> coder does with it. The Moffat98 includes lengths of data in its
> output, hence QI's figures include that cost as well when comparing
> to Moffat98 AC, as my original post on N=10^6, A=3 did, where I

Actually you don't do the comparsion in a fair why which is
way one should do the test with files. Something you can't
seem to do. My guess is you can't compress as small even
for this simple task.

> compared QI's output to their output. Since they also use adaptive
> mode, which is not necessary and hurts their result here, I even
> included that cost in the comparison (the 39 bits estimate for
> frequency table). You can see all that in the original post which
> reported those results:
>
> http://groups.google.com/group/comp.compression/msg/ff1ee67d18b63f5a
>
> But if your coder doesn't include N and A into the compressed
> output (but relies instead on OS to remember N via file size
> and on hardwired context tables & Huffman coder to remember
> alphabet size A), then I don't add those figures to the QI's output
> size either. It is you who kept repeatedly quoting my figures
> stated explicitly for different coding conditions than yours (where
> N and A are not given to the decoder as parameters but must
> be included into the compressed data) so you can claim
> to do better.
>

First of all I showed how a length field could be added
so it could be used in a buffer it seems you forgot this but
it was added in even in a nonoptimal way and it beat your
code. I think its best of any one cares to read the previous
posts. And I am sure you know this even though you claim
that N was not accounted for. We could play this game
forever. That why people in this field like code they can use
themselves with files of there choice. What is it about
allowing people to do real test with files that seems to
scare you. What are you really afraid of.? Do you think
more people will laugh at you if they see it fail or do you
only feel safe saying what it can do with out actually doing
it or allowing others the choice of using there own test files.

Yes I kept quoting your figures since mine where better
you didn't seem to understand why I compressed the pure
case of all A's longer its still not clear you have a concept
of real arithmetic compression. Now all the sudden you
come up with different numbers. Friend thats why people
do it with files so others can check. At this point do you
really think anyone cares what you currently claim. Can
you get it to work wifh files. I guess not. Will you code
it to files so others can fix it I guess not. WHY?

Is QI a drop in coder for arithmetic when used as an
entropy coder yes or no? Isn't QI more flexable
yes or no?

Either you can't do it or you have and now that you failed
after all this was what you claimed was a fair test. I guess
now that you can't do or wont do it. You have to come
up with some other reason. You don't have a concept
what the conditions are. Your mind will not currently
allow the results to show your wrong. Again that is why it
has to be to files so all can check. Is that so hard to
understand?

> And now, after blowing the smoke for several threads, conveniently
> mixing up the coding results from different coding conventions
> throughout (despite being explicitly reminded numerous times of
> what each figure represents), you declare your sincere concern
> that it was me who is using such infantile gimmicks. Whoah Nelly.

Whoah Nelly is right. Who is mixing coding results. Good question
how can it be resloved. You could post hundreds of line of text or
you could write a program that actaully allows people to test files
and see what happens. Let me guess you will whine and write theory
as to why you are correct but you will not write code. Please prove me
wrong with real code. That is if QI can actually do what you claim
it can do.

It seems to me you aren't really interested in comparing the
actual results of a compression. I suspect it would not be hard
to write a pure arithemtic that only calculates the index. Which
it seems is the only thing QI can do. I suspect its actually easier
to code the index using arithmetic in the first place. I suspect why
this is not done very often for compression is due to the extra side
information you have to carry with the index to do any real
compression.
Which may be the main reason its so hard to use the index for
a drop-in to replacement for what is best done by a pure arithmetic
coder
in the first place.

Maybe if you ever get working code for this one simple example
we can compare results in just calculating the index only. A proper
arithmetic should be able to calculate the index as a n and n+1
bit string. Of course I am sure you can quote all kinds of theory
to show this is not possible. So that even for this you would not
risk a real code test. Just quoting text and saying things are so is
more than enough for you. But here at this forum we get people
all the time quoting stuff with out real code.

David A. Scott

unread,

Feb 4, 2006, 11:37:49 PM2/4/06

"nightlight" <night...@omegapoint.com> wrote in
news:1139080242.2...@f14g2000cwb.googlegroups.com:

> > I did a test for the 1,000,000 symbol case and based
>> on what little I could squeeze out of him his coder fails
>> to compress as well.
>

My newsreader finally picked it up. So thought I would read his
post. Naturally its mostly hype.

> There is no mystery left there at all. Your hardwired _64_ bit
> precision AC produces 198121 bytes which is 1584968 bits.
> All of your bits are used (since altering any bit in the last byte
> breaks the decode to correct input). Also, you're not passing value
> N=10^6 (your decoder gets it from OS via file size) or A=3 (it is
> hardwired in you encoder & decoder).

Bull. I showed how the length could be added and still it was shorter
than what you claimed was your fixed length. The problem is your trying
to change the game after you failed. I don't blame people for laughing
that you can't get it to work with files. I guess its harder to make
up numbers when people get to use files. You lost your wrong live with it!!

At least your no longer throwing a tamper tantrum about the fact
that a file of all A's still compresses to the same. I wonder what your
code does since appearantly your stuff hand tuned to skip those cases.
Its no wonder you can't get it to work with files. Can't let the masses
actually test it can you?

>
> Under these _same conditions_ for the N and A availability to
> decoder, the _32_ bit QI produces output of 1584962.50... bits
> (the tapered Huffman code for the top 32 bits of index reaches
> on average to within 0.0016890... bits from the exact fraction).
> That is only 1.6e-4 bits above the ideal N*log(3) value.
>
> Now, if you want some other coding conditions, e.g. to include

> encoding of N and A into the coded output ...

Again I showed how that could be done in a buffer. But the test
was based on what you clearly thought your code would be the best at
Stop trying to change it. Can you write to a file or not so that
others can test it. Be a man (unless your a woman) and either
stop whinning or do it. Or admit you can't. Its really that simple.
No one here belives your numbers there has been to much water
under the bridge for that.

>
> But as explained many times earlier, for any given max codeword
> precision of g bits, and requiring that the codewords satisfy Kraft
> inequality (decodability condition, cf. eq. (20) in [T3]), the QI's

> index .....

Who cares what you EXPLAIN since you don't seem to have a basic
understanding of even simple arihtmetic compression who cares. At
least I see you gave up the garbage about airthmetic compression
not making full use of the output space since it clearly can. Which
is a lot different than your first posts. Maybe you are learning
something yet. There my still be hope.

> The QI output size is also mathematically guaranteed to always fit
> the size shown (which is determined soleley from N and A via QI
> powers table). QI will also code much faster than your 64 bit coder
> (even when changed to memory to memory coding), or any other

> 64 or 32 bit AC (take a look at your multipass ... BLAH BLAH

You wrong about no gaps in an arithmetic your wrong about many
things. Can you code this to files since it the only fair test.
And you cliam its a drop-in replacement for arithmetic. Well if
it is PROVE IT. Since your so way off base on what arithmetic does
it would be foolish to belive any other of your off tangent posts.

> In short, smaller output and much faster codinng than AC, as
> stated before.
>

Funny thats not what Thomas got. Your going to have change
your mantra unless you can someday actually get code that does
something.

> Of course, this A=3 is the best case for AC. You can try
> the test suggested earlier for a much larger A, as described

> in the post above (to Jasen). ....

Funny you like A=3 in much earlier posts I wonder why now its
not so hot. Yes I could change my code for higher values but since
you can't understand what happens at A=3 what would be the point.
You can't even get QI to work on this simple example using files
that anyone can test. Why is that?

>
>> So please tell what it gets for the extremely simple 5
>> symbol case.
>
> You can test that with the released copy of QI.exe using
> the command line:
>
> QI.exe cr3 i10 n5
>
> which is for g=32 precision the exact enumerative coding (the only
> error shown for above command is 1.001e-12 bits which is within
> the MSVC floating point error on double for their logs base 2). The
> mantissa value for 3^5 is 0xF3=243 (while exponent is 0) i.e. it is
> identical to the exact binary code for radix 3 numbers, i.e. 7.9248...

I just love this why am I not surprised. I missed this gem
in the first reading.

What more can I say you could not beat the simple case you
give before what your exact fixed length would be. You have yet
to code even this exteremly simple case to files. It should be
easy after all you rant over and over again how QI is always
better than the best arithmetic. But so far Thomas has showed
its much slower and compresses worse than arithmetic. I modifed
a toy arithmetic compresser to do the case you actually first
brought up. I did make it work to files since that is what
people need to have a true evaluation not your highly controlled
tuned so called test code. Can you code this example to files or
is QI such a poor infexable replacement for an entropy coder that
it would be beyound your capability. Look people here don't mind
mistakes if people are honest about it. I make mistakes I think
you do to. Admit it QI might be hot but you can't make it work
better with files in this case.

In fact if you code it and it works on files and is worse than
arithmetic which I assume you found out since you must have tried
it. After all if you insist you don't have the time people here
would laugh at you. Let the people here help. Show your test code
people will make it better. They might even make it match my code
if they only get the chance. You might even learn something if you
only try and let others help you. In fact many here including me
would be happy to see it beat my code. You do need help don't you?

nightlight

unread,

Feb 5, 2006, 4:29:32 AM2/5/06

> All the numbers I seem to be getting are pretty close to what they
> should be (log2 of the probabilities) and very much in line with
> my AC coder.

The functions qiEnc()/qiDec() are building blocks which work same way
regardless of the other conditions such as how the counts are being
transmitted. The conventions for counts, be it data size n or counts of
1's or probabilities, can be passed in variety of ways in different
setups which are unrelated to QI or AC.

The index produced by qiEnc() is a quantized binomial coefficient
C(n,k). Its length is always smaller than entropy H(p,n) =
n*(p*log(1/p)+q*log(1/q)) by about 1/2 log(2Pi n*pq) bits (see more
precise expressions in [T2] p. 22).

Note that entropy H(p,n) does not include transmission of the data size
n or of probability p, hence the only cost one needs to add to QI's
index log(C(n,k)) is the cost of coding k, which is the count of 1's.
Since that count is limited to range 0..n and n is assumed given, the
simple flat code for k will use log(n+1) bits, which leads to total QI
output size of approx. H(p,n) + 1/2 log(n) bits, which is the minimum
value one can get if decoder does not know probability p (hence it
codes k in a flat log(n+1) bits).

But if you take into account that H(p,n) assumes that decoder has value
of n _and_ p, then k can be coded in approx. L(k) = 1/2 log(n) bits by
using binomial distribution for the possible k values when probability
is p, which is P(k,n,p) = p^k q^(n-k) C(n,k)., hence L(k) is
log(1/P(k,n,p)) ~= 1/2 log(n). With that, the combined index plus
optimal code for k adds up to exactly the H(p,n) value (to within O(1)
terms). The test coder which was used for comparisons with AC in [T3]
was using precomputed table of Huffman codes for distribution P(k,n,p)
for counts k for all values of "average k"=n*p (these tables are only
few KB in size due to great deal of regularity).

To compare with AC which codes n (or an equivalent terminator) into its
compressed output (as is usually done), you can either remove that code
from AC or alternatively add code for n to the QI's output.

The latter needs to be done carefully since using the universal codes,
such as Elias or other self-delimiting codes, allows n to be arbitrary
(thus it costs more than n with some upper limit). The regular AC's
don't normally allow n to be more than 2^32 (sometimes even less). In
that case, the proper code to add to QI output size would be 5 + "<bits
in n> - leading 1" since the 5 bit prefix can be used to encode the
length of n in bits and then that many bits of n can follow (with
leading bit of n, which is assumed to be 1, omitted). In general, when
AC has some limit M on input size, one needs log(log(M)) bits for
prefix specifying the length of n, followed by <bits in n> without
leading 1.

Qiutl.c has function sdlen(double x) which returns the general
self-delimiting length of x (which is considered arbitrarily large). It
also has a function sdlenx(double x, double MaxValue), which returns
the self-delimiting length of x which is known to be smaller than
MaxValue and which uses the method described above.

Qi.exe also has a command line option "cl" which gives self-delimiting
lengths sdlen() of common values. If you need specific value x not
listed there you can use option "cl<x>" to see sdlen(x).

So, depending on how your AC codes input size n and how it limits its
max value, for comparison of sizes you can add proper value for cost of
n to the QI's output size (which includes index plus code for k).
Alternatively (which is more accurate) you can pass n to AC as a
parameter to encoder and decoder, and remove its terminator or n
encoding logic. In that case you don't add anything to QI's size (index
+ k_code).

Basically the difference between QI & AC is not in how they can encode
n (or k or p) but in intrinsic index (called cumulative probability in
AC) coding. The rest can always be coded the same way or at the same
cost, hence it doesn't add any info about the coder differences in
coding efficiency.

Further common difference is that AC works adaptively while QI works
'descriptively'. That will usually add to AC output about 1/2 log(n)
bits, although AC can avoid that extra cost of 'learning' when coding
in enumerative mode using "faithful probabilities" (see [34]). That
excess cost of 'learning' can be quite drastic if the sequence is
"unpredictable" (for the predictor used), as illustrated in the row
"Vary" in [T4].

For enumerative mode AC, and when coding k & n at the same cost, the
excess you will get for AC is only 2-4 bits if the data length is small
compared to 1/2^g (where g is AC or QI precision, usually 32 bits),
depending on AC implementation and input instance. ( For infinite
precision AC and QI/EC, this difference is always only 1-2 bits in
favor of QI/EC, depending on input instance.)

It should be noted, though, that AC coding in enumerative mode, which
is its most precise coding mode, is detached from its usual AC modeling
engine. And without its modeling engine (the existent AC modeling
engines for different data give it the transient practical advantage
over QI for the time being), it is simply a much slower and still a
accurate version of QI.

For inputs sizes which are not negligible relative to 1/2^g, the
quantization difference between QI and AC results in 4 times larger
excess for AC than for QI (in binary case), which grows linearly in N.
More precisely, the quantization excess for QI is

DQ(n,g,A) = 2*n*log(e)/2^g bits ... (1)

while for AC it is:

DA(n,g,A) = 4*A*n*log(e)/2^g bits ... (2)

which for binary alphabet A=2 and the same g & n yields for O(N)
redundancies ratio DA/DQ=4. But this ratio grows linearly with A, so
for large alphabets it can become significant even for moderate values
of N. You can check more details on this (with references) in an
earlier post:

M1. -- Summary of QI algorithmic advances (A1)-(A4)
http://groups.google.com/group/comp.compression/msg/27c6f329038a1bdc

in the section (A3) and in the subsequent couple messages covering Q&A.

Note also that the test we were playing with here, N=10^6 and A=3 in
high entropy limit (all 3 symbols equiprobable) is the most favorable
test case for AC, since the high entropy limit makes the relative error
from O(1) terms the smallest and A=3 is the smallest alphabet for which
the high entropy limit data are compressible (since for A=2 with 1 and
0 equiprobable, the best output is the uncompressed n bits). The AC
coders tried here were coding in enumerative mode (using faithful
probabilities for high entropy limit) and were also using precision
g=64 bits, while QI was using g=32 bits. But since N=10^6 wasn't large
enough (for DQ to be significant) it didn't make measurable diference.
The only difference for N=10^6, A=3 is the theoretical limit for O(1)
terms, 2-4 bits for enumerative mode AC which uses "faithful
probabilities" (see [34]). That is, of course, very tiny compared to
the total output size.

As explained in (A3), the above observation is tautotlogical, since the
inputs & test conditions were selected so that the O(1) terms would be
negligible and where the differences in O(N) and O(log(N)) terms were
eliminated through small N (compared to 1/2^g), twice the arithmetic
precision g for AC and dropping the usual adaptive AC coding.

But there are other cases where QI will show much better figures than
AC (see (A3)), where these these 3 types of terms are significant
compared to the output. This is especially significant if input
sequence contains many small pieces with large combined piece entropy
and small individual piece entropy, in which case the small pieces
behave as separate small inputs for an adaptive AC (hence the AC terms
O(log(n)) and O(1) can become comparable to compressed size of each
piece). The array "Vary" in [T4] illustrates one such case, where the
adaptive AC output is more than twice the QI's output size.

nightlight

unread,

Feb 5, 2006, 5:23:03 AM2/5/06

> First of all I showed how a length field could be added
> so it could be used in a buffer it seems you forgot this but
> it was added in even in a nonoptimal way and it beat
> your code.

You were comparing by adding 24 bits to your output (for N only),
to QI value with added 30 bits (27 for N and 3 for A). If you add the
same cost to both, you get the same difference as without additions.
So, don't keep repeating the nonsense "it beat your code" unless
you can show figure and code which does it.

> You could post hundreds of line of text or you could write a
> program that actaully allows people to test files and see
> what happens.

Yeah, why just round sizes to whole bytes. Why not round all sizes
to disk cluster size, 4K, so that all coders are equal in those units,
and no coder gets its self-esteam hurt. If the theoretical difference
for given N=10^6 & A=3, in high entropy limit, with AC coding in
enumerative mode (i.e. using "faithful probabilities" see [34]),
is 2-4 bits (depending on data instance for AC), what is the
point in using output sizes rounded to next byte? To keep
all coders equal and happy?

Keep in mind that the case you chose for test is the best possible
and still compressible case for AC. The high entropy limit reduces
the relative error from O(1) term (which is 2-4 bits in optimally
configured AC) to negligible. The "faithful probability" mode
removes O(log(n)) adaptive costs terms. Finally, the N=10^6
which is small compared to 1/2^g (for arithmetic precision g)
and A=3 (the smallest alphabet for which equiprobable symbols
allow compression), make the AC's quantization cost
DA=4*A*log(e)/2^g bits/symbol as close to QI's quantization cost
DQ=2*log(e)/2^g bits/symbol as theretically possible. The double
precision g won't help you much, since N isn't big enough to show
(you would need much much larger N before your AC's 64 bit
precision would compensate for the AC's O(1) excess bits).

So after all the conceivable advantages to AC were granted (other
than allowing blatant forms of cheating, such as adding different
N and A costs), it still produces larger output by about 6 bits (with
a bit mode output, the 64 bit AC using "faithful probabilities" ought
to be able to get to QI+4 bits instead of QI+6). And depending on
N chosen, your AC output size will fluctuate from instance to
instance of input by at least 1 byte. (We don't need to mention
the speed again.)

nightlight

unread,

Feb 5, 2006, 5:52:18 AM2/5/06

> I modifed a toy arithmetic compresser to do the case you
> actually first brought up.

Can you provide a link to post where I "brought up" A=3 (which
wasn't a response to your "challenge" that you & Matt were
discussing)?

> I showed how the length could be added and still it
> was shorter than what you claimed was your fixed length.

You can't compare QI+30 bits (the +30 was cost of N & A
when both are assumed to be _entirely arbitrary_ and
unknown to decoder, hence universal code costs were
computed) vs AC+24 bits. Show how you get smaller or
even equal size, when you add to both the same cost
for N & A (since that can be coded the same way or at
the same cost with any coder). What is with you and this
repeated insistence on adding different costs for N & A?

David A. Scott

unread,

Feb 5, 2006, 9:22:08 AM2/5/06

"nightlight" <night...@omegapoint.com> wrote in
news:1139134983....@g43g2000cwa.googlegroups.com:

>
> You were comparing by adding 24 bits to your output (for N only),
> to QI value with added 30 bits (27 for N and 3 for A). If you add the
> same cost to both, you get the same difference as without additions.
> So, don't keep repeating the nonsense "it beat your code" unless
> you can show figure and code which does it.
>
>

For your info my N was smaller than your N which is one reason it
took less space. Yes we can play this game forever which just pisses
off the readers. In this case I could use a more optimal why of encoding
N and even carry info for the uneeded A and still get it in the 24 bits
since my N is smaller thus better than your N. in case this is over your
head your N was the number of input symbols. Mine is the number of output
bytes. Do I need to do the arithmetic here is the concept to hard to for
you to grasp?
Since most here are already bored the only real test is one with files.
I see the idea of allowing users to use there own test files is still
a concept you wish not to allow. Can you use it with files YES OR NO.
I think your stalling why is THIS? I could even do the unthinkable
to you and use arithmetic to calculate the index and thus dispense with
the need of QI in the first place. But I was trying to follow your ideas
and compare it with the so called compression you think QI could do better
than arithmetic. The more I look at it the worse the idea of even using
the index for simple compression where arithmetic already shines. But
prove me wrong try writting code that uses files. Unless you can't
because of the inflexability of using QI's index for plain simple entropy
coding. I see know you also think A=3 is optimal for AC which is strange
since you brought up the case of A=3 as bad for AC funny how things change.

Randy McNabb

unread,

Feb 5, 2006, 11:23:34 AM2/5/06

"nightlight" <night...@omegapoint.com> wrote in message

news:1139131772.4...@g14g2000cwa.googlegroups.com...

This is what I was testing also. Except the averages I am getting are a lot
closer to N*log2(A) within a fraction of a bit. I only write a single 1
bit to end the encoding because all you have to do is ensure the that
decoder input is within the proper range when you decode the last symbol.
When I run out of input for the decoder I feed it 0 bits. Sometimes I come
out a little below N*log2(N) sometimes above but average above most of the
time. I use equal probabilities for one test where the probability is 1/A.
The second I use individual symbol counts and N. A third time I encode all
sets as one using the symbol counts and N. I know these are not real world
tests because I assume to much about the model. I just wish to understand
where all the differances are you speek of. It seems that Qi would have the
same discrepencies between the model and the encoder as far as accurate
representation of the data.

Randy McNabb

nightlight

unread,

Feb 5, 2006, 2:19:04 PM2/5/06

> For your info my N was smaller than your N which is one
> reason it took less space. Yes we can play this game forever
> which just pisses off the readers. In this case I could use a
> more optimal why of encoding N and even carry info for the
> uneeded A and still get it in the 24 bits since my N is smaller
> thus better than your N. in case this is over your head your
> N was the number of input symbols. Mine is the number of
> output bytes.

Either coder can be changed to decode from input or output size
(provided the given information is at least as large as log(N) bits,
where N is number of symbols). With QI the relation between output
size given in log(3) bit units (or smaller) and N symbols is
deterministic (via quantized power table, where powers have log(3)
distance in length). With AC you get some variation in output size,
yours even generates + or - 1 byte fluctuations in size (after just
few samples for some N).

Rounding output size to bytes (or to disk sector size or any such) adds
needless noise to the comparisons and to the thread. It adds no useful
information and wastes everyones time on pointless arguments how to
normalize such figure to N*log(3) convention or to other coders which
are already normalized to N*log(3). You might as well insist that we
all round to programmers' age in seconds at the time the test was ran
and then start arguing about whether everyone is normalizing fairly and
declaring elaborate procedures on age verification and synchronization
to make it fair. If anyone needs to kick up that kind of dust and smoke
on a most trivial kind of comparison (# of bits produced by A vs # of
bits produced by B vs N*log(3))), we are dealing with some scam. The
same goes for those insisting that measurement units be coarser than
what the natural output units are for the coders (bits for AC or
Huffman, bit fractions for QI). No one wants to waste time on such BS.

Why don't you simply printf and report how many bits (whole or
fractions if you are doing average) your AC produces and don't add any
other sizes (that later need some elaborate song and dance to compare
to entropy N*log(3) or to other coders' sizes).

That way you can compare your output size with the exact N*log(3) value
directly, since N*log(3) is a figure which does _not_ include costs
for N or A. Or to QI.exe output which displays the same type count as
N*log(3) when comparing to it.

When decoding, pass to your decoder N and A as parameters and let it
decode. That output should be decodable and changing any bit beyond the
declared size in bits (e.g. within last byte) should leave output
decodable. If size is declared in fractional bits as some value X, then
the output index should always be strictly below 2^X (this can be
established via sampling or by inspecting decoder).

When all is said and done, the 64 bit AC coding to "faithful
probabilities" (i.e. in enumerative mode [34]) should theoretically
(with nonbiased quantization) be able to get to QI+(3 to 4) bits for
its average on this N,A input (which is the most favorable of all tests
for AC). The AC will have O(1) bits variation in output size, so you
need an average from a reasonably sized random sample of inputs (as
opposed to declaring the smallest value seen).

In short, taking the most favorable test for AC that exists, it will
still produce larger output and it will take much longer time to
compute it.

nightlight

unread,

Feb 5, 2006, 3:15:15 PM2/5/06

> This is what I was testing also.

I am not clear on what AC are you using, at what precision g, and what
figures (QI? AC?) are you referring to in various places in your post.
Can you give some small table of labeled results? If you can test speed
that would be useful to see as well (note that QI coders in Radix.c
code for any A<2^32 so it is an overkill and consumes extra time for
A=3 case; you may also test for some large A, such as A>2^31 to get a
fair speed test).

For AC, due to O(1) variability of its output size you need a good size
random sample to get an accurate bit fraction. A g-bit precsion AC,
coding to "faithful probabilities" (see [34]) known to coder will get
averages 2-4 bits above N*log(3) if N is negligible compared to 2^(g-3)
(note that N*log(3) figure does not include cost of N or A).

QI's output will be well within N*3e-10 bits from N*log(3) and it will
produce a perfectly stable output size across all input instances for
given N and A (and predictable from N & A, without having to decode
compressed data; AC will need to decode to find out what the exact
compressed size is).

Randy McNabb

unread,

Feb 5, 2006, 5:37:22 PM2/5/06

"nightlight" <night...@omegapoint.com> wrote in message

news:1139170515.4...@g47g2000cwa.googlegroups.com...

> > This is what I was testing also.
>
> I am not clear on what AC are you using, at what precision g, and what
> figures (QI? AC?) are you referring to in various places in your post.
> Can you give some small table of labeled results? If you can test speed
> that would be useful to see as well (note that QI coders in Radix.c
> code for any A<2^32 so it is an overkill and consumes extra time for
> A=3 case; you may also test for some large A, such as A>2^31 to get a
> fair speed test).
>
> For AC, due to O(1) variability of its output size you need a good size
> random sample to get an accurate bit fraction. A g-bit precsion AC,
> coding to "faithful probabilities" (see [34]) known to coder will get
> averages 2-4 bits above N*log(3) if N is negligible compared to 2^(g-3)
> (note that N*log(3) figure does not include cost of N or A).
>

I have been testing AC at 64 bits for encoded size as would be transmitted.
In other words a complete encoding to the bit.
I have run i ttwice today. It is a slow implementation so it takes a long
time. I am encoding eack set three different ways.
1. With equal probabilities for each symbol (1/A)(labeled Equal
Probabilities)
2. Using what I think you are calling "faithful probabilities"(symbol
count/N). I keep totals for all the sets then calculate the average using
(total encode lengths-entropy totals)/number of sets.(labeled Actual
Probabilities)
3. Using the "faithful probabilities" for each set but encoding as if were
one single message to check against the total probabilities without ending
the sequence in the middle. I take an average from this number the same as
2. This seems like it would be more in line with your numbers as there would
be no final encoding of your sets.(labeled Running Total)

Check my entropy calculations routine to make sure I am doing it right.it
calculates the sum of each symbol*-log2(symbolcount/N).

double GetEntropy(unsigned long *model, unsigned long count)
{
double total = 0;
for(unsigned long i = 0;i < count; i++)
{
if ( (model[i]-model[i+1]) > 0 )
{
total +=
log10(((double)model[i]-model[i+1])/model[0])*(model[i]-model[i+1]);

}
}
return -total/ log10(2);
}

Arithmetic coding check

A = 3
N = 1000000
Sets = 1000
Equal Probabilities
N*log(A) = 1584962.500721
Average coding length = 1584962.584000
Average compared to N*log(A) = 0.083279
Number of sets decompressing improperly = 0
Worst = 1584963 Best = 1584962

Actual Probabilities
Total of entropy = 1584961094.522972
Total of coding = 1584961144.000000
Average compared to actual entropy = 0.049477
Number of sets decompressing improperly = 0

Running Total of coding = 1584961094.000000
Average compared to actual entropy = -0.000523
Number of sets decompressing improperly = 0
Press any key to continue

Arithmetic coding check

A = 3
N = 1000000
Sets = 1000
Equal Probabilities
N*log(A) = 1584962.500721
Average coding length = 1584962.592000
Average compared to N*log(A) = 0.091279
Number of sets decompressing improperly = 0
Worst = 1584963 Best = 1584962

Actual Probabilities
Total of entropy = 1584961015.493255
Total of coding = 1584961070.000000
Average compared to actual entropy = 0.054507
Number of sets decompressing improperly = 0

Running Total of coding = 1584961016.000000
Average compared to actual entropy = 0.000507
Number of sets decompressing improperly = 0
Press any key to continue

Randy McNabb

nightlight

unread,

Feb 5, 2006, 6:44:38 PM2/5/06

Note that for this set:

{N,A}={ all sequences of N symbols in alpha size A},

the "faithful probabilities" are constants N/A. The decrementing
probabilities are "faithful" for a different set :

{N,A,C} = { all sequences of N symbols in alphabet size A
with given symbol counts array C[a], a=0..A-1}.

The {N,A} is a much larger set than {N,A,C}, hence the code length will
be larger on {N,A} by approximately (A-1)/2 * log(N), which for A=2
comes to 1/2 log(N) bits. If you include into size of {N,A,C} the cost
for sending frequency table C[a], coded optimally to given
probabilities p(a), then you will exactly the entropy for these
frequencies. But this set {N,A,C} wasn't the one we were testing here.
We were testing {N,A} and for that set the optimim (average) code size
(without cost of N and A) is N*log(A).

While your entropy for {N,A} seems fine, the average AC size seems
wrong -- the reported size is by a at least 1 bit shorter than even an
infinite precsion AC would produce, which would be 1-2 bits above
N*log(A) bits, in this case 1.5 bits for infinite precsion AC. So you
may be miscounting the AC output size with your bit stuffing and some
subtractions. The AC always produces output size in whole number of
bits. If you are stuffing a known terminator bits, you can't subtract
such size from AC output size. Now, you can send the input size N and
alphabet size A to decoder, but then you can't stuff known bits at the
end and then subtract them later.

With N & A passed outside of AC output, the AC output cannot carry
other bits at its tail serving as terminators i.e. if the compressed
size declared by AC is X bits (X is integer), you should be able to
change any bits after these X bits, before passing it to decoder (along
with N & A) and it should decode correctly. Hence, it seems you may be
doing something different here, with some tail stuffing and subsequent
size figure adjustemennts, which results in incorrect AC output size
being reported.

So try doing it without any bit stuffing on the encoder end and
adjustements to reported output size X, and terminate decoder after N
symbols have been decoded, where N is passed as parameter to decoder.
As a test of validity for output size X reported by encoder, put random
1-64 bits after the X bits of compressed data. Note also that decoder
doesn't know the size X (unless you add the cost of sending X to the
decoder), it knows only N. Hence decoder can't take any advantage of
knowledge of X in its decoding.

Randy McNabb

unread,

Feb 5, 2006, 10:11:51 PM2/5/06

"nightlight" <night...@omegapoint.com> wrote in message

news:1139183078.2...@g14g2000cwa.googlegroups.com...

The Encoder writes a single 1 bit to flush the encoder after the last
symbol. Here the remaining underflow bits are counted also. When writing
to file any underflow bits spanning a byte boundary can be discarded. When
decoding the decoder supplies 0 bits when it runs of data. Changing the
bits after the encoded bits may cause the data to be decompressed
incorrectly.

Randy McNabb

nightlight

unread,

Feb 5, 2006, 11:26:28 PM2/5/06

> Changing the bits after the encoded bits may cause the data
> to be decompressed incorrectly.

If your encoder declares X bits as the size of its output [X], then no
other storage should be taken outside of [X] that keeps track of
length of [X] to protect it for decoder.

For example if these X bits are stored inside a larger structure,
bit-packed as: [X],[Y] where [Y] is some other arbitray data of which
you have no control of (what it contains or when and how it changes),
you should be able to decode [X] no matter what the content of [Y] is.
If you need some special padding of some length DX to follow [X] (e.g.
to specify the final position of the AC interval within the window by
flushing the encoder state) then the length of such padding needs to be
counted in the output size X.

Otherwise someone else needs to remeber the declared length X of your
output [X] and recreate the required padding after the X bits, before
calling your decoder (or give your decoder the value X so decoder can
do it). This may amount to as much as DX=log(X) bits that needs to be
added to X (in conventional finite precision AC algorithms, such as
Moffat98 [4], DX is only 2-3 bits, see pp. 277-278).

With that adjustment, the AC output will be about 2-4 bits larger than
that of QI (since AC also needs to round output to the whole number of
bits which QI doesn't need to do, see [M1]), even when AC is coding
using "faithful probabilities" for the given source (which is its most
accurate mode; in that mode, though, AC has to relinquish the
conventional AC modeling scheme). If you use much larger N than 10^6
and/or much larger A than 3, then AC will have additonal excess (see
[M2], section (A3)).

-- References ( http://www.1stworks.com/ref/RefLib.htm )

4. A. Moffat, R. M. Neal, I.H. Witten "Arithmetic coding revisited"
ACM Trans. on Inf. Sys. Vol 16, No 3, 256-294, July 1998

M1. --- QI Bit Fractions and Mixed Radix Codes
http://groups.google.com/group/comp.compression/msg/1e015f38d228969e

M2. -- Summary of QI algorithmic advances (A1)-(A4)
http://groups.google.com/group/comp.compression/msg/27c6f329038a1bdc

Randy McNabb

unread,

Feb 5, 2006, 11:48:50 PM2/5/06

"nightlight" <night...@omegapoint.com> wrote in message

news:1139199988....@f14g2000cwb.googlegroups.com...

I pretty much agree. It all depends on how where it is used. I am not
totally sure that even 2-4 bits is enough in all circumstances depending on
the model and the data size. I can imagine instances where it's not. I
have never really had call to check just how accurate the AC coder was but I
am glad to see it was still so close after a billion symbols(Encoding all
sets together). Now I just have to learn what you are trying to do. I
understand the basic idea. I just have to figure out the rest. I have
already learned a lot such as the lattice walks on Pascal's Triangle. When
studing the whole Zeosync deal, I wondered how you could possibly encode
such large enumerations. Now I know. Thanks for your time and the brain
food.

Randy McNabb

Nicolas Le Gland

unread,

Feb 6, 2006, 12:02:50 PM2/6/06

> In short, taking the most favorable test for AC that exists, it will
> still produce larger output and it will take much longer time to
> compute it.

Can you use it with files ? YES or NO ?

nightlight

unread,

Feb 6, 2006, 1:50:01 PM2/6/06

> Can you use it with files ? YES or NO ?

Sorry. That's much too hard for me. You need to look for
some file genius to do fopen() and all that kind of advanced,
mind-boggling work. I just do simple stuff.

0 new messages