
Jarek Duda's Asymmetric binary system


Mark Nelson

Nov 23, 2007, 3:25:49 PM
Hi all,

I got an email from Jarek Duda from Jagiellonian University in Poland,
pointing to a Wikipedia article (which he wrote) on what he calls the
"Asymmetric Binary System", http://en.wikipedia.org/wiki/Asymmetric_binary_system.
He refers to a paper where he explains it in some additional detail:
http://uk.arxiv.org/PS_cache/arxiv/pdf/0710/0710.3861v1.pdf

Basically this is just another entropy encoder, and I'm trying to see
if there is any justification to Jarek's claim that it has "all
advantages of arithmetic coding, but much simpler"

Jarek assures me that it produces optimal codes (I haven't tried to
verify that) and I can see that it might be simpler to understand than
arithmetic coding.

The problem I have with seeing it as an improved alternative to
arithmetic coding or other range coding algorithms is that it seems
that when using it as, say, the output of a PPM model, you'll be
performing more computation per code than you would be with an
arithmetic coder. The generation of a code is an iterative process, so
encoding a given symbol with a previously unseen probability appears
to me to take quite a few steps.

In the case where probabilities are fixed, it generates a nice set of
codes, but is there any reason to prefer this over, say, Huffman
coding?

|
| Mark Nelson - http://marknelson.us
|

Marco Al

Nov 23, 2007, 3:52:53 PM
Mark Nelson wrote:

> In the case where probabilities are fixed, it generates a nice set of
> codes, but is there any reason to prefer this over, say, Huffman
> coding?

I skipped to the example; he chooses encoding a string of bits with p(1)
= 0.5 as an example ... I stopped reading right there.

Marco

dud...@gmail.com

Nov 25, 2007, 11:28:45 AM
This is from the Wikipedia article on 'arithmetic coding':
"At least one significant compression software program, bzip2,
deliberately discontinued the use of arithmetic coding in favor of
Huffman coding due to the patent situation. Also, encoders and
decoders of the JPEG file format, which has options for both Huffman
encoding and arithmetic coding, typically only support the Huffman
encoding option, due to patent concerns; the result is that nearly all
JPEGs in use today use Huffman encoding"

While encoding using Huffman, for each binary choice (node of the
tree), it is assumed that both probabilities are equiprobable
(p(1)=1/2), so the probabilities of leaves (symbols) are approximated
with powers of 2.
While encoding symbols of a given probability distribution p using an
encoder which is optimal for probability distribution q, we are
losing \sum_i p_i log_2(p_i/q_i) bits per symbol (the Kullback-Leibler
distance).
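
As a quick numerical check of that formula, here is a minimal, self-contained
example (the distributions are only illustrative; q = (0.5, 0.5) is what a
two-leaf Huffman code effectively assumes):

#include <cmath>
#include <cstdio>

// Redundancy of coding a source p with a coder tuned for q, i.e. the
// Kullback-Leibler term above.
int main() {
    double p[2] = {0.9, 0.1};   // true symbol probabilities (example values)
    double q[2] = {0.5, 0.5};   // what a two-leaf Huffman code assumes
    double loss = 0;
    for (int i = 0; i < 2; ++i)
        loss += p[i] * std::log2(p[i] / q[i]);
    std::printf("%.3f extra bits per symbol\n", loss);   // about 0.531 here
}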

So to encode optimally (in -\sum_i p_i log_2(p_i) bits/symbol) we have
to use an exact encoder. One such encoder is arithmetic coding - but the
simple idea behind it turns out to be very difficult to calculate in
practice - that's why there are plenty of patents on it.
My idea turns out to be much simpler to encode - it should be faster and
simplifies hardware compressors ... and there are no patents on it.
You use it exactly as you would use arithmetic coding - for each step you
need to know the estimated probability - but this time encoding one
bit takes just a few cycles.

Jarek Duda

Thomas Richter

Nov 25, 2007, 12:48:15 PM
dud...@gmail.com wrote:

> So to encode optimally (in -\sum_i p_i log_2(p_i) bits/symbol) we have
> to use an exact encoder. One such encoder is arithmetic coding - but the
> simple idea behind it turns out to be very difficult to calculate in
> practice - that's why there are plenty of patents on it.
> My idea turns out to be much simpler to encode - it should be faster and
> simplifies hardware compressors ... and there are no patents on it.
> You use it exactly as you would use arithmetic coding - for each step you
> need to know the estimated probability - but this time encoding one
> bit takes just a few cycles.

Have you made speed comparisons with the MQ coder, which also only takes a few
cycles?

Have you checked the ELS-coder, another arithmetic coder version, which works
in the logarithmic domain and is also pretty fast?

So long,
Thomas

Mark Nelson

Nov 25, 2007, 2:00:17 PM
On Nov 25, 10:28 am, duda...@gmail.com wrote:
> This is from the Wikipedia article on 'arithmetic coding':
> "At least one significant compression software program, bzip2,
> deliberately discontinued the use of arithmetic coding in favor of
> Huffman coding due to the patent situation. Also, encoders and
> decoders of the JPEG file format, which has options for both Huffman
> encoding and arithmetic coding, typically only support the Huffman
> encoding option, due to patent concerns; the result is that nearly all
> JPEGs in use today use Huffman encoding"

It's true that there are many patents covering arithmetic coding, but
I wonder if this wikipedia article might be overstating the case a
bit.

We should probably get started on a timeline so we can keep an eye on
the key patents in the field. I suspect that many are reaching their
end of life.

Also, in the worst case, CACM 87 should be patent-free, correct? Not
particularly efficient, but still, it works.

> While encoding using Huffman, for each binary choice (node of the
> tree), it is assumed that both probabilities are equiprobable
> (p(1)=1/2), so the probabilities of leaves (symbols) are approximated
> with powers of 2.

I'm not sure I follow this, probably language difficulties.

However, I THINK that with your encoding system, if I were to have a
two-symbol alphabet of 1 and 0, with associated probabilities of .99
and .01, both your system and Huffman would yield a non-optimal pair
of binary codes: 1 and 0.

To make this more efficient, we could distribute those values over a
larger input range, but wouldn't doing so yield roughly the same
improvement in both systems?

dud...@gmail.com

Nov 26, 2007, 3:01:18 AM
About the MQ-coder and ELS-coder - I couldn't find an exact description, but
they look very complicated and I think there are patents on them.
My encoding can be written to need practically ONE multiplication/division
per bit, plus a few small operations (the stream decoding from the paper
sometimes needs more of them, up to max(-log(q),-log(1-q)), but this can
be handled by a rough check).

> However, I THINK that with your encoding system, if I were to have a
> two-symbol alphabet of 1 and 0, with associated probabilities of .99
> and .01, both your system and Huffman would yield a non-optimal pair
> of binary codes: 1 and 0.

Huffman would do it awfully - it approximates it with 1/2.
I don't see a reason why my coding wouldn't do it exactly.
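
To put numbers on the 0.99/0.01 example (plain worked arithmetic, nothing
specific to either coder):

#include <cmath>
#include <cstdio>

// Entropy of the 0.99/0.01 source versus the 1 bit/symbol that a
// two-symbol Huffman code must spend; an exact coder can approach the entropy.
int main() {
    double p = 0.99;
    double h = -p*std::log2(p) - (1-p)*std::log2(1-p);
    std::printf("entropy = %.4f bits/symbol, Huffman = 1 bit/symbol (%.1fx larger)\n",
                h, 1.0/h);   // about 0.0808 and 12.4x
}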

Thomas Richter

Nov 26, 2007, 8:11:12 AM
dud...@gmail.com wrote:

> About the MQ-coder and ELS-coder - I couldn't find an exact description, but
> they look very complicated and I think there are patents on them.

Neither of the two is complicated. They are both multiplication-free, and
very, very, very fast. The MQ coder has one addition and one comparison in
its common coding path. The MQ coder has patented parts in it (IBM, Mitsubishi),
but they are available free of charge for ISO purposes. The ELS coder is patented,
but available under GPL.

> My encoding can be written to need practically ONE multiplication/division
> per bit, plus a few small operations (the stream decoding from the paper
> sometimes needs more of them, up to max(-log(q),-log(1-q)), but this can
> be handled by a rough check).

In that case, you're too slow already. Multiplications and divisions are the
speed-bumps of arithmetic coding. *Avoid* them.

So long,
Thomas

Jim Leonard

Nov 26, 2007, 12:18:06 PM
On Nov 25, 11:48 am, Thomas Richter <t...@math.tu-berlin.de> wrote:
> Have you made speed comparisons with the MQ coder, which also only takes a few
> cycles?

I keep wanting to find a description (and possibly simple source) for
the MQ coder for implementation on a low-resource platform, but every
few months when I start my research, I can't seem to find anything
that I can parse. I'm not an expert, still learning; is there a
simple and/or concise explanation of exactly how the MQ Coder works?
And why is it so fast, is it because it is a range coder? And is it
assymmetric or symmetric? (I'm more interested in decompression speed
than compression speed)

Thomas Richter

Nov 26, 2007, 12:44:57 PM
Jim Leonard wrote:

> On Nov 25, 11:48 am, Thomas Richter <t...@math.tu-berlin.de> wrote:
>> Have you made speed comparisons with the MQ coder, which also only takes a few
>> cycles?
>
> I keep wanting to find a description (and possibly simple source) for
> the MQ coder for implementation on a low-resource platform, but every
> few months when I start my research, I can't seem to find anything
> that I can parse.

There are actually flow-charts available, in the JBIG and the JPEG2000 specs.
Please go to http://www.math.tu-berlin.de/~thor/imco/, look under "Papers".

> I'm not an expert, still learning; is there a
> simple and/or concise explanation of exactly how the MQ Coder works?

If you google for it, you will find an IBM tech report as one of the first hits
that explains the Q-coder (the ancestor of the MQ coder); check there.
Q, Q15, QM and MQ are more or less variations of the same idea.

Basically, if Q is the probability for the least probable symbol (LPS), then
the computations for the interval size A are simplified from

A - Q * A -> A - Q

and

Q * A -> Q

always assuming that A is close to one. Internally, A is kept scaled (by 4/3 IIRC). The
scaling is more or less an empirical fix: A is kept "large enough" by renormalization, and
it has been observed that in this procedure, A keeps approximately the size 3/4 on average.
By representing it scaled, the variable A (the scaled version) remains approximately one.

Due to this "approximation", the LPS interval can grow larger than the MPS interval,
in which case the two are interchanged (LPS,MPS interchange, IIRC that's the Mitsubishi
patent).

> And why is it so fast, is it because it is a range coder?

Basically because you do not need to multiply, and in the case where you encode the most
probable symbol, only minimal activity is performed.

> And is it
> asymmetric or symmetric? (I'm more interested in decompression speed
> than compression speed)

The MQ coder is very symmetric.

So long,
Thomas

Marco Al

Nov 26, 2007, 1:54:58 PM
Thomas Richter wrote:

> In that case, you're too slow already. Multiplications and divisions are the
> speed-bumps of arithmetic coding. *Avoid* them.

I don't think that's a realistic goal for adaptive M-ary coding.
Binarization isn't always desirable or even possible (I don't know of
any existing coder using binarization which can handle a generalized
Gaussian probability model for instance).

Marco

Phil Carmody

Nov 26, 2007, 6:59:09 PM
Mark Nelson <snork...@gmail.com> writes:
> a Wikipedia article (which he wrote)
> a paper where he explains it in some additional detail:

*cough* no original research *cough*.

Can't follow simple rules and self promoting? Smells like
a quack to me without even downloading the paper.

Phil

--
Dear aunt, let's set so double the killer delete select all.
-- Microsoft voice recognition live demonstration

Jim Leonard

Nov 26, 2007, 7:47:07 PM
On Nov 26, 5:59 pm, Phil Carmody <thefatphil_demun...@yahoo.co.uk>
wrote:

> Mark Nelson <snorkel...@gmail.com> writes:
> > a Wikipedia article (which he wrote)
> > a paper where he explains it in some additional detail:
>
> *cough* no original research *cough*.

Quacks notwithstanding, and jumping completely OT, I think "no
original research" is one of the smaller forms of cancer infecting
wikipedia. If I spend hours writing a factual article, it is
rejected; if I copy the same article on a public web page, then write
a stub article that references the webpage, it is accepted? (This has
actually happened to me)

The largest form of wikipedia cancer is of course "not
notable" (notability). Can't be "the sum of all human
knowledge" (their charter, not mine) with entire articles stricken
from the record simply because a few people didn't think the articles
were interesting.

(steps off soapbox)

Matt Mahoney

Nov 26, 2007, 8:09:52 PM
On Nov 23, 3:25 pm, Mark Nelson <snorkel...@gmail.com> wrote:
> Hi all,
>
> I got an email from Jarek Duda from Jagiellonian University in Poland,
> pointing to a Wikipedia article (which he wrote) on what he calls the
> "Asymmetric Binary System",http://en.wikipedia.org/wiki/Asymmetric_binary_system.

> He refers to a paper where he explains it in some additional detail:http://uk.arxiv.org/PS_cache/arxiv/pdf/0710/0710.3861v1.pdf
>
> Basically this is just another entropy encoder, and I'm trying to see
> if there is any justification to Jarek's claim that it has "all
> advantages of arithmetic coding, but much simpler"

I looked at it, and yes I do think it will code as efficiently as an
arithmetic coder. It can't code any better, because arithmetic coding
is already optimal.

I initially objected that the algorithm is O(n^2) because as x grows
larger, so does the complexity of the arithmetic. But this is not
true because x can be normalized by outputting the low bits to keep it
in a fixed range (e.g. under 2^32) with insignificant compression
loss.

I also objected that compression requires division, which is slower
than multiplication, but I believe this problem could be solved using
a lookup table of inverses. I don't know which will be faster until
it is actually implemented, but right now I don't see a clear
advantage to either approach.

I also need to comment on the speed of arithmetic coding. Modern
processors have fast multipliers. This makes approaches like the Q
coder, which sacrifices compression by severely rounding the
probability, obsolete. For high end compressors, the bottleneck is
modeling and memory speed, not coding. It is still possible to write
reasonably fast compressors, for example, sr2, which uses binary
arithmetic coding, compresses smaller and slightly faster than zip -9
or gzip -9, which use Huffman coding. The disadvantage is that
decompression is not any faster than compression.
http://cs.fit.edu/~mmahoney/compression/text.html

-- Matt Mahoney

dud...@gmail.com

Nov 27, 2007, 3:23:43 AM
> > About the MQ-coder and ELS-coder - I couldn't find an exact description, but
> > they look very complicated and I think there are patents on them.
>
> Neither of the two is complicated. They are both multiplication-free, and
> very, very, very fast...

I've looked at the ELS-coder ... 'unsigned char x' ... it approximates
the [0,1] segment by 256 points, so it approximates the probability with
precision of about 2^{-9}.
The same can be achieved using my coding - by restricting to x \in
[2^8,2^9-1].
Now we can just put all the coding OR decoding information into a table,
which takes a few hundred kB and can be built in milliseconds -
for each x, we can divide the [0,1] segment of q into a few hundred
subsegments for which the coding/decoding information is the same -
for example, in the first step we check a few (N) most significant bits
of q (q \in [k/2^N,(k+1)/2^N)), which restricts us to one of the
subsegments above, or a few of them - then we can find the exact one in
a few BST-like choices.

So using my coding, we can freely balance between speed, precision and
memory needs (e.g. by choosing N).
If we use a few MB - we can do it in just one table check (and get a new
x \in [2^8,2^9-1] and a symbol or series of bits).
Another use I can think of: it looks impossible to decode the stream
without knowing the exact parameters (probabilities, initial x...) -
maybe it can be used in cryptography.
And there are completely no patents on it...

Mark Nelson

Nov 27, 2007, 8:33:45 AM
On Nov 26, 7:09 pm, Matt Mahoney <matmaho...@yahoo.com> wrote:
> I looked at it, and yes I do think it will code as efficiently as an
> arithmetic coder. It can't code any better, because arithmetic coding
> is already optimal.

Thanks Matt. I'll be looking into your suggestions.

It looks to me like the average integer division time on modern CPUs
is maybe 2-4X the number of cycles as the average integer multiply,
which seems to be getting pretty fast. Is that a good ballpark
estimate? (Back in the day calculating these kinds of numbers was
pretty easy, now it's extremely complicated.)

Matt Mahoney

Nov 27, 2007, 1:30:35 PM

On a Pentium 4 division is about 4 times slower than multiplication.
Appendix C of http://www.cs.cornell.edu/courses/cs412/2001sp/resources/Pentium%20Processor%20Optimization.pdf
has timing information. Of course it depends on the processor.
Optimization takes a lot of experimenting.

Also you have to consider parallelism. In the PAQ arithmetic coder
below, the range is x1..x2 (32 bit unsigned) and the next bit
probability is p (12 bits). Finding the range split requires 2
multiplications to avoid 32-bit overflow (or else assembler or
nonportable code).

U32 xmid=x1 + (x2-x1>>12)*p + ((x2-x1&0xfff)*p>>12);

The two multiplications are independent so they can be overlapped.
The first one has a latency of 14-18 clocks, and the second has a
throughput of 5 clocks. Not all code can be overlapped this way.
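
For reference, the same split written out with explicit parentheses as a
self-contained sketch (the values in main are only illustrative; this is not
the actual PAQ source):

#include <cassert>
#include <cstdint>
#include <cstdio>

typedef uint32_t U32;

// The range split quoted above: the range x1..x2 is divided at xmid in
// proportion p/4096 using two 32-bit multiplications instead of one
// 64-bit product.
U32 split(U32 x1, U32 x2, U32 p) {       // p is a 12-bit probability
    assert(x1 < x2 && p < 4096);
    U32 xmid = x1 + ((x2 - x1) >> 12) * p + (((x2 - x1) & 0xfff) * p >> 12);
    assert(xmid >= x1 && xmid < x2);
    return xmid;
}

int main() {
    std::printf("%08x\n", split(0u, 0xffffffffu, 2048));  // p = 0.5 -> ~0x7fffffff
}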

-- Matt Mahoney

dud...@gmail.com

Nov 29, 2007, 7:05:02 AM
There were concerns about encoding very small probabilities, like those from
Gaussian/Levy tails.
That is because both arithmetic coding and ABS approximate the wanted
probability with some absolute precision |p_wanted-p_approximated| < E,
so for probabilities near E the coder doesn't work properly.
I suppose there are versions of arithmetic coding with higher
precision, but the cost should be like in ABS (choosing R, so x
\in [2^R,2^{R+1}-1], E is about 2^{-R}): we require multiplication/
division.
But I don't think it's such a large problem in the era of many-GIPS
processors (especially since compression can be parallelized, e.g. by
splitting the file) - a GPU should compress online much faster than the
HDD transfer rate.

ABS just looks like a patent-free, completely fresh view on encoding
(and maybe cryptography), which may be worth considering.

Jarek

Matt Mahoney

Dec 1, 2007, 8:29:36 PM

I have been looking more at ABS. Having only one state variable
instead of 2 (low and high range) means a lookup table can be used for
both coding and decoding to fairly high precision. I did some
experiments where I simulated compressing bits with a probability
distribution p selected uniformly from (0,1). First I quantized p to
n-bit number q in range 1 to 2^n - 1. I let the state variable x be n
bits also, in range 2^n to 2^(n+1) - 1. I verified this works as long
as x has at least as many bits as q. It could be more, but adding one
bit each to q and x always helps more than adding 2 bits to x.

To encode bit d (0 or 1, choosing 1 with probability p), I first write
out enough low bits of x so that when d is encoded, x is back in the
right range. I compared the result with an ideal coder, e.g. log2(1/
p) if d = 1 or log2(1/(1-p)) if d = 0. The penalty (actual/ideal - 1)
for 10^7 simulated bits is as follows:

n penalty
-- -------
1 .386
2 .106
3 .0318
4 .00968
5 .00283
6 .000808
7 .000217
8 .000067
9 .000019
10 .000006

which means e.g. using a 64K lookup table (n = 8) the compressed size
is only .000067 larger than an ideal arithmetic coder.

I tried other distributions. The penalty is larger for p near 0 or 1
(highly compressible data), but arithmetic coding has the same
problem. For PAQ I use 12 bit probabilities for this reason, which
gives a negligible penalty (in practice about .000001). This would
probably be too large a table unless q were quantized more coarsely in
the middle of the range. In that case, encoding bit d with
probability p = P(d = 1) would be:

p -> q (table lookup)
x', q, d -> k, x (table lookup)
write low k bits of x'

Decoding would be:

p -> q (table lookup)
x, q -> k, x', d (table lookup)
read k low bits into x'

But where do I go from here? Unlike arithmetic coding, the decoder
moves x through the sequence of states in the opposite order as the
encoder. This means that the bits of d are decompressed in the
reverse order that they were compressed. This rules out all the usual
predictive models like PPM, PAQ, LZMA, etc. because the model sees
different context during encoding and decoding.

-- Matt Mahoney

dud...@gmail.com

Dec 2, 2007, 2:23:14 AM
About the larger penalties near 0 or 1 - that is what my last post was about.
About stream versions - an example of a correct one is in the paper:
http://uk.arxiv.org/PS_cache/arxiv/pdf/0710/0710.3861v1.pdf
You have to ensure that decoding and encoding are reverses of each other - if
one of them first encodes/decodes and then puts/gets the lowest bits, then the
other has to do it in the reverse order - get/put the lowest bits such that
after decoding/encoding we stay in the selected range.
To do this unambiguously, which we need in order to have the reverse, we should
ensure a constant bit-length of x - select the range [2^R,2^{R+1}-1].

About the lookup table - it's not exactly that easy - it will be a few
times larger - we have to do it for rational q.
As I've written before, for each x (or (x',d)) we can split q into
subranges with the same behavior - decoding is ok, but there is a
problem with encoding: for q near 0 or 1, x' will go to infinity and
the width of such subranges will drop to zero - I propose to select
some range, outside of which we just use multiplication/division.
Having such a split into subranges, we can decide in which one q lies in
logarithmic time using a BST-type tree, but we can do the first step -
check the N most significant bits of q (\floor(q shl N)) and jump straight
to some bottom node of this tree.

dud...@gmail.com

Dec 2, 2007, 9:40:30 AM
Precisely...
x \to (x',d) iff (d=0 and xq \in [x',x'+q)) or (d=1 and xq \in
[x'+q,x'+1))
If we want the decompression lookup table to be constructed faster, one
step is:
- x \to (x',d)
- put the lowest bits into x' to make it lie in [2^R,2^{R+1}-1]
The split of the [0,1] segment of possible q would look like:
Q_{x\to(x',0)} = [x'/x, x'/(x-1))
Q_{x\to(x',1)} = [x'/(x-1), (x'+1)/x)
So the 'lookup tables' will be: for each x, place all the x'/x, x'/(x-1) on
the [0,1] segment - we have to be able to determine quickly where q
is. We can do it e.g. in two stages: in the first, use the N most
significant bits of q, and then make a few binary choices.

The compression looks more difficult, because:
- x can go to infinity
- we don't know how many bits we should cut at the beginning
But because of the second 'problem' we don't have the first one - we
keep x smaller than 2^{N+1}.
So the 'lookup table': for each (x' \in [2^R,2^{R+1}-1), d \in {0,1}),
remember the ends of the subsegments for all possible x we can get (there
are 2^R of them) - we have to sort them somehow. For each subsegment we
can store the bits which had to be cut.

To summarize - for a given N, we have to remember 2^N*2^{N+1} rational
numbers (the subsegments' ends), the decode information, and some
searching structure above them.

Matt Mahoney

Dec 3, 2007, 3:06:13 PM

I was able to implement coding and decoding using functions, which
could be written as tables if N is not too large. I tested it and it
works as long as XBITS > QBITS (ideally XBITS = QBITS + 1). From the
table in my previous post, n = QBITS = XBITS - 1.

In the code below, the state of the coder is x in the range
2^(XBITS-1) to 2^XBITS - 1, the bit to be coded is d, and the
probability p(d = 1) is q/(2^QBITS), where q is in the range 1 to
2^QBITS - 1. To encode:

// read d
int x1 = x;
int k = scale(x1, q, d);
// write k low bits of x, leaving x1
x = encode(x1, q, d);

To decode:

d = decode_d(x, q);
// write d
x = decode(x, q, d);
// read low bits into x until x >= 2^(XBITS-1)

Code follows:

#include <cassert>

const int QBITS = 5; // probability quantization in bits, 1..10
const int XBITS = 6; // state quantization, QBITS+1..11
const int Q=1<<QBITS;

// Encode bit d with probability p(d==1) == q / 2^QBITS to state x
// and return new x
int encode(int x, int q, int d) {
  assert(d==0 || d==1);
  assert(q>0 && q<Q);
  if (d)
    return (x<<QBITS)/q;
  else
    return ((x+1<<QBITS)-1)/(Q-q);
}

// Return decoded d = 0 or 1 given state x, probability q
int decode_d(int x, int q) {
  assert(q>0 && q<Q);
  int d=((x+1)*q-1>>QBITS)-(x*q-1>>QBITS);
  assert(d==0 || d==1);
  return d;
}

// Return x after decoding bit d with probability q
int decode(int x, int q, int d) {
  assert(d==0 || d==1);
  assert(q>0 && q<(1<<QBITS));
  int xq=(x*q-1>>QBITS)+1;
  assert(xq>=0 && xq<=x);
  if (d)
    return xq;
  else
    return x-xq;
}

// Scale x so it encodes to XBITS bits (2^(XBITS-1)..2^XBITS-1)
// Return number of bits discarded
int scale(int& x, int q, int d) {
  int count=0, e=-1;
  while (x>0 && (e=encode(x, q, d)) >= 1<<XBITS)
    ++count, x>>=1;
  assert(e>=1<<XBITS-1 && e<1<<XBITS);
  return count;
}


I could make this all more efficient by replacing the encoder and
decoder with tables, using these functions to fill them. That is not
the problem. The problem is that the bits must be decompressed in the
reverse order that they were compressed. But then you can't use any
kind of predictive model to compute q because the compressor and
decompressor see different contexts.

-- Matt Mahoney

dud...@gmail.com

Dec 3, 2007, 3:39:35 PM
> The problem is that the bits must be decompressed in the
> reverse order that they were compressed. But then you can't use any
> kind of predictive model to compute q because the compressor and
> decompressor see different contexts.

It's not true - the encoder produces k bits in one step, and the decoder
should use the same k bits in the same order at the corresponding moment.
To use predictive methods:
- encode d
- update q using d
or
- decode d
- update q using d
And we are using the same q for encode/decode.

About the lookup tables - I oversimplified the compression in my last
post - we have to make segments for every possible number of
cut bits - R times more ... I think it's better to do compression
without the lookup table.
Decompression is ok.

Jarek

dud...@gmail.com

Dec 3, 2007, 3:59:27 PM
I'm sorry - you're right - we are decompressing in the reverse
direction - the predictive methods won't work ... :(

dud...@gmail.com

Dec 3, 2007, 4:16:01 PM
But if the compression doesn't have to be quick...
When we have the whole series of d, we can run the prediction process
from beginning to end and then compress in backward order, walking
back through the prediction process.
Now, while decompressing, we have the correct order and we can restore
the prediction process online.

Matt Mahoney

Dec 4, 2007, 5:49:09 PM

Yes, you're right. During compression you save q in a list, then
encode reading the input bits and q in reverse order. The output bit
order also has to be reversed. Then decoding is straightforward.

-- Matt Mahoney

dud...@gmail.com

Dec 5, 2007, 1:01:38 AM
A table of q would take a lot of memory. Of course we could store a
copy every some large number of iterations (like the square root of the
file length), and each time work from one copy.
But it would be much better if we could reverse the prediction process -
I imagine the prediction process as counting numbers of occurrences: a
step forward is adding, so the step back would be subtracting.
As always, we need to be very careful with precision here.
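
A toy illustration of such a reversible counter model (integer counts, so
nothing is lost to precision; the 4096 scale is an arbitrary choice):

#include <cassert>
#include <cstdio>

// Order-0 bit model whose update can be stepped backward exactly.
struct Counter {
    int n[2] = {1, 1};                       // add-one smoothing
    int q() const { return 4096 * n[1] / (n[0] + n[1]); }   // P(bit==1) in 1/4096 units
    void update(int d)   { ++n[d]; }         // forward step
    void unupdate(int d) { --n[d]; assert(n[d] >= 1); }      // exact inverse
};

int main() {
    Counter c;
    int bits[] = {1, 1, 0, 1};
    int before = c.q();
    for (int d : bits) c.update(d);
    for (int i = 3; i >= 0; --i) c.unupdate(bits[i]);
    assert(c.q() == before);                 // model state fully restored
    std::printf("restored q = %d/4096\n", c.q());
}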

Jarek

dud...@gmail.com

Dec 11, 2007, 12:33:33 AM
Of course, if the prediction process is more complicated, we can store
not the whole q for each step, but only the information required to make
steps back in the reverse process, so that we can reverse the whole
prediction process.

I've just finished a paper about another generalization of numeral
systems (higher dimensions), which can be used for example in image
compression:
http://groups.google.com/group/comp.compression/browse_thread/thread/f90482484435d3a9#19891a47e668bc28

Jarek

Matt Mahoney

Dec 15, 2007, 9:39:43 PM
I implemented your asymmetric coder in an order 0 compressor, fpaqa,
released under GPL. It is designed as a drop in replacement for
arithmetic coders in other programs like PAQ, LPAQ, BBB, and SR2.

http://cs.fit.edu/~mmahoney/compression/
Download: http://cs.fit.edu/~mmahoney/compression/fpaqa.zip

My implementation uses lookup tables that fit in L2 cache, but it is
twice as slow as my arithmetic coder for some reason. Coding uses two
lookups, first to quantize a 12 bit prediction, q, down to R=7 bits,
then a second lookup to map q, the N=10 bit state x, and the bit d to
a new state and the number of bits of x to write. The decompressor
also uses 2 lookups, first to quantize q as before, then a second
lookup to map q and x to d and new x.

To solve the asymmetry problem, the compressor models the input in the
forward direction and saves q and d in a stack of size B=500K. When
the stack is full or at end of input the saved q and d are encoded to
a second stack of packed bits, and that stack is written to the output
file in reverse order along with the first stack size and final state
x. The decompressor does everything in the forward direction,
periodically reading x to reset the state. The compressor does all
the data reversal, so it is about 15% slower than the decompressor. B
is large enough that the extra headers are negligible (5 byte overhead
per block).

I chose N and R to make the tables fit in 512KB L2 cache for
compression or 256K for decompression. Making them larger would
improve compression about 0.03% but be twice as slow on my computer
(2.2 GHz Athlon-64 in 32 bit mode). The maximum size in this
implementation is N=12, R=7, but there is not much to be gained by
going any higher than this. B does not affect speed because the stack
memory is accessed linearly so cache is not an issue.

The program has a compiler switch -DARITH that lets you substitute the
original arithmetic coder that I use in all my other programs. The
interface is identical. In theory the asymmetric coder should be
faster, so I need to solve the performance problems before I put it in
my other programs.

-- Matt Mahoney

dud...@gmail.com

Dec 16, 2007, 4:49:36 AM
It's great!
It should be even faster and less memory demanding when the predictor
has a 'step back' option.
Jarek

dud...@gmail.com

Dec 16, 2007, 5:05:11 AM
I'm sorry - that was the wrong way - your solution is faster, because we
quantize q only once.

dud...@gmail.com

Dec 16, 2007, 5:24:23 PM

dud...@gmail.com

Dec 16, 2007, 5:27:00 PM

dud...@gmail.com

Dec 16, 2007, 5:28:52 PM

Matt Mahoney

Dec 16, 2007, 10:01:49 PM
On Dec 16, 5:28 pm, duda...@gmail.com wrote:
> Sorry, wrong link:http://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Asymmetr...

Wikipedia has a policy of no original research. I added a section on
the fpaqa implementation, so maybe it qualifies now.

-- Matt Mahoney

dud...@gmail.com

Dec 18, 2007, 3:02:08 PM
I've looked at the table on Matt's page ... zip decompresses TEN times
faster than the fastest arithmetic coder!

So I thought about decoding multiple digits in one step using my
method.
The first approach is to observe that while decoding, d = 0 iff
frac(xq) \in [0,1-q).
So if the next probability used is q', we can just check (using a table)
in which of four ranges frac(xqq') lies (of widths (1-q)(1-q'), q(1-q'),
(1-q)q', qq') - we decode 2 digits (symbols) in one table check.
Analogously, if the following 'q's are fixed, we can decode a few
digits in one step with one table lookup.

The more comfortable approach is to generalize to remove 'binary':
let (q_i) - probability distribution of symbols(digits)
analogically like in the derivation from my paper, from elements
smaller than x:
x_0:=ceiling(xq_0) will be decoded as 0
x_1=ceiling(x(q_0+q_1))-x_0 - as 1
x_2=ceiling(x(q_0+q_1+q_2))-x_1-x_2 -as 2
...
x_k=x-x_0-x_1-...
analogically as in the paper, while going to x+1, only one of x_i,
say x_d will increase by 1
so d is our decoded symbol(digit), x_d is the next x.

These equations look horrible, but if THE PROBABILITY DISTRIBUTION IS
FIXED - we can put them into a large (much larger precision is required)
one-dimensional table:
dec[x] will return 2-4 bytes in which are encoded (decoded with a logical
AND mask and shifts):
- d
- x_d
- maybe the number of bits that have to be read from the input.
So in one table check - we decode a whole symbol (digit)!
The table for coding (separate for each d) can be built exactly as for
decoding.

Jarek Duda
ps. it's not the best time to put Asymmetric Numeral Systems into
Wikipedia... :)

Jarek Duda

Dec 18, 2007, 5:27:20 PM
I'm sorry - the equations for x_i above aren't correct - sometimes x_i
decreases with x -> x+1.
A first idea to initialize the tables for decoding/coding correctly:

Fix some constant 'a' - the best would be an irrational number (like
\phi).
We start with x=2^R - set the x_i for it using, for example, the
equations above.

For x = 2^R to 2^{R+1}-1
find d such that fract(a*x) \in [p_0+...+p_{d-1},p_0+...+p_d)
dec[x]=(x_d,d) (or cod[d][x_d]=x)
x_d++

We are using the fact that fract(a*x) is distributed uniformly on [0,1).
But I'll think about something simpler.

Jarek Duda

Dec 18, 2007, 6:16:17 PM
Observe that 'a' is ideal as a crypto key.

Matt Mahoney

Dec 18, 2007, 6:34:06 PM
I implemented a second version of the asymmetric binary coder, fpaqb.
http://cs.fit.edu/~mmahoney/compression/#fpaq0

The new version does not use tables, except for one table of 64-bit
inverses to avoid a division during compression. Compression is a
little slower than fpaqa and decompression is a little faster, but
still slower than arithmetic coding. The advantage of not using
tables is that x can be represented to higher precision allowing
bytewise I/O (faster) and eliminating the 0.03% space penalty when
N=10. I used N=12 bits, i.e 12 bits precision for probabilities and x
in the range 2^12 to 2^20 - 1. The code is also simpler than fpaqa,
and it uses less memory.

In my earlier experiments, eliminating tables doubled compression time
(due to integer division) but decompression time was the same, even
though a multiplication was required. But I/O a byte at a time (low 8
bits of x) saves a lot of time, and was not possible using state
tables that fit in cache. To avoid division by q I make a table of 1/
q, but it requires 64 bit arithmetic, either double or unsigned long
long. I found 64 bit integers to be faster on my Athlon-64 in 32-bit
mode (WinXP).

As a minor optimization, block headers were reduced from 5 bytes to 3
except for the last one, which is 6 bytes.

-- Matt Mahoney

Jarek Duda

Dec 19, 2007, 12:01:52 AM
It's not a large price for the compression rate.

I was thinking about its use as a cryptosystem.
Besides 'a', the crypto key can also encode:
- small variations of the probability distribution (a huge variety)
- the initial 'x'.
'a' encodes a unique sequence of digits - it's difficult to believe that
it can be attacked other than by brute force, and even then we can
easily get decryption times (without the key) like those for modern RSA
codes.
And the speed is amazing: compression + encryption/decryption is done in
the time of Huffman coding.

Jarek Duda

Dec 19, 2007, 12:42:32 AM
'small variations' means below the precision.
Another parameter we can vary is to use a few 'a's, e.g.
fract(ceiling(x/2)a_1+floor(x/2)a_2) instead of fract(ax).
This freedom of choice can be seen in the fact that there is an enormous
(doubly exponential in R) number of possible initializations consistent
with the required probability distribution - and the initialization
takes a long time (for brute force).
We could use it without compression too - assume p(0)=1/2 and block a
few digits together.

Jarek Duda

Dec 19, 2007, 9:39:54 AM

Jarek Duda

Dec 20, 2007, 2:19:56 PM
The limitation that the probability distribution is fixed in
asymmetric numeral systems can be weakened by using a few of them:
for example, we can use for each symbol (or a few previous ones) a
separate probability distribution for the following one - in this way we
DOUBLE (or more) THE CORRELATION LENGTH USED (at the cost of memory).
So we have separate coding/decoding tables for each symbol - we can
initialize them using the key in many ways.
While encoding we have to use the proper table, remembering that the
decoding will be in the reverse order.

So it can be as fast as Huffman, as precise as arithmetic coding and
as safe as...? :)

Matt Mahoney

Dec 20, 2007, 4:23:01 PM
On Dec 18, 6:34 pm, Matt Mahoney <matmaho...@yahoo.com> wrote:
> I implemented a second version of the asymmetric binary coder, fpaqb.http://cs.fit.edu/~mmahoney/compression/#fpaq0

I updated fpaqb to a faster version. I made a minor optimization
(replace 2-D compression inverse table with 1-D) and found better g++
optimization settings. Ver. 2 is compatible with ver. 1. but about
10% faster. New benchmarks:
http://cs.fit.edu/~mmahoney/compression/text.html#6222

-- Matt Mahoney

Matt Mahoney

Dec 21, 2007, 10:27:52 AM

I put the fpaqb coder into lpaq1 to demonstrate that an asymmetric
coder can be used as a general purpose replacement for an arithmetic
coder and to compare performance using exactly the same model.
Download: http://cs.fit.edu/~mmahoney/compression/lpaq1a.zip (GPL
source and Win32 .exe)
from http://cs.fit.edu/~mmahoney/compression/#lpaq

In my tests, lpaq1a is about 1% slower, compresses 0.02% larger, and
requires 1.25 MB extra memory during compression. The larger output
is due to block headers (3 bytes output per 64KB of input). This
could be improved by using larger blocks at the expense of more memory
(20x block size). Decompression requires no extra memory and is
slightly faster than compression, but still not as fast as arithmetic
coding.

-- Matt Mahoney

Jarek Duda

Dec 25, 2007, 4:01:48 AM
fpaqa is slower than arithmetic coding because, for each digit, it
makes TWO TABLE CHECKS:
" r=qr[predictor.p()]; x=dec[r][x] "
Matt used 2 tables to apply the stretch (stretch(q) = ln(q/(1-q))) before
decreasing its precision - thanks to this we have a better approximation
for extreme q (near 0 or 1), but a worse one for q near 1/2 - finally Matt
writes that it makes the compression rate about 0.02-0.1% better.
If we don't use it - just reduce the precision - we can do everything in
ONE TABLE CHECK, for example
r=dec[predictor shr K][x]
where K is the number of bits to cut (2-4, or even zero, but that would
need a lot of memory).

Another slow down is the while loop for very small q. We could store
the number of bits to get, in some bits of dec/cod and decode/encode
in separate codes.
But the best solution would be to construct the predictor to work
around 1/2 - if q>=1/4 we need at most 2 bits to get/put (one if
q>=1/2).

For example if there is Huffman based compressor, instead of standard
encoding (approximate every binary choice with 1/2), we can do it
exactly (get better compression rates) using created Huffman tree -
just calculate the binary choice probabilities exactly (they are near
1/2, maybe larger than 1/4 (1/2?) ) and use ABS.

I don't believe it would be slower than arithmetic coding.
Jarek

Matt Mahoney

Dec 25, 2007, 6:18:08 PM
On Dec 25, 4:01 am, Jarek Duda <du...@interia.pl> wrote:
> fpaqa is slower than arithmetic coding because, for each digit, it
> makes TWO TABLE CHECKS:
> " r=qr[predictor.p()]; x=dec[r][x] "
> Matt used 2 tables to apply the stretch (stretch(q) = ln(q/(1-q))) before
> decreasing its precision - thanks to this we have a better approximation
> for extreme q (near 0 or 1), but a worse one for q near 1/2 - finally Matt
> writes that it makes the compression rate about 0.02-0.1% better.
> If we don't use it - just reduce the precision - we can do everything in
> ONE TABLE CHECK, for example
> r=dec[predictor shr K][x]
> where K is the number of bits to cut (2-4, or even zero, but that would
> need a lot of memory).

You could do that. As I posted earlier, the size penalty for using n
bits for x and q is as follows for a uniform distribution over q in
1..2^n - 1 and x in 2^n..2^(n+1) - 1.

n penalty
-- -------
1 .386
2 .106
3 .0318
4 .00968
5 .00283
6 .000808
7 .000217
8 .000067
9 .000019
10 .000006

The one-table method needs 2n+1 inputs and n outputs for compression.
It needs 2n inputs and n+1 outputs for decompression. n should be
small enough so that the table fits in cache.

>
> Another slow down is the while loop for very small q. We could store
> the number of bits to get, in some bits of dec/cod and decode/encode
> in separate codes.

Also, I/O one byte at a time is faster than bitwise I/O, but it
requires an extra 8 bits of inputs to the tables. Perhaps it would be
faster to read/write 2 or 4 bits at a time. In most cases, there is
only one input/output per compressed bit, so saving a shift value in
the table doesn't add that much.

> But the best solution would be to construct the predictor to work
> around 1/2 - if q>=1/4 we need at most 2 bits to get/put (one if
> q>=1/2).
>
> For example if there is Huffman based compressor, instead of standard
> encoding (approximate every binary choice with 1/2), we can do it
> exactly (get better compression rates) using created Huffman tree -
> just calculate the binary choice probabilities exactly (they are near
> 1/2, maybe larger than 1/4 (1/2?) ) and use ABS.
>
> I don't believe it would be slower than arithmetic coding.
> Jarek

This might work for a fast compressor like SR2, where the 3 most
recently occurring bytes in the current context are coded using 1, 3,
or 3 bits, and the others coded as 10 bits. However there is no way
to avoid q near 0 or 1 in highly redundant code, whether you use
symbol ranking or Huffman.

Also, I have made some speed optimizations to the non-table version of
the ABS coder in fpaqb. The new version, fpaqc, is a few percent
faster. http://cs.fit.edu/~mmahoney/compression/#fpaq0

-- Matt Mahoney

Jarek Duda

Jan 14, 2008, 3:43:55 PM
> > Another slow down is the while loop for very small q. We could store
> > the number of bits to get, in some bits of dec/cod and decode/encode
> > in separate codes.
>
> Also, I/O one byte at a time is faster than bitwise I/O, but it
> requires an extra 8 bits of inputs to the tables. Perhaps it would be
> faster to read/write 2 or 4 bits at a time. In most cases, there is
> only one input/output per compressed bit, so saving a shift value in
> the table doesn't add that much.

I agree that in these versions it's not important, but in
- hardware versions - it would simplify things, make them faster and
ensure a constant time per bit (bit rate)
- the asymmetric numeral systems version - there we always have to
transfer many bits

> However there is no way
> to avoid q near 0 or 1 in highly redundant code, whether you use
> symbol ranking or Huffman.

We only have to avoid q near 0 - maybe that could be easier to achieve...

I thought about error correction, because one bit-flip means that we
lose everything after it.
To handle it, we have to add some redundancy - according to the expected
density of errors.
For example, we can set a given bit in every byte to 0 in the data to
encode.
Now, if while decoding we get a 1 there, we know that there was an error
in some bit at most a few bytes before.
We should flip bits one by one and try to decode - if the density of
errors isn't too large, we should be able to localize and correct the
error with high probability.

Jarek Duda

Jan 22, 2008, 9:21:01 AM
I've finally realized that the standard deviation of (x_i - x*p_i) can
be significant in the previously presented versions of the ANS
initialization.
It's about sqrt(p_i/2^R)/2 - relatively small - it should make the
output larger by something like 0.1%, so we can usually just ignore it.
But if someone wants to use it for a compressor with a good rate, a more
precise version of the initialization is required.

Here is a general, precise initialization algorithm (analogous to the
formula for ABS):
We need a priority queue (e.g. a heap) with: insert, and
getmin - retrieve the pair with the minimum first coordinate, which is the
expected position of the next occurrence of x with a given d.

For i=1 to D insert (1/p_i,i);
x=0;x_1=0;...;x_D=0;
For x=0 to some_value
{(y,d)=getmin;
dec[x]=(d,x_d) //or cod[d][x_d]=x;
x_d++;
insert(y+1/p_d,d);
}

In practice, we can start from x=2^R, x_1=x*p_i ...
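
A literal transcription of the loop above into C++ (the distribution, the
number of states and the output format are only illustrative):

#include <cstdio>
#include <functional>
#include <queue>
#include <vector>

// Assign states to symbols by always taking the symbol whose expected next
// occurrence 1/p_d, 2/p_d, 3/p_d, ... comes first.
int main() {
    const int D = 3, STATES = 20;
    const double p[D] = {0.1, 0.2, 0.7};      // example distribution

    typedef std::pair<double,int> Item;       // (expected position, symbol)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item> > pq;
    for (int d = 0; d < D; ++d) pq.push(Item(1.0/p[d], d));

    std::vector<int> xd(D, 0);                // the per-symbol counters x_d
    for (int x = 0; x < STATES; ++x) {
        Item it = pq.top(); pq.pop();
        int d = it.second;
        std::printf("dec[%2d] = (d=%d, x_d=%d)\n", x, d, xd[d]);
        ++xd[d];
        pq.push(Item(it.first + 1.0/p[d], d));
    }
}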

To make it suitable for encryption we can, for example, use our
generator to choose one of the smallest elements instead of just taking
the smallest (many options).
This time I'm completely sure that even using the simplest generator,
an attacker cannot skip the full initialization.

ps. Generally, if we would like to make the initialization artificially
longer, we can for example start the initialization from a distant
point.

Jarek Duda

Jan 23, 2008, 10:03:41 AM
I want to give some intuition about ANS and why it's so safe.

From the beginning...
We can say that a natural number x stores about lg(x) bits.
You would probably like to add a ceiling there, but when thinking about
x in higher numeral systems, we asymptotically lose the ceiling.

A symbol (d) which has probability p stores -lg(p) bits of
information (e.g. a 0/1 bit with probability 0.5 stores 1 bit).
So if we would like to place this information together with the
information stored in x, it would take about lg(x)-lg(p) bits, so
x should be transformed into about x/p.
Notice that we already have this in the usual binary system: to store a
bit b=0/1 together with the information in x, we can make x -> 2x+b (which
is about x/0.5) - it's a bijection, so we can reverse this transform.
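
A tiny runnable illustration of that p=1/2 case - note that the bits come back
out in reverse order, the same LIFO behavior discussed earlier in the thread:

#include <cstdio>

// x -> 2x+b appends a bit to the state; peeling bits back off with
// x -> x/2 returns them last-in, first-out.
int main() {
    unsigned x = 1;                  // start state
    int bits[] = {1, 0, 1, 1};
    for (int b : bits) x = 2*x + b;  // "encode": x now holds the bits
    while (x > 1) {                  // "decode": prints 1 1 0 1 (reversed)
        std::printf("%u ", x & 1);
        x >>= 1;
    }
    std::printf("\n");
}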

Now, generally, if x -> about x/p, then (x+1) -> about x/p + 1/p.
We can do this by placing, roughly uniformly with density p, the
positions which will correspond to this symbol, and analogously for the
other symbols - we get the tables from the ANS initialization:
a bijection (x_d,d) -> x such that x_d is about x*p_d.

In practice we cannot operate on infinitely large numbers, so after
growing above some limit we forget for a while (move to the output) about
some of the least significant digits and work only on the R+1 most
significant digits (this almost doesn't change lg(x)).
So x always stores something between R and R+1 bits.
When encoding a symbol with probability p, we would increase the number
of bits in x by -lg(p), so we first have to put floor(lg(x)-lg(p)-R) bits
into the output.

In the version for cryptography we place the digits uniformly, but using
a random generator.
So the symbol (byte?) we get, and the place in the state space we will go
to, are fixed, but in a practically random way that depends on the whole
context - the state - DEFINED BY THE WHOLE HISTORY.
The smallest change and we are in a completely different place and
behave completely differently.

We have three types of random behavior here:
1) ASYMMETRY (the strongest): different symbols have different
probabilities - the slightest change completely changes the place of the
next state
2) UNIFORM COVER: -lg(p_i) is usually irrational, so even with a uniform
probability we jump uniformly over the whole space of states - random
behavior
3) DIFFUSION - in the simplest initialization, x*p_i - x_i has standard
deviation sqrt(p_i*(x-2^R)) - so in any case we somehow randomly
diffuse around some expected state.

Jarek Duda

Jan 24, 2008, 2:55:40 PM
I have found a speedup for the bit transfer!
For example, it should make fpaqc a few (or more) percent faster (and
more for compression).

To ensure everything is reversible, we used a state buffer with a
constant number of digits (R+1 bits).
But nowhere is it written that these digits have to be binary ... we can
use a higher base - for example 4 (2 bits) or larger.
For base 4 (2 bits), we would use:
Range: [2^{R-1},2^{R+1}-1]
Decompressing: while(x<2^{R-1}) x = 4x + 2 bits from input
etc.
Thanks to this we make half as many transfers, at the cost of table
size (half more) and a bit of precision.
We don't feel these costs in the precise version (fpaqc) - it will be
faster practically for free.
If the cost of multiplication doesn't grow, maybe we could increase it
even to 8 bits ([2^R,2^{R+8}-1]) - decreasing the time for bit transfer
practically to zero (x=256x+byte).
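
A minimal sketch of that byte-at-a-time refill (R, the state value and the
input bytes below are arbitrary illustrative values):

#include <cstdint>
#include <cstdio>

// Refill the state with whole bytes instead of single bits:
// x = 256*x + byte until x is back above 2^R.
int main() {
    const int R = 16;
    const uint8_t input[] = {0xde, 0xad, 0xbe, 0xef};
    const int n = sizeof(input);
    int pos = 0;
    uint32_t x = 0x3a;                    // a state that has dropped below 2^R
    while (x < (1u << R) && pos < n)
        x = (x << 8) | input[pos++];      // one whole byte per iteration
    std::printf("x = 0x%x after %d byte(s)\n", (unsigned)x, pos);
}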

Jarek Duda

Oct 15, 2016, 4:47:49 AM
The Wikipedia article is finally back after nearly a decade:
https://en.wikipedia.org/wiki/Asymmetric_Numeral_Systems

Great thanks to Mark Nelson for starting this thread 9 years ago!