Account Options

  1. Sign in
The old Google Groups will be going away soon.
Switch to the new Google Groups.
Google Groups Home
« Groups Home
New combinatorial coder: tighter & faster than Arithmetic coding
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  Messages 1 - 25 of 125 - Collapse all  -  Translate all to Translated (View all originals)   Newer >
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
nightlight  
View profile  
 More options Nov 20 2005, 9:34 am
Newsgroups: comp.compression, sci.math
From: "nightlight" <nightli...@omegapoint.com>
Date: 20 Nov 2005 06:34:06 -0800
Local: Sun, Nov 20 2005 9:34 am
Subject: New combinatorial coder: tighter & faster than Arithmetic coding
There is a new algorithm for entropy coding, constrained
coding and fast combinatorial enumeration described in the
arXiv preprint cs.IT/0511057 (also submitted to DCC 2006):

   http://arxiv.org/abs/cs.IT/0511057

It is coding based on enumerative combinatorics (walks
on Pascal triangle, counting of integer lattice paths).
The new algorithm significantly improves on the previously
known method "enumerative coding" (which is the regular
un/ranking procedure in combinatorics), increasing its speed
and reducing memory size by factor O(n), where n is number
of input symbols). It is the algorithm Rissanen was looking for
when he found a partial solution, arithmetic coding (in 1976).

The new algorithm, even the exploratory unoptimized
prototype runs much faster (see the table below, copied
from the of paper, with speedup factors over 200) and
always compresses better than the best of arithmetic
coders (such as Moffat et al 1998). Besides its usefulness
in entropy coding (for all types of file, image,...
compressors where Huffman, arithmetic or similar coding
is used in the final phase) and for encoding of complex objects
(e.g. permutations, trees, graphs, self-delimiting sequences,..
etc) the new algorithm extends significantly the domain of
enumerative combinatorics accessible to computer explorations.

Here is the abstract:
---------------------------------------------------------
Title:  Quantized Indexing: Beyond Arithmetic Coding
Author: Ratko V. Tomic

Quantized Indexing is a fast and space-efficient form of
enumerative (combinatorial) coding, the strongest among
asymptotically optimal universal entropy coding algorithms.
The present advance in enumerative coding is similar to
that made by arithmetic coding with respect to its unlimited
precision predecessor, Elias coding.

The arithmetic precision, execution time, table sizes and
coding delay are all reduced by a factor O(n) at a
redundancy below log(e)/2^(g-1) bits/symbol (for n input
symbols and g-bit QI precision). Due to its tighter
enumeration, QI output redundancy is below that of
arithmetic coding (which can be derived as a lower
accuracy approximation of QI). The relative compression
gain vanishes in large n and in high entropy limits and
increases for shorter outputs and for less predictable data.

QI is significantly faster than the fastest arithmetic
coders, from factor 6 in high entropy limit to over 100
in low entropy limit (`typically' 10-20 times faster).
These speedups are result of using only 3 adds, 1 shift
and 2 array lookups (all in 32 bit precision) per less
probable symbol and no coding operations for the most
probable symbol.

Further, the exact enumeration algorithm is sharpened
and its lattice walks formulation is generalized.
A new numeric type with a broader applicability,
sliding window integer, is introduced.
---------------------------------------------------------

Below are few test results of QI vs. AC (AC: Moffat et al. 1998,
ver 3 from 2000, on its `best' settings, RAM to RAM coding,
all buffers prealloced, no slow streams; only inner coding
loops timed via sub-microsecond timers) for input sizes N
(K=1024 bits) and for given number of 1's randomly distributed
(except for the input "Vary" which represents a  non-stationary
source). Both coders tested were binary order 0 coders (i.e.
they consider input data as a sequence of uncorrelated bits).
Each result is an average obtained from 500 random inputs.

Column Speed: coding times ratio TimeAC/TimeQI
  (e.g. QI is up to 247.5 times faster than AC, cf. top right)

Column N: output size percent using (A/Q-1)*100
  (e.g. AC output is up to 110% larger than QI's, cf. bottom left)

Input array Vary = array of int32 {...,-2,-1,0,+1,+2,...}.

---------------------------------------------------------------
 #1's   N:4K   Speed  N:8K  Speed   N:32K Speed  N:128K  Speed
---------------------------------------------------------------
   8    6.846  68.3   6.421 112.8   5.447 199.6   5.966  247.5x
  16    4.175  59.7   3.830  78.5   3.389 138.1   3.730  168.0
  32    2.297  49.7   2.090  58.9   2.096  95.9   2.220  117.2
N/64    1.370  40.3   0.606  41.0   0.186  41.7   0.073   42.5
N/32    0.897  30.8   0.343  33.7   0.123  34.2   0.049   34.5
N/16    0.505  21.8   0.197  25.3   0.084  24.6   0.040   24.8
N/8     0.359  14.4   0.155  16.7   0.069  16.8   0.045   16.8
N/4     0.288   9.2   0.138  10.8   0.083  10.6   0.068   10.5
N/2     0.509   6.6   0.445   6.6   0.367   6.4   0.332    6.4
Vary  110.899  21.9  96.736  19.6  71.308  16.5  52.580   14.1
---------------------------------------------------------------


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
David A. Scott  
View profile  
 More options Nov 20 2005, 10:39 am
Newsgroups: comp.compression, sci.math
From: "David A. Scott" <daVvid_a_sc...@email.com>
Date: Sun, 20 Nov 2005 15:39:30 +0000 (UTC)
Local: Sun, Nov 20 2005 10:39 am
Subject: Re: New combinatorial coder: tighter & faster than Arithmetic coding
"nightlight" <nightli...@omegapoint.com> wrote in
news:1132497246.485901.250410@f14g2000cwb.googlegroups.com:

> The new algorithm, even the exploratory unoptimized
> prototype runs much faster (see the table below, copied
> from the of paper, with speedup factors over 200) and
> always compresses better than the best of arithmetic
> coders (such as Moffat et al 1998). Besides its usefulness
> in entropy coding (for all types of file, image,...
> compressors where Huffman, arithmetic or similar coding
> is used in the final phase) and for encoding of complex objects
> (e.g. permutations, trees, graphs, self-delimiting sequences,..
> etc) the new algorithm extends significantly the domain of
> enumerative combinatorics accessible to computer explorations.

 Sounds like hype. Especially the statement
"always compresses better than the best of arithmetic coders"
This is obviouly a false statement. It is impossible
for any compressor to always compress better than every other
compressor. First of all the best compressors would have to
be bijective and then the best they can do is change the mappings
from one set to another. Take any compressor that could make one
stream or file smaller. Then the identity compressor could beat it
on files that the previous compressor lengthened since its a loose
loose situation if you make something else smaller or better compressed
then it has to make something else longer.

 Second if you actaully took the time to test the online versions
of code that use Moffat style code you would soon relize it not the
best.

David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
nightlight  
View profile  
 More options Nov 20 2005, 11:48 am
Newsgroups: comp.compression, sci.math
From: "nightlight" <nightli...@omegapoint.com>
Date: 20 Nov 2005 08:48:15 -0800
Local: Sun, Nov 20 2005 11:48 am
Subject: Re: New combinatorial coder: tighter & faster than Arithmetic coding

> Sounds like hype. Especially the statement "always compresses better
> than the best of arithmetic coders" This is obviouly a false statement.
> It is impossible for any compressor to always compress better than
> every other compressor.

The two are not arbitrary unrelated algorithms -- arithmetic coding
(AC) and QI are both approximations of the exact enumerative coding
(the exact enumeration of messages). The difference is that AC first
approximates the enumeration problem itself (via stirling approximation
and ignoring some factors, cf. the paper p.2), then it applies
additional precision reduction (truncation to r-bit arithmetic). The QI
doesn't do the 1st AC approximation at all, i.e. it works from exact
enumeration and it performs only the second part of the AC
approximation and even here it selects precision truncation which has 4
times smaller redundancy than that of AC's truncation. So, the AC is a
lower accuracy approximation of QI, hence the AC always produces larger
or at best equal output as QI.

What you're referring to is trivial observation that for any given
algorithm A, one can create another algorithm B which will compress a
given input X better than A. It is trivial since you can select a
message X for which A produces an output of more than one bit, then you
simply create an algorithm B which outputs 0 for message X and outputs
1 followed by <output of A> otherwise.

The relation between AC and QI is not of this kind, though.  The AC
message index is the same number that QI produces, but multiplied with
a number which is strictly greater than 1 (cf. footnote on p.2 on
double index majorization by AC). This may or may not  (depending on
how fractions round to next whole bit) result in a greater number of
whole bits output by AC, but mathematically it cannot ever result in a
smaller number of whole bits in the AC output.

When Rissanen was looking to solve the precision problem of the exact
enumeration (in 1976), he ran into problem which  seemed intractable
(due to a missing piece of formalism in the conventional EC, which the
paper above constructs as a natural setting for QI). Hence he
approximated the enumeration first (by majorizing the index), which
allowed him to construct a decodable index -- this was the arithmetic
coding. Rissanen (who is the original creator of arithmetic coding and
of MDL modeling principle) recognized instantly the solution in this
paper, his old nemesis conquered at last, commenting: "I understand
what you have done. Very clever!"


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
nightlight  
View profile  
 More options Nov 20 2005, 12:07 pm
Newsgroups: comp.compression, sci.math
From: "nightlight" <nightli...@omegapoint.com>
Date: 20 Nov 2005 09:07:49 -0800
Local: Sun, Nov 20 2005 12:07 pm
Subject: Re: New combinatorial coder: tighter & faster than Arithmetic coding

> Second if you actaully took the time to test the online versions of code that
> use Moffat style code you would soon relize it not the best.

One can take Moffat et al code (version 3, 2000) and select some other
'accuracy vs speed' tradeoffs. If you know of a version of that coder
which is faster and produces smaller output (the figures I listed refer
to binary 0-order coders test), you are welcome present the comparison
figures between the two and I will gladly check it out. Keep in mind,
the speed factors in the paper show QI is 200+ times faster than AC on
some inputs, and typically 10-20 times faster (this was a generic QI
coder, i.e. the same loop and the same steps coding the dense and the
sparse arrays, and not something tuned to run fast for sparse arrays,
such as run length coder, which otherwise blows on size and speed for
denser inputs). There are dime a dozen quasi-arithmetic coders, which
sacrifice some compression quality (say another 5-10% larger output) to
gain some speed (a factor 2-5).

The new algorithm does much less work (and a simpler work: just an add
of the table addend, a machine word) per coding step and always does
fewer coding steps, thus it has a fundamental reason for a much better
performance than AC.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Ben Rudiak-Gould  
View profile  
 More options Nov 20 2005, 12:14 pm
Newsgroups: comp.compression, sci.math
From: Ben Rudiak-Gould <br276delet...@cam.ac.uk>
Date: Sun, 20 Nov 2005 17:14:50 +0000
Local: Sun, Nov 20 2005 12:14 pm
Subject: Re: New combinatorial coder: tighter & faster than Arithmetic coding

nightlight wrote:
>    http://arxiv.org/abs/cs.IT/0511057

I looked quickly through the paper, and here's my impression of what it's about.

The idea of "enumerative coding" is that you divide the set of possible
unencoded messages into disjoint subsets with the property that all of the
messages in a particular subset are equally likely to occur (according to
some model). Then the coded message is a code identifying the subset which
contains the message, followed by a code identifying an element of that subset.

For example, suppose that your messages are the bit strings [01]*, and you
divide them into subsets according to how many zeroes and ones they contain:

    {}, {0}, {1}, {00}, {01,10}, {11}, {000}, {001,010,100}, ...

If our message has K ones and L zeroes, we can encode the index of the
appropriate subset by encoding K and L at a cost of log K + log L bits.
Encoding the index of the appropriate element of that subset takes

     log C(K,K+L) = log ((K+L)! / (K! L!))

bits.

Compare this to arithmetic coding with a static order-0 model. Here we first
encode the length of the message (K+L) and p(one) = K/(K+L), which again
takes log K + log L bits. Then for each one bit in the message we encode log
((K+L)/K) bits, and for each zero bit we encode log ((K+L)/L) bits, for a
total of

     K log (K+L)/K + L log (K+L)/L = log ((K+L)? / (K? L?))

bits, where I write N? to mean N^N.

Arithmetic coding is asymptotically as efficient as enumerative coding,
since O(N?) = O(N!), but enumerative coding will always use fewer bits,
since N! < N? for interesting values of N.

I think that this paper describes an efficient algorithm for enumerative
coding for the order-0 model I used above, but generalized to arbitrarily
many symbols. If so, it won't replace coders based on sophisticated models
with arithmetic-coding back ends, but it may find a niche in applications
that use static order-0 models, such as video compression.

-- Ben


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Willem  
View profile  
 More options Nov 20 2005, 12:33 pm
Newsgroups: comp.compression, sci.math
From: Willem <wil...@stack.nl>
Date: Sun, 20 Nov 2005 17:33:41 +0000 (UTC)
Local: Sun, Nov 20 2005 12:33 pm
Subject: Re: New combinatorial coder: tighter & faster than Arithmetic coding
nightlight wrote:

)> Sounds like hype. Especially the statement "always compresses better
)> than the best of arithmetic coders" This is obviouly a false statement.
)> It is impossible for any compressor to always compress better than
)> every other compressor.
)
) The two are not arbitrary unrelated algorithms -- arithmetic coding
) (AC) and QI are both approximations of the exact enumerative coding
) (the exact enumeration of messages).

AC is not only enumerative coding.  It is also used with adaptive models
and higher-order statistics.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
nightlight  
View profile  
 More options Nov 20 2005, 1:04 pm
Newsgroups: comp.compression, sci.math
From: "nightlight" <nightli...@omegapoint.com>
Date: 20 Nov 2005 10:04:48 -0800
Local: Sun, Nov 20 2005 1:04 pm
Subject: Re: New combinatorial coder: tighter & faster than Arithmetic coding

> If so, it won't replace coders based on sophisticated models
> with arithmetic-coding back ends,...

The AC modeling paradigm ('probability of next single symbol' modeling
interface bottleneck) was created following path of 'virtue out of
necessity'. The necessity being the particular entanglement of the
inner most coding steps with the model parameters p(a). The EC/QI can
code the same way, too, except they can also code a much better way
(Kolmogorov modeling paradigm). There is a bit on the modeling on the
page 9 in the paper and a much longer discussion in the tech report
cited:

   TR05-0625  "Quantized indexing: Background information"
   http://www.1stworks.com/ref/TR/tr05-0625a.pdf

As you may have noted in the row "Vary" of the test, which is a
non-probabilistic sequence for the order-0 coders, one can easily
provide inputs which will "surprise"  catastrophically the arithmetic
0-order coder, but you can't provide inputs on which AC works well and
EC fails as dramatically (as does AC on the data "Vary").  The same
phenomenon exists for the higher order coders, as soon as the input
falkls outside of the "predictable" at a given order. The adaptive AC
fails badly, while EC merely comes out slighly sub-optimally (since the
effect of a "suprise" is always much greater on the predictor scheme
than on the 'descriptor' scheme; with EC only the frequency table
encoding may turn out suboptimal, while the bulk of output, the index
is still optimal).

The essential difference between the adaptive AC and EC/QI is that AC
subjects the modeling engine to an aritificial constraint (the
memoryless coder allowed to have only a single symbol at a time, a
limitation largely not relevant in practical coding since Morse
telegraph days). The EC can do that, too, but it normally doesn't.
Trying to predict furture, as AC modeling paradigm insists, is
inherently harder than describing the present or past.  Even the
practical forms of AC modeling recognize the problem and allow a
backdoor cheating against the ideal pure prediction (via the fragile
and ad hoc escape mechanisms, where the coder is "allowed" to look
ahead and clue the decoder as to what is coming).

As another aspect of the distinction, note that the EC mdeling engine
normally needs to know only whether, say messages X and Y have the same
probabilities (whatever their values might be), while the AC modeling
engine needs to know what the actual values of the probabilities are,
which is a much harder task.

The upshot is that the native EC/QI modeling (the Kolmogorov/MDL
scheme) is much simpler and more robust than the predictive AC paradigm
(EC/QI can reuse the existent AC modeling engine, though, e.g. by
splitting the output into different probability classes, cf p. 9, [N7]).


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Willem  
View profile  
 More options Nov 20 2005, 1:17 pm
Newsgroups: comp.compression, sci.math
From: Willem <wil...@stack.nl>
Date: Sun, 20 Nov 2005 18:17:57 +0000 (UTC)
Local: Sun, Nov 20 2005 1:17 pm
Subject: Re: New combinatorial coder: tighter & faster than Arithmetic coding
nightlight wrote:

)> If so, it won't replace coders based on sophisticated models
)> with arithmetic-coding back ends,...
)
) The AC modeling paradigm ('probability of next single symbol' modeling
) interface bottleneck) was created following path of 'virtue out of
) necessity'. The necessity being the particular entanglement of the
)   ...
)
) As you may have noted in the row "Vary" of the test, which is a
) non-probabilistic sequence for the order-0 coders, one can easily
) provide inputs which will "surprise"  catastrophically the arithmetic
) 0-order coder,

You don't seem to grasp the existence of higher-order models.
Would something like PPM be easy to implement using this QI coder ?

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
nightlight  
View profile  
 More options Nov 20 2005, 1:35 pm
Newsgroups: comp.compression, sci.math
From: "nightlight" <nightli...@omegapoint.com>
Date: 20 Nov 2005 10:35:30 -0800
Local: Sun, Nov 20 2005 1:35 pm
Subject: Re: New combinatorial coder: tighter & faster than Arithmetic coding

> AC is not only enumerative coding.  It is also used with adaptive models
> and higher-order statistics.

The coding aspects in both cases are message enumerations -- with EC in
the exact lists of messages (performed using the virtual dictionary of
combinations), with AC in the cummulative probablity space (which only
asymptotically for large n approaches the exact EC coding, while being
uniformly suboptimal for any finite n).  It is a historic fluke that EC
was presented in conjunction with the  'descriptive'/universal way of
modeling (MDL/Kolmogorov), while AC with the predictive way (such as
PPM).

Both coding methods can work with modeling engines either way, though.
The coding algorithm of AC does not separate cleanly the model
parameters from the coder arithmetic (by virtue of hardwiring
'probabilities of next single symbol' into its coding addends). Thus,
its native mode is the 'predictive' way, as a virtue out of necessity.

EC has a much cleaner separation between the modeling parameters and
the coding computations -- the modeling engine of EC defines the
enumerative classes (the subsets of "equiprobable messages"), while
coder computations deal only with the universal combinatorial
properties of sequences (belonging to a given class). The result is a
much faster operation, since the 'universal properties of sequences'
are independent of the particular source parameters, thus they can be
precomputed and optimized away, outside of the coding loop, as it were.
If AC were to code via precomputed values, it would need a separate
table for each set of source probabilities, which is impractical even
for a static binary source.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Willem  
View profile  
 More options Nov 20 2005, 1:53 pm
Newsgroups: comp.compression, sci.math
From: Willem <wil...@stack.nl>
Date: Sun, 20 Nov 2005 18:53:20 +0000 (UTC)
Local: Sun, Nov 20 2005 1:53 pm
Subject: Re: New combinatorial coder: tighter & faster than Arithmetic coding
nightlight wrote:

)> AC is not only enumerative coding.  It is also used with adaptive models
)> and higher-order statistics.
)
) The coding aspects in both cases are message enumerations -- with EC in
) the exact lists of messages (performed using the virtual dictionary of
) combinations), with AC in the cummulative probablity space (which only
) asymptotically for large n approaches the exact EC coding, while being
) uniformly suboptimal for any finite n).

Not true.  Adaptive coding is trivially capable of outperforming a model
that uses only the cumulative probability, in cases where there are local
variations in the probability distribution.

Also, adaptive coding works very well with higher-order models, where it
is intractable to transmit the model because of its exponential size.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Ben Rudiak-Gould  
View profile  
 More options Nov 20 2005, 2:19 pm
Newsgroups: comp.compression, sci.math
From: Ben Rudiak-Gould <br276delet...@cam.ac.uk>
Date: Sun, 20 Nov 2005 19:19:21 +0000
Local: Sun, Nov 20 2005 2:19 pm
Subject: Re: New combinatorial coder: tighter & faster than Arithmetic coding

nightlight wrote:
>The AC modeling paradigm ('probability of next single symbol' modeling
>interface bottleneck) was created following path of 'virtue out of
>necessity'. [etc, etc, etc]

Look, I understand all of this. I understand the weaknesses of PPM and of
arithmetic coding, and I understand that an enumerative coder can do better
in principle. None of that matters. What matters is this: what is your
contribution? Can you, in practice, do a better job than state-of-the-art
compressors based on PPM+ari? Going on about theoretical benefits is a waste
of time. Theoretically, the best way to write a chess-playing program is to
use minimax on the complete game tree. Actual chess-playing algorithms are
gross hacks which attempt, in a highly inelegant way, to approximate that
ideal. Pointing out that that's true accomplishes nothing. Everyone versed
in the art knows it already.

>As you may have noted in the row "Vary" of the test, which is a
>non-probabilistic sequence for the order-0 coders, one can easily
>provide inputs which will "surprise"  catastrophically the arithmetic
>0-order coder, but you can't provide inputs on which AC works well and
>EC fails as dramatically (as does AC on the data "Vary").

Two questions. First, do you understand the difference between adaptive
order-0 and static order-0 models? Second, how would the "QI" and "AC" in
your tests perform if the input consisted of N/2 zeroes followed by N/2 ones?

-- Ben


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
nightlight  
View profile  
 More options Nov 20 2005, 2:29 pm
Newsgroups: comp.compression, sci.math
From: "nightlight" <nightli...@omegapoint.com>
Date: 20 Nov 2005 11:29:41 -0800
Local: Sun, Nov 20 2005 2:29 pm
Subject: Re: New combinatorial coder: tighter & faster than Arithmetic coding

>>) As you may have noted in the row "Vary" of the test, which is a
>> non-probabilistic sequence for the order-0 coders, one can easily
>> provide inputs which will "surprise"  catastrophically the arithmetic
>> 0-order coder,
> You don't seem to grasp the existence of higher-order models.

That was a test of the same order 0 coders. At any order, you can
suprise the "predictive" scheme catastrophically (unless it cheats via
some ad hoc esc mechanism, allowing it to use temporarily universal way
of coding), while you can't do so to a descriptive/universal scheme.

> Would something like PPM be easy to implement using this QI coder ?

There are couple ways, at least, to do that. QI can code (via lattice
jumps) to the 'probability of the next single symbol' exactly as AC,
but that costs it much of speed advantage (since mantissa scaling used
by the jumps, requires multiplications & divisions). But it can also
code from the AC modeling engine instructions much faster by splitting
the outputs into the probability classes (which needs also an error
estimates on the probabilities given to the coder; the scheme grows the
classes in the manner of Context Tree Weighing modeling scheme).

OTOH, it could do much better, with any given a priori knowledge about
the input that is fed to PPM, by coding the native EC way, which is by
splitting the input into proper enumerative classes. Such segmentation
can be done based on the last s symbols (e.g. for order-s static Markov
source; a method similar to Context Tree Weighing to select the output
stream), or dynamically via BWT-like segmentation which doesn't assume
a priory idealized finite order Markov source (you can check the
discussion of this in the Ref [23] cited earlier). Even with the kludgy
ad hoc entropy coding schemes (such as MTF variants which crudely
mangle the enumerative classes present in the final BWT matrix), BWT is
already surprisingly competitive with PPM in compression quality, while
executing much quicker.  With the proper handling of these presently
mangled BWT enumerative classes, which QI is naturally suited to do
quickly and optimally, BWT should consistently outperform PPM not just
in speed but in compression quality.

In short, AC is a particular historical approximation of QI, hence
anything AC can do, QI can do but generally much faster (or at least as
fast) and generally more accurately (or at least as accurately).
Similarly, the PPM or generally the predictive way of modeling via the
'probability of the next single symbol' is another historical accident
(tailored to a particular hardwired coding algorithm of AC), by no
means the only (and certainly not the best) way to model the properties
of finite sequences of symbols. AC algorithm happens to require that
all properties must be translated into exactly such form (as 'the
probabilities of the single next symbol') so it can encode it. That is
more of a handicap than a strength for the modeling engine to bend and
distort everything else around the AC computational peculiarities, to
have to translate and funnel all the conceivable properties of  symbol
sequences available to modeling engine through that particular coder
interface bottleneck.

So, instead of questing anyones grasp of 'higher order models'  you
might wish to reaxamine first, how do some manage to convince
themselves to perceive, with all their sincerity, a plain, gross
handicap as a great strength.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
nightlight  
View profile  
 More options Nov 20 2005, 3:39 pm
Newsgroups: comp.compression, sci.math
From: "nightlight" <nightli...@omegapoint.com>
Date: 20 Nov 2005 12:39:00 -0800
Subject: Re: New combinatorial coder: tighter & faster than Arithmetic coding

> Can you, in practice, do a better job than state-of-the-art compressors
> based on PPM+ari?

QI is a very new algorithm (and a first truly practical form of genuine
EC). A better BWT variant, without mangling of its contexts (as the MTF
phase presently does), is certainly high on the list of things to apply
QI to.  It is still only one among many interesting possibilities
opened by the existence of practical EC. The whole field of practical
EC modeling is presently in a very primitive form and most people have
many misconceptions what it is (or generally on modeling, identifying
it entirely with the AC modeling paradigm). Hopefully a publication of
a clean, well performing reference implementation of QI (on which I am
working, and which will perform even better than the tests results of
the crude research prototype shown in the preprint) in the forthcoming
months, should bring more people to try out new ways of modeling and
answer those questions.

> First, do you understand the difference between adaptive order-0 and
> static order-0 models? Second, how would the "QI" and "AC" in
> your tests perform if the input consisted of N/2 zeroes followed by N/2
> ones?

The sequence "Vary" was a sequence similar to that, since the negative
half of  that sequence is mostly 1's and the positive part mostly 0's.
The rest of the answer obviously depends on how the QI segments the
sequence, what are the rules of the game.  If you insist on requiring
that QI/EC models ...000111... sequence in a single block of size N,
then obviously it will produce output which is N + log*(N) + 1/2 log(N)
bits (since it needs to encode the length of a sequence in log*(N) =
log(log(...N)  bits and the count of 1's in log(N) bits, while the
index is C(N,N/2) which is Catalan number with approx N - 1/2 log(N)
bits). Of course, if you wish to compare apples to apples, then if the
predictive AC coder is allowed some adaptation, one would need to allow
QI to also have some kind of corresponding capabilities (i.e. not to
function as a static order-0 coder), such as allowing the encoder to
select the input segmentation, in which case it would produce output
smaller than adaptive AC (as array "Vary"  illustrates). Comparing
static vs adaptive order-0 coders is comparing apples and oranges.

There is no instrinsic algorithmic constraint for EC/QI to code static
order-0 way (i.e. a la Lynch-Davisson 1966  EC coder).  If you take AC
and EC/QI and make both code static order-0 way with the same
segmentation, the EC/QI will give you slighly smaller output and code
much faster. Then if you make AC adaptive, with adaptation rate faster
than N/2, you need to decide what is the corresponding EC/QI mode of
coding? Whatever it is, it certainly cannot be the same scheme that was
just tested as a comparison of the static order-0 coders, which is what
you seem to be hinting it ought to be. The most natural EC/QI
counterpart to AC adapting faster than N/2 is to allow EC/QI coder the
segmentation choice to at least 2 parts, in which case it will come out
on top again (as the array "Vary" showed; even though the QI coder
tested was somewhat more primitive than that: it had simply used fixed
1Kbit blocks).

In any case, this particular tangent is more about verbal pigeonholing,
rather than about the substance of the difference. The basic difference
between AC and QI coding algorithms is that AC enumerates messages via
cummulative probabilities and EC/QI via exact ranking of combinations
of symbol sequences. The latter way is more accurate and it is much
faster (since much of the coder work is with the universal sequence
properties, thus it is precomputed in the quantized binomial tables,
outside of the coding loop).

The modeling scheme is an entirely unrelated issue. You can select a
modeling scheme where both coders servicing that modeler perform
equally poorly (e.g. by forcing QI to do lattice jumps a la AC, in
which case it will have to work exactly as hard per symbol as AC; with
adaptive output splitting QI will compress well as adaptive AC, but
code much faster).  Generally, though, given the same modeling scheme,
QI will always perform at least as well as AC, in speed and in accuracy
(but mostly better, especially in speed).  Anything else is comparing
apples and oranges (note that the test in the row "Vary" is of  that
apple-vs-orange kind, except that QI was the one slightly handicapped
since it wasn't really as adaptive as the EC counterpart of adaptive AC
ought to be i.e. free select at least one input partition for AC
allowed to adapt faster than N/2; QI/EC produces 0 length index for
sequence of all 1's or all 0's).


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
nightlight  
View profile  
 More options Nov 20 2005, 4:02 pm
Newsgroups: comp.compression, sci.math
From: "nightlight" <nightli...@omegapoint.com>
Date: 20 Nov 2005 13:02:39 -0800
Local: Sun, Nov 20 2005 4:02 pm
Subject: Re: New combinatorial coder: tighter & faster than Arithmetic coding

> Adaptive coding is trivially capable of outperforming a
> model that uses only the cumulative probability, in cases
> where there are local variations in the probability distribution.

You are comparing different modeling schemes (adaptive vs static),
while I was comparing the bare coding algorithms (both can be attached
to the same model). At any given point AC gives a position (an
interval) on the cummulative probability line for the sequence seen so
far, while EC/QI gives the exact ordinal of the same sequence in the
same list of sequences (the list is what model produces, we assume the
same model and look at the coder difference). The only coder level
difference is that the cummulative probabilty index is an approximation
(a uniform majorization via the Stirling approx. of binomials and
dropping of the two factors < 1, cf p. 2 and ref [23] for more details)
of the exact ordinal index and that the QI algorithm can compute it
generally much faster than AC algorithm can (and always at least as
fast). The reason for the speed difference is discussed in ref [23] --
it is basically that the QI algorithm has much cleaner separation
between the coding computations and the specific instance properties of
the source, than the AC alogotithm, hence QI can shift much of the
coding computation outside of the instance specific coding loop.

> Also, adaptive coding works very well with higher-order models, where
> it is intractable to transmit the model because of its exponential size.

This is an interesting point, but outside of the comparison of QI and
AC coding algorithms.

Taken on its own, though, your statement is a bit self-contradictory,
since the model information obviously can be transmitted in
sub-exponential size (e.g. via Adaptive AC scheme). What you really
mean is that the first simple-minded encoding of the model parameters
that you can came up with, off the top of your head, is exponential.
Well, ok, so what? For example, the BWT, which is a bit less obvious
way to extract and transmit a higher order models, is not exponential
in size (and it is also not an adaptive PPM scheme). In fact, even when
coupled with the crude MTF obliteration of the context boundaries, it
performs nearly as well as PPM in compression quality (while running
much faster).


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
nightlight  
View profile  
 More options Nov 20 2005, 5:41 pm
Newsgroups: comp.compression, sci.math
From: "nightlight" <nightli...@omegapoint.com>
Date: 20 Nov 2005 14:41:51 -0800
Local: Sun, Nov 20 2005 5:41 pm
Subject: Re: New combinatorial coder: tighter & faster than Arithmetic coding
The most relevant papers for this discussion are available online at:

      Quantized Indexing References
      http://www.1stworks.com/ref/tr/tr05-0625r.htm

Specifically of interest here are the papers:

 5. J. Rissanen  Generalised Kraft inequality and arithmetic coding
     IBM J. Res. Dev. 20, 198-203, 19 76
     http://www.research.ibm.com/journal/rd/203/ibmrd2003B.pdf

27. J. Rissanen   Arithmetic codings as number representations
      Acta Polyt. Scand., Math. & Comp. Sc.  Vol. 31 pp 44-51, 1979
      http://www.1stworks.com/ref/ariNumRepr.pdf

28. L.D. Davisson  Universal noiseless coding
     IEEE Trans. Inform. Theory  IT-19 (6), 783-795, 1973

http://cg.ensmp.fr/%7Evert/proj/bibli/local/Davisson1973Universal.pdf

29. J. Rissanen, G.G. Langdon  Universal Modeling and Coding
     IEEE Trans. Inform. Theory  IT-19 (1), 12-23, 1981

http://cg.ensmp.fr/%7Evert/proj/bibli/local/Rissanen1981Universal.pdf

34. J.G. Cleary, I.H. Witten  A Comparison of Enumerative and Adaptive
     Codes,   IEEE Trans. Inform. Theory  IT-30 (2), 306-315, 1984
      http://www.1stworks.com/ref/Cleary84Enum.pdf


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Phil Carmody  
View profile  
 More options Nov 20 2005, 5:58 pm
Newsgroups: comp.compression, sci.math
From: Phil Carmody <thefatphil_demun...@yahoo.co.uk>
Date: 21 Nov 2005 00:58:02 +0200
Local: Sun, Nov 20 2005 5:58 pm
Subject: Re: New combinatorial coder: tighter & faster than Arithmetic coding

"nightlight" <nightli...@omegapoint.com> writes:
> the speed factors in the paper show QI is 200+ times faster than AC on

The question this raises for me is why AC implementations are 200x
slower than reading from RAM? That sounds a little slow. Computer
arithmetic is now incredibly fast, and RAM is slow in comparison.

/If/ AC is only 50x slower than reading from RAM, say, (a figure that
wouldn't surprise me), then there's no way on earth that QI could be
200 times faster in _any_ circumstances.

Phil
--
If a religion is defined to be a system of ideas that contains unprovable
statements, then Godel taught us that mathematics is not only a religion, it
is the only religion that can prove itself to be one. -- John Barrow


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Phil Carmody  
View profile  
 More options Nov 20 2005, 6:04 pm
Newsgroups: comp.compression, sci.math
From: Phil Carmody <thefatphil_demun...@yahoo.co.uk>
Date: 21 Nov 2005 01:04:06 +0200
Local: Sun, Nov 20 2005 6:04 pm
Subject: Re: New combinatorial coder: tighter & faster than Arithmetic coding

"nightlight" <nightli...@omegapoint.com> writes:
> > First, do you understand the difference between adaptive order-0 and
> > static order-0 models? Second, how would the "QI" and "AC" in
> > your tests perform if the input consisted of N/2 zeroes followed by N/2
> > ones?

> The sequence "Vary" was a sequence similar to that, since the negative
> half of  that sequence is mostly 1's and the positive part mostly 0's.

What do you mean by "negative half" and "positive part"?
Applying the concept of sign to sequences of bits is, erm, unconventional.

Phil
--
If a religion is defined to be a system of ideas that contains unprovable
statements, then Godel taught us that mathematics is not only a religion, it
is the only religion that can prove itself to be one. -- John Barrow


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Phil Carmody  
View profile  
 More options Nov 20 2005, 6:09 pm
Newsgroups: comp.compression, sci.math
Followup-To: comp.compression
From: Phil Carmody <thefatphil_demun...@yahoo.co.uk>
Date: 21 Nov 2005 01:09:48 +0200
Local: Sun, Nov 20 2005 6:09 pm
Subject: [very OT] Re: New combinatorial coder: tighter & faster than Arithmetic coding

Ben Rudiak-Gould <br276delet...@cam.ac.uk> writes:
> X-Accept-Language: en-us, en

I'm confused - what on earth is the point of such a header in
news-posting context? And why is a cantabrigian prioritising
en-us above en?!?!?

Send my regards to the L&LL,
Phil
--
If a religion is defined to be a system of ideas that contains unprovable
statements, then Godel taught us that mathematics is not only a religion, it
is the only religion that can prove itself to be one. -- John Barrow


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
nightlight  
View profile  
 More options Nov 20 2005, 6:35 pm
Newsgroups: comp.compression, sci.math
From: "nightlight" <nightli...@omegapoint.com>
Date: 20 Nov 2005 15:35:47 -0800
Local: Sun, Nov 20 2005 6:35 pm
Subject: Re: New combinatorial coder: tighter & faster than Arithmetic coding

> The question this raises for me is why AC implementations are 200x
> slower than reading from RAM?

The AC has to process and encode each bit, 32 per machine word, for
which QI needs only to establish if there is any of the "less frequent
symbols" (e.g. 1). Even though that AC implementation uses large loop
unrolling and other speed optimizations,  it only needs 8 times more
work per one bit than QI needs to fetch the dword and answer "is it
only the more frequent symbols in this word" (e.g. w==0 or w==(-1)) to
get the speed factor of 256. If you look their code (binary coder, no
contexts):

  http://www.cs.mu.oz.au/%7Ealistair/arith_coder/

you will see that they certainly do that much more work per encoding of
each bit, in fact even more than the ratios show, if you count the
numbers of instructions. But, as you note, the fast processors with
lots of cache mask out some of the speed difference, penalizing more
the new memory reads, which both coders have to do. On an older 400Mhz
Celeron laptop the speed ratios are 1.5-2 times larger in favor of QI
than on the recent VAIO lapop where those figures in the preprint came
from.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
nightlight  
View profile  
 More options Nov 20 2005, 8:19 pm
Newsgroups: comp.compression, sci.math
From: "nightlight" <nightli...@omegapoint.com>
Date: 20 Nov 2005 17:19:24 -0800
Local: Sun, Nov 20 2005 8:19 pm
Subject: Re: New combinatorial coder: tighter & faster than Arithmetic coding

>> The sequence "Vary" was a sequence similar to that, since the negative
>> half of  that sequence is mostly 1's and the positive part mostly 0's.

> What do you mean by "negative half" and "positive part"?
> Applying the concept of sign to sequences of bits is, erm, unconventional.

The negative 32 bit integer, say -1 has 32 1's. The idea was to give
both coders (which were order 0 binary coders, thus they view data as
sequence of uncorrelated bits) some simple sequence which has
regularity invisible to order-0 (but visible to some higher order
coder). It was meant to illustrate the robustness of the universal way
of coding of the EC/QI coder and the fragility of the
predictive/adaptive way of the AC coder.

The origin of the greater robustness in the "descriptive" way of
modeling is in the fact that effect of the "surprise"  affects chiefly
the count component of the output, which for the 1K input block is on
average a 10 bit number (e.g. specifying # of 1's in the block).  The
rest is a pure combinatorial index, giving the shortest possible
decription, absent any further knowledge by the coder, on how that
number of bits is placed in that block. With the adaptive AC, each
coding step and the entire output is affected by the incorrectly
conjectured probabilities.  Thus it produces as much as twice the
output size of the QI.

Obviously, the AC can be made to code the 'universal'/descriptive way,
in which case the error would be much smaller (similar to the rest of
tables, for the matching density of 1's and input size i.e. few percent
extra output), while the speed ratio would still remain dramatically in
QI favor.

The example was meant to help demythologize a bit the
'adaptive/predictive coding mystique' that lots of people are
apparently mesmerized by (as the previous discussion in this thread
illustrates). The 'predictive coding', at any modeling order, is a
handicap (a historical fluke of bending the entire modeling engine to
the peculiarities of the particular interpretation of the AC
computational algorithm, the perceived need to have probability of next
symbol to perform the encoding), not a strength as often touted, and a
'descriptive'/universal type algorithm at the same order of modeling
will always come out ahead.

When AC is made to code the universal/descriptive way, the
"probabilities" used for the computation are simply interpreted as
normalized plain counts (instead of the presumed infinite n limits of
such ratios, the 'probabilities", which is the adaptive AC
interpretation of the AC algorithm and which contains a much bigger
implicit assumption about the infinite rest of the sequence, that in
the case of the array "Vary" takes it to a catastrophically wrong
track). But in this universal mode, AC uses the plain normalized counts
in the exactly same way as the counts of EC/QI, slightly less
accurately (by 1/2 log(n) excess bits in the AC output, for the
sequence of length n) and encoded much more slowly. The Rissanen's 1976
coder (which was the first finite precision arithmetic coder) was
coding this way.  See the reference:

 5. J. Rissanen  Generalised Kraft inequality and arithmetic coding
     IBM J. Res. Dev. 20, 198-203, 19 76
     http://www.research.ibm.com/journal/rd/203/ibmrd2003B.pdf

where the quantities Phi (in eq. 5) are merely the approximate binomial
coefficients of regular EC (see footnote on page 2 in the preprint and
ref [23] for details). Although Rissanen suggests in that paper a table
based faster coding alternative (which allows skipping of the most
frequent symbol as in QI), for the AC one needs different tables of
O(n^2) size  for each source probability, while QI needs only a single
table of O(n^2), the quantized binomials C(n,k), which is valid for all
quasi-stationary sources and all probabilities.

That's why the AC algorithm was a doubly partial solution of the
precision problem of the exact enumeration -- it not only produced an
approximate (and a larger) enumerative index, but it also left unsolved
the O(n^3) table size problem of the exact enumeration (which it then
worked around by trading the table size for quite a few extra CPU
cycles, as illustrated in the tests chart). The QI solves both problems
- it takes the factor O(n) out of  the exact EC computation speed, as
did AC, and it takes the factor O(n) out of the table sizes, which AC
missed (thus QI doesn't need to trade off the coding speed to reduce
the table sizes as AC does; although QI can do that kind of tradeoffs
too, cf. page 9, [N2] in the preprint, but from a much better starting
point than AC). Rissanen was pleasantly surprised to see that "great
difficulty" (see the preprint pages 1-2, for citations) finaly solved
optimally and commented on the solution "Very clever!"


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Malcolm Taylor  
View profile  
 More options Nov 20 2005, 9:55 pm
Newsgroups: comp.compression, sci.math
From: Malcolm Taylor <m...@me.com>
Date: Mon, 21 Nov 2005 15:55:00 +1300
Subject: Re: New combinatorial coder: tighter & faster than Arithmetic coding
How about comparing QI with a faster and more efficient binary
arithmetic coder such as the fpaq one (see
http://www.cs.fit.edu/~mmahoney/compression/ for more info).

Malcolm


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
David A. Scott  
View profile  
 More options Nov 21 2005, 12:58 am
Newsgroups: comp.compression, sci.math
From: "David A. Scott" <daVvid_a_sc...@email.com>
Date: Mon, 21 Nov 2005 05:58:28 +0000 (UTC)
Local: Mon, Nov 21 2005 12:58 am
Subject: Re: New combinatorial coder: tighter & faster than Arithmetic coding
"nightlight" <nightli...@omegapoint.com> wrote in
news:1132514981.050575.313440@g44g2000cwa.googlegroups.com:

> That was a test of the same order 0 coders. At any order, you can
> suprise the "predictive" scheme catastrophically (unless it cheats via
> some ad hoc esc mechanism, allowing it to use temporarily universal way
> of coding), while you can't do so to a descriptive/universal scheme.

 Look is your so called coded able to actaully compress some files.
Do you have test files to test the order 0 codings. So one can campare
it to real world order 0 encoders. With out real examples one can test
I don't belive you.

David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
David A. Scott  
View profile  
 More options Nov 21 2005, 1:05 am
Newsgroups: comp.compression, sci.math
From: "David A. Scott" <daVvid_a_sc...@email.com>
Date: Mon, 21 Nov 2005 06:05:44 +0000 (UTC)
Local: Mon, Nov 21 2005 1:05 am
Subject: Re: New combinatorial coder: tighter & faster than Arithmetic coding
Phil Carmody <thefatphil_demun...@yahoo.co.uk> wrote in
news:87wtj2ncpl.fsf@megaspaz.fatphil.org:

> "nightlight" <nightli...@omegapoint.com> writes:
>> > First, do you understand the difference between adaptive order-0
>> > and static order-0 models? Second, how would the "QI" and "AC" in
>> > your tests perform if the input consisted of N/2 zeroes followed by
>> > N/2 ones?

>> The sequence "Vary" was a sequence similar to that, since the
>> negative half of  that sequence is mostly 1's and the positive part
>> mostly 0's.

> What do you mean by "negative half" and "positive part"?
> Applying the concept of sign to sequences of bits is, erm,
> unconventional.

> Phil

  I guess that means if your given say a file with say one
thousand bytes of zero followed by say one thousand bytes of all
ones 0xFF he has no idea how his compressor or some arbitrary
arithmetic would compress it. It really should not be that hard
to test. But maybe he can't test it with his method yet.

David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Willem  
View profile  
 More options Nov 21 2005, 4:45 am
Newsgroups: comp.compression, sci.math
From: Willem <wil...@stack.nl>
Date: Mon, 21 Nov 2005 09:45:20 +0000 (UTC)
Local: Mon, Nov 21 2005 4:45 am
Subject: Re: New combinatorial coder: tighter & faster than Arithmetic coding
nightlight wrote:

) The negative 32 bit integer, say -1 has 32 1's. The idea was to give
) both coders (which were order 0 binary coders, thus they view data as
) sequence of uncorrelated bits) some simple sequence which has
) regularity invisible to order-0 (but visible to some higher order
) coder). It was meant to illustrate the robustness of the universal way
) of coding of the EC/QI coder and the fragility of the
) predictive/adaptive way of the AC coder.

If an adaptive order-0 AC coder performs badly on a sequence that has
varying statistics, then you wrote a shitty adaptive model.
A good adaptive model views symbols that are nearer with more weight.

It's no wonder that badly written code is outperformed.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
nightlight  
View profile  
 More options Nov 21 2005, 4:49 am
Newsgroups: comp.compression, sci.math
From: "nightlight" <nightli...@omegapoint.com>
Date: 21 Nov 2005 01:49:22 -0800
Local: Mon, Nov 21 2005 4:49 am
Subject: Re: New combinatorial coder: tighter & faster than Arithmetic coding

> How about comparing QI with a faster and more efficient binary
> arithmetic coder such as the fpaq one (see
> http://www.cs.fit.edu/~mmahoney/compression/ for more info).

QI is a fundamental algoritm not a program (although several research
variants of the algoritham are implemented). If your algorithm needs to
do more work than about 3 adds, 1 shift and two 1-dim array reads, per
less probably symbol, and any operation at all (other than traversing
the array to examine what the symbols are) per more probable symbol,
then it is inherently slower algorithm than QI. Any arithmetic coder
falls in this category, since a general purpose AC needs to encode 0's
and 1's explicitly. Check the simplicity of the generic EC/QI coding
loop (from p. 8 in [23])

23.      TR05-0625  "Quantized indexing: Background information"
       http://www.1stworks.com/ref/TR/tr05-0625a.pdf

I=0;                    // init cumulative path index
for(k=n=0; n<M; ++n)    // M is the number of input bits
{                       // or a block size
    if (getbit(n)==1)   // get bit at 0-based offset n
      {                 // skip on bit=0, add on bit=1
      ++k;              // increment count of 1's found
      I=I+C(n,k);       // add binomial to the path index
      }

}                // k is now the final count of 1's

The encoding step of the less frequent symbol (bit=1 by convention; for
the more frequent symbol only the outer loop executes) is simply an add
I=I+C(n,k); of a quantized binomial coefficient C(n,k), which is a 32
bit integer (taken out of an 1-dim array as (row[n])[k], with another
array giving the row n subarray) to a large integer "I" (which is of
the size of entire output). The 32 bit int C(n,k) is always added to
the leading 32 bits of "I"  (thus at most 1 carry can occur into
leading 0's of "I"). The symbolic getbit(n) gets the next unprocessed
input bit, which is just a shift of a 32 bit word holding the next set
of 32 input bits. That is really as simple and as fast as entropy
coding can ever get. The reason for such extreme simplicity is that the
real caluculation, the one of the quantized addends C(n,k), was done
outside of the coding loop.

That's the basic beauty of the EC/QI's much cleaner division of labor
than the one of AC (see [23] pages 19-26, 30-32), where each addend is
inseparably entangled with the source instance properties (the
particular symbol probabilities at that point), hence the addends can't
be precomputed generically/universally for all sources an taken outside
of the AC coding loop. That gives QI a huge and a fundamental
algorithmic edge which can't be compensated by any special programming
tricks available uniquely to AC.

Note also that for the test we had stripped the Moffat et al. V3 binary
coder to its bare engine, replaced stream i/o with memory to memory
coding, no allocations were done within the timed loop, model decisions
were taken out (since only order 0 was tested) and all dynamical coding
options (alternatives that didn't matter)  were hardwired as static to
avoid time waste in the test. So the arithmetic coder tested was
already quite a bit faster than the out-of-a-box Moffat et al code.  We
wanted it to take its best shot, the best it possibly could (since QI
didn't need any such implementation/code level edge, being so much more
efficient at the fundamental, algorithmic level).

Any other arithmetic coder which improves on Moffat coder in these
kinds of simple common sense but otherwise trivial ways (as the
fpaq0.cpp seems to show), without providing any essential improvement
at the algorithmic level will do roughly the same as the chart showed
for the streamlined Moffat's coder.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Messages 1 - 25 of 125   Newer >
« Back to Discussions « Newer topic     Older topic »