There is a new algorithm for entropy coding, constrained coding and fast combinatorial enumeration described in the arXiv preprint cs.IT/0511057 (also submitted to DCC 2006):
It is coding based on enumerative combinatorics (walks on Pascal triangle, counting of integer lattice paths). The new algorithm significantly improves on the previously known method "enumerative coding" (which is the regular un/ranking procedure in combinatorics), increasing its speed and reducing memory size by factor O(n), where n is number of input symbols). It is the algorithm Rissanen was looking for when he found a partial solution, arithmetic coding (in 1976).
The new algorithm, even the exploratory unoptimized prototype runs much faster (see the table below, copied from the of paper, with speedup factors over 200) and always compresses better than the best of arithmetic coders (such as Moffat et al 1998). Besides its usefulness in entropy coding (for all types of file, image,... compressors where Huffman, arithmetic or similar coding is used in the final phase) and for encoding of complex objects (e.g. permutations, trees, graphs, self-delimiting sequences,.. etc) the new algorithm extends significantly the domain of enumerative combinatorics accessible to computer explorations.
Here is the abstract: --------------------------------------------------------- Title: Quantized Indexing: Beyond Arithmetic Coding Author: Ratko V. Tomic
Quantized Indexing is a fast and space-efficient form of enumerative (combinatorial) coding, the strongest among asymptotically optimal universal entropy coding algorithms. The present advance in enumerative coding is similar to that made by arithmetic coding with respect to its unlimited precision predecessor, Elias coding.
The arithmetic precision, execution time, table sizes and coding delay are all reduced by a factor O(n) at a redundancy below log(e)/2^(g-1) bits/symbol (for n input symbols and g-bit QI precision). Due to its tighter enumeration, QI output redundancy is below that of arithmetic coding (which can be derived as a lower accuracy approximation of QI). The relative compression gain vanishes in large n and in high entropy limits and increases for shorter outputs and for less predictable data.
QI is significantly faster than the fastest arithmetic coders, from factor 6 in high entropy limit to over 100 in low entropy limit (`typically' 10-20 times faster). These speedups are result of using only 3 adds, 1 shift and 2 array lookups (all in 32 bit precision) per less probable symbol and no coding operations for the most probable symbol.
Further, the exact enumeration algorithm is sharpened and its lattice walks formulation is generalized. A new numeric type with a broader applicability, sliding window integer, is introduced. ---------------------------------------------------------
Below are few test results of QI vs. AC (AC: Moffat et al. 1998, ver 3 from 2000, on its `best' settings, RAM to RAM coding, all buffers prealloced, no slow streams; only inner coding loops timed via sub-microsecond timers) for input sizes N (K=1024 bits) and for given number of 1's randomly distributed (except for the input "Vary" which represents a non-stationary source). Both coders tested were binary order 0 coders (i.e. they consider input data as a sequence of uncorrelated bits). Each result is an average obtained from 500 random inputs.
Column Speed: coding times ratio TimeAC/TimeQI (e.g. QI is up to 247.5 times faster than AC, cf. top right)
Column N: output size percent using (A/Q-1)*100 (e.g. AC output is up to 110% larger than QI's, cf. bottom left)
Input array Vary = array of int32 {...,-2,-1,0,+1,+2,...}.
> The new algorithm, even the exploratory unoptimized > prototype runs much faster (see the table below, copied > from the of paper, with speedup factors over 200) and > always compresses better than the best of arithmetic > coders (such as Moffat et al 1998). Besides its usefulness > in entropy coding (for all types of file, image,... > compressors where Huffman, arithmetic or similar coding > is used in the final phase) and for encoding of complex objects > (e.g. permutations, trees, graphs, self-delimiting sequences,.. > etc) the new algorithm extends significantly the domain of > enumerative combinatorics accessible to computer explorations.
Sounds like hype. Especially the statement "always compresses better than the best of arithmetic coders" This is obviouly a false statement. It is impossible for any compressor to always compress better than every other compressor. First of all the best compressors would have to be bijective and then the best they can do is change the mappings from one set to another. Take any compressor that could make one stream or file smaller. Then the identity compressor could beat it on files that the previous compressor lengthened since its a loose loose situation if you make something else smaller or better compressed then it has to make something else longer.
Second if you actaully took the time to test the online versions of code that use Moffat style code you would soon relize it not the best.
> Sounds like hype. Especially the statement "always compresses better > than the best of arithmetic coders" This is obviouly a false statement. > It is impossible for any compressor to always compress better than > every other compressor.
The two are not arbitrary unrelated algorithms -- arithmetic coding (AC) and QI are both approximations of the exact enumerative coding (the exact enumeration of messages). The difference is that AC first approximates the enumeration problem itself (via stirling approximation and ignoring some factors, cf. the paper p.2), then it applies additional precision reduction (truncation to r-bit arithmetic). The QI doesn't do the 1st AC approximation at all, i.e. it works from exact enumeration and it performs only the second part of the AC approximation and even here it selects precision truncation which has 4 times smaller redundancy than that of AC's truncation. So, the AC is a lower accuracy approximation of QI, hence the AC always produces larger or at best equal output as QI.
What you're referring to is trivial observation that for any given algorithm A, one can create another algorithm B which will compress a given input X better than A. It is trivial since you can select a message X for which A produces an output of more than one bit, then you simply create an algorithm B which outputs 0 for message X and outputs 1 followed by <output of A> otherwise.
The relation between AC and QI is not of this kind, though. The AC message index is the same number that QI produces, but multiplied with a number which is strictly greater than 1 (cf. footnote on p.2 on double index majorization by AC). This may or may not (depending on how fractions round to next whole bit) result in a greater number of whole bits output by AC, but mathematically it cannot ever result in a smaller number of whole bits in the AC output.
When Rissanen was looking to solve the precision problem of the exact enumeration (in 1976), he ran into problem which seemed intractable (due to a missing piece of formalism in the conventional EC, which the paper above constructs as a natural setting for QI). Hence he approximated the enumeration first (by majorizing the index), which allowed him to construct a decodable index -- this was the arithmetic coding. Rissanen (who is the original creator of arithmetic coding and of MDL modeling principle) recognized instantly the solution in this paper, his old nemesis conquered at last, commenting: "I understand what you have done. Very clever!"
> Second if you actaully took the time to test the online versions of code that > use Moffat style code you would soon relize it not the best.
One can take Moffat et al code (version 3, 2000) and select some other 'accuracy vs speed' tradeoffs. If you know of a version of that coder which is faster and produces smaller output (the figures I listed refer to binary 0-order coders test), you are welcome present the comparison figures between the two and I will gladly check it out. Keep in mind, the speed factors in the paper show QI is 200+ times faster than AC on some inputs, and typically 10-20 times faster (this was a generic QI coder, i.e. the same loop and the same steps coding the dense and the sparse arrays, and not something tuned to run fast for sparse arrays, such as run length coder, which otherwise blows on size and speed for denser inputs). There are dime a dozen quasi-arithmetic coders, which sacrifice some compression quality (say another 5-10% larger output) to gain some speed (a factor 2-5).
The new algorithm does much less work (and a simpler work: just an add of the table addend, a machine word) per coding step and always does fewer coding steps, thus it has a fundamental reason for a much better performance than AC.
I looked quickly through the paper, and here's my impression of what it's about.
The idea of "enumerative coding" is that you divide the set of possible unencoded messages into disjoint subsets with the property that all of the messages in a particular subset are equally likely to occur (according to some model). Then the coded message is a code identifying the subset which contains the message, followed by a code identifying an element of that subset.
For example, suppose that your messages are the bit strings [01]*, and you divide them into subsets according to how many zeroes and ones they contain:
If our message has K ones and L zeroes, we can encode the index of the appropriate subset by encoding K and L at a cost of log K + log L bits. Encoding the index of the appropriate element of that subset takes
log C(K,K+L) = log ((K+L)! / (K! L!))
bits.
Compare this to arithmetic coding with a static order-0 model. Here we first encode the length of the message (K+L) and p(one) = K/(K+L), which again takes log K + log L bits. Then for each one bit in the message we encode log ((K+L)/K) bits, and for each zero bit we encode log ((K+L)/L) bits, for a total of
K log (K+L)/K + L log (K+L)/L = log ((K+L)? / (K? L?))
bits, where I write N? to mean N^N.
Arithmetic coding is asymptotically as efficient as enumerative coding, since O(N?) = O(N!), but enumerative coding will always use fewer bits, since N! < N? for interesting values of N.
I think that this paper describes an efficient algorithm for enumerative coding for the order-0 model I used above, but generalized to arbitrarily many symbols. If so, it won't replace coders based on sophisticated models with arithmetic-coding back ends, but it may find a niche in applications that use static order-0 models, such as video compression.
)> Sounds like hype. Especially the statement "always compresses better )> than the best of arithmetic coders" This is obviouly a false statement. )> It is impossible for any compressor to always compress better than )> every other compressor. ) ) The two are not arbitrary unrelated algorithms -- arithmetic coding ) (AC) and QI are both approximations of the exact enumerative coding ) (the exact enumeration of messages).
AC is not only enumerative coding. It is also used with adaptive models and higher-order statistics.
SaSW, Willem -- Disclaimer: I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged or something.. No I'm not paranoid. You all think I'm paranoid, don't you ! #EOT
> If so, it won't replace coders based on sophisticated models > with arithmetic-coding back ends,...
The AC modeling paradigm ('probability of next single symbol' modeling interface bottleneck) was created following path of 'virtue out of necessity'. The necessity being the particular entanglement of the inner most coding steps with the model parameters p(a). The EC/QI can code the same way, too, except they can also code a much better way (Kolmogorov modeling paradigm). There is a bit on the modeling on the page 9 in the paper and a much longer discussion in the tech report cited:
As you may have noted in the row "Vary" of the test, which is a non-probabilistic sequence for the order-0 coders, one can easily provide inputs which will "surprise" catastrophically the arithmetic 0-order coder, but you can't provide inputs on which AC works well and EC fails as dramatically (as does AC on the data "Vary"). The same phenomenon exists for the higher order coders, as soon as the input falkls outside of the "predictable" at a given order. The adaptive AC fails badly, while EC merely comes out slighly sub-optimally (since the effect of a "suprise" is always much greater on the predictor scheme than on the 'descriptor' scheme; with EC only the frequency table encoding may turn out suboptimal, while the bulk of output, the index is still optimal).
The essential difference between the adaptive AC and EC/QI is that AC subjects the modeling engine to an aritificial constraint (the memoryless coder allowed to have only a single symbol at a time, a limitation largely not relevant in practical coding since Morse telegraph days). The EC can do that, too, but it normally doesn't. Trying to predict furture, as AC modeling paradigm insists, is inherently harder than describing the present or past. Even the practical forms of AC modeling recognize the problem and allow a backdoor cheating against the ideal pure prediction (via the fragile and ad hoc escape mechanisms, where the coder is "allowed" to look ahead and clue the decoder as to what is coming).
As another aspect of the distinction, note that the EC mdeling engine normally needs to know only whether, say messages X and Y have the same probabilities (whatever their values might be), while the AC modeling engine needs to know what the actual values of the probabilities are, which is a much harder task.
The upshot is that the native EC/QI modeling (the Kolmogorov/MDL scheme) is much simpler and more robust than the predictive AC paradigm (EC/QI can reuse the existent AC modeling engine, though, e.g. by splitting the output into different probability classes, cf p. 9, [N7]).
)> If so, it won't replace coders based on sophisticated models )> with arithmetic-coding back ends,... ) ) The AC modeling paradigm ('probability of next single symbol' modeling ) interface bottleneck) was created following path of 'virtue out of ) necessity'. The necessity being the particular entanglement of the ) ... ) ) As you may have noted in the row "Vary" of the test, which is a ) non-probabilistic sequence for the order-0 coders, one can easily ) provide inputs which will "surprise" catastrophically the arithmetic ) 0-order coder,
You don't seem to grasp the existence of higher-order models. Would something like PPM be easy to implement using this QI coder ?
SaSW, Willem -- Disclaimer: I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged or something.. No I'm not paranoid. You all think I'm paranoid, don't you ! #EOT
> AC is not only enumerative coding. It is also used with adaptive models > and higher-order statistics.
The coding aspects in both cases are message enumerations -- with EC in the exact lists of messages (performed using the virtual dictionary of combinations), with AC in the cummulative probablity space (which only asymptotically for large n approaches the exact EC coding, while being uniformly suboptimal for any finite n). It is a historic fluke that EC was presented in conjunction with the 'descriptive'/universal way of modeling (MDL/Kolmogorov), while AC with the predictive way (such as PPM).
Both coding methods can work with modeling engines either way, though. The coding algorithm of AC does not separate cleanly the model parameters from the coder arithmetic (by virtue of hardwiring 'probabilities of next single symbol' into its coding addends). Thus, its native mode is the 'predictive' way, as a virtue out of necessity.
EC has a much cleaner separation between the modeling parameters and the coding computations -- the modeling engine of EC defines the enumerative classes (the subsets of "equiprobable messages"), while coder computations deal only with the universal combinatorial properties of sequences (belonging to a given class). The result is a much faster operation, since the 'universal properties of sequences' are independent of the particular source parameters, thus they can be precomputed and optimized away, outside of the coding loop, as it were. If AC were to code via precomputed values, it would need a separate table for each set of source probabilities, which is impractical even for a static binary source.
)> AC is not only enumerative coding. It is also used with adaptive models )> and higher-order statistics. ) ) The coding aspects in both cases are message enumerations -- with EC in ) the exact lists of messages (performed using the virtual dictionary of ) combinations), with AC in the cummulative probablity space (which only ) asymptotically for large n approaches the exact EC coding, while being ) uniformly suboptimal for any finite n).
Not true. Adaptive coding is trivially capable of outperforming a model that uses only the cumulative probability, in cases where there are local variations in the probability distribution.
Also, adaptive coding works very well with higher-order models, where it is intractable to transmit the model because of its exponential size.
SaSW, Willem -- Disclaimer: I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged or something.. No I'm not paranoid. You all think I'm paranoid, don't you ! #EOT
nightlight wrote: >The AC modeling paradigm ('probability of next single symbol' modeling >interface bottleneck) was created following path of 'virtue out of >necessity'. [etc, etc, etc]
Look, I understand all of this. I understand the weaknesses of PPM and of arithmetic coding, and I understand that an enumerative coder can do better in principle. None of that matters. What matters is this: what is your contribution? Can you, in practice, do a better job than state-of-the-art compressors based on PPM+ari? Going on about theoretical benefits is a waste of time. Theoretically, the best way to write a chess-playing program is to use minimax on the complete game tree. Actual chess-playing algorithms are gross hacks which attempt, in a highly inelegant way, to approximate that ideal. Pointing out that that's true accomplishes nothing. Everyone versed in the art knows it already.
>As you may have noted in the row "Vary" of the test, which is a >non-probabilistic sequence for the order-0 coders, one can easily >provide inputs which will "surprise" catastrophically the arithmetic >0-order coder, but you can't provide inputs on which AC works well and >EC fails as dramatically (as does AC on the data "Vary").
Two questions. First, do you understand the difference between adaptive order-0 and static order-0 models? Second, how would the "QI" and "AC" in your tests perform if the input consisted of N/2 zeroes followed by N/2 ones?
>>) As you may have noted in the row "Vary" of the test, which is a >> non-probabilistic sequence for the order-0 coders, one can easily >> provide inputs which will "surprise" catastrophically the arithmetic >> 0-order coder, > You don't seem to grasp the existence of higher-order models.
That was a test of the same order 0 coders. At any order, you can suprise the "predictive" scheme catastrophically (unless it cheats via some ad hoc esc mechanism, allowing it to use temporarily universal way of coding), while you can't do so to a descriptive/universal scheme.
> Would something like PPM be easy to implement using this QI coder ?
There are couple ways, at least, to do that. QI can code (via lattice jumps) to the 'probability of the next single symbol' exactly as AC, but that costs it much of speed advantage (since mantissa scaling used by the jumps, requires multiplications & divisions). But it can also code from the AC modeling engine instructions much faster by splitting the outputs into the probability classes (which needs also an error estimates on the probabilities given to the coder; the scheme grows the classes in the manner of Context Tree Weighing modeling scheme).
OTOH, it could do much better, with any given a priori knowledge about the input that is fed to PPM, by coding the native EC way, which is by splitting the input into proper enumerative classes. Such segmentation can be done based on the last s symbols (e.g. for order-s static Markov source; a method similar to Context Tree Weighing to select the output stream), or dynamically via BWT-like segmentation which doesn't assume a priory idealized finite order Markov source (you can check the discussion of this in the Ref [23] cited earlier). Even with the kludgy ad hoc entropy coding schemes (such as MTF variants which crudely mangle the enumerative classes present in the final BWT matrix), BWT is already surprisingly competitive with PPM in compression quality, while executing much quicker. With the proper handling of these presently mangled BWT enumerative classes, which QI is naturally suited to do quickly and optimally, BWT should consistently outperform PPM not just in speed but in compression quality.
In short, AC is a particular historical approximation of QI, hence anything AC can do, QI can do but generally much faster (or at least as fast) and generally more accurately (or at least as accurately). Similarly, the PPM or generally the predictive way of modeling via the 'probability of the next single symbol' is another historical accident (tailored to a particular hardwired coding algorithm of AC), by no means the only (and certainly not the best) way to model the properties of finite sequences of symbols. AC algorithm happens to require that all properties must be translated into exactly such form (as 'the probabilities of the single next symbol') so it can encode it. That is more of a handicap than a strength for the modeling engine to bend and distort everything else around the AC computational peculiarities, to have to translate and funnel all the conceivable properties of symbol sequences available to modeling engine through that particular coder interface bottleneck.
So, instead of questing anyones grasp of 'higher order models' you might wish to reaxamine first, how do some manage to convince themselves to perceive, with all their sincerity, a plain, gross handicap as a great strength.
> Can you, in practice, do a better job than state-of-the-art compressors > based on PPM+ari?
QI is a very new algorithm (and a first truly practical form of genuine EC). A better BWT variant, without mangling of its contexts (as the MTF phase presently does), is certainly high on the list of things to apply QI to. It is still only one among many interesting possibilities opened by the existence of practical EC. The whole field of practical EC modeling is presently in a very primitive form and most people have many misconceptions what it is (or generally on modeling, identifying it entirely with the AC modeling paradigm). Hopefully a publication of a clean, well performing reference implementation of QI (on which I am working, and which will perform even better than the tests results of the crude research prototype shown in the preprint) in the forthcoming months, should bring more people to try out new ways of modeling and answer those questions.
> First, do you understand the difference between adaptive order-0 and > static order-0 models? Second, how would the "QI" and "AC" in > your tests perform if the input consisted of N/2 zeroes followed by N/2 > ones?
The sequence "Vary" was a sequence similar to that, since the negative half of that sequence is mostly 1's and the positive part mostly 0's. The rest of the answer obviously depends on how the QI segments the sequence, what are the rules of the game. If you insist on requiring that QI/EC models ...000111... sequence in a single block of size N, then obviously it will produce output which is N + log*(N) + 1/2 log(N) bits (since it needs to encode the length of a sequence in log*(N) = log(log(...N) bits and the count of 1's in log(N) bits, while the index is C(N,N/2) which is Catalan number with approx N - 1/2 log(N) bits). Of course, if you wish to compare apples to apples, then if the predictive AC coder is allowed some adaptation, one would need to allow QI to also have some kind of corresponding capabilities (i.e. not to function as a static order-0 coder), such as allowing the encoder to select the input segmentation, in which case it would produce output smaller than adaptive AC (as array "Vary" illustrates). Comparing static vs adaptive order-0 coders is comparing apples and oranges.
There is no instrinsic algorithmic constraint for EC/QI to code static order-0 way (i.e. a la Lynch-Davisson 1966 EC coder). If you take AC and EC/QI and make both code static order-0 way with the same segmentation, the EC/QI will give you slighly smaller output and code much faster. Then if you make AC adaptive, with adaptation rate faster than N/2, you need to decide what is the corresponding EC/QI mode of coding? Whatever it is, it certainly cannot be the same scheme that was just tested as a comparison of the static order-0 coders, which is what you seem to be hinting it ought to be. The most natural EC/QI counterpart to AC adapting faster than N/2 is to allow EC/QI coder the segmentation choice to at least 2 parts, in which case it will come out on top again (as the array "Vary" showed; even though the QI coder tested was somewhat more primitive than that: it had simply used fixed 1Kbit blocks).
In any case, this particular tangent is more about verbal pigeonholing, rather than about the substance of the difference. The basic difference between AC and QI coding algorithms is that AC enumerates messages via cummulative probabilities and EC/QI via exact ranking of combinations of symbol sequences. The latter way is more accurate and it is much faster (since much of the coder work is with the universal sequence properties, thus it is precomputed in the quantized binomial tables, outside of the coding loop).
The modeling scheme is an entirely unrelated issue. You can select a modeling scheme where both coders servicing that modeler perform equally poorly (e.g. by forcing QI to do lattice jumps a la AC, in which case it will have to work exactly as hard per symbol as AC; with adaptive output splitting QI will compress well as adaptive AC, but code much faster). Generally, though, given the same modeling scheme, QI will always perform at least as well as AC, in speed and in accuracy (but mostly better, especially in speed). Anything else is comparing apples and oranges (note that the test in the row "Vary" is of that apple-vs-orange kind, except that QI was the one slightly handicapped since it wasn't really as adaptive as the EC counterpart of adaptive AC ought to be i.e. free select at least one input partition for AC allowed to adapt faster than N/2; QI/EC produces 0 length index for sequence of all 1's or all 0's).
> Adaptive coding is trivially capable of outperforming a > model that uses only the cumulative probability, in cases > where there are local variations in the probability distribution.
You are comparing different modeling schemes (adaptive vs static), while I was comparing the bare coding algorithms (both can be attached to the same model). At any given point AC gives a position (an interval) on the cummulative probability line for the sequence seen so far, while EC/QI gives the exact ordinal of the same sequence in the same list of sequences (the list is what model produces, we assume the same model and look at the coder difference). The only coder level difference is that the cummulative probabilty index is an approximation (a uniform majorization via the Stirling approx. of binomials and dropping of the two factors < 1, cf p. 2 and ref [23] for more details) of the exact ordinal index and that the QI algorithm can compute it generally much faster than AC algorithm can (and always at least as fast). The reason for the speed difference is discussed in ref [23] -- it is basically that the QI algorithm has much cleaner separation between the coding computations and the specific instance properties of the source, than the AC alogotithm, hence QI can shift much of the coding computation outside of the instance specific coding loop.
> Also, adaptive coding works very well with higher-order models, where > it is intractable to transmit the model because of its exponential size.
This is an interesting point, but outside of the comparison of QI and AC coding algorithms.
Taken on its own, though, your statement is a bit self-contradictory, since the model information obviously can be transmitted in sub-exponential size (e.g. via Adaptive AC scheme). What you really mean is that the first simple-minded encoding of the model parameters that you can came up with, off the top of your head, is exponential. Well, ok, so what? For example, the BWT, which is a bit less obvious way to extract and transmit a higher order models, is not exponential in size (and it is also not an adaptive PPM scheme). In fact, even when coupled with the crude MTF obliteration of the context boundaries, it performs nearly as well as PPM in compression quality (while running much faster).
"nightlight" <nightli...@omegapoint.com> writes: > the speed factors in the paper show QI is 200+ times faster than AC on
The question this raises for me is why AC implementations are 200x slower than reading from RAM? That sounds a little slow. Computer arithmetic is now incredibly fast, and RAM is slow in comparison.
/If/ AC is only 50x slower than reading from RAM, say, (a figure that wouldn't surprise me), then there's no way on earth that QI could be 200 times faster in _any_ circumstances.
Phil -- If a religion is defined to be a system of ideas that contains unprovable statements, then Godel taught us that mathematics is not only a religion, it is the only religion that can prove itself to be one. -- John Barrow
"nightlight" <nightli...@omegapoint.com> writes: > > First, do you understand the difference between adaptive order-0 and > > static order-0 models? Second, how would the "QI" and "AC" in > > your tests perform if the input consisted of N/2 zeroes followed by N/2 > > ones?
> The sequence "Vary" was a sequence similar to that, since the negative > half of that sequence is mostly 1's and the positive part mostly 0's.
What do you mean by "negative half" and "positive part"? Applying the concept of sign to sequences of bits is, erm, unconventional.
Phil -- If a religion is defined to be a system of ideas that contains unprovable statements, then Godel taught us that mathematics is not only a religion, it is the only religion that can prove itself to be one. -- John Barrow
Ben Rudiak-Gould <br276delet...@cam.ac.uk> writes: > X-Accept-Language: en-us, en
I'm confused - what on earth is the point of such a header in news-posting context? And why is a cantabrigian prioritising en-us above en?!?!?
Send my regards to the L&LL, Phil -- If a religion is defined to be a system of ideas that contains unprovable statements, then Godel taught us that mathematics is not only a religion, it is the only religion that can prove itself to be one. -- John Barrow
> The question this raises for me is why AC implementations are 200x > slower than reading from RAM?
The AC has to process and encode each bit, 32 per machine word, for which QI needs only to establish if there is any of the "less frequent symbols" (e.g. 1). Even though that AC implementation uses large loop unrolling and other speed optimizations, it only needs 8 times more work per one bit than QI needs to fetch the dword and answer "is it only the more frequent symbols in this word" (e.g. w==0 or w==(-1)) to get the speed factor of 256. If you look their code (binary coder, no contexts):
you will see that they certainly do that much more work per encoding of each bit, in fact even more than the ratios show, if you count the numbers of instructions. But, as you note, the fast processors with lots of cache mask out some of the speed difference, penalizing more the new memory reads, which both coders have to do. On an older 400Mhz Celeron laptop the speed ratios are 1.5-2 times larger in favor of QI than on the recent VAIO lapop where those figures in the preprint came from.
>> The sequence "Vary" was a sequence similar to that, since the negative >> half of that sequence is mostly 1's and the positive part mostly 0's.
> What do you mean by "negative half" and "positive part"? > Applying the concept of sign to sequences of bits is, erm, unconventional.
The negative 32 bit integer, say -1 has 32 1's. The idea was to give both coders (which were order 0 binary coders, thus they view data as sequence of uncorrelated bits) some simple sequence which has regularity invisible to order-0 (but visible to some higher order coder). It was meant to illustrate the robustness of the universal way of coding of the EC/QI coder and the fragility of the predictive/adaptive way of the AC coder.
The origin of the greater robustness in the "descriptive" way of modeling is in the fact that effect of the "surprise" affects chiefly the count component of the output, which for the 1K input block is on average a 10 bit number (e.g. specifying # of 1's in the block). The rest is a pure combinatorial index, giving the shortest possible decription, absent any further knowledge by the coder, on how that number of bits is placed in that block. With the adaptive AC, each coding step and the entire output is affected by the incorrectly conjectured probabilities. Thus it produces as much as twice the output size of the QI.
Obviously, the AC can be made to code the 'universal'/descriptive way, in which case the error would be much smaller (similar to the rest of tables, for the matching density of 1's and input size i.e. few percent extra output), while the speed ratio would still remain dramatically in QI favor.
The example was meant to help demythologize a bit the 'adaptive/predictive coding mystique' that lots of people are apparently mesmerized by (as the previous discussion in this thread illustrates). The 'predictive coding', at any modeling order, is a handicap (a historical fluke of bending the entire modeling engine to the peculiarities of the particular interpretation of the AC computational algorithm, the perceived need to have probability of next symbol to perform the encoding), not a strength as often touted, and a 'descriptive'/universal type algorithm at the same order of modeling will always come out ahead.
When AC is made to code the universal/descriptive way, the "probabilities" used for the computation are simply interpreted as normalized plain counts (instead of the presumed infinite n limits of such ratios, the 'probabilities", which is the adaptive AC interpretation of the AC algorithm and which contains a much bigger implicit assumption about the infinite rest of the sequence, that in the case of the array "Vary" takes it to a catastrophically wrong track). But in this universal mode, AC uses the plain normalized counts in the exactly same way as the counts of EC/QI, slightly less accurately (by 1/2 log(n) excess bits in the AC output, for the sequence of length n) and encoded much more slowly. The Rissanen's 1976 coder (which was the first finite precision arithmetic coder) was coding this way. See the reference:
where the quantities Phi (in eq. 5) are merely the approximate binomial coefficients of regular EC (see footnote on page 2 in the preprint and ref [23] for details). Although Rissanen suggests in that paper a table based faster coding alternative (which allows skipping of the most frequent symbol as in QI), for the AC one needs different tables of O(n^2) size for each source probability, while QI needs only a single table of O(n^2), the quantized binomials C(n,k), which is valid for all quasi-stationary sources and all probabilities.
That's why the AC algorithm was a doubly partial solution of the precision problem of the exact enumeration -- it not only produced an approximate (and a larger) enumerative index, but it also left unsolved the O(n^3) table size problem of the exact enumeration (which it then worked around by trading the table size for quite a few extra CPU cycles, as illustrated in the tests chart). The QI solves both problems - it takes the factor O(n) out of the exact EC computation speed, as did AC, and it takes the factor O(n) out of the table sizes, which AC missed (thus QI doesn't need to trade off the coding speed to reduce the table sizes as AC does; although QI can do that kind of tradeoffs too, cf. page 9, [N2] in the preprint, but from a much better starting point than AC). Rissanen was pleasantly surprised to see that "great difficulty" (see the preprint pages 1-2, for citations) finaly solved optimally and commented on the solution "Very clever!"
nightlight wrote: >>The question this raises for me is why AC implementations are 200x >>slower than reading from RAM?
> The AC has to process and encode each bit, 32 per machine word, for > which QI needs only to establish if there is any of the "less frequent > symbols" (e.g. 1). Even though that AC implementation uses large loop > unrolling and other speed optimizations, it only needs 8 times more > work per one bit than QI needs to fetch the dword and answer "is it > only the more frequent symbols in this word" (e.g. w==0 or w==(-1)) to > get the speed factor of 256. If you look their code (binary coder, no > contexts):
> you will see that they certainly do that much more work per encoding of > each bit, in fact even more than the ratios show, if you count the > numbers of instructions. But, as you note, the fast processors with > lots of cache mask out some of the speed difference, penalizing more > the new memory reads, which both coders have to do. On an older 400Mhz > Celeron laptop the speed ratios are 1.5-2 times larger in favor of QI > than on the recent VAIO lapop where those figures in the preprint came > from.
> That was a test of the same order 0 coders. At any order, you can > suprise the "predictive" scheme catastrophically (unless it cheats via > some ad hoc esc mechanism, allowing it to use temporarily universal way > of coding), while you can't do so to a descriptive/universal scheme.
Look is your so called coded able to actaully compress some files. Do you have test files to test the order 0 codings. So one can campare it to real world order 0 encoders. With out real examples one can test I don't belive you.
> "nightlight" <nightli...@omegapoint.com> writes: >> > First, do you understand the difference between adaptive order-0 >> > and static order-0 models? Second, how would the "QI" and "AC" in >> > your tests perform if the input consisted of N/2 zeroes followed by >> > N/2 ones?
>> The sequence "Vary" was a sequence similar to that, since the >> negative half of that sequence is mostly 1's and the positive part >> mostly 0's.
> What do you mean by "negative half" and "positive part"? > Applying the concept of sign to sequences of bits is, erm, > unconventional.
> Phil
I guess that means if your given say a file with say one thousand bytes of zero followed by say one thousand bytes of all ones 0xFF he has no idea how his compressor or some arbitrary arithmetic would compress it. It really should not be that hard to test. But maybe he can't test it with his method yet.
) The negative 32 bit integer, say -1 has 32 1's. The idea was to give ) both coders (which were order 0 binary coders, thus they view data as ) sequence of uncorrelated bits) some simple sequence which has ) regularity invisible to order-0 (but visible to some higher order ) coder). It was meant to illustrate the robustness of the universal way ) of coding of the EC/QI coder and the fragility of the ) predictive/adaptive way of the AC coder.
If an adaptive order-0 AC coder performs badly on a sequence that has varying statistics, then you wrote a shitty adaptive model. A good adaptive model views symbols that are nearer with more weight.
It's no wonder that badly written code is outperformed.
SaSW, Willem -- Disclaimer: I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged or something.. No I'm not paranoid. You all think I'm paranoid, don't you ! #EOT
QI is a fundamental algoritm not a program (although several research variants of the algoritham are implemented). If your algorithm needs to do more work than about 3 adds, 1 shift and two 1-dim array reads, per less probably symbol, and any operation at all (other than traversing the array to examine what the symbols are) per more probable symbol, then it is inherently slower algorithm than QI. Any arithmetic coder falls in this category, since a general purpose AC needs to encode 0's and 1's explicitly. Check the simplicity of the generic EC/QI coding loop (from p. 8 in [23])
I=0; // init cumulative path index for(k=n=0; n<M; ++n) // M is the number of input bits { // or a block size if (getbit(n)==1) // get bit at 0-based offset n { // skip on bit=0, add on bit=1 ++k; // increment count of 1's found I=I+C(n,k); // add binomial to the path index }
} // k is now the final count of 1's
The encoding step of the less frequent symbol (bit=1 by convention; for the more frequent symbol only the outer loop executes) is simply an add I=I+C(n,k); of a quantized binomial coefficient C(n,k), which is a 32 bit integer (taken out of an 1-dim array as (row[n])[k], with another array giving the row n subarray) to a large integer "I" (which is of the size of entire output). The 32 bit int C(n,k) is always added to the leading 32 bits of "I" (thus at most 1 carry can occur into leading 0's of "I"). The symbolic getbit(n) gets the next unprocessed input bit, which is just a shift of a 32 bit word holding the next set of 32 input bits. That is really as simple and as fast as entropy coding can ever get. The reason for such extreme simplicity is that the real caluculation, the one of the quantized addends C(n,k), was done outside of the coding loop.
That's the basic beauty of the EC/QI's much cleaner division of labor than the one of AC (see [23] pages 19-26, 30-32), where each addend is inseparably entangled with the source instance properties (the particular symbol probabilities at that point), hence the addends can't be precomputed generically/universally for all sources an taken outside of the AC coding loop. That gives QI a huge and a fundamental algorithmic edge which can't be compensated by any special programming tricks available uniquely to AC.
Note also that for the test we had stripped the Moffat et al. V3 binary coder to its bare engine, replaced stream i/o with memory to memory coding, no allocations were done within the timed loop, model decisions were taken out (since only order 0 was tested) and all dynamical coding options (alternatives that didn't matter) were hardwired as static to avoid time waste in the test. So the arithmetic coder tested was already quite a bit faster than the out-of-a-box Moffat et al code. We wanted it to take its best shot, the best it possibly could (since QI didn't need any such implementation/code level edge, being so much more efficient at the fundamental, algorithmic level).
Any other arithmetic coder which improves on Moffat coder in these kinds of simple common sense but otherwise trivial ways (as the fpaq0.cpp seems to show), without providing any essential improvement at the algorithmic level will do roughly the same as the chart showed for the streamlined Moffat's coder.