Anybody interested is welcome to take a look at a paper
I recently showed at the CISS conference.
The paper uses the Burrows Wheeler transform (BWT) to
preprocess a sequence of symbols. Afterwards a block-merging
algorithm estimates states, parameters, etc.
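For anyone who has not met the transform itself, here is a deliberately naive sketch of the forward BWT in Python (illustration only - this is not the construction used in the paper, and real implementations use suffix sorting instead of this slow rotation sort):

def bwt(s, eos="\0"):
    """Naive forward Burrows-Wheeler transform: append a unique sentinel,
    sort all rotations of the string, return the last column.
    Fine for tiny examples, far too slow for real block sizes."""
    s = s + eos
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

print(bwt("banana"))   # 'annb\x00aa' - symbols sharing a right-context end up adjacent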
The main contributions are:
1. the block merging estimates the source well asymptotically
2. application to universal coding with lower redundancies than
previous BWT schemes
3. iid assumption not used (weaker assumption used)
It is downloadable from my website...
Dror
--------------------------
Dror Baron
ECE Department
University of Illinois at Urbana-Champaign(UIUC)
CSRL room 120
Phone: (217)-265-8234 (office)
Website: http://www.uiuc.edu/~dbaron
Note: Really, BWT has the same asymptotic complexity as PPM - O(d*N),
where d is average context length.
Thanks for your interest!
However, your claim is inaccurate. (I'll use LaTeX for math:)
For modelling sources of depth $d$ PPM is indeed $O(d\cdot{N})$
where $N$ is the string length. However, usually $d=O(\log{N})$.
Thus, PPM is typically $O(N\log{N})$.
(NOTE: I got this confirmed by the author of the original
papers on PPM)
The BWT has several variants:
1. Sadakane's method is $O(N\log{N})$ worst case and $O(N\cdot{AML})$
typical case where AML is average matching length,
which is $O(H\cdot\log{N})$ where $H$ is the entropy rate.
2. Suffix trees are $O(N|X|)$ worst case, where $X$ is the alphabet
and $|X|$ its cardinality.
3. Suffix trees with hash tables are better yet...
Therefore, asymptotically $O(\log{N})$ is worse than $|X|$, which is
a constant (not to mention the hash table implementations).
Lastly, should you think that CTW (context tree weighting) is
better, I talked at length with one of the authors last week
and he became convinced that his method is $O(N\log{N})$.
By the way, my paper (www.uiuc.edu/~dbaron and look for it)
discusses these issues briefly with pointers to the literature.
Again, thanks for the interest!
Dror
> Anybody interested is welcome to take a look at a paper
> I recently showed at the CISS conference.
> The paper uses the Burrows Wheeler transform (BWT) to
> preprocess a sequence of symbols. Afterwards a block-merging
> algorithm estimates states, parameters, etc.
Good article. The approach could prove very useful in practical BW-schemes
(I advise everybody who works on BWT to read the paper).
Here are some notes and suggestions:
> The main contributions are:
> 1. the block merging estimates the source well asymptotically
Can you produce the results for Calgary?
> 2. application to universal coding with lower redundancies than
> previous BWT schemes
Well, but what in practice? K-T estimating is not good for
BW-postprocessing, because the transformed sequence usually has a kind of
correlation (it's not memoryless). So, the usual approaches are preferred
and the merging algorithm here can help to adjust the coding parameters
(statistics reset, etc). Going this way we may successfully reject
non-optimal MTF-coding.
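For readers who have not met it, the K-T (Krichevsky-Trofimov) estimator mentioned above assigns sequential probabilities like this (a generic Python sketch, not tied to any particular BWT scheme; the function name is mine):

import math

def kt_codelength(seq, alphabet):
    """Sequential Krichevsky-Trofimov code length in bits:
    P(next == x) = (count[x] + 1/2) / (n + |alphabet|/2)."""
    counts = {a: 0.0 for a in alphabet}
    n = 0
    bits = 0.0
    for s in seq:
        p = (counts[s] + 0.5) / (n + len(alphabet) / 2.0)
        bits -= math.log2(p)
        counts[s] += 1
        n += 1
    return bits

print(kt_codelength("aaabbbbaac", "abc"))   # about 17 bits for these 10 symbols

K-T is tuned for memoryless data, which is exactly why, as noted above, it disappoints on BWT output where neighbouring symbols are still correlated.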
> 3. iid assumption not used (weaker assumption used)
The block-oriented character is the main disadvantage of the presented
technique. We assume that the information generation is a nonrecurrent
process, which puts some limitations on the source (from this point of view
CTW has a prime advantage). In addition, I'd like to note that this method,
like the original Burrows-Wheeler algorithm, is not adaptive.
Regards,
Vladimir.
E-mail: seme...@unitel.spb.ru
I suggest you go through references [2],[8],[13],[14], and [15]
in my paper.
Remember that the basic difference is that PPM is explicitly
modelling the source while BWT is only running a sort.
Thanks again,
Dror
Thanks for your comments!
I'll try to respond to your suggestions.
> Can you produce the results for Calgary?
I don't have simulation results yet.
However, I believe that for practical purposes
the block lengths will be quite large, so I
would not expect amazing results.
I guess we'll need to be patient.
> Well, but what in practice? K-T estimating is not good for
> BW-postprocessing, because the transformed sequence usually has a kind of
> correlation (it's not memoryless). So, the usual approaches are preferred
> and the merging algorithm here can help to adjust the coding parameters
> (statistics reset, etc). Going this way we may successfully reject
> non-optimal MTF-coding.
Exactly. Instead of running MTF on the BWT output,
we begin by estimating state segments, afterwards we
code each one of those separately.
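Not the paper's algorithm, just a toy sketch of the "code each estimated segment separately" accounting; the names, the order-0 cost and the log2(N) parameter charge per segment are my own simplification, and the segment boundaries are assumed given:

import math
from collections import Counter

def order0_bits(segment):
    """Empirical order-0 code length of a segment, in bits."""
    n = len(segment)
    return -sum(c * math.log2(c / n) for c in Counter(segment).values())

def segmented_cost(bwt_output, boundaries):
    """Cost of coding each estimated state segment on its own: the segment's
    empirical order-0 code length plus a crude log2(N) charge for describing
    its parameters/boundary."""
    n = len(bwt_output)
    total = 0.0
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        total += order0_bits(bwt_output[start:end]) + math.log2(n)
    return total

# hypothetical BWT output with two visibly different regions
out = "aaaaabaaaa" + "cbcbccbbcb"
print(round(segmented_cost(out, [0, 10, 20]), 1))   # about 23.3 bits
print(round(order0_bits(out), 1))                   # about 30.8 bits with one global model

Even with the per-segment charge, splitting at the right places can already beat one global order-0 model on piecewise-stationary output, which is the point of estimating the segments first.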
> The block-oriented character is the main disadvantage of the presented
> technique. We assume that the information generation is a nonrecurrent
> process, which puts some limitations on the source (from this point of view
> CTW has a prime advantage). In addition, I'd like to note that this method,
> like the original Burrows-Wheeler algorithm, is not adaptive.
Yes - the block-orientation is limiting.
Thanks again,
We agree that good results on text are not expected.
The fundamental reason is that the algorithm is designed for modelling.
It isn't designed for text. I am sure you realize that I'm a
research-oriented PhD student and not a results-oriented software
developer - the flavor of my results will always be biased for
(what I consider to be) beauty and elegance and less for results.
By the way - I talked with Willems last week and he mentioned
they have some results on a variant of CTW which is about as
good as PPM. So I wouldn't rule anything out.
Last comment: from a theoretical viewpoint (universal coding
redundancies) CTW is extremely good.
BWT output will always be biased by $\log{N}$ per segment (see
Merhav's paper on MDL of piecewise stationary sources).
Therefore CTW is 'born' to be better than BWT.
> > Exactly. Instead of running MTF on the BWT output,
> > we begin by estimating state segments, afterwards we
> > code each one of those separately.
>
> Not quite right. Undoubtedly, there is a correlation between the segments.
> Therewith, separable coding means large statistics saving issues in a case
Theoretically you are correct - see reference [1] from my paper.
For practical purposes (justified by simulation) the correlation
between adjacent segments is negligible.
> BTW Dmitry's PPMD (variant E) packer as a rule takes the lead in coding speed
> over BW-packers : )
Again: my emphasis is on theoretical research and not programming.
I am sure Dmitry is 10 times better at programming ;-)
At the end of the day there is an inherent bound on
the complexity of compression algorithms: it is $O(N\log{|X|})$.
Anybody who claims to be better is cheating somewhere.
Nobody can do real $O(N)$. Current BW algorithms are $O(N|X|)$.
Dror
> > Can you produce the results for Calgary?
>
> I don't have simulation results yet.
> However, I believe that for practical purposes
> the block lengths will be quite large, so I
> would not expect amazing results.
> I guess we'll need to be patient.
I have a serious doubt about amazing results:
(1) for BW (here and later Burrows-Wheeler method): as far as I know,
Michael Shindler's SZIP makes a similar thing in some aspects, but the
results are not very impressive;
(2) for CTW: well, your technique may be good in speed (especially, if radix
sort is in use (I mean partial context sort)), but what about the ratio? As
I said, it's not adaptive, and the estimate is not so clever as in CTW. So,
the compression effect may be poor for this class of algorithms;
(3) big trouble: the need for creating a byte-oriented version. It's not easy.
One way is doing something like Tjalkens, Volf and Willems described in "A
Context-Tree Weighting Method for Text Generating Sources". This approach
looks awkward. Unfortunately, for today I don't see another way.
> > Well, but what in practice? K-T estimating is not good for
> > BW-postprocessing, because the transformed sequence usually has a kind of
> > correlation (it's not memoryless). So, the usual approaches are preferred
> > and the merging algorithm here can help to adjust the coding parameters
> > (statistics reset, etc). Going this way we may successfully reject
> > non-optimal MTF-coding.
>
> Exactly. Instead of running MTF on the BWT output,
> we begin by estimating state segments, afterwards we
> code each one of those separately.
Not quite right. Undoubtedly, there is a correlation between the segments.
Therewith, separable coding means large statistics saving issues in a case
of a non-adaptive method. So, I guess postprocessing should be adaptive and
the adaptation must depend on segment boundaries. BTW the separation can be
made during the sequential coding after the boundary is reached (an exact
boundary hit is not so important). I mean a coder and decoder that
synchronously analyze the data behavior.
[from the other discussion]
>However, in PPM we (asymptotically) model the source as
>depth-$\log{N}$ thus at best $\log{N}$ operations per symbol are
>necessary. For BWT, nothing is being modelled explicitly.
>We need (suffix arrays) $|X|$ (without hash tables)
>operations per symbol.
Modern PPM algorithms don't require hashing. PATRICIA structures are often
used instead.
> For practical purposes $|X|$ is pretty big, but I'm sure you
> know how that can be overcome.
> So... asymptotically PPM is *not* linear complexity whereas
> BWT *is* linear.
BTW Dmitry's PPMD (variant E) packer as a rule takes the lead in coding speed
over BW-packers : )
Regards,
Vladimir.
E-mail: seme...@unitel.spb.ru
I think the point must be made that Dmitry's PPMD uses a suffix tree
approach which gives rise to an O(N) (the point being that the maximum
order and alphabet size are constant and so should be ignored :). This
is better than the O(N*log(N)) you describe for PPM algorithms in
general.
The O(N) that Dmitry achieves makes use of the coherence between
adjacent contexts, and as such could also be applied to CTW models as
they use similar structures. This would mean that you could create a
CTW that has O(N) complexity as well.
ie. both PPM and CTW are able to achieve the lower bound in complexity
that you quote for 'any' compression algorithm.
Having said that, the current crop of CTW algorithms and PPM algorithm
implementations (not including Dmitry's PPMD that I know of) are
O(N*logN) (eg. RK uses an O(N*logN) implementation of PPM because the
extra complexity and memory saving techniques in my PPMZ destroy the
coherence needed for an O(N) implementation).
Also of note is that complexity analysis is only useful
asymptotically. The classic example is that bubble sort is faster than
quick sort for small sets since the constant per 'operation' is much
lower than quick sort. So a comparison of algorithms should take into
account the differences in constant time per 'operation' (eg. CTW uses
binary decomposition giving rise to at least 8 times the work of PPM).
Just my few points :)
Malcolm
> By the way - I talked with Willems last week and he mentioned
> they have some results on a variant of CTW which is about as
> good as PPM.
Note: I have nothing against CTW. Compression results are really great. But
truth to tell: because of the slow decoding, the method will never be widely
used in practice (just like PPM, DMC, PSM). CTW remains nothing but an
admirable theoretical issue.
> > > Exactly. Instead of running MTF on the BWT output,
> > > we begin by estimating state segments, afterwards we
> > > code each one of those separately.
> >
> > Not quite right. Undoubtedly, there is a correlation between the segments.
> > Therewith, separable coding means large statistics saving issues in a case
>
> Theoretically you are correct - see reference [1] from my paper.
> For practical purposes (justified by simulation) the correlation
> between adjacent segments is negligible.
I don't agree. Saying that, you imply a particular class of sources. The
number of counter-examples I know allows the assumption that correlation
exists and we can't leave it out of account, especially if we are in need
of a clever statistical-information storing mechanism in the case of a
256-symbol alphabet.
Regards,
Vladimir.
E-mail: seme...@unitel.spb.ru
Obviously if you showed me some of these examples I would be
extremely grateful.
Dror
Maximum order grows with $N$ so I don't agree with that approach.
Alphabet size - that's fine.
Theoretically, as one of the guys in this correspondence pointed out,
there is no reason to attain the $O(N\log{|X|})$ bound once we can
do $O(N|X|)$, because we can 'go down' to smaller alphabets.
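Spelled out, since the remark is terse (this is my reading of "go down to smaller alphabets", not a quote from anyone): binarize the input first, then run the $O(N|X|)$-type method on the binary string:

$$ n = N\log_2|X| \ \text{binary symbols}, \qquad O(n\cdot 2) = O(N\log_2|X|) = O(N\log|X|). $$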
> ie. both PPM and CTW are able to achieve the lower bound in complexity
> that you quote for 'any' compression algorithm.
This lower bound is simply the length in bits of the original
(uncompressed) version. I'm sure you agree we can't do better than
O(length original).
> Also of note is that complexity analysis is only useful
> asymptotically. The classic example is that bubble sort is faster than
> quick sort for small sets since the constant per 'operation' is much
> lower than quick sort. So a comparison of algorithms should take into
> account the differences in constant time per 'operation' (eg. CTW uses
> binary decomposition giving rise to at least 8 times the work of PPM).
Of course ;-)
Although I usually use the 'sort' in MATLAB which is
'other people's code' ;-)
Dror
> I agree. As a general rule BWT can not be better than PPM compressionwise;
> this was discussed on Dcc98 (not in the program, but among a few people).
> The reason is mainly that the BWT uses wrong statistics for the beginning
> of new contexts/buckets/whatever.
> This is also the reason for improved compression for bigger blocks - the
> loss due to adaption is (nearly) the same for a big and a small block as
> the number of different contexts is about the same. For the big block the
> percentage of such badly compressed data will simply be smaller.
>
> I did a few experiments and these are approximate data:
> if you take a big text (maybe out of the canterbury large file corpus) and
> run statistics individually for the first few bytes of a context/bucket/whatever
> and for the remainder, you will notice that the compression in the beginning
> is very low (3 bits/char or more for the first few chars) while it then
> becomes good.
> However for large blocks (above 1 or 2 MB) the fraction of these badly
> compressed bytes is low.
> So do not expect too much from larger blocksizes. I have szip versions
> capable of handling blocks up to 2^31-1, but i never really tested above
> 10MB or so.
> My rule of thumb: limit(size) = 2*(size(using 4MB blocks)-size(using 2MB blocks)).
>
> >(1) for BW (here and later Burrows-Wheeler method): as far as I know,
> >Michael Shindler's SZIP makes a similar thing in some aspects, but the
> >results are not very impressive;
>
> szip does a partial context sort, the context size is selectable by -o
> switch. Try -o4 for speed, which will do two base 2^16 LSB first radix sorts
> between writing "processing..." and "coding..." (-v1 to see it)
> The time before processing is reading the input, between coding and done
> doing the modelling and output. szip does no asynchronous IO; experiments
> indicate that when reading and sorting in separate threads speed for -o4
> would double - is anyone interested in this?
>
> The szip version available does no context border estimation. I found that
> the 2% smaller size was not worth 20% more CPU time. I did such experiments,
> so these 2% smaller for files on my harddisk are tested. Szip has other
> places too where it prefers speed over compression.
>
> The main benefit of limiting sort order is that you could easily do BWT in
> hardware - MTF is not a real problem, and neither are entropy coders in
> hardware.
>
> There are ways to get better compression out of BWT than currently available.
>
> michael
>
> [snipped]
I'd like to add my 0.02 euro to this discussion:
Vladimir Semenyuk wrote in message <8bof54$m56$1...@news.runnet.ru>...
>Hello, Dror !
>
>> > Can you produce the results for Calgary?
>>
>> I don't have simulation results yet.
>> However, I believe that for practical purposes
>> the block lengths will be quite large, so I
>> would not expect amazing results.
>> I guess we'll need to be patient.
>
>I have a serious doubt about amazing results:
I agree. As a general rule BWT can not be better than PPM compressionwise;
Let's divide:
1. Finite order PPM.
It is easy: after statistics gathering (all possible states and all
possible transitions are created), it works similarly to an ordinary order-0
arithmetic coder with cost proportional to |X| per symbol, so PPM behavior
is O(|X|*N).
2. Unbounded PPM*.
It is more complicated: the number of operations per context is bounded by
|X| again, and the number of scanned contexts (number of escapes) is less than
or equal to |X| too, because we can visit only those nodes that contain a
different number of transitions, and their number is limited by |X|; tree nodes
are added in linear time, therefore PPM* behavior is O(|X|*|X|*N).
Hmmm, it is slightly worse than my previous claim ;-).
3. CTW (bounded).
It is similar to PPM: finite model => finite time for each symbol.
4. CTW (unbounded, does it exist?).
It scans the tree in depth for each symbol, and the tree depth is proportional
to log(N), so it is really O(N*log(N)).
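For reference, the four claims above in one place (my shorthand: $N$ is the string length, $|X|$ the alphabet size, $c$ a model-dependent constant for a bounded-depth tree):

\begin{align*}
\text{PPM, finite order}    &: O(|X|\cdot N)\\
\text{PPM*, unbounded}      &: O(|X|^2\cdot N)\\
\text{CTW, bounded depth}   &: O(c\cdot N)\\
\text{CTW, unbounded depth} &: O(N\log N)
\end{align*}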
>>(1) for BW (here and later Burrows-Wheeler method): as far as I know,
>>Michael Shindler's SZIP makes a similar thing in some aspects, but the
>>results are not very impressive;
> szip does a partial context sort, the context size is selectable by -o
> switch. Try -o4 for speed, which will do two base 2^16 LSB first radix sorts
> between writing "processing..." and "coding..." (-v1 to see it)
I knew this ;-)
> The szip version available does no context border estimation. I found that
> the 2% smaller size was not worth 20% more CPU time. I did such experiments,
> so these 2% smaller for files on my harddisk are tested. Szip has other
> places too where it prefers speed over compression.
I am entirely with you in this. A few percent is not an amazing result.
Saying "a similar thing in some aspects" I meant only the partial
renunciation of MTF. I may be mistaken, but it seems to me that SZIP
sometimes doesn't use MTF. Is that right?
Regards,
Vladimir.
E-mail: seme...@unitel.spb.ru
Excellent - it seems that our understanding is converging.
There are nice $O(N|X|)$ algorithms, and they lead to $O(N\log{|X|})$.
Anyway BWT is unbounded and $O(N|X|)$ so it is fine.
By the way - CTW unbounded does indeed exist.
Willems told me that he has some paper on this.
I'll send you the reference when I come across it...
(Busy month)
Dror
Ppm and ctw etc. are mainly interesting from an academic point of view.
It's simply too slow and memory demanding for any real practical use.
Someone said
"We don't need more powerful computers, we need better algorithms!"
To see parts of what is possible with bwt, visit
http://www.mathematik.uni-bielefeld.de/~bernhard/
I'm working to improve bwt too and my current
results are very similar to these.
I think it's possible to do better than Balkenhol
but it's getting more difficult.
The argument that better compression equals slower
execution is simply not true! (most of the time, lazy programmers?)
You ppm experts, perhaps you should use your expert
knowledge and energy to develop bwt instead?
Mikael
mikael.l...@spray.se wrote:
> Ppm and ctw etc. are mainly interesting from an academic point of view.
> It's simply too slow and memory demanding for any real practical use.
Hmm, I don't agree here. PPMD (the compressor) shows that PPM can be
quite fast. I suspect that it's the same with CTW, but probably most of the
people that work on PPM/CTW don't value speed/memory requirements as much as
compression ratios...
> I think it's possible to do better than Balkenhol
> but it's getting more difficult.
Well, my BWT results are a bit better than Balkenhol's, but I use a
completely different modelling technique...
> The argument that better compression equals slower
> execution is simply not true! (most of the time, lazy programmers?)
:-)
--
Edgar
You propose compressing block by block.
This is drastically different than the proposed scheme.
> I agree. As a general rule BWT can not be better than PPM compressionwise;
> this was discussed on Dcc98 (not in the program, but among a few people).
> The reason is mainly that the BWT uses wrong statistics for the beginning
> of new contexts/buckets/whatever.
> This is also the reason for improved compression for bigger blocks - the
> loss due to adaption is (nearly) the same for a big and a small block as
> the number of different contexts is about the same. For the big block the
> percentage of such badly compressed data will simply be smaller.
OK, I was also one of the people who stated that. But now I am no longer sure
about that.
My implementation gives
file      original  compressed    bpc
bib         111261       26469  1.903
book1       768771      218023  2.269
book2       610856      149734  1.961
geo         102400       53221  4.158
news        377109      114097  2.420
obj1         21504       10020  3.728
obj2        246814       74284  2.408
paper1       53161       16003  2.408
paper2       82199       24186  2.354
pic         513216       46131  0.719
progc        39611       12108  2.445
progl        71646       15028  1.678
progp        49379       10377  1.681
trans        93695       17106  1.461
AVERAGE 14:             786787  2.257
on the files of the Calgary Corpus
and that is possible in complexity similar to gzip (gzip not bzip or
something else)
Everybody who is interested can find the description in a report at
http://www.mathematik.uni-bielefeld.de/~bernhard
the reports are under the link "Preprints im SFB"
and the mentioned one is a work together with Yuri M. Shtarkov:
"One Attempt of a Compression Algorithm based on the BWT"
That is still not the best I can achieve; for the next version, which I would
like to describe, I estimate something like 2.20 on average (hopefully).
Therefore, as a conclusion, at least for myself it is no longer clear
that we can not do better than PPM or CTW.
> .....
> >(1) for BW (here and later Burrows-Wheeler method): as far as I know,
> >Michael Shindler's SZIP makes a similar thing in some aspects, but the
> >results are not very impressive;
> ....
> There are ways to get better compression out of BWT than currently
> available.
>
> michael
>
> [snipped]
>
With my best regards
Bernhard
--
---------------------------------------------------------------------------
Dr. Bernhard Balkenhol      e-Mail bern...@mathematik.uni-bielefeld.de
Fakultaet fuer Mathematik   Tel. +0049-521-106-3872
Universitaet Bielefeld
Postfach 10 01 31           Fax +0049-521-106-4743
D - 33501 Bielefeld
Germany
Hi,

Hi all!
I'm pleased to see that there are people still working to improve bwt.
Bwt still has a lot of room for that.
I just wish that you could have included more, like examples and results!
I'm working to improve bwt too and my current
results are very similar to these.
I think it's possible to do better than Balkenhol
but it's getting more difficult.
The argument that better compression equals slower
execution is simply not true! (most of the time, lazy programmers?)
You ppm experts, perhaps you should use your expert
knowledge and energy to develop bwt instead?
The sorting can be done in between 1/4 and 1/2 of the time of Bentley-Sedgewick with
With regards
Mikael

In article <Pine.GSO.4.21.00032...@grenada.csl.uiuc.edu>,

---------------------------------------------------------------------------
Dr. Bernhard Balkenhol      e-Mail bern...@mathematik.uni-bielefeld.de
> OK, I was also one of the people who stated that. But now
> I am no longer sure about that. It can be shown that you can
> think about the BWT method as a generalisation of PPM because
> 1) You have to consider models where a context is a suffix of another
> context
> 2) In fact you can estimate the changes of the real context and then
> work as in PPM
I have some objections. First of all, let's distinguish two cases:
(1) We use the original Burrows-Wheeler approach. It means: making a sorting
(i. e. simple x^n->y^n as in your papers), forgetting about contexts (!),
coding.
Advantages:
(a) Fast modeling => fast compression.
(b) Very fast decompression => this method will be used in practice in
contrast to any prediction technique (i.e. statistical methods like PPM).
Disadvantages:
(a) A kind of abstraction from real model due to context forgetting =>
unavoidable loss of efficiency (even with estimating techniques).
(b) Block nature => loss of adaptivity => possible loss of efficiency.
(2) We make context sorting and the result is used to facilitate the
modeling. It can be useful in dictionary and statistical methods.
Advantages:
(a) Fast and simple modeling => possibility of making good results with low
complexity.
Disadvantages:
(a) It speeds up the compression, but the complexity of decompression
stays the same.
(b) It is not a new method (not BW). Just a sorting.
So, if you mean the first case, in general it won't guarantee toughest
compression (see Disadvantages); second approach has no connection
to Burrows-Wheeler method.
And now I'll show the same from the point of practice:
> My implementation gives
> bib 111261 26469 1.903
> book1 768771 218023 2.269
> book2 610856 149734 1.961
> geo 102400 53221 4.158
> news 377109 114097 2.420
> obj1 21504 10020 3.728
> obj2 246814 74284 2.408
> paper1 53161 16003 2.408
> paper2 82199 24186 2.354
> pic 513216 46131 0.719
> progc 39611 12108 2.445
> progl 71646 15028 1.678
> progp 49379 10377 1.681
> trans 93695 17106 1.461
Results for PPMD(E) by Dmitry Shkarin ("-o5" option):
bib 1.77
book1 2.22
book2 1.90
geo 4.46
news 2.28
obj1 3.60
obj2 2.30
paper1 2.25
paper2 2.22
pic 0.79
progc 2.28
progl 1.60
progp 1.62
trans 1.37
And speed was really impressive here. (PPMD usually passes ahead of
BW-based programs (I mean coding+decoding time).)
Conclusion: BW is interesting only from the point of view of speed. So, further
sorting improvements are welcome. Greedy coding is not so important (slowing
down isn't worth it). It's my standpoint.
Regards,
Vladimir.
E-mail: seme...@unitel.spb.ru
Vladimir Semenyuk wrote:
[Bernhard: BWT as PPM generalisation]
> I have some objections. First of all, let's distinguish two cases:
>
> (1) We use the original Burrows-Wheeler approach. It means: making a sorting
> (i. e. simple x^n->y^n as in your papers), forgetting about contexts (!),
> coding.
>
> Advantages:
> (a) Fast modeling => fast compression.
> (b) Very fast decompression => this method will be used in practice in
> contrast to any prediction technique (i.e. statistical methods like PPM).
You forgot (c), Context blending and (d), BW doesn't need very much memory.
> Disadvantages:
> (a) A kind of abstraction from real model due to context forgetting =>
> unavoidable loss of efficiency (even with estimating techniques).
> (b) Block nature => loss of adaptivity => possible loss of efficiency.
Well, this is correct...
> So, if you mean the first case, in general it won't guarantee toughest
> compression (see Disadvantages); second approach has no connection
> to Burrows-Wheeler method.
>
> And now I'll show the same from the point of practice:
[results comparision]
> And speed was really impressive here. (PPMD usually passes ahead of
> BW-based programs (I mean coding+decoding time).)
>
> Conclusion: BW is interesting only from the point of view of speed. So, further
> sorting improvements are welcome. Greedy coding is not so important (slowing
> down isn't worth it). It's my standpoint.
Hmm, it isn't necessary for BW to get slower with better modeling. It's
possible to improve both compression and speed...
--
Edgar
What makes bwt worth the effort is that
1. It's relatively new and unexplored (1994) compared to ppm.
2. When Burrows and Wheeler released the idea to the world,
it could compress Calgary corpus to 2.40 bpc average.
Today we can compress the same using bwt to 2.26 bpc and it's not a
limit.
3. As Balkenhol is saying, bwt is close to ppm in what and how it's doing
it but faster and much more efficient with memory, hinting that it has
not reached its true potential.
If we say it's impossible from the beginning, we will never discover
or gain anything. (Man would never have walked on the moon either!)
Mikael
This is exactly the attitude that is needed for any great breakthrough
to occur. There are so many things that are new and novel concepts
such as BWT which will only come into existence when we push to
discover them. The focus here seems to be exclusively on the modeling
improvements to BWT; however, my work was firstly to build a better
coder for use with the BWT. Here, I have made many such improvements
which will further speed up the overall BWT performance (although I
realize that the real bottleneck in speed is not the coder but the sort
transform. However, every cycle counts.) My coder achieves just over
2.4 b/b on average for the Calgary Corpus without any special modeling
other than the generic BWT. There is no MTF, no RLE and no other
alterations to the output from BWT. Moreover, there is no arithmetic
coding so the whole encoding process is far faster. And decoding is
even faster still. Now, I need to start working on the more advanced
improvements to the basic BWT as is the focus here. If only I had
more time....
-michael maniscalco
> You forgot (c), Context blending and (d), BW doesn't need very much
memory.
Context blending - this is not an advantage here, because it (blending)
could be made in a better way. BTW BW method doesn't actually makes context
blending (we lose the real contexts). Just source transformation.
Memory: you mean 5N+alpha, but it doesn't provide determined speed
complexity.
> Hmm, it isn't necessary for BW to get slower with better modeling. It's
> possible to improve both compression and speed...
Better and faster than comp.exe by Michael? If you can, show me how. For
example, your distance coding still has a lack of speed.
Regards,
Vladimir.
> If we say it's impossible from the beginning, we will never discover
> or gain anything. (Man would never have walked on the moon either!)
I didn't mean that someone should stop working. BW is great and useful. I
only pointed to the direction (sorting improvement), which in my opinion may
lead to success.
Regards,
Vladimir.
E-mail: seme...@unitel.spb.ru
Hallo all of you.

> Hi, Edgar !
> >....
> Better and faster than comp.exe by Michael? If you can, show me how. For
> example, your distance coding still has a lack of speed.

And I have compared the compression rates to PPMDE D=5 as mentioned
by Jan Aberg:
1) But there the results look slightly different
2) As I have already mentioned that is not the best I can achieve.
3) It is much slower than gzip and therefore also slower than mine.

And what I mean with generalisation is the answer to the question why
Moffat's idea to count the occurred symbols only once and not really the
frequencies (going down in the tree) not only improves speed but also the
compression rate.

> Regards,
> Vladimir.

Regards
Bernhard
Vladimir Semenjuk wrote:
> > You forgot (c), Context blending and (d), BW doesn't need very much
> > memory.
>
> Context blending - this is not an advantage here, because it (blending)
> could be made in a better way. BTW BW method doesn't actually makes context
> blending (we lose the real contexts). Just source transformation.
That's not necessarily true. Contexts *can* be recovered after a BWT, during
the encoding phase (although not all of them and unlikely very large
contexts)... Still, to be fair, I have to admit that the speed drop and
increased memory usage make recovering quite impractical - and so far it only
brought relatively small compression improvements in my tests.
> Memory: you mean 5N+alpha, but it doesn't provide determined speed
> complexity.
With a LZP preprocessing step (that helps BWT compression too), you'll have
determined speed...
> > Hmm, it isn't necessary for BW to get slower with better modeling. It's
> > possible to improve both compression and speed...
>
> Better and faster than comp.exe by Michael? If you can, show me how. For
> example, your distance coding still has a lack of speed.
Okay, it's difficult to get better & faster than comp.exe. Still, I think
the compression/speed tradeoff is reasonable with DC...
--
Edgar
>> Better and faster than comp.exe by Michael? If you can, show me how. For
>> example, your distance coding still has a lack of speed.
> Yes, I can do it even faster than gzip but I have not described it yet.
> You can do the whole encoding not much more expensive than the decoding.
> It is even faster than Michael's restricted sorting where I still produce
> a total order.
Note, comp.exe by Michael Shindler doesn't do any sorting.
> And I have compared the compression rates to PPMDE D=5 as mentioned by
> Jan Aberg
> 1) But there the results look slightly different
You took the wrong program :-)
Read a part of my recent message again:
VS> Results for PPMD(E) by Dmitry Shkarin ("-o5" option).
Test it. You can download it for free (pointer to PPMD(E) could be found at
www.act.by.net).
> 2) As I have already mentioned that is not the best I can achieve.
> 3) It is much slower than gzip and therefore also slower than mine.
Please, compare your block-sorter with IMP using "-1" option (LZ-based
method).
Console version is freeware. GZIP is old and too slow.
Regards,
Vladimir.
E-mail: seme...@unitel.spb.ru
I fully believe Bernhard when he says his programs are faster and
compress better than szip. If szip had not started just as a demo
to show that limited sorting works at all, I would have done several
things differently.
szip modelling still is based on the modelling designed back in '96 -
maybe it's about time to exploit the free dictionary approach you get
during the sort and a few other things, as well as commenting the
#define EXTRAFAST line in the rangecod.c used inside szip,
as this costs some compression.
The IO of the compressed data should also change the to
something more reasonable than putchar/getchar. Something like
collecting it in mem and output the whole block once 64 kB are there --
I know putchar does this, but i could shave off some cycles there.
Program counter sampling shows that IO takes a large fraction of
the time. Also CPU time is just about 1/2 of real time in an idle
system, so I could win by parallel IO and sorting.
I should also set the tuning of the modelling more towards small
files and not to the default blocksize of 1.7MB to perform better on
calgary corpus, and maybe focus more on text too :)
And maybe I should use a huffman coder with 20MB/sec throughput
instead of a rangecoder. Or mixed coders.
But then: all of the above would break compatibility, and would mean I'd
have to spend some time on it.
When comparing with my restricted sorting: compare with order 4
(-o4 switch), as the other orders are really slow :(
Being faster than gzip -9 or even -6 is no miracle; use -1 (or program
imp) for a test. Or if one likes a real challenge: beat LZOP (by Markus
Oberhumer) speedwise.
As for counting occurred symbols (and not frequencies): szip has done this
for 2 years now, so it probably is an old idea. Most of the time it just
counts the symbols of the last 32 runs - which implies that the
probability is never more accurate than 1/32. Nevertheless it gives
acceptable compression ;)
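To pin down the "count once per run over the last 32 runs" idea, here is a toy Python illustration only - this is not szip's code, and the class name and structure are mine:

from collections import deque, Counter

class RunWindowModel:
    """Toy frequency model in the spirit described above (not szip's code):
    remember the symbol of each of the last `window` runs, counting each
    run once regardless of its length, so once the window is full the
    probability estimates are quantized to multiples of 1/window."""

    def __init__(self, window=32):
        self.runs = deque(maxlen=window)   # symbols of the most recent runs
        self.counts = Counter()

    def update_run(self, symbol):
        if len(self.runs) == self.runs.maxlen:
            self.counts[self.runs[0]] -= 1     # run about to fall out of the window
        self.runs.append(symbol)
        self.counts[symbol] += 1

    def prob(self, symbol):
        return self.counts[symbol] / len(self.runs) if self.runs else 0.0

# feed it run heads only: 'aaaabbbaacc' has the runs a, b, a, c
model = RunWindowModel()
prev = None
for ch in "aaaabbbaacc":
    if ch != prev:                             # a new run starts
        model.update_run(ch)
        prev = ch
print(model.prob("a"))   # 0.5 (2 of the 4 runs seen so far)

With a 32-run window the estimate can never be finer than 1/32, which is exactly the granularity mentioned above.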
regards
Michael
--
DI Michael Schindler Compression Consulting
http://www.compressconsult.com/ mic...@compressconsult.com
> > Context blending - this is not an advantage here, because it (blending)
> > could be made in a better way. BTW BW method doesn't actually makes context
> > blending (we lose the real contexts). Just source transformation.
> That's not necessarily true. Contexts *can* be recovered after a BWT,
> during the encoding phase (although not all of them and unlikely very
> large contexts)...
Look. We usually make BWT by sorting the real contexts. Then we as if forget
about the reality. Now you propose making the recovery. It looks really
strange. May be it would be better not to do BWT at all. Just try PPM, CTW
or Associative Coding. For speeding-up use sorting, but don't make BWT. Be
sure you will get much more.
*******************************************************
For everyone entrained into this discussion:
I try to show that we shouldn't improve BWT in such a way that it becomes a
variation of some well-known prediction technique. Doing so we lose the charm
of the original Burrows-Wheeler algorithm. Well, just don't name it BWT.
*******************************************************
> > Memory: you mean 5N+alpha, but it doesn't provide determined speed
> > complexity.
>
> With a LZP preprocessing step (that helps BWT compression too), you'll
> have determined speed...
It's a good and quite usual decision, but
(1) LZP step will take a portion of time;
(2) widely used technique is not optimal (it should (I guess it's possible)
be improved without LZP).
> Okay, it's difficult to get better & faster than comp.exe. Still, I
> think the compression/speed tradeoff is reasonable with DC...
Distance Coding is a fresh and elegant decision. You definitely should
write a paper.
Regards,
Vladimir.
E-mail: seme...@unitel.spb.ru
This is what I think you should put your focus on.
Again, it would be nice if you could move the source code to your ftp
server!
Regards,
Mikael
Vladimir Semenyuk wrote:
> > That's not necessarily true. Contexts *can* be recovered after a BWT,
> > during the encoding phase (although not all of them and unlikely very
> > large contexts)...
>
> Look. We usually make BWT by sorting the real contexts. Then we as if forget
> about the reality. Now you propose making the recovery. It looks really
> strange. May be it would be better not to do BWT at all. Just try PPM, CTW
Okay, you have a point here...
> or Associative Coding. For speeding-up use sorting, but don't make BWT. Be
> sure you will get much more.
But it just happens that I like BWT :-) Well, maybe after I'm finished with
implementing all what I have in mind for DC I'll start with CTW...
> > With a LZP preprocessing step (that helps BWT compression too), you'll
> > have determined speed...
>
> It's a good and quite usual decision, but
>
> (1) LZP step will take a portion of time;
Which, if implemented well, will be compensated by faster sorting.
> (2) widely used technique is not optimal (it should (I guess it's possible)
> be improved without LZP).
Yes, it's not optimal - but not very bad either. If BWT compression on files
with long matches can get better & faster without LZP I would like to hear about
that method.
> > Okay, it's difficult to get better & faster than comp.exe. Still, I
> > think the compression/speed tradeoff is reasonable with DC...
>
> Distance Coding is a fresh and elegant decision. You definitely should
> write a paper.
Well, I've been planning to write a paper for a while now, but I never found
the time to do it. Hopefully in the near future...
--
Edgar
> Hi, Bernhard !

Hi, Vladimir

> .....
> VS> Results for PPMD(E) by Dmitry Shkarin ("-o5" option).
> Test it. You can download it for free (pointer to PPMD(E) could be found at
> www.act.by.net).

Thank you, I will do it.

> Please, compare your block-sorter with IMP using "-1" option (LZ-based
> method).
> Console version is freeware. GZIP is old and too slow.

I agree.

By the way, on the site you have mentioned, notice that the calculation
of the average compression rates on that site is different from the one
in the publications, and in fact not fair.
For each file:
rate = compressed size (in bytes) * 8 / orig size (in bytes)
and for the average:
sum of rates / number of files.
The reason is that otherwise you double the influence of the long files.

Regards,
Bernhard
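To make the two averaging conventions concrete, a tiny Python sketch; the two rows reuse book1 and obj1 from the table earlier in the thread, everything else (names, layout) is mine:

# Toy illustration of the two averaging conventions discussed above.
files = [
    # (original bytes, compressed bytes)
    (768771, 218023),   # book1 - a long file
    (21504,  10020),    # obj1  - a short file
]

# per-file rate in bits per byte, as used in the publications
rates = [8.0 * comp / orig for orig, comp in files]
unweighted = sum(rates) / len(rates)

# "total size" convention: long files dominate the figure
weighted = 8.0 * sum(c for _, c in files) / sum(o for o, _ in files)

print([round(r, 3) for r in rates])   # [2.269, 3.728]
print(round(unweighted, 3))           # 2.998 (each file counts once)
print(round(weighted, 3))             # 2.308 (dominated by the long file)

The per-file average treats short and long files alike; the total-size figure is pulled toward whatever the longest files do, which is the doubled influence complained about above.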
> By the way, on the site you have mentioned, notice that the calculation
> of the average compression rates on that site is different from the one
> in the publications, and in fact not fair.
Fully agree ! It's a plain fact that Calgary and Canterbury
are too far from reality ;-)
Regards,
Vladimir.
E-mail: seme...@unitel.spb.ru
Lots of discussion on BWT. Actually, is the algorithm asymptotically
optimal? If yes, how can one prove that?
A lot of theory but if you're really interested ...
Mikael
In article <38F2E2...@uxmail.cityu.edu.hk>,
It is a wrong proof. The standard blocksorting scheme (BWT + MTF + entropy
coding) is asymptotically nonoptimal.
Example: an order-0 source with a nonuniform symbol distribution.
1) BWT will not change the source properties, because a permutation does not
affect the symbol probabilities;
2) MTF will flatten the symbol distribution (it is its property);
3) AriCoder will encode this corrupted distribution;
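Point 2) is easy to check numerically. Here is a throwaway Python sketch (mine, nothing from any paper); the BWT step is omitted because, as point 1) says, a permutation does not change order-0 statistics:

import math, random
from collections import Counter

def order0_entropy(seq):
    """Empirical order-0 entropy in bits per symbol."""
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in Counter(seq).values())

def mtf(seq, alphabet):
    """Plain move-to-front: output the recency rank of each symbol."""
    table = list(alphabet)
    out = []
    for s in seq:
        i = table.index(s)
        out.append(i)
        table.insert(0, table.pop(i))
    return out

random.seed(0)
# memoryless (order-0) source with a skewed distribution
probs = {"a": 0.7, "b": 0.15, "c": 0.1, "d": 0.05}
seq = random.choices(list(probs), weights=list(probs.values()), k=200000)

print(order0_entropy(seq))               # about 1.32 bits/symbol
print(order0_entropy(mtf(seq, "abcd")))  # noticeably higher, around 1.6 here

That gap of a few tenths of a bit per symbol is exactly what an order-0 coder pays after MTF on a source where MTF buys nothing.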
2Bernhard: I can not see your messages; perhaps you use html-format. Plain
ASCII text is the standard for Usenet forums, please use it.
Regards,
Mikael
>Note that bwt is really just the sorting phase, nothing else!
>The problem for us developing bwt lies mainly in the other areas,
>mostly in 2) but also in 3).
>I'm working on that.
Something similar to D. Baron & Y. Bresler's approach can provide
asymptotic optimality, but it will lead us to a loss of simplicity.
NB: asymptotic optimality has little relation to real performance.
Mikael
In article <8d1u20$1c9h$2...@gavrilo.mtu.ru>,
>However, I don't think it's wise to stare ourselves blind on theory
>when practical experiments often can lead to surprising results.
>As for complexity, the ppm scheme is *very* complex (complicated).
>But that doesn't stop you from trying to improve it, does it?
It is a matter of preferences; for me, online PPM is simpler than
multipass BWT, but I'm a partial person in this area ;-).
Some notes on compressor comparison:
1) The Calgary (CC) and Canterbury corpora were designed for comparison of
*universal* lossless data compression algorithms, so various data type
dependent tricks, plays with program switches etc. are not allowed. Otherwise,
one can easily invent a method that compresses each file of CC
to 4 bits and expands any other file by the same 4 bits;
2) The criterion for comparison is unweighted average bpb, not total
archive size. See any paper on compression;