
Mar 24, 2000, 3:00:00 AM

Hi all,

Anybody interested is welcome to take a look at a paper

I recently showed at the CISS conference.

The paper uses the Burrows Wheeler transform (BWT) to

preprocess a sequence of symbols. Afterwards a block-merging

algorithm estimates states, parameters, etc.
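(For readers who have not seen the transform, here is a minimal sketch of the BWT preprocessing step. It uses the naive rotation sort for clarity; this is not the paper's algorithm, just an illustration of what the transform does.)

```python
def bwt(s: str, sentinel: str = "\0") -> str:
    """Naive Burrows-Wheeler transform: sort all rotations of the
    (terminated) string and take the last column. The sentinel makes
    the transform invertible; real implementations use suffix sorting
    instead of materializing every rotation."""
    s += sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)
```

For example, bwt("banana") gives "annb\0aa": symbols that occur in similar contexts end up adjacent, which is the property the block-merging stage then works on.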

The main contributions are:

1. the block merging estimates the source well asymptotically

2. application to universal coding with lower redundancies than

previous BWT schemes

3. iid assumption not used (weaker assumption used)

It is downloadable from my website...

Dror

--------------------------

Dror Baron

ECE Department

University of Illinois at Urbana-Champaign(UIUC)

CSRL room 120

Phone: (217)-265-8234 (office)

Website: http://www.uiuc.edu/~dbaron

Mar 25, 2000, 3:00:00 AM

Hi, Dror!

Note: Really, BWT has the same asymptotic complexity as PPM - O(d*N),
where d is the average context length.

Mar 25, 2000, 3:00:00 AM

Hi!

Thanks for your interest!

However, your claim is inaccurate. (I'll use LATEX for math:)

For modelling sources of depth $d$ PPM is indeed $O(d\cdot{N})$

where $N$ is the string length. However, usually $d=O(\log{N})$.

Thus, PPM is typically $O(N\log{N})$.

(NOTE: I got this confirmed by the author of the original

papers on PPM)

The BWT has several variants:

1. Sadakane's method is $O(N\log{N})$ worst case and $O(N\cdot{AML})$

typical case where AML is average matching length,

which is $O(H\cdot\log{N})$ where $H$ is the entropy rate.

2. Suffix trees are $O(N|X|)$ worst case, where $X$ is the alphabet

and $|X|$ its cardinality.

3. Suffix trees with hash tables are better yet...

Therefore, asymptotically $O(\log{N})$ is worse than $|X|$, which is
a constant (not to mention the hash table implementations).
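(To make the comparison concrete, here is a naive suffix-array construction. A library sort performs $O(N\log{N})$ comparisons, and each comparison costs up to the matching length of the two suffixes, which is the quantity Sadakane's analysis captures. This is only an illustrative sketch, not any of the cited algorithms.)

```python
def suffix_array(s: str) -> list[int]:
    """Sort suffix start positions lexicographically. Each string
    comparison walks the common prefix of two suffixes, so the real
    cost per comparison is the matching length, not O(1)."""
    return sorted(range(len(s)), key=lambda i: s[i:])
```

For "banana" this returns [5, 3, 1, 0, 4, 2], ordering the suffixes "a", "ana", "anana", "banana", "na", "nana".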

Lastly, should you think that CTW (context tree weighting) is

better, I talked at length with one of the authors last week

and he became convinced that his method is $O(N\log{N})$.

By the way, my paper (www.uiuc.edu/~dbaron and look for it)

discusses these issues briefly with pointers to the literature.

Again, thanks for the interest!

Dror

Mar 27, 2000, 3:00:00 AM

Hi, Dror!

>Thanks for your interest!
>However, your claim is inaccurate. (I'll use LATEX for math:)

Yes, sorry, that was an error: d must be replaced with |X|.
BWT and PPM use the same data structure, so their asymptotic
complexity must be the same.


Mar 27, 2000, 3:00:00 AM

Hello, Dror !

> Anybody interested is welcome to take a look at a paper

> I recently showed at the CISS conference.

> The paper uses the Burrows Wheeler transform (BWT) to

> preprocess a sequence of symbols. Afterwards a block-merging

> algorithm estimates states, parameters, etc.

Good article. The approach can be found very useful in practical BW-schemes

(I advise to read the paper everybody who works on BWT).

Here are some notes and suggestions:

> The main contributions are:

> 1. the block merging estimates the source well asymptotically

Can you produce the results for Calgary?

> 2. application to universal coding with lower redundancies than

> previous BWT schemes

Well, but what about practice? K-T (Krichevsky-Trofimov) estimation is not
good for BW-postprocessing, because the transformed sequence usually has a
kind of correlation (it's not memoryless). So, the usual approaches are
preferred, and the merging algorithm here can help to adjust the coding
parameters (statistics reset, etc). Going this way we may successfully
reject the non-optimal MTF-coding.

> 3. iid assumption not used (weaker assumption used)

The block-oriented character is the main disadvantage of the presented
technique. It assumes the source's statistics appear once and do not recur,
which puts some limitations on the source (from this point of view CTW has
a prime advantage). Moreover, I'd like to note that this method, like the
original Burrows-Wheeler algorithm, is not adaptive.

Regards,

Vladimir.

E-mail: seme...@unitel.spb.ru

Mar 27, 2000, 3:00:00 AM

However, in PPM we (asymptotically) model the source as

depth-$\log{N}$ thus at best $\log{N}$ operations per symbol are

necessary. For BWT, nothing is being modelled explicitly.

We need (suffix arrays) $|X|$ (without hash tables)

operations per symbol.

For practical purposes $|X|$ is pretty big, but I'm sure you

know how that can be overcome.

So... asymptotically PPM is *not* linear complexity whereas

BWT *is* linear.

I suggest you go through references [2],[8],[13],[14], and [15]

in my paper.

Remember that the basic difference is that PPM is explicitly

modelling the source while BWT is only running a sort.

Thanks again,

Dror

Mar 27, 2000, 3:00:00 AM

Dear Vladimir,

Thanks for your comments!

I'll try to respond to your suggestions.

> Can you produce the results for Calgary?

I don't have simulation results yet.

However, I believe that for practical purposes

the block lengths will be quite large, so I

would not expect amazing results.

I guess we'll need to be patient.

> Well, but what in practice? K-T estimating is not good for

> BW-postprocessing, because the transformed sequence usually has a kind of

> correlation (it's not memoryless). So, the usual approaches are preferred

> and the merging algorithm here can help to adjust the coding parameters

> (statistics reset, etc). Going this way we may successfully reject

> non-optimal MTF-coding.

Exactly. Instead of running MTF on the BWT output,

we begin by estimating state segments, afterwards we

code each one of those separately.
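(A toy illustration of coding the BWT output segment by segment. Here the segment boundaries are hand-picked stand-ins for what the paper's block-merging algorithm would estimate; the function name and interface are mine, not the paper's.)

```python
from collections import Counter
from math import log2

def segment_cost_bits(bwt_out: str, boundaries: list[int]) -> float:
    """Empirical order-0 cost (in bits) of coding each segment of the
    BWT output with its own statistics. `boundaries` are hypothetical
    segment edges; the block-merging algorithm would estimate them."""
    total = 0.0
    edges = [0] + boundaries + [len(bwt_out)]
    for lo, hi in zip(edges, edges[1:]):
        seg = bwt_out[lo:hi]
        if not seg:
            continue
        counts = Counter(seg)
        n = len(seg)
        # empirical entropy of the segment, times its length
        total += -sum(c * log2(c / n) for c in counts.values())
    return total
```

With the right boundary, segment_cost_bits("aaaabbbb", [4]) is 0.0 bits, while coding the same string as one segment costs 8.0 bits, which is the gain separate per-segment statistics are after.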

> Block-oriented character is the main disadvantage of the represented

> technique. We assume that the information birth is nonrecurrent action. It

> put some limitations on the source (from this point CTW has a prime

> advantage). Therewith, I'd like to note that such method as well as original

> Burrows-Wheeler algorithm is not adaptive.

Yes - the block-orientation is limiting.

Thanks again,

Mar 27, 2000, 3:00:00 AM

Hi Vladimir,

We agree that good results on text are not expected.

The fundamental reason is that the algorithm is designed for modelling.

It isn't designed for text. I am sure you realize that I'm a

research-oriented PhD student and not a results-oriented software

developer - the flavor of my results will always be biased toward
(what I consider to be) beauty and elegance and less toward results.

By the way - I talked with Willems last week and he mentioned

they have some results on a variant of CTW which is about as

good as PPM. So I wouldn't rule anything out.

Last comment: from a theoretical viewpoint (universal coding

redundancies) CTW is extremely good.

BWT output will always be biased by $\log{N}$ per segment (see

Merhav's paper on MDL of piecewise stationary sources).

Therefore CTW is 'born' to be better than BWT.

> > Exactly. Instead of running MTF on the BWT output,

> > we begin by estimating state segments, afterwards we

> > code each one of those separately.

>

> Not quite right. Undoubtedly, there is a correlation between the segments.

> Therewith, separable coding means large statistics saving issues in a case

Theoretically you are correct - see reference [1] from my paper.

For practical purposes (justified by simulation) the correlation

between adjacent segments is negligible.

> BTW Dmitry's PPMD (variant E) packer as a rule takes the lead in coding speed
> over BW-packers : )

Again: my emphasis is on theoretical research and not programming.

I am sure Dmitry is 10 times better at programming ;-)

At the end of the day there is an inherent bound on

the complexity of compression algorithms: it is $O(N\log{|X|})$.

Anybody who claims to be better is cheating somewhere.

Nobody can do real $O(N)$. Current BW algorithms are $O(N|X|)$.

Dror

Mar 28, 2000, 3:00:00 AM

Hello, Dror !

> > Can you produce the results for Calgary?

>

> I don't have simulation results yet.

> However, I believe that for practical purposes

> the block lengths will be quite large, so I

> would not expect amazing results.

> I guess we'll need to be patient.

I have serious doubts about amazing results:

(1) for BW (here and later, the Burrows-Wheeler method): as far as I know,
Michael Schindler's SZIP does a similar thing in some aspects, but the
results are not very impressive;

(2) compared with CTW: well, your technique may be good in speed (especially
if a radix sort is in use (I mean a partial context sort)), but what about
the ratio? As I said, it's not adaptive, and the estimator is not as clever
as in CTW. So the compression may be poor for this class of algorithms;

(3) a big trouble: the need to create a byte-oriented version. It's not easy.
One way is to do something like Tjalkens, Volf and Willems described in "A
Context-Tree Weighting Method for Text Generating Sources". This approach
looks awkward. Unfortunately, for today I don't see another way.

> > Well, but what in practice? K-T estimating is not good for
> > BW-postprocessing, because the transformed sequence usually has a kind of
> > correlation (it's not memoryless). So, the usual approaches are preferred
> > and the merging algorithm here can help to adjust the coding parameters
> > (statistics reset, etc). Going this way we may successfully reject
> > non-optimal MTF-coding.

>

> Exactly. Instead of running MTF on the BWT output,

> we begin by estimating state segments, afterwards we

> code each one of those separately.

Not quite right. Undoubtedly, there is a correlation between the segments.
Moreover, separable coding means a large statistics-storage cost in the
case of a non-adaptive method. So, I guess the postprocessing should be
adaptive, and the adaptation must depend on the segment boundaries. BTW,
the separation can be made during sequential coding once a boundary is
reached (an exact boundary hit is not so important). I mean a coder and
decoder that synchronously analyze the data behavior.

[from the other discussion]

>However, in PPM we (asymptotically) model the source as

>depth-$\log{N}$ thus at best $\log{N}$ operations per symbol are

>necessary. For BWT, nothing is being modelled explicitly.

>We need (suffix arrays) $|X|$ (without hash tables)

>operations per symbol.

Modern PPM algorithms don't require hashing. PATRICIA structures are often

used instead.

> For practical purposes $|X|$ is pretty big, but I'm sure you

> know how that can be overcome.

> So... asymptotically PPM is *not* linear complexity whereas

> BWT *is* linear.

BTW Dmitry's PPMD (variant E) packer as a rule takes the lead in coding speed
over BW-packers : )

Regards,

Vladimir.

E-mail: seme...@unitel.spb.ru

Mar 28, 2000, 3:00:00 AM

Dror Baron <dba...@grenada.csl.uiuc.edu> wrote:

>> BTW Dmitry's PPMD (variant E) packer as a rule take the lead in coding speed

>> over BW-packers : )

>


>Again: my emphasis is on theoretical research and not programming.

>I am sure Dmitry is 10 times better at programming ;-)


I think the point must be made that Dmitry's PPMD uses a suffix-tree
approach which gives rise to an O(N) algorithm (the point being that the
maximum order and alphabet size are constant and so should be ignored :).
This is better than the O(N*log(N)) you describe for PPM algorithms in
general.

The O(N) that Dmitry achieves, makes use of the coherence between

adjacent contexts, and as such could also be applied to CTW models as

they use similar structures. This would mean that you could create a

CTW that has O(N) complexity as well.

ie. both PPM and CTW are able to achieve the lower bound in complexity

that you quote for 'any' compression algorithm.

Having said that, the current crop of CTW and PPM implementations
(not including Dmitry's PPMD, as far as I know) are
O(N*logN) (eg. RK uses an O(N*logN) implementation of PPM because the
extra complexity and memory-saving techniques in my PPMZ destroy the
coherence needed for an O(N) implementation).

Also of note is that complexity analysis is only useful

asymptotically. The classic example is that bubble sort is faster than

quick sort for small sets since the constant per 'operation' is much

lower than quick sort. So a comparison of algorithms should take into

account the differences in constant time per 'operation' (eg. CTW uses

binary decomposition giving rise to at least 8 times the work of PPM).

Just my few points :)

Malcolm

Mar 28, 2000, 3:00:00 AM

Hello, Dror !

> By the way - I talked with Willems last week and he mentioned

> they have some results on a variant of CTW which is about as

> good as PPM.

Note: I have nothing against CTW. The compression results are really great.
But truth be told: because of the slow decoding, the method will never be
widely used in practice (just like PPM, DMC, PSM). CTW remains nothing but
an admirable theoretical achievement.

> > > Exactly. Instead of running MTF on the BWT output,

> > > we begin by estimating state segments, afterwards we

> > > code each one of those separately.

> >

> > Not quite right. Undoubtedly, there is a correlation between the segments.
> > Therewith, separable coding means large statistics saving issues in a case

>

> Theoretically you are correct - see reference [1] from my paper.

> For practical purposes (justified by simulation) the correlation

> between adjacent segments is negligible.

I don't agree. Saying that, you assume a particular class of sources. The
number of counter-examples I know of suggests that the correlation exists
and we can't leave it out of account, especially if we need a clever
mechanism for storing statistical information in the case of a 256-symbol
alphabet.


Regards,

Vladimir.

E-mail: seme...@unitel.spb.ru

Mar 28, 2000, 3:00:00 AM

> Don't agree. Saying that you imply the particular class of sources. The

> number of counter-examples I know allows to make an assumption that

> correlation exists and we can't leave it out of account, especially if we

> are in need of a clever statistical information storing mechanism in a case

> of 256-symbol alphabet.


Obviously if you showed me some of these examples I would be

extremely grateful.

Dror

Mar 28, 2000, 3:00:00 AM

Hi Malcolm,

Maximum order grows with $N$ so I don't agree with that approach.

Alphabet size - that's fine.

Theoretically, as one of the guys in this correspondence pointed out,

there is no reason to attain the $O(N\log{|X|})$ bound once we can

do $O(N|X|)$, because we can 'go down' to smaller alphabets.
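(The 'go down to smaller alphabets' trick is just symbol decomposition: one $|X|$-ary symbol becomes $\lceil\log_2|X|\rceil$ binary decisions, which is how an $O(N|X|)$ per-symbol scan turns into $O(N\log|X|)$. A hypothetical sketch, names mine:)

```python
from math import ceil, log2

def binary_decompose(symbol: int, alphabet_size: int) -> list[int]:
    """Split one symbol from an alphabet of size |X| into
    ceil(log2 |X|) binary decisions (most significant bit first),
    each of which can be coded by a binary model."""
    bits = ceil(log2(alphabet_size))
    return [(symbol >> k) & 1 for k in reversed(range(bits))]
```

For a 256-symbol alphabet every symbol becomes exactly 8 binary decisions, e.g. symbol 5 decomposes to [0, 0, 0, 0, 0, 1, 0, 1].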

> ie. both PPM and CTW are able to achieve the lower bound in complexity

> that you quote for 'any' compression algorithm.

This lower bound is simply the length in bits of the original

(uncompressed) version. I'm sure you agree we can't do better than

O(length original).

> Also of note is that complexity analysis is only useful

> asymptotically. The classic example is that bubble sort is faster than

> quick sort for small sets since the constant per 'operation' is much

> lower than quick sort. So a comparison of algorithms should take into

> account the differences in constant time per 'operation' (eg. CTW uses

> binary decomposition giving rise to at least 8 times the work of PPM).

Of course ;-)

Although I usually use the 'sort' in MATLAB which is

'other people's code' ;-)

Dror

Mar 28, 2000, 3:00:00 AM

You propose compressing block by block.
This is drastically different from the proposed scheme.

> I agree. As a general rule BWT can not be better than PPM compressionwise;
> this was discussed at DCC98 (not in the program, but among a few people).
> The reason is mainly that the BWT uses wrong statistics for the beginning
> of new contexts/buckets/whatever.
> This is also the reason for improved compression for bigger blocks - the
> loss due to adaption is (nearly) the same for a big and a small block as
> the number of different contexts is about the same. For the big block the
> percentage of such badly compressed data will simply be smaller.
>
> I did a few experiments and these are approximate data:
> if you take a big text (maybe out of the Canterbury large file corpus) and
> run statistics individually for the first few bytes of a
> context/bucket/whatever and for the remainder, you will notice that the
> compression in the beginning is very low (3 bytes/char or more for the
> first few chars) while it then becomes good.
> However for large blocks (above 1 or 2 MB) the fraction of these badly
> compressed bytes is low.
> So do not expect too much from larger blocksizes. I have szip versions
> capable of handling blocks up to 2^31-1, but I never really tested above
> 10MB or so.
> My rule of thumb: limit(size) = 2*(size(using 4MB blocks)-size(using 2MB
> blocks)).
>
> >(1) for BW (here and later Burrows-Wheeler method): as far as I know,
> >Michael Shindler's SZIP makes a similar thing in some aspects, but the
> >results are not very impressive;
>
> szip does a partial context sort, the context size is selectable by the -o
> switch. Try -o4 for speed, which will do two base 2^16 LSB-first radix
> sorts between writing "processing..." and "coding..." (-v1 to see it).
> The time before processing is reading the input; between coding and done it
> is doing the modelling and output. szip does no asynchronous IO;
> experiments indicate that when reading and sorting in separate threads the
> speed for -o4 would double - is anyone interested in this?
>
> The szip version available does no context border estimation. I found that
> the 2% smaller size was not worth 20% more CPU time. I did such
> experiments, so these 2% smaller for files on my harddisk are tested.
> Szip has other places too where it prefers speed over compression.
>
> The main benefit of limiting sort order is that you could easily do BWT in
> hardware - MTF is not a real problem, and neither are entropy coders in
> hardware.
>
> There are ways to get better compression out of BWT than currently
> available.
>
> michael
>
> [snipped]

Mar 29, 2000, 3:00:00 AM

hi!

i'd like to add my 0.02 euro to this discussion:

Vladimir Semenyuk wrote in message <8bof54$m56$1...@news.runnet.ru>...

>Hello, Dror !

>

>> > Can you produce the results for Calgary?

>>

>> I don't have simulation results yet.

>> However, I believe that for practical purposes

>> the block lengths will be quite large, so I

>> would not expect amazing results.

>> I guess we'll need to be patient.

>

>I have a serious doubt about amazing results:

I agree. As a general rule BWT can not be better than PPM compressionwise;

Mar 29, 2000, 3:00:00 AM

Hi, Dror!

>The data structures are definitely similar.
>However, in PPM we (asymptotically) model the source as
>depth-$\log{N}$ thus at best $\log{N}$ operations per symbol are
>necessary. For BWT, nothing is being modelled explicitly.
>We need (suffix arrays) $|X|$ (without hash tables)
>operations per symbol.

Let's divide:

1. Finite-order PPM.

It is easy: after statistics gathering (all possible states and all
possible transitions are created), it works similarly to an ordinary
order-0 arithmetic coder with cost proportional to |X| per symbol, so
PPM behavior is O(|X|*N).

2. Unbounded PPM*.

It is more complicated: the number of operations per context is bounded by
|X| again, and the number of scanned contexts (the number of escapes) is
less than or equal to |X| too, because we can visit only those nodes that
contain different numbers of transitions, and their number is limited by
|X|; tree nodes are added in linear time, therefore PPM* behavior is
O(|X|*|X|*N).

Hmmm, it is slightly worse than my previous claim ;-).

3. CTW (bounded).

It is similar to PPM: a finite model => finite time for each symbol.

4. CTW (unbounded; does it exist?).

It scans the tree in depth for each symbol, and the tree depth is
proportional to log(N), so it is really O(N*log(N)).
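(The order-0 coder cost in point 1 can be made concrete: a naive arithmetic coder scans the cumulative frequencies of the whole alphabet for every symbol, which is where the |X| factor comes from. The helper name and interface below are mine, purely for illustration.)

```python
def cum_freq_interval(freqs: list[int], symbol: int) -> tuple[int, int, int]:
    """One order-0 model step for an arithmetic coder: find the
    (low, high, total) cumulative-frequency interval of `symbol`.
    The linear scan over the alphabet is the O(|X|) per-symbol cost."""
    low = sum(freqs[:symbol])            # O(|X|) cumulative scan
    return low, low + freqs[symbol], sum(freqs)
```

For frequencies [2, 3, 5], symbol 1 maps to the interval (2, 5) out of 10; tree-structured counters can cut this scan to O(log |X|), but the naive form above is what the O(|X|*N) bound describes.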

Mar 29, 2000, 3:00:00 AM

Hi, Michael !

>>(1) for BW (here and later Burrows-Wheeler method): as far as I know,

>>Michael Shindler's SZIP makes a similar thing in some aspects, but the

>>results are not very impressive;

> szip does a partial context sort, the context size is selectable by the -o
> switch. Try -o4 for speed, which will do two base 2^16 LSB-first radix
> sorts between writing "processing..." and "coding..." (-v1 to see it)

I knew this ;-)

> The szip version available does no context border estimation. I found that
> the 2% smaller size was not worth 20% more CPU time. I did such
> experiments, so these 2% smaller for files on my harddisk are tested.
> Szip has other places too where it prefers speed over compression.

I am entirely with you on this. A few percent is not an amazing result.
Saying "a similar thing in some aspects" I meant only the partial
renunciation of MTF. I may be mistaken, but it seems to me that SZIP
sometimes doesn't use MTF. Is that right?

Regards,

Vladimir.

E-mail: seme...@unitel.spb.ru

Mar 29, 2000, 3:00:00 AM

Hi Dmitry,

Excellent - it seems that our understanding is converging.

There are nice $O(N|X|)$ algorithms, and they lead to $O(N\log{|X|})$.

Anyway BWT is unbounded and $O(N|X|)$ so it is fine.

By the way - CTW unbounded does indeed exist.

Willems told me that he has some paper on this.

I'll send you the reference when I come across it...

(Busy month)

Dror

Mar 30, 2000, 3:00:00 AM

Hi, Malcolm!

>Also of note is that complexity analysis is only useful

>asymptotically. The classic example is that bubble sort is faster than

>quick sort for small sets since the constant per 'operation' is much

>lower than quick sort. So a comparison of algorithms should take into

>account the differences in constant time per 'operation' (eg. CTW uses

>binary decomposition giving rise to at least 8 times the work of PPM).

I agree with you that asymptotic and real behaviors are two very
different things, and here is a more current example: the standard
blocksorting scheme (BWT + MTF + entropy coding) is asymptotically
nonoptimal, but it is successfully used by modern compressors ;-).

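(For completeness, here is the MTF stage of that standard scheme; the sorting and entropy-coding stages are omitted. A minimal sketch, not any particular compressor's code.)

```python
def mtf_encode(data: str, alphabet: list[str]) -> list[int]:
    """Move-to-front: each symbol is replaced by its current position
    in the table, then moved to the front. Runs of repeated symbols
    (typical BWT output) become runs of small integers, which the
    entropy coder then compresses well."""
    table = list(alphabet)
    out = []
    for ch in data:
        idx = table.index(ch)
        out.append(idx)
        table.insert(0, table.pop(idx))
    return out
```

For example, mtf_encode("aaabbb", ["a", "b", "c"]) gives [0, 0, 0, 1, 0, 0]: after the first "b" pays for its distance, the rest of the run costs 0s.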

Mar 30, 2000, 3:00:00 AM

Hi all!

I'm pleased to see that there are people still working to improve bwt.

Bwt still has a lot of room for that.

I just wish that you could have included more like examples and results!


Ppm and ctw etc. are mainly interesting from an academic point of view.
They're simply too slow and memory-demanding for any real practical use.

Someone said

"We don't need more powerful computers, we need better algorithms!"

Too see parts of what is possible with bwt, visit

http://www.mathematik.uni-bielefeld.de/~bernhard/

I'm working to improve bwt too and my current

results are very similar to these.

I think it's possible to do better than Balkenhol

but it's getting more difficult.

The argument that better compression equals slower

execution is simply not true! (most of the time, lazy programmers?)

You ppm experts, perhaps you should use your expert

knowledge and energy to develop bwt instead?

Mikael

In article <Pine.GSO.4.21.00032...@grenada.csl.uiuc.edu>, Dror Baron <dba...@grenada.csl.uiuc.edu> wrote:

> Hi all,

>

> Anybody interested is welcome to take a look at a paper

> I recently showed at the CISS conference.

> The paper uses the Burrows Wheeler transform (BWT) to

> preprocess a sequence of symbols. Afterwards a block-merging

> algorithm estimates states, parameters, etc.

>

> The main contributions are:

> 1. the block merging estimates the source well asymptotically

> 2. application to universal coding with lower redundancies than

> previous BWT schemes

> 3. iid assumption not used (weaker assumption used)

>

> It is downloadable from my website...

>

> Dror

>

> --------------------------

> Dror Baron

> ECE Department

> University of Illinois at Urbana-Champaign(UIUC)

> CSRL room 120

> Phone: (217)-265-8234 (office)

> Website: http://www.uiuc.edu/~dbaron

>

>


Mar 30, 2000, 3:00:00 AM

Hi,

mikael.l...@spray.se wrote:

> Ppm and ctw etc. is mainly interesting from an academic point of view.

> It's simply too slow and memory demanding for any real practical use.

Hmm, I don't agree here. PPMD (the compressor) shows that PPM can be

quite fast. I suspect that it's the same with CTW, but probably most of the

people that work on PPM/CTW don't value speed/memory requirements as much as

compression ratios...

> I think it's possible to do better than Balkenhol

> but it's getting more difficult.

Well, my BWT results are a bit better than Balkenhol's, but I use a
completely different modelling technique...

> The argument that better compression equals slower

> execution is simply not true! (most of the time, lazy programmers?)

:-)

--

Edgar

Apr 6, 2000, 3:00:00 AM

You propose compressing block by block.

This is drastically different than the proposed scheme.

> I agree. As a general rule BWT can not be better than PPM compressionwise;
> this was discussed at DCC98 (not in the program, but among a few people).
> The reason is mainly that the BWT uses wrong statistics for the beginning
> of new contexts/buckets/whatever.
> This is also the reason for improved compression for bigger blocks - the
> loss due to adaption is (nearly) the same for a big and a small block as
> the number of different contexts is about the same. For the big block the
> percentage of such badly compressed data will simply be smaller.

OK, I was also one of the people who stated that. But now I am no longer
sure about it. It can be shown that you can think of the BWT method as a
generalisation of PPM because

1) You have to consider models where a context is a suffix of another context

2) In fact you can estimate the changes of the real context and then work as in PPM

My implementation gives

bib      111261   26469  1.903
book1    768771  218023  2.269
book2    610856  149734  1.961
geo      102400   53221  4.158
news     377109  114097  2.420
obj1      21504   10020  3.728
obj2     246814   74284  2.408
paper1    53161   16003  2.408
paper2    82199   24186  2.354
pic      513216   46131  0.719
progc     39611   12108  2.445
progl     71646   15028  1.678
progp     49379   10377  1.681
trans     93695   17106  1.461

AVERAGE 14 :     786787  2.257

on the files of the Calgary Corpus

and that is possible with complexity similar to gzip (gzip, not bzip or
something else).

Everybody who is interested can find the description in a report

http://www.mathematik.uni-bielefeld.de/~bernhard

the reports are under the link "Preprints im SFB"

and the mentioned one is a work together with Yuri M. Shtarkov:

"One Attempt of a Compression Algorithm based on the BWT"

That is still not the best I can achieve; for the next version, which I
would like to describe, I estimate something like 2.20 on average
(hopefully).

Therefore, as a conclusion, at least for myself it is no longer clear
that we cannot do better than PPM or CTW.

> .....

> >(1) for BW (here and later Burrows-Wheeler method): as far as I know,

> >Michael Shindler's SZIP makes a similar thing in some aspects, but the

> >results are not very impressive;

> ....

> There are ways to get better compression out of BWT than currently

> available.

>

> michael

>

> [snipped]

>

With my best regards

Bernhard

--
---------------------------------------------------------------------------
Dr. Bernhard Balkenhol     e-Mail bern...@mathematik.uni-bielefeld.de
Fakultaet fuer Mathematik  Tel. +0049-521-106-3872
Universitaet Bielefeld     Fax  +0049-521-106-4743
Postfach 10 01 31
D - 33501 Bielefeld        Germany

Apr 6, 2000, 3:00:00 AM

mikael.l...@spray.se wrote:

> Hi all!
>
> I'm pleased to see that there are people still working to improve bwt.
> Bwt still has a lot of room for that.
> I just wish that you could have included more like examples and results!
>
> I'm working to improve bwt too and my current
> results are very similar to these.
> I think it's possible to do better than Balkenhol
> but it's getting more difficult.
>
> The argument that better compression equals slower
> execution is simply not true! (most of the time, lazy programmers?)
>
> You ppm experts, perhaps you should use your expert
> knowledge and energy to develop bwt instead?

Hi,

The sorting can be done in between 1/4 and 1/2 of the time of
Bentley-Sedgewick with similar memory requirements, but without those
terrible worst cases like pic. This can be done with the same algorithm
with time complexity smaller than with suffix trees.

As soon as I have described it, I will let you know.

With regards

Bernhard

Mikael

--------------------------------------------------------------------------- Dr. Bernhard Balkenhol e-Mail bern...@mathematik.uni-bielefeld.de

Apr 6, 2000, 3:00:00 AM

Hello, Bernhard !

> OK, I was also one of the people who stated that. But now
> I am no longer sure about it. It can be shown that you can
> think of the BWT method as a generalisation of PPM because
> 1) You have to consider models where a context is a suffix of another
> context
> 2) In fact you can estimate the changes of the real context and then
> work as in PPM

I have some objections. First of all, let's separate two cases:

(1) We use the original Burrows-Wheeler approach. It means: doing a sorting
(i.e. the simple x^n->y^n as in your papers), forgetting about contexts (!),
coding.

Advantages:

(a) Fast modeling => fast compression.

(b) Very fast decompression => this method will be used in practice in

contrast to any prediction technique (i.e. statistical methods like PPM).

Disadvantages:

(a) A kind of abstraction from real model due to context forgetting =>

unavoidable loss of efficiency (even with estimating techniques).

(b) Block nature => loss of adaptivity => possible loss of efficiency.

(2) We make context sorting and the result is used to facilitate the

modeling. It can be useful in dictionary and statistical methods.

Advantages:

(a) Fast and simple modeling => possibility of making good results with low

complexity.

Disadvantages:

(a) It speeds up the compression, but the complexity of decompression

stays the same.

(b) It is not a new method (not BW). Just a sorting.

So, if you mean the first case, in general it won't guarantee the strongest

compression (see Disadvantages); the second approach has no connection

to the Burrows-Wheeler method.
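The x^n -> y^n transform in case (1) can be sketched naively in Python (illustrative only: production implementations use suffix sorting or restricted sorting, never a full rotation sort, and the sentinel handling here is a simplifying assumption):

```python
def bwt(s: str, sentinel: str = "\0") -> str:
    # Forward transform: append a unique sentinel, sort all rotations,
    # and emit the last column. Quadratic - for illustration only.
    s += sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

def ibwt(last: str, sentinel: str = "\0") -> str:
    # Inverse transform: repeatedly prepend the last column and re-sort,
    # rebuilding the rotation table one column at a time.
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]
```

The point of the transform is visible even at this scale: similar symbols cluster in the output, which is what the later MTF/RLE or direct coding stages exploit.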

And now I'll show the same from the point of practice:

> My implementation gives

> bib 111261 26469 1.903

> book1 768771 218023 2.269

> book2 610856 149734 1.961

> geo 102400 53221 4.158

> news 377109 114097 2.420

> obj1 21504 10020 3.728

> obj2 246814 74284 2.408

> paper1 53161 16003 2.408

> paper2 82199 24186 2.354

> pic 513216 46131 0.719

> progc 39611 12108 2.445

> progl 71646 15028 1.678

> progp 49379 10377 1.681

> trans 93695 17106 1.461

Results for PPMD(E) by Dmitry Shkarin ("-o5" option):

bib 1.77

book1 2.22

book2 1.90

geo 4.46

news 2.28

obj1 3.60

obj2 2.30

paper1 2.25

paper2 2.22

pic 0.79

progc 2.28

progl 1.60

progp 1.62

trans 1.37

And speed was really impressive here. (PPMD usually passes ahead of

BW-based programs (I mean coding+decoding time).)

Conclusion: BW is interesting only from the point of speed. So, further

sorting improvements are welcome. Greedy coding is not so important (the

slowdown isn't worth it). That's my standpoint.

Regards,

Vladimir.

E-mail: seme...@unitel.spb.ru

Apr 6, 2000, 3:00:00 AM4/6/00

to

Hi,

Vladimir Semenyuk wrote:

[Bernhard: BWT as PPM generalisation]

> I have some objections. First of all, let's distinguish two cases:

>

> (1) We use the original Burrows-Wheeler approach. It means: making a sorting

> (i.e. a simple x^n -> y^n as in your papers), forgetting about contexts (!),

> coding.

>

> Advantages:

> (a) Fast modeling => fast compression.

> (b) Very fast decompression => this method will be used in practice in

> contrast to any prediction technique (i.e. statistical methods like PPM).

You forgot (c), Context blending and (d), BW doesn't need very much memory.

> Disadvantages:

> (a) A kind of abstraction from real model due to context forgetting =>

> unavoidable loss of efficiency (even with estimating techniques).

> (b) Block nature => loss of adaptivity => possible loss of efficiency.

Well, this is correct...

> So, if you mean the first case, in general it won't guarantee toughest

> compression (see Disadvantages); second approach has no connection

> to Burrows-Wheeler method.

>

> And now I'll show the same from the point of practice:

[results comparison]

> And speed was really impressive here. (PPMD usually passes ahead of

> BW-based programs (I mean coding+decoding time).)

>

> Conclusion: BW is interesting only from the point of speed. So, further

> sorting improvements are welcome. Greedy coding is not so important (the

> slowdown isn't worth it). That's my standpoint.

Hmm, it isn't necessary for BW to get slower with better modeling. It's

possible to improve both compression and speed...

--

Edgar

Apr 7, 2000, 3:00:00 AM4/7/00

to

Hi!

What makes bwt worth the effort is that

1. It's relatively new and unexplored (1994) compared to ppm.

2. When Burrows and Wheeler released the idea to the world,

it could compress Calgary corpus to 2.40 bpc average.

Today we can compress the same using bwt to 2.26 bpc, and that's not the

limit.

3. As Balkenhol says, bwt is close to ppm in what it does and how it does it,

but faster and much more memory-efficient, hinting that it has

not reached its true potential.

If we say it's impossible from the beginning, we will never discover

or gain anything. (Man would never have walked on the moon either!)

Mikael


Apr 7, 2000, 3:00:00 AM4/7/00

to


> If we say it's impossible from the beginning, we will never discover

> or gain anything. (Man had never walked on the moon either!)

>

> Mikael

This is exactly the attitude that is needed for any great breakthrough

to occur. There are so many new and novel concepts,

such as BWT, which will only come into existence when we push to

discover them. The focus here seems to be exclusively on the modeling

improvements to BWT; however, my work was firstly to build a better

coder for use with the BWT. Here, I have made many such improvements

which will further speed up the overall BWT performance (although I

realize that the real bottleneck in speed is not the coder but the sort

transform; however, every cycle counts). My coder achieves just over

2.4 b/b on average for the Calgary Corpus without any special modeling

other than the generic BWT. There is no MTF, no RLE and no other

alterations to the output from BWT. Moreover, there is no arithmetic

coding so the whole encoding process is far faster. And decoding is

even faster still. Now, I need to start working on the more advanced

improvements to the basic BWT as is the focus here. If only I have

more time....

-michael maniscalco
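For reference, the move-to-front stage that Michael says he omits can be sketched as follows (a minimal list-based version; production coders keep the table in a flat array for speed):

```python
def mtf_encode(data: bytes) -> list:
    # Move-to-front: emit each byte's current rank in the table,
    # then move that byte to rank 0. After a BWT, the clustered
    # symbols turn into runs of small ranks.
    table = list(range(256))
    out = []
    for b in data:
        i = table.index(b)
        out.append(i)
        table.pop(i)
        table.insert(0, b)
    return out
```

Repeated symbols become zeros, which is exactly the regularity an entropy coder (or RLE stage) then picks up.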

Apr 7, 2000, 3:00:00 AM4/7/00

to

Hi, Edgar !

> You forgot (c), Context blending and (d), BW doesn't need very much

> memory.

Context blending - this is not an advantage here, because it (blending)

could be made in a better way. BTW the BW method doesn't actually make context

blending (we lose the real contexts). Just a source transformation.

Memory: you mean 5N+alpha, but it doesn't provide deterministic speed

complexity.

> Hmm, it isn't neccesary for BW to get slower with better modeling. It's

> possible to improve both compression and speed...

Better and faster than comp.exe by Michael? If you can, show me how. For

example, your distance coding still has a lack of speed.

Regards,

Vladimir.

Apr 7, 2000, 3:00:00 AM4/7/00

to

Hi, Mikael !

> If we say it's impossible from the beginning, we will never discover

> or gain anything. (Man had never walked on the moon either!)

I didn't mean that someone should stop working. BW is great and useful. I

only pointed to the direction (sorting improvement), which by my opinion may

lead to success.

Regards,

Vladimir.

E-mail: seme...@unitel.spb.ru

Apr 7, 2000, 3:00:00 AM4/7/00

to

Hallo all of you.

> Hi, Edgar !

> ....

> Better and faster than comp.exe by Michael? If you can, show me how. For

> example, your distance coding still has a lack of speed.

Yes, I can do it even faster than gzip but I have not described it yet.

You can do the whole encoding not much more expensively than the decoding.

It is even faster than Michael's restricted sorting, while I still produce a total order.

And I have compared the compression rates to PPMDE D=5 as mentioned
by Jan Aberg

1) But there the results look slightly different

2) As I have already mentioned, that is not the best I can achieve.

3) It is much slower than gzip and therefore also slower than mine.

And what I mean by generalisation is the answer to the question of why Moffat's idea

to count the occurring symbols only once, and not really the frequencies (going down in the tree),

improves not only the speed but also the compression rate.

Regards,

Bernhard

Apr 7, 2000, 3:00:00 AM4/7/00

to

Hi,

Vladimir Semenjuk wrote:

> > You forgot (c), Context blending and (d), BW doesn't need very much

> memory.

>

> Context blending - this is not an advantage here, because it (blending)

> could be made in a better way. BTW BW method doesn't actually makes context

> blending (we lose the real contexts). Just source transformation.

That's not necessarily true. Contexts *can* be recovered after a BWT, during

the encoding phase (although not all of them, and unlikely very large

contexts)... Still, to be fair, I have to admit that the speed drop and

increased memory usage make recovering quite impractical - and so far it has

only brought relatively small compression improvements in my tests.

> Memory: you mean 5N+alpha, but it doesn't provide deterministic speed

> complexity.

With an LZP preprocessing step (which helps BWT compression too), you'll have

deterministic speed...
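A minimal sketch of the LZP idea Edgar refers to (hypothetical function name and parse format; real LZP implementations hash the context into a fixed-size table and hand only the literals to the next stage):

```python
def lzp_parse(data: bytes, order: int = 3):
    # LZP sketch: remember the last position where each order-`order`
    # context occurred; at each step, match greedily against that
    # predicted position and emit a (match_length, next_literal) pair.
    table = {}
    out, i = [], 0
    while i < len(data):
        ctx = data[max(0, i - order):i]
        pos = table.get(ctx)
        table[ctx] = i
        length = 0
        if pos is not None:
            while i + length < len(data) and data[pos + length] == data[i + length]:
                length += 1
        lit = data[i + length] if i + length < len(data) else None
        out.append((length, lit))
        i += length + 1
    return out
```

Long repeats collapse into a single length, which is why such a pre-pass bounds the work the block sorter has to do on degenerate inputs like pic.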

> > Hmm, it isn't neccesary for BW to get slower with better modeling. It's

> > possible to improve both compression and speed...

>

> Better and faster than comp.exe by Michael? If you can, show me how. For

> example, your distance coding still has a lack of speed.

Okay, it's difficult to get better & faster than comp.exe. Still, I think

the compression/speed tradeoff is reasonable with DC...

--

Edgar

Apr 7, 2000, 3:00:00 AM4/7/00

to

Hi, Bernhard !

>> Better and faster than comp.exe by Michael? If you can, show me how. For

>> example, your distance coding still has a lack of speed.

> Yes, I can do it even faster than gzip but I have not described it yet.

> You can do the whole encoding not much more expensively than the decoding.

> It is even faster than Michael's restricted sorting, while I still produce a

> total order.

Note: comp.exe by Michael Schindler doesn't do sorting.

>And I have compared the compression rates to PPMDE D=5 as mentioned by Jan

>Aberg

>1) But there the results look slightly different

You took a wrong program :-)

Read a part of my recent message again:

VS> Results for PPMD(E) by Dmitry Shkarin ("-o5" option).

Test it. You can download it for free (pointer to PPMD(E) could be found at

www.act.by.net).

>2) As I have already mentioned, that is not the best I can achieve.

>3) It is much slower than gzip and therefore also slower than mine.

Please, compare your block-sorter with IMP using "-1" option (LZ-based

method).

Console version is freeware. GZIP is old and too slow.

Regards,

Vladimir.

E-mail: seme...@unitel.spb.ru

Apr 7, 2000, 3:00:00 AM4/7/00

to

hi!

I fully believe Bernhard when he says his programs are faster and

compress better than szip. If szip had not started as a demo

to show that limited sorting works at all, I would have done several

things differently.

szip modelling is still based on the modelling designed back in '96 -

maybe it's about time to exploit the free dictionary approach you get

during the sort and a few other things, as well as commenting out the

#define EXTRAFAST line in the rangecod.c used inside szip,

as this costs some compression.

The IO of the compressed data should also change to

something more reasonable than putchar/getchar. Something like

collecting it in memory and outputting the whole block once 64 kB are there --

I know putchar does this, but I could shave off some cycles there.

Program counter sampling shows that IO takes a large fraction of

the time. Also CPU time is just about 1/2 of real time in an idle

system, so I could win by parallel IO and sorting.

I should also set the tuning of the modelling more towards small

files and not to the default blocksize of 1.7MB to perform better on

the Calgary corpus, and maybe focus more on text too :)

And maybe I should use a huffman coder with 20MB/sec throughput

instead of a rangecoder. Or mixed coders.

But then: all of above would break compatibility, and would mean I'd

have to spend some time on it.

When comparing with my restricted sorting: compare with order 4

(-o4 switch), as the other orders are really slow :(

Being faster than gzip -9 or even -6 is no miracle; use -1 (or program

imp) for a test. Or if one likes a real challenge: beat LZOP (by Markus

Oberhumer) speedwise.

As for counting occurred symbols (and not frequencies): szip has done this

for 2 years, so it probably is an old idea. Most of the time it just counts

the symbols of the last 32 runs - which implies that the

probability is never more accurate than 1/32. Nevertheless it gives

acceptable compression ;)
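The limited-history counting Michael describes can be sketched like this (hypothetical class name; szip's actual data structures are surely different, but the 1/32 resolution bound falls out directly from the window size):

```python
from collections import Counter, deque

class WindowModel:
    # Estimate symbol probabilities from the symbols of the last
    # `window` runs only, so the estimate's resolution is never
    # finer than 1/window (1/32 by default, as in the post).
    def __init__(self, window: int = 32):
        self.runs = deque(maxlen=window)
        self.counts = Counter()

    def update(self, sym) -> None:
        if len(self.runs) == self.runs.maxlen:
            # The full deque will evict its oldest run on append.
            self.counts[self.runs[0]] -= 1
        self.runs.append(sym)
        self.counts[sym] += 1

    def prob(self, sym) -> float:
        return self.counts[sym] / max(1, len(self.runs))
```

The appeal of the design is that adaptation is instant and the counts stay tiny, at the cost of coarse probabilities.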

regards

Michael

--

DI Michael Schindler Compression Consulting

http://www.compressconsult.com/ mic...@compressconsult.com


Apr 8, 2000, 3:00:00 AM4/8/00

to

Hi, Edgar and All !

> > Context blending - this is not an advantage here, because it (blending)

> > could be made in a better way. BTW the BW method doesn't actually make

> > context

> > blending (we lose the real contexts). Just a source transformation.

> That's not necessary true. Contexts *can* be recovered after a BWT,

> during the encoding phase (although not all of them and unlikely very

> large contexts)...

Look: we usually make the BWT by sorting the real contexts. Then we essentially

forget about the reality. Now you propose making the recovery. It looks really

strange. Maybe it would be better not to do the BWT at all. Just try PPM, CTW

or Associative Coding. For speeding up, use sorting, but don't make the BWT. Be

sure you will get much more.

*******************************************************

For everyone entrained into this discussion:

I am trying to show that we shouldn't improve BWT so that it becomes a variation

of some well-known prediction technique. Doing so, we lose the charm of the

original Burrows-Wheeler algorithm. Well, then just don't name it BWT.

*******************************************************

> > Memory: you mean 5N+alpha, but it doesn't provide determined speed

> > complexity.

>

> With a LZP preprocessing step (that helps BTW compression too), you'll

> have determined speed...

It's a good and quite usual solution, but

(1) the LZP step will take a portion of time;

(2) the widely used technique is not optimal (I guess it's possible to

improve it without LZP).

> Okay, it's difficult to get better & faster than comp.exe. Still, I

> think the compression/speed tradeoff is reasonable with DC...

Distance Coding is a fresh and elegant solution. You really should

write a paper.
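Distance Coding is apparently not written up anywhere yet (hence the request for a paper), but a core idea commonly attributed to it, replacing each symbol occurrence by the distance to its next occurrence, can be sketched as follows (an illustrative assumption, not Edgar's actual algorithm):

```python
def next_occurrence_distances(data: bytes):
    # For each position, the distance to the next occurrence of the
    # same byte (0 if the byte never occurs again). After a BWT,
    # these distances are mostly small and compress well.
    nxt = {}
    out = [0] * len(data)
    for i in range(len(data) - 1, -1, -1):
        b = data[i]
        out[i] = (nxt[b] - i) if b in nxt else 0
        nxt[b] = i
    return out
```

Unlike MTF, this skips over runs of other symbols in one step, which is one reason a distance-based post-transform can beat a rank-based one.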

Regards,

Vladimir.

E-mail: seme...@unitel.spb.ru

Apr 8, 2000, 3:00:00 AM4/8/00