443 views

Skip to first unread message

Apr 29, 2000, 3:00:00 AM4/29/00

to

Hello people,

Today I tried to replace the MTF in my BWT coder with distance codes, like

Edgar Binder described earlier in his message (aaabccca -> 11501100). But it

was very confusing to see after compressing the distance code -output that

it increased the size of calgary corpus by 1 bpb compared to the very basic

MTF scheme (note that I did not use any special modeling, just plain simple

order-0). I've clearly misunderstod something here. Can anybody give me a

hint?

Regards,

Sami

Apr 30, 2000, 3:00:00 AM4/30/00

to

Hi,

Sami wrote:

Sorry, this is my fault. I thought I made it clear that I didn't make

any exclusions in this example... Well, there are at least three DC

properties that can be used for that:

#1. Obviously, there are always some known symbols when encoding a

distance, and their locations can be excluded. This means "aaabccca" -->

11301100 (I assume the start configuration "a..bc..." to be known to the

decoder)

#2. Probably not so obvious, one doesn't have to encode distances when

the next symbol is unknown to the decoder, because it has to be equal to

the current symbol (if that wouldn't be true, the symbol would remain

unknown after the encoding of the distance --> impossible). On the

example, with known start configuration asssumed this means:

"aaabccca" --> 300 (this property is equivalent with a "free" RLE)

#3. Not all distances are needed for successful decoding of the input

(encoding should stop when all symbols

are known):

"aaabccca" --> 30 (this should reduce the cost of encoding the initial

configuration)

So, let's try to decode 30, with "a..bc..." as starting configuration.

Pos #1, "a..bc...". The next symbol is unknown, so it must be an "a"

Pos #2, "aa.bc...". Next symbol unknown, must be "a"

Pos #3, "aaabc...". Decoded distance = 3, "b", "c" positions skipped

Pos #4, "aaabc..a". Decoded distance = 0, "b" doesn't appear anymore

Pos #5, "aaabc..a". Next symbol unknown, must be "c"

Pos #6, "aaabcc.a". Next symbol unknown, must be "c"

Pos #7, "aaabccca". Decoding finished (successfully :-)

Well, I realize this isn't the best explanation but I hope it makes this

clear.

The scheme described here achieves with simple order-0 modelling

compression equivalent with BZip. It depends though on your particular

way of handling big distances how close it'll get. My implementation of

straight distance coding (same as used in the public version of DC) gets

with order-0 modelling about 229k on book1, wich I think it's fairly

close to the optimum...

--

Edgar

May 1, 2000, 3:00:00 AM5/1/00

to

Binde...@T-Online.de (Binder Edgar) wrote in <390B6254.D5113AD2@T-

Online.de>:

>#1. Obviously, there are always some known symbols when encoding a

>distance, and their locations can be excluded. This means "aaabccca" -->

>11301100 (I assume the start configuration "a..bc..." to be known to the

>decoder)

Online.de>:

>#1. Obviously, there are always some known symbols when encoding a

>distance, and their locations can be excluded. This means "aaabccca" -->

>11301100 (I assume the start configuration "a..bc..." to be known to the

>decoder)

Thanks for the reply and description Edgar. It's really not your fault that

I got it all wrong, this happens when I don't think and just try things out.

I'll try again and do it as you described and see if I got it right this

time. :-)

Regards,

Sami

May 1, 2000, 3:00:00 AM5/1/00

to

Hi there!

The distance coding is efficient on short distances but longer ones

doesn't seem to be a correct measurement of a symbol's probability.

I think distance coding or "inversion frequencies" are interesting

enough to deserve more r&d though.

Have anyone tried any other variations of this?

Maybe some kind of hybrid?

Mikael

In article <390B6254...@T-Online.de>,

Binder Edgar <Binde...@T-Online.de> wrote:

> Hi,

>

> Sami wrote:

>

> > Today I tried to replace the MTF in my BWT coder with distance

> > codes, like Edgar Binder described earlier in his message (aaabccca

> > -> 11501100). But it was very confusing to see after compressing the

> > distance code -output that it increased the size of calgary corpus

> > by 1 bpb compared to the very basic MTF scheme (note that I did not

> > use any special modeling, just plain simple order-0). I've clearly

> > misunderstod something here. Can anybody give me a hint?

>

> Sorry, this is my fault. I thought I made it clear that I didn't make

> any exclusions in this example... Well, there are at least three DC

> properties that can be used for that:

>

> #1. Obviously, there are always some known symbols when encoding a

> distance, and their locations can be excluded. This means "aaabccca" -->

> 11301100 (I assume the start configuration "a..bc..." to be known to the

> decoder)

>

Sent via Deja.com http://www.deja.com/

Before you buy.

May 1, 2000, 3:00:00 AM5/1/00

to

Hi,

mikael.l...@spray.se wrote:

> The distance coding is efficient on short distances but longer ones

> doesn't seem to be a correct measurement of a symbol's probability.

You presume that because distance can be bigger than 255, right ?

Well, this seems to be compensated by exclusions... If you want, you can

try doing a small experiment with DC - namely compress with everything

disabled but the distance coder some text file (or any other file that

doesn't compress with RLE). Well, if you compare the result with an

order-0 arithmetic coder you'll find out that DC compresses as well / or

slightly better. The public distance coder uses only order-0 statistics

to do that, therefore it must be optimal in this sense... (you can make

the same experiment with MTF if you think it's necessary :-)

> I think distance coding or "inversion frequencies" are interesting

> enough to deserve more r&d though.

Yes, they're quite interresting....

> Have anyone tried any other variations of this?

> Maybe some kind of hybrid?

Well, I tryied a few and some are slightly better for some class of

files, but there isn't any constant improvement. Straight distance

coding seems to have the best speed/compression tradeoff.

--

Edgar

P.S. I might be unable to reply to the next messages as I'm leaving

tomorrow to Italy for a week...

Oct 30, 2014, 3:59:23 PM10/30/14

to

Hello all.

Can U give me some idea what is the best way for store start configuration?

суббота, 29 апреля 2000 г., 10:00:00 UTC+3 пользователь Sami написал:

> Hello people,

>

> Sami

Can U give me some idea what is the best way for store start configuration?

суббота, 29 апреля 2000 г., 10:00:00 UTC+3 пользователь Sami написал:

> Hello people,

>

> Today I tried to replace the MTF in my BWT coder with distance codes, like

> Edgar Binder described earlier in his message (aaabccca -> 11501100). But it

> was very confusing to see after compressing the distance code -output that

> it increased the size of calgary corpus by 1 bpb compared to the very basic

> MTF scheme (note that I did not use any special modeling, just plain simple

> order-0). I've clearly misunderstod something here. Can anybody give me a

> hint?

>

>

> Regards,
> Today I tried to replace the MTF in my BWT coder with distance codes, like

> Edgar Binder described earlier in his message (aaabccca -> 11501100). But it

> was very confusing to see after compressing the distance code -output that

> it increased the size of calgary corpus by 1 bpb compared to the very basic

> MTF scheme (note that I did not use any special modeling, just plain simple

> order-0). I've clearly misunderstod something here. Can anybody give me a

> hint?

>

>

>

> Sami

Reply all

Reply to author

Forward

0 new messages

Search

Clear search

Close search

Google apps

Main menu