Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Skybuck's Random Compression based on Paul's Random Compression Draft.

16 views
Skip to first unread message

Skybuck Flying

unread,
Jan 10, 2011, 5:50:49 PM1/10/11
to
Hello,

I took Paul's incomplete, inefficient and probably partially flawed idea and
turned it into a real complete, efficient and probably correct idea and
implementation, plus an extension idea as well to mimic paul's member
changing idea, but just switching between groups is already enough to
achieve compression and is probably best:

Here are the algorithm's rules and examples for encoder and decoder:

Legend:
1:, 2:, 3:, etc = numbering of groups
[] = group of bits
-> = becomes

*** Compression/Decompression Algorithm: ***

Encoder rules:

Group1:
Input Output
[00] -> [0]
[11] -> [11]
Additional outputs:
Group switch code: 10
Extension idea/bit for member switch: 0 for no switch, 1 for switch.
So group switch with member switch: 101
So group switch with member noswitch: 100

Group2:
[01] -> [0]
[10] -> [11]
Additional outputs:
Group switch code: 10
Extension idea/bit for member switch: 0 for no switch, 1 for switch.
So group switch with member switch: 101
So group switch with member noswitch: 100

Decoder rules:
[0] -> [00] if group1 active
[0] -> [01] if group2 active
[11] -> [11] if group1 active
[11] -> [10] if group2 active
Members could be switched of either group based on additional inputs bits:
Group switch code: 10
Extension idea/bit for member switch: 0 for no switch, 1 for switch.
So group switch with member switch: 101
So group switch with member noswitch: 100

*** Compression/Decompression example (without extension) ***:

Encoder:

Random input:

1 2 3 4 5 6 7 8 9 10 11 12
11 10 10 01 01 10 11 01 01 01 01 01

ActiveGroup->Group1

1: [11]->[11] ActiveGroup->Group1
2: [10]->[10]+[11] ActiveGroup->Group2
3: [10]->[11] ActiveGroup->Group2
4: [01]->[0] ActiveGroup->Group2
5: [01]->[0] ActiveGroup->Group2
6: [10]->[11] ActiveGroup->Group2
7: [11]->[10]+[11] ActiveGroup->Group1
8: [01]->[10]+[0] ActiveGroup->Group2
9: [01]->[0] ActiveGroup->Group2
10: [01]->[0] ActiveGroup->Group2
11: [01]->[0] ActiveGroup->Group2
12: [01]->[0] ActiveGroup->Group2

Final encoder result: 7x1 + 8x2 = 23 bits
1 bit saved.

Decoder:

Input:
1 2 3 4 5 6 7 8 9 10 11 12
11 10 11 11 0 0 11 10 11 10 0 0 0 0 0

ActiveGroup->Group1

1: [11]->[11] ActiveGroup->Group1
2: [10 11]->[10] ActiveGroup->Group2
3: [11]->[10] ActiveGroup->Group2
4: [0]->[01] ActiveGroup->Group2
5: [0]->[01] ActiveGroup->Group2
6: [11]->[10] ActiveGroup->Group2
7: [10 11]->[11] ActiveGroup->Group1
8: [10 0]->[01] ActiveGroup->Group2
9: [0]->[01] ActiveGroup->Group2
10: [0]->[01] ActiveGroup->Group2
11: [0]->[01] ActiveGroup->Group2
12: [0]->[01] ActiveGroup->Group2

You welcome to try and blow any holes in it... I think it's hole-proof ! ;)
:) =D

Bye,
Skybuck =D


Skybuck Flying

unread,
Jan 10, 2011, 7:04:54 PM1/10/11
to
Further notes I like to point out:

The basic compression method will only achieve compression if the input
contains lot's of [00]'s or [01]'s since those will be compressed towards
[0].

However if the input contains a lot fo [11]'s or [10]'s then there are a
number of possibilities to try and compress such inputs as well.

I can think of two possible approaches:

1. Pre-scan the input to see which patterns occur the most and assign the
most occuring pattern to [0]. There are 4 possible patterns, the two most
occuring patterns are therefore assigned to [0] and the other two to [11].

2. The extension idea. The extension idea could be usefull and interesting
in two scenerio's:

2.1 Scenerio 1:

The input contains rougly the same distribution/occurences/frequencies for
all 4 patterns when statically counting. Therefore it is impossible to
assign them efficiently just based on statically counting.

Thus instead of statically counting a dynamic approach is taken, that's what
the extension idea is about. The extension idea simply switches the
assignments on the fly, but an additional bit which is appended to the
switch code.

This idea can be further enhanced in the following ways:

2.1.1 The basic idea of the extension idea, simplistic approach/predicting.

The first idea is very simple, simply assume that the last occuring member
occurs again and again and again. Thus simply switch member to last
occurence.

This method would also be simply to use in case the next method 2.1.2 is not
possible. (When scanning ahead is not possible)

2.1.2 Advanced/pre-emptively scanning/scan ahead.

Perhaps a more interesting idea is to pre-emptively scan/scan ahead.

Even if the distribution/occurences is 50% vs 50% in total... parts of it
might actually be distributed differently.

A simplistic scan could be performed to see if it's viable and efficient to
switch the members.

The additional bit for the switch code must probably always be included
otherwise the decoder could get confused because it wouldn't know if it was
switching yes or no... except if it was maybe ruled by a rule... like for
example after a certain ammount of counting an additional switch bit might
be present... but this might be too advanced... and "might" sounds
doubtfull... it would have to be there or not according to a rule... so
after a certain ammount of counts there should be a bit if this approach
taken... but for now this approach will be ignored to try and save 1 little
bit... instead the extension bit will always be present, but I thought i'd
just like to point out this additional saving possibility... anyway.. don't
let me confuse you with this part and move on:

A simplistic scan could simply count the number of bits saved by "switching"
or "not switching" members... thus it can help the decoder by providing
efficient encodings. The encoder simply counts what it would have gained if
it had switched or not switched. So both decisions are determined.

Ofcourse it could have switched multiple times but I think that is
irrelevant because always switching would be bad... it only needs to switch
if it comes to the conclusion that a switch after a certain ammount of
scanning ahead would have been efficient. None the less, such a
determination might not be too obvious... perhaps it is a combinatorial
problem... because I can imagine 3 bits plus output bits of 2 or 1 is 4 bits
or 5 bits of output... versus 1 bit saved... so overhead for a switch is 2
or 3 bits... but still for each switch made there are either 2 or 3 overhead
bits maybe even 4... so switching is expensive... none the less, if it's not
clear how to do an efficient scan ahead scanner then simply try and brute
force it... the number of paths/possibilities becomes large for only a few
groups scanned ahead... but some groups might be so inefficient that they
can be savely dismissed.

For example: a switch occured between a certain number of groups until the
next switch... for example the input changed... when the input changes...
there is only a certain number of switch possibilities which would have made
sense...

So to me it seems the number of possibilities to search is simply dependent
on the input changes... as long as the input does not change a different
decision does not have to be made... in others a new decision does not have
to be made... this also applies to when the input is outputted efficiently.

Only when the input changes from efficient to inefficient encoding (lost a
bit of text here because outlook express can't re-do... shitty oh welll, it
wasn't too much I will try to reproduce it it probably not that hard:) can
the assignements be questioned: is it still efficient or inefficient. When
the input changes from group1 to group2 or vice versa there is always
overhead, the group switch must occured. The decision is therefore only
based on the extension bit which I decided for now must be included as
well... the decision/choice which may be made is:
switch members in group or not.

The question is now does it matter if we switch members or not: The
extension bit is included anyway so that is not what it's about... what is
is about is: the next switch will only occur if we switch groups, but not if
we "switch members"... we never switch members without a group switch that
would be algorithm wrong and probably inefficient. We cannot do a member
switch only since 01 is not available this would conflict with output 0.

Therefore the question upon a group switch: "Do we switch members as well"
is an important one because we will be stuck with the members until the next
group switch.

However while a certain group is active... the input might change from the
efficient member to the inefficient member.

The question now becomes:

How many efficient members are there compared to inefficient members until
the next group switch:

Thus the algorithm for the scanner is now clear and easy:

Step 1 for scanner:

Scan input ahead of current position until the next group switch.

While scanning determine the ammount of member1 and the ammount of member2
(occurences).

The choice which member to pick is now clear: the member with the most
ammount of occurences should be the member to pick so that it's encoding
will use the least ammount of bits for maximum compression efficiency.

So in short: pick the most occuring member.

I have now presented to you an efficient look-ahead scanner. (The number of
possibilities theory/brute force can not be thrown into the waste basket
it's not needed anymore.)


2.2 Scenerio 2 ?

I forgot a little bit what scenerio 2 is... but it's probably a scenerio
where the input flip flops a lot between group1 and group2...

Yeah that's good to point out as well...

For now in it's initial form this compression method will only work well if
there is a lot of group1 members followed by group2 members... with long
periods between group switches...

Now some solutions:

2.2.1 Detecting many switches.

The encoder can detect many switches either by scanning ahead or by
dynamically counting and perhaps keeping track of overhead... or perhaps
keep track of a ratio between compressions and switches...

Whatever method is chosen... if the encoder detects many switches than this
is could be an indication that the members of the groups where not properly
chosen.

First I like to point out that I just blew a hole into this compression
method... I just spotted a weakness ! ;) :) "Many potential switches
depending on input".

The question is now can it be resolved... potentially by pre-scanning or
dynamically counting...

But the question is now two fold:

1. Can the algorithm be changed to use different groups without consequences
for the correctness of the algorithm ?

2. Can we perhaps change the algorithm dynamically ? Further question: is
not necessary to output this in the encodings... probably not... since the
decoder can detect the many switches as well for a dynamic solution. And
thus the decoder can come to the same conclusion as the encoder... a
different approach is needed.

Thus question 2 is answered... yes: both encoder and decoder can 1. detect
to many switches and 2. can act in a synchronized way to change algorithm
dynamically.

However what about the scanning solution ? the scanning solution for this
situation, not the situation above... might not be that great anyway...
because it's not dynamic.... I kinda like the dynamic solution... buyt the
problem with the dynamic solution is that it's mosterd after the meal... and
it assumes that the output will continue like the meal... but it might in
fact not change.

Therefore there is a benefit of scanning approach if the scanning approach
can produce dynamic output for the encoder/decoder as well... so that they
can adept quickly. That would be real nice.

However this requires additional outputs in the output stream to change
their operation... to be ready for the meal from the start.

It also then becomes a question if this additional output is worth it... but
before I explore these idea's further question 1 needs to be answered... can
the algorithm be changed at all without problems ?

Exploring question 1:

What is group1 is changed from:


[00]->[0]
[11]->[11]

to:
[00]->[0]
[01]->[11]
or
[00]->[0]
[10]->[11]
and so forth...

This is probably not a problem... because the way the input is grouped is
irrelevant because the problem was with the decode... the problem with the
decode is solved by using [0] and [11] as outputs and [10] as special output
switch indicator.

Therefore it is possible for the algorithm to make up it's own groups from
the start which is a really nice benefit/property of this algorithm !

Which is really great ! This solves the "weakness" of the "many switches".

Now an implementation can simply try out all possibilities to see which one
produces the least ammount of switches which would probably be best for
compression. Switches have to occur anyway when the input changes... so the
best groupings will probably produce the least switches.. the only question
left then to answer is: "to switch members as well or not..." that question
has already been answered above how to detect it ;) :)

The only question now left to explore is: Do we want to change the groupings
dynamically and if so how ?

Well an easy way would simply be to expand on the member concept... instead
of switching the member or instead of switching groups all possible group
pairs could be made...

So let's see how many possibilities there are:

Possibility1: Group1: [00][01] Group2: [10][11]
Possibility2: Group1: [01][00] Group2: [10][11]
Possibility3: Group1: [00][01] Group2: [11][10]
Possibility4: Group1: [01][00] Group2: [11][10]

Possibility5: Group1: [00][11] Group2: [10][01]
Possibility6: Group1: [11][00] Group2: [10][01]
Possibility7: Group1: [00][11] Group2: [01][10]
Possibility8: Group1: [11][00] Group2: [01][10]

Possibility9: Group1: [00][10] Group2: [11][01]
Possibility10: Group1: [10][00] Group2: [11][01]
Possibility11: Group1: [00][10] Group2: [01][11]
Possibility12: Group1: [10][00] Group2: [01][11]

So far I see 12 possibilities.

This means to indicate 0-to-11 requires 2^4 = 16, is 4 bits.

10 was available as a switch code. (See it as a prefix code)

Thus 10 + 4 bits could be used to indicate how to switch the
tables/groupings.

That's 6 bits for a switch plus additional output.

That's quite a lot... but if the encoder thinks it's worth it it could do
that.

Thus what method to use completely depends on the input...

Those using 6 bits for a switch for a dynamic algorithm is a bit risky and
probably not too be recommended...

But for a situation where a scan ahead algorithm is possible, especially a
full scan... all possible algorithm variations can be tried out to see which
one is best and which one to finally use...

Perhaps there is a more efficient way to switch...

The groups can be thought of as a 2d dimension:

----X---->
| Group1
| [A][B]
Y Group2
| [C][D]
\|/

Suppose the algorithm detects that group1 and group2 is not switching
efficiently... this means a switch has to occur from group1 to group2...
it's not that relevant how the switch occurs as long as one of the members
is moved... this can be thought of as a "cross switch".

members of the "groups are crossed".

For example:
D has to swap with A

There are only 4 places available.

However to be able to move from each place to each other place would be
ideal.

To describe 4 positions requires 2 bits.
So again 4 bits would be needed to indicate the cross-switch.

However we know that the switch occurs vertically so there are only two
possibilities for each group:

is A or B switched ?
is C or D switched ?

Thus we only need two bits to indicate the entire cross switch:
1 bit to select the member from group1 to switch.
1 bit to select the member from group2 to switch.

Therefore the total number of bits for a cross-switch is only 4 bits instead
of 6 bits.

A good improvement ! ;) :)

So for just 1 bit extra compared to the previous 3 bits for the original
extension idea...

This super extension idea requires 4 bits for a tremendously powerfull
extension which solves any potential weakness and offers great flexibility !
;) :)

However I can still imagine inputs where this great switching power is
absolutely not necessary and where only a group switch would suffice... so
again it's still up to the implemention to decide if this super switching
power is necessary based on the input that it scanned...

Pretty cool stuff ! ;) :)

So this pretty much answers questions 1 and 2.

Which leaves a little bit the following question: is this cross-switch
output necessary... not really as I already indicated.. it depends a bit on
the chosen form of the algorithm... if a scanner is chosen than yes.... if a
dynamic method is chosen than no... the encoder/decoder could keep track of
the number of switches/a ratio and if it becomes bad chose a different
grouping based on results so far... so many different possibilities for
algorithm implementations.
All should probably be tried for any given input to see which one is best.

A little field at the start can then indicate which algorithm was chosen by
the encoder for maximum efficiency.

I am pretty excited and intrigiued by this compression method because it
does compress hard to compress data ! ;) :)

And many algortihm possibilities to code and explore.

I think I have already done a great job of exploring how far this new
compression method can be taken... pretty much all questions I had and
idea's I had have been explored in this posting... perhaps new idea's will
pop into my head into the future but I don't think so... I already have
enough stuff to explore and it's probably good enough the way it is...

The only thing I can imagine is perhaps trying to do a 3 bit or 4 bit or x
bit version for the groupings... but if such a thing is possible and
efficient I don't know... my guess would be: probably not ;) Maybe this 2
bit case is the exception where some nice 50% efficiency can be achieved
because of reducing 1 bit ;) :)

Bye,
Skybuck.


Skybuck Flying

unread,
Jan 10, 2011, 7:32:33 PM1/10/11
to
However I seem I made a little reasoning mistake about the 2 bit efficient
cross-switching.

[A] [B]

[C] [D]

The original idea was to be able to switch D to either A or B and immediatly
placing it at the most efficient position.

With the 2 bit efficient cross switching this functionality is lost and D
will simply end up on the place where either A or B was selected.

Furthermore there is an additional problem with the 2 bit efficient cross
switch.

If A and B have to be switched then this is no longer possible,which would
be a major hurdle/weakness/obstacle and would probably hurt compression.

The whole idea was to be able to select out of each possibility... there
were 12 possibilities... with 2 bits there are no only a few possibilities
left...

The funny thing is suppose D has to end up on C then the following could be
done:

(0, 1) A is switched with D
(1, 0) B is switched with C
end result is:
D C
B A

However what if B and A should be switched as well and D and C not ?

I think this can be done as well:

A C

B D

B C

A D

D C

A B

Which would require 3 switches.

So in theory it is possible to re-order the groups with the 2 bit way... but
it would require 3 switches... which would be 3x4 bits = 12 bits... quite a
lot...

while the original idea is just 6 bits... 2 + 4 bits.

So perhaps the original idea is better... but at least this again offers
even more ways to explore what to use...

Suppose that 2 bits is enough just to do a switch and perhaps sometimes some
additional switches then that's a possibility as well...

What is the best way in practice remains to be seen...

Perhaps it's even possible to do it in another way... perhaps a intermediate
way...

Not 4 bits, not 2 but 3 that would probably solve it and still be a bit more
efficient than 4 which seemed to be a bit inefficient. (since 4
possibilities of 4 bits where unused).

3 bit method:

The first bit indicates if it's a cross switch or not.

So:
Bit1: 0 cross switch, 1 not cross switch.
Bit2: 0 member switch group1, 1 not member switch group1
Bit3: 0 member switch group2, 1 not member switch group2

A cross switch simply means changing one of the two... does this matter ?

A B

C D

Cross switch possibilities are:

A C
B D

or

A D

C B

It does seem to matter...

A C might member-switch more efficiently than A D ?!? This might seem a bit
weird... but could be...

Since the input pattern could be anything...

Thus the 3 bit method above at least in it's current form is probably not
good enough.

Perhaps a rotation is good:

A B
C D

C A
D B

D C
B A

B D
A C

This would require 2 bits.

And perhaps a mirror

B A
D C
^ already same as above... so rotate and mirror is bullshit as well.

Ok one more time what's necessary:

Select either 0 or 1 for group1 for cross or no cross
Select either 0 or 1 for group2 for cross or no cross.

Perhaps try as follows:

First determine if a cross is necessary:

If no then it's easy...

Determine next if anything needs to change so:

Case1: no cross:

Bit 0: 0 = no cross.
then:
Bit 1: swap group1 members yes/no ?
Bit 2: swap group2 members yes/no ?

Case2: yes cross:
Reinterpretation of bits possible:
Bit 1: A or B ?
Bit 2: C or D ?

^ Problem cannot specify where to move to...

Suppose B and C were selected...

Would it be B C or C B this does matter....

No way to indicate that...

However we already know that either A or B is moved or C or D so those
questions are not relevent we could simply
select one...

But then we get following problem:

A D
or
A C

?

I already determined that that could be a bit interesting as well... a bit
advanced/complex maybe but that's the way it is...

Unless that advanced stuff is not necessary then a choice could be made...

Perhaps the choice is obvious and always the same...

if A and B were not switching nicely... and C and D were not switching
nicely... then perhaps the encoder can detect which one is most
efficient... but this would be based on scan ahead... and the decoder cannot
do that... so it has to be in the output since it cannot be sync-ed.

Thus eiter leave the requirement of doing it all at once or use the 4 bit
method... or use the 3 bit method but then double it with a 0 = no cross
to correct it... but then I would require use a 4 bit method and save some
bits... 2x5 versus 1x6 = 4 bits saved.

So finally conclusion:

The 4 bits method: (2 switch bits + 4 table bits=6 bits) is most powerfull
and a more efficient switching mechanism with the same switching power is
probably
not possible without compromises.

Bye,
Skybuck.

Skybuck Flying

unread,
Jan 12, 2011, 7:08:48 AM1/12/11
to
Encoder rules:

Group1:
Input Output
[00] -> [0]
[11] -> [11]
Additional outputs:
Group switch code: 10
Extension idea/bit for member switch: 0 for no switch, 1 for switch.
So group switch with member switch: 101
So group switch with member noswitch: 100

Group2:
[01] -> [0]
[10] -> [11]
Additional outputs:
Group switch code: 10
Extension idea/bit for member switch: 0 for no switch, 1 for switch.
So group switch with member switch: 101
So group switch with member noswitch: 100

Random input:

1 2 3 4 5 6 7 8 9 10 11 12
11 10 10 01 01 10 11 01 01 01 01 01

Round1:

ActiveGroup->Group1

1: [11]->[11] ActiveGroup->Group1
2: [10]->[10]+[11] ActiveGroup->Group2
3: [10]->[11] ActiveGroup->Group2
4: [01]->[0] ActiveGroup->Group2
5: [01]->[0] ActiveGroup->Group2
6: [10]->[11] ActiveGroup->Group2
7: [11]->[10]+[11] ActiveGroup->Group1
8: [01]->[10]+[0] ActiveGroup->Group2
9: [01]->[0] ActiveGroup->Group2
10: [01]->[0] ActiveGroup->Group2
11: [01]->[0] ActiveGroup->Group2
12: [01]->[0] ActiveGroup->Group2

Since one bit was saved it's kinda/slightly interesting to do another round
to see if it leads to a further reduction.

Round2:

Input from round1 (this time I will leave the remaining bit uncompressed)

1 2 3 4 5 6 7 8 9 10 11 skipped
11 10 11 11 00 11 10 11 10 00 00 0

ActiveGroup->Group1

1: [11]->[11] ActiveGroup->Group1

2: [10]->[10][11] ActiveGroup->Group2
3: [11]->[10][11] ActiveGroup->Group1
4: [11]->[11] ActiveGroup->Group1
5: [00]->[0] ActiveGroup->Group1
6: [11]->[11] ActiveGroup->Group1
7: [10]->[10][11] ActiveGroup->Group2
8: [11]->[10][11] ActiveGroup->Group1
9: [10]->[10][11] ActiveGroup->Group2
10: [00]->[10][0] ActiveGroup->Group1
11: [00]->[0]
skipped:[0]->[0]

Results: 4x1 + 14x2 = 32 bits

A big increase...

This seems logical in a way... since the original went from big to small,
and now it would go back from small to big... unless the dynamic/adeptive
algorithm extension would be used which would be kinda interesting to work
out in an example as well.

But this would require for example scanning ahead... I am to lazy to do that
myself and will let a computer program do that to see how it works out in
practice.. ;) :)

Bye,
Skybuck.


Skybuck Flying

unread,
Jan 12, 2011, 10:17:08 AM1/12/11
to
Here is yet another idea after carefull study of the more complex idea which
probably takes too many bits...

So this is a new idea which Paul probably had as well:

The basic form of the algorithm is an "active group" which has two members
of each two bits.

Each two bits is unique within the active group. For this two bit case this
automatically leaves two other groups inactive.

The general idea was to used 10 as a special code to indicate a switch is
needed.

However the next symbol in the input will automatically be different from
the current active group/members.

Therefore the input can immediatly be used to describe the new symbol.

This means the new symbol will not be compressed... but it will simply be
used to stuff it into a table.

Added benefit is that it is already described in the output and doesn't need
to be described again.

The only question remaining is:

Where should the symbol be placed among the active members ?

There are two possible algorithm designs:

1. It could be placed on the least frequency, so the second member assuming
the first one is the best one and will remain active for some time to
come...
then later as frequencies starts to shift the decoder could automaticall do
that as well based on data seen so far... this would be nice for a dynamic
data which cannot look ahead.

2. A special bit could be outputted to simply indicate to which member it
should move either first or second member. This is quite nice for a lookup
scanner if such a thing exists/is smart... which isn't clear cut... but it's
actually quite interesting in a way cause I just had another idea... but
first this idea:

The new encoding thus becomes:

escape code: member specifier: new symbol
10 0 or 1 (from input)

(other member is auotmatically moved to other position)

Now I just had an extra idea: what if current symbols need to be swapped ?!

This can also be done most efficiently:

The same encoding will be used... by the decoder will detect that it's
infact not a new symbol... it's the same symbol in the active group...

Thus this can be used to indicate that a switch should occur among the
active members... further more for this special case the output can
immediatly
be compressed saving another bit ! ;) :)

This can also be applied to 1... for dynamic data. (data on the flow/being
produced as we go ;))

The member specifier bit for case 2 is a bit redundant... but nothing can be
done about... though actually I just had a better idea...

Since 10 is an escape code anyway... the next two bits could indicate the
output... if the output is the same as an active member then a switch is
assumed
among members...
otherwise if it's an new symbol an additional bit could follow which
specifies in what position to place the new member... the other member is
then switch if necessary...

This is quite cool ! ;)

New encoding:

escape code: (potentially new) symbol) (place specifier)
10 (from input)

However with this new encoding we loose the possibility of immediatly
encoding the duplicate symbol... but that's ok since we already save one bit
from leaving away the specifier ! ;) :)

Quite efficient to me it seems ! ;) :)

Bye,
Skybuck.


Skybuck Flying

unread,
Jan 12, 2011, 11:39:53 AM1/12/11
to
Now it's time for an example to see this great new design in action plus
updated rules for encoder/decoder to clearify it a bit and try to poar it
into a shortened form/description:

Here are the algorithm's rules and examples for encoder and decoder:

I will also update to reflect new insights into the algorithm

Legend:
1:, 2:, 3:, etc = numbering

AA = Primary Member (leads to compression)
BB = Secondary member (leads to break-even)
CC = Inactive Member
DD = Inactive Member
[] = group of bits/input/output
-> = becomes

*** Compression/Decompression Algorithm: ***

Encoder rules:

ActiveGroup:
[AA]
[BB]

InActiveGroup:
[CC]
[DD]

Initialize:
Input Output
[AA] -> [AA]
[BB] -> [BB]

Loop:
Input Output
[AA] -> [0]
[BB] -> [11]

Output
Escape code -> [10] followed by:
Member -> [XX] // where XX can be AA or BB or CC or DD
if (XX=AA) or (XX=BB) then
begin
Swap(AA, BB);
end else
begin
Specifier -> [Y] // where Y can be 0 (XX becomes PrimaryMember) or 1 (XX
becomes Secondary Member)
Member[Not Y] = Member[Y];
Member[Y] = XX;
end;

Decoder rules:

Active Group:
[AA]
[BB]

InActiveGroup:
[CC]
[DD]

Initialize:
Input Output
[AA] -> [AA]
[BB] -> [BB]

Loop:
Input Output
[0] -> [AA]
[11] -> [BB]
[10] -> Escape code followed by:
[XX] -> if (XX=AA) or (XX=BB) then
begin
Swap(AA, BB);
end else
begin
[Y] -> // Specifier
Member[Not Y] = Member[Y];
Member[Y] = XX;
end;

*** Compression/Decompression example ***:

Encoder:

Random input:

1 2 3 4 5 6 7 8 9 10 11 12
11 10 10 01 01 10 11 01 01 01 01 01

Initialize:
0: [10] -> [10] AA->10
[11] -> [11] BB->11

Loop:
1: [11]->[11] AA=10 BB=11
2: [10]->[0] AA=10 BB=11
3: [10]->[0] AA=10 BB=11
4: [01]->[10] (Escape code) [01] (XX) [0] (Y) AA->01 BB->10
5: [01]->[0]
6: [10]->[11]
7: [11]->[10] (Escape code) [11] (XX) [1] (Y) AA->01 BB->11
8: [01]->[0]
9: [01]->[0]
10: [01]->[0]
11: [01]->[0]
12: [01]->[0]

Finar encoder result: 10x1 + 8x2 = 26 bits

+2 bits compared to input. But this is mostly because of initialization,
ignore initialization which is a one time fee it would have saved 2 bits.

There is a little problem with this example, it's not code-covering, which
means one branch is not tested, the swap branch.

Therefore I will add one output pair to the example here below to illustrate
how it would look like:

Let's suppose that AA and BB needs to be swapped
Input would be either 01 or 11 (interesting discovery it can be both ;) !):

13: [01=XX] or [11=XX] -> [10] [XX] AA->BB BB->AA

Hmm more interesting discoveries ! ;) :)

We could automatically switch the table assuming that the previous encoding
would happen again... if it was an active member...

That's kinda interesting... but would we like that ? ;)

At least this would scrap the need to use escape codes just to switch the
same tables/active members... however if input is like this:
01 11 01 11 01 11 01 11 01 then we would never get a saving because it would
flip flop...
Unless actually the input was in sync with the table then it would be
excellent !
0 0 0 0 0 0 0 0 0

Initially it would be: AA = 01, BB = 11
it would then automatically switch...

That's quite impressive, seeing the input data fluctuate like that ;)

But this would then give bad compression for
11, 11, 11, 11, which would then start flipping... but it wouldn't be all
bad:
0, 11, 0, 11, 0, 11

This would stil leads to 50% compression I think ?! ;) :)

Pretty nice... this example is a bit weird though because RLE would be
better...

But this would work as well:
01, 01, 01, 01
0, 11, 0, 11

However for run lengths there is already RLE...

It's much more likely that 01, 11, 01, 11, 01 would occur which cannot be
compressed easily by RLE...

So perhaps a switch algorithm like this would be nice.

Hmm... I will have to think about this...

Hmmm maybe it's time to update the input example a little bit too some more
bits and so...

Hmmm so far many different algoritm idea's... I had a hunch that such
algorithms might be used to compress hard to compress data...

So it could be interesting to simply program all idea's, then try out all
idea's note the one which work best for a frame and use that...

Thus each frame might require a different algorithm to compress it best.
Which seems logical...

So now a lot of different random compression algorithms to try out ;)

For now I will stick to the original idea of outputting XX for when needing
to swap... so no automatic swapping ;)

Bye,
Skybuck.


Skybuck Flying

unread,
Jan 12, 2011, 11:44:17 AM1/12/11
to
Hmm programming the more complex idea might be a little complex as well and
therefore having a hand written example might be nice...

It would also be nice to see how the more complex idea handles short data
sequences like the ones in the examples so far.

So now below I will attempt the more complex idea.

Plus perhaps some further algorithm notations for the look-ahead-scanners.

First the basic rules again:

Encoder rules:

Group1:
Input Output
[00] -> [0]
[11] -> [11]
Additional outputs:
Group switch code: 10

Group2:


[01] -> [0]
[10] -> [11]
Additional outputs:
Group switch code: 10

Decoder rules:


[0] -> [00] if group1 active
[0] -> [01] if group2 active
[11] -> [11] if group1 active
[11] -> [10] if group2 active
Members could be switched of either group based on additional inputs bits:
Group switch code: 10

Now I have to make a selection/choice out of the available extension
idea's...
I will go for the 4 bit extension idea for maximum powerfullness of the
tables.
Perhaps I will even learn from these visual examples... People have a habit
of learning
from visual examples yes ! ;) :) That's what my math teacher said ;)
I have already had some insight from below... so perhaps I will have to
post-pone examples
and go on with potential theories for improvements ! ;) :)

So additional rules:

Encoder rules:

Group switch extension bits:

First the table needs to be redesigned to only show the ActiveGroup
encodings,
The inactive group encoding is not important and could be either possibility
out of two.
This is illustrated below in InActiveGroupMembers:
The encodings which are available for future used are also shown.
4 Encodings are available for future use.

Number: Encoding: ActiveGroup: InActiveGroupMembers
01: 0000 [00] [01] [10] [11] or [11] [10]
02: 0001 [00] [10] [01] [11] or [11] [01]
03: 0010 [00] [11] [01] [10] or [10] [01]

04: 0011 [01] [00] [10] [11] or [11] [10]
05: 0100 [01] [10] [00] [11] or [11] [00]
06: 0101 [01] [11] [00] [10] or [10] [00]

07: 0110 [10] [00] [01] [11] or [11] [01]
08: 0111 [10] [01] [00] [11] or [11] [00]
09: 1000 [10] [11] [00] [01] or [01] [00]

10: 1001 [11] [00] [01] [10] or [10] [01]
11: 1010 [11] [01] [00] [10] or [10] [00]
12: 1011 [11] [10] [00] [01] pr [01] [00]

13: 1100 reserved
14: 1101 reserved
15: 1110 reserved
16: 1111 reserved

Conclusion:

With a 4 bit code the active group can switch to any possible other possible
group. Thus it doesn't necessary have to be
an inactive group, it can simply redefine itself.
Therefore the concept of "inactive group" is not really relevant anymore...
however it remains relevant to detect when
input cannot be encoded by the active group and thus a switch is needed...
but this switch can now be too anything.

Now that this should be understood it's possible to continue with a scan
ahead scanner to see what is best.

The algorithm/encoder could start with an initial scan to see what the
initial state should be and indicate a "switch".

The word/term "switch" might not cover it anymore and perhaps should be
considered something else like:
"A reconfiguration".

However a reconfigure is only needed if input doesn't fall within the
current group. In other words when an input member
is not one of the active group members.

The concept of a group is now also somewhat vague.

This could be more generalized towards ActiveMembers of an input to output
encoding... and inactive membvers of an input to output encoding.
Mostly (inactive input members).


The general concept of the algorithm seems to be to output encodings and to
reserve at least one output encoding which is unique and cannot be confused
with other encodings.... somewhat like a prefix...

This prefix can then be used to reconfigure the algorithm.

This reminds me of dynamic huffman encoding where one leave is also
considered as prefix. Or an escape code as you will.

It could even be called a "reconfiguration prefix code". Because that's what
huffman algorithm might do... however it might also stay the same... so it's
not truely a reconfiguration... it's a possible attempt at
reconfiguration... but not really it's just a new token indicator... the new
token however will be added
to the tree... the tree might change or not... with change in this regards
is towards other tokens.

For this algorithm however there is no doubt... a reconfiguration code in
this algorithm is truely a reconfigure... otherwise it would be left out.

To turn this algorithm into a general form where perhaps multiple bits of
input can be encoded to output... there would need to be an algorithm to be
able
to figure out what output prefix can be used for the reconfiguration... I am
pretty sure that such prefix finding algorithms exist...
Even huffman itself could be used... not so much the frequency concept and
tree building concept...

But the tree concept itself of describing a termination tree and perhaps a
leave tree... where leaves represent the real values.

With dynamic huffman the tree is rebalanced as frequencies change... with
this basic 2 bit algorithm there is also some frequencies going on... but a
scanner
can be done pre-emptively... so dynamic huffman in this regards is a bit
strange... though a dynamic huffman tree could also be reconfigured to be
optimal... the problem with dynamic huffman tree would be to re-organize the
tree so that it is again optimal, cheaply... how to describe a
reorganization of the huffman tree... based on look-ahead data...

The decoder does not yet know the look ahead data... so it's clear that the
decoder itself cannot do so based on seen frequencies.

Therefore the encoder has to somehow signal the decoder that the tree is to
be changed in a sort fashion. What node should go where ? ;)

For a simply 8 bit tree... there would be 256 nodes which could end up on
other 256 positions... but there are also intermediate nodes which might
also be need to reconfigured... so the ammount of possible 8 bit trees is
probably large than just 2^8 a tree could be completely left side... like
256 entries left... or 256 entries right or any of these combinations... I
guess the easiest way to reconfigure a tree would probably be to describe
the new frequencies for the lookup data... or simply a termination tree
followed by values in a sort tree walking order... yes that's how I did
it... but that would be expensive to do that everytime... while this
algorithm is a bit cheaper... it only describes the active member.... which
also reminds me a bit of runtime length encoding...

So this algorithm also seems a bit like runtime length encoding with one
difference... instead of encoding the count which would be pretty cheaply...
this algorith instead encodes a zero or whatever the other member encoding
is.... which could also be a 01 or 001 as already described still saving
some bits...
it then reconfigures it self somewhat... it seems to have mostly a potential
adventage for short random codes in the input... which quickly change in
input...
it shouldn't change to fast though... otherwise many switches would occur.
If it just switches between group members than this algorithm would probably
still be better then anything else I would guess.

Let's see how a 3 bit version might look like to see if it has scaling
problems:


Active Deactive
[000] [001] [010] [011] [100] [101] [110][111]
[000] [001] [010] [100]
[000] [001] [010] [101]
[000] [001] [010] [110]
[000] [001] [010] [111]

[000] [001] [011] [010]
[000] [001] [011] [100]
[000] [001] [011] [101]
[000] [001] [011] [110]
[000] [001] [011] [111]

[000] [001] [100] [010]
[000] [001] [100] [011]
[000] [001] [100] [101]
[000] [001] [100] [110]
[000] [001] [100] [111]

[000] [001] [101] [010]
[000] [001] [101] [011]
[000] [001] [101] [100]
[000] [001] [101] [110]
[000] [001] [101] [111]

[000] [001] [110] [010]
[000] [001] [110] [011]
[000] [001] [110] [100]
[000] [001] [110] [101]
[000] [001] [110] [111]

[000] [001] [111] [010]
[000] [001] [111] [011]
[000] [001] [111] [100]
[000] [001] [111] [101]
[000] [001] [111] [110]

Well it becomes clear that this algorithm has scaling issue's for the
possibility encodings... it would become something close to 12 bits or so...
maybe 11
maybe 10 but that's still double the ammount of bits required... maybe 8 or
so instead of 4... maybe even more...

So as the number of bits goes up the encodings become big... it's like a
combinatorial something.... might sitll be worht it or maybe not... but my
guess would probably be not ;)

So enough of the "inspectational theory" and on to the practice...

First I am going to scan the input to see what most occurs (just a random
strategy I am going to apply to see if it will produce something good ;)
:)):

No wait I am simply going to count all occurences first like huffman ;) :)

I know this is wrong for this dynamic algorithm but it's still kinda
interesting:

But I am going to do it a bit differently... I am going to count up to a
hit.../switch necessarity.

First two encountered unique symbols are taken, any other symbol causes a
switch neccessarity ;) :)

initial group will become ([10] [11])

1 2 3
11 10 10 switch needed for next one (active group will become: [01][10]

4 5 6
01 01 10 switch needed for next one (active group will become: [01][11]

7 8 9 10 11 12

11 01 01 01 01 01

So for the current/original input two switches are needed... perhaps three
to indicate what the initial input should be... however... for the algorithm
to initialize a switch code is not necessary... so we can just output the
two characters which are to be used for initializing. The character which
occurs most initialy is to be outputted first... followed by the next one.

So initial output is:

10 11

Thus the initial active group is: [10] [11]

Now the encoding can start:

1: [11]->[11]
2: [10]->[0]
3: [10]->[0]
4: [01]->[10] (switch prefix) [0100] (possibility/number 5) [0] (output)
5: [01]->[0]
6: [10]->[10]
7: [11]->[10] (switch prefix) [0101] (possibility/number 6) [11] (output)


8: [01]->[0]
9: [01]->[0]
10: [01]->[0]
11: [01]->[0]
12: [01]->[0]

Final result: 9x1 + 5x2 + 2x4 = 27 bits. forgot the first 4 so this is
actually 31 bits... quite bad but needed to switch anyway so don't matter
much.

which is worse then the input was only 24 bits.

However let's see what happens if another round is applied to this because I
am kinda curious about that:


Also note that in the switch one and two not counting the initial... there
is a redundancy... only 10 to 11 needs to be switch.

So this could have been indicated with: 0 switch member1 or 1 switch
member2.

Once it's know that member1 or member2 needs to be switch there are only 3
possibilities where to switch to...
So in total there would be 6 possibilities for a member switch... 6 means 3
bits needed... so it doesn't matter we still need 1 bit plus 2.

So then only 3 bits needed instead of 4 and the switch can be more
efficient... but we would pay a hefty fine if another member switch is
necessary
then an additional 2 bits plus another 3 bits... so that's 5 bit fine if we
have to do another member switch... so the ultimately best strategy depends
on output
however in this case the more efficient 3 bit version would have been better
so this can be tried....

So for now I am going to deploy the 3 bit strategy which would look as
follows:

3 bit strategy/switching table:

3 bit strategy/switching table:

Number: Encoding: Member to switch ActiveMember
01: 0 yy [0] [00][xx]
02: 0 yy [0] [01][xx]
03: 0 yy [0] [10][xx]
04: 0 yy [0] [11][xx]

05: 1 yy [1] [xx][00]
06: 1 yy [1] [xx][01]
07: 1 yy [1] [xx][10]
08: 1 yy [1] [xx][11]

Switch table for active member:

Encoding (yy):

00: [00] switch to [01]
01: [00] switch to [10]
10: [00] switch to [11]
11: reserved

00: [01] switch to [00]
01: [01] switch to [10]
10: [01] switch to [11]
11: reserved

00: [10] switch to [00]
01: [10] switch to [01]
10: [10] switch to [11]
11: reserved

00: [11] switch to [00]
01: [11] switch to [01]
10: [11] switch to [10]
11: reserved


The reserved bit encoding could be used to indicate a member swap with an
inactive member or so... or to move it out of the active group...
question is which one is then moved in ;) perhaps the most occuring one of
the inactives could be moved in... it's already moved out though...
so the reserved switch could be used to indicate that it is to be moved into
the other position/member... however it's then unclear to what should move
to it's position... but that's already indicated by the codes above but no
selection has been made hmm...

So perhaps this reserved encoding could be used to switch [00] to the other
member, and then use an additional two bits to move in the new member
according
to the same table..

so the encoding for a full switch then becomes:

10 for the prefix, 2 bits for a switch, plus another 2 bits for a move
in...

So this is quite cool... thus we now have a efficient way to convert a 3 bit
code to a 5 bit code... so only 2 extra bits used instead of a full 5 bits
used... as
originally planned... so we save 3 bits... quite cool....

Let's see if it can be done:


Switch table for active member:

Encoding (yy):

00: [00] switch to [01]
01: [00] switch to [10]
10: [00] switch to [11]
11: [00] is switched to the other member, additional 2 bits follows to
indicate what current member becomes.

00: [01] switch to [00]
01: [01] switch to [10]
10: [01] switch to [11]
11: [01] is switched to the other member, additional 2 bits follows to
indicate what current member becomes.

00: [10] switch to [00]
01: [10] switch to [01]
10: [10] switch to [11]
11: [10] is switched to the other member, additional 2 bits follows to
indicate what current member becomes.

00: [11] switch to [00]
01: [11] switch to [01]
10: [11] switch to [10]
11: [11] is switched to the other member, additional 2 bits follows to
indicate what current member becomes.

Ok cool...

So now we have a way to move a current member to another member and
important a new member cheaply...

This can be considered a "half new configuration".

A full new configuration would still require two switches.

So now to employ this new strategy to the original input:


Random input:

1 2 3 4 5 6 7 8 9 10 11 12
11 10 10 01 01 10 11 01 01 01 01 01

Switching is now expensive... from scanning it is apperent that 01 should
have lowest code like huffman... so unfortunately huffman's frequencies
scanning idea
seems to be better but that's ok... for now we simply use this... perhaps
later develop better code like when it's fruitfull to make a switch...

initiale to [01][11] seems to be best...

then change 11 to [10]

*** This is were it became apperently the table idea with above is unsuited
for the example and therefore bad ! ;) ***
*** Study was aborted in favor of better/more efficient idea ! So other
postings ! ;) :)) ***

Output below is wrong:

this would lead to:

1 2 3 4 5 6 7 8 9 10 11 12

11 10 10 0 0 10 11 0 0 0 0 0

7x1 + 5x2 = 17 bits.

one change needed so: 2 bits + 3 = 5 bits... total 22 bits... still better.
but needed 4 bits for init so 26 bits... 2 bits worse...


Perhaps it's better to switch members based on horizsontally... like better
position... at least that would be better for this input...

like switch 10 and 0 would need to have lowest encoding for any savings....

So perhaps use 100 to indicate member switch... but unfortunately can't...

Another idea could be to use a side channel which is stored seperately...
this way the prefix code is not necessary.
it is automatically detected ?!? No it is needed... otherwise can't detect
it hehe...

0
11
10

Maybe a better algorithm idea is the following by following huffman a bit:

0 for best frequency
10 for next frequency
110 for next frequency
111 for last frequency...

Let's see what would happen to this input:

Random input:

1 2 3 4 5 6 7 8 9 10 11 12
11 10 10 01 01 10 11 01 01 01 01 01

Let's apply original algorithm but with a twist...


We simply scan and try to see which table layout would be best and then
encode that.

[01] [10] [11] [00] would clearly be best.

We would need to output these 8 bits first so that's a big overhead.

So then we use 10 to simply switch between groups.

Output:

1: [11]->10 (switch) [0]
2: [10]->10 (switch) [0]
3: [10]->11
4: [01]->0
5: [01]->0
6: [10]->11
7: [11]->10 (switch)[0]
8: [01]->10 (switch)[0]
9: [01]->0
10:->0
11:->0
12:->0

Carefull study which was aborted for the better idea which this study
inspired.

Gonna post it anyway... might still be usefull for those tables ;)

Bye,
Skybuck.


0 new messages