
Analysis


LawCounsels

Apr 15, 2010, 5:27:52 PM
Greetings

In a file with N bits (sufficiently large, say 8 Kbits up to 1 Mbits), assume the distribution of the unique prefixes '0' or '10' or '110' or '1110' ... or '111...10' (always ending with a '0') is standard binomial (i.e. random), BUT with the added constraint/restriction that the maximum length of '111...10' at any time can only be at most log2(R), where R is the total number of bits from the present bit position down to the least significant bit position. [E.g. if the bit string is 0 111110 11110 10 0 10 110 10 0 10 0 10 0 0, then here '110' is at the 13th bit; there can be no unique prefix with total bit length > log2(R) at any time, where R is the bit position of the unique prefix counting from the least significant bit ....]

.... So this is not the simple, regular, usual binomial/random distribution [the usual binomial/random distribution is 'god gifted' not to be compressible]; here there is the added constraint/restriction that, e.g., '1110' (4 bits long) cannot occur at position R = 15 or smaller, since log2(R) for R = 15 or any smaller value of R will not allow a unique prefix of total bit length >= 4 to occur at a position R < 16 ...... Kindly please forward the entropy calculation/solution for compressing such bit strings into a smaller total number of bits (e.g. a smaller maximum range for the ranked index) .... it would save me days of working out this standard mathematics.
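
For concreteness, here is a minimal Python sketch of the constraint check as I read it (an illustration only, not a reference implementation; I assume '0' is always legal even at positions where floor(log2(R)) < 1, and all helper names are made up):

def split_prefixes(bits: str):
    """Split a bit string into unique prefixes: runs of '1's closed by a '0'."""
    prefixes, run = [], ""
    for b in bits:
        run += b
        if b == "0":
            prefixes.append(run)
            run = ""
    if run:
        raise ValueError("string does not end with a complete prefix ('...0')")
    return prefixes

def satisfies_constraint(bits: str) -> bool:
    remaining = len(bits)                       # R: bits left, counting this prefix
    for p in split_prefixes(bits):
        # R.bit_length() - 1 == floor(log2(R)) exactly, without float error
        max_len = max(1, remaining.bit_length() - 1)
        if len(p) > max_len:
            return False
        remaining -= len(p)
    return True

print(satisfies_constraint("1000"), satisfies_constraint("0000"))  # True True
print(satisfies_constraint("1100"))   # False: a '110' needs at least R = 8 bits left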

Wishing all abundant health : )
LawCounsels

LawCounsels

Apr 17, 2010, 7:34:14 AM

$50 contest prize money (via Western Union / PayPal) for the first person to present a correct entropy calculation and a solution for compressing such bit strings into a smaller total number of bits (e.g. a smaller maximum range for the ranked index) .... You may assume the multiplicities of occurrence of the various unique prefixes '0' or '10' or '110' or '111...10' are provided.

Here is some help if you intend to try:

There is this very good 'article' adapting the normal ranked-index derivation to introduce added 'constraints/restrictions':

http://groups.google.co.uk/group/comp.compression/browse_thread/thread/3387e8bf923eec15?hl=en#

a model for this ?
hi, let's suppose we have a file made of 0,1,2,3 randomly distributed with "the same" probability (about 25% each one). There is a condition: we can't have a run of more than two identical digits. As an example, 00312102211330120132021133 is ok but 00031222 is not ok, because we have three '0's and three '2's, we have... more »
By inconnu - 4 Feb - 19 new of 19 messages

>> A starting point will be to consider, e.g., that the sequence 110 10 0 allows only exactly 1 possibility, whereas if '110', '10' and '0' could occur freely at any position it would give 3!/(1!1!1!) = 6 possibilities .... Likewise 10 10 0 also allows only exactly 1 possibility, whereas if '10', '10' and '0' could occur freely at any position it would give 3!/(2!1!) = 3 possibilities.

... So the maximum total number of possibilities (or the maximum ranked-index value range) can be Product[number of possibilities at the i'th unique prefix position], i = 1 to N (N = number of unique prefixes), instead of the usual maximum potential formula N!/(N1! * N2! * ... * Ni!), where Ni is the number of occurrences of the i'th unique prefix ('0' or '10' or '110' or '1110' .... and so forth '111...10'). This is a very much smaller maximum ranked-index value than N!/(N1! * N2! * ... * Ni!).

The maximum number of possibilities at the i'th unique prefix position is usually = log2(R), where R is the bit position of the present unique prefix (counting from the least significant bit), but it could be less if, e.g., all the '0's have already been used up by the preceding smaller i'th positions ... so the present i'th position could not then take a '0' (since all are used up), and the total number of possibilities here would be log2(R) - 1 .... and so forth.


>> If the multiplicities are '10' = 2 and '0' = 2, one of the two '0's would have to be allocated to the least significant unique prefix position i = 1 (the possibilities multiplier here is 1). One of the two '10's, or the remaining '0', could be allocated to unique prefix position i = 2 ('0' or '10' can occur here, so the possibilities multiplier here would be 2) ..... Now assume the two '0's (the smallest) are already allocated to unique prefix positions i = 1 and i = 2; thus the possibilities multiplier at position i = 3 is exactly 1 (the smallest prefix now available from the 'remnant' pool is '10', which can be assumed to be allocated to position i = 3), and the possibilities multiplier for position i = 4 is likewise 1 ..... PLEASE VERIFY

[The examples above are all SIMPLIFIED by assuming that position i can take a digit value of at most i ... similar to the possible positional digit values of the factoradic system.]
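
A brute-force sanity check of the above 'much smaller than the multinomial' claim is easy to script; the sketch below (my own, under the reading that there are at most floor(log2 R) choices when R bits remain, with '0' always legal) just counts valid orderings of a multiset of prefixes and compares with N!/(N1!*N2!*...*Ni!):

from itertools import permutations
from math import factorial

def valid(order):
    """True if this MSB-to-LSB order of prefixes respects len <= floor(log2 R)."""
    remaining = sum(len(p) for p in order)
    for p in order:
        if len(p) > max(1, remaining.bit_length() - 1):
            return False
        remaining -= len(p)
    return True

def compare(prefixes):
    orders = set(permutations(prefixes))            # distinct orderings of the multiset
    constrained = sum(valid(o) for o in orders)
    multinomial = factorial(len(prefixes))
    for p in set(prefixes):
        multinomial //= factorial(prefixes.count(p))
    return constrained, multinomial

# 1 x '110', 2 x '10', 4 x '0' (11 bits): the unconstrained multinomial is 105,
# the constrained count comes out far smaller.
print(compare(["110", "10", "10", "0", "0", "0", "0"]))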

biject

Apr 17, 2010, 11:11:17 AM
On Apr 17, 5:34 am, LawCounsels <LawCouns...@aol.com> wrote:
> $50 contest prize money (via Western Union / PayPal) for the first person
> to present a correct entropy calculation and a solution for compressing
> such bit strings into a smaller total number of bits (e.g. a smaller
> maximum range for the ranked index) .... You may assume the multiplicities
> of occurrence of the various unique prefixes '0' or '10' or '110' or
> '111...10' are provided.
>
> Here is some help if you intend to try:
>
> There is this very good 'article' adapting the normal ranked-index
> derivation to introduce added 'constraints/restrictions':
>
> http://groups.google.co.uk/group/comp.compression/browse_thread/threa...

>
> a model for this ?
> hi, let's suppose we have a file made of 0,1,2,3 randomly distributed
> with "the same" probability (about 25% each one). There is a condition:
> we can't have a run of more than two identical digits. As an example,
> 00312102211330120132021133 is ok but 00031222 is not ok, because we
> have three '0's and three '2's, we have... more »
>
> By inconnu - 4 Feb - 19 new of 19 messages
>
If a digit occurs on its own (going left to right) it takes 2 bits. If the same digit occurred just before, then only three cases can follow, each with probability 1/3, which takes log(3)/log(2) = 1.5849625 bits. So it would compress to the following, where n = number of characters in the string and p = number of pairs:
bits needed = 2*n - 0.415037499*p, where 2 - (log(3)/log(2)) = 0.415037499.
p should be about n/5 if, as you say, everything else is random.
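
For what it is worth, the stated cost model is easy to check mechanically; a small sketch (mine, just restating the formula above: 2 bits per symbol, log2(3) bits for a symbol that follows a pair of equal symbols):

from math import log2

def bits_needed(s: str) -> float:
    cost, saving = 2.0 * len(s), 2.0 - log2(3)     # saving = 0.415037499...
    for i in range(2, len(s)):
        if s[i - 1] == s[i - 2]:                   # the two previous symbols form a pair
            cost -= saving                         # this symbol has 3 choices, not 4
    return cost

print(bits_needed("00312102211330120132021133"))   # ~49.9 bits, vs 52 at 2 bits/symbol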


David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said, "any cryptographic
system is only as strong as its weakest link"

LawCounsels

Apr 17, 2010, 3:23:36 PM

>
> If a digit occurs on its own (going left to right) it takes 2 bits. If
> the same digit occurred just before, then only three cases can follow,
> each with probability 1/3, which takes log(3)/log(2) = 1.5849625 bits.
> So it would compress to the following, where n = number of characters
> in the string and p = number of pairs:
> bits needed = 2*n - 0.415037499*p, where 2 - (log(3)/log(2)) =
> 0.415037499.
> p should be about n/5 if, as you say, everything else is random.
>


Can you illustrate this more simply? In particular the statement '...if a digit occurs on its own going left to right it takes 2 bits; if the same digit occurred just before, then three cases follow, each with probability 1/3...' is quite 'unexpected'. Where does this '2 bits' figure really come from?
PLEASE CLARIFY

[Not clear at this time, as with most initial forum discussions, but it will clear up as the thread progresses.]

A very simple scheme to compress a series of unique prefixes '0' through '111...10', with the constraint/restriction that the maximum number of possibilities ('0' or '10' or '110' ... or '111...10') at the R'th bit position cannot exceed log2(R), readily presents itself for immediate algorithmic implementation:

i.e. if the input series of constrained unique prefixes is N bits, the unique prefix at the MSB position has log2(N) possibilities .... and so forth.

(Even though this gives the simplest immediately implementable algorithm, with substantial net compression gains over the usual maximum potential formula N!/(N1!N2!...Ni!), it is not the optimum best compression possible, since the maximum number of possibilities at a given unique prefix position may be further restricted, e.g. when all the available '0's or '10's etc. have already been allocated and none remain in the 'remnant' pool .... but it need not be perfectly optimal at this time.)

HOWEVER, if needed, this less-than-optimal scheme can be implemented straight away => invariable net compression gains are already attained over an input series of constrained/restricted unique prefixes totalling N bits in all.
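
A minimal sketch of the bound this simple scheme gives (my reading: max(1, floor(log2 R)) choices when R bits remain; the example sequence is just an illustration):

from math import prod, ceil, log2

def choice_counts(prefixes):
    remaining = sum(len(p) for p in prefixes)
    counts = []
    for p in prefixes:                                     # most significant prefix first
        counts.append(max(1, remaining.bit_length() - 1))  # floor(log2 R) choices
        remaining -= len(p)
    return counts

seq = ["1110", "110", "0", "10", "0", "0", "10", "0", "0"]   # a 16-bit example
c = choice_counts(seq)
print(c, prod(c), ceil(log2(prod(c))), "bits vs", sum(len(p) for p in seq))
# -> [4, 3, 3, 3, 2, 2, 2, 1, 1], 864 possibilities, 10 bits vs 16 input bits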

LawCounsels

Apr 17, 2010, 4:19:09 PM
> A very simple scheme to compress a series of unique prefixes '0' through
> '111...10', with the constraint that the number of possibilities at the
> R'th bit position cannot exceed log2(R), readily presents itself for
> immediate algorithmic implementation [...]
>
> HOWEVER, if needed, this less-than-optimal scheme can be implemented
> straight away => invariable net compression gains are already attained
> over an input series of constrained/restricted unique prefixes
> totalling N bits in all.


TO CLARIFY AND ILLUSTRATE:

E.g. a series of constrained/restricted unique prefixes may be { 1110 110 0 10 0 0 10 0 0 }. Note that the unique prefix in the least significant position here is '0' (necessarily so), since at bit R = 1, log2(1) < 1, so it is not valid for '10' or '110' or any longer '111...10' to occupy this LSB prefix position .... thus the valid number of possibilities at bit position 1 is exactly 1 (only the unique prefix '0' can occur here, not '10' or '110' or longer '111...10').

=> The maximum valid range of the ranked-index value for this input series of constrained/restricted unique prefixes of N bits in total would be very much smaller than the usual normal ranked-index formula N!/(N1! * N2! * N3! * .... * Ni!) [which applies if all unique prefixes are unconstrained and can occur freely anywhere], where Ni is the number of occurrences of the unique prefix of length i (e.g. '0' is of length 1, '110' of length 3 ....) .... thus an input totalling N bits of a constrained/restricted series of unique prefixes can invariably be encoded/compressed into a very substantially smaller number of bits than the input N bits.

[Note: the multiplicities of occurrence would still be binomial/random, but there is now this 'constraint/restriction' making this binomial/random distribution very compressible.]

biject

Apr 17, 2010, 7:18:57 PM
On Apr 17, 1:23 pm, LawCounsels <lawcouns...@gmail.com> wrote:
> > If a digit occurs on its own (going left to right) it takes 2 bits. If
> > the same digit occurred just before, then only three cases can follow,
> > each with probability 1/3, which takes log(3)/log(2) = 1.5849625 bits.
> > So it would compress to the following, where n = number of characters
> > in the string and p = number of pairs:
> > bits needed = 2*n - 0.415037499*p, where 2 - (log(3)/log(2)) =
> > 0.415037499.
> > p should be about n/5 if, as you say, everything else is random.
>
> Can you illustrate this more simply? In particular the statement '...if
> a digit occurs on its own going left to right it takes 2 bits; if the
> same digit occurred just before, then three cases follow, each with
> probability 1/3...' is quite 'unexpected'. Where does this '2 bits'
> figure really come from?
>
> PLEASE CLARIFY
>

It would have helped if you had quoted the paragraph just above
my comments, but you didn't.

here it is /quote on


> hi, let's suppose we have a file made of 0,1,2,3 randomly distributed
> with "the same" probability (about 25% each one). There is a condition:
> we can't have a run of more than two identical digits. As an example,
> 00312102211330120132021133 is ok but 00031222 is not ok, because we
> have three '0's and three '2's, we have... more »
> By inconnu - 4 Feb - 19 new of 19 messages

/quote off


Take the string 00312102211330120132021133, which you provided as your example. The task at hand is to compress this string. Given that 0, 1, 2, 3 each occur 25% of the time and everything else is random (whatever that means), this works out to 2 bits per symbol.

However, there is an extra kicker. The symbols each appear 25% of the time and stay random, except that no run of 3 or more identical characters occurs. In the sample above, in the leading 0031 the first 0 had a 25% chance of occurring and so did the next 0, but where the 3 occurred only 1, 2 or 3 were possible, so the length used to compress that symbol is log(3)/log(2) bits. There are 26 characters and 6 pairs; actually I should only count pairs that have a character following them, which I did not clearly state above, so use 5, since the 6th pair at the end of this string only affects the next character if there is one.

So the number of bits in a string of zeros and ones needed to encode this is (26 * 2) - (5 * 0.415037499), a little under 50 bits, whereas without the fact that no run exceeds two identical characters, the full 26 * 2 = 52 bits would be required.

> [Not clear at this time, as with most initial forum discussions, but
> it will clear up as the thread progresses.]

LawCounsels

Apr 18, 2010, 5:09:00 AM


Now, with the constraint/restriction that the total number of bits of the unique prefix chosen at the i'th bit position is always <= log2(i), how would you similarly adapt/modify the usual normal ranked-index derivation process, with or without further reducing the valid number of possibilities when certain unique prefix lengths are all used up and completely absent from the remaining 'remnant' pool?


example:

With e.g. 1 '110', 2 '10's and 4 '0's (11 bits in total), the usual normal maximum ranked-index range = 7!/(4!*2!*1!) = 7*3*5 = 105, plus the bits for the multiplicities. Whereas with the 'log2(i) total possibilities when at bit position i' constraint, the maximum valid ranked-index range in one of the worst cases = log2(12) * log2(9) * log2(7) * log2(5) * log2(4) * log2(3) * log2(2) * log2(1) = 3*2*2*1*1*1*1 = 12 .... a very substantial bit saving. BUT in fact it can be even better, since one of the '10's must necessarily occupy bit position i = 7, so it would be 3*2*1*1*1*1*1 = 6.

IF the number of possibilities (possible choices '0' or '10' or '110' or '111...10') at the i'th bit position is further restricted because some unique prefix lengths are absent (e.g. the '110' already all used up, absent from the remaining 'remnant' pool), then the worst case above could be = 3 * 2 (not 3, since '110' is already used up, so the only possible prefixes are '0' or '10') * 1 = 6.

Please post the correctly adapted ranked-index value derivation process steps.
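
One possible reading of the 'remnant pool' refinement, sketched purely for illustration (it only counts the choices available at each position; it is not a full ranking procedure, and the interpretation is my own):

from collections import Counter
from math import prod, ceil, log2

def choice_counts_with_pool(prefixes):
    """A length counts as a choice only if it is <= floor(log2 R) AND at least
    one prefix of that length is still unplaced (left in the 'remnant' pool)."""
    pool = Counter(len(p) for p in prefixes)
    remaining = sum(len(p) for p in prefixes)
    counts = []
    for p in prefixes:                                   # most significant prefix first
        cap = max(1, remaining.bit_length() - 1)         # floor(log2 R)
        counts.append(sum(1 for length, n in pool.items() if n > 0 and length <= cap))
        pool[len(p)] -= 1
        remaining -= len(p)
    return counts

seq = ["110", "10", "0", "10", "0", "0", "0"]    # 1 x '110', 2 x '10', 4 x '0' (11 bits)
c = choice_counts_with_pool(seq)
print(c, prod(c), ceil(log2(prod(c))))           # choices per position, product, bits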

LawCounsels

Apr 18, 2010, 1:51:12 PM

This seems the most promising 'specifically tailored/designed' algorithm suggested thus far:

Note the ranked-index derivation formula n =
3+(4*(1+(4*(1+(4*(1+(3*(2+(4*(2+(3*(1+(4*0))))))))))))) = 4311
from
http://groups.google.co.uk/group/comp.compression/browse_thread/thread/3387e8bf923eec15/eca136ce746238b5?hl=en&lnk=gst&q=inconnu#eca136ce746238b5

but we now rank backwards, from the least significant unique prefix first, so all the smaller-length unique prefixes dominate the start of the series, and larger and larger unique prefixes can only appear as valid selections at subsequent later and later stages, where log2(i) allows it [i being the present bit position]

=> this adapted, specifically tailored ranking produces an invariably much smaller index value

so now the maximum possible n = 3 possibilities + (3 possibilities * (3 possibilities + ( ............. + (4 possibilities * ( ............. + (5 possibilities * ( ....... + ( ......... + (log2(N) possibilities * ( ............

Please post to the forum the above outlined 'specifically tailored/designed' adapted ranked-index value derivation process steps and entropy calculations.
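
For illustration, here is one decodable mixed-radix arrangement of such a ranking, sketched in Python. The digit packing (lowest place value for the most significant prefix) is my own choice, made so the decoder can recover the radices as it goes; it is not claimed to be exactly the backwards ranking described above:

def rank(prefixes):
    remaining = sum(len(p) for p in prefixes)
    index, place = 0, 1
    for p in prefixes:                                   # leftmost (MSB-most) prefix first
        radix = max(1, remaining.bit_length() - 1)       # floor(log2 R) choices here
        digit = len(p) - 1                               # 0 -> '0', 1 -> '10', 2 -> '110', ...
        assert digit < radix, "input violates the constraint"
        index += digit * place
        place *= radix
        remaining -= len(p)
    return index, place                                  # place = total number of codewords

def unrank(index, total_bits):
    prefixes, remaining = [], total_bits
    while remaining > 0:
        radix = max(1, remaining.bit_length() - 1)
        digit = index % radix
        index //= radix
        prefixes.append("1" * digit + "0")
        remaining -= digit + 1
    return prefixes

seq = ["1110", "110", "0", "10", "0", "0", "10", "0", "0"]   # 16 bits
idx, count = rank(seq)
print(idx, count)                     # 479 of 864 codewords, i.e. about 10 bits
print(unrank(idx, 16) == seq)         # True: round-trips given only the total bit count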


James Dow Allen

Apr 19, 2010, 7:17:53 AM
On Apr 16, 4:27 am, LawCounsels <LawCouns...@aol.com> wrote:
> ... BUT with added constraints/
> restriction that the maximum length of '111....10' at any time could
> only be at most log(base2)[R] where R is ...

If I understand correctly there are only
two legal strings of length 4:
10 0 0
0 0 0 0
So 3 bits can be saved when encoding a
string of length 4. Similarly, there are
exactly 8 legal strings of length 7, where
4 bits can be saved.

Here is a recurrence relation to find A(n),
the number of legal strings of length n.
A(1) = A(2) = A(3) = 1
A(n) = A(n-1) + A(n-2) + ... A(ceil(n-log n))
where log is base 2, so for example
ceil(15-log 15) = ceil(15 - 3.907) = 12

I don't know how to simplify that recurrence
but it is easy to calculate A(n), at least for
small n. Assuming each of the A(n) strings is
equally likely, then
n - log2 A(n)
is the number of bits which could be saved with
compression. Working some numbers it appears that
a very simple formula *may* provide a good estimate
of the bit savings. I'll make it clear with a
specific example
A(1878) ~= 2^1868 /* 10 bits can be saved */
A(4989) ~= 2^4978 /* 11 bits can be saved */
Thus, among the 3111 bits between the 1878'th
and 4989'th bits, one bit can be saved.
The inverse mean is 3125, very close to 3111,
so roughly 1/N bits can be saved at the N'th bit!

(I realize that probably didn't make sense. :-)

How are your bits really generated? I'd guess
that the different legal strings do *not* have
exactly equal probabilities, so a better approach
to compression could be possible.
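
The recurrence and the n - log2 A(n) saving are easy to tabulate; a short sketch of it (my transcription, assuming A(0) = 1 for the empty string and that '0' is always a legal first prefix):

from math import log2

def legal_counts(n_max):
    A = [1] * (n_max + 1)                     # A(0) = A(1) = A(2) = A(3) = 1
    for n in range(2, n_max + 1):
        kmax = max(1, n.bit_length() - 1)     # floor(log2 n) legal first-prefix lengths
        A[n] = sum(A[n - k] for k in range(1, kmax + 1))
    return A

A = legal_counts(64)
print(A[4], A[7])                             # 2 and 8, matching the counts above
for n in (8, 16, 32, 64):
    print(n, round(n - log2(A[n]), 3))        # bits that could be saved at length n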

James Dow Allen

LawCounsels

Apr 19, 2010, 8:05:39 AM
On 19 Apr, 12:17, James Dow Allen <jdallen2...@yahoo.com> wrote:
> On Apr 16, 4:27 am, LawCounsels <LawCouns...@aol.com> wrote:
>
> > ... BUT with added constraints/
> > restriction that the maximum length of '111....10' at any time could
> > only be at most log(base2)[R] where R is ...
>
> If I understand correctly there are only
> two legal strings of length 4:
>    10 0 0
>    0 0 0 0

I need to backtrack and correct things from the start here:

The only legal strings (called unique prefixes because each must always end with a '0') are: '0' or '10' or '110' or '1110' or '11110' ... and so forth '111...10'.

If at bit position 8, log2(8) = 3, so only 3 selections are possible there: '0' or '10' or '110', but not '1110' or any longer '111...10'.

If at bit position 16, log2(16) = 4, so only 4 selections are possible there: '0' or '10' or '110' or '1110', but not '11110' or any longer '111...10'.


It is very clear that in an input file of, e.g., 2^20 bits, the unique prefix at bit position 2^20 can have only 20 possible selections, the unique prefixes at bit positions 2^20 - 1 down to 2^19 can each have only 19 possible selections .... and at bit positions 15 down to 8 only 3 possible selections (either '0' or '10' or '110' only).

It is very clear that if this file of a series of unique prefixes ('0' ... '111...10', each always ending with a '0') is first reversed, then all the smaller unique prefixes dominate the start of the series, and the larger and larger unique prefixes can only occur at subsequent later stages (i.e. only at larger and larger bit positions) .... thus it is intuitively very compressible.

.... similar to how, when the input binary file to the present existing ranking.exe starts with thousands of '0's (not '1's), this automatically gives rise to thousands of bits of net gain, even without anything more needing to be done.

Please post to the forum your 'specifically tailored/designed' adapted ranked-index value derivation process steps and entropy calculations, for an input series of unique prefixes with the above 'constraint/restriction'.


LawCounsels

Apr 19, 2010, 9:57:15 AM
> On 19 Apr, 12:17, James Dow Allen <jdallen2...@yahoo.com> wrote:
>
> > On Apr 16, 4:27 am, LawCounsels <LawCouns...@aol.com> wrote:
>
> > > ... BUT with added constraints/
> > > restriction that the maximum length of '111....10' at any time could
> > > only be at most log(base2)[R] where R is ...
>
> > If I understand correctly there are only
> > two legal strings of length 4:
> >    10 0 0
> >    0 0 0 0


If there is an input 4 bits long, the MSB unique prefix at bit position 4 can only validly be '0' or '10' (log2(4) = 2 selections possible), so the valid combinations are { 10 0 0 } or { 0 0 0 0 } ..... Yes, I see now that you were right when you correctly stated there are only two legal strings of length 4: { 10 0 0 } and { 0 0 0 0 }.

.... Yes, your entropy calculation now makes sense and deserves time spent fully comprehending it.

Please further post to the forum your 'specifically tailored/designed' adapted ranked-index value derivation process, step by step, and a more complete entropy calculation formula, for an input series of unique prefixes with the above 'constraint/restriction'.

LawCounsels

Apr 22, 2010, 5:31:49 AM
On 19 Apr, 12:17, James Dow Allen <jdallen2...@yahoo.com> wrote:

> so roughly 1/N bits can be saved at the N'th bit!
>

The total number of bits saved is then SUM[1/n] for n = 1 to N.

How many bits does that save if N = 2^20 bits?
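
For a rough answer, just sum the harmonic series (a tiny sketch, taking the 1/n-per-bit estimate at face value):

from math import log

N = 2 ** 20
print(sum(1.0 / n for n in range(1, N + 1)))   # about 14.4 bits saved for N = 2^20
print(log(N) + 0.5772156649)                   # ln(N) + Euler's gamma, the usual approximation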

But this seems very minuscule compared to simply ranking backwards from the least significant unique prefix first, so that all the smaller-length unique prefixes dominate the start of the series and larger and larger unique prefixes can only appear as valid selections at subsequent later and later stages where log2(i) allows [i being the present bit position].
=> This adapted, specifically tailored ranking produces an invariably much smaller index value.


so now n = weight of LSB unique prefix + (1 possibility * (weight of 2nd LSB unique prefix + (1 possibility * ( ............. + (weight of 4th LSB unique prefix + (2 possibilities * ( ............. + (weight of 8th LSB unique prefix + (3 possibilities * ( ....... + (weight of 16th LSB unique prefix + (4 possibilities * ( ......... + (weight of N'th LSB unique prefix + (log2(N) possibilities * ( .....

where the weight of a unique prefix is 0 if it is '0', 1 if '10', 2 if '110', 3 if '1110', and so forth.

Chi

Apr 23, 2010, 9:43:38 AM
I don't know much about compression. I have played around with BWT, MTF and Huffman encoding a bit, mostly cut-and-paste work. But your question looks interesting to me, because the string 00312102211330120132021133 looks like a quadtree to me. Generating all possible strings like this can be done by computing a Hilbert curve. This is a space-filling curve used to traverse a 2-dimensional plane.

> http://www.maptiler.org/google-maps-coordinates-tile-bounds-projection/
> http://blog.notdot.net/2009/11/Damn-Cool-Algorithms-Spatial-indexing-with-Quadtrees-and-Hilbert-Curves


>let's suppose we have a file made of 0,1,2,3 randomly distributed with
>"the same" probability (about 25% each one)
>there is a condition:
>we can't have a run of more than two identical digits
>as an example:
>00312102211330120132021133 is ok
>but
>00031222 is not ok, because we have three '0's and three '2's, we have
>three identical digits and this is not allowed

LawCounsels

Apr 24, 2010, 10:02:55 AM
On 19 Apr, 12:17, James Dow Allen <jdallen2...@yahoo.com> wrote:
> On Apr 16, 4:27 am, LawCounsels <LawCouns...@aol.com> wrote:
>
> > ... BUT with added constraints/
> > restriction that the maximum length of '111....10' at any time could
> > only be at most log(base2)[R] where R is ...
>
> Here is a recurrence relation to find A(n),
> the number of legal strings of length n.
>    A(1) = A(2) = A(3) = 1
>    A(n) = A(n-1) + A(n-2) + ... A(ceil(n-log n))
> where log is base 2, so for example
>    ceil(15-log 15) = ceil(15 - 3.907) = 12
>
> I don't know how to simplify that recurrence
> but it is easy to calculate A(n), at least for
> small n.  Assuming each of the A(n) strings is
> equally likely, then
>     n - log2 A(n)
> is the number of bits which could be saved with
> compression.  Working some numbers it appears that
> a very simple formula *may* provide a good estimate
> of the bit savings.  I'll make it clear with a
> specific example
>     A(1878) ~= 2^1868   /* 10 bits can be saved */
>     A(4989) ~= 2^4978   /* 11 bits can be saved */
> Thus, among the 3111 bits between the 1878'th
> and 4989'th bit, one bit can be saved.
> The inversed mean is 3125, very close to 3111,
> so roughly 1/N bits can be saved at the N'th bit!
>
> James Dow Allen

Yes, James, your entropy formula is spot on, even if the number of bits saved, SUM[1/n] for n = 1 to N (e.g. N = 2^20), does not grow fast (http://www.jimloy.com/algebra/hseries.htm).


The suggestion that had seemed promising, simply ranking backwards from the least significant unique prefix first (so that all the smaller-length unique prefixes dominate the start of the series, and larger and larger unique prefixes can only appear as valid selections at subsequent later and later stages where log2(i) allows, i being the present bit position), in fact produces an index value requiring twice as many bits as the original input .... this is of no use.


Please post your 'specifically tailored/designed' adapted ranked-index value derivation process steps, for an input series of unique prefixes with the above 'constraint/restriction', based on your entropy formula above.


Ernst

May 30, 2010, 9:54:33 PM

I like that concept of generating all possible strings...

I do that.

Thanks for the heads up on a search topic.


--
-Ernst

-*-Math:Symbolism:Logic ~ Three_Amigos-*-

LawCounsels

Jul 5, 2010, 10:15:29 PM

Please post your 'specifically tailored/designed' adapted ranked-index value derivation process steps, for an input series of unique prefixes with the above 'constraint/restriction'.

The prize for the first working .exe is now increased to $150.

Yes, it will be neat to have a closed-form expression for the recurrence.

To be of practical use, the algorithm/.exe would need to take an input series of M unique prefixes '0' or '111...10' (constrained as described), with total length = SUM[bits_length of ALL unique prefixes], and compress it into fewer than SUM[bits_length of ALL unique prefixes] bits. This necessarily requires the ability to rank the input series of M unique prefixes (constrained as described) into its index number, and to unrank an index number back into the original series of unique prefixes (constrained as described), perhaps with M and SUM[bits_length of ALL unique prefixes] passed as parameters.


>> In a file with N bits, assume the distribution of the unique prefixes
>> '0', '10', '110', ..., '111...10' (always ending with a '0') is standard
>> binomial (i.e. random), BUT with the added constraint that the length of
>> any prefix can be at most log2(R), where R is the number of bits from the
>> present bit position to the least significant bit. [...]

If I understand correctly there are only two legal strings of length 4:
10 0 0
0 0 0 0

So 3 bits can be saved when encoding a string of length 4. Similarly, there
are exactly 8 legal strings of length 7, where 4 bits can be saved.

Here is a recurrence relation to find A(n), the number of legal strings of
length n:
A(1) = A(2) = A(3) = 1
A(n) = A(n-1) + A(n-2) + ... + A(ceil(n - log n))
where log is base 2, so for example ceil(15 - log 15) = ceil(15 - 3.907) = 12.

Assuming each of the A(n) strings is equally likely, then n - log2 A(n) is
the number of bits which could be saved with compression. Working some
numbers, a very simple formula *may* provide a good estimate of the bit
savings; a specific example:
A(1878) ~= 2^1868   /* 10 bits can be saved */
A(4989) ~= 2^4978   /* 11 bits can be saved */
Thus, among the 3111 bits between the 1878'th and 4989'th bits, one bit can
be saved, so roughly 1/N bits can be saved at the N'th bit.


WE CAN USE A SMALL CHUNK SIZE AND PRE-STORE LOOK-UP TABLE ENTRIES FOR THE RECURRENCE MAPPING .... O(1) COMPUTATION SPEED
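
One standard way to turn the recurrence into an actual ranked index is enumerative coding over the A(n) legal strings. The sketch below is my own illustration under the equiprobable assumption and is not claimed to be the intended scheme; the A(.) table can be precomputed once, which is what makes the per-step work cheap:

def build_A(n_max):
    A = [1] * (n_max + 1)                        # A(0) = 1 assumed for the empty string
    for n in range(2, n_max + 1):
        kmax = max(1, n.bit_length() - 1)        # floor(log2 n)
        A[n] = sum(A[n - k] for k in range(1, kmax + 1))
    return A

def rank(prefixes, A):
    """Index of this legal string among all A(n) legal strings of the same length n."""
    index, R = 0, sum(len(p) for p in prefixes)
    for p in prefixes:                           # most significant prefix first
        assert len(p) <= max(1, R.bit_length() - 1), "violates the constraint"
        for k in range(1, len(p)):               # every legal shorter first choice ranks lower
            index += A[R - k]
        R -= len(p)
    return index

def unrank(index, n, A):
    prefixes, R = [], n
    while R > 0:
        for k in range(1, max(1, R.bit_length() - 1) + 1):
            if index < A[R - k]:
                prefixes.append("1" * (k - 1) + "0")
                R -= k
                break
            index -= A[R - k]
    return prefixes

A = build_A(64)
seq = ["1110", "110", "0", "10", "0", "0", "10", "0", "0"]   # 16 bits
i = rank(seq, A)
print(i, A[16])                        # 2220 of A(16) = 2240 legal strings (~12 bits)
print(unrank(i, 16, A) == seq)         # True: decodes back given only the length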

LawCounsels

Aug 8, 2010, 3:41:17 AM
On Jul 6, 3:15 am, LawCounsels <LawCouns...@aol.com> wrote:
> Please post your 'specifically tailored/designed' adapted ranked-index
> value derivation process steps, for an input series of unique prefixes
> with the above 'constraint/restriction'.
>
> The prize for the first working .exe is now increased to $150.
>


Do you already know the arithmetic coding algorithm? If so, I have a straightforward modification of arithmetic coding as a job for you to complete (preferably in C# source code).

This presents a basic arithmetic coding implementation; if you have never implemented an arithmetic coder, this is the article which suits your needs:
www.arturocampos.com/ac_arithmetic.html


Our modification to the arithmetic coding algorithm below is very simple: the probability of a given unique prefix length ('0' or '10' or '110' or '111...10' of bit length P - 1, or '111...11' of bit length P - 1) at each remaining, as yet unencoded, bit length position R depends on the progressively smaller tabulated P value.

... So the probability table shown in the algorithm below is now made to vary depending on the present bit position R.

Arithmetic coding has an 'internal', zero-cost model of the binomial distribution of the unique prefix lengths '0', '10', '110', '111...10' and of the 'constraint' on prefix lengths at remaining bit length R, since this 'static' model need not be transmitted:

When a subsequent stage is at remaining bit length R, the arithmetic coder knows from the look-up table's P value that the maximum unique prefix length will be P bits (e.g. '1110' consists of 4 bits, '0' consists of 1 bit). Thus at this stage, with R bits remaining, the probabilities of occurrence of '0' or '10' or ... '111...10' (of at most P bits) are known to continue to be binomial, EXCEPT that the look-up table's P value determines the maximum possible unique prefix length here .... so the present 'interval' will now be divided according to the present probabilities of these P unique prefix lengths.

Arithmetic coding
The idea behind arithmetic coding is to have a probability line, 0-1, and assign to every symbol a range on this line based on its probability; the higher the probability, the larger the range assigned to it. Once we have defined the ranges and the probability line, we start to encode symbols; every symbol narrows down where the output floating point number lands. Let's say we have:

Symbol  Probability  Range
a       2            [0.0  , 0.5)
b       1            [0.5  , 0.75)
c       1            [0.75 , 1.0)

Note that "[" means the endpoint is included, so all the numbers from 0.0 up to (but not including) 0.5 belong to "a". Then we start to code the symbols and compute our output number. The algorithm to compute the output number is:

Low = 0
High = 1
Loop. For all the symbols:
  Range = high - low
  High = low + range * high_range of the symbol being coded
  Low = low + range * low_range of the symbol being coded

Where:
Range keeps track of where the next range should be.
High and low specify the output number.

And now let's see an example (encoding "b a c a"):

Symbol  Range    Low value  High value
                 0          1
b       1        0.5        0.75
a       0.25     0.5        0.625
c       0.125    0.59375    0.625
a       0.03125  0.59375    0.609375

The output number will be 0.59375. To decode, first see where the number lands, output the corresponding symbol, and then extract the range of this symbol from the floating point number. The algorithm for extracting the ranges is:

Loop. For all the symbols:
  Range = high_range of the symbol - low_range of the symbol
  Number = number - low_range of the symbol
  Number = number / range

And this is how decoding is performed:

Symbol  Range  Number
b       0.25   0.59375
a       0.5    0.375
c       0.25   0.75
a       0.5    0

You may reserve a little range for an EOF symbol, but in the case of an entropy coder you will not need it (the main compressor will know when to stop); with a stand-alone codec you can pass the length of the file to the decompressor, so it knows when to stop. (I never liked having a special EOF ;-)
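
A direct transcription of the example above into Python (illustration only; a practical arithmetic coder works on integers with renormalization rather than a single float):

# The a/b/c model from the walkthrough, as [low, high) cumulative ranges.
ranges = {"a": (0.0, 0.5), "b": (0.5, 0.75), "c": (0.75, 1.0)}

def encode(symbols):
    low, high = 0.0, 1.0
    for s in symbols:
        span = high - low
        high = low + span * ranges[s][1]
        low  = low + span * ranges[s][0]
    return low                         # any number in [low, high) identifies the input

def decode(number, count):
    out = []
    for _ in range(count):
        for s, (lo, hi) in ranges.items():
            if lo <= number < hi:
                out.append(s)
                number = (number - lo) / (hi - lo)
                break
    return "".join(out)

x = encode("baca")
print(x, decode(x, 4))                 # 0.59375 baca, matching the tables above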


LawCounsels

Aug 9, 2010, 1:08:06 AM
Are you able to deliver this:


1. What kind of files do you plan to compress: text files, pictures, movies ...?

Only files of a few thousand to tens of thousands of bits, consisting only of the unique prefixes '0' or '10' or '110' or ... '111...10' (maximum bit length = P bits; '0' has bit length 1, '1110' has bit length 4).

The maximum P value = log2(R), where R is the current remaining bit length. E.g. for the 8-bit string '110 0 10 10', initially at bit position R = 8 the valid selections are '0' or '10' or '110' only (P = log2(8) = 3); after the first selection of '110' there remain only 5 bits, so the next valid selection, at remaining length R = 5, is '0' or '10' only (P = log2(5) = 2) .... and so forth.

NOTE: the distribution of unique prefix lengths (symbols) is binomial ('0' occurs twice as frequently as '10', '10' twice as frequently as '110', and so forth; the AC can use this as its model), but with our further constraint here that the maximum possible unique prefix length at remaining bit length R is log2(R). The AC can then fine-tune this: at remaining bit length R it can select only unique prefixes of length <= log2(R), and these 'restricted' valid possible selections (symbols) together make up 100% of the symbol probabilities [even though earlier, at larger remaining bit lengths R, much longer unique prefixes were possible]. This is the 'source' of further encoding savings over a 'normal' AC model, which unnecessarily allows the earlier, much longer unique prefix lengths to occur anywhere else (even at R = 4, which admits only '0' or '10' as valid).

You may generate such test input ('constrained' unique prefix lengths) yourself, seeded with the binomial distribution and the constraint.

2. Do you want only the compressor part, or the decompression tool also?

Both.
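
A throwaway generator for such test input, seeded with the binomial weights and capped by the constraint (my own sketch: prefix lengths k are drawn with weight 2^-k, renormalized over the lengths legal at the current remaining length R):

import random

def generate_prefixes(total_bits, seed=None):
    rng = random.Random(seed)
    out, remaining = [], total_bits
    while remaining > 0:
        kmax = max(1, remaining.bit_length() - 1)           # floor(log2 R)
        weights = [2.0 ** -k for k in range(1, kmax + 1)]   # '0' twice as likely as '10', ...
        k = rng.choices(range(1, kmax + 1), weights=weights)[0]
        out.append("1" * (k - 1) + "0")
        remaining -= k
    return out

print(generate_prefixes(32, seed=1))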

LawCounsels

Aug 10, 2010, 12:42:36 AM
On 9 Aug, 06:08, LawCounsels <lawcouns...@gmail.com> wrote:
> Are you able to deliver this:


It is straightforward: first set up the arithmetic coder's symbol-probability model to give the usual binomial distribution of unique prefix lengths ('0' twice as frequent as '10', '10' twice as frequent as '110', and so forth), then simply adjust the model to give the same usual binomial symbol probabilities BUT with the maximum unique prefix length (symbol length), when the remaining bit length is R bits, now limited to at most log2(R) bits.
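
Sketching just that model adjustment (an assumption of mine about the renormalization: the binomial weights 1/2, 1/4, ... are truncated at floor(log2 R) and rescaled to sum to 1):

def model(remaining_bits):
    """Probability table over the prefixes legal with remaining_bits left to code."""
    kmax = max(1, remaining_bits.bit_length() - 1)          # floor(log2 R)
    w = [2.0 ** -k for k in range(1, kmax + 1)]
    total = sum(w)
    return {("1" * (k - 1) + "0"): w[k - 1] / total for k in range(1, kmax + 1)}

print(model(8))    # {'0': ~0.571, '10': ~0.286, '110': ~0.143}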
