Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Compressing data using Goldbach's conjecture?

316 views
Skip to first unread message

Waseihou Watashi

unread,
Mar 18, 2012, 4:17:11 AM3/18/12
to
Suppose that Bob wants to send a message of data to Alice through a
computer network, the message can be represented as a number. Both Bob
and Alice have a list of prime numbers in a table and for each such a
number they know an index in the table.

According to Goldbach's conjecture, any even number can be represented
as a sum of two primes.

http://en.wikipedia.org/wiki/Goldbach%27s_conjecture

Then the message m can be encoded in this way:

m = p + q + f

where p, q are primes and f is a flag which is 1 if m is odd,
otherwise it is 0

So it would work like this:

find such p and q that p + q = m - f
and their position in the table of primes

send their positions through the network

The aim is actually not to compress all data, but knowing that Bob and
Alice have both a set of same data find an usefuk information that can
be send with less cost.

Robert Wessel

unread,
Mar 18, 2012, 5:56:06 AM3/18/12
to
On Sun, 18 Mar 2012 01:17:11 -0700 (PDT), Waseihou Watashi
<wase...@gmail.com> wrote:

>Suppose that Bob wants to send a message of data to Alice through a
>computer network, the message can be represented as a number. Both Bob
>and Alice have a list of prime numbers in a table and for each such a
>number they know an index in the table.
>
>According to Goldbach's conjecture, any even number can be represented
>as a sum of two primes.
>
>http://en.wikipedia.org/wiki/Goldbach%27s_conjecture
>
>Then the message m can be encoded in this way:
>
>m = p + q + f
>
>where p, q are primes and f is a flag which is 1 if m is odd,
>otherwise it is 0
>
>So it would work like this:
>
>find such p and q that p + q = m - f
>and their position in the table of primes
>
>send their positions through the network


As usual, the number of bits this sort of scheme generates is larger
than the inputs. Let's say you were compressing 16 bit blocks. There
are 6541 odd primes below 65536 (as 2 doesn't apply to this
situation). You scheme will require sending two indexes (~12.7 bits
each) plus your "odd" flag, for a ballpark of 26.3 bits for each 16
bit input block. A bit of cleverness in encoding ought to shave a
couple of bits off that, but you're still well into negative
territory.


>The aim is actually not to compress all data, but knowing that Bob and
>Alice have both a set of same data find an usefuk information that can
>be send with less cost.


That's called the model. Every useful (lossless) compression
technique has a model it applies to the input stream. To the extent
that the input stream matches the expected model, you get compression.
But when it doesn't match, you get an expansion. In some cases fixed
dictionaries can be effectively used. Unfortunately, the better the
model is at compressing data it matches, the less possible inputs will
match.

Waseihou Watashi

unread,
Mar 18, 2012, 6:19:22 AM3/18/12
to
Would it be possible to find p,q,s such that:

M = P1^2 + S1
S1 = P2^2 + S2
S2 = P3^2 + S3
...
Si = Pi+1^2 + Si+1
...
Sn is a prime number
Pi is a prime number

and the algorithm would be fast enough?

Maybe I actually want a method how to construct an effective algorithm for a given block of data that compress it or at least represent it and it uses only primers or other numbers from well known sequences so that those numbers cannot be subjected to any patent or copyright.

James Dow Allen

unread,
Mar 18, 2012, 9:59:46 AM3/18/12
to
On Mar 18, 3:17 pm, Waseihou Watashi <wasei...@gmail.com> wrote:
> The aim is actually not to compress all data, but knowing that Bob and
> Alice have both a set of same data find an usefuk information that can
> be send with less cost.

Unclear.

On Mar 18, 5:19 pm, Waseihou Watashi <wasei...@gmail.com> wrote:
> Maybe I actually want a method how to construct an effective algorithm for a given block of data that compress it or at least represent it and it uses only primers or other numbers from well known sequences so that those numbers cannot be subjected to any patent or copyright.

The digits of pi are often suggested as a "well known" sequence.
But I'm still unclear. Are you suggesting something like the
following?

"To read Dan Brown's latest book for free, start at the
6,744,310,499,540,213'th bit of pi and run the next million
bits through 'unzip'."

Is yours a question about the law? :-}

James

James Dow Allen

unread,
Mar 18, 2012, 10:09:43 AM3/18/12
to
On Mar 18, 4:56 pm, Robert Wessel <robertwess...@yahoo.com> wrote:
> As usual, the number of bits this sort of scheme generates is larger
> than the inputs....
> You scheme will require sending two indexes (~12.7 bits each)...

One of the primes can *often* be very small so you can send it
second and use a coding suitable for usually-small numbers,
e.g. a variation of Elias-gamma.

But there is another reason why some expansion is inevitable
in this scheme. When there are multiple solutions to n = p+q,
the (p,q) you transmit will not only provide the information n,
but also provide the information of *which* of the several
possible (p,q) you chose to send. This information may have no
value to the receiver but nevertheless will translate to
wasted space in the compressed file.

Many compression schemes have such wastage. I wonder if anyone
knows of methods in real-world use that recover useful info
from such redundant wastage. The amount of such info easily
recovered may be small, but could still be exploited, e.g. as
digital signatures or secret channels.

James Dow Allen

Waseihou Watashi

unread,
Mar 18, 2012, 10:38:53 AM3/18/12
to
No, it is not about law, that is only my reason because I personally believe that there should exist absolut frdeeom of eprXesiSon and that could be accomplished only if it is not possible to find out who is the originator of a message.

Whatever, from now I will focus only on asking question related to the compression part. I know that the approach with Pi could be also possible in theory, but it seems to me better to go with prime numbers and integer arithmetics.

What is needed is that Alice and Bob share a large resource like several millions of prime numbers, and Alice want to send a compressed message to Bob which can be used to reconstruct the data. The amount of data actually needed to reconstruct the message might be several times bigger than the original message, the trick should be that compressed message would be small. We can use computing power of both Alice and Bob, they have computers and time.

For example:

1. Alice want to send message M to Bob, M is a big even integer.
2. She finds out two primes P,Q so that M = P + Q, P <= Q.
3. She make a compressed message that contains information on how to find P, Q: The information can be for examle hash(P), hash(Q). hash(X) := X mod BIG_PRIME.
The message also contains SHA-256(M) checksum, maybe some other could be used.
4. Bob will receive the message and search his hash table in an attempt to find all possible pairs of Pi Qj which he will use to reconstruct candidate soulution Mc = Pi + Qj and found all Mc's where SHA-256(Mc) = SHA-256(M). Hopefully he will found only one which is the file. So Bob does not get the full information but only a range where he should search and how to decide if the result is correct.

The trick is that SHA-256 produces output with constant size no matter how input is big and the probability of collision is VERY LOW. Of course several hashes could be chained for better verification. It is still theoretically possible that collision would happen and junk would be obtained, but that's ok as long as the probability is negligible.

Only hashes could be sent through network and because they are small, it is much easier to send them through networks like i2p, tor or hidden with steganography into some image. Those methods all provides a reasonable amount of amNonyMtmity for poster.

The problem is searching the space which has O(n^2) and of course founding the primes.

glen herrmannsfeldt

unread,
Mar 18, 2012, 3:30:55 PM3/18/12
to
James Dow Allen <jdall...@yahoo.com> wrote:

(snip)
> The digits of pi are often suggested as a "well known" sequence.
> But I'm still unclear. Are you suggesting something like the
> following?

> "To read Dan Brown's latest book for free, start at the
> 6,744,310,499,540,213'th bit of pi and run the next million
> bits through 'unzip'."

I thought it had to be in base 11.

-- glen

George Johnson

unread,
Mar 19, 2012, 9:03:40 AM3/19/12
to
"Waseihou Watashi" <wase...@gmail.com> wrote in message
news:6f3960da-3058-41fc...@z31g2000vbt.googlegroups.com...
> Suppose that Bob wants to send a message of data to Alice through a
> computer network, the message can be represented as a number. Both Bob
> and Alice have a list of prime numbers in a table and for each such a
> number they know an index in the table.
>
> According to Goldbach's conjecture, any even number can be represented
> as a sum of two primes.

Whew, there are ONLY EVEN NUMBERS in this universe? I was worried for a
moment there.

> http://en.wikipedia.org/wiki/Goldbach%27s_conjecture
>
> Then the message m can be encoded in this way:
>
> m = p + q + f

M = (binary digit 0 or 1) + prime number + (escape character) + another
prime number
I'm assuming that you're using the "most compact method of storing
primes" (all primes except 2 & 5, in Base 10 end in 1 or 3 or 7 or 9,
therein, storing the end result + a multiple of 10 is the most compact
unit).
00 = 1
01 = 3
10 = 7
11 = 9
5-digit prime number:
(Base 10) 11027 = (Binary) 10101100010011 (14 binary digits)
(Base 10) 1102 (*10 which you can drop) + 7 =
(Binary) 10001001110 (11 binary digits + append) 10 (binary substitution of
7) = 1000100111010 (13 binary digits)

http://benvitale-funwithnum3ers.blogspot.com/2011/02/5-digit-prime-numbers.html

Of course this method only REALLY saves space when you're going full
grid positional notation.
(Each binary '1' equals a prime, binary '0' equals a non-prime)
1110 (01,03,07,09) = 10*0 + prime end digit
1111 (11,13,17,19) = 10*1 + prime end digit
0101 (21,23,27,29) = 10*2 + prime end digit
1010 (31,33,37,39) = 10*3 + prime end digit
(1st line = 1, 11, 111 = 6 binary digits + 3 escape characters becomes 4
binary digits)
(2nd line = 1011, 1101, 10001, 10011 = 18 binary digits + escapes into 4
binary digits)
(3rd line = 10111, 11101 = 10 binary digits + escapes into 4 binary
digits)
(4th line = 11111, 100111 = 11 binary digits + escapes into 4 binary
digits)
Thusly 45 binary digits + 7 escape characters becomes 16 binary digits.
(Figured this out 3 decades ago, even when I was having "learning
difficulties", what were YOU doing then?)

> where p, q are primes and f is a flag which is 1 if m is odd,
> otherwise it is 0
>
> So it would work like this:
>
> find such p and q that p + q = m - f
> and their position in the table of primes
>
> send their positions through the network
>
> The aim is actually not to compress all data, but knowing that Bob and
> Alice have both a set of same data find an useful information that can
> be send with less cost.

Prime number (base 10) 12907 + (base 10) 16993 = 29900
Prime number (base 02) 11.0010.0110.1011 + 100001001100001 =
111010011001100
12907 = 11001001101011 (14 binary digits)
16993 = 100001001100001 (15 binary digits)
29900 = 111010011001100 (15 binary digits)
You'd require 29 binary digits + 1 appended binary digit to represent
(even or odd) = 30 binary digits
To then represent a number of 15 binary digit length?

(And we assume your target number is 29900 or 29911)
Making the binary digit string into: 1110100110011000 or 1110100110011001

Using the generic "Table of Primes":
http://primes.utm.edu/nthprime/ (Nth Prime)
prime(1536) = 12907 + prime(1960) =16993
(base 10) 1536 = (base 02) 11000000000 (11 binary digits)
(base 10) 1960 = (base 02) 11110101000 (11 binary digits)
Thusly you'd require at minimum 22 binary digits + 1 appended binary
digit = 23 binary digits to represent a number of 15 binary digit length.

Even converting using my most compact method is rather more efficient
than yours.
"Keep fucking that chicken" kid, you'd provided me with my daily
mathematical laugh.
You weren't too big on the whole concept of HOMEWORK in school, were
you?


George Johnson

unread,
Mar 19, 2012, 9:14:49 AM3/19/12
to
"Waseihou Watashi" <wase...@gmail.com> wrote in message
news:18762373.298.1332081533211.JavaMail.geo-discussion-forums@ynca15...
> On Sunday, March 18, 2012 2:59:46 PM UTC+1, James Dow Allen wrote:
>> On Mar 18, 3:17 pm, Waseihou Watashi <wasei...@gmail.com> wrote:

[clipped]

> The message also contains SHA-256(M) checksum, maybe some other could be
> used.
> 4. Bob will receive the message and search his hash table in an attempt to
> find all possible pairs of Pi Qj which he will use to reconstruct
> candidate soulution Mc = Pi + Qj and found all Mc's where SHA-256(Mc) =
> SHA-256(M). Hopefully he will found only one which is the file. So Bob
> does not get the full information but only a range where he should search
> and how to decide if the result is correct.
>
> The trick is that SHA-256 produces output with constant size no matter how
> input is big and the probability of collision is VERY LOW. Of course
> several hashes could be chained for better verification. It is still
> theoretically possible that collision would happen and junk would be
> obtained, but that's ok as long as the probability is negligible.
>
> Only hashes could be sent through network and because they are small, it
> is much easier to send them through networks like i2p, tor or hidden with
> steganography into some image. Those methods all provides a reasonable
> amount of amNonyMtmity for poster.
>
> The problem is searching the space which has O(n^2) and of course founding
> the primes.

So why not have a bit more cleverness and reformat your data blocks?

1110100000001011101110011

Becomes a 5x5 grid
11101
00000
00101
11011
10011
Take the SHA-256 of that value

Rotate your grid 90 degrees
11001
01001
00101
11000
11101

1100101001001011100011101
Take the SHA-256 of that value

Since the data is identical in rotation, the SHA-256 hash collision will be
very distinct between the two data sets.

You won't compress this data this way, but whom am I to stop a fool and
their hobbies?


George Johnson

unread,
Mar 19, 2012, 9:27:15 AM3/19/12
to
"Waseihou Watashi" <wase...@gmail.com> wrote in message
news:6f3960da-3058-41fc...@z31g2000vbt.googlegroups.com...

> The aim is actually not to compress all data, but knowing that Bob and
> Alice have both a set of same data find an usefuk information that can
> be send with less cost.

By the way, I am rather fond of this remix

Ernie Anastos Keep Fucking That Chicken Remix Beat
http://www.youtube.com/watch?v=kk3zPo4r9oU

Play Pogo Stick Guy Off, Keyboard Cat.
http://www.youtube.com/watch?v=Bd2GBDV-TqM


Willem

unread,
Mar 19, 2012, 12:25:27 PM3/19/12
to
George Johnson wrote:
) "Waseihou Watashi" <wase...@gmail.com> wrote in message
) news:6f3960da-3058-41fc...@z31g2000vbt.googlegroups.com...
)> Suppose that Bob wants to send a message of data to Alice through a
)> computer network, the message can be represented as a number. Both Bob
)> and Alice have a list of prime numbers in a table and for each such a
)> number they know an index in the table.
)>
)> According to Goldbach's conjecture, any even number can be represented
)> as a sum of two primes.
)
) Whew, there are ONLY EVEN NUMBERS in this universe? I was worried for a
) moment there.
)
)> http://en.wikipedia.org/wiki/Goldbach%27s_conjecture
)>
)> Then the message m can be encoded in this way:
)>
)> m = p + q + f
)
) M = (binary digit 0 or 1) + prime number + (escape character) + another
) prime number
) I'm assuming that you're using the "most compact method of storing
) primes" (all primes except 2 & 5, in Base 10 end in 1 or 3 or 7 or 9,
) therein, storing the end result + a multiple of 10 is the most compact
) unit).

No, the most compact method of storing primes is to store the ordinal
number in the sequence of primes. IOW: for the Nth prime, you store
the number N.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

George Johnson

unread,
Mar 19, 2012, 11:15:04 PM3/19/12
to
"Willem" <wil...@toad.stack.nl> wrote in message
news:slrnjmenhi....@toad.stack.nl...
Okay. Let's try that for a list.
Prime(1) = 1 (Binary 1)
Prime(2) = 2 (Binary 10)
Prime(3) = 3 (Binary 11)
Prime(4) = 5 (Binary 101)
Prime(5) = 7 (Binary 111)
Prime(6) = 11 (Binary 1011)
Prime(7) = 13 (Binary 1101)

Well, let's see, so far you're using WAY more binary digits than I am
(plus Escape Characters)
.
I admit that I am "Cheating" a tad by not including 2 & 5 as Primes in
the list as a matter of function per speed of grid list access, but I don't
see you winning the long race in Compact Prime Number List Storage.

Of course this method only REALLY saves space when you're going full
grid positional notation.
(Each binary '1' equals a prime, binary '0' equals a non-prime)

1110 (01,03,07,09) = 10*0 + prime end digit
1111 (11,13,17,19) = 10*1 + prime end digit
0101 (21,23,27,29) = 10*2 + prime end digit
1010 (31,33,37,39) = 10*3 + prime end digit

(1st line = 1, 11, 111 = 6 binary digits + 3 escape characters becomes 4
binary digits)
(2nd line = 1011, 1101, 10001, 10011 = 18 binary digits + escapes into 4
binary digits)
(3rd line = 10111, 11101 = 10 binary digits + escapes into 4 binary
digits)
(4th line = 11111, 100111 = 11 binary digits + escapes into 4 binary
digits)
Thusly 45 binary digits + 7 escape characters becomes 16 binary digits.
(Figured this out 3 decades ago, even when I was having "learning
difficulties", what were YOU doing then?)

==============

Oooh, I know, how about RELATIVE difference Prime storage?
1 = 1 (Binary 1)
2 = 2-1 = 1 (Binary 1)
3 = 3-2 = 1 (Binary 1)
5 = 5-3 = 2 (Binary 10)
7 = 7-5 = 2 (Binary 10)
11 = 11-7 = 4 (Binary 100)
13 = 13-11 = 2 (Binary 10)
17 = 17-13 = 4 (Binary 100)
19 = 19-17 = 2 (Binary 10)

Very compact.
I liked this method a long time ago (despite it lacking quick random
access).
Itty Bitty Problem. You have to store Escape Characters (they cost you)
too.
Ah, but let's be generous.
Primes 1 to 19 (Binary Difference Storage) = 1,1,10,10,100,10,100,10
16 Binary Digits (plus Escape Characters) versus my 12 Binary Digits so
far.
But let me be generous again and allow you to skip 2 & 5 as I do.
The Primes 1 to 19 (Binary Difference Storage) now is =
1,10,100,100,10,100,10
Which eats up 16 Binary Digits again (plus Escape Characters).
So let me be generous yet again.

After Prime Number 3, there are no more differences of 1 and all are
multiples of 2.
So you are free to drop one trailing zero after all Prime Difference
Storage locations.
The Primes 1 to 19 (Binary Difference Storage) now is =
1,1,10,10,1,10,1
Which is a much more compact list of 10 Binary Digits (plus Escape
characters).

Oh, but don't let me stop you.
I believe the Escape Characters are doing that job pretty damned well
already.




George Johnson

unread,
Mar 19, 2012, 11:36:50 PM3/19/12
to
"Willem" <wil...@toad.stack.nl> wrote in message
news:slrnjmenhi....@toad.stack.nl...
For the primes up to 3559 I will require
355 * 4 = 1420 Binary Digits of memory.
Prime(499) = 3559
499 = Binary 1 1111 0011 compared to 3559 = Binary 110111100111
7 bits versus 12 bits, a saving of 4 bits.
What you gain in space savings you lose in computational time.

You also need a list of the Nth Primes stored in some compact form (or
you will be required to recompute this list at a later time).



George Johnson

unread,
Mar 21, 2012, 12:34:10 AM3/21/12
to
"Willem" <wil...@toad.stack.nl> wrote in message
news:slrnjmenhi....@toad.stack.nl...
At any rate, since the amusement factor is wearing off and the belly
laughs for this day are fading.

Here is a simpler question.
If you are merely storing the Nth Prime... (making a list, checking it
twice, seeing whom can only be factored by itself & 1 very nice)
Prime(499) = 3559

At what data points does the costs of using your method to store numbers
begin paying off as a regular deal? I am not denying your ability to
compress numbers with this method, it does work in certain circumstances),
but I am strongly emphasizing that you delimit your outputs to only the
ranges that this method works best with. Don't forget that your ESCAPE
CHARACTERS are going to cost you unless you have positional or relative
notation storage tricks.

Suggestion, store your (Odd/Even) bit marker at the beginning of your
"compressed" value.
Ergo:
(Odd Bit) + Prime(499) + Prime(7) =
1 + 3559 + 13 = 3573
1 + 111110011 + 111 = 1101 1111 0101
(1 bit) [ESC Char] (9 bits) [ESC Char] (3 bits) = (12 bits)
(13 bits) + [2 ESC Char] = (12 bits)
NO PAYOFF IN COMPRESSION

Just make a list of what number pair ranges WORK for Data Compression
(you use less bits + Escape Chars than what would be required to just store
the original number bitwise) and then we can get past my guffaws and I will
help you a bit if it continues to amuse me. I will suggest your Nth Prime
be considerably larger than my simple (for simple minds) examples to
yourself here. If I've made any silly mistakes, well, it is not like this
group really matters enough to seriously double check my work for the usual
giggle-posts. I've dropped some good research directional hints to these
jokers in the past and none have even lifted the lightest finger to research
or follow-through, so you don't owe them anything and you will only cheat
yourself if you simply give up when the mockery begins.

========================

Hacking Data Compression Lesson 2 Basics
By Andy McFadden
GEnie A2Pro Apple II University - Copyright (C) 1992 GEnie
http://www.fadden.com/techmisc/hdc/lesson02.htm

The usual way to form codes is with a distinguished "escape" character.
Escape characters don't have anything to do with the Esckey (ASCII 27);
that's just what they're called. Think about how hitting the Esc key in
most programs makes the program stop its current activities and do something
else. The usage here is similar.

When a run of characters is encountered, say:

lda #$1234 ;load the secret number

an RLE encoder would compress the "run" of spaces between the "1234" and the
";" by outputting the escape character, then a space, and then the number of
spaces. If your escape character were ']', your output would look like:

lda #$1234] 8;load the secret number
^^^
|||--- count == 8 spaces
||--- character is a space
|--- escape character

Decoding is just the opposite of encoding. The input data is copied to the
output until an escape character is hit. At that point the decoder reads
the character and the count, outputs that many characters, and then resumes
copying data.

That's basically it. However, there is one slight complication: what
happens if the escape character appears in the input? If we just copied it
to the output, the decoder would see it as the start of a run of characters,
and would output garbage. The solution is to output it as a one-byte run of
characters, so you'd output an escape
character (indicating start of run), another escape character (indicating
the character to use), and then a one (number of characters).

Note that we can still encode runs of escape characters in the usual
way. Since we use 3 bytes to output a run, we only lose ground if we
encounter a solo escape character or two in a row. If we encounter
three escape characters in a row, we would output "]]3", which is
exactly the same size. Thus, the maximum expansion for RLE is +50% of
the original size, for a file which looks like "]a]a]a]a]a]a]a".

From this we can see that the choice of escape character is fairly
important (the value $db - high-ASCII '[' - is a popular choice). It
should be the least frequently seen character in the input. We can
make it less important by CHANGING it every time. For example, we
could add some number relatively prime to 256 (say 51) to it every
time. That way we only repeat escape characters once every 256 runs,
so a burst of '$db's won't screw us up. So long as both the encoder
and the decoder adjust the escape value after each run, there's no
chance of confusion.

Note that this does make the data more susceptible to transmission
errors. Text compressed with RLE is still more or less readable. If
some error (say, modem line noise) caused the decoder to become out of
sync with the encoder, it would look for the wrong escape character
and would start spitting out garbage periodically.

(NOTE: I don't know if 51 is a good choice or not. Seems nice
enough.)

-=*=-

Note that it isn't necessary to do the encoding exactly as described.
Some routines reverse the order of character and escape code; it may
make the encoder or decoder simpler in certain situations. It's also
possible to avoid using an escape character altogether by assuming
that two identical characters will be followed by a length byte. For
example, if the input were:

abcDDDDefGGhi

the output would be:

abcDD2efGG0hi
^ ^
| |--- zero Gs follow
|--- two more Ds follow

indicating that there are two more 'D's, for a total of four. If there are
only two characters, as with the 'G's, an extra 0 must be added. So the
worst case for this scheme is a series of double characters (like
"aabbccdd"), which must be encoded in three bytes ("aa0bb0cc0dd0"). Again,
this gives a maximum increase of +50%, so we aren't gaining anything. If
the input text contains more occurrences of double characters than
occurrences of the escape character, we will do worse.

-=*=-

A few efficiency notes...

Since we represent runs with three bytes, there is no reason to represent
runs of two or three bytes specially (in fact, specifying a run of two bytes
would be a loss). In terms of decompression speed, the overhead involved in
outputting a run of three characters is higher than that for just copying
three characters to the output, so RLE encoders should only encode runs of
four or more.

Whenever we decode a run, we can assume that there will be at least one copy
of that character in the output (why else would we be doing it?). Thus the
sequence "escape|char|0" has no meaning; we just told the decompressor that
there was a run of 0 bytes, so please get on the ball and output all zero of
those bytes right this very minute.

We ought to change the meaning slightly, so that the length byte is one LESS
than the length of the run. Now a length of zero represents one byte, and
255 represents a run of 256. In fact, if it weren't for the special
handling needed for the escape character, we could treat the length byte as
length-4.

(And if you want to get really confusing, you could handle the length byte
for runs of escape characters specially. But the extra overhead involved is
probably not worth the effort.)

[More at webpage]


George Johnson

unread,
Mar 22, 2012, 11:46:13 AM3/22/12
to
"Willem" <wil...@toad.stack.nl> wrote in message
news:slrnjmenhi....@toad.stack.nl...
Yawn, and one more thing.
Since we are worrying about BITWISE lengths of:
0/1+Prime(N)+Prime(M) < [Target Number]
(Binary digit 0 or 1) [ESC] Prime(N) [ESC] Prime(M) < [Target Number]

Here is a suggestion to speed up your search zones for efficiency.
http://primes.utm.edu/nthprime/

(Base 10) Prime(N) or Prime(M) = 1,3,7,15,31, 63, ...
(Base 02) Prime(N) or Prime(M) = 1, 11, 111, 1111, 11111, 111111, ...
Thus T=1 to infinity; with the formula (2 ^ T) -1

Because frankly you are trying to reduce the binary string length of
0/1+Prime(N)+Prime(M) to be less than your [Target Number] or no compression
is possible.

So do a range search of Prime( (2^T) - 1) until your bitwise summation
is smaller than your target number bitwise.
(Even/Odd bit) + Prime(63) + Prime(15) = 355
1 [ESC] 111111 [ESC] 1111 = 101100011
11 bits (2 ESC chars) > 9 bits = NO SAVINGS
1+ 307 + 47 = 355

I will suggest against strongly that you consider your search ranges in
the higher zones.
(2 ^ T) -1 with T = 56
(2 ^ 56) -1 = 72057594037927935

isthereanymorecoffee? ("Iron Giant" movie clip)
http://www.youtube.com/watch?v=ih9hH0Z5hGU

But I don't think I'm smarter, I just do the stupid homework. If
everyone else JUST DID THE STUPID HOMEWORK, they could move up? a grade and
get pounded, too!



Ernst

unread,
Apr 6, 2012, 11:16:00 PM4/6/12
to
Um, If there is a dataset such that one can simply index that to obtain some data it woudl require the size of the data plus at least one bit
Um, this is a simple conversation with no qualifiers and no proof. I do not want to fight. I have nothing to defend.

As far as I can tell it is possible to do something similar to what you suggest.
I think you are describing an Indirect Transfer Protocol. Such that Alice gets a data and from that she constructs the message. But the actual message isn't the data she gets.

I just asked if anyone on BitTorrent was interested in an Indirect Transfer Protocol.

No one was interested so I am moving on to other projects.


You have to admit that if we can download the instructions to construct a file and not traffic in the actual data there are legal aspects. Perhaps more for China and such than the USA. I admit I'd like to see more freedom for many of this world.

Anyway.. This is a strictly opinion-based reply; I do see it. I do know how I would construct such a thing.

Good luck! I think I get your drift.

michael....@gmail.com

unread,
Sep 28, 2012, 4:20:48 PM9/28/12
to
On Monday, March 19, 2012 8:15:04 PM UTC-7, George Johnson wrote:
> "Willem" wrote in message
> Prime(1) = 1 (Binary 1)
> Prime(2) = 2 (Binary 10)
> Prime(3) = 3 (Binary 11)
> Prime(4) = 5 (Binary 101)
> Prime(5) = 7 (Binary 111)
> Prime(6) = 11 (Binary 1011)
> Prime(7) = 13 (Binary 1101)
> Well, let's see, so far you're using WAY more binary digits than I am
> (plus Escape Characters)
> I admit that I am "Cheating" a tad by not including 2 & 5 as Primes in
> the list as a matter of function per speed of grid list access, but I don't
> see you winning the long race in Compact Prime Number List Storage.
> Of course this method only REALLY saves space when you're going full
> grid positional notation.
> (Each binary '1' equals a prime, binary '0' equals a non-prime)
> 1110 (01,03,07,09) = 10*0 + prime end digit
> 1111 (11,13,17,19) = 10*1 + prime end digit
> 0101 (21,23,27,29) = 10*2 + prime end digit
> 1010 (31,33,37,39) = 10*3 + prime end digit
> (1st line = 1, 11, 111 = 6 binary digits + 3 escape characters becomes 4
> binary digits)
> (2nd line = 1011, 1101, 10001, 10011 = 18 binary digits + escapes into 4
> binary digits)
> (3rd line = 10111, 11101 = 10 binary digits + escapes into 4 binary
> digits)
> (4th line = 11111, 100111 = 11 binary digits + escapes into 4 binary
> digits)
> Thusly 45 binary digits + 7 escape characters becomes 16 binary digits.
> (Figured this out 3 decades ago, even when I was having "learning
> difficulties", what were YOU doing then?)

Aside from the bug where you have 1 marked as prime, you've just described a compact version of the Fundamental Prime Constant that tells if any number is prime:

Sum( i = 1, inf, if IsPrime(i) then 2^-i else 0 )
= 0.41468250985111166024810962215430770836577423813791697786824541448864096...

See:
http://oeis.org/A051006
web.axelero.hu/fadorjan/aronsf.pdf

i.e.
i = decimal place after the decimal point.
so....

0.0110101...
........^ 7 is prime, so the 7th bit after the decimal point is set
......^ 5 is prime, so the 5th bit after the decimal point is set
....^ 3 is prime, so the 3rd bit after the decimal point is set
...^ 2 is prime, so the 2nd bit after the decimal point is set
..^ 1 is not prime, so the 1st bit after the decimal is NOT set

In practice you multiply the constant by 4 and subtract 1 (to remove storing '1', '2') and you would also remove every other zero since the constant is also storing the fact that every even number is redundantly stored as a 0 in every other bit (that is, every even bit position.)

Removing all the even numbers ...

With 45 bits we can test numbers up to 91
0.111011011010011001011010010011001011001010010
.......^ 13 yes
......^ 11 yes
.....^ 9 no
....^ 7 yes
...^ 5 yes
..^ 3 yes

This sequence has the form:
n = bit_position*2 + 3

IsPrime_TableOdd( n )
bit_position = (n - 3) / 2;
n_is_prime = bits[ bit_position ]

i.e. IsPrime_TableOff( 13 )
bit_position = (13-3)/2 = 5, bits[5] = true; thus 13 is prime


However, this still not a very optimal storage scheme since we are *still* storing redundant zeroes.
We can do better since *every* prime > 5 is ALWAYS of the form: 6n+/-1

Thus with 44 bits we can test numbers up to 6*(46/2) = 134 which is a ~50% increase in search space!
0.111111101101111010110111011010011111100001101100
.........^ 25 no
........^ 23 yes
.......^ 19 yes
......^ 17 yes
.....^13 yes
....^11 yes
...^7 yes
..^ 5 yes

i.e.
5, 7: 11
11,13: 11
17,19: 11
23,25: 10
29,31: 11
35,37: 01
41,43: 11
47,49: 10
53,55: 10
59,61: 11
65,67: 01
71,73: 11
77,79: 01
83,85: 10
89,91: 10
95,97: 01
101,103: 11
107,109: 11
113,115: 10
119,121: 00
125,127: 01
131,133: 10

IsPrime_Table6n( n )
if (n < 7) return true if n is 2, 3, 5, else false
if n is even return false

add1 = (n+1) mod 6
sub1 = (n-1) mod 6
if (add1 is zero) or (sub1 is zero)
bit_position = floor( n/3 ) - 1
else
return false

n_is_prime = bits[ bit_position ];

Fast, and compact. ;-)

Michael

Michael Pohoreski

unread,
Sep 29, 2012, 5:19:03 AM9/29/12
to
On Monday, March 19, 2012 8:36:50 PM UTC-7, George Johnson wrote:
> "Willem" wrote in message
> For the primes up to 3559 I will require
> 355 * 4 = 1420 Binary Digits of memory.

You only need 1186 bits if you store the pairs 6n-1 and 6n+1, where n > 0, in the array. Use the bit position as in index into the array.

i.e.
0.111111101101111010110111011010011111100001101100 ...
.........^ 25 no
........^ 23 yes
.......^ 19 yes
......^ 17 yes
.....^13 yes
....^11 yes
...^7 yes
..^ 5 yes

See my post below for details on how to convert n to bit_position.

Thomas Richter

unread,
Sep 29, 2012, 5:43:36 PM9/29/12
to
On 29.09.2012 11:19, Michael Pohoreski wrote:
> On Monday, March 19, 2012 8:36:50 PM UTC-7, George Johnson wrote:
>> "Willem" wrote in message
>> For the primes up to 3559 I will require
>> 355 * 4 = 1420 Binary Digits of memory.
>
> You only need 1186 bits if you store the pairs 6n-1 and 6n+1, where n> 0, in the array. Use the bit position as in index into the array.

Actually, you need zero bits to store it. If you want to retrieve the
n'th prime, just compute it.

IOW, I'm not quite sure what you're trying to say. Yes, of course you
can compress the sequence of the primes by a suitable algorithm and
represent it more compactly. Such a suitable algorithm is the classical
sieve, for example, and the most compact representation of it is zero bits.

Greetings,
Thomas


George Johnson

unread,
Sep 30, 2012, 7:02:28 AM9/30/12
to

<michael....@gmail.com> wrote in message
news:23cb3815-f50a-4898...@googlegroups.com...
It is still much less efficient in space storage versus using 4 bits to
store a chunk of 4 potential primes per ever 10 increment. The binary
difference storage method saves some space at the outset, but wastes space
every time from the jump of 3 to 7.

Easily proven with the below routine.

N*10 + {ending in 1,3,7,9} (binary digit 0 = not prime; binary digit
1=prime)

{1,3,7,9} gaps:
11-9=2
3-1=2
7-3=4
9-7=2

Let's say you have a set of 100% primes, then your gap list (divided by
2 after it reaches 5) will always have a gap at the 3 to 7 jump.
Hypothetical all primes string in binary: 1101111011110111101111011...
As each prime that ends in 3 jumps to a prime ending in 7, then it has
an extra 01 jump.



George Johnson

unread,
Sep 30, 2012, 7:14:52 AM9/30/12
to

"Michael Pohoreski" <michael....@gmail.com> wrote in message
news:4ae00e74-68b4-4a49...@googlegroups.com...
Anyway, I'm a tad bored on this.
SO LET US GET NON-BORING!

One of the problems of data transmission is ERROR CORRECTION.
Well, with most error-correction methods, you have to burn an extra 2
bits per 4 bits in play.

So what if we could bypass that ENTIRELY and use 0% extra space for CRC
coding?
How, you may ask?
Well, by using a pre-generated 100% compact bitstream as a XOR reference
mask.
Use any non-repeating sequence in binary (square root of 2 or PI or
non-integer number conversion stream or my compact prime number list or any
long pseudo-random stream).

Using the XOR reference mask, a stream of 0 bits would be the exact
equal to the non-repeating generator set upon reception. Any variance in the
bit-stream would indicate a variance from the non-repeating XOR'd output.
The costs are found in computational time in the sender & receiver's
computers. Since PI has a math trick to get the HEXadecimal digits of PI at
any location, the costs of even that are minimal.

Bellard's formula
http://en.wikipedia.org/wiki/Bellard%27s_formula

Bailey-Borwein-Plouffe formula
http://en.wikipedia.org/wiki/Bailey%E2%80%93Borwein%E2%80%93Plouffe_formula

Spigot algorithm
http://en.wikipedia.org/wiki/Spigot_algorithm
A spigot algorithm is a type of algorithm used to compute the value of a
mathematical constant such as ? or e. Spigot algorithms are unique because
they do not require the total number of digits to be fixed beforehand, and
do not require the computation of several intermediate results which are
combined to produce the final result.[1] There are two kinds of spigot
algorithms: (1) those that can produce a single, arbitrary digit (also
called digit extraction algorithm); and (2) those that produce a sequence of
digits, one after the other. The Bailey-Borwein-Plouffe formula is a digit
extraction algorithm for ? which produces hexadecimal digits. A sequential
spigot algorithm for ? was produced by Stanley Rabinowitz and Stanley Wagon
(this algorithm is sometimes referred to as "the spigot algorithm for
?").[2] Spigot algorithms that produce a sequence of digits begin producing
digits and produce them continuously, rather than waiting for the entire
algorithm to finish. Spigot algorithms are known for ? and e. Spigot
algorithms typically work in a particular radix, such as hexadecimal or
binary number.



George Johnson

unread,
Sep 30, 2012, 8:47:15 AM9/30/12
to

"Thomas Richter" <th...@math.tu-berlin.de> wrote in message
news:k47q26$h8a$1...@news2.informatik.uni-stuttgart.de...
So how do you know which "Nth Prime" it is if you don't have a reference
list?
Tell me what is the 4875th "Nth Prime" without using a list or a list
lookup tool like Mathworld or another website like it. There is no such
thing as a Nth Prime unless you have a stored list somewhere, which requires
hard drive space or an Internet connection to remotely download it (which
takes computation time & bandwidth off that website).

Or, you have to sieve out all of the prior prime numbers, instantly
discard them, then arrive at your answer. Which wastes computational time
unless you store a list on your hard drive.

The next easiest way would be to use a precomputed statistical
approximation graph of the primes per a location on the number line (but
will not be as accurate as a precomputed list).

The Nth Prime Algorithm
A prime page by Andrew Booker
http://primes.utm.edu/nthprime/algorithm.php
In order to find a prime quickly, the nth prime program uses a large stored
data table to get close to the right answer first, then finishes with a
relatively short computation. To see how this works, imagine the number
line broken into bins, each of size N, i.e. the first is from 0 to N-1, the
second from N to 2N-1, etc.


Thomas Richter

unread,
Sep 30, 2012, 12:25:36 PM9/30/12
to
On 30.09.2012 14:47, George Johnson wrote:
> "Thomas Richter"<th...@math.tu-berlin.de> wrote in message
> news:k47q26$h8a$1...@news2.informatik.uni-stuttgart.de...
>> On 29.09.2012 11:19, Michael Pohoreski wrote:
>>> On Monday, March 19, 2012 8:36:50 PM UTC-7, George Johnson wrote:
>>>> "Willem" wrote in message
>>>> For the primes up to 3559 I will require
>>>> 355 * 4 = 1420 Binary Digits of memory.
>>>
>>> You only need 1186 bits if you store the pairs 6n-1 and 6n+1, where n>
>>> 0, in the array. Use the bit position as in index into the array.
>>
>> Actually, you need zero bits to store it. If you want to retrieve the n'th
>> prime, just compute it.
>>
>> IOW, I'm not quite sure what you're trying to say. Yes, of course you can
>> compress the sequence of the primes by a suitable algorithm and represent
>> it more compactly. Such a suitable algorithm is the classical sieve, for
>> example, and the most compact representation of it is zero bits.
>>
>> Greetings,
>> Thomas
>
> So how do you know which "Nth Prime" it is if you don't have a reference
> list?

By writing a "decompressor" from a zero bit input, of course. And the
decompressor is the sieve algorithm. There was no complexity limitation
stated in the problem, so why does it have to be an O(N) algorithm to
"decompress" N digits?

> Tell me what is the 4875th "Nth Prime" without using a list or a list
> lookup tool like Mathworld or another website like it.

Why "without"? Why do I have it "without it"? Decompession also works
*with* an algorithm, so does it here.

> There is no such
> thing as a Nth Prime unless you have a stored list somewhere, which requires
> hard drive space or an Internet connection to remotely download it (which
> takes computation time& bandwidth off that website).

Sorry for my simple-minded platonism, but I'm fairly sure that there is
a n-th prime even without having computed it. I'm also certain that
there are infinitely many, thus there is certainly a 4875th prime.

> Or, you have to sieve out all of the prior prime numbers, instantly
> discard them, then arrive at your answer. Which wastes computational time
> unless you store a list on your hard drive.

And your point is? Decompression is always a "waste of computational
power". You could have stored the uncompressed original in first place!


0 new messages