Suppose that Bob wants to send a message of data to Alice through a
computer network, the message can be represented as a number. Both Bob
and Alice have a list of prime numbers in a table and for each such a
number they know an index in the table.
According to Goldbach's conjecture, any even number can be represented
as a sum of two primes.
where p, q are primes and f is a flag which is 1 if m is odd,
otherwise it is 0
So it would work like this:
find such p and q that p + q = m - f
and their position in the table of primes
send their positions through the network
The aim is actually not to compress all data, but knowing that Bob and
Alice have both a set of same data find an usefuk information that can
be send with less cost.
<wasei...@gmail.com> wrote:
>Suppose that Bob wants to send a message of data to Alice through a
>computer network, the message can be represented as a number. Both Bob
>and Alice have a list of prime numbers in a table and for each such a
>number they know an index in the table.
>According to Goldbach's conjecture, any even number can be represented
>as a sum of two primes.
>where p, q are primes and f is a flag which is 1 if m is odd,
>otherwise it is 0
>So it would work like this:
>find such p and q that p + q = m - f
>and their position in the table of primes
>send their positions through the network
As usual, the number of bits this sort of scheme generates is larger
than the inputs. Let's say you were compressing 16 bit blocks. There
are 6541 odd primes below 65536 (as 2 doesn't apply to this
situation). You scheme will require sending two indexes (~12.7 bits
each) plus your "odd" flag, for a ballpark of 26.3 bits for each 16
bit input block. A bit of cleverness in encoding ought to shave a
couple of bits off that, but you're still well into negative
territory.
>The aim is actually not to compress all data, but knowing that Bob and
>Alice have both a set of same data find an usefuk information that can
>be send with less cost.
That's called the model. Every useful (lossless) compression
technique has a model it applies to the input stream. To the extent
that the input stream matches the expected model, you get compression.
But when it doesn't match, you get an expansion. In some cases fixed
dictionaries can be effectively used. Unfortunately, the better the
model is at compressing data it matches, the less possible inputs will
match.
On Sunday, March 18, 2012 9:17:11 AM UTC+1, Waseihou Watashi wrote:
> Suppose that Bob wants to send a message of data to Alice through a
> computer network, the message can be represented as a number. Both Bob
> and Alice have a list of prime numbers in a table and for each such a
> number they know an index in the table.
> According to Goldbach's conjecture, any even number can be represented
> as a sum of two primes.
> where p, q are primes and f is a flag which is 1 if m is odd,
> otherwise it is 0
> So it would work like this:
> find such p and q that p + q = m - f
> and their position in the table of primes
> send their positions through the network
> The aim is actually not to compress all data, but knowing that Bob and
> Alice have both a set of same data find an usefuk information that can
> be send with less cost.
Would it be possible to find p,q,s such that:
M = P1^2 + S1
S1 = P2^2 + S2
S2 = P3^2 + S3
...
Si = Pi+1^2 + Si+1
...
Sn is a prime number
Pi is a prime number
and the algorithm would be fast enough?
Maybe I actually want a method how to construct an effective algorithm for a given block of data that compress it or at least represent it and it uses only primers or other numbers from well known sequences so that those numbers cannot be subjected to any patent or copyright.
On Mar 18, 3:17 pm, Waseihou Watashi <wasei...@gmail.com> wrote:
> The aim is actually not to compress all data, but knowing that Bob and
> Alice have both a set of same data find an usefuk information that can
> be send with less cost.
Unclear.
On Mar 18, 5:19 pm, Waseihou Watashi <wasei...@gmail.com> wrote:
> Maybe I actually want a method how to construct an effective algorithm for a given block of data that compress it or at least represent it and it uses only primers or other numbers from well known sequences so that those numbers cannot be subjected to any patent or copyright.
The digits of pi are often suggested as a "well known" sequence.
But I'm still unclear. Are you suggesting something like the
following?
"To read Dan Brown's latest book for free, start at the
6,744,310,499,540,213'th bit of pi and run the next million
bits through 'unzip'."
On Mar 18, 4:56 pm, Robert Wessel <robertwess...@yahoo.com> wrote:
> As usual, the number of bits this sort of scheme generates is larger
> than the inputs....
> You scheme will require sending two indexes (~12.7 bits each)...
One of the primes can *often* be very small so you can send it
second and use a coding suitable for usually-small numbers,
e.g. a variation of Elias-gamma.
But there is another reason why some expansion is inevitable
in this scheme. When there are multiple solutions to n = p+q,
the (p,q) you transmit will not only provide the information n,
but also provide the information of *which* of the several
possible (p,q) you chose to send. This information may have no
value to the receiver but nevertheless will translate to
wasted space in the compressed file.
Many compression schemes have such wastage. I wonder if anyone
knows of methods in real-world use that recover useful info
from such redundant wastage. The amount of such info easily
recovered may be small, but could still be exploited, e.g. as
digital signatures or secret channels.
On Sunday, March 18, 2012 2:59:46 PM UTC+1, James Dow Allen wrote:
> On Mar 18, 3:17 pm, Waseihou Watashi <wasei...@gmail.com> wrote:
> > The aim is actually not to compress all data, but knowing that Bob and
> > Alice have both a set of same data find an usefuk information that can
> > be send with less cost.
> Unclear.
> On Mar 18, 5:19 pm, Waseihou Watashi <wasei...@gmail.com> wrote:
> > Maybe I actually want a method how to construct an effective algorithm for a given block of data that compress it or at least represent it and it uses only primers or other numbers from well known sequences so that those numbers cannot be subjected to any patent or copyright.
> The digits of pi are often suggested as a "well known" sequence.
> But I'm still unclear. Are you suggesting something like the
> following?
> "To read Dan Brown's latest book for free, start at the
> 6,744,310,499,540,213'th bit of pi and run the next million
> bits through 'unzip'."
> Is yours a question about the law? :-}
> James
No, it is not about law, that is only my reason because I personally believe that there should exist absolut frdeeom of eprXesiSon and that could be accomplished only if it is not possible to find out who is the originator of a message.
Whatever, from now I will focus only on asking question related to the compression part. I know that the approach with Pi could be also possible in theory, but it seems to me better to go with prime numbers and integer arithmetics.
What is needed is that Alice and Bob share a large resource like several millions of prime numbers, and Alice want to send a compressed message to Bob which can be used to reconstruct the data. The amount of data actually needed to reconstruct the message might be several times bigger than the original message, the trick should be that compressed message would be small. We can use computing power of both Alice and Bob, they have computers and time.
For example:
1. Alice want to send message M to Bob, M is a big even integer.
2. She finds out two primes P,Q so that M = P + Q, P <= Q.
3. She make a compressed message that contains information on how to find P, Q: The information can be for examle hash(P), hash(Q). hash(X) := X mod BIG_PRIME.
The message also contains SHA-256(M) checksum, maybe some other could be used.
4. Bob will receive the message and search his hash table in an attempt to find all possible pairs of Pi Qj which he will use to reconstruct candidate soulution Mc = Pi + Qj and found all Mc's where SHA-256(Mc) = SHA-256(M). Hopefully he will found only one which is the file. So Bob does not get the full information but only a range where he should search and how to decide if the result is correct.
The trick is that SHA-256 produces output with constant size no matter how input is big and the probability of collision is VERY LOW. Of course several hashes could be chained for better verification. It is still theoretically possible that collision would happen and junk would be obtained, but that's ok as long as the probability is negligible.
Only hashes could be sent through network and because they are small, it is much easier to send them through networks like i2p, tor or hidden with steganography into some image. Those methods all provides a reasonable amount of amNonyMtmity for poster.
The problem is searching the space which has O(n^2) and of course founding the primes.
> The digits of pi are often suggested as a "well known" sequence.
> But I'm still unclear. Are you suggesting something like the
> following?
> "To read Dan Brown's latest book for free, start at the
> 6,744,310,499,540,213'th bit of pi and run the next million
> bits through 'unzip'."
> Suppose that Bob wants to send a message of data to Alice through a
> computer network, the message can be represented as a number. Both Bob
> and Alice have a list of prime numbers in a table and for each such a
> number they know an index in the table.
> According to Goldbach's conjecture, any even number can be represented
> as a sum of two primes.
Whew, there are ONLY EVEN NUMBERS in this universe? I was worried for a moment there.
M = (binary digit 0 or 1) + prime number + (escape character) + another prime number
I'm assuming that you're using the "most compact method of storing primes" (all primes except 2 & 5, in Base 10 end in 1 or 3 or 7 or 9, therein, storing the end result + a multiple of 10 is the most compact unit).
00 = 1
01 = 3
10 = 7
11 = 9
5-digit prime number:
(Base 10) 11027 = (Binary) 10101100010011 (14 binary digits)
(Base 10) 1102 (*10 which you can drop) + 7 =
(Binary) 10001001110 (11 binary digits + append) 10 (binary substitution of 7) = 1000100111010 (13 binary digits)
Of course this method only REALLY saves space when you're going full grid positional notation.
(Each binary '1' equals a prime, binary '0' equals a non-prime)
1110 (01,03,07,09) = 10*0 + prime end digit
1111 (11,13,17,19) = 10*1 + prime end digit
0101 (21,23,27,29) = 10*2 + prime end digit
1010 (31,33,37,39) = 10*3 + prime end digit
(1st line = 1, 11, 111 = 6 binary digits + 3 escape characters becomes 4 binary digits)
(2nd line = 1011, 1101, 10001, 10011 = 18 binary digits + escapes into 4 binary digits)
(3rd line = 10111, 11101 = 10 binary digits + escapes into 4 binary digits)
(4th line = 11111, 100111 = 11 binary digits + escapes into 4 binary digits)
Thusly 45 binary digits + 7 escape characters becomes 16 binary digits.
(Figured this out 3 decades ago, even when I was having "learning difficulties", what were YOU doing then?)
> where p, q are primes and f is a flag which is 1 if m is odd,
> otherwise it is 0
> So it would work like this:
> find such p and q that p + q = m - f
> and their position in the table of primes
> send their positions through the network
> The aim is actually not to compress all data, but knowing that Bob and
> Alice have both a set of same data find an useful information that can
> be send with less cost.
Prime number (base 10) 12907 + (base 10) 16993 = 29900
Prime number (base 02) 11.0010.0110.1011 + 100001001100001 = 111010011001100
12907 = 11001001101011 (14 binary digits)
16993 = 100001001100001 (15 binary digits)
29900 = 111010011001100 (15 binary digits)
You'd require 29 binary digits + 1 appended binary digit to represent (even or odd) = 30 binary digits
To then represent a number of 15 binary digit length?
(And we assume your target number is 29900 or 29911)
Making the binary digit string into: 1110100110011000 or 1110100110011001
Using the generic "Table of Primes":
http://primes.utm.edu/nthprime/ (Nth Prime)
prime(1536) = 12907 + prime(1960) =16993
(base 10) 1536 = (base 02) 11000000000 (11 binary digits)
(base 10) 1960 = (base 02) 11110101000 (11 binary digits)
Thusly you'd require at minimum 22 binary digits + 1 appended binary digit = 23 binary digits to represent a number of 15 binary digit length.
Even converting using my most compact method is rather more efficient than yours.
"Keep fucking that chicken" kid, you'd provided me with my daily mathematical laugh.
You weren't too big on the whole concept of HOMEWORK in school, were you?
> The message also contains SHA-256(M) checksum, maybe some other could be > used.
> 4. Bob will receive the message and search his hash table in an attempt to > find all possible pairs of Pi Qj which he will use to reconstruct > candidate soulution Mc = Pi + Qj and found all Mc's where SHA-256(Mc) = > SHA-256(M). Hopefully he will found only one which is the file. So Bob > does not get the full information but only a range where he should search > and how to decide if the result is correct.
> The trick is that SHA-256 produces output with constant size no matter how > input is big and the probability of collision is VERY LOW. Of course > several hashes could be chained for better verification. It is still > theoretically possible that collision would happen and junk would be > obtained, but that's ok as long as the probability is negligible.
> Only hashes could be sent through network and because they are small, it > is much easier to send them through networks like i2p, tor or hidden with > steganography into some image. Those methods all provides a reasonable > amount of amNonyMtmity for poster.
> The problem is searching the space which has O(n^2) and of course founding > the primes.
So why not have a bit more cleverness and reformat your data blocks?
1110100000001011101110011
Becomes a 5x5 grid
11101
00000
00101
11011
10011
Take the SHA-256 of that value
Rotate your grid 90 degrees
11001
01001
00101
11000
11101
1100101001001011100011101
Take the SHA-256 of that value
Since the data is identical in rotation, the SHA-256 hash collision will be very distinct between the two data sets.
You won't compress this data this way, but whom am I to stop a fool and their hobbies?
> The aim is actually not to compress all data, but knowing that Bob and
> Alice have both a set of same data find an usefuk information that can
> be send with less cost.
) "Waseihou Watashi" <wasei...@gmail.com> wrote in message ) news:6f3960da-3058-41fc-8770-0db735674034@z31g2000vbt.googlegroups.com...
)> Suppose that Bob wants to send a message of data to Alice through a
)> computer network, the message can be represented as a number. Both Bob
)> and Alice have a list of prime numbers in a table and for each such a
)> number they know an index in the table.
)>
)> According to Goldbach's conjecture, any even number can be represented
)> as a sum of two primes.
)
) Whew, there are ONLY EVEN NUMBERS in this universe? I was worried for a ) moment there.
)
)> http://en.wikipedia.org/wiki/Goldbach%27s_conjecture )>
)> Then the message m can be encoded in this way:
)>
)> m = p + q + f
)
) M = (binary digit 0 or 1) + prime number + (escape character) + another ) prime number
) I'm assuming that you're using the "most compact method of storing ) primes" (all primes except 2 & 5, in Base 10 end in 1 or 3 or 7 or 9, ) therein, storing the end result + a multiple of 10 is the most compact ) unit).
No, the most compact method of storing primes is to store the ordinal
number in the sequence of primes. IOW: for the Nth prime, you store
the number N.
SaSW, Willem
-- Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
> George Johnson wrote:
> ) "Waseihou Watashi" <wasei...@gmail.com> wrote in message
> ) > news:6f3960da-3058-41fc-8770-0db735674034@z31g2000vbt.googlegroups.com...
> )> Suppose that Bob wants to send a message of data to Alice through a
> )> computer network, the message can be represented as a number. Both Bob
> )> and Alice have a list of prime numbers in a table and for each such a
> )> number they know an index in the table.
> )>
> )> According to Goldbach's conjecture, any even number can be represented
> )> as a sum of two primes.
> )
> ) Whew, there are ONLY EVEN NUMBERS in this universe? I was worried > for a
> ) moment there.
> )
> )> http://en.wikipedia.org/wiki/Goldbach%27s_conjecture > )>
> )> Then the message m can be encoded in this way:
> )>
> )> m = p + q + f
> )
> ) M = (binary digit 0 or 1) + prime number + (escape character) + > another
> ) prime number
> ) I'm assuming that you're using the "most compact method of storing
> ) primes" (all primes except 2 & 5, in Base 10 end in 1 or 3 or 7 or 9,
> ) therein, storing the end result + a multiple of 10 is the most compact
> ) unit).
> No, the most compact method of storing primes is to store the ordinal
> number in the sequence of primes. IOW: for the Nth prime, you store
> the number N.
> SaSW, Willem
> -- > Disclaimer: I am in no way responsible for any of the statements
> made in the above text. For all I know I might be
> drugged or something..
> No I'm not paranoid. You all think I'm paranoid, don't you !
> #EOT
Well, let's see, so far you're using WAY more binary digits than I am (plus Escape Characters)
.
I admit that I am "Cheating" a tad by not including 2 & 5 as Primes in the list as a matter of function per speed of grid list access, but I don't see you winning the long race in Compact Prime Number List Storage.
Of course this method only REALLY saves space when you're going full grid positional notation.
(Each binary '1' equals a prime, binary '0' equals a non-prime)
1110 (01,03,07,09) = 10*0 + prime end digit
1111 (11,13,17,19) = 10*1 + prime end digit
0101 (21,23,27,29) = 10*2 + prime end digit
1010 (31,33,37,39) = 10*3 + prime end digit
(1st line = 1, 11, 111 = 6 binary digits + 3 escape characters becomes 4 binary digits)
(2nd line = 1011, 1101, 10001, 10011 = 18 binary digits + escapes into 4 binary digits)
(3rd line = 10111, 11101 = 10 binary digits + escapes into 4 binary digits)
(4th line = 11111, 100111 = 11 binary digits + escapes into 4 binary digits)
Thusly 45 binary digits + 7 escape characters becomes 16 binary digits.
(Figured this out 3 decades ago, even when I was having "learning
difficulties", what were YOU doing then?)
Very compact.
I liked this method a long time ago (despite it lacking quick random access).
Itty Bitty Problem. You have to store Escape Characters (they cost you) too.
Ah, but let's be generous.
Primes 1 to 19 (Binary Difference Storage) = 1,1,10,10,100,10,100,10
16 Binary Digits (plus Escape Characters) versus my 12 Binary Digits so far.
But let me be generous again and allow you to skip 2 & 5 as I do.
The Primes 1 to 19 (Binary Difference Storage) now is = 1,10,100,100,10,100,10
Which eats up 16 Binary Digits again (plus Escape Characters).
So let me be generous yet again.
After Prime Number 3, there are no more differences of 1 and all are multiples of 2.
So you are free to drop one trailing zero after all Prime Difference Storage locations.
The Primes 1 to 19 (Binary Difference Storage) now is = 1,1,10,10,1,10,1
Which is a much more compact list of 10 Binary Digits (plus Escape characters).
Oh, but don't let me stop you.
I believe the Escape Characters are doing that job pretty damned well already.
> George Johnson wrote:
> ) "Waseihou Watashi" <wasei...@gmail.com> wrote in message
> ) > news:6f3960da-3058-41fc-8770-0db735674034@z31g2000vbt.googlegroups.com...
> )> Suppose that Bob wants to send a message of data to Alice through a
> )> computer network, the message can be represented as a number. Both Bob
> )> and Alice have a list of prime numbers in a table and for each such a
> )> number they know an index in the table.
> )>
> )> According to Goldbach's conjecture, any even number can be represented
> )> as a sum of two primes.
> )
> ) Whew, there are ONLY EVEN NUMBERS in this universe? I was worried > for a
> ) moment there.
> )
> )> http://en.wikipedia.org/wiki/Goldbach%27s_conjecture > )>
> )> Then the message m can be encoded in this way:
> )>
> )> m = p + q + f
> )
> ) M = (binary digit 0 or 1) + prime number + (escape character) + > another
> ) prime number
> ) I'm assuming that you're using the "most compact method of storing
> ) primes" (all primes except 2 & 5, in Base 10 end in 1 or 3 or 7 or 9,
> ) therein, storing the end result + a multiple of 10 is the most compact
> ) unit).
> No, the most compact method of storing primes is to store the ordinal
> number in the sequence of primes. IOW: for the Nth prime, you store
> the number N.
> SaSW, Willem
> -- > Disclaimer: I am in no way responsible for any of the statements
> made in the above text. For all I know I might be
> drugged or something..
> No I'm not paranoid. You all think I'm paranoid, don't you !
> #EOT
For the primes up to 3559 I will require
355 * 4 = 1420 Binary Digits of memory.
Prime(499) = 3559
499 = Binary 1 1111 0011 compared to 3559 = Binary 110111100111
7 bits versus 12 bits, a saving of 4 bits.
What you gain in space savings you lose in computational time.
You also need a list of the Nth Primes stored in some compact form (or you will be required to recompute this list at a later time).
> George Johnson wrote:
> ) "Waseihou Watashi" <wasei...@gmail.com> wrote in message
> ) > news:6f3960da-3058-41fc-8770-0db735674034@z31g2000vbt.googlegroups.com...
> )> Suppose that Bob wants to send a message of data to Alice through a
> )> computer network, the message can be represented as a number. Both Bob
> )> and Alice have a list of prime numbers in a table and for each such a
> )> number they know an index in the table.
> )>
> )> According to Goldbach's conjecture, any even number can be represented
> )> as a sum of two primes.
> )
> ) Whew, there are ONLY EVEN NUMBERS in this universe? I was worried > for a
> ) moment there.
> )
> )> http://en.wikipedia.org/wiki/Goldbach%27s_conjecture > )>
> )> Then the message m can be encoded in this way:
> )>
> )> m = p + q + f
> )
> ) M = (binary digit 0 or 1) + prime number + (escape character) + > another
> ) prime number
> ) I'm assuming that you're using the "most compact method of storing
> ) primes" (all primes except 2 & 5, in Base 10 end in 1 or 3 or 7 or 9,
> ) therein, storing the end result + a multiple of 10 is the most compact
> ) unit).
> No, the most compact method of storing primes is to store the ordinal
> number in the sequence of primes. IOW: for the Nth prime, you store
> the number N.
> SaSW, Willem
> -- > Disclaimer: I am in no way responsible for any of the statements
> made in the above text. For all I know I might be
> drugged or something..
> No I'm not paranoid. You all think I'm paranoid, don't you !
> #EOT
At any rate, since the amusement factor is wearing off and the belly laughs for this day are fading.
Here is a simpler question.
If you are merely storing the Nth Prime... (making a list, checking it twice, seeing whom can only be factored by itself & 1 very nice)
Prime(499) = 3559
At what data points does the costs of using your method to store numbers begin paying off as a regular deal? I am not denying your ability to compress numbers with this method, it does work in certain circumstances), but I am strongly emphasizing that you delimit your outputs to only the ranges that this method works best with. Don't forget that your ESCAPE CHARACTERS are going to cost you unless you have positional or relative notation storage tricks.
Suggestion, store your (Odd/Even) bit marker at the beginning of your "compressed" value.
Ergo:
(Odd Bit) + Prime(499) + Prime(7) =
1 + 3559 + 13 = 3573
1 + 111110011 + 111 = 1101 1111 0101
(1 bit) [ESC Char] (9 bits) [ESC Char] (3 bits) = (12 bits)
(13 bits) + [2 ESC Char] = (12 bits)
NO PAYOFF IN COMPRESSION
Just make a list of what number pair ranges WORK for Data Compression (you use less bits + Escape Chars than what would be required to just store the original number bitwise) and then we can get past my guffaws and I will help you a bit if it continues to amuse me. I will suggest your Nth Prime be considerably larger than my simple (for simple minds) examples to yourself here. If I've made any silly mistakes, well, it is not like this group really matters enough to seriously double check my work for the usual giggle-posts. I've dropped some good research directional hints to these jokers in the past and none have even lifted the lightest finger to research or follow-through, so you don't owe them anything and you will only cheat yourself if you simply give up when the mockery begins.
The usual way to form codes is with a distinguished "escape" character. Escape characters don't have anything to do with the Esckey (ASCII 27); that's just what they're called. Think about how hitting the Esc key in most programs makes the program stop its current activities and do something else. The usage here is similar.
When a run of characters is encountered, say:
lda #$1234 ;load the secret number
an RLE encoder would compress the "run" of spaces between the "1234" and the ";" by outputting the escape character, then a space, and then the number of spaces. If your escape character were ']', your output would look like:
lda #$1234] 8;load the secret number
^^^
|||--- count == 8 spaces
||--- character is a space
|--- escape character
Decoding is just the opposite of encoding. The input data is copied to the output until an escape character is hit. At that point the decoder reads the character and the count, outputs that many characters, and then resumes copying data.
That's basically it. However, there is one slight complication: what happens if the escape character appears in the input? If we just copied it to the output, the decoder would see it as the start of a run of characters, and would output garbage. The solution is to output it as a one-byte run of characters, so you'd output an escape
character (indicating start of run), another escape character (indicating the character to use), and then a one (number of characters).
Note that we can still encode runs of escape characters in the usual
way. Since we use 3 bytes to output a run, we only lose ground if we
encounter a solo escape character or two in a row. If we encounter
three escape characters in a row, we would output "]]3", which is
exactly the same size. Thus, the maximum expansion for RLE is +50% of
the original size, for a file which looks like "]a]a]a]a]a]a]a".
From this we can see that the choice of escape character is fairly
important (the value $db - high-ASCII '[' - is a popular choice). It
should be the least frequently seen character in the input. We can
make it less important by CHANGING it every time. For example, we
could add some number relatively prime to 256 (say 51) to it every
time. That way we only repeat escape characters once every 256 runs,
so a burst of '$db's won't screw us up. So long as both the encoder
and the decoder adjust the escape value after each run, there's no
chance of confusion.
Note that this does make the data more susceptible to transmission
errors. Text compressed with RLE is still more or less readable. If
some error (say, modem line noise) caused the decoder to become out of
sync with the encoder, it would look for the wrong escape character
and would start spitting out garbage periodically.
(NOTE: I don't know if 51 is a good choice or not. Seems nice
enough.)
-=*=-
Note that it isn't necessary to do the encoding exactly as described.
Some routines reverse the order of character and escape code; it may
make the encoder or decoder simpler in certain situations. It's also
possible to avoid using an escape character altogether by assuming
that two identical characters will be followed by a length byte. For
example, if the input were:
abcDDDDefGGhi
the output would be:
abcDD2efGG0hi
^ ^
| |--- zero Gs follow
|--- two more Ds follow
indicating that there are two more 'D's, for a total of four. If there are only two characters, as with the 'G's, an extra 0 must be added. So the worst case for this scheme is a series of double characters (like "aabbccdd"), which must be encoded in three bytes ("aa0bb0cc0dd0"). Again, this gives a maximum increase of +50%, so we aren't gaining anything. If the input text contains more occurrences of double characters than occurrences of the escape character, we will do worse.
-=*=-
A few efficiency notes...
Since we represent runs with three bytes, there is no reason to represent runs of two or three bytes specially (in fact, specifying a run of two bytes would be a loss). In terms of decompression speed, the overhead involved in outputting a run of three characters is higher than that for just copying three characters to the output, so RLE encoders should only encode runs of four or more.
Whenever we decode a run, we can assume that there will be at least one copy of that character in the output (why else would we be doing it?). Thus the sequence "escape|char|0" has no meaning; we just told the decompressor that there was a run of 0 bytes, so please get on the ball and output all zero of those bytes right this very minute.
We ought to change the meaning slightly, so that the length byte is one LESS than the length of the run. Now a length of zero represents one byte, and 255 represents a run of 256. In fact, if it weren't for the special handling needed for the escape character, we could treat the length byte as length-4.
(And if you want to get really confusing, you could handle the length byte for runs of escape characters specially. But the extra overhead involved is probably not worth the effort.)
> George Johnson wrote:
> ) "Waseihou Watashi" <wasei...@gmail.com> wrote in message
> ) > news:6f3960da-3058-41fc-8770-0db735674034@z31g2000vbt.googlegroups.com...
> )> Suppose that Bob wants to send a message of data to Alice through a
> )> computer network, the message can be represented as a number. Both Bob
> )> and Alice have a list of prime numbers in a table and for each such a
> )> number they know an index in the table.
> )>
> )> According to Goldbach's conjecture, any even number can be represented
> )> as a sum of two primes.
> )
> ) Whew, there are ONLY EVEN NUMBERS in this universe? I was worried > for a
> ) moment there.
> )
> )> http://en.wikipedia.org/wiki/Goldbach%27s_conjecture > )>
> )> Then the message m can be encoded in this way:
> )>
> )> m = p + q + f
> )
> ) M = (binary digit 0 or 1) + prime number + (escape character) + > another
> ) prime number
> ) I'm assuming that you're using the "most compact method of storing
> ) primes" (all primes except 2 & 5, in Base 10 end in 1 or 3 or 7 or 9,
> ) therein, storing the end result + a multiple of 10 is the most compact
> ) unit).
> No, the most compact method of storing primes is to store the ordinal
> number in the sequence of primes. IOW: for the Nth prime, you store
> the number N.
> SaSW, Willem
Yawn, and one more thing.
Since we are worrying about BITWISE lengths of:
0/1+Prime(N)+Prime(M) < [Target Number]
(Binary digit 0 or 1) [ESC] Prime(N) [ESC] Prime(M) < [Target Number]
(Base 10) Prime(N) or Prime(M) = 1,3,7,15,31, 63, ...
(Base 02) Prime(N) or Prime(M) = 1, 11, 111, 1111, 11111, 111111, ...
Thus T=1 to infinity; with the formula (2 ^ T) -1
Because frankly you are trying to reduce the binary string length of 0/1+Prime(N)+Prime(M) to be less than your [Target Number] or no compression is possible.
So do a range search of Prime( (2^T) - 1) until your bitwise summation is smaller than your target number bitwise.
(Even/Odd bit) + Prime(63) + Prime(15) = 355
1 [ESC] 111111 [ESC] 1111 = 101100011
11 bits (2 ESC chars) > 9 bits = NO SAVINGS
1+ 307 + 47 = 355
I will suggest against strongly that you consider your search ranges in the higher zones.
(2 ^ T) -1 with T = 56
(2 ^ 56) -1 = 72057594037927935
But I don't think I'm smarter, I just do the stupid homework. If everyone else JUST DID THE STUPID HOMEWORK, they could move up? a grade and get pounded, too!
On Sunday, March 18, 2012 3:19:22 AM UTC-7, Waseihou Watashi wrote:
> On Sunday, March 18, 2012 9:17:11 AM UTC+1, Waseihou Watashi wrote:
> > Suppose that Bob wants to send a message of data to Alice through a
> > computer network, the message can be represented as a number. Both Bob
> > and Alice have a list of prime numbers in a table and for each such a
> > number they know an index in the table.
> > According to Goldbach's conjecture, any even number can be represented
> > as a sum of two primes.
> > Then the message m can be encoded in this way:
> > m = p + q + f
> > where p, q are primes and f is a flag which is 1 if m is odd,
> > otherwise it is 0
> > So it would work like this:
> > find such p and q that p + q = m - f
> > and their position in the table of primes
> > send their positions through the network
> > The aim is actually not to compress all data, but knowing that Bob and
> > Alice have both a set of same data find an usefuk information that can
> > be send with less cost.
> Would it be possible to find p,q,s such that:
> M = P1^2 + S1
> S1 = P2^2 + S2
> S2 = P3^2 + S3
> ...
> Si = Pi+1^2 + Si+1
> ...
> Sn is a prime number
> Pi is a prime number
> and the algorithm would be fast enough?
> Maybe I actually want a method how to construct an effective algorithm for a given block of data that compress it or at least represent it and it uses only primers or other numbers from well known sequences so that those numbers cannot be subjected to any patent or copyright.
Um, If there is a dataset such that one can simply index that to obtain some data it woudl require the size of the data plus at least one bit
On Sunday, March 18, 2012 3:19:22 AM UTC-7, Waseihou Watashi wrote:
> On Sunday, March 18, 2012 9:17:11 AM UTC+1, Waseihou Watashi wrote:
> > Suppose that Bob wants to send a message of data to Alice through a
> > computer network, the message can be represented as a number. Both Bob
> > and Alice have a list of prime numbers in a table and for each such a
> > number they know an index in the table.
> > According to Goldbach's conjecture, any even number can be represented
> > as a sum of two primes.
> > Then the message m can be encoded in this way:
> > m = p + q + f
> > where p, q are primes and f is a flag which is 1 if m is odd,
> > otherwise it is 0
> > So it would work like this:
> > find such p and q that p + q = m - f
> > and their position in the table of primes
> > send their positions through the network
> > The aim is actually not to compress all data, but knowing that Bob and
> > Alice have both a set of same data find an usefuk information that can
> > be send with less cost.
> Would it be possible to find p,q,s such that:
> M = P1^2 + S1
> S1 = P2^2 + S2
> S2 = P3^2 + S3
> ...
> Si = Pi+1^2 + Si+1
> ...
> Sn is a prime number
> Pi is a prime number
> and the algorithm would be fast enough?
> Maybe I actually want a method how to construct an effective algorithm for a given block of data that compress it or at least represent it and it uses only primers or other numbers from well known sequences so that those numbers cannot be subjected to any patent or copyright.
Um, this is a simple conversation with no qualifiers and no proof. I do not want to fight. I have nothing to defend.
As far as I can tell it is possible to do something similar to what you suggest.
I think you are describing an Indirect Transfer Protocol. Such that Alice gets a data and from that she constructs the message. But the actual message isn't the data she gets.
I just asked if anyone on BitTorrent was interested in an Indirect Transfer Protocol.
No one was interested so I am moving on to other projects.
You have to admit that if we can download the instructions to construct a file and not traffic in the actual data there are legal aspects. Perhaps more for China and such than the USA. I admit I'd like to see more freedom for many of this world.
Anyway.. This is a strictly opinion-based reply; I do see it. I do know how I would construct such a thing.
On Monday, March 19, 2012 8:15:04 PM UTC-7, George Johnson wrote:
> "Willem" wrote in message > Prime(1) = 1 (Binary 1)
> Prime(2) = 2 (Binary 10)
> Prime(3) = 3 (Binary 11)
> Prime(4) = 5 (Binary 101)
> Prime(5) = 7 (Binary 111)
> Prime(6) = 11 (Binary 1011)
> Prime(7) = 13 (Binary 1101)
> Well, let's see, so far you're using WAY more binary digits than I am > (plus Escape Characters)
> I admit that I am "Cheating" a tad by not including 2 & 5 as Primes in > the list as a matter of function per speed of grid list access, but I don't > see you winning the long race in Compact Prime Number List Storage.
> Of course this method only REALLY saves space when you're going full > grid positional notation.
> (Each binary '1' equals a prime, binary '0' equals a non-prime)
> 1110 (01,03,07,09) = 10*0 + prime end digit
> 1111 (11,13,17,19) = 10*1 + prime end digit
> 0101 (21,23,27,29) = 10*2 + prime end digit
> 1010 (31,33,37,39) = 10*3 + prime end digit
> (1st line = 1, 11, 111 = 6 binary digits + 3 escape characters becomes 4 > binary digits)
> (2nd line = 1011, 1101, 10001, 10011 = 18 binary digits + escapes into 4 > binary digits)
> (3rd line = 10111, 11101 = 10 binary digits + escapes into 4 binary > digits)
> (4th line = 11111, 100111 = 11 binary digits + escapes into 4 binary > digits)
> Thusly 45 binary digits + 7 escape characters becomes 16 binary digits.
> (Figured this out 3 decades ago, even when I was having "learning
> difficulties", what were YOU doing then?)
Aside from the bug where you have 1 marked as prime, you've just described a compact version of the Fundamental Prime Constant that tells if any number is prime:
Sum( i = 1, inf, if IsPrime(i) then 2^-i else 0 ) = 0.41468250985111166024810962215430770836577423813791697786824541448864096.. .
i.e.
i = decimal place after the decimal point.
so....
0.0110101...
........^ 7 is prime, so the 7th bit after the decimal point is set
......^ 5 is prime, so the 5th bit after the decimal point is set
....^ 3 is prime, so the 3rd bit after the decimal point is set
...^ 2 is prime, so the 2nd bit after the decimal point is set
..^ 1 is not prime, so the 1st bit after the decimal is NOT set
In practice you multiply the constant by 4 and subtract 1 (to remove storing '1', '2') and you would also remove every other zero since the constant is also storing the fact that every even number is redundantly stored as a 0 in every other bit (that is, every even bit position.)
Removing all the even numbers ...
With 45 bits we can test numbers up to 91
0.111011011010011001011010010011001011001010010
.......^ 13 yes
......^ 11 yes
.....^ 9 no
....^ 7 yes
...^ 5 yes
..^ 3 yes
This sequence has the form:
n = bit_position*2 + 3
i.e. IsPrime_TableOff( 13 )
bit_position = (13-3)/2 = 5, bits[5] = true; thus 13 is prime
However, this still not a very optimal storage scheme since we are *still* storing redundant zeroes.
We can do better since *every* prime > 5 is ALWAYS of the form: 6n+/-1
Thus with 44 bits we can test numbers up to 6*(46/2) = 134 which is a ~50% increase in search space!
0.111111101101111010110111011010011111100001101100
.........^ 25 no
........^ 23 yes
.......^ 19 yes
......^ 17 yes
.....^13 yes
....^11 yes
...^7 yes
..^ 5 yes
On Monday, March 19, 2012 8:36:50 PM UTC-7, George Johnson wrote:
> "Willem" wrote in message > For the primes up to 3559 I will require
> 355 * 4 = 1420 Binary Digits of memory.
You only need 1186 bits if you store the pairs 6n-1 and 6n+1, where n > 0, in the array. Use the bit position as in index into the array.
> On Monday, March 19, 2012 8:36:50 PM UTC-7, George Johnson wrote:
>> "Willem" wrote in message
>> For the primes up to 3559 I will require
>> 355 * 4 = 1420 Binary Digits of memory.
> You only need 1186 bits if you store the pairs 6n-1 and 6n+1, where n> 0, in the array. Use the bit position as in index into the array.
Actually, you need zero bits to store it. If you want to retrieve the n'th prime, just compute it.
IOW, I'm not quite sure what you're trying to say. Yes, of course you can compress the sequence of the primes by a suitable algorithm and represent it more compactly. Such a suitable algorithm is the classical sieve, for example, and the most compact representation of it is zero bits.
> On Monday, March 19, 2012 8:15:04 PM UTC-7, George Johnson wrote:
>> "Willem" wrote in message
>> Prime(1) = 1 (Binary 1)
>> Prime(2) = 2 (Binary 10)
>> Prime(3) = 3 (Binary 11)
>> Prime(4) = 5 (Binary 101)
>> Prime(5) = 7 (Binary 111)
>> Prime(6) = 11 (Binary 1011)
>> Prime(7) = 13 (Binary 1101)
>> Well, let's see, so far you're using WAY more binary digits than I am
>> (plus Escape Characters)
>> I admit that I am "Cheating" a tad by not including 2 & 5 as Primes >> in
>> the list as a matter of function per speed of grid list access, but I >> don't
>> see you winning the long race in Compact Prime Number List Storage.
>> Of course this method only REALLY saves space when you're going full
>> grid positional notation.
>> (Each binary '1' equals a prime, binary '0' equals a non-prime)
>> 1110 (01,03,07,09) = 10*0 + prime end digit
>> 1111 (11,13,17,19) = 10*1 + prime end digit
>> 0101 (21,23,27,29) = 10*2 + prime end digit
>> 1010 (31,33,37,39) = 10*3 + prime end digit
>> (1st line = 1, 11, 111 = 6 binary digits + 3 escape characters >> becomes 4
>> binary digits)
>> (2nd line = 1011, 1101, 10001, 10011 = 18 binary digits + escapes >> into 4
>> binary digits)
>> (3rd line = 10111, 11101 = 10 binary digits + escapes into 4 binary
>> digits)
>> (4th line = 11111, 100111 = 11 binary digits + escapes into 4 binary
>> digits)
>> Thusly 45 binary digits + 7 escape characters becomes 16 binary >> digits.
>> (Figured this out 3 decades ago, even when I was having "learning
>> difficulties", what were YOU doing then?)
> Aside from the bug where you have 1 marked as prime, you've just described > a compact version of the Fundamental Prime Constant that tells if any > number is prime:
> Sum( i = 1, inf, if IsPrime(i) then 2^-i else 0 )
> = > 0.41468250985111166024810962215430770836577423813791697786824541448864096.. .
> i.e.
> i = decimal place after the decimal point.
> so....
> 0.0110101...
> ........^ 7 is prime, so the 7th bit after the decimal point is set
> ......^ 5 is prime, so the 5th bit after the decimal point is set
> ....^ 3 is prime, so the 3rd bit after the decimal point is set
> ...^ 2 is prime, so the 2nd bit after the decimal point is set
> ..^ 1 is not prime, so the 1st bit after the decimal is NOT set
> In practice you multiply the constant by 4 and subtract 1 (to remove > storing '1', '2') and you would also remove every other zero since the > constant is also storing the fact that every even number is redundantly > stored as a 0 in every other bit (that is, every even bit position.)
> Removing all the even numbers ...
> With 45 bits we can test numbers up to 91
> 0.111011011010011001011010010011001011001010010
> .......^ 13 yes
> ......^ 11 yes
> .....^ 9 no
> ....^ 7 yes
> ...^ 5 yes
> ..^ 3 yes
> This sequence has the form:
> n = bit_position*2 + 3
> i.e. IsPrime_TableOff( 13 )
> bit_position = (13-3)/2 = 5, bits[5] = true; thus 13 is prime
> However, this still not a very optimal storage scheme since we are *still* > storing redundant zeroes.
> We can do better since *every* prime > 5 is ALWAYS of the form: 6n+/-1
> Thus with 44 bits we can test numbers up to 6*(46/2) = 134 which is a ~50% > increase in search space!
> 0.111111101101111010110111011010011111100001101100
> .........^ 25 no
> ........^ 23 yes
> .......^ 19 yes
> ......^ 17 yes
> .....^13 yes
> ....^11 yes
> ...^7 yes
> ..^ 5 yes
> IsPrime_Table6n( n )
> if (n < 7) return true if n is 2, 3, 5, else false
> if n is even return false
> add1 = (n+1) mod 6
> sub1 = (n-1) mod 6
> if (add1 is zero) or (sub1 is zero)
> bit_position = floor( n/3 ) - 1
> else
> return false
> n_is_prime = bits[ bit_position ];
> Fast, and compact. ;-)
> Michael
It is still much less efficient in space storage versus using 4 bits to store a chunk of 4 potential primes per ever 10 increment. The binary difference storage method saves some space at the outset, but wastes space every time from the jump of 3 to 7.
Easily proven with the below routine.
N*10 + {ending in 1,3,7,9} (binary digit 0 = not prime; binary digit 1=prime)
{1,3,7,9} gaps:
11-9=2
3-1=2
7-3=4
9-7=2
Let's say you have a set of 100% primes, then your gap list (divided by 2 after it reaches 5) will always have a gap at the 3 to 7 jump.
Hypothetical all primes string in binary: 1101111011110111101111011...
As each prime that ends in 3 jumps to a prime ending in 7, then it has an extra 01 jump.
> On Monday, March 19, 2012 8:36:50 PM UTC-7, George Johnson wrote:
>> "Willem" wrote in message
>> For the primes up to 3559 I will require
>> 355 * 4 = 1420 Binary Digits of memory.
> You only need 1186 bits if you store the pairs 6n-1 and 6n+1, where n > 0, > in the array. Use the bit position as in index into the array.
> See my post below for details on how to convert n to bit_position.
Anyway, I'm a tad bored on this.
SO LET US GET NON-BORING!
One of the problems of data transmission is ERROR CORRECTION.
Well, with most error-correction methods, you have to burn an extra 2 bits per 4 bits in play.
So what if we could bypass that ENTIRELY and use 0% extra space for CRC coding?
How, you may ask?
Well, by using a pre-generated 100% compact bitstream as a XOR reference mask.
Use any non-repeating sequence in binary (square root of 2 or PI or non-integer number conversion stream or my compact prime number list or any long pseudo-random stream).
Using the XOR reference mask, a stream of 0 bits would be the exact equal to the non-repeating generator set upon reception. Any variance in the bit-stream would indicate a variance from the non-repeating XOR'd output. The costs are found in computational time in the sender & receiver's computers. Since PI has a math trick to get the HEXadecimal digits of PI at any location, the costs of even that are minimal.
Spigot algorithm
http://en.wikipedia.org/wiki/Spigot_algorithm A spigot algorithm is a type of algorithm used to compute the value of a mathematical constant such as ? or e. Spigot algorithms are unique because they do not require the total number of digits to be fixed beforehand, and do not require the computation of several intermediate results which are combined to produce the final result.[1] There are two kinds of spigot algorithms: (1) those that can produce a single, arbitrary digit (also called digit extraction algorithm); and (2) those that produce a sequence of digits, one after the other. The Bailey-Borwein-Plouffe formula is a digit extraction algorithm for ? which produces hexadecimal digits. A sequential spigot algorithm for ? was produced by Stanley Rabinowitz and Stanley Wagon (this algorithm is sometimes referred to as "the spigot algorithm for ?").[2] Spigot algorithms that produce a sequence of digits begin producing digits and produce them continuously, rather than waiting for the entire algorithm to finish. Spigot algorithms are known for ? and e. Spigot algorithms typically work in a particular radix, such as hexadecimal or binary number.
> On 29.09.2012 11:19, Michael Pohoreski wrote:
>> On Monday, March 19, 2012 8:36:50 PM UTC-7, George Johnson wrote:
>>> "Willem" wrote in message
>>> For the primes up to 3559 I will require
>>> 355 * 4 = 1420 Binary Digits of memory.
>> You only need 1186 bits if you store the pairs 6n-1 and 6n+1, where n> >> 0, in the array. Use the bit position as in index into the array.
> Actually, you need zero bits to store it. If you want to retrieve the n'th > prime, just compute it.
> IOW, I'm not quite sure what you're trying to say. Yes, of course you can > compress the sequence of the primes by a suitable algorithm and represent > it more compactly. Such a suitable algorithm is the classical sieve, for > example, and the most compact representation of it is zero bits.
> Greetings,
> Thomas
So how do you know which "Nth Prime" it is if you don't have a reference list?
Tell me what is the 4875th "Nth Prime" without using a list or a list lookup tool like Mathworld or another website like it. There is no such thing as a Nth Prime unless you have a stored list somewhere, which requires hard drive space or an Internet connection to remotely download it (which takes computation time & bandwidth off that website).
Or, you have to sieve out all of the prior prime numbers, instantly discard them, then arrive at your answer. Which wastes computational time unless you store a list on your hard drive.
The next easiest way would be to use a precomputed statistical approximation graph of the primes per a location on the number line (but will not be as accurate as a precomputed list).
The Nth Prime Algorithm
A prime page by Andrew Booker
http://primes.utm.edu/nthprime/algorithm.php In order to find a prime quickly, the nth prime program uses a large stored data table to get close to the right answer first, then finishes with a relatively short computation. To see how this works, imagine the number line broken into bins, each of size N, i.e. the first is from 0 to N-1, the second from N to 2N-1, etc.
> "Thomas Richter"<t...@math.tu-berlin.de> wrote in message
> news:k47q26$h8a$1@news2.informatik.uni-stuttgart.de...
>> On 29.09.2012 11:19, Michael Pohoreski wrote:
>>> On Monday, March 19, 2012 8:36:50 PM UTC-7, George Johnson wrote:
>>>> "Willem" wrote in message
>>>> For the primes up to 3559 I will require
>>>> 355 * 4 = 1420 Binary Digits of memory.
>>> You only need 1186 bits if you store the pairs 6n-1 and 6n+1, where n>
>>> 0, in the array. Use the bit position as in index into the array.
>> Actually, you need zero bits to store it. If you want to retrieve the n'th
>> prime, just compute it.
>> IOW, I'm not quite sure what you're trying to say. Yes, of course you can
>> compress the sequence of the primes by a suitable algorithm and represent
>> it more compactly. Such a suitable algorithm is the classical sieve, for
>> example, and the most compact representation of it is zero bits.
>> Greetings,
>> Thomas
> So how do you know which "Nth Prime" it is if you don't have a reference
> list?
By writing a "decompressor" from a zero bit input, of course. And the decompressor is the sieve algorithm. There was no complexity limitation stated in the problem, so why does it have to be an O(N) algorithm to "decompress" N digits?
> Tell me what is the 4875th "Nth Prime" without using a list or a list
> lookup tool like Mathworld or another website like it.
Why "without"? Why do I have it "without it"? Decompession also works *with* an algorithm, so does it here.
> There is no such
> thing as a Nth Prime unless you have a stored list somewhere, which requires
> hard drive space or an Internet connection to remotely download it (which
> takes computation time& bandwidth off that website).
Sorry for my simple-minded platonism, but I'm fairly sure that there is a n-th prime even without having computed it. I'm also certain that there are infinitely many, thus there is certainly a 4875th prime.
> Or, you have to sieve out all of the prior prime numbers, instantly
> discard them, then arrive at your answer. Which wastes computational time
> unless you store a list on your hard drive.
And your point is? Decompression is always a "waste of computational power". You could have stored the uncompressed original in first place!