http://datacompression.info/Miscellaneous/AMillionRandomDigits.bin
It is 415,241 bytes long. It is a binary representation of a million digit
number generated all the way back in 1955 by people at the RAND corporation
with lots of time and money.
My open challenge is simple:
Send me a program whose size is less than 415,241 bytes that creates an
exact duplicate of this file.
If you're looking for huge financial rewards, I will put up $100. Perhaps
Matt Mahoney will throw in the $100 he mentioned earlier, bringing the
jackpot up to $200. Of course, the cash reward is nothing compared to the
fame that this accomplishment would generate.
The only catch is that we will have to openly negotiate the final terms on
how the ground rules for the demonstration, using this forum. If you've
really blown the doors off the counting theorem, I don't think we'll have a
problem. The only reason for that open negotiation would be to make sure
there was no cheating - the program should be able to generate the file
using only its own resources. Built-in language capabilities, run-time
library resources, things like that are obviously acceptable, even if they
help you get around the size problems. Moving data to file names or
attributes is obviously not.
Just as a example, if you send me Java source plus a data file that could
run without a network connection, I certainly think everyone here would
gladly accept that as a valid entry.
If you sent an executable that ran on a Linux system with no network
connection, I would guess that would probably pass muster as well.
Over the past 10 years, I've seen a lot of people make public claims that
meant they could easily pass this challenge. People like Jules have
certainly led me to believe that they have already solved the problem, and
would be happy to demonstrate under just the right circumstances.
The bar really isn't that high, either. I think I could generate a program
that does this in perhaps 415,261 bytes - only 20 away from success!
Well, this is a pretty decent way to prove it. Do so, slink away with your
tail between your legs.
--
|
| Mark Nelson - ma...@ieee.org - http://MarkNelson.us
| The Ultimate Compression Resource - http://www.datacompression.info
| Sponsored by Xceedsoft.com - Serious About Components
|
> If you're looking for huge financial rewards, I will put up $100. Perhaps
> Matt Mahoney will throw in the $100 he mentioned earlier, bringing the
> jackpot up to $200. Of course, the cash reward is nothing compared to the
> fame that this accomplishment would generate.
Now Mark, do you really believe that $200 qualifies as huge financial
rewards?
(Don't they pay you at Cisco?)
First, get CISCO or someone else to put up real money. Second, I
would agree to use Leo or the fellow who started the Compression Test
Archives. (Sorry, I forgot his name -- and he's a great guy, too.)
But Leo's formula needs some work. It shouldn't be linear, but
square-term at least. Come on!, you guys should all know that. So I
want a judge I trust, and I want it done on the Cape. Cape Cod. I'll
pay for the hotel conference area and security. Who knows, I might
even spring for lunch. (Leo or someone else I approve of get's money
to fly in.)
I have a neat little method that should be just enough to crunch that
file. I will need a few days to play.
And Mark, you have a history of making rude remarks. If you can
manage not to now, I'll reward you by using an "open" program for the
demonstration, a compression system that any observer will clearly
understand how it works as it's used. (I no longer care about
compression. Predicting is a whole lot more fun.)
So go find a funding source. To give you a little something to
interest the folks you speak to, here are some stock prices for
tomorrow.
These are the OPEN, HIGH, LOW, and CLOSE'ing prices for these stocks
tomorrow (Monday, 6/7). Tuesday you can check how well I did using
finance.yahoo.com
SPY $113.15 $113.88 $112.35 $113.23
BRKB $2971.97 $2984.05 $2948.59 $2968.67
QQQ $36.22 $36.55 $35.82 $36.26
BRKA $89140.43 $89506.66 $88571.97 $88954.61
INTC $28.10 $28.50 $27.81 $28.18
MSFT $26.01 $26.25 $25.76 $26.02
CSCO $22.85 $23.28 $22.54 $22.78
IWM $113.56 $114.76 $112.09 $113.03
DIA $102.82 $103.47 $102.19 $102.81
RIMM $115.11 $117.67 $113.08 $115.69
SMH $37.51 $38.17 $37.16 $37.57
EBAY $88.52 $90.30 $87.87 $89.29
C $46.63 $47.02 $46.21 $46.57
AMAT $18.76 $19.09 $18.49 $18.77
YHOO $31.89 $32.55 $31.26 $31.79
DELL $35.12 $35.53 $34.79 $35.21
GE $31.33 $31.67 $31.10 $31.27
QCOM $67.51 $68.52 $66.82 $67.44
Are some prices wrong? You bet! What matters is how close they are
and whether one can trade profitably using these numbers. (I do most
every day.)
For the record, Mark has repeatedly told people that my predictions
are junk -- haven't you Mark. But still, several people use these
numbers (well, we do change the numbers every day!) to trade. I'm not
an expert trader (I use someone who really is quite good,) but from
the above I think I'd try to short QCOM, looking for most of the
almost $2 profit (I'd be happy with 2%) from the difference of the
HIGH - LOW.
But that's just me. Your milage may vary. And don't take my word for
any of this; check the results on Tuesday.
---------------------------
"Mark Nelson" <ma...@ieee.org> wrote in message news:<gGQwc.61668$Ly.49932@attbi_s01>...
Assuming the claim verification protocol is satisfactory, which I'm sure
it will be, you can include me too.
However, I would reserve the right to remain an objectionable old git on
Usenet when in dialogue with crackpot claimants.
Phil
--
1st bug in MS win2k source code found after 20 minutes: scanline.cpp
2nd and 3rd bug found after 10 more minutes: gethost.c
Both non-exploitable. (The 2nd/3rd ones might be, depending on the CRTL)
> ... the program should be able to generate the file
> using only its own resources. Built-in language capabilities, run-time
> library resources, things like that are obviously acceptable, even if they
> help you get around the size problems. Moving data to file names or
> attributes is obviously not.
So if a program uses libgmp.so, a shared library version of the GNU multiple
precision arithmetic library, then how many of its bytes count against
the limit of 415,241 bytes?
-----
$ rpm -qa | grep gmp
gmp-4.1.2-2
$ ls -l /usr/lib/libgmp.so.3.3.2
-rwxr-xr-x 1 root root 179304 Jan 24 2003 /usr/lib/libgmp.so.3.3.2
$ size /usr/lib/libgmp.so.3.3.2
text data bss dec hex filename
176574 1412 28 178014 2b75e /usr/lib/libgmp.so.3.3.2
$
--
>I have a file that has been posted on my web site for quite some time:
>
>http://datacompression.info/Miscellaneous/AMillionRandomDigits.bin
>
>It is 415,241 bytes long. It is a binary representation of a million digit
>number generated all the way back in 1955 by people at the RAND corporation
>with lots of time and money.
>
>My open challenge is simple:
>
>Send me a program whose size is less than 415,241 bytes that creates an
>exact duplicate of this file.
>
>If you're looking for huge financial rewards, I will put up $100. Perhaps
>Matt Mahoney will throw in the $100 he mentioned earlier, bringing the
>jackpot up to $200. Of course, the cash reward is nothing compared to the
>fame that this accomplishment would generate.
Will attempts and results be published in public or are they only accessible to
people involved in the bet? (And if they are public where can we expect them to
be published?)
Here's how I would score it:
If libgmp is used only to compress/decompress a single challenge number,
then it ought to be counted in its entirety. If it's used to compress/
decompress a hundred different challenge numbers, then only 1/100th of
it counts. As long as you would in principle be prepared to tackle
any of 2000000 or more different compression challenges, then it would
count zero.
In effect, only that which is specifically required for each challenge
is counted. Ship off all common code into pcode-interpreters, perl
automatic-code-generators, .dll's, .so's, and what-have-you, so that
it can be discounted. i.e. even libz and libbz2 are for free, as you
only need them once for any number of challenges.
Here is why. First I am incredibly busy now. (No, I am not playing
-- I am swamped with work.)
But second, and really critical to this agenda is this: I have no
idea about building load modules and self-extracting .exe's, etc. I
simply can't perform if expected to build a program that is
self-extracting. Sorry. Yes I used to build code-generators, but
today all of my tools are setup for different kinds of projects.
I do (what appear to be) SEND-side, CHANNEL, and RCVE-side systems.
That's not exactly right, but that is the block diagram that is
standard in this industry.
Also, if I do something, I will only do it for serious bucks. I
haven't changed, and in fact, Mark has been around to recall our
initial conversations.
But I am willing, with a little notice, to do this for money, if I can
do a straight SEND-CHANNEL-RCVE compression/decompression tour.
--jg
PS: The two folks who I'd accept are Leo and Gilchrist (see, I
remembered the name.)
SuperFly <no.v...@email.com> wrote in message news:<un89c01ma58b6sj5e...@4ax.com>...
I would allow any format allowed by the Calgary challenge,
http://mailcom.com/challenge/
Source or executable plus the compressed data would have to be packed into
an archive smaller than 415,241 bytes. I don't think the use of common
shared libraries would be a problem.
-- Matt Mahoney
Not exactly. My contention is that your stock picks work in the same sense
that Maxwell's Demon works. Watch the atoms zipping past and just grab the
fast ones, you win.
For example, your instructions might be to buy FOOB tomorrow and sell it if
it goes up 1/4 point. If the stock opened up 33 cents and then plummeted to
close off $2, you would normally count that as a win - all I had to do was
execute perfectly and I would have made the 25 cent profit.
My belief is that you can't count this as a win.
But I'm not going to trash your stock picking - I don't think it will work,
(and if it did, why bother selling your analysis? Just sit back and rake in
the profits, dude.) But you could be right, and if you are, you go boy!
Back to the topic on hand, though, I'm glad to hear you feel you can beat
the challenge. Once you have the program working, let's find a way to prove
it. Remember, you don't have to give away the store to win, you can take
your secret to the grave if you wish.
--
|
| Mark Nelson - ma...@ieee.org - http://MarkNelson.us
| The Ultimate Compression Resource - http://www.datacompression.info
| Sponsored by Xceedsoft.com - Serious About Components
|
"Jules Gilbert" <jul...@financier.com> wrote in message
news:9c7d0936.04060...@posting.google.com...
My humble opinion is that it doesn't count at all. It's just general purpose
computational code, and so I think we happily just treat it as an
off-the-shelf CPU.
I know that this isn't entirely logical, which makes the whole challenge
notion a little squirrely. The problem is of course that what we want to do
is draw a clear-cut line of demarcation between program and data in the
decompressor. If someone has a program that can compress that random data
file by 20%, it is remarkable whether the program is 100 bytes or 10 GBytes,
right?
So there are two ways to do this. One is to challenge someone to compress
*any* file, then hold off on presenting the file until the challenge is set.
The second is to do what I have done, which is to simply avoid the problem
by not trying to separate the program and compressed data.
--
|
| Mark Nelson - ma...@ieee.org - http://MarkNelson.us
| The Ultimate Compression Resource - http://www.datacompression.info
| Sponsored by Xceedsoft.com - Serious About Components
|
"John Reiser" <jre...@BitWagon.com> wrote in message
news:ca1uu...@enews1.newsguy.com...
Agreed, exactly! A little messy to pin down, but in principle we want to let
people have as many visits to the general-purpose code buffet they like.
So if someone were to truly achieve this (clearly impossible) feat, they
should be able to repeat it on the same file after it has been encrypted
using a suitably up-to-date algorithm - demonstrating that no a priori
knowledge of the file contents are hidden in the code or libraries.
But that's only for the victory lap - meeting the original challenge is more
than enough for right now.
--
|
| Mark Nelson - ma...@ieee.org - http://MarkNelson.us
| The Ultimate Compression Resource - http://www.datacompression.info
| Sponsored by Xceedsoft.com - Serious About Components
|
"Phil Carmody" <thefatphi...@yahoo.co.uk> wrote in message
news:877juj1...@nonospaz.fatphil.org...
Well, the beauty of it is that anyone attempting to meet this challenge can
clearly figure out in advance if they cut the mustard. There are of course
possible glitches related to platform incompatibilities and so on, but
reasonable people ought to be able to work those out. I'll just say that the
goal here is not to humiliate someone who creates a fouled up entry.
All I would ask in exchange for the $100 prize is that the conditions of the
test be clearly identified in advance, and maybe that the winner be
identified. If the entrant wants to keep their crown jewels totally secret,
I have no objections to finding a way to do that.
And just to clarify things a bit, let me state what the goal of this
challenge is. I want people who make incredible claims to have a clearly
defined test that allows me and the rest of the world to say "Put up or shut
up." It's a really simple test, and I'll bend over backwards to find a way
to let someone meet it.
Without additional comment, let's all note that Jules Gilbert is the first
to publicly duck out.
--
|
| Mark Nelson - ma...@ieee.org - http://MarkNelson.us
| The Ultimate Compression Resource - http://www.datacompression.info
| Sponsored by Xceedsoft.com - Serious About Components
|
"SuperFly" <no.v...@email.com> wrote in message
news:un89c01ma58b6sj5e...@4ax.com...
>> Will attempts and results be published in public or are they only
>accessible to
>> people involved in the bet? (And if they are public where can we expect
>them to
>> be published?)
>
>Well, the beauty of it is that anyone attempting to meet this challenge can
>clearly figure out in advance if they cut the mustard. There are of course
>possible glitches related to platform incompatibilities and so on, but
>reasonable people ought to be able to work those out. I'll just say that the
>goal here is not to humiliate someone who creates a fouled up entry.
>
>All I would ask in exchange for the $100 prize is that the conditions of the
>test be clearly identified in advance, and maybe that the winner be
>identified. If the entrant wants to keep their crown jewels totally secret,
>I have no objections to finding a way to do that.
>
>And just to clarify things a bit, let me state what the goal of this
>challenge is. I want people who make incredible claims to have a clearly
>defined test that allows me and the rest of the world to say "Put up or shut
>up." It's a really simple test, and I'll bend over backwards to find a way
>to let someone meet it.
If the rest of the world doesn't know what is being tested and what
the results are, this test really doesn't prove anything to them. It
would only prove something to you.
Modify the rules. Let me deliver (well, show) a state machine, coded
in C, which when run produces an approximation of the million digit
file. And the differences (comparing the original to the
approximation) when compressed will produce a much smaller file.
THIS I CAN DO.
I will show the C code, compile the code after including a small
control routine, execute it, produce the approximation, compare this
approimation witht the original to produce a difference file using
commonly available materials freely available on the net, and show a
much smaller file than the roughly 400k file the process starts with.
Surely either Leo or Gilchrist will, on your pre-payment to them
personally, come to Boston and observe this. And after observing the
above, will hand me a check for say, $25k.
Now earlier I declared my willingness to do using "open" materials,
showing the core details of my methods if you could simply behave
yourself during these negotiations. I see that this was asking too
much of you.
Okay, then. Yes I will do the above -- but the observer will be
required to sign a tough non-disclosure and this community of
compression enthusiasts will have to content themselves with examining
my past disclosures.
Do this again *Mark* and I will cancel -- this means little to me and
I do not respect you or the people here, save I would like each of you
to realize how foolish you are. (Here I have set myself a task that
is probably impossible...)
(I'd like to add one more person to the list of candidates for the job
of monitor. Phil Frisby of hawksoft.com I have no idea as to his
availability.)
I want to be very specific in these next few paragraphs to describe
the technical aspects of what I claim, so that later no one disputes
that I have shown what I claim.
1) I will read a copy of a roughly 400k binary file -- of whatever
content one want's to supply. In order to insure that people don't
claim fakery, I suggest that the monitor bring a fresh copy with them
so that this will not be a contested issue later.
2) I will cause a computer program to read the file and produce a set
of pattern descriptors which will be converted to C. (I'll use
aiSOFT, perhaps aiTRAN -- not sure now.)
3) (I will make a short segment of the generated C source available
for publication, but just a snippet.)
4) I will supply a pre-written header component and the entire file
will be compiled to produce an executable load module.
5) When I come to the demo, I will have materials with me sufficient
that the monitor can decide to do this on either Linux or FreeBSD. My
team will supply the computers, although the monitor will have
reasonable access to them. He can examine them, run programs on them,
etc.
6) Again, in order to make things comfortable for the monitor, we
will build our C-based state-machine on one box and execute it on
another. Thus making it very hard for my side to cheat -- as we won't
know the contents of the 400k file until it is made available to be
read so as to produce the C-machine.
7) The load module produced by STEP4 will be moved to a second
machine and executed. A very close approximation of the original
input file will be produced and this will be compared to the original.
-------------------------
8) At this point, I have not a clue how to *prove* conformance. My
suggestion is that we compress the difference file and show that the
difference file is smaller by some metric, say at least X%. What is
the value of X? I think we need to discuss this... (If I knew -- I'd
say...)
"Mark Nelson" <ma...@ieee.org> wrote in message news:<_P9xc.60291$eY2.26562@attbi_s02>...
[snip]
>Surely either Leo or Gilchrist will, on your pre-payment to them
>personally, come to Boston and observe this. And after observing the
>above, will hand me a check for say, $25k.
[snip]
>4) I will supply a pre-written header component and the entire file
>will be compiled to produce an executable load module.
>
>5) When I come to the demo, I will have materials with me sufficient
>that the monitor can decide to do this on either Linux or FreeBSD. My
>team will supply the computers, although the monitor will have
>reasonable access to them. He can examine them, run programs on them,
>etc.
>
>6) Again, in order to make things comfortable for the monitor, we
>will build our C-based state-machine on one box and execute it on
>another. Thus making it very hard for my side to cheat -- as we won't
>know the contents of the 400k file until it is made available to be
>read so as to produce the C-machine.
This setup screams: BIG FAT SCAM ! :-D Be a man and pick a random
location and use random 3rd party computers. You deliver the disk with
the compressor and the testing team delivers a new file. If you can
perform, i'm sure you can keep the computers and everything on it.
Surely you could do better than that. At
http://cs.fit.edu/~mmahoney/compression/barf.html
you'll find a program that compresses the million random digits file to 1
byte, or to 0 bytes in 2 passes, since it also happens to compress all
nonempty files. You can borrow it if you like. I'm sure it will be a more
convincing demo in front of all the CS professors at MIT that don't believe
in the counting argument, than your convoluted setup where we will wonder if
you transferred the file hidden in unallocated floppy disk sectors.
-- Matt Mahoney
(snip)
> 7) The load module produced by STEP4 will be moved to a second
> machine and executed. A very close approximation of the original
> input file will be produced and this will be compared to the original.
At what point during your demonstration were you going to reproduce
the *original* file? Isn't that the point of lossless compression? I
didn't see that listed in your steps...
True, and I'm paying for the privilege. I would guess anyone who succeeds
will be happy to publish their results, but I'm not going to rule out the
possibility they won't.
--
|
| Mark Nelson - ma...@ieee.org - http://MarkNelson.us
| The Ultimate Compression Resource - http://www.datacompression.info
| Sponsored by Xceedsoft.com - Serious About Components
|
"SuperFly" <no.v...@email.com> wrote in message
news:vscbc0lkmh0ami686...@4ax.com...
--
|
| Mark Nelson - ma...@ieee.org - http://MarkNelson.us
| The Ultimate Compression Resource - http://www.datacompression.info
| Sponsored by Xceedsoft.com - Serious About Components
|
"Jules Gilbert" <jul...@financier.com> wrote in message
news:9c7d0936.04060...@posting.google.com...
> Surely you could do better than that. At
> http://cs.fit.edu/~mmahoney/compression/barf.html
> you'll find a program that compresses the million random digits file to 1
> byte, or to 0 bytes in 2 passes, since it also happens to compress all
> nonempty files
You mean moves one byte from the file to its extension? :)
Re.
There's no filesystem I know of that that would work for. Matt's method
doesn't cheat like that.
Phil, being deliberately obscure and pedantic
Jim, I showed the C source for a similar state machine produced by the
same process here (maybe three years ago.) Really, I don't think more
than a couple of folks here understood the implications of achieving
this -- because by XOR'ing to get the wrong bits and compressing that
bit-file; one need send only two things; the compressed 'wrong-bit'
file and the reconstructor engine.
And although several of these folks act very much like howling wolves
angry at not being fed, the point is legitimate: what size the
reconstructor machine -- (ie., how much should it count) and this is
an answer I can't give.
tri...@despammed.com (Jim Leonard) wrote in message news:<f93aad69.04060...@posting.google.com>...
Yes. This is the easiest way of hiding data, though there are others, like
using the sizes of multiple files, timestamps, and other attributes that
might be preserved when a file is copied. That's why I suggest a challenge
where the compressed file and decompressor have to be packed into an
archive, as in the Calgary challenge.
-- Matt Mahoney
> Modify the rules. Let me deliver (well, show) a state machine, coded
> in C, which when run produces an approximation of the million digit
> file. And the differences (comparing the original to the
> approximation) when compressed will produce a much smaller file.
[snip]
> 7) The load module produced by STEP4 will be moved to a second
> machine and executed. A very close approximation of the original
> input file will be produced and this will be compared to the original.
> -------------------------
> 8) At this point, I have not a clue how to *prove* conformance. My
> suggestion is that we compress the difference file and show that the
> difference file is smaller by some metric, say at least X%. What is
> the value of X? I think we need to discuss this... (If I knew -- I'd
> say...)
Jules, thanks for remembering my name. ;) The point of the test is to
show that the file can be compressed and then re-created again
somewhere else. An "approximation", even if it is very close, does
not help. If you can't re-create the original, the file is corrupt
and useless. Anyone can "compress" any file by throwing away a few
bits here and there and stating that they have an "approximation".
Proving conformance in this test is very easy since the decompressed
file should be identical to the original so a bit-by-bit comparison
can be performed. Some decent hash function(s) can be used to double
check like SHA-256, SHA-1, even MD5 would probably be fine.
Does this mean that the various compression schemes you have developed
over the years were never able to reproduce the exact input file when
decompressed, but only approximations?
Jeff.
> Modify the rules. Let me deliver (well, show) a state machine, coded
> in C, which when run produces an approximation of the million digit
> file. And the differences (comparing the original to the
> approximation) when compressed will produce a much smaller file.
[snip]
> 7) The load module produced by STEP4 will be moved to a second
> machine and executed. A very close approximation of the original
> input file will be produced and this will be compared to the original.
> -------------------------
> 8) At this point, I have not a clue how to *prove* conformance. My
> suggestion is that we compress the difference file and show that the
> difference file is smaller by some metric, say at least X%. What is
> the value of X? I think we need to discuss this... (If I knew -- I'd
> say...)
Jules, thanks for remembering my name. ;) The point of the test is to
If this is anything like your earlier program, then yes, it would compress
the million random digits file, with a couple more steps. Your program
outputs an approximation. Then you take the difference (or XOR) with the
original input and compress that with any standard compressor like gzip.
The receiver runs your program to reproduce the approximation, uncomprsses
the difference file and is able to reconstruct the million random digits.
However this would not count as compression unless your modeling program and
the compressed difference file together were smaller than the original
million random digits. As I recall your earlier modeling program is a very
big C program which was generated as output of another program that was
trained on the data to be compressed. Sorry, that doesn't count. If the
compressor and decompressor (your modeling program) knows what the input is
going to be, then compressing random data is trivial (e.g. BARF).
There are two ways you can prove your claim of universal comprssion, one
where the input data is known in advance, and one where it is not. Take
your choice.
Challenge 1. Produce an archive containing a decompressor and compressed
data that together output the million random digits file. The archive must
be smaller than 415,241 bytes and otherwise follow the rules of the Calgary
challenge.
Challenge 2. Write a compressor/decompressor pair accepting uncompressed
files within a size range you specify, and challenge anyone to find (1) a
file that does not compress by at least one byte, (2), two files that result
in the same compressed output, or (3) a file that does not compress and
decompress back to itself. Challengers are allowed to rename or change
attributes of the compressed file other than its size and contents. You may
hold the compressor or decompressor, but not both.
There is no way to convince anyone who knows anything about compression that
you have achieved the impossible without revealing at least the compressor
or the decompressor, so just forget about magic shows for private audiences.
-- Matt Mahoney
Take a file any long file. Count the indidvual ones and zeros. See which
has the most. For sake of arument say there are more zeros. Next calulate
N = 5% of the total number of bits. Then repalce the first N bits of the
non predominate occuring bit. For our test file where more zero replace
the first N ones with zeros. If more ones replace first N zeroes with ones.
The point is theres now a large imbalance in the number of ones and zeros.
At this point you could even use a classic two state arithmetic compressor.
have a count field and compress with two weights since the imblance is
guaranteed to be at least 55 to 45 for a long file you get several bytes of
compression. However just like Jules on decompression. It will be only 95%
accurate. But as Jules points out it would compress any large file. But
so what most people already know that. If you can't get 100% back then its
not lossless compression.
David A. Scott
--
My Crypto code
http://cryptography.org/cgi-bin/crypto.cgi/Misc/scott19u.zip
http://cryptography.org/cgi-bin/crypto.cgi/Misc/scott16u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"
No, (what I am saying here many of these participants know already,)
is that about five years ago I began looking for methods that were not
repeatable but would UNDER VERY WELL CONTROLLED CONDITIONS to compress
rad. Once, just once!
The value for me having such a program was high -- and at the time my
financial prediction engine was going nowhere. Why was this useful,
indeed more useful than a repeatable system? Because I could
sell/license it without fear of people using it without properly
licensing it.
Well I did develop such a system. It does take considerable CPU power
to run. The method examines the entire previously
best-conventionally-compressed file and locates patterns within the
'text' of that file. (Sure, there are no simple patterns left -- of
course, but there are lot's of patterns just the same.)
In fact, I took a famous file that Mark had built and ran it and found
many patterns. Indeed, I am sure that these participants remember
that I published some instances of applicable patterns here (on
comp.cpx.)
In this situation, the file Mark is challenging us with presents a
similar situation;
Nothing repeatable is needed to win here. I merely have to reproduce
the file at a remote location, having moved only a small intermediate
representation and not the original file.
The pattern system was based on a very sensitive recognizer
multi-stage 'acceptor', somewhat akin to an LR(1) parser, but with
memory. And the program that built the final version of the C-source
consisting of the input recognizer ALSO builds an inverse, a small set
of C source routines which accept a 'seed' (misnomer here, but you get
the idea) and then rebuilds the approximation of the original file at
a remote location. BUT the reconstructed file is only approximate and
thus an XOR file (describing the 'wrong' bits) is necessary to correct
the approximation.
My perpetual-compression stuff is not based on this at all. I
concentrate on systems that are 'perfect' where the initial
construction is identical to the original input file/msg.
PS: Jeff -- if your hotmail account is 'real' expect a note from me.
Otherwise, email me.
jsgil...@hotmail.com (Jeff Gilchrist) wrote in message news:<2247d0b5.04060...@posting.google.com>...
Does the reconstructor engine grow in size if the file fed to it
grows?
You wrote that, on a 400K file, it guesses 95% correctly -- that would
imply 95% compression. How does it fair on a much larger file? Say,
20 megabytes?
I ask these questions because, if you can run the process on a 20MB
file, and come up with a reconstructor engine program + difference
file that adds up to one percent less than the original data file, you
would be a winner.
wow, if you can achieve this, please tell me how ! ;-) but I'm afraid
guessing 95% of the bits correctly means about only 71% compression, since
the entropy of the bit-error source is about 0.29...
Regards
Alexis
And getting that correction data squeezed down is a very thorny problem. I
think a good illustration of this is when we look at the difference in size
of lossless and lossy audio compressors. Over on hydrogenaudio, they refer
to a really good lossy compressed music file as being "transparent". There
are claims for lots of lossy compressors that achieve transparency
(subjectively) in the 200-250 Kbps range. But the best lossless audio
compressors take twice that bandwidth for the same CD stereo sources.
In this case, the correction needed to take a transparent lossy track to the
lossless state might be subjectively pretty small. But when it comes time to
compress that difference, it is so noisy and ill-behaved that it ends up
looking quite random to the encoder.
I don't have a good name for this fallacy, but it's based on the principle
that thinks that just because it's easy to pick the low-hanging fruit,
finishing the hard parts of the job can't be too much worse.
--
|
| Mark Nelson - ma...@ieee.org - http://MarkNelson.us
| The Ultimate Compression Resource - http://www.datacompression.info
| Sponsored by Xceedsoft.com - Serious About Components
|
"Jim Leonard" <tri...@despammed.com> wrote in message
news:f93aad69.04060...@posting.google.com...
It would be pretty amazing to find 3.2 Million bits that didn't have "many
patterns." Might be as difficult to find as it is to find the first
non-interesting integer.
--
|
| Mark Nelson - ma...@ieee.org - http://MarkNelson.us
| The Ultimate Compression Resource - http://www.datacompression.info
| Sponsored by Xceedsoft.com - Serious About Components
|
"Jules Gilbert" <jul...@financier.com> wrote in message
news:9c7d0936.04060...@posting.google.com...
Obviously my math is off :-) but hopefully Jules got the point. He
was getting hung up on the details of which files, how to transmit,
etc. when all he has to do is provide program+datafile less than the
size of the original that reconstructs the original.
This is called Rate-Distorion Theory ... The scenario is transmission
over a noisy Channel with an accepted Error
Example:
Assume binary symbols with p=0.5
You accept 10% of your data beeing lost ... (distortion D=0.10) ...
you could throw away the last 20% of the bits and say my Rate
is now only 0.80 ... the receiver just guesses and appends random
bits (and he will be right in p cases) so in the mean your distortion
will be 10%
R(D)-Theory tells us that you can do better ... the calculations
give, that your minimum rate is 0.5310 - below that the transmission
is not possible (at that distortion)
How can we do better? well - in the above example we still
know exactly where the distorted bits are - they should be
evenly distributed over the whole sequence ...
One aproach are the Channel-Coding Algorithms ... Ususally
they expand a sequence, transmit it and then reduce it by
the same amount ... Just swap the encoder and decoder ..
the maths:
the R(D) function for da memoryless binary source:
R: minimum Rate
D: accepted Distortion
e2: binary entropy function
p0: probability of input-symbols, here p0=0.5
R(D) = e2(p0) - e2(D)
with e2(x) = -x*log2(x) - (1-x)*log2(1-x)
for a distortion of 0.0417 (as Jules Gilbert claims to achieve
with his "algorithm") this gives you a Rate of 0.75
He has not yet posted any file-sizes for his reconstructor-programm
but I doubt they are below 0.75*450k ...
just dropping below 450k is easily possible ..
bye,
Michael
PS: and thanks for the entertainment last week ;-)
If I say I can do this, I need it to be a cinch -- no question at all.
About the method; today I concentrate on predicting the future, just
the easy stuff, not the hard things like whether a blue car or a red
car will pass you in traffic next. Only easy tasks, and I leave the
hard stuff for the God of the Bible, and he doesn't need computers and
he gets it right all the time. No waiting!
But years ago I did compilers. Mostly code generators and special
purpose languages for AI and programming support. (Google "aitran
edm", with the quotes. That's the last reference on the web, and as
far as I know I am the only user.)
Anyway, one way to do CG work is with LR(1) parsing. I did. And by
adding memory states, and doing some other things, I built tools that
would locate patterns in previously compressed files. True, not
simple patterns, but still patterns! Lot's of them, and even with
levels of instances. All kinds of things that other compressor
engine's seemed to be completely blind to.
Which was perfect for me, as I didn't want to compromise my
perpetual-compression methods.
Instead I wanted to sell a method that would compress RAD, that would
only work once, and that I could sell to media houses that distribute
entertainment materials. That stuff has already been compressed once
and I can make it a lot smaller. The method does require a lot of CPU
time, but not beyond what I have available. Of course production runs
would become the job of the licensee, not my responsibility...
If you want to know more, look up "entertainment media jules gilbert
compression pattern nibble" -- I have not done this but that should
get you some hits. Also, I am sure that Mark could help you.
As I recall one person did actually test the sample I processed and
reported positive results wrt finding a sample pattern I reported in
one of Mark's files.
Jim, if you have more interest in this, contact me off-line.
tri...@despammed.com (Jim Leonard) wrote in message news:<f93aad69.04060...@posting.google.com>...
I think about 71.4% compression.
it's clear that the original poster doesn't understand (or does and is
exploiting for flim flam) the difference between an information source
and a single emitted draw of symbols from that source. The noiseless
coding theorem discusses of course "all but a few (infinitesimally
improbable)" emissions, right?
technologically I bet The Thing is an egregious overfitter and does
work in a "maximum likelihood" sense but the parameter space is
enormous and hence is bad in the minimum description length sense.
> And getting that correction data squeezed down is a very thorny problem. I
> think a good illustration of this is when we look at the difference in
size
> of lossless and lossy audio compressors. Over on hydrogenaudio, they refer
> to a really good lossy compressed music file as being "transparent". There
> are claims for lots of lossy compressors that achieve transparency
> (subjectively) in the 200-250 Kbps range. But the best lossless audio
> compressors take twice that bandwidth for the same CD stereo sources.
Alas, the best lossless audio compressors still need more than 700 Kbps on
average... so the gap between perceptual and numerical audio transparency
would even be closer to 3x
> In this case, the correction needed to take a transparent lossy track to
the
> lossless state might be subjectively pretty small. But when it comes time
to
> compress that difference, it is so noisy and ill-behaved that it ends up
> looking quite random to the encoder.
Indeed... that is very clear when you run an embedded bitplane compressor :
each new bitplane typically requires at least twice as many bits as the
previous one, whereas the SNR/perceptual improvements brought by each
bitplane are getting smaller and smaller.
Regards
Alexis
Let's consider the particular class of programs that have no input and
their output is a sequence of bits. Also, let's assume a certain
formal way to represent these programs as a sequence of bits. You can
pick for example a certain bitwise representation of Turing machines
or even the x86 assembly language, etc. All these ways to express a
computable function (an algorithm) are semantically equivalent as we
all know.
Then for any given number N > 1 there is at least one buffer of size N
bits that cannot be generated using a program whose size is strictly
less than N bits.
(proof left as an exercise to the reader)
Thanks, Adi
"Mark Nelson" <ma...@ieee.org> wrote in message news:<gGQwc.61668$Ly.49932@attbi_s01>...
> I have a file that has been posted on my web site for quite some time:
>
> http://datacompression.info/Miscellaneous/AMillionRandomDigits.bin
>
> It is 415,241 bytes long. It is a binary representation of a million digit
> number generated all the way back in 1955 by people at the RAND corporation
> with lots of time and money.
>
> My open challenge is simple:
>
> Send me a program whose size is less than 415,241 bytes that creates an
> exact duplicate of this file.
>
> If you're looking for huge financial rewards, I will put up $100. Perhaps
> Matt Mahoney will throw in the $100 he mentioned earlier, bringing the
> jackpot up to $200. Of course, the cash reward is nothing compared to the
> fame that this accomplishment would generate.
>
> The only catch is that we will have to openly negotiate the final terms on
> how the ground rules for the demonstration, using this forum. If you've
> really blown the doors off the counting theorem, I don't think we'll have a
> problem. The only reason for that open negotiation would be to make sure
> there was no cheating - the program should be able to generate the file
> using only its own resources. Built-in language capabilities, run-time
> library resources, things like that are obviously acceptable, even if they
> help you get around the size problems. Moving data to file names or
> attributes is obviously not.
>
> Just as a example, if you send me Java source plus a data file that could
> run without a network connection, I certainly think everyone here would
> gladly accept that as a valid entry.
>
> If you sent an executable that ran on a Linux system with no network
> connection, I would guess that would probably pass muster as well.
>
> Over the past 10 years, I've seen a lot of people make public claims that
> meant they could easily pass this challenge. People like Jules have
> certainly led me to believe that they have already solved the problem, and
> would be happy to demonstrate under just the right circumstances.
>
> The bar really isn't that high, either. I think I could generate a program
> that does this in perhaps 415,261 bytes - only 20 away from success!
>
> Well, this is a pretty decent way to prove it. Do so, slink away with your
> tail between your legs.
> I think about 71.4% compression.
I don't think this has to do with compression ... I would say:
evenly distributed bits (p0=0.5) and distortion is D = 0.05
so the equvalent rate is 71,36% ...
One could say: the algorithm makes D so the equivalent
information is only 285K ...
anything below that would mean compression - but as
we do not know the resulting size (after applying the
algorithm) you can't calculate any compression rate/ratio
bye,
Michael
> I don't think this has to do with compression ... I would say:
> evenly distributed bits (p0=0.5) and distortion is D = 0.05
> so the equvalent rate is 71,36% ...
>
On my side, I think it has to do with compression... don't forget the
context here : you have a 415K file, Jules Gilbert supposedly makes 95%
accurate predictions... the straightforward approach is then to perform
entropy coding of the error file. Don't forget that we are talking about a
*lossless* compression challenge !
Thus, modeling the compression algorithm as a noisy channel doesn't make
sense. I'd rather say the prediction errors of the compressor are sent on a
noiseless channel.
> One could say: the algorithm makes D so the equivalent
> information is only 285K ...
>
> anything below that would mean compression - but as
> we do not know the resulting size (after applying the
> algorithm) you can't calculate any compression rate/ratio
Reaching the 285K size means achieving an optimal entropy coding, which is
already practically impossible. Thus, anything below 285K doesn't mean
compression to me, it means very questionable black magic.
Regards,
Alexis
As Jules has produced a lossy algorithm (by accident) I tried to show
the optimum filesize neccessary to represent the data ...
Reducing the filesize to 90% with 5% bit-error is easy (as discribed)
With 5% error you need to compare to the 71% ...
The concept of compressing the prediction error would require some
other algorithms (which may not be optimal) and so it gets harder to
analyze the performance ...
> Reaching the 285K size means achieving an optimal entropy coding, which is
> already practically impossible. Thus, anything below 285K doesn't mean
> compression to me, it means very questionable black magic.
As far as I understood the challenge the input is a random file without
redundancy ... to my knowledge you can't go below that filesize anyhow
and this is what some people claim here ...
bye,
Michael
I think it also serves as a good counterpoint to think about when we say
"for any given sequence of N bits, there exists a Turing machine which has a
1-bit program that can generate this sequence". Your statement is sort of
the other side of that coin.
--
|
| Mark Nelson - ma...@ieee.org - http://MarkNelson.us
| The Ultimate Compression Resource - http://www.datacompression.info
| Sponsored by Xceedsoft.com - Serious About Components
|
"Adi Oltean [MSFT]" <aol...@microsoft.com> wrote in message
news:151ad89d.04061...@posting.google.com...
00000 10097 32533 76520 13586 34673 54876 80959 09117 39292 74945
00001 37542 04805 64894 74296 24805 24037 20636 10402 00822 91665
-----
when read by libgmp.so [gmp-4.1.2, the GNU multiple precision library]
after the file length in bytes has been pre-pended as a 4-byte integer
in big-endian format:
-----
# Linux x86 system [little endian]
$ od -Ax -tx4 AMillionRandomDigits.bin | sed 1q
000000 083f9b1b 9f55517f 89cb10e8 044a084a
$ od -Ax -tx4 AMillionRandomDigits.mpz | sed 1q
000000 09560600 083f9b1b 9f55517f 89cb10e8
$ od -c AMillionRandomDigits.dec | sed 2q
0000000 1 0 0 9 7 3 2 5 3 3 7 6 5 2 0 1
0000020 3 5 8 6 3 4 6 7 3 5 4 8 7 6 8 0
$ ls -l AMillionRandomDigits.*
415241 AMillionRandomDigits.bin
1000000 AMillionRandomDigits.dec
415245 AMillionRandomDigits.mpz
$
-----
where 0x00065609 is 415241.
Divide the integer value by 256**415241 to produce a rational fraction F.
Use Euclid's algorithm to compute the simple continued fraction expansion
F = q[0] + 1/(q[1] + 1/(q[2] + 1/(q[3] + 1/(q[4] + ... ))))
where q[k] are integers: {0 9 3 1 1 1 16 1 3 1 1 2 2 2 1 36 6 5 2 1 951 ...
2 7 4 1 7 2 1 1 6 1 1 4 1 10 2 1 1 13 1 1 2 3 40 4 9}. Evaluate the
convergents c[k] (the value upon truncating the simple continued fraction
after successive q[k]):
c[k] = {0/1, 1/9, 3/28, 4/37, 7/65, ...}
There are 1941715 q in the simple continued fraction for the challenge,
and c[1941714] is (Mark's_number) / (256**415241) .
The convergents of a simple continued fraction have interesting properties
which are well-known in number theory. Starting with c[0] (and excluding
the last c[1941714]), all the c[2*k] are less than the number, and all
the c[1+2*k] are greater than the number. The difference c[1+k] - c[k]
is ((-1)**k) / (d[1+k] * d[k]) where d[k] is the k-th denominator.
Each convergent is the closest rational approximation to F, of all the
fractions with smaller or equal denominators.
Of particular interest to the challenge, q[199224] is 11845119, which
is astoundingly large. This means that c[199223] is about log2(11845119)
bits "better" as an approximation to F than what might be expected for a
fraction with denominator d[199223]. That is, for c[199223] you get about
23 bits closer to F than expected based on the width of the numerator plus
the width of the denominator: "compression." Other large partial quotients:
9934417 = q[1212205], 3277487 = q[522119],
1593770 = q[334901], and 1249899 = q[1065498].
The histogram of partial quotients begins:
occurs q
----------
1 0
805592 1
330176 2
181211 3
114403 4
79073 5
58160 6
43975 7
34834 8
28103 9
23320 10
19490 11
16496 12
14480 13
12491 14
10822 15
9604 16
8471 17
7752 18
6927 19
6378 20
-----
The values of F - c[k] for k < (1941715 / 2) may well
be interesting as [one of the] inputs to a compression and
regeneration scheme.
--
> With 5% error you need to compare to the 71% ...
> The concept of compressing the prediction error would require some
> other algorithms (which may not be optimal) and so it gets harder to
> analyze the performance ...
Well, anyway both apporaches assume that channel errors are uncorrelated.
So, I guess both are pretty inaccurate models. Never mind, since imo we are
modeling the behaviour of a program that doesn't exist ;-)
> As far as I understood the challenge the input is a random file without
> redundancy ... to my knowledge you can't go below that filesize anyhow
> and this is what some people claim here ...
Agreed.
Regards,
Alexis
Well, is there some theory that offers good estimates about the
probability distribution of the denominators in the expansion?
So long,
Thomas
Let's begin again.
Given a list of bits, the objective is to transfer this bit-list, in
the original order, to a remote location, transferring fewer bits, and
with two restrictive conditions. First we desire to transfer as few
bits as possible. Second we agree to accept only those lists which
have already been processed in such a way so as to remove as much
pattern and context content as possible. This process produces a
random-appearing bit-list, and that is what we must transfer.
This is the model I have focused on; My system need not obey
Shannon's laws wrt channel capacity because, while I do transfer bits,
the transferred bits are not the message (the "bit-list"). Of course,
all transferred bits (bits which ARE ACTUALLY transferred) must obey
Shannon.
Simple example: Let's say we have a bit-list of all 1's. We could
simply construct a small state machine on the RCVE side of the channel
to build such a list and accomplish our goal. And Shannon
constraint's simply don'r apply.
But if we need to transfer the length (because the bit-list of all 1's
is of variable length, then Shannon applies to the variable containing
the bit count.
My method is more robust in that it can accept and process actual
real-world random lists.
If this community want's to put together a serious prize, perhaps $1M,
but I will take more if you can get it, I will prove to say, Jeff
Gilchrist, having signed an appropriate NDA that my method is real.
As to practical, no -- I've been trying hard for years to make
something that could compete with gzip (or now, arj and it's
successors, which are really good products), at least in terms of
elapsed time, but I don't even work on the problem anymore because I
don't think it's possible (or at least, not possible for me) to do.
In hardware, and not implemented via PLA technology but in silicon,
yes -- then it might actually be sufficiently fast to be useful. But
perpetual compression ain't easy! and while it is possible, it does
involve lot's of machine time using X86 boxes, however fast. (And I
have a room full of them.)
A few weeks ago I was considering just telling what I do, and when I
tested the waters..., well I think it's clear that telling would be
helping my enemies.
(Also I noticed that a few days ago when I disclosed my first method,
the system based on polynomial curve fitting of a random but monotonic
list of values, no one built it, no one even asked a question as to
implementation. This told me quite a bit...)
Yes it does. The entropy of a source of all 1's is 0. Shannon does say you
don't need to transmit anything.
> If this community want's to put together a serious prize, perhaps $1M,
> but I will take more if you can get it, I will prove to say, Jeff
> Gilchrist, having signed an appropriate NDA that my method is real.
A few months ago I posted half of a proof that P = NP, which would claim the
$1 million Clay prize. (I'm sure you can find it with google). The other
half is to supply a compression program that compresses all strings of
length n (you choose n). I offered to split the prize with whoever
completes the proof. Just for you, Jules, I will let you keep the entire $1
million.
> As to practical, no -- I've been trying hard for years to make
> something that could compete with gzip (or now, arj and it's
> successors, which are really good products), at least in terms of
> elapsed time, but I don't even work on the problem anymore because I
> don't think it's possible (or at least, not possible for me) to do.
That's a great idea. Work on real compression for a change? PAQ is open
source and has lots of contributors. Make it faster, or add some new
models. If you stop making impossible claims with no proof and start making
positive contributions, then you will get some respect.
> (Also I noticed that a few days ago when I disclosed my first method,
> the system based on polynomial curve fitting of a random but monotonic
> list of values, no one built it, no one even asked a question as to
> implementation. This told me quite a bit...)
Why should we waste our time implementing your theories about perpetual
compression when anyone who can read the FAQ knows it won't work? Build it
yourself if you're so sure.
-- Matt Mahoney
>Simple example: Let's say we have a bit-list of all 1's. We could
>simply construct a small state machine on the RCVE side of the channel
>to build such a list and accomplish our goal. And Shannon
>constraint's simply don'r apply.
How do you convey the information needed to construct that state machine?
-Scott
>My oh my...
Crackpots (crackheads?) like you are what keep me reading this group.
Please don't go away!
--
Sev
Whatever you call it, it's clearly at the root of many of the
erroneous compression schemes that are posted here. Perhaps even most
of them, even if you include the ones that don't try to transcend the
counting argument.
b
And trust me, I have built and run such systems many many times, even
(at one point) giving over the implementation details to a genetic
system,) and using that to examine a wide class of such machines,
looking for ways that such systems could be made truly practical.
And neither I nor my machine searching system found anything
particularly efficient. (That's one reason I no longer consider these
systems to be important.) And while on occasion I notice that you
(and a few others) do come up with an occasional insight, with
particularity towards you, I fear for your students and am amazed that
you found work teaching.
It's not that you are dumb. No. Nor is it that you lack knowledge,
no -- indeed I think you demonstrate considerable knowledge of the
subject of compression on frequent occasion.
But you insist that the world is as you think it is. And it's not.
Matt, I found a way to avoid 'sending' the bits associated with a
compressed file and for that (and because I choose not to disclose my
method) you seem to hate me (I say this based on your past remarks
made on these pages.) Also, on one or two occasions, before I knew
you well, I trusted you, and boy!, did I regret that.
Perhaps I shouldn't judge you so, perhaps because of your strong
academic orientation you think of the world in such a way that you
think anyone with a good new idea should share it. If this is your
problem, all I can say is that my life experiences have led me to see
the world I live and work in quite differently. Or maybe if I knew
more about you I would view your remarks in a brighter light. I don't
know...
(I will compliment you on your neural-compression work, that was neat
work.)
As to the Clay prize, because of your past remarks I choose not to
partner with you in any project or undertaking.
If you want me to change my thinking, then you have to do a little
work. Something. Even building a simple implementation of my curve
fitting "perpetual compressor", something. If that's what you do, do
this version:
First segment the input file into separate blocks and process each
block 'concurrently'. After the initial curve fit and subtraction of
the approximation, rotate the work, so that the n'th cell of each
buffer becomes the N'th buffer in a second stage. Why do this?
Because then you will have transformed the problem from processing RAd
to a problem of processing buffers wherein each buffer is a vector
with *approximately* similar values in it.
But don't expect me to jump to the opportunity to go after the Clay
prize with you; these days I concentrate on doing things other than
compression; mostly associated with understanding and knowing about
specific aspects of the short-term future. Such as what's going to
happen tomorrow.
I do have a few success's to my credit -- I predicted the US/China
problems wrt the plane incident (and yes, I told a few people, one
being a Harvard-based Chinese national, at the time a graduate student
doing work at MIT, another being an MIT department head.) Also, a
second system, one with a financial orientation, successfully predicts
the next-day prices for major stocks. A few people who post here
always know which stocks will go up or down tomorrow. (And as you
know, the prices are not anywhere near perfect, though so far, each
day, the head-line picks for that day have been perfect.
The program (here I discuss the program I used to predict the China
conflict) produced results that suggested a strong intensification of
conflict between China and the US 3-6 months ahead, although (of
course) I had no idea *how* the conflict would arise. This was
experimental work and anyone skilled in data mining who can program in
languages like Perl (or my own aiSOFT), using such tools to read news
can produce similar systems.
Oh yea, that Chinese national did go back to China for a visit, in
spite of my heads-up, and was in China WHILE the incident occurred and
was resolved.
If anyone is wondering, make no mistake, I use computers to do these
things and make use of natural laws that the God of the Bible has put
into place. Nothing else.
If someone does want to collaborate on such projects, I do have some
concrete short-term sub-projects. My goal is to build a three month
prediction system that can be tasked by others without my involvement
(so that it can be licensed to industry), can independently acquire
knowledge from the net so as to develop and extend it's knowledge from
a number of diverse sources, independently plan experiments and guide
it's own further development. etc. The language is important because
the user/manager can control/refine the actions of the program by
using the built-in scripting tool.
Today I have a language for specifying much of the above, a general
purpose tool for making predictions, both from the perspective of
backcasting and forecasting. I need better means of web searching;
and while I have wonderful tools for maintaining, extending and using
acquired knowledge, that sub-system is computationally expensive wrt
space.
--jg
I already said you could keep all the money. Just find my half of the
proof, submit your perpetual compressor as the other half, and take the
cash.
Of course you have to play by their rules. That means open disclosure of
your algorithm followed by 2 years of peer review. You are allowed to
patent it, and I suggest you do. Yeah, it's work, but you can hire a lawyer
to do it all for about 1% of your winnings. Besides, if you can predict
stock prices then you are already filthy rich and it shouldn't even be a
concern.
But you better hurry, because there are 3 or 4 other people on this
newsgroup who have perpetutual compression algorithms, and as soon as one of
them gets their program debugged, your window of opportunity has closed.
-- Matt Mahoney
I bet Jules is pursuing compression but not universal compression,
which is compression how most of the rest of us understands it.
> "Jules Gilbert" <jul...@financier.com> wrote in message
> news:9c7d0936.04061...@posting.google.com...
>> As to the Clay prize, because of your past remarks I choose not to
>> partner with you in any project or undertaking.
>
> I already said you could keep all the money. Just find my half of
> the proof, submit your perpetual compressor as the other half, and
> take the cash.
Not to mention the fact that you'd be famous once the proof was
confirmed, at least among mathematicians and computer scientists. (And
probably in more mainstream circles as well. The popular media loves a
story of non-professional amateurs solving some world-famous problem.)
Plus, Matt, if memory serves, the proof is a constructive one, isn't
it? Couldn't a completed proof be used to find polynomial-time
algorithms for problems in NP? I mean, the industrial applications
would be staggering. The $1M Clay prize would be peanuts in
comparison.
b
> Matt Mahoney <matma...@yahoo.com>:
>
>> "Jules Gilbert" <jul...@financier.com> wrote in message
>> news:9c7d0936.04061...@posting.google.com...
>>> As to the Clay prize, because of your past remarks I choose not to
>>> partner with you in any project or undertaking.
>>
>> I already said you could keep all the money. Just find my half of
>> the proof, submit your perpetual compressor as the other half, and
>> take the cash.
>
> Not to mention the fact that you'd be famous once the proof was
> confirmed, at least among mathematicians and computer scientists. (And
> probably in more mainstream circles as well. The popular media loves a
> story of non-professional amateurs solving some world-famous problem.)
>
Actually the press isn't bright enough to know if it was done or
not. So in fact there really are no bars getting the press to say
how great this compression is. In fact you could tell us in advance who
wins the next presidential election and which states go to each candidate.
Just make sure it makes Kerry look good to assure front page coverage.
David A. Scott
--
My Crypto code
http://cryptography.org/cgi-bin/crypto.cgi/Misc/scott19u.zip
http://cryptography.org/cgi-bin/crypto.cgi/Misc/scott16u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"
Exactly. The proof goes like this. If you can compress all k+1 bit strings
to k bits or less, then you can recursively compress any string of length n
> k down to exactly k in O(n) time. You do that by starting with the first
k bits of input, then append 1 bit and compress back to k using a 2^(k+1) by
k lookup table, repeating n - k times. Decompression is also O(n), using a
2^k by k+1 table and repeating n - k times. There is also the detail of
encoding the end of string, but this is not hard. For example, you can use
the code 0 = 00, 1 = 01, EOF = 11.
Now you can compute any NP-complete (or harder) problem in O(n) time as
follows:
1. compress the input to k bits in O(n) time
2. compute the compressed output using a 2^k by k table in O(1) time
3. uncompress in O(n) time
so the entire computation is O(n).
BTW, this also solves the halting problem, but I don't think there is a
prize for that.
-- Matt Mahoney
My email address is jul...@financier.com If someone does code the
method up, please let me know...
PS: Severian, you won't get girls to like you with an email address
like that and if you've ever written an email to a nice young lady
using that ISP address, I'll bet she didn't write back. Of course on
second thought, maybe you shouldn't reproduce...
--jg
Actually, there is, but it involves a conical hat and chair in the
corner.
b
>I am overwhelmed by all of you. Judgement and wisdom in this group is
>akin to that of an unthinking child, angry at not being allowed to
>get too close to a dangerous cliff, or one who doesn't yet appreciate
>how hot a stove can get.
I'm certainly not angry. I'm laughing too hard.
>My email address is jul...@financier.com If someone does code the
>method up, please let me know...
Sorry, I'm too busy coding things that are possible.
>PS: Severian, you won't get girls to like you with an email address
>like that and if you've ever written an email to a nice young lady
>using that ISP address, I'll bet she didn't write back.
My girlfriend thinks it's funny. Of course, she has a sense of humor.
>Of course on
>second thought, maybe you shouldn't reproduce...
It's too late to prevent that. I've already reproduced: four known
children.
--
Sev
>My email address is jul...@financier.com If someone does code the
>method up, please let me know...
Hey, it's *your* method, *you* code it. Code it up, patent it, license it,
and watch the money roll in. If it really works, I'll be one of the first
ones in line to buy a copy.
-Scott
>I am overwhelmed by all of you. Judgement and wisdom in this group is
>akin to that of an unthinking child, angry at not being allowed to
>get too close to a dangerous cliff, or one who doesn't yet appreciate
>how hot a stove can get.
>
>My email address is jul...@financier.com If someone does code the
>method up, please let me know...
Why not code it yourself and show it to others. If this is just an old
method that has no significance to your other work, you have nothing
to loose and much to gain.
I don't regard the system as valuable now, though...
SuperFly <no....@email.net> wrote in message news:<q49oc01s59168e55b...@4ax.com>...
>Oh I spent about five years working on different implementations and
>variants, even using a genetic system to experiment for me, trying
>different designs.
>
>I don't regard the system as valuable now, though...
Perhaps because it doesn't work? Amazing...
>SuperFly <no....@email.net> wrote in message news:<q49oc01s59168e55b...@4ax.com>...
>> On 12 Jun 2004 19:45:07 -0700, jul...@financier.com (Jules Gilbert)
>> wrote:
>>
>> >I am overwhelmed by all of you. Judgement and wisdom in this group is
>> >akin to that of an unthinking child, angry at not being allowed to
>> >get too close to a dangerous cliff, or one who doesn't yet appreciate
>> >how hot a stove can get.
>> >
>> >My email address is jul...@financier.com If someone does code the
>> >method up, please let me know...
>>
>> Why not code it yourself and show it to others. If this is just an old
>> method that has no significance to your other work, you have nothing
>> to loose and much to gain.
--
Sev
I wonder if I've missed an incredible higher truth -- publish, because
you guys really are idiots who use windoz and once Basic lost favor,
you sit and write nasty msgs. Meets the available data.
Severian <seve...@chlamydia-is-not-a-flower.com> wrote in message news:<oueqc0p0hibf9k8j7...@4ax.com>...
Sorry Jules. You are right. We are all idiots who will never write any
programs or publish anything because we don't know anything about data
compression. Perhaps you can enlighten us by showing us some of your great
works.
-- Matt Mahoney
That is because you are telling us that we would fly upward off the
cliff and would get our hand frozen by the stove.
At first, like all social accidents, this was incredibly amusing. But
now it's just getting tedious. Either put up (provide a working
example of your compression, which you can patent) or shut up (stop
posting).
A while ago, a friend joked to me that it was impossible to display
full-screen full-motion (24fps or faster) video, in color, on an
original IBM PC/XT (4.77MHz 8088) with stock CGA card. I proved him
wrong -- BY PROVIDING A WORKING EXAMPLE. Please provide a working
example or everyone will forever continue to call you a crackpot.
I would personally love to see a program of yours that would:
1. Take any file as input
2. Generate a reconstructor
3. Generate the errors of the reconstrutor (like you said, the
differences found by XOR'ing the original with the reconstructed
input)
...and then we could verify your claims that RAD can have redundancy
eliminated, by providing a reconstrutor+error file that together are
smaller than the original. I understand that the XOR'd error file
will be the same length as the source, so just gzip it.
Do that, and you will garner all the respect in the world.
Phil Carmody was posting command-line Perl expressions just the other
day; I have coded x86 assembly; Matt posted a C program. I would
imagine there are many others.
>You guys are not programmers! Am I one of only four or five actual
>programmer's in this newsgroup?
In five years, you could not produce a working implementation of your
compression algorithm. Either you are not a programmer, or your
algorithm is broken.
As I said before, I program things that are *possible*.
>I wonder if I've missed an incredible higher truth -- publish, because
>you guys really are idiots who use windoz and once Basic lost favor,
>you sit and write nasty msgs. Meets the available data.
--
Sev
[snip]
>Oh I spent about five years working on different implementations and
>variants, even using a genetic system to experiment for me, trying
>different designs.
>
>I don't regard the system as valuable now, though...
Then releasing it to the public shouldn't be a problem...
I am writing a reply. I will be one or two days. You'll just have to
wait. But I will try to be responsive.
But in the meantime, why don't you send me a valid email address to my
'financier' email address.
--jg
>You guys are not programmers! Am I one of only four or five actual
>programmer's in this newsgroup?
>
>I wonder if I've missed an incredible higher truth -- publish, because
>you guys really are idiots who use windoz and once Basic lost favor,
>you sit and write nasty msgs. Meets the available data.
I think a lot, if not most people in this newsgroup have good computer
programming skills. But i doubt that anyone will try to build a
prototype based on the description you gave of this algorithm. Which
if i understand it correctly goes something like: (1) split the file
with rad data in several parts, (2) convert these parts to integer
lists by adding up the values successively, (3) curve fit these lists,
(4) compress "(3) subtracted by (1)". Unless you specify: how many
bits the used parts and values are, which curve fitting scheme you use
and which algorithm you use to compress the final data, it's hard to
predict what the exact results will be. But imho this method will
never remove enough noise from the base curve to convert it to a
result that is smaller then the original rad file, because the curve
fit will not be perfect and add extra imperfection. And i find it very
hard to believe that you have compressed the million digit rad file to
a few hundred bytes with this method. I might be incorrect though, so
feel free to point out where i'm wrong.
[snip]
>with rad data in several parts, (2) convert these parts to integer
>lists by adding up the values successively, (3) curve fit these lists,
>(4) compress "(3) subtracted by (1)". Unless you specify: how many
Should be : (4) compress "(3) subtracted by (2)"
Get this: I don't want to help you. Several people here have made
unfair and flatout false and malicious statements about me with an
intention to defame and injure. More than this, many of you have
clearly stated (again, on these pages, and both by direct statement
and also by the means of strong implication) that you do not favor or
value things that I (as well as most westerner's) believe important.
(Does the term Judaeo-Christian ring any bells?)
Such is your choice. But none of you should think that I will ever
cooperate with you. Not unless a substantial reward is attached.
So maybe, if you put up a lot of money, I just find it convienient, to
show up and show you what others can do, people unlike the collection
of folks that seems to assemble here from time to time.
(And yes, I realize that I could have quietly gone away several times,
but I CAN DO WHAT I CLAIM -- I am not the guilty party here.)
But Superfly, show you for free? Have you ever done me any favor?
Any at all? Look, take a minute and review what you've said about me
in the past. Now honestly, do you think you have any goodness coming
from me? (If you think I have mis-judged you, email me privately as
jul...@financier.com But you have a great deal of explaining to
do...)
I've been accurate and honest here. (And yes I did build aiSOFT and
aiTRAN. And yes I can compress previously compressed files.)
Maybe I should have been instructed myself, because while I made
serious money using/selling aiSOFT, aiTRAN was a commercial failure.
(Programmer's loved it, but universally failed to convince their
manager's to license it.) Perhaps I should have taken more away from
those learning experiences than I did...
Anyway, you guys are a group; Put some money together and give it to
someone I trust, and I will demonstrate perpetual compression, in
exchange for walking away with the prize.
I will try to have a schedule of costs here in a few days, but for
now, go to your employer's, raise at least $250k or so, and find
yourselves a leader. I am not sure how much I want -- but I will
commit to organizing something.
Make no mistake though, I will only show up if a neutral party I trust
has the funds and is ready to disburse them based on observations
meeting simple well defined criteria.
The schedule will setup costs for showing up, for demoing, for
teaching one of you how it's done, etc...
And if your really really nice, I just might give you a demonstration
of aiTRAN. Of course, that demo will cost you $10k. A demo of
aiSOFT, which may look like Rebus but will generate C source code as
well as CLIPS facts (as well as Snobol) will cost you $10k as well.
(Remember, we are talking demos here. No code leaves with you.) And
unless you pay to learn the underlying theory, no explanation of how
perpetual-compression works.
So, organize. Get some money together. Make it worth my while.
Otherwise don't bother me.
The amazing thing is a week or two ago I was contemplating just
publishing. (And yes this is the truth too.) But not in this life.
Not without serious compensation.
Reviewing the above, the only thing I need to correct is that the word
"perpetual" is a misnomer. Someone here used to say that if I could
do what I say I can that it would be like accomplishing perpetual
motion. But while I can compress RAD ("random-appearing data")
several times, having built and run the apparatus, I can tell you
that real limits exist. For example, record sizes, things like that.
Still, "perpetual" is a nice word so I use it; but it's not really
perpetual of course. Repeatable yes. Perpetual no.
===========================================================================
SuperFly <no....@email.net> wrote in message news:<l09sc0t7qbjhonijl...@4ax.com>...
>Superfly:
>
>Get this: I don't want to help you. Several people here have made
>unfair and flatout false and malicious statements about me with an
>intention to defame and injure.
You can only defame someone, if there is any fame to take. And so far
there is no indication that you have the right to claim any.
[snip]
>More than this, many of you have
>clearly stated (again, on these pages, and both by direct statement
>and also by the means of strong implication) that you do not favor or
>value things that I (as well as most westerner's) believe important.
>(Does the term Judaeo-Christian ring any bells?)
>
>Such is your choice. But none of you should think that I will ever
>cooperate with you. Not unless a substantial reward is attached.
Hmmm .. no pay, no proof .. no proof, no pay. Seems like a deadlock to
me.
[snip]
>(And yes, I realize that I could have quietly gone away several times,
>but I CAN DO WHAT I CLAIM -- I am not the guilty party here.)
It looks to me like you want prove something by posting here. And
people in this newsgroup have given you enough room to do just that.
But you decided over and over not to do so ...
> Agreed, exactly! A little messy to pin down, but in principle we want to let
> people have as many visits to the general-purpose code buffet they like.
>
> So if someone were to truly achieve this (clearly impossible) feat,
Oh please, don't keep saying it's impossible. It's only very very
unlikely. But that's fundamentally different from some other claims
made here which are in fact impossible.
--
Falk
Jules, you are already making millions by predicting stock prices. Why are
you still here? Do you really think people are going to pay to see your
magic show? If your random compression really works, then the $1 million
Clay prize is yours. Are you going to walk away from that?
Many people here have written real compression programs and gave them away.
When people see what you can do, money and respect will follow. Or you can
be a troll.
-- Matt Mahoney
If I came up with a way to compress random data further, breaking all
previously-known laws of physics, I would be BURSTING WITH JOY and so
ecstatic that I would want to SHOW EVERYONE how I achieved the
impossible. Wild horses could not keep me away from the media, the
internet, the web. I would be standing on street corners giving
people demonstrations if that's what it took. At the very *least* I
would put up a web page somewhere and show off the technology (which
is, in fact, what I am in the process of doing with my own
"impossible" achievement previously mentioned, and which I will
announce here when it is finished).
So here's what I don't understand: If you have indeed achieved the
impossible, why aren't you doing the same? Why aren't you breathless
from rushing a demonstration of this technology out to the public?
You can patent your idea, so there's no worry whatsoever of the
technology being stolen... wouldn't you want to share your discovery?
Fame *and* fortune would be yours -- I'm not kidding, it really would
be yours.