Matt Mahoney wrote:
> John McCarten wrote:
> > maud wrote:
> > > Matt Mahoney wrote in part:
> > > > Here is an easier problem. I generate one 8 MB file and post it. You
> > > > compress it and post an archive containing the compressed file and a
> > > > decompression program, such that the archive is at least one byte
> > > > smaller than the file I post. Follow the rules of
> > > > http://mailcom.com/challenge/ for creating the archive and decompressor
> > > > (substituting the test file for the Calgary corpus).
> > > > ...
> > > > To make it easier still, I will tell you how I will create the test
> > > > file. I will use the program sharnd.exe to create it.
> > > > ...
> > If you look at the rules for the challenge you will see an important
> > element missing for challenges such as the one proposed. There is no
> > time limit.
> I am patient.
Yes. Me too. One often has to wait patiently for a bite. I only got one bite, from BR, but that's the way it sometimes goes when fishing :)
> > The second thing is that the challenge file is to be created by sharnd,
> > the source code of which has been made available.
> So the problem ought to be easy.
True. Lucky for me the source was available. I would have been in serious trouble otherwise.
> After einstein gives up
You may be in for a long wait, but as you've said :)
> I'll post a decompressor under 30 KB just to prove it can be done.
This will definitely be worth seeing. I won't be calling tho'. I will be safely out of harm's way, sitting on the rail, when you play this hand.
> Now, the problem is: how on earth do you ever expect to be able to map
> SEVEN (or 2^(N+1) - 1) plain text strings to THREE (or 2^N - 1)
> compressed strings?
> E.g.:
> '11' -> (compression) -> '1'
> '10' -> (compression) -> '0'
> '01' -> (compression) -> ''
> '00' -> (compression) -> ??????? // no possibilities left...
> '0'  -> (compression) -> ??????? // no possibilities left...
> '1'  -> (compression) -> ??????? // no possibilities left...
> ''   -> (compression) -> ??????? // no possibilities left...
> This goes for any N!!! Claiming that you can compress *any* and *all*
> strings of length N or less, is saying that you can map all possible
> 2^(N+1) - 1 plain text strings to 2^N - 1 compressed strings. I'm afraid
> you want to fill pigeon holes with on average about 2 pigeons per hole.
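For anyone who wants to see the pigeonhole count with their own eyes, here is a small Python sketch (mine, not part of the thread) that simply enumerates the strings:

```python
# Brute-force check of the pigeonhole argument: a lossless compressor
# must map distinct inputs to distinct outputs, but there are not
# enough shorter strings to go around.

def strings_up_to(n):
    """All binary strings of length 0..n (inclusive)."""
    out = ['']
    for length in range(1, n + 1):
        out += [format(i, '0{}b'.format(length)) for i in range(2 ** length)]
    return out

for n in range(1, 6):
    inputs = strings_up_to(n)          # 2^(n+1) - 1 strings
    outputs = strings_up_to(n - 1)     # 2^n - 1 strings
    assert len(inputs) == 2 ** (n + 1) - 1
    assert len(outputs) == 2 ** n - 1
    # A one-to-one map from inputs into outputs is impossible:
    assert len(inputs) > len(outputs)
    print(n, len(inputs), len(outputs))
```

For every N the input set is roughly twice the size of the output set, which is exactly the "2 pigeons per hole" above.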
It's a very good and clear example. And if the program compresses all strings of n bytes (which is what is needed to compress a random file under a rigorous definition of random), then nothing prevents you from using it recursively (the output file is itself an *any file*). So if every input file is compressed by at least 1 bit each time, then any file of n bits must disappear (reaching 0 bits of size) after being recursively compressed by that program n times. That is equal to a proof that the compressor is not lossless, since all n bits of information are now gone.
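A toy illustration of that recursion argument (my own sketch; `shrink` is a stand-in for any claimed always-shrinking compressor, not a real one):

```python
# Sketch: if a "compressor" removes at least one bit on every pass,
# then any n-bit input reaches 0 bits within n passes -- and a 0-bit
# output cannot tell the decompressor which of the 2**n possible
# inputs it came from, so the scheme cannot be lossless.

def shrink(bits):
    """Stand-in for the claimed compressor: output is >= 1 bit shorter."""
    return bits[:-1]  # any always-shrinking map hits the same wall

n = 16
data = '10' * (n // 2)          # an arbitrary n-bit input
passes = 0
while data:
    data = shrink(data)
    passes += 1

print(passes)  # n passes later, nothing is left to decode
assert passes <= n and data == ''
```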
As said in other posts, anyone could use a PRN generator and so provide a very "efficient" compressor that maps each random-looking output to its seed and size. However, it is really compressing a subgroup of files, not compressing all files. In other words, a program that generates random-looking output, uses ent or the diehard suite to say "it's random" (really, the tests can only say that the file probably contains no recognizable structure, and many good PRNGs will pass that kind of test), and then compresses it to an amazingly small file is not enough to show that any arbitrary input can be compressed. Trying to recompress the output recursively is a way to disprove the claim: if the compressor can compress an arbitrary input, it must compress every output of the recursion until the output size is 0, proving itself not to be a lossless compressor.
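Here is what such a seed-based "compressor" looks like in practice (a sketch of mine using Python's built-in PRNG as a stand-in for sharnd; the function names are my own invention). Note that `compress_known_prng` needs the seed handed to it — it cannot recover a seed from an arbitrary file, which is exactly why this trick compresses only a tiny subgroup of files:

```python
# Illustrative "compressor" for the special case of PRNG-generated
# files: store (seed, length) instead of the file. It only works on
# outputs of this exact generator, i.e. a vanishingly small subgroup
# of all files -- it is not a general-purpose compressor.

import random

def generate(seed, nbytes):
    """Stand-in for a sharnd-style generator (Python's PRNG, not sharnd)."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(nbytes))

def compress_known_prng(seed, nbytes):
    """'Compress' the file to its (seed, length) pair: 8 bytes total."""
    return seed.to_bytes(4, 'big') + nbytes.to_bytes(4, 'big')

def decompress_known_prng(archive):
    seed = int.from_bytes(archive[:4], 'big')
    nbytes = int.from_bytes(archive[4:], 'big')
    return generate(seed, nbytes)

original = generate(12345, 8 * 1024)        # 8 KB of "random-looking" data
archive = compress_known_prng(12345, 8 * 1024)
assert decompress_known_prng(archive) == original
assert len(archive) == 8                    # 8 bytes vs 8192
```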
> > Just curious, which did you prove, P = NP or P != NP?
> > -- Matt Mahoney
> Yes, do tell, I've been dying to see how this one turns out.
> I'm not calling for a straw poll or anything, but we're all betting on
> P != NP, right?
> But if P = NP, what a world!
> |
> | Mark Nelson
> |
Yep, and it's a simple proof too. All you need is a compressor that will compress any string longer than m, and you can perform any computation in O(n) time, which is in P:
1. Recursively compress the input from size n to size m in O(n) time. (I have omitted some details, such as encoding the number of recursions, dealing with compression by more than one bit, and showing that each cycle does not depend on n and so is O(1), but the basic result is the same.)
2. Compute the compressed output in O(1) time using a 2^m by m lookup table.
3. Recursively decompress the output in O(n) time.

Total computation is O(n).
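Step 2 is the only honest part of this (tongue-in-cheek) proof, and it is real enough on its own: any function on m-bit inputs can be precomputed into a table of 2^m entries, after which each evaluation is a single lookup. A quick sketch (the function `f` is a hypothetical stand-in):

```python
# Tabulating a function over all m-bit inputs: O(2**m) setup once,
# then O(1) per evaluation. This is ordinary precomputation -- the
# impossible part of the proof is steps 1 and 3, not this.

M = 8                                   # m-bit inputs

def f(x):
    """Hypothetical computation on an m-bit input."""
    return (x * x + 1) % (2 ** M)

table = [f(x) for x in range(2 ** M)]   # built once

# Every later evaluation is one table lookup:
assert table[42] == f(42)
assert all(table[x] == f(x) for x in range(2 ** M))
```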
As a corollary, the halting problem is in P. This fact can be used to solve the 6 remaining Clay millennium prize problems.
Einstein, if you are still interested in the Clay prizes, let me know. Here is the deal.

1. You pay me a fee of $1000 by July 1, 2006.
2. You write and publish (with source code) a compressor/decompressor pair that will compress every file or string larger than some size m by at least one bit, where m is any positive integer you choose, and the decompressor restores the original input to the compressor.
3. After step 2, I will write a paper with you as co-author solving the 7 millennium problems and get it published in an appropriate mathematics journal within 6 months.
4. The Clay Institute requires a 2 year waiting period after publication, acceptance within the mathematics community, and approval by their scientific advisory board before awarding the prizes ($7 million). We split the prize money, regardless of the actual amount.
5. If after step 2 I fail to complete steps 3 and 4 (for the full $7 million) then I refund my fee.
6. There is no refund if you fail to complete step 2.
7. After July 1, 2006 my fee increases to $2000, then increases $1000 per week after that. The amount of the fee is determined by the day that I receive payment.
8. There is no time limit for you to complete step 2.
9. You are free to file for patents worldwide on your algorithm for step 2 with you as sole inventor.
If we have a deal, contact me by email to arrange payment.
Of course you can go it alone and claim the whole $7 million for yourself (or at least $1 million for P = NP using either the proof I posted or the one sitting on your other computer). I don't know if you have tried to publish any papers before, but it can be a humbling experience. It really helps to get outside assistance in this matter.
So as long as guys like Harrington are out there, I need to make sure that we keep the million random digit challenge on the table. In the past it's been a little hard to find, you have to search through comp.compression and there are various false hits.
Einstein, the point of the million random digit challenge is basically to encourage people with magical compressor claims to put up or shut up. If you can beat the challenge within the spirit of the rules, you win.
It's been out there for four or five years, and nobody has taken a serious crack at it. One or two people have noted that there is a tiny bit of redundancy in the million random digit file, but not enough to exploit with a conventional decompressor - nibbling away a dozen bytes doesn't leave room for much of a decompression program.
Nope, to compress this file will take a magic compressor. So, why not code up what you've been talking about and earn undying fame by meeting the challenge? It's no Clay Prize, but then, I don't have a retail empire or a family fortune to fund a prize of this nature.
maud wrote:
> tc...@lsa.umich.edu wrote:
> > In article <1150377194.973543.41...@i40g2000cwc.googlegroups.com>,
> > maud <maudlin.po...@hotmail.com> wrote:
> > > a) Create all sets of possible values that occupy 8MB space.
> > > b) Compress and decompress them all without error using your method.
> > > c) Count up how many sets when compressed result in a smaller space
> > > than 8MB.
> > >Now, it may take some time but hey! thats combinatorics for you.
> > No kidding. 8MB is 64 million bits, so there are 2^(64 million) possible
> > values that occupy 8MB space. Even if you handle a googol cases per
> > femtosecond, you would handle a negligible fraction of all cases in a
> > googol universe lifetimes.
> Probably be a good idea if we went for lunch first then :)
I have found a way to create recursive compressor software right away, if someone could transform any random file into this format (for example) with minimal additional bytes:

111110110001101111000
Each section of the binary output string must have runs of 1 with length at least 2, and runs of 0 with length at least 1...
x >= 2, y >= 1, where x = run length of 1 and y = run length of 0
so the minimal group will look like this: 110
If anyone has found a variable-length code with this characteristic, or a codec whose output generates something like this, please contact me immediately... we will create the miracle and solve this problem once and for all.
I am hinting at Fibonacci coding for the 11, but it is still not suitable...
Einstein, this is a challenge for you... I only need a data transform to complete the quest. So before I complete it with the help of the experts here, I do hope you get there first.
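A quick count (my own check, not the poster's) of how many n-bit strings actually fit the proposed format shows the same pigeonhole wall: far fewer than 2^n strings have every run of 1s of length >= 2 (runs of 0s are automatically >= 1), so a lossless transform of *arbitrary* files into this format must, on average, make them longer, not shorter.

```python
# Count the n-bit strings that fit the proposed run-length format
# (every maximal run of 1s has length >= 2) and compare with 2**n.

def fits_format(s):
    runs_of_ones = [r for r in s.split('0') if r]
    return all(len(r) >= 2 for r in runs_of_ones)

assert fits_format('111110110001101111000')   # the poster's example
assert fits_format('110')                     # the minimal group
assert not fits_format('10')                  # lone run of a single 1

for n in range(1, 13):
    total = 2 ** n
    valid = sum(fits_format(format(i, '0{}b'.format(n)))
                for i in range(total))
    print(n, valid, total)
```

The gap between `valid` and `total` widens as n grows, which is why no data transform can squeeze every random file into the format "with minimal additional bytes".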