Million Random Digit Challenge

Mark Nelson

Jan 5, 2010, 10:26:29 AM

I have a short blog post on Dobbs Code Talk re the Million Digit
Challenge:

http://dobbscodetalk.com/index.php?option=com_myblog&show=The-Million-Random-Digit-Challenge.html&Itemid=29
(or http://bit.ly/8i63hd)

While proof is impossible, I think it has been pretty well
demonstrated that this file is not going to succumb to any sort of
statistical analysis. The mathematicians at RAND did a great job,
leaving at most a few dozen bits of potential savings:

http://groups.google.com/group/comp.compression/browse_frm/thread/987e4ef26de2d7e8?tvc=1&q=matt+mahoney+million+group%3Acomp.compression
(or http://bit.ly/70iWFe)

In this blog post I include a few speculative ways in which this file
could be compressed.

For example: what if the million digit number was actually the nth
prime? And while the length of n will be pretty close to a million
digits, maybe n is a bit compressible? (Unfortunately we are a long
way from numbering primes up to a million digits.)

Or what if the million digit number could be described by some fairly
short polynomial? Factoring the million digit number might have some
interesting fallout.

It's pretty cool to imagine that the million digit number might
actually be an interesting number. What is harder to calculate is just
how astronomically unlikely that is. Most people don't have an
intuitive sense of that, hence the never-ending supply of fresh
believers.

- Mark - ma...@ieee.org

biject

Jan 5, 2010, 10:48:21 AM

On Jan 5, 8:26 am, Mark Nelson <snorkel...@gmail.com> wrote:
> I have a short blog post on Dobbs Code Talk re the Million Digit
> Challenge:
>
> http://dobbscodetalk.com/index.php?option=com_myblog&show=The-Million...
> (or http://bit.ly/8i63hd)
>
> While proof is impossible, I think it has been pretty well
> demonstrated that this file is not going to succumb to any sort of
> statistical analysis. The mathematicians at RAND did a great job,
> leaving at most a few dozen bits of potential savings:
>
> http://groups.google.com/group/comp.compression/browse_frm/thread/987...
> (or http://bit.ly/70iWFe)
>
> In this blog post I include a few speculative ways in which this file
> could be compressed.
>
> For example: what if the million digit number was actually the nth
> prime? And while the length of n will be pretty close to a million
> digits, maybe n is a bit compressible? (Unfortunately we are a long
> way from numbering primes up to a million digits.)
>
> Or what if the million digit number could be described by some fairly
> short polynomial? Factoring the million digit number might have some
> interesting fallout.
>
> It's pretty cool to imagine that the million digit number might
> actually be an interesting number. What is harder to calculate is just
> how astronomically unlikely that is. Most people don't have an
> intuitive sense of that, hence the never-ending supply of fresh
> believers.
>
> - Mark - ma...@ieee.org

Mark,
I don't think most files of one million bits can be
compressed. I also think that if a truly random source was
used, the file would be incompressible. But I don't trust
government contractors that feed at the public trough, so
I wonder if the file is really random. They may have
taken shortcuts in its production. I think this file
should be looked at by many people. I don't think I will
compress it, but I would like, for example, to do a BWTS of
the file in binary and see if that changes any of the
statistics before and after the transform. I would expect
that if the file is random they should stay about the same.
Anyway, I hope someone does compress it greatly this year.
I guess I am not trusting of how it was really created.
I trust you, not them.

David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptographic
system is only as strong as its weakest link"

Mark Adler

Jan 5, 2010, 8:37:16 PM

On 2010-01-05 07:26:29 -0800, Mark Nelson <snork...@gmail.com> said:
> For example: what if the million digit number was actually the nth
> prime?

It's not prime. It's even. The last digits in the RAND list are 88,
the last byte in your file is 0x64. So I already know that the number
is divisible by four. Another test showed that it is also divisible by
seven.

Mark

Mark Adler

Jan 5, 2010, 9:24:14 PM

On 2010-01-05 17:37:16 -0800, Mark Adler said:
> Another test showed that it is also divisible by seven.

Oops. Messed that one up. It's not divisible by 7, nor by 3, 5, 11,
or 13. But it is divisible by 2.

Mark
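
The divisibility checks in the two posts above do not need a big-number library, since a decimal string can be reduced modulo a small number one digit at a time. A minimal sketch in Python, assuming the digits are held in a plain string; the name `decimal_mod` and the sample string are illustrative, and the sample is not the RAND data:

```python
def decimal_mod(digits: str, m: int) -> int:
    """Remainder of the decimal number written in `digits` when divided by m."""
    r = 0
    for ch in digits:
        # Horner's rule: append one decimal digit, keep only the remainder.
        r = (r * 10 + int(ch)) % m
    return r

# Illustrative sample only (NOT the RAND file); like the RAND list it ends in 88.
sample = "31415926535897932388"
for m in (2, 3, 4, 5, 7, 11, 13):
    print(m, decimal_mod(sample, m) == 0)
```

Any number ending in 88 is divisible by 4, because 100 is a multiple of 4 and so is 88; the other divisors need the full pass over the digits.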

glen herrmannsfeldt

Jan 5, 2010, 10:17:29 PM

Mark Nelson <snork...@gmail.com> wrote:
(snip)

> While proof is impossible, I think it has been pretty well
> demonstrated that this file is not going to succumb to any sort of
> statistical analysis. The mathematicians at RAND did a great job,
> leaving at most a few dozen bits of potential savings:

You do have to count your bits carefully. I still am bothered
by the claims of tape manufacturers on the storage capacity of tapes.
Ultrium 1 claims 200GB, and only in fine print says compressed.
Of course that 200GB only contains 100GB of information or it
wouldn't easily compress to 100GB on tape.

In the case of random digits, one could have a file of one million
ASCII characters representing decimal digits. Good compressors
should approach the 415,000 bytes at 3.32 bits per decimal digit.
Some would say that it was compressing random data.
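
The 3.32 bits per digit figure is log2(10), and the byte count follows directly from it. A small sketch of the arithmetic, assuming ten equally likely digits; the function name `min_bytes_for_digits` is made up for illustration:

```python
import math

def min_bytes_for_digits(n_digits: int, alphabet: int = 10) -> float:
    """Information content, in bytes, of n_digits equally likely symbols."""
    bits_per_symbol = math.log2(alphabet)   # about 3.32 for decimal digits
    return n_digits * bits_per_symbol / 8

bits = math.log2(10)
size = min_bytes_for_digits(1_000_000)
print(f"{bits:.4f} bits per digit, about {size:,.0f} bytes")
```

Shrinking the one-million-character ASCII file to roughly this size only removes the redundancy of storing a decimal digit in an 8-bit byte; it says nothing about whether the digits themselves are compressible.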

Mostly, though, humans prefer files with redundant information.
Pretty pictures and pleasant audio tend to have a highly
ordered structure that is reasonably compressible.

> http://groups.google.com/group/comp.compression/browse_frm/thread/987e4ef26de2d7e8?tvc=1&q=matt+mahoney+million+group%3Acomp.compression
> (or http://bit.ly/70iWFe)

-- glen

Denis

Jan 7, 2010, 10:13:32 AM

Some time ago I tried to compress it with an experimental compressor,
and it seems incompressible for groups of up to 5 bytes.
I tested it a little for 6-byte groups, but with my machine I estimate
I would need about one year of computational time.
I think it is a good test, but I don't want only to compress this
specific million-digit file. I want to use it as a test, so I don't want
to use "tricks" to compress it. I think it would be interesting to find
a general compression method able to compress the million-digit test.

Denis.

glen herrmannsfeldt

Jan 7, 2010, 2:41:19 PM

Mark Nelson <snork...@gmail.com> wrote:
(snip)

> While proof is impossible, I think it has been pretty well
> demonstrated that this file is not going to succumb to any sort of
> statistical analysis. The mathematicians at RAND did a great job,
> leaving at most a few dozen bits of potential savings:

Could someone with the ASCII version of this file, all 1000000
digits of it, run it through gzip to see how well it does?

-- glen

Ernst

Jan 7, 2010, 5:49:18 PM

On Jan 7, 11:41 am, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:

Just to tag in here...

Hi.. I'm back fiddling with writing data encoders and the
MillionDigit file.

Thanks Mark and thanks all the Data Compression people.

I approached this from the idea of changing the data to something
compressible.

I believe I see the essence of information transferred into different
encodings and none of the encodings seem to offer
enough compression to justify the encoding.

However, I feel some of my encoders are really clever.

So I hope I will be accepted as a poster in this forum. I would like
to be part of the group even though my education is limited on the
subject.

I'd like to be a part of the quest, so to speak, yet my real skill is
imagination.

So hello again. Sorry for the lack of dedication to the forum, but I
am here now.

Ernst

robert...@yahoo.com

Jan 7, 2010, 6:37:08 PM

On Jan 7, 1:41 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:

The original is at:

http://www.rand.org/pubs/monograph_reports/MR1418/index.html

It's a bit formatted (for easy reading, there are fifty digits per
line, arranged in 10 groups of 5, plus line numbers) so you'll need to
cook it a bit if you just want a million byte file.

glen herrmannsfeldt

Jan 7, 2010, 7:32:15 PM

robert...@yahoo.com <robert...@yahoo.com> wrote:
(snip, I wrote)

>> Could someone with the ASCII version of this file, all 1000000
>> digits of it, run it through gzip to see how well it does?

> The original is at:

> http://www.rand.org/pubs/monograph_reports/MR1418/index.html

The result seems to be 470429 bytes. About 13% worse than binary.

That is just the 1000000 ASCII characters, no blanks,
line numbers, or line terminators.

It downloads as a ZIP file at 663943 bytes.

> It's a bit formatted (for easy reading, there are fifty digits per
> line, arranged in 10 groups of 5, plus line numbers) so you'll need to
> cook it a bit if you just want a million byte file.

Well, first I took it a little too literally and downloaded
the above indicated file. The actual file is at:

http://www.rand.org/pubs/monograph_reports/2005/digits.txt.zip

thanks,

-- glen

Mark Adler

Jan 8, 2010, 12:38:53 AM

On 2010-01-07 11:41:19 -0800, glen herrmannsfeldt said:
> Could someone with the ASCII version of this file, all 1000000
> digits of it, run it through gzip to see how well it does?

The binary file is 415,241 bytes long. (Though I could argue that Mark
left off a leading zero, and it should be 415,242 bytes.)

Putting the 1,000,000 byte ASCII digits file through gzip -9 results in
470,429 bytes.

Running the same through zlib, generating a gzip file using the
Z_HUFFMAN_ONLY strategy gets it to 436,943 bytes. For cases like this,
that's better than letting zlib try to look for matches.
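
A rough way to try this comparison with Python's zlib binding. This is a sketch under stated assumptions: it uses pseudo-random digits from Python's `random` module as a stand-in for the RAND file, so the byte counts will not match the figures quoted above, and `gzip_blob` is an illustrative helper name.

```python
import random
import zlib

def gzip_blob(data: bytes, strategy: int) -> bytes:
    """Compress to the gzip container (wbits=31) at level 9 with the given strategy."""
    c = zlib.compressobj(9, zlib.DEFLATED, 31, 8, strategy)
    return c.compress(data) + c.flush()

# Stand-in input: pseudo-random ASCII digits, NOT the RAND file.
rng = random.Random(2010)
digits = bytes(rng.choices(b"0123456789", k=200_000))

default = gzip_blob(digits, zlib.Z_DEFAULT_STRATEGY)   # match search plus Huffman
huffman = gzip_blob(digits, zlib.Z_HUFFMAN_ONLY)       # Huffman coding only
print(len(default), len(huffman))
```

With no real repetition in the input, the string matches found by the default strategy tend to cost more bits than the literals they replace, which is why skipping the match search helps here.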

Since the deflate format needs to code 11 symbols (the ten digits and
the end-of-block code), you would expect five 3-bit codes and six 4-bit
codes. For equal frequencies, you would then expect about 437,500
bytes not including overhead. The frequencies aren't quite equal, so
taking that into account, you would expect 437,330 bytes not including
overhead. The zlib Huffman coder does a smidge better, even with the
overhead, by breaking the input into about 30 blocks and getting some
more frequency dispersion in the smaller blocks to take advantage of.
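
The code-length counts can be checked by running Huffman's algorithm on equal weights. A minimal sketch, assuming equal frequencies for all symbols and assigning one of the 4-bit codes to the end-of-block symbol; `huffman_code_lengths` is an illustrative helper, not zlib's own tree builder:

```python
import heapq
from itertools import count

def huffman_code_lengths(freqs):
    """Code length given to each symbol by Huffman's algorithm."""
    tie = count()  # tie-breaker so the heap never compares the symbol lists
    heap = [(f, next(tie), [i]) for i, f in enumerate(freqs)]
    heapq.heapify(heap)
    lengths = [0] * len(freqs)
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1        # every merge adds one bit to its members
        heapq.heappush(heap, (f1 + f2, next(tie), s1 + s2))
    return lengths

with_eob = sorted(huffman_code_lengths([1] * 11))   # ten digits + end-of-block
digit_lengths = with_eob[:-1]                        # one 4-bit code goes to end-of-block
avg_bits = sum(digit_lengths) / len(digit_lengths)
print(with_eob, avg_bits, 1_000_000 * avg_bits / 8)
```

Eleven equal weights give five 3-bit and six 4-bit codes, so the ten digits average 3.5 bits each, or 437,500 bytes for a million digits.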

Mark

biject

Jan 8, 2010, 11:09:20 AM

If you used a static bijective Huffman coder you would need
only ten predefined symbols. You would use six 3-bit symbols
and four 4-bit symbols, for 425,000 bytes. If you used bijective
static arithmetic coding with 10 predefined symbols you would
need 415,241 or 415,242 bytes, which makes me wonder how Mark
converted it to binary. Did he use a special program to
change a decimal fraction to binary, or what? Knowing
exactly how he did it might lead to a way to compress
his result.

John Reiser

Jan 8, 2010, 12:21:58 PM

On 01/08/2010 08:09 AM, biject wrote:
> If you used bijective static
> arthmetic with predefined 10 symbols you need 415241
> or 415242 bytes. which makes me wonder how did Mark
> convert it to binary. Did he use a special program to
> change a decimal fraction to binary or what. Knowing
> exactly how he did it might lead to a way to compress
> his result.

Probably it was done using the GNU multiple precision library "gmp",
after processing the text input to leave exactly the one million
ASCII digits and nothing else.
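
For illustration, the same kind of conversion can be sketched with Python's built-in arbitrary-precision integers instead of gmp. This is not the code used to produce the challenge file; the function names are hypothetical, and the digit count is passed back in on decoding because leading zeros are lost when the string becomes an integer.

```python
import sys

# Python 3.11+ limits int<->str conversions to 4300 digits by default;
# lift the limit so a million-digit string can be converted.
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)

def digits_to_bytes(digits: str) -> bytes:
    """Treat the digit string as one big decimal integer, stored big-endian."""
    value = int(digits)
    n_bytes = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(n_bytes, "big")

def bytes_to_digits(blob: bytes, n_digits: int) -> str:
    """Inverse of digits_to_bytes; n_digits restores any leading zeros."""
    return str(int.from_bytes(blob, "big")).zfill(n_digits)

# Short illustrative sample (NOT the RAND digits).
sample = "0012345678901234567890"
blob = digits_to_bytes(sample)
print(len(blob), bytes_to_digits(blob, len(sample)) == sample)
```

The built-in conversion is much slower than gmp on a million digits, but the resulting byte string is the same kind of object: the whole file read as a single integer.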

Mark Nelson issued the same challenge in 2004, about 5.5 years ago:
http://groups.google.com/group/comp.compression/browse_thread/thread/b7b1d6477fd9c00c/61c6e2954d8cd3ca?q=comp.compression+million+rand+challenge#61c6e2954d8cd3ca

--

Mark Adler

Jan 8, 2010, 8:17:23 PM

On 2010-01-08 08:09:20 -0800, biject said:
> If you used bijective static arthmetic with predefined 10 symbols you
> need 415241 or 415242 bytes.

Due to the small dispersion in the actual frequencies, I calculated
that an efficient arithmetic coder could do one byte better: 415,240
bytes. (Actually, 415,239 bytes and seven bits.)

Mark
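
A bound of this kind can be estimated from the digit counts alone. The sketch below computes the order-0 entropy of a digit string, which is the size an ideal static arithmetic coder would approach if the frequencies were known in advance. It is an assumption that this matches the calculation described above, and `order0_bytes` is an illustrative name.

```python
import math
from collections import Counter

def order0_bytes(digits: str) -> float:
    """Order-0 entropy of the string in bytes, ignoring the cost of
    transmitting the frequency table itself."""
    counts = Counter(digits)
    n = len(digits)
    bits = -sum(c * math.log2(c / n) for c in counts.values())
    return bits / 8

# Illustrative inputs (NOT the RAND digits).
flat = "0123456789" * 100
skewed = "0" * 910 + "123456789" * 10
print(order0_bytes(flat), order0_bytes(skewed))
```

Perfectly flat counts give exactly log2(10) bits per digit; any dispersion in the counts lowers the bound slightly, which is where a saving of a byte or two on a million digits would come from.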

biject

Jan 9, 2010, 12:42:54 AM

If you know the exact frequency of each symbol, yes,
you could do it easily with that compressor. However,
it's when you try to make a general arithmetic coder for
any file over the set {0..9} that you get into trouble,
when you add in the count data or do it adaptively.

I have never seen the original file. But I am sure
I tested the binary with both arb2x and arb255. Over the
years I have used, for these two adaptive bijective
stationary compressors, sometimes the Laplace method, where
the states have a starting count of 1, and sometimes
the KT method, which some think is hot, where the
starting values are 1/2. The KT method gave longer
files, if I remember correctly. I also tested the BWTS
of the file and got exactly the same lengths. Many
so-called arithmetic coders will get different lengths; a
pure arithmetic coder should get the same length regardless
of the permutation.

If the starting values of the counters were very large you
would get the lengths I calculated above. There is likely
some starting value that would be optimal for this file.
I'm not sure what it would be, or whether it would really
make the file smaller. If one played the tuning-for-nonstationary
game that Shelwien plays, I guess you should be able to save
more space. In my mind, if the file does not compress with
Shelwien's tuning then it really might be random. Note that
when you use it tuned with Shelwien's stuff, the BWTS of the
file would likely increase in length if you try to compress
it, since the coder would no longer be a stationary arithmetic
coder.
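
The two starting counts, and the claim that a stationary count-based coder gives the same length for any permutation of its input, can be illustrated by computing ideal adaptive code lengths. This is a sketch of the estimators only, not of arb2x or arb255; `adaptive_code_bits` is a hypothetical helper and the input is a toy bit sequence.

```python
import math
import random
from collections import defaultdict

def adaptive_code_bits(symbols, alphabet_size, start):
    """Ideal code length in bits when each symbol is coded with probability
    (count + start) / (total + start * alphabet_size), then counted."""
    counts = defaultdict(float)
    total = 0.0
    bits = 0.0
    for s in symbols:
        p = (counts[s] + start) / (total + start * alphabet_size)
        bits -= math.log2(p)
        counts[s] += 1
        total += 1
    return bits

seq = [0, 1] * 500                       # toy input with exactly balanced counts
shuffled = seq[:]
random.Random(1).shuffle(shuffled)

laplace = adaptive_code_bits(seq, 2, 1.0)   # starting count 1
kt = adaptive_code_bits(seq, 2, 0.5)        # starting count 1/2
print(laplace, kt, adaptive_code_bits(shuffled, 2, 1.0))
```

The total depends only on the final symbol counts, so shuffling the input leaves it unchanged up to floating-point rounding. On balanced input like this toy sequence the 1/2 start comes out slightly longer than the start of 1.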

Phil Carmody

Jan 9, 2010, 9:11:07 AM

Mark Nelson <snork...@gmail.com> writes:
> I have a short blog post on Dobbs Code Talk re the Million Digit
> Challenge:
>
> http://dobbscodetalk.com/index.php?option=com_myblog&show=The-Million-Random-Digit-Challenge.html&Itemid=29
> (or http://bit.ly/8i63hd)
>
> While proof is impossible, I think it has been pretty well
> demonstrated that this file is not going to succumb to any sort of
> statistical analysis. The mathematicians at RAND did a great job,
> leaving at most a few dozen bits of potential savings:

I think that means they did a completely lousy job. Calling them
'mathematicians' is an insult to mathematicians, they were just
intellectually lazy bodgers who did a totally half-arsed job.

Phil
--
Any true emperor never needs to wear clothes. -- Devany on r.a.s.f1

Thomas Richter

Jan 9, 2010, 9:41:31 AM

Phil Carmody wrote:

>> While proof is impossible, I think it has been pretty well
>> demonstrated that this file is not going to succumb to any sort of
>> statistical analysis. The mathematicians at RAND did a great job,
>> leaving at most a few dozen bits of potential savings:
>
> I think that means they did a completely lousy job. Calling them
> 'mathematicians' is an insult to mathematicians, they were just
> intellectually lazy bodgers who did a totally half-arsed job.

Can you provide any evidence for that? IOW, do you know how hard it is
to come up with a string of digits that withstands so many statistical
analyses?

So long,
Thomas

Mark Nelson

Jan 9, 2010, 11:54:49 AM

On Jan 8, 11:21 am, John Reiser <jreise...@comcast.net> wrote:
>
> Probably it was done using the GNU multiple precision library "gmp",
> after processing the text input to leave exactly the one million
> ASCII digits and nothing else.

I posted the code to this NG back in 2002 - I used the Java bignum
class. Gordon Cormack also posted some C code to do the same thing:

Hope this link works, Google Groups search function seems to be doing
better these days, but they make no promise of any type of permalink:

http://groups.google.com/group/comp.compression/browse_frm/thread/7cb284374eb99eb5

- Mark

Mark Nelson

Jan 9, 2010, 12:07:19 PM

On Jan 8, 10:09 am, biject <biject.b...@gmail.com> wrote:
> If you used a static bijective huffman you need only ten
> predefined symbols. You would use 6 3 bit symbols and 4 4
> bit symbols for 425000 bytes. If you used bijective static
> arthmetic with predefined 10 symbols you need 415241

Bijective coding plus small alphabet plus uniform frequencies = win.
If everyone here will concede the point, maybe we can then close the
topic for good!

Well, not likely, but it is pretty clear cut.

- Mark - ma...@ieee.org

Ernst

Jan 10, 2010, 1:35:28 AM

Bijective data sets seem to be the direction.

Did I say that right "Bijective data sets?"
<- ... A <> B <> C <>D ... ->

That looks like fun..

Love the challenge. I have spent the years learning about encoding
data.

I may have a bijective encoder.. Cross my fingers.

Again Great fun!

.

James Dow Allen

Jan 10, 2010, 6:58:02 AM

On Jan 5, 10:26 pm, Mark Nelson <snorkel...@gmail.com> wrote:
> For example: what if the million digit number was actually the nth
> prime?

According to my calculations, if a number N
comprising a million decimal digits is the k'th
prime, P_k, then k itself is a number with
almost 999,994 digits. Thus to win the $100 prize
I'd have to squeeze the program
f(k)
{ print the k'th prime; }
into about 21 bits. Can this be done?

Of course, k *might* just be the j'th prime,
and j the i'th prime, and i the h'th prime....

One comment about the Million Digits intrigues me;
something like "RAND carefully constructed the digits
to pass certain statistical tests."

Suppose this means that all 3-digit patterns
occur equally often. Then, for example, our
"compressed file" could start with the first
998,000 digits after which we would know a *lot*
about the final 2000 digits and could encode
them well.

Any more info on how the Million Digits were
constructed? I suppose they were *decimal*
digits, so rendering them in binary would
blur some of those "statistical regularities."
Can we win the prize if we output just the
decimal digits, or would our "decompressor"
have to include a decimal-to-binary converter?

(Given P_k = N ~= 10^10^6, by the Prime Number
theorem:
k ~= N/log N
~= 10^10^6 / 10^6.36
~= 10^999993.6
)

James Dow Allen

Phil Carmody

Jan 10, 2010, 10:29:06 AM

The evidence Mark posted several years ago.

The height of their logic was "shit, there's bias - erm, let's add them!".

I could trivially come up with one million digits with better
statistical properties (more precisely - fewer demonstrable
statistical weaknesses) than the RAND ones.

The RAND ones are an epic failure, people are just emotionally
attached to them as they're historical.

Mark Nelson

Jan 10, 2010, 1:10:37 PM

> I think that means they did a completely lousy job. Calling them
> 'mathematicians' is an insult to mathematicians, they were just
> intellectually lazy bodgers who did a totally half-arsed job.

Had they done a "completely lousy job", the file would have been quite
a bit more compressible than the currently theorized couple of dozen
bytes.

Maybe you object more to their methodology than their results?

On Jan 9, 8:11 am, Phil Carmody <thefatphil_demun...@yahoo.co.uk>
wrote:

> Mark Nelson <snorkel...@gmail.com> writes:
> > I have a short blog post on Dobbs Code Talk re the Million Digit
> > Challenge:
>
> > http://dobbscodetalk.com/index.php?option=com_myblog&show=The-Million...
> > (or http://bit.ly/8i63hd)

Mark Nelson

Jan 10, 2010, 1:14:20 PM

On Jan 10, 5:58 am, James Dow Allen <jdallen2...@yahoo.com> wrote:
> According to my calculations, if a number N
> comprising a million decimal digits is the k'th
> prime, P_k, then k itself is a number with
> almost 999,994 digits. Thus to win the $100 prize
> I'd have to squeeze the program
> f(k)
> { print the k'th prime; }
> into about 21 bits. Can this be done?
>

Nope, you would have to find some compressibility in k. That's where
the luck comes in.

>
> Any more info on how the Million Digits were
> constructed? I suppose they were *decimal*
> digits, so rendering them in binary would
> blur some of those "statistical regularities."
> Can we win the prize if we output just the
> decimal digits, or would our "decompressor"
> have to include a decimal-to-binary converter?

Yes, both a paper and a book can be located via the Google Oracle.

- Mark - ma...@ieee.org

Ernst

Jan 10, 2010, 9:34:53 PM

I'm trying to find an encoding of million digit file that will
compress. So some might try that approach.

Phil Carmody

Jan 11, 2010, 6:06:09 PM

Mark Nelson <snork...@gmail.com> writes:
>> I think that means they did a completely lousy job. Calling them
>> 'mathematicians' is an insult to mathematicians, they were just
>> intellectually lazy bodgers who did a totally half-arsed job.
>
> Had they done a "completely lousy job", the file would have been quite
> a bit more compressible than the currently theorized couple of dozen
> bytes.

Maybe 'completely lousy' is an exaggeration, but if the target is
to be paedagogical, the attempt should be beyond criticism.

> Maybe you object more to their methodology than their results?

Both. They realised their source was biased, that was good, and
then they post-process in a way which can give a mathematically
modellable reduction in shortfall of entropy rate, which they
apparently considered good enough, despite the fact that it was
clearly still visible. So it wasn't good enough - bad - and they
were happy with that - double bad.

Thomas Richter

Jan 12, 2010, 2:28:02 AM

Phil Carmody wrote:
>
> The evidence Mark posted several years ago.
>
> They height of their logic was "shit, there's bias - erm, let's add them!".

I still don't understand, which type of bias?

> I could trivially come up with one million digits with better
> statistical properties (more precisely - fewer demonstrable
> statistical weaknesses) than the RAND ones.

Which weaknesses? This file would be weak, for example, if
all digits would come up exactly equally probable - this is also
unlikely to happen exactly.

So long,
Thomas

Paul

Jan 12, 2010, 7:44:21 AM

"Mark Nelson" <snork...@gmail.com> wrote in message
news:1b45d87a-0af0-4667...@a6g2000yqm.googlegroups.com...

> I have a short blog post on Dobbs Code Talk re the Million Digit
> Challenge:
>
> http://dobbscodetalk.com/index.php?option=com_myblog&show=The-Million-Random-Digit-Challenge.html&Itemid=29
> (or http://bit.ly/8i63hd)
>
> While proof is impossible, I think it has been pretty well
> demonstrated that this file is not going to succumb to any sort of
> statistical analysis. The mathematicians at RAND did a great job,
> leaving at most a few dozen bits of potential savings:
>
> http://groups.google.com/group/comp.compression/browse_frm/thread/987e4ef26de2d7e8?tvc=1&q=matt+mahoney+million+group%3Acomp.compression
> (or http://bit.ly/70iWFe)
>
> In this blog post I include a few speculative ways in which this file
> could be compressed.
>
> For example: what if the million digit number was actually the nth
> prime? And while the length of n will be pretty close to a million
> digits, maybe n is a bit compressible? (Unfortunately we are a long
> way from numbering primes up to a million digits.)
>
> Or what if the million digit number could be described by some fairly
> short polynomial? Factoring the million digit number might have some
> interesting fallout.
>
> It's pretty cool to imagine that the million digit number might
> actually be an interesting number. What is harder to calculate is just
> how astronomically unlikely that is. Most people don't have an
> intuitive sense of that, hence the never-ending supply of fresh
> believers.
>
> - Mark - ma...@ieee.org

Mark, don't you think it will be embarrassing if someone comes along and does
compress the bin file by a couple hundred bytes, and you have all these postings
across the internet that say it can't be done?

Mark Nelson

Jan 13, 2010, 9:18:38 AM

On Jan 12, 6:44 am, "Paul" <p...@tretbase.com> wrote:

>
> Mark don't you think it will be embarrassing if someone comes along and does
> compress the bin file a couple hundred bytes and you got all these postings
> across the internet that says it can't be done?

No, it would be really cool if somebody succeeded in doing something
that is widely viewed as impossible.

It would be enormously interesting. Who would be embarrassed by that?

I'd be out $100 though, so it's not like there is no downside.

- Mark

Earl_Colby_Pottinger

Jan 13, 2010, 2:20:52 PM

The same thought went through my mind: why be embarrassed if the
challenge results in new insight about compression/source coding?

There is nothing more fun than finding out that you are wrong and
something you thought was impossible, has now become possible.

Phil Carmody

Jan 14, 2010, 6:18:51 PM

Thomas Richter <th...@math.tu-berlin.de> writes:
> Phil Carmody wrote:
>>
>> The evidence Mark posted several years ago.
>>
>> They height of their logic was "shit, there's bias - erm, let's add them!".
>
> I still don't understand, which type of bias?
>
>> I could trivially come up with one million digits with better
>> statistical properties (more precisely - fewer demonstrable
>> statistical weaknesses) than the RAND ones.
>
> Which weaknesses?

The weaknesses Mark posted several years ago. Or was it not Mark?
Which Mark, at that? Anyway. The source numbers were biased, RAND
made that known. They then removed some of that bias by summing
digits in a column, which means that the parity of an unknown digit
in a column can be worked out given knowledge of the parity of all
the other digits in the column. That's another weakness - they've
just given you 50 bits for free. Or should I say they've just taken
away 50 bits.
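
The parity argument can be shown with a toy model. The sketch below does not reproduce RAND's actual procedure; it only assumes, as a hypothetical, that every column is known to have an even digit sum, and shows that one hidden digit's parity then follows from the others. All names are illustrative.

```python
import random

def make_column(n: int, rng: random.Random) -> list:
    """Hypothetical column of n digits constrained to have an even sum."""
    col = [rng.randrange(10) for _ in range(n - 1)]
    # Choose a last digit whose parity cancels the parity of the rest.
    last = 2 * rng.randrange(5) + (sum(col) % 2)
    return col + [last]

def hidden_parity(known_digits) -> int:
    """Parity of the one missing digit, given that the full column sum is even."""
    return sum(known_digits) % 2

rng = random.Random(50)
col = make_column(20_000, rng)
hidden = col[1234]
others = col[:1234] + col[1235:]
print(hidden % 2, hidden_parity(others))
```

One recoverable parity bit per column, over 50 columns, is where a figure of 50 bits would come from under this assumption.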

> For example, this file would be weak, for example,
> if all digits would come up exactly equally probable - this is also
> unlikely to happen exactly.

If they were exactly equally probable that wouldn't be a weakness;
if they were exactly equally distributed that would demonstrate a
weakness.

LawCounsels

Feb 8, 2010, 11:58:06 AM

> No, it would be really cool if somebody succeeded in doing something
> is widely viewed as impossible.
>
> It would be enormously interesting. Who would be embarrassed by that?
>
> I'd be out $100 though, so it's not like there is no downside.
>
> - Mark

As I last mentioned, this is the limit of the new frontiers still to be
ascertained ....

Download the compressed and decompressed http://random.org files:
https://www.box.net/shared/static/5q6n4dludd.zip

Download the .exe: https://www.box.net/shared/static/rm79sk26oy.zip

The attached files show the 1,048,576-byte Random2009-08-5.dat (a fully
random file from http://www.random.org/files/ ), which is accepted as
being random enough that none of the best present state-of-the-art
compressors can shrink it by even 1 byte. As long predicted, it has now
been successfully compressed, reduced by about 1 Kbyte (689 bytes by the
sizes given here) into a 1,047,887-byte Index.dat.

The 1,047,887-byte Index.dat compressed file was then losslessly
decompressed and reconstructed back into the exact, verbatim
Random2009-08-05.dat.

ranking.exe is also able to compress WinZip's final compressed output
even further.

I am looking for a serious, technologically capable collaborator, under
confidentiality (needed only for a short time period) ... please email ...
I look forward to extending the frontier limits as we know them today,
together : )

INSTRUCTIONS
============

Attached is a DOS command-prompt exe; a second, more versatile and
powerful version is up next. [This version takes a few hours to compress
1MB http://random.org files; the next version, with optimised
rank/unrank, will take only minutes.]

Copy everything into the DOS default directory, bring up DOS and type in:
ranking.exe -r 1 random2009-08-05.dat <return>
-r means compress; 1 means accept 1 input byte per symbol (up to a max
of 3 bytes); random2009-08-5.dat is the input file to be compressed (from
http://random.org ; you may use any of the other files there).
This produces a single compressed file, Index.ind (you may want to
transfer this file to a different PC to 'separately' reconstruct).

To reconstruct, type in: ranking.exe -u reconstructed.dat
<return>
-u means reconstruct; reconstructed.dat is the name you give to the
reconstructed output file.
(This uses Index.ind, the compressed file produced earlier, to
reconstruct.) This produces reconstructed.dat, which should be exactly
the same as the original input file.

Both PCs need to have the .NET framework and MSVC (Microsoft Visual
Studio) installed.

Earl_Colby_Pottinger

Feb 8, 2010, 4:41:30 PM

This does not sound right. If I understood the challenge, the
decompressor AND the compressed data together must be smaller than
million digits file.

If your compressed data file is only one (1) K smaller than the original
file, how do we know the missing 1K of data is not embedded in the
decompressor program?
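
The accounting described here can be written down as a simple check. A minimal sketch, assuming the sizes are measured in bytes and that the reconstructed file is compared with the original by hash; the function names are illustrative and are not part of any posted tool:

```python
import hashlib
from pathlib import Path

def sha256_of(path) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def round_trip_ok(original, reconstructed) -> bool:
    """True if the reconstructed file is byte-for-byte the original."""
    return sha256_of(original) == sha256_of(reconstructed)

def beats_challenge(original_size: int, compressed_size: int,
                    decompressor_size: int) -> bool:
    """Compressed data AND decompressor together must be smaller than the input."""
    return compressed_size + decompressor_size < original_size

# Sizes quoted in the thread: 1,048,576 bytes in, 1,047,887 bytes out.
# The decompressor size below is a placeholder assumption, not a measured value.
print(beats_challenge(1_048_576, 1_047_887, decompressor_size=50_000))
```

With those sizes the saving is 689 bytes, so under this accounting the decompressor would have to be smaller than 689 bytes for the claim to count.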

Still, if I find the time I will try it out by compressing some
test files on my Dell (Windows XP) laptop, transferring the compressed
file to my Toshiba (Windows 7) laptop and decompressing there,
then testing the resulting output file by moving the hard drive to my
Compaq (Haiku) desktop.

Since only the Toshiba has network access, all file transfers are
sanitized by transferring through a USB flash drive. The USB drive is
formatted BFS, ensuring no hidden files can be passed around.

Give me over the weekend.

Sportman

Feb 8, 2010, 7:05:28 PM

On Feb 8, 5:58 pm, LawCounsels <LawCouns...@aol.com> wrote:
> attach files shows 1,048,576 bytes of Random2009-08-5.dat (total
> random file from http://www.random.org/files/ ) , accepted being quite
> random not possible compresses even 1 byte less by all present best
> state-of-art known compressors , but long predicted now proven
> successful compressed reduced by 1Kbytes into 1,047,887 bytes
> Index.dat
>
> 1,047,887 bytes Index.dat compressed file successful lossless
> decompressed reconstructed back into exact verbatim same
> Ramdom2009-08-05.dat

I have verified this claim and it is true:

Ranking.exe -r 1 Random2009-08-05.dat gives a 1,047,887-byte output
file in 1 hour and 47 minutes.
Ranking.exe -u output.dat gives the 1,048,576-byte input file back in 2
hours 17 minutes.
A compare between the original input file and the decompressed output
file gives OK.

It looks like you are the first one to deliver a compressor and
decompressor that can reduce the size of a true random input file and
restore it.

Good job!

Science can go back to the drawing board.
On 30 January 2010 the second law of thermodynamics was publicly cracked
by Steorn with their Orbo, and now random data compression is publicly
cracked by LawCounsels with Ranking.exe :-)

http://www.youtube.com/watch?v=T4Q3Klq5dxM
http://www.youtube.com/watch?v=p7i7P63IByY

Sportman

Feb 8, 2010, 7:46:48 PM

On Feb 9, 1:05 am, Sportman <sport...@gmail.com> wrote:
> I have verified this claim and it true:

If you have a slow computer you can split the input file with HJSplit
into 400KB parts:
http://www.freebyte.com/hjsplit/

Ranking.exe -r 1 rand.dat gives a 409,548-byte output
file in 22 minutes.
Ranking.exe -u rand2.dat gives the 409,600-byte input file back in 28
minutes.
A compare between the original input file and the decompressed output
file gives OK.

It's not possible to compress the output file further.

LawCounsels

Feb 9, 2010, 12:02:35 AM

sportman writes :

> Ranking.exe -r 1 rand.dat give a 409,548 bytes output
> file in 22 minutes.
> Ranking.exe -u rand2.dat give a 409,600 bytes input file back in 28
> minutes
> Compare between original input file and decompressed output file give
> Ok
> It's not possible to compress the output file further.

this is really only a very small tip of the iceberg sighted some time ago ...
things have moved on since, more things to come :)

inconnu

Feb 9, 2010, 2:36:35 AM

> It's not possible to compress the output file further.

Did you think to encrypt the result and check whether the encrypted
result can be compressed?

it would be interesting to know...

Earl_Colby_Pottinger

Feb 9, 2010, 8:09:59 AM

On Feb 8, 6:05 pm, Sportman <sport...@gmail.com> wrote:

> I have verified this claim and it true:
>
> Ranking.exe -r 1 Random2009-08-05.dat give a 1,047,887 bytes output
> file in 1 hour and 47 minutes.
> Ranking.exe -u output.dat give a 1,048,576 bytes input file back in 2
> hours 17 minutes
> Compare between original input file and decompressed output file give
> Ok

I don't have the time yet to do my own tests, but the one thing I
know is: never, Never, NEVER use the data files supplied by the person
making the claim. Hopefully over the weekend I will create my own
version of random data for testing purposes.

Sportman, did you use his supplied data? If yes, then strike one.

Sportman, did you do the tests on a computer with an active internet
connection? If yes, then strike two.

Sportman, did you do the decompression on the same machine that you
did the compression on? If yes, then strike three.
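
One way to follow this advice is to generate the test file yourself and record a hash of it before the claimed compressor ever sees it. A rough Python sketch, not a definitive procedure: the helper names are hypothetical, and `os.urandom` is the operating system's cryptographic random source, used here as a stand-in for a "true random" file such as those from random.org.

```python
import hashlib
import os


def make_test_file(path: str, size: int = 1_048_576) -> str:
    """Write `size` bytes from the OS random source; return their SHA-256."""
    data = os.urandom(size)
    with open(path, "wb") as f:
        f.write(data)
    return hashlib.sha256(data).hexdigest()


def sha256_of_file(path: str) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def round_trip_ok(recorded_digest: str, restored_path: str) -> bool:
    """True only if the restored file hashes to the digest recorded earlier."""
    return sha256_of_file(restored_path) == recorded_digest
```

The digest is small enough to write down, so the restored file can be checked on a different, offline machine that only ever received the compressed output.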

Mark Nelson

Feb 9, 2010, 8:49:38 AM

The only way a program like this is interesting is if it is able to
repeat its performance on a variety of datasets.

What is the smallest program that can compress the million random
digit file by 100K bytes and then expand it again? 100K + n, where n is
the size of the smallest assembly language equivalent of cat. I think this
code falls into that category: tuned for a specific file, intentionally or not.

The strict definition of the challenge requires that a program smaller
than the size of the million digit file be able to recreate the file.
LawCounsels is nowhere close to that.

A looser definition of the challenge would require that the program
compress to a data file smaller than the million digit file, then
decompress correctly, and then be able to repeat the process a few more
times on the file encrypted using DES.
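
To make the "100K + n" point concrete: a program can appear to compress one particular file simply by carrying part of that file inside itself. A toy Python sketch under that assumption (the names are hypothetical, and nothing here is a claim about how Ranking.exe works internally):

```python
def make_tuned_codec(target: bytes, embed: int):
    """Build a codec that 'knows' the first `embed` bytes of one target file.

    The embedded prefix is part of the program, so a fair accounting
    adds its length to the program size.
    """
    prefix = target[:embed]

    def compress(data: bytes) -> bytes:
        # One flag byte records whether the embedded prefix was stripped.
        if data.startswith(prefix):
            return b"\x01" + data[len(prefix):]
        return b"\x00" + data

    def decompress(blob: bytes) -> bytes:
        flag, body = blob[:1], blob[1:]
        return prefix + body if flag == b"\x01" else body

    return compress, decompress
```

On the target file the output shrinks by `embed - 1` bytes, on files that do not start with the prefix it grows by one byte, and the program itself has grown by `embed` bytes, which is why performance has to be repeated on a variety of datasets.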

- Mark

Sportman

Feb 9, 2010, 1:05:10 PM

On 9 feb, 06:02, LawCounsels <lawcouns...@gmail.com> wrote:
> this really only very small tip of iceberg sighted sometime ago ...
> things moved on since things to come :)

I tried to recompress the output of the provided Random2009-08-05.dat,
but Ranking.exe makes it a little bigger:
index.ind 1,047,887 bytes to 1,048,216 bytes

It took 1 hour and 47 minutes to compress.

I also tested two other random files from random.org, and
Ranking.exe made them a little bigger too:

2006-03-11.bin 1,048,576 bytes to 1,048,903 bytes
2010-02-09.bin 1,048,576 bytes to 1,048,901 bytes

Both also took 1 hour and 47 minutes to compress.

Because it looks like something is different about the provided
Random2009-08-05.dat, I compared that file with the original from
random.org, 2009-08-05.bin.

I found 4010 differences; here are the first 25:
44: 00 20
1A8: 00 20
22C: 00 20
31E: 00 20
3F8: 00 20
479: 00 20
4DC: 00 20
6BB: 00 20
6D6: 00 20
707: 00 20
816: 00 20
821: 00 20
A5B: 00 20
AFC: 00 20
AFE: 00 20
CD8: 00 20
D27: 00 20
F04: 00 20
F37: 00 20
1144: 00 20
1226: 00 20
1262: 00 20
135B: 00 20
139E: 00 20
14C8: 00 20
......

So it looks like, while testing your software's decompression, you
overwrote your original file with a file that is no longer 100% random,
and later used this file as input...
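
A byte-by-byte comparison like the one listed above is easy to reproduce. A minimal Python sketch; the function names are hypothetical, and the zero-based uppercase-hex offset format is an assumption made to resemble the listing, not a statement about which tool produced it:

```python
def byte_differences(a: bytes, b: bytes) -> list[tuple[int, int, int]]:
    """Return (offset, byte_in_a, byte_in_b) for every position that differs."""
    if len(a) != len(b):
        raise ValueError("files differ in length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def format_difference(offset: int, x: int, y: int) -> str:
    """Render one difference as 'OFFSET: XX YY' in uppercase hex."""
    return f"{offset:X}: {x:02X} {y:02X}"


def compare_files(path_a: str, path_b: str, limit: int = 25) -> None:
    """Print the number of differing bytes and the first `limit` of them."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        diffs = byte_differences(fa.read(), fb.read())
    print(f"{len(diffs)} differences")
    for d in diffs[:limit]:
        print(format_difference(*d))
```

When every differing pair is the same two byte values, as in the listing, that points to a systematic substitution rather than random corruption.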

I'm afraid your compressor can't compress random files and you need to
go back to the drawing board...

Earl_Colby_Pottinger

Feb 9, 2010, 2:24:10 PM

On Feb 9, 12:05 pm, Sportman <sport...@gmail.com> wrote:

Sportman, thank you for doing further tests, and for saving me the
time/trouble of doing the tests myself.

> Because it looks like something is different with the provided
> Random2009-08-05.dat I compared that file with the original from
> random.org 2009-08-05.bin.

An important step.

This is why I said never to use the author's supplied data file for the
tests.

> So it looks like with testing your software decompressing you have
> overwrite your original file with a file who is not 100% random
> anymore. And later used this file as input...

Yes, a very good point! This is a simple mistake to make and once the
wrong file is being used for testing the author will in complete
honesty think that they are developing something great.

I also made a similar mistake in the past, I ran my 7-bit text
compressor (it stripped out the high bit first) on an 8-bit file, then
used the decompressed version to test the 8-bit compressor that I was
developing the following month - I was getting great compression
ratios until I discovered my error. :(

Earl Colby Pottinger (Compression fool)

biject

Feb 9, 2010, 3:05:47 PM

On Feb 9, 12:24 pm, Earl_Colby_Pottinger

If it was April Fools Day I would believe it was a joke.
But you should have thought something was up when he pointed
to the site with the Irish perpetual motion machine. It
may make money for the inventors if they can find enough
"investors". However that is most likely fake too.
Oh well it was a nice break from the horrors of the
real world which is either on the brink of a nuclear war
or freezing due to an ice age.

David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptographic
system is only as strong as its weakest link"

LawCounsels

Feb 9, 2010, 4:54:14 PM

On 9 Feb, 18:05, Sportman <sport...@gmail.com> wrote:
> On 9 feb, 06:02, LawCounsels <lawcouns...@gmail.com> wrote:
>
> > this really only very small tip of iceberg sighted sometime ago ...
> > things moved on since things to come :)
>
> So it looks like with testing your software decompressing you have
> overwrite your original file with a file who is not 100% random
> anymore. And later used this file as input...

Thanks

Yes, when I dusted off this early version posted here, I thought http://random.org
was quite random, though not as random as Mark's,
and so was compressible .....

I forgot Windows Notepad's unexpected 'flaw': it converts both hex'00' and
hex'20' to the same hex value when saving random.org's file with Notepad,
making random.org's file not so random. I overlooked this since Notepad
displays the original .bin and the saved .bin exactly the same, with
hex'00' and hex'20' looking identical on screen.
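
The effect of such a substitution on compressibility can be estimated from the byte histogram alone. A rough Python sketch, assuming an idealized 1 MiB file in which every byte value occurs equally often and assuming the substitution runs from hex'00' to hex'20'; the function names are made up for illustration:

```python
import math
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Zero-order Shannon entropy of the byte histogram, in bits per byte."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def ideal_savings_bytes(data: bytes) -> float:
    """Bytes an ideal order-0 entropy coder would save, ignoring overhead."""
    return (8.0 - entropy_bits_per_byte(data)) * len(data) / 8.0


# Idealized stand-in for a 1 MiB random file: every byte value equally often.
uniform = bytes(range(256)) * 4096
# Apply the assumed 00 -> 20 substitution to every zero byte.
damaged = uniform.replace(b"\x00", b"\x20")
```

Under these assumptions the undamaged data has 8 bits of entropy per byte and nothing to gain, while the damaged data leaves roughly 1,024 bytes on the table. That is the same order of magnitude as the 689 bytes (1,048,576 minus 1,047,887) reported for the damaged file earlier in the thread.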

However, with a quite random file, such as a reasonably sized final
WinZip-compressed file, this early version of ranking.exe can often reduce
the compressed WinZip file even further. The final zipped file is probably
quite random but not always completely random .... It's assumed WinZip
has the best state-of-the-art compression available (?); it costs 1 bit to
indicate whether the file is further compressible and thus was reduced
further with ranking.exe.

It's simple enough to verify with your own files: WinZip them and then
compress them further with ranking.exe.

Subsequent refined versions have progressively reduced the 'excess'
bytes when compressing true random files (like both http://random.org's
and Mark's) .... Mark has possession of one of the subsequent ranking.exe
versions, tested on a true .bin (not saved by Notepad), which expanded the
406KB random file by only some 12 bytes.

The very latest refined implementation of the related methods has at
present attained exactly 0 'excess' bytes (not with Notepad) ... progress
in the right direction. Remember, this is not a straightforward
transformation that simply preserves the exact same number of bits as the
original, like BWT etc., but one actually able to reduce many
less-than-completely-random files .... At present it awaits a new,
improved implementation of the methods, hopefully toward compression
gains (not just the exact same number of bits, as it stands at this
time), or if not, at the very least able to reduce many more sets of
files even further than at present.

Carl Kaufmann

Feb 9, 2010, 6:58:11 PM

On 2010-02-09 16:54, LawCounsels wrote:
> On 9 Feb, 18:05, Sportman<sport...@gmail.com> wrote:
>> On 9 feb, 06:02, LawCounsels<lawcouns...@gmail.com> wrote:
>>
>>> this really only very small tip of iceberg sighted sometime ago ...
>>> things moved on since things to come :)
>>
>>
>> So it looks like with testing your software decompressing you have
>> overwrite your original file with a file who is not 100% random
>> anymore. And later used this file as input...
>
> Thanks
>
> yes when I dust up this early version posted here thought http://random.org
> was quite random not as random as Mark's
> so was compressable .....
>
> forgot Window's Notepad unexpected 'flaw' converts both hex'00' &
> hex'20' to same hex when saving random.org 's file with Notepad...
> making random.org 's file not so random , overlooked since Notepad
> displays both original .bin & saved .bin both exact same visual
> display both hex'00' & hex'20' exact same on screen
>
> However with quite random file such as reasonable size WinZip final
> compressed file , this early version ranking.exe often can reduce the
> compressed winzip file even further final winzipped file probably
> quite random but not always complete random .... its assumed winzip
> has the best state-of-art compressions available (?), cost 1bit to
> indicate if further compressable thus compressed reduced further with
> ranking.exe

Please refer to http://www.maximumcompression.com/index.html to see what
the real state-of-the-art is.

> its simple enough confirm verify with own files, winzip them & then
> further compress with ranking.exe
>
>
> subsequent refined version/s have progressive reduced the 'excess'
> bytes when compressing true random files (like both http://random.org
> & Mark's) ....Mark has possession of one of subsequent ranking.exe
> version tested on true .bin (not saved by Notepad) which expanded the
> 406KB random file byonly some 12 bytes
>
> very latest refined related methods implementation at present has
> attained exact 0 'excess' bytes (not with Notepad)... progress in
> right direction remembers this is not straight forward transformations
> which simply preserves exact same # of bits as original like BWT etc
> but actual able reduces many less than complete random files ....
> present shortly awaits new improved methods implemention towards
> compressions gains (not just exact same # of bits as it stands at this
> time) hopeful , or if not at very least able reduce many more sets of
> files even further than at present
>

The only way of discovering the limits of the possible is to venture a
little way past them into the impossible.
- Arthur C. Clarke

I do not believe that you will succeed, but in the spirit of the above
quote, I support your attempt.

biject

Feb 9, 2010, 7:54:06 PM

On Feb 9, 2:54 pm, LawCounsels <LawCouns...@aol.com> wrote:
> On 9 Feb, 18:05, Sportman <sport...@gmail.com> wrote:
>
> > On 9 feb, 06:02, LawCounsels <lawcouns...@gmail.com> wrote:
>
> > > this really only very small tip of iceberg sighted sometime ago ...
> > > things moved on since things to come :)
>
> > So it looks like with testing your software decompressing you have
> > overwrite your original file with a file who is not 100% random
> > anymore. And later used this file as input...
>
> Thanks
>
> yes when I dust up this early version posted here thought http://random.org
> was quite random not as random as Mark's
> so was compressable .....
>
> forgot Window's Notepad unexpected 'flaw' converts both hex'00' &
> hex'20' to same hex when saving random.org 's file with Notepad...
> making random.org 's file not so random , overlooked since Notepad
> displays both original .bin & saved .bin both exact same visual
> display both hex'00' & hex'20' exact same on screen
>
> However with quite random file such as reasonable size WinZip final
> compressed file , this early version ranking.exe often can reduce the
> compressed winzip file even further final winzipped file probably
> quite random but not always complete random .... its assumed winzip
> has the best state-of-art compressions available (?), cost 1bit to
> indicate if further compressable thus compressed reduced further with
> ranking.exe
>
> its simple enough confirm verify with own files, winzip them & then
> further compress with ranking.exe
>
> subsequent refined version/s have progressive reduced the 'excess'
> bytes when compressing true random files (like both http://random.org
> & Mark's) ....Mark has possession of one of subsequent ranking.exe
> version tested on true .bin (not saved by Notepad) which expanded the
> 406KB random file byonly some 12 bytes
>
> very latest refined related methods implementation at present has
> attained exact 0 'excess' bytes (not with Notepad)... progress in
> right direction remembers this is not straight forward transformations
> which simply preserves exact same # of bits as original like BWT etc
> but actual able reduces many less than complete random files ....
> present shortly awaits new improved methods implemention towards
> compressions gains (not just exact same # of bits as it stands at this
> time) hopeful , or if not at very least able reduce many more sets of
> files even further than at present

You could take the file, convert it to ASCII ones and zeroes,
then do a BWTS on it and convert back to the binary packed
file, which would be the same size. See if you can compress that;
if you can, I would say the first file is not random. But it's a test
they are not likely to have done before they put
the file up.
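
The pipeline described above can be sketched in Python. This is a naive illustration, not the BWTS code from bijective.dogma.net: it assumes the usual definition of the bijective BWT (Lyndon factorization, then sorting all rotations of the factors as infinitely repeated strings), it only does the forward transform, and its quadratic memory use makes it suitable for small inputs only. All function names are made up for illustration.

```python
def lyndon_factors(s: str) -> list[str]:
    """Duval's algorithm: split s into a non-increasing run of Lyndon words."""
    factors, i, n = [], 0, len(s)
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def bwts(s: str) -> str:
    """Forward bijective BWT: last symbols of the sorted factor rotations."""
    n = len(s)
    rows = []
    for w in lyndon_factors(s):
        for r in range(len(w)):
            rot = w[r:] + w[:r]
            # Comparing the first 2n symbols of the infinite repetition is
            # enough to order any two rotations of total length at most 2n.
            key = (rot * (2 * n // len(rot) + 1))[:2 * n]
            rows.append((key, rot[-1]))
    rows.sort(key=lambda row: row[0])
    return "".join(last for _, last in rows)


def bytes_to_bits(data: bytes) -> str:
    """Expand bytes into a string of ASCII '0' and '1' characters."""
    return "".join(f"{b:08b}" for b in data)


def bits_to_bytes(bits: str) -> bytes:
    """Pack a bit string (length a multiple of 8) back into bytes."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def bwts_packed(data: bytes) -> bytes:
    """Bits -> BWTS -> repack; the output has the same length as the input."""
    return bits_to_bytes(bwts(bytes_to_bits(data)))
```

Because the transform only permutes the bits, the repacked file is exactly as long as the original; the test is whether an ordinary compressor does any better on the permuted version.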