
Random digit challenge


Mark Nelson - Jul 24, 2002, 10:38:44 PM

Gordon Cormack already posted a pointer to this, but out of personal
interest I duplicated his work. I wrote a short Java program (got to love
those Java libraries!) that converts the RAND Corp million digit text file
to a 400K+ binary file, using the quite spiffy BigInteger class. I've posted
the binary file temporarily at:

http://datacompression.info/Miscellaneous/AMillionRandomDigits.bin

The source (appended to this message as well) is at:

http://datacompression.info/Miscellaneous/Convert.java

The bin file stymied every compressor I threw at it in a limited test. I'd
love to find a program that can at least squeeze a few bytes out of it.

If you choose to run the converter on your own system, be warned that the
conversion process took about 2 hours on a 900 MHz Celeron. Yes, Java has a
downside as well as an upside!

Now, who's going to volunteer to compare Gordon's file with mine to see if
we have the same bit stream, possibly reversed or otherwise mangled?

----------------------------------Convert.java-------------------------------
import java.io.*;
import java.math.BigInteger;

public class Convert {
    String fileName;
    final static int maxLines = 20000;
    String decimalData;

    public Convert( String fileName ) {
        this.fileName = fileName;
    }

    // Reads the RAND text file, keeping only the ten 5-digit groups on
    // each line (the leading sequence-number column is skipped).
    public void readData() throws Exception {
        BufferedReader in = new BufferedReader( new FileReader( this.fileName ) );
        StringBuffer buf = new StringBuffer( maxLines * 50 );
        for ( int i = 0 ; i < maxLines ; i++ ) {
            if ( ( i % 100 ) == 0 ) System.out.println( i );
            String s = in.readLine();
            s = s.substring( 8, 13 ) + s.substring( 14, 19 ) +
                s.substring( 21, 26 ) + s.substring( 27, 32 ) +
                s.substring( 34, 39 ) + s.substring( 40, 45 ) +
                s.substring( 47, 52 ) + s.substring( 53, 58 ) +
                s.substring( 60, 65 ) + s.substring( 66, 71 );
            buf.append( s );
        }
        in.close();
        decimalData = buf.toString();
    }

    // Treats the million digits as one huge decimal integer and writes
    // its big-endian binary representation.
    public void writeBigInteger( String filename ) throws Exception {
        System.out.println( "About to convert to binary. This will take some time..." );
        BigInteger big = new BigInteger( decimalData );
        System.out.println( "Done!" );
        FileOutputStream outputStream = new FileOutputStream( filename );
        outputStream.write( big.toByteArray() );
        outputStream.close();
    }

    public static void main(String[] args) {
        Convert convert = new Convert( "A Million Random Digits.txt" );
        try {
            convert.readData();
            convert.writeBigInteger( "A Million Random Digits.bin" );
        }
        catch ( Exception e ) {
            System.out.println( "Exception caught: " + e );
        }
    }
}
----------------------------------EOF-------------------------------

--
|
| Mark Nelson - ma...@ieee.org - http://MarkNelson.us
| DataCompression.info - http://www.datacompression.info
|


Gordon V Cormack - Jul 24, 2002, 10:58:21 PM

In article <UwJ%8.106814$_51.8...@rwcrnsc52.ops.asp.att.net>,

Mark Nelson <ma...@ieee.org> wrote:
>Gordon Cormack already posted a pointer to this, but out of personal
>interest I duplicated his work. I wrote a short Java program (got to love
>those Java libraries!) that converts the RAND Corp million digit text file
>

I know this isn't a fair comparison, but I can't resist Java bashing.
Here are the line/token/character counts for the two implementations:

20 48 369 decbin.c
20 106 651 marknelson.java

--
#include <stdio.h>
int d[200000];
char c;

main(){
    register i,j,r,first,n;
    /* read the 5-digit groups as base-100000 limbs, most significant first */
    for (n=0;1 == scanf("%d",&d[n]);n++);
    first = 0;
    /* repeated long division by 256: each pass emits the number mod 256,
       so bytes come out least significant first (the reverse of
       BigInteger.toByteArray(), hence the byte-reversed output files) */
    while (first < n) {
        r = 0;
        for (j=first;j<n;j++) {
            d[j] += r*100000;
            r = d[j]%256;
            d[j] /= 256;
            if (j==first && !d[j]) first++;
        }
        c = r;
        fwrite(&c,1,1,stdout);
    }
}
--
Gordon V. Cormack CS Dept, University of Waterloo, Canada N2L 3G1
gvco...@uwaterloo.ca http://cormack.uwaterloo.ca/cormack

SCOTT19U.ZIP_GUY - Jul 24, 2002, 11:54:20 PM

ma...@ieee.org (Mark Nelson) wrote in
<UwJ%8.106814$_51.8...@rwcrnsc52.ops.asp.att.net>:

>Gordon Cormack already posted a pointer to this, but out of personal
>interest I duplicated his work. I wrote a short Java program (got to
>love those Java libraries!) that converts the RAND Corp million digit
>text file to a 400K+ binary file, using the quite spiffy BigInteger
>class. I've posted the binary file temporarily at:
>
>http://datacompression.info/Miscellaneous/AMillionRandomDigits.bin
>

Just for the heck of it I threw a couple of my compressors at
it. One only expanded the file by "one byte". But none got it to
the same size or smaller, just like the one from Patrick. Some get
close but none of mine quite beat it.
I know mine can't compress random data. But if I make one that compresses
this file I will let you know, since it would most likely mean the
data is really not random.

David A. Scott
--
My Crypto code
http://cryptography.org/cgi-bin/crypto.cgi/Misc/scott19.zip
http://cryptography.org/cgi-bin/crypto.cgi/Misc/scott16.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

Sachin Garg - Jul 25, 2002, 7:49:01 AM

Hi,

Such files are good to do research on. Can you please point me to a bigger
file (about 10 MB)? (I already have the 3 MB $5000 challenge file.)

Sachin Garg. [India]
http://sachingarg.cjb.net
http://www.geocities.com/schngrg


Terra - Jul 25, 2002, 10:54:26 AM

"Gordon V Cormack" <gvco...@uwaterloo.ca> ha scritto nel messaggio
news:ahnpgd$io5$1...@speedy.uwaterloo.ca...

There is a bug in your program: the array d[] is too small. The correct size
should be 220000.

Gordon V Cormack - Jul 25, 2002, 12:03:16 PM

In article <ahp3et$d3r$1...@serv1.iunet.it>,

Terra <talarico-...@infinito.it> wrote:
>
>"Gordon V Cormack" <gvco...@uwaterloo.ca> ha scritto nel messaggio
>
>There is a bug in your program: the array d[] is too small. The correct size
>should be 220000.
>

The first column of digits.txt is a sequence number, not
part of the random sequence.

My program assumes this column has been removed. I forgot
to mention that. Sorry.
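
For anyone repeating the experiment, here is a minimal sketch of that
preprocessing step in Java (the file names are invented; it assumes the
sequence number is the first whitespace-delimited field on each line):

=== StripColumn.java
import java.io.*;

public class StripColumn {
    // Drops the leading sequence-number field from each line of the
    // RAND digits file, leaving only the ten 5-digit groups.
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader("digits.txt"));
        PrintWriter out = new PrintWriter(new FileWriter("digits-noseq.txt"));
        String line;
        while ((line = in.readLine()) != null)
            out.println(line.replaceFirst("^\\s*\\S+\\s+", ""));
        in.close();
        out.close();
    }
}
===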

Gordon V Cormack - Jul 25, 2002, 12:42:22 PM

In article <UwJ%8.106814$_51.8...@rwcrnsc52.ops.asp.att.net>,
Mark Nelson <ma...@ieee.org> wrote:
>
>http://datacompression.info/Miscellaneous/AMillionRandomDigits.bin

>
>Now, who's going to volunteer to compare Gordon's file with mine to see if
>we have the same bit stream, possibly reversed or o/w mangled?

The files are the same, modulo reversal of the byte order. Here's the
program that shows it.

--
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

char me[10000000], mark[10000000];
int nme, nmark;

main(){
    int i;
    FILE *fme = fopen("digits.binary","r");
    FILE *fmark = fopen("AMillionRandomDigits.bin","r");

    nme = fread(me,1,10000000,fme);
    nmark = fread(mark,1,10000000,fmark);

    printf("File length: me %d Mark %d\n",nme,nmark);
    assert(nme == nmark);

    /* compare byte i of one file with byte i from the end of the other */
    for (i=0;i<nme;i++) {
        if (me[i] != mark[nme-i-1]) {
            printf("not the same at position %d (%d): %02x (%02x)\n",i,nme-i-1,me[i],mark[nme-i-1]);
            exit(1);
        }
    }

    printf("Files are the same (bytes reversed)\n");
}

Mark Nelson - Jul 25, 2002, 10:03:25 PM

>I know this isn't a fair comparison, but I can't resist Java bashing.

Yes, but the Java code is so esthetically pleasing!

And of course, Gordon, you are cheating a bit - your code won't read the
million digit file; it looks like you applied a bit of preprocessing. So no
more chest thumping, okay?


--
|
| Mark Nelson - ma...@ieee.org - http://MarkNelson.us
| DataCompression.info - http://www.datacompression.info
|

"Gordon V Cormack" <gvco...@uwaterloo.ca> wrote in message
news:ahnpgd$io5$1...@speedy.uwaterloo.ca...

Mark Nelson - Jul 25, 2002, 10:08:27 PM

>The files are the same, modulo reversal of the byte order.

Thanks for the confirmation, that's great!

--
|
| Mark Nelson - ma...@ieee.org - http://MarkNelson.us
| DataCompression.info - http://www.datacompression.info
|

"Gordon V Cormack" <gvco...@uwaterloo.ca> wrote in message

news:ahp9pe$mup$1...@speedy.uwaterloo.ca...

Mark Nelson - Jul 25, 2002, 10:08:49 PM

>But none got it to the same size or smaller

The random binary file is once again the winner, and still undefeated
champion!

Only Jules has been brash enough to guarantee victory. Will his Namathesque
confidence lead to comeuppance or humiliation?

I'm still guessing that there's a compressor out there somewhere that can
squeeze a byte or two out of it.

--
|
| Mark Nelson - ma...@ieee.org - http://MarkNelson.us
| DataCompression.info - http://www.datacompression.info
|

"SCOTT19U.ZIP_GUY" <david_...@emailv.com> wrote in message
news:9255DE4DAH110W...@216.168.3.40...

Gordon V Cormack - Jul 25, 2002, 11:08:54 PM

In article <N5209.646794$352.136279@sccrnsc02>,

Mark Nelson <ma...@ieee.org> wrote:
>>I know this isn't a fair comparison, but I can't resist Java bashing.
>
>Yes, but the Java code is so esthetically pleasing!

To each his own.

>And of course, Gordon, you are cheating a bit - your code won't read the
>million digit file, looks like you applied a bit of preprocessing. So no
>more chest thumping, okay?

Fair enough (it was an oversight on my part - when I saw your post I just
copied the file; I forgot I removed that column beforehand).

Frank Schoep - Jul 26, 2002, 4:04:11 AM

Hi,

alright, I'm thinking of participating in this 'contest' as well, but now
that we have two (2) random files in reverse order, things are becoming a
bit more complex, now aren't they? Are we now supposed to write a compressor
that can compress *both* files, or would being able to compress one of them
be enough? Although doing the impossible once is, well, erm, impossible,
doing it twice would make it even more impossible, right?

Sincerely,
Frank


"Mark Nelson" <ma...@ieee.org> schreef in bericht
news:va209.667340$cQ3.105906@sccrnsc01...

Gordon V Cormack - Jul 26, 2002, 6:58:39 AM

You have to reproduce the original digits.txt. I gave you a binary version
of it for your convenience, that's all.

If you can reproduce it once you can reproduce it *n* times. It is the
same information.

In article <3d41015c$0$22324$fb62...@news1.zeelandnet.nl>,


Frank Schoep <fsc...@zeelandnet.nl> wrote:
>Hi,
>
>alright, I'm thinking of participating in this 'contest' as well, but now
>that we have two (2) random files in reverse order, things are becoming a
>bit more complex, now aren't they? Are we now supposed to write a compressor
>that can compress *both* files, or would being able to compress one of them
>be enough. Altough doing the impossible once is, well, erm, impossible but
>doing it twice would make it even more impossible, right?
>
>Sincerely,
>Frank

Marco Schmidt - Jul 26, 2002, 9:16:29 AM

Gordon V Cormack wrote:

>I know this isn't a fair comparison, but I can't resist Java bashing.

When Mark posted his program I was curious whether I could improve on it
(it ran 42 minutes on my system). I couldn't find anything to improve in
that BigInteger(String) constructor, though.

When you posted your version, I assumed you had a much better
algorithm than whatever BigInteger does, so I ported it to Java
(the syntax isn't that different) and, after seeing that decbin wasn't
that fast either, ran both overnight. Your program took 3h 51m (gcc 2.91
/ cygwin), the Java port of it 3h 46m (JRE 1.4.0_01). I wonder what
your results were?! Not so much the five minutes by which the Java decbin
was faster, but the factor of ~5.5 between your program and Mark's?

>Here are the line/token/character counts for the two implementations:
>
> 20 48 369 decbin.c
> 20 106 651 marknelson.java

Ah, low token counts, the true sign of a good program... Actually, my
program is a bit smaller than Mark's, but can't quite reach yours.
Maybe the comparison would be fairer if Java's runtime library named its
classes st, br, fop etc. ("C style") or if you used self-explanatory
variable names - on the other hand, that would have increased the
character count, so it's probably out of the question! ;-)

Anyway, here's my port:
=== convert2.java
import java.io.*;

public class convert2 {
    public static void main(String[] args) throws Exception {
        int[] d = new int[220000];
        int n = 0;
        StreamTokenizer st = new StreamTokenizer(
            new BufferedReader(new InputStreamReader(System.in)));
        int token;
        while ((token = st.nextToken()) != StreamTokenizer.TT_EOF)
            if (token == StreamTokenizer.TT_NUMBER)
                d[n++] = (int)st.nval;
        FileOutputStream out = new FileOutputStream(FileDescriptor.out);
        int first = 0;
        while (first < n) {
            int r = 0;
            for (int j=first;j<n;j++) {
                d[j] += r*100000;
                r = d[j]%256;
                d[j] /= 256;
                if (j==first && d[j] == 0) first++;
            }
            out.write(r);
        }
    }
}
===

Regards,
Marco

Sachin Garg - Jul 26, 2002, 4:25:35 AM

"Mark Nelson" <ma...@ieee.org> wrote in message
news:Ra209.667343$cQ3.105713@sccrnsc01...

> I'm still guessing that there's a compressor out there somewhere that can
> squeeze a byte or two out of it.
>

Hi,

Yep, I have a slightly modified adaptive arithmetic coder that can. But I am
having trouble implementing a part of it (will take a few days). But I can
predict it will work.

It can compress the $5000 challenge file too, but cannot compress it enough
to make up for the bytes used by the decompressor. :-(

And gotta mention, it compresses only a byte or two. ;-)
(actually a few tens)
(actually a few 10s)

Sachin Garg. [India]
http://sachingarg.cjb.net

http://go.to/sachingarg


Sachin Garg - Jul 26, 2002, 1:59:17 PM

Hi,

Mark, your news client, whichever you are using, doesn't put the ">" before
the lines when you reply, which makes it kind of hard to read your messages.

It would be good if you fixed it.

Mark Nelson - Jul 26, 2002, 9:44:20 PM

Interesting experiment, Marco! It sounds like the implementation of
BigInteger is probably pretty good.

I'm astonished that the gcc version and the Java port ran at nearly the same
speed! I would have thought that C/C++ had a huge advantage for something so
bound up in computation. I'd like to know what JVM you are running. Perhaps
this program would make a good benchmark...

--
|
| Mark Nelson - ma...@ieee.org - http://MarkNelson.us
| DataCompression.info - http://www.datacompression.info
|

"Marco Schmidt" <marcos...@geocities.com> wrote in message
news:fri2kus3jgdm0i0q8...@4ax.com...

Mark Nelson - Jul 26, 2002, 9:47:51 PM

>Mark, your news client, whichever you are using, doesn't put the ">" before

I'm not sure what you're saying??? I usually manually quote the bare minimum
context, just like in this message, and leave the original unchanged below
the sig. As in this message. Which part don't you like, the quoted line
above or the intact message below?

After my bashing from Chris, I will be doing whatever it takes to please, so
just say the word!

--
|
| Mark Nelson - ma...@ieee.org - http://MarkNelson.us
| DataCompression.info - http://www.datacompression.info
|

"Sachin Garg" <sch...@yahoo.com> wrote in message
news:ahs2pf$i72$1...@newsreader.mailgate.org...

Sachin Garg - Jul 28, 2002, 6:38:05 AM

"Mark Nelson" <ma...@ieee.org> wrote in message
news:8Zm09.657740$352.137867@sccrnsc02...

> I'm not sure what you're saying??? I usually manually quote the bare minimum
> context, just like in this message, and leave the original unchanged below
> the sig. As in this message. Which part don't you like, the quoted line
> above or the intact message below?

The part below, actually. But I hadn't noticed that you were quoting the part
above. It's fine this way, but it would be better if you had the ">" added
before every line of the previous message.

Don't take it too seriously.

Marco Schmidt - Jul 28, 2002, 7:08:44 PM

Mark Nelson wrote:

>Interesting experiment, Marco! It sounds like the implementation of
>BigInteger is probably pretty good.

I don't know enough about that subject, so I can't tell. Maybe there
are specialized algorithms for this task (long ASCII number => binary
representation) that are much better.

>I'm astonished that the gcc version and java port ran at nearly the same
>speed! I would have thought that C/C++ had a huge advantage for something so
>bound up in computation.

Actually, numerical computing is one of the things that Java is pretty
good at: <http://math.nist.gov/javanumerics/>,
<http://www.javagrande.org/>. Things like array access are still
relatively expensive with Java, but Sun's Hotspot JIT compiler creates
good native code for everything that gets executed often (like the
nested while and for loops in decbin). The latest gcc with all speed
optimizations on may do better, as might the Java native compiler JET
(with array checking off and all other speed optimizations on) on the
Java version of decbin, but that's not what I was really interested in,
just the relative performance between BigInteger's algorithm and decbin.

>I'd like to know what JVM you are running? Perhaps this program
>would make a good benchmark...

The Windows version of the JVM that comes with Sun's latest stable
JDK:
$ java -version
java version "1.4.0_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.0_01-b03)
Java HotSpot(TM) Client VM (build 1.4.0_01-b03, mixed mode)

Regards,
Marco

Phil Carmody - Jul 30, 2002, 2:13:40 AM

Marco Schmidt wrote:
>
> Mark Nelson wrote:
>
> >Interesting experiment, Marco! It sounds like the implementation of
> >BigInteger is probably pretty good.
>
> I don't know enough about that subject, so I can't tell. Maybe there
> are specialized algorithms for this task (long ASCII number => binary
> representation) that are much better.
>
> >I'm astonished that the gcc version and java port ran at nearly the same
> >speed! I would have thought that C/C++ had a huge advantage for something so
> >bound up in computation.
>
> Actually, numerical computing is one of the things that Java is pretty
> good at: <http://math.nist.gov/javanumerics/>,
> <http://www.javagrande.org/>.

Unless Java has changed how it handles numeric computing in the last 5
years, I simply can't agree with you on that one. See:
"How Java's Floating-Point Hurts Everyone Everywhere" by Professor Kahan.

Phil

The 3 Wise Men US - Aug 4, 2002, 8:24:38 PM

"Mark Nelson" <ma...@ieee.org> wrote in message news:<UwJ%8.106814$_51.8...@rwcrnsc52.ops.asp.att.net>...

> Gordon Cormack already posted a pointer to this, but out of personal
> interest I duplicated his work. I wrote a short Java program (got to love
> those Java libraries!) that converts the RAND Corp million digit text file
> to a 400K+ binary file, using the quite spiffy BigInteger class. I've posted
> the binary file temporarily at:
>
> http://datacompression.info/Miscellaneous/AMillionRandomDigits.bin
>
> The source, (appended to this message as well) is at:
>
> http://datacompression.info/Miscellaneous/Convert.java
>
> The bin file stymied every compressor I threw at it in a limited test. I'd
> love to find a program that can at least squeeze a few bytes out of it.

Why? If you find a general purpose compressor that can compress this
file by even just one byte you would have shown that either your claim
that the data is random was false or that it is possible to compress
random data and Jules Gilbert was right all along.

The 3 Wise Men US - Aug 4, 2002, 8:33:25 PM

"Sachin Garg" <sch...@yahoo.com> wrote in message news:<ahs2kn$i3g$1...@newsreader.mailgate.org>...

> "Mark Nelson" <ma...@ieee.org> wrote in message
> news:Ra209.667343$cQ3.105713@sccrnsc01...
> > I'm still guessing that there's a compressor out there somewhere that can
> > squeeze a byte or two out of it.
> >
>
> Hi,
>
> Yep, I have a slightly modified adaptive arithmetic coder that can.But I am
> having trouble implementing a part of it. (Will take a few days). But I can
> predict it will work.
>
> It can compress the $5000 challenge file too, but can not compress enough to
> cover up the bytes used by the decompressor. :-(
>
> And gotta mention, it compresses only a byte or two. ;-)
> (actually a few 10s)

Either:
1. Your modifications are specific to the data files you are compressing.
   If this is the case your claim is not very impressive, as anyone can
   write a specialist compressor that can compress any file in a set of
   up to 256 files to a single byte.
2. Your modified compressor is still a general purpose compressor,
   implying:
   2a. The data files are not random after all.
   2b. You have found a general method of compressing random data.
       This makes your claims equivalent to those of Jules Gilbert.
       After all, the result of compressing a random file is still
       random, so you can just put the compressed file back into the
       compressor and you have your own perpetual compression scheme.
       If you can't do this then you have to fall back to either
       1 or 2a above.
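
The "feed the output back in" step is easy to watch fail with an
off-the-shelf coder. A minimal sketch using Java's built-in DEFLATE (the
buffer size and random seed here are invented for illustration); the second
and later rounds refuse to shrink:

=== Recompress.java
import java.util.Random;
import java.util.zip.Deflater;

public class Recompress {
    // Deflates a buffer once and returns the result.
    static byte[] deflate(byte[] in) {
        Deflater def = new Deflater(Deflater.BEST_COMPRESSION);
        def.setInput(in);
        def.finish();
        byte[] buf = new byte[in.length * 2 + 64]; // ample room for expansion
        int n = 0;
        while (!def.finished())
            n += def.deflate(buf, n, buf.length - n);
        def.end();
        byte[] out = new byte[n];
        System.arraycopy(buf, 0, out, 0, n);
        return out;
    }

    public static void main(String[] args) {
        byte[] data = new byte[400000];    // roughly the size of the .bin file
        new Random(42).nextBytes(data);    // pseudo-random stand-in data
        for (int round = 1; round <= 4; round++) {
            data = deflate(data);
            System.out.println("round " + round + ": " + data.length + " bytes");
        }
    }
}
===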

Mark Nelson - Aug 5, 2002, 12:00:32 AM

"The 3 Wise Men US" <the3wi...@yahoo.com> wrote:

> > The bin file stymied every compressor I threw at it in a limited test. I'd
> > love to find a program that can at least squeeze a few bytes out of it.
>
> Why? If you find a general purpose compressor that can compress this
> file by even just one byte you would have shown that either your claim
> that the data is random was false or that it is possible to compress
> random data and Jules Gilbert was right all along.

Oh, not at all.

If an off-the-shelf compressor managed to knock 10 bytes off the 400K bin
file, it certainly wouldn't disprove anything. It would just be an
interesting note.

For Jules to be right requires that the size of the decompressor plus the size
of any data be less than the size of the input file. That's a bit different.
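
The counting behind that requirement: there are 2^n distinct n-bit inputs
but only 2^n - 1 bit strings that are strictly shorter, so no lossless
scheme can shrink every input. A toy illustration (the size is arbitrary):

=== Counting.java
public class Counting {
    public static void main(String[] args) {
        int n = 20;                   // pretend every input file is 20 bits
        long inputs = 1L << n;        // 2^n distinct inputs
        long shorter = (1L << n) - 1; // 2^0 + 2^1 + ... + 2^(n-1)
        System.out.println(inputs + " inputs, but only " + shorter
                + " shorter outputs: some input cannot shrink");
    }
}
===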

| The Ultimate Compression Resource - http://www.datacompression.info
| Sponsored by Xceedsoft - Serious About Components
|
"The 3 Wise Men US" <the3wi...@yahoo.com> wrote in message
news:dd9a1641.02080...@posting.google.com...

The 3 Wise Men US - Aug 5, 2002, 5:58:57 AM

"Mark Nelson" <ma...@ieee.org> wrote in message news:<ALm39.282973$uw.1...@rwcrnsc51.ops.asp.att.net>...

> "The 3 Wise Men US" <the3wi...@yahoo.com> wrote:
>
> > > The bin file stymied every compressor I threw at it in a limited test. I'd
> > > love to find a program that can at least squeeze a few bytes out of it.
> >
> > Why? If you find a general purpose compressor that can compress this
> > file by even just one byte you would have shown that either your claim
> > that the data is random was false or that it is possible to compress
> > random data and Jules Gilbert was right all along.
>
> Oh, not at all.
>
> If an off-the-shelf compressor managed to knock 10 bytes off the 400K bin
> file, it certainly wouldn't disprove anything. It would just be an
> interesting note.
>
> For Jules to be right requires that the size of the decompressor plus the size
> of any data be less than the size of the input file. That's a bit different.

Off-the-shelf compressors compress by removing redundancy from data.
Random data doesn't have any redundancy. If you don't include the size
of the decompressor, the only way to "compress" a random binary file
is to hide information specific to the file in the decompressor.

The only way an off-the-shelf compressor could compress the file
is if it included some sort of random data compression "Easter Egg"
that just happened to work for this file, e.g. it searches for a
particular 4 byte sequence in the data and if this occurs in the
first 65536 bytes, it stores the offset to this sequence in the
second and third bytes of the compressed file. The first byte is
a header that tells the decompressor whether the compressed file
is a normal compressed file or an Easter Egg file. If the 4 byte
sequence occurred early enough in your data file (1 in 65536 chance)
then this compressor could "compress" it by one byte.
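
A sketch of that hypothetical Easter-Egg compressor (the magic sequence,
file handling, and class name are all invented; the point is the byte
accounting: a 3-byte header plus the data minus the 4-byte sequence saves
exactly one byte):

=== EasterEgg.java
import java.io.*;

public class EasterEgg {
    // Invented 4-byte "magic" sequence the compressor hopes to find.
    static final byte[] MAGIC = { 0x12, 0x34, 0x56, 0x78 };

    public static void main(String[] args) throws IOException {
        byte[] data = readAll(args[0]);
        int off = find(data);
        OutputStream out = new BufferedOutputStream(new FileOutputStream(args[1]));
        if (off >= 0) {
            out.write(1);                 // header byte: Easter Egg file
            out.write((off >> 8) & 0xff); // 16-bit offset of the magic...
            out.write(off & 0xff);        // ...in bytes two and three
            out.write(data, 0, off);
            out.write(data, off + 4, data.length - off - 4); // magic omitted
        } else {
            out.write(0);                 // header byte: stored verbatim
            out.write(data);              // (this path grows the file by one)
        }
        out.close();
    }

    // Looks for MAGIC within the first 65536 bytes; -1 if absent.
    static int find(byte[] d) {
        int limit = Math.min(d.length - 3, 65536);
        for (int i = 0; i < limit; i++)
            if (d[i] == MAGIC[0] && d[i + 1] == MAGIC[1]
                    && d[i + 2] == MAGIC[2] && d[i + 3] == MAGIC[3])
                return i;
        return -1;
    }

    static byte[] readAll(String name) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        InputStream in = new FileInputStream(name);
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) > 0)
            buf.write(chunk, 0, n);
        in.close();
        return buf.toByteArray();
    }
}
===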

How else could an off-the-shelf compressor compress this data if
it is truly random?

SCOTT19U.ZIP_GUY - Aug 5, 2002, 9:29:09 AM

the3wi...@yahoo.com (The 3 Wise Men US) wrote in
<dd9a1641.02080...@posting.google.com>:

>Off-the-shelf compressors compress by removing redundancy from data.
>Random data doesn't have any redundancy. If you don't include the size
>of the decompressor, the only way to "compress" a random binary file
>is to hide information specific to the file in the decompressor.
>

Actually, since a random file can be anything, I think it's highly
unlikely you get zero redundancy in the file. So assuming the
file has some redundancy, the question is: is there enough to
actually exploit?
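
One way to put a number on "enough to exploit" is to compute the ideal
order-0 compressed size from the file's byte frequencies. A minimal
sketch (pass the .bin file name on the command line):

=== Order0.java
import java.io.*;

public class Order0 {
    public static void main(String[] args) throws IOException {
        InputStream in = new BufferedInputStream(new FileInputStream(args[0]));
        long[] count = new long[256];
        long total = 0;
        int b;
        while ((b = in.read()) != -1) {
            count[b]++;
            total++;
        }
        in.close();
        // Shannon bound: sum over v of count[v] * log2(total / count[v]) bits.
        double bits = 0;
        for (int v = 0; v < 256; v++)
            if (count[v] > 0)
                bits += count[v] * (Math.log((double) total / count[v]) / Math.log(2));
        System.out.println(total + " bytes in; ideal order-0 size is about "
                + (long) Math.ceil(bits / 8) + " bytes");
    }
}
===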

>The only way an off-the-shelf compressor could compress the file
>is if it included some sort of random data compression "Easter Egg"
>that just happened to work for this file, e.g. it searches for a
>particular 4 byte sequence in the data and if this occurs in the
>first 65536 bytes, it stores the offset to this sequence in the
>second and third bytes of the compressed file. The first byte is
>a header that tells the decompressor whether the compressed file
>is a normal compressed file or an Easter Egg file. If the 4 byte
>sequence occurred early enough in your data file (1 in 65536 chance)
>then this compressor could "compress" it by one byte.
>

If the compressor is a general compressor which uses an assumed
static frequency distribution for a given symbol set, it has a
fair chance of compressing the file by a byte or two. If it's an
adaptive compressor, then the table information has to be stored
in the file, which tends to make the output file longer, so the
chances of compression go down.


The only real question is for long files is there one that has
"ZERO REDUNDANCY" and if it does so what. Since there are really
unlimited number of general compressors. That could use very
strange character sets. Eample the set A = 0 B = 10 C = 11 could
be used. In which case the file would have to have the right
distributions not to be able to be compressed with it.
Take a long file that is a multiply of 256 and thats use the
normal each symbol is one of the possible byte values. Suppose
there is ZERO redunacry in this file at every order tested using
the 256 symbol set. Here is a general compressor that compresses
it. Leave the file unchanged except don't write the last byte.
On uncompression it counts the occurranace of each symbol. At
end of file if the occurances are off by just one in one of the
symbols you write out that byte. So it would compress every file
of that length that had the required zero rendunacy for that
symbol set. AS to the rest of files who cares they can all expand
it doesn't have to be bijective.
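
The decompressor side of that thought experiment, as a minimal sketch (it
assumes the original length was a multiple of 256 with a perfectly flat
byte distribution, so exactly one value comes up one occurrence short):

=== RestoreLastByte.java
import java.io.*;

public class RestoreLastByte {
    public static void main(String[] args) throws IOException {
        InputStream in = new BufferedInputStream(new FileInputStream(args[0]));
        long[] count = new long[256];
        long total = 0;
        int b;
        while ((b = in.read()) != -1) {
            count[b]++;
            total++;
        }
        in.close();
        long per = (total + 1) / 256; // occurrences of each value originally
        for (int v = 0; v < 256; v++)
            if (count[v] == per - 1)  // the one value that is short by one
                System.out.println("append byte value " + v);
    }
}
===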
