Re: compressed enwik8 of 15,932,968 bytes into a file of 15,898,981 bytes


Matt Mahoney

Dec 28, 2010, 12:09:04 PM
to BOCUT Adrian SEbastian, hutter...@googlegroups.com, mmah...@cs.fit.edu, marcus...@anu.edu.au, jabo...@gmail.com, cili...@gmail.com, macarie...@ulbsibiu.ro, nicole...@yahoo.com
Do let us know when your decompresser is working. Remember that the total size of the compressed file and the decompression executable file must be 3% smaller than the current record in order to be eligible for an award.
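For a rough sense of scale, the eligibility arithmetic can be sketched as below. This is only an illustration: the function names and the `margin` parameter are made up here, "3% smaller" is read as "at most 97% of the record", and the only numbers used are the two sizes from the subject line. Treating the 15,932,968-byte input as if it were the record, and the decompressor as zero bytes, is an assumption made purely for the example.

```python
def relative_saving(old_bytes, new_bytes):
    """Fraction of the old size that was saved (0.03 means 3%)."""
    return (old_bytes - new_bytes) / old_bytes


def is_eligible(archive_bytes, decompressor_bytes, record_bytes, margin=0.03):
    """True if archive + decompressor is at least `margin` below the record.

    Assumption: "3% smaller" is interpreted as total <= record * (1 - margin).
    """
    total = archive_bytes + decompressor_bytes
    return total <= record_bytes * (1 - margin)


# Sizes taken from the subject line of this thread.
saving = relative_saving(15_932_968, 15_898_981)
print(f"saving: {saving:.4%}")

# Hypothetical check: input size used as a stand-in for the record,
# decompressor size ignored (it would only make the total larger).
print(is_eligible(15_898_981, 0, 15_932_968))
```

Under those assumptions the saving is roughly 0.2%, well short of a 3% margin, and the decompressor size still has to be added on top.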

Also, it would be helpful to post your source code as a text file instead of pasting it into your blog, which messes up the formatting.

Also, it might be helpful to review http://mattmahoney.net/dc/dce.html#Section_11 because to me it looks like you are using this approach. Of course you are free to use any approach you wish.

 
-- Matt Mahoney, matma...@yahoo.com


From: BOCUT Adrian SEbastian <bocutadria...@yahoo.com>
To: hutter...@googlegroups.com; matma...@yahoo.com; mmah...@cs.fit.edu; marcus...@anu.edu.au; jabo...@gmail.com; cili...@gmail.com; macarie...@ulbsibiu.ro
Cc: nicole...@yahoo.com
Sent: Mon, December 27, 2010 5:52:52 PM
Subject: compressed enwik8 of 15,932,968 bytes into a file of 15,898,981 bytes

Hello,
Here you can find C code for compressing the already-compressed enwik8 file into a smaller one:
http://bocut-appl.110mb.com/index.php?p=1_9_enwik8

http://bocut-new-compress-algos.blogspot.com/2010/12/enwik8.html

The method uses the Calkin-Wilf tree.
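For readers who have not met the Calkin-Wilf tree: its root is 1/1, and every node a/b has the left child a/(a+b) and the right child (a+b)/b, so each positive fraction appears exactly once. The sketch below only shows how a bit string can be read as a path in that tree and recovered again. It is not the code from the links above; the function names and the convention "0 = left, 1 = right" are assumptions chosen for illustration.

```python
from fractions import Fraction


def bits_to_fraction(bits):
    """Follow a path from the root 1/1; '0' goes left, '1' goes right."""
    a, b = 1, 1
    for bit in bits:
        if bit == "0":
            b = a + b  # left child: a / (a + b)
        else:
            a = a + b  # right child: (a + b) / b
    return Fraction(a, b)


def fraction_to_bits(q):
    """Climb from q back to the root and return the path as a bit string."""
    a, b = q.numerator, q.denominator
    path = []
    while (a, b) != (1, 1):
        if a < b:
            path.append("0")  # q was a left child of a / (b - a)
            b -= a
        else:
            path.append("1")  # q was a right child of (a - b) / b
            a -= b
    return "".join(reversed(path))


q = bits_to_fraction("0001")
print(q)                    # 5/4
print(fraction_to_bits(q))  # 0001
```

Note that a long run of equal bits only adds a multiple of one number to the other, which is why strings with long runs map to fractions that are cheap to describe.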
Here is a very brief summary of the method:
If the method can be applied, then the fraction is written to the compressed file together with a marker bit.
Otherwise the bit string is transformed with an XOR rule: each two consecutive bits are XORed, so that a string with many alternations such as 0101010101010... becomes a more uniform string, similar to the delta transform used with RLE.
The method is then applied again to the transformed string: the CW fraction is written and, where needed, the remaining bits of the string, also with some marker bits.
http://bocut-new-compress-algos.blogspot.com/search/label/Calkin%20Wilf%20Compress
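The XOR step described above can be sketched as follows. The description leaves some details open, so this sketch assumes that the first bit is kept unchanged and that every later output bit is the XOR of a bit with its predecessor (overlapping pairs). The function names are hypothetical and this is not the code from the links.

```python
def xor_delta(bits):
    """Keep the first bit, then emit bits[i-1] XOR bits[i] for each later i."""
    if not bits:
        return []
    out = [bits[0]]
    for prev, cur in zip(bits, bits[1:]):
        out.append(prev ^ cur)
    return out


def xor_undelta(delta):
    """Invert xor_delta by accumulating the XORs from left to right."""
    if not delta:
        return []
    out = [delta[0]]
    for d in delta[1:]:
        out.append(out[-1] ^ d)
    return out


alternating = [0, 1, 0, 1, 0, 1]
transformed = xor_delta(alternating)
print(transformed)               # [0, 1, 1, 1, 1, 1]
print(xor_undelta(transformed))  # [0, 1, 0, 1, 0, 1]
```

The transform is reversible and keeps the length the same; it only turns alternations into runs, which a run-oriented coder can then describe more compactly.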

The beta version of this method can be found on the following links and can be applied to exe or binary files:
http://bocut-new-compress-algos.blogspot.com/search/label/Embedded%20Compress
http://bocut-appl.110mb.com/index.php?p=1_8_Embedded-Compress

In my opinion the CW tree is, despite all this, still a promising idea for better compression.

Thank you very much for your attention and patience in reading,

PS: As soon as possible I will also send you the decompressed file for this.
       
All the best,
regards,
Adrian,
  


 
