This is a complete rewrite of PAQ6. It differs primarily in that it
replaces the gradient descent model mixer with a neural network, which
can be accelerated using MMX assembler (thus the better speed). For
non-x86-32 machines, or if you don't have NASM, you can compile with
-DNOASM (1/3 slower). I tested it under Windows, Linux and Sparc
Solaris for archive compatibility.
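
For readers who have not seen this kind of mixer, here is a minimal sketch of a single-layer network that combines several bit predictions. It is an illustration under assumptions, not the PAQ7 source: the class name Mixer, the learning rate of 0.02, the clamping limits, and the use of floating point are all hypothetical choices made for readability. The inputs are the models' probabilities that the next bit is 1, mapped through a logit ("stretch"), weighted, summed, and mapped back through a logistic function ("squash"). After the bit is known, each weight moves in proportion to the prediction error times its input.

```python
import math

def stretch(p):
    """Logit of a probability: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def squash(x):
    """Logistic function, the inverse of stretch."""
    x = max(-30.0, min(30.0, x))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))

class Mixer:
    """Hypothetical single-layer mixer for n bit predictions."""

    def __init__(self, n, learning_rate=0.02):
        self.weights = [0.0] * n
        self.learning_rate = learning_rate  # assumed value, not from PAQ7
        self.inputs = [0.0] * n
        self.p = 0.5

    def predict(self, probs):
        """Combine model probabilities that the next bit is 1."""
        # Keep inputs away from 0 and 1 so the logit stays finite.
        self.inputs = [stretch(min(max(p, 1e-6), 1.0 - 1e-6)) for p in probs]
        dot = sum(w * x for w, x in zip(self.weights, self.inputs))
        self.p = squash(dot)
        return self.p

    def update(self, bit):
        """Adjust weights once the actual bit (0 or 1) is known."""
        err = bit - self.p
        for i, x in enumerate(self.inputs):
            self.weights[i] += self.learning_rate * err * x

if __name__ == "__main__":
    mixer = Mixer(2)
    for _ in range(200):
        # Model 0 is usually right, model 1 is usually wrong; the bit is 1.
        mixer.predict([0.9, 0.1])
        mixer.update(1)
    print(mixer.weights, mixer.predict([0.9, 0.1]))
```

In this toy run the weight of the reliable model grows and the weight of the unreliable one goes negative, so the mixed prediction ends up more confident than either input. The inner loops (the dot product and the weight update) are the sort of work that vector instructions such as MMX can speed up.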
I will let Werner test on the maximumcompression.com corpus, but in my
own tests it takes first place on ohs.doc (due to a large embedded
jpeg, which Stuffit missed) and english.dic, and second place on a
couple of other files.
I don't know how Stuffit models jpeg (I haven't seen their patent) but
what I did was partially decode the image back to the DCT coefficients
to provide context for the Huffman coded data.
I plan to add more models to PAQ8 but I wanted to get something
released this year.
-- Matt Mahoney
Some results on the Calgary corpus. There are 5 memory level settings.
They all run at about the same speed. Tested on a 2.2 GHz Athlon-64
3500+ with 1 GB RAM (in 32-bit mode under WinXP). This is a solid
archive.
paq7 -1 (lowest memory setting, about 56 MB)
111261 BIB: -> 21592
768771 BOOK1: -> 199430
610856 BOOK2: -> 123431
102400 GEO: -> 44832
377109 NEWS: -> 89025
21504 OBJ1: -> 7799
246814 OBJ2: -> 49829
53161 PAPER1: -> 11331
82199 PAPER2: -> 17578
513216 PIC: -> 23303
39611 PROGC: -> 8788
71646 PROGL: -> 10236
49379 PROGP: -> 7294
93695 TRANS: -> 11263
3141622 -> 625924 (1.5939 bpc) in 172.70 sec (18.191 KB/sec)
Time 172.70 sec, memory 56440419 bytes
paq7 -5 (highest memory setting, about 500 MB)
111261 BIB: -> 21493
768771 BOOK1: -> 194933
610856 BOOK2: -> 120373
102400 GEO: -> 44561
377109 NEWS: -> 86395
21504 OBJ1: -> 7755
246814 OBJ2: -> 48216
53161 PAPER1: -> 10809
82199 PAPER2: -> 16812
513216 PIC: -> 23201
39611 PROGC: -> 8628
71646 PROGL: -> 10189
49379 PROGP: -> 7299
93695 TRANS: -> 10827
3141622 -> 611684 (1.5576 bpc) in 177.66 sec (17.684 KB/sec)
Time 177.67 sec, memory 525842019 bytes
paq7 -3 (default setting, 150 MB) on a 750 MHz Duron (192 MB memory)
under WinMe:
111261 BIB: -> 21500
768771 BOOK1: -> 195555
610856 BOOK2: -> 120863
102400 GEO: -> 44643
377109 NEWS: -> 86932
21504 OBJ1: -> 7744
246814 OBJ2: -> 48670
53161 PAPER1: -> 10905
82199 PAPER2: -> 16970
513216 PIC: -> 23229
39611 PROGC: -> 8652
71646 PROGL: -> 10172
49379 PROGP: -> 7272
93695 TRANS: -> 10909
3141622 -> 614209 (1.5641 bpc) in 710.08 sec (4.424 KB/sec)
Time 710.13 sec, memory 150320739 bytes
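
The summary lines above can be checked from the byte counts. The sketch below recomputes bits per character (bpc) as 8 times the compressed size divided by the original size, and throughput as original bytes per second. It assumes that KB means 1000 bytes, which agrees with the printed figures to within rounding of the displayed times. The function and variable names are illustrative only.

```python
ORIGINAL_BYTES = 3141622  # total size of the 14 Calgary corpus files

# (label, compressed bytes, seconds), copied from the summary lines above
RUNS = [
    ("paq7 -1", 625924, 172.70),
    ("paq7 -5", 611684, 177.66),
    ("paq7 -3 (Duron)", 614209, 710.08),
]

def bits_per_character(original, compressed):
    """Compressed bits spent per input byte."""
    return 8.0 * compressed / original

def kb_per_second(original, seconds):
    """Throughput in KB/sec, assuming 1 KB = 1000 bytes."""
    return original / seconds / 1000.0

if __name__ == "__main__":
    for label, compressed, seconds in RUNS:
        bpc = bits_per_character(ORIGINAL_BYTES, compressed)
        speed = kb_per_second(ORIGINAL_BYTES, seconds)
        print(f"{label}: {bpc:.4f} bpc, {speed:.3f} KB/sec")
```

Note that each archive total is slightly larger than the sum of the per-file compressed sizes listed for that run.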
-- Matt Mahoney
The maximum compression site has just been updated. Not only was PAQ7
added, but the previous #1 and #2 listed programs (WinRK and PAsQDa)
were also updated. Don't forget to have a look at the DOC test :)
In the 'best overall compression program' ranking, WinRK is 1st, PAQ7 2nd
and PAsQDa 3rd. On the 'real life' multiple files test PAQ7 doesn't show
its full potential yet, but this will change when the exe, wav and txt
models are in place...