MONSTER OF COMPRESSION - New Benchmark -


Raymond_NGhM

May 3, 2008, 12:26:00 AM
to encode_ru_f...@googlegroups.com


Hi Nania,

Try adding switch 0 for CCM(x),
then compare (de)compression speed & ratio with the previous
switch (5).

It no longer affects already-encoded files (JPG, PNG, MP3...)

joey

May 4, 2008, 7:21:00 AM


quoting:
MONSTER OF COMPRESSION 2007 Closed
Winner:
CCMX by Christian Martelock (size)
SLUG 1.23 by Christian Martelock (efficiency)
LZRW1 by Ross Williams (Compression Time)
QUICKLZ 1.40 beta 5 by Lasse Mikkel Reinhold (Decompression Time)
MONSTER OF COMPRESSION 2008 Incoming...

1. very good work - thank you nania
2. it is wonderful to have the old LZRW algorithms back in the contest,
but i wonder where the binary versions come from
can you give us a link please?

i have never seen the lzrw on
www.maximumcompression.com

does it compress faster than LZO?

maybe we can create an LZRW version which compresses
a complete directory including subdirectories?

Black_Fox

May 4, 2008, 7:28:00 AM


Those can be found at Matt Mahoney's website

joey

May 4, 2008, 8:01:00 AM


Thank you Black_Fox
---
public domain (open source) memory to memory compressors
by Ross Williams in 1991
The programs were implemented as file compressors
by Matt Mahoney on Feb. 14, 2008
---
i will test it!

Bulat Ziganshin

May 4, 2008, 8:05:00 AM


Quoting: joey
it compress faster than LZO ?

i think there is almost no difference between lzrw1, lzo -1, quick -1 and tornado -1 - they all use the lzrw1 scheme: lzss coding and a direct-hashing matchfinder
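The scheme Bulat mentions can be sketched in a few lines. This is a toy illustration under my own naming, not the code of any tool discussed here: a single hash table maps a hash of the next few bytes directly to the last position that hash was seen at (no chains, no search), and output is LZSS-style literal/match tokens.

```python
def lzrw1_style_compress(data: bytes, hash_bits: int = 12, min_match: int = 3):
    """Toy LZSS compressor with a direct-hashing match finder.

    One table slot per hash value holds the last position where that
    hash occurred.  Returns tokens: ('lit', byte) or ('match', offset, length).
    """
    table = [-1] * (1 << hash_bits)
    mask = (1 << hash_bits) - 1
    out, i = [], 0
    while i < len(data):
        if i + min_match <= len(data):
            h = (data[i] * 1091 + data[i + 1] * 33 + data[i + 2]) & mask
            cand, table[h] = table[h], i
            # verify the candidate directly: hashes can collide
            if cand >= 0 and data[cand:cand + min_match] == data[i:i + min_match]:
                length = min_match
                while i + length < len(data) and data[cand + length] == data[i + length]:
                    length += 1  # extend the match greedily
                out.append(('match', i - cand, length))
                i += length
                continue
        out.append(('lit', data[i]))
        i += 1
    return out


def lzrw1_style_decompress(tokens) -> bytes:
    buf = bytearray()
    for tok in tokens:
        if tok[0] == 'lit':
            buf.append(tok[1])
        else:
            _, offset, length = tok
            for _ in range(length):
                buf.append(buf[-offset])  # byte-by-byte copy handles overlapping matches
    return bytes(buf)
```

The single-probe table is what makes this family so fast: one hash lookup per position, at the cost of missing matches whenever two strings share a slot.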

joey

May 4, 2008, 1:12:00 PM


the Winner in "Compression Time" MOC 2007 Closed

is: LZRW1 by Ross Williams (wonderful work by Matt Mahoney)

and not lzo or quicklz or tornado

that is why i asked

but lzo and quicklz support recursive directory structures
and lzrw1 does not for now (unfortunately)

Bulat Ziganshin

May 4, 2008, 1:49:00 PM


i think the difference is very small and HDD times should be much more important

i personally prefer slug 1.1 because it seems to be the fastest lzh compressor (i.e. it compresses much better than lzss ones). thor, OTOH, has the best i/o system i've seen

i suggest you make your own little tests (if you have any practical goals)

joey

May 4, 2008, 2:20:00 PM


thank you very much - Bulat

my goal is to use it as a very fast compressor for backup purposes
that's why i need support for a directory structure
maybe
one directory with all its subdirectories

Bulat Ziganshin

May 4, 2008, 2:54:00 PM


depending on your computer and the avg size of files, the best compressor may be different. it seems that Nania counted the pure cpu time, which is several times smaller than the i/o time for this type of compressor

you may process many files using the "for" command. do you need a file-to-file compressor? maybe a fast archiver would be preferable?

Christian

May 4, 2008, 3:11:00 PM


I did an interesting test using most of the high-speed compressors - I included 6pack, too. The test file is a big TAR-file (1.168.125.952 bytes). Its contents are:

-OpenOffice2 
-Seamonkey
-Enwik8
-Gimp2
-SFC
-Abiword
-many files from UCLC
-...


The test was repeated 4 times for each compressor. The results include process and wall times. The test is performed on one hard disk with 16M cache. My system has 2G memory and is clocked at 3.4GHz.
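For reference, the "User Time" / "Global Time" pairs in the results are CPU time versus wall-clock time. A minimal way to collect both for a child process might look like this - a sketch, not the harness actually used for these numbers; note that `os.times().children_user` covers waited-for children on Unix and is always 0 on Windows:

```python
import os
import subprocess
import sys
import time

def time_command(argv):
    """Run a command, returning (user_cpu_seconds, wall_seconds).

    Wall time includes time spent blocked on disk I/O; user time is
    pure CPU.  A big wall/user gap therefore points at an I/O bottleneck.
    """
    start_wall = time.perf_counter()
    start_cpu = os.times().children_user  # CPU already charged to children
    subprocess.run(argv, check=True)      # run() waits for the child
    user = os.times().children_user - start_cpu
    wall = time.perf_counter() - start_wall
    return user, wall
```

For example, `time_command([sys.executable, "-c", "sum(range(10**6))"])` returns a small user time and a wall time at least as large.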

As expected Thor e4 is best ratio-wise. A close second place goes to Slug 1.26 which is roughly 2.5 times faster than Thor e4. Thor e3 comes in third just in front of Tor -3. The other results are not very interesting and crowning the fastest compressor is odd because I did not include Quick1, Tor -1 and the likes.

Now, looking at the results, one can make an interesting observation. Assuming that Quick2's and Slug 11b's IO code is ok, the IO/walltime barrier must be at around 10s. Only Quick2, Slug 11b, Slug 126b and Thor e2/e3 come close to this limit. All other compressors seem to be lacking somewhere.
Let's use Filemon to take a closer look. Slug 126b reads/writes 64K blocks and Thor reads/writes 32K blocks. So, all of the IO of both tools is done on sector boundaries. Quick2 reads 32K blocks and writes blocks up to 32K. Slug 11b reads 128K blocks and writes up to 128K. Quick2 and Slug 11b write some status bytes, too.
Now, what do the other tools with their IO? 6pack reads 128K and outputs smaller chunks. 6pack_opt seems to be reading 64K chunks. LZOP reads 256K chunks and writes blocks up to 256K. Tor reads 256K chunks most of the time and writes really big blocks.

It appears that reading blocks bigger than 128K degrades wall time a lot - e.g. LZOP and Tor are both 2x slower while having a damn fast processing time. The write-cache on my system works fine - it seems that writing strategies do not influence the results much. 6pack's IO strategy seems to be alright; oddly, it loses a lot of time somewhere else (I checked its results thrice). For reasons of simplicity I took threading and different file-APIs out of the examination. And finally, here are the results:

------------------------- 
6PACK -1
-------------------------
User Time = 7.531
Global Time = 31.406

User Time = 6.750
Global Time = 43.031

User Time = 7.093
Global Time = 42.391

User Time = 7.312
Global Time = 43.000

-> 574.618.151
-------------------------
6PACK_OPT -1
-------------------------
User Time = 6.453
Global Time = 21.781

User Time = 7.625
Global Time = 36.032

User Time = 7.140
Global Time = 41.688

User Time = 7.296
Global Time = 41.844

-> 577.201.364
-------------------------
LZOP -1
-------------------------
User Time = 9.281
Global Time = 22.000

User Time = 9.296
Global Time = 20.703

User Time = 9.203
Global Time = 20.812

User Time = 9.296
Global Time = 22.344

-> 534.098.618
-------------------------
QUICK2
-------------------------
User Time = 9.812
Global Time = 18.281

User Time = 8.437
Global Time = 11.063

User Time = 8.453
Global Time = 10.406

User Time = 8.312
Global Time = 10.297

-> 484.663.774
-------------------------
SLUG11b
-------------------------
User Time = 6.218
Global Time = 15.406

User Time = 6.140
Global Time = 15.890

User Time = 6.000
Global Time = 11.813

User Time = 5.750
Global Time = 10.641

-> 482.258.314
-------------------------
SLUG126b
-------------------------
User Time = 11.390
Global Time = 13.453

User Time = 11.453
Global Time = 13.031

User Time = 11.718
Global Time = 13.234

User Time = 11.468
Global Time = 13.219

-> 377.148.927
-------------------------
THOR E2
-------------------------
User Time = 10.062
Global Time = 19.500

User Time = 9.593
Global Time = 12.453

User Time = 9.140
Global Time = 10.953

User Time = 9.406
Global Time = 10.938

-> 460.255.840
-------------------------
THOR E3
-------------------------
User Time = 12.250
Global Time = 15.156

User Time = 12.140
Global Time = 14.031

User Time = 12.437
Global Time = 14.000

User Time = 12.265
Global Time = 13.782

-> 412.496.116
-------------------------
THOR E4
-------------------------
User Time = 32.171
Global Time = 34.594

User Time = 32.078
Global Time = 33.421

User Time = 32.125
Global Time = 33.360

User Time = 32.109
Global Time = 33.328

-> 359.995.412
-------------------------
TOR -3
-------------------------
User Time = 13.156
Global Time = 21.797

User Time = 13.562
Global Time = 23.297

User Time = 13.109
Global Time = 21.344

User Time = 13.187
Global Time = 21.172

-> 415.575.354


Bulat Ziganshin

May 4, 2008, 3:39:00 PM


do you not overlap compression and i/o?

on my box, thor shows the most efficient i/o (i.e. global time minus user time). afair, tests on the Sportsman site show the same results

once i tested various block sizes (with a pure copy-to-nul program) and discovered that 256kb is ideal on my box. maybe the results are different on other boxes, or read+compress+write gives a different picture
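Bulat's block-size experiment is easy to reproduce. The helper below is my own sketch, not his program: it copies a file in fixed-size blocks (to the null device by default) and returns the wall time, so different block sizes can be compared directly.

```python
import os
import time

def time_copy(src_path, block_size, dst_path=os.devnull):
    """Copy src to dst in fixed-size blocks; return elapsed wall seconds.

    buffering=0 keeps Python from re-blocking the I/O, so each read()
    and write() request reaches the OS at (up to) the stated size.
    """
    start = time.perf_counter()
    with open(src_path, "rb", buffering=0) as src, \
         open(dst_path, "wb", buffering=0) as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            dst.write(block)
    return time.perf_counter() - start
```

E.g. compare `time_copy(path, 64 * 1024)` against `time_copy(path, 256 * 1024)` on a file large enough to defeat the cache.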

i write in large blocks because otherwise too many disk seeks occur - i.e. the disk heads are going from the read to the write position and back. my HDD has only 2mb cache, which may be the reason for such bad behavior

the fastest overall mode depends, in the first place, on the speed of your cpu. although a c2d 6400 is a rather common denominator these days

you may also try 4x4 and lzturbo, which use overlapped i/o. well, afaik, they are the only fast compressors that support multi-threading, but i would be really interested to know which fast compressors support advanced i/o (m/m files, background i/o threads, async i/o calls)

Bulat Ziganshin

May 4, 2008, 3:56:00 PM


i've compiled tornado 0.5alpha with 128k and 256k input blocks. can you try them to compare?

http://www.haskell.org/bz/tor05a.zip

Bulat Ziganshin

May 4, 2008, 4:00:00 PM


one more important question - are your i/o buffers aligned to some boundary? (the c library tends to alloc blocks with a $xxx20 address)

and - do you use only one output buffer? or are there 2+ output buffers used in round-robin fashion?

i ask because, with some restrictions, input may be performed without memcpy, just by memory-mapping data already read into the address space of the input buffer, and output may be performed directly from the output buffer without copying data into OS buffers
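The no-memcpy input path Bulat describes corresponds to memory-mapping the file: the process then reads the OS page cache in place instead of read()ing data into a private buffer first. A minimal sketch of the idea (my own toy function; assumes a non-empty file, since a zero-length mapping is rejected):

```python
import mmap

def checksum_mapped(path):
    """Sum all bytes of a (non-empty) file through a memory mapping.

    The file's pages are mapped straight into our address space, so the
    loop reads the page cache directly -- no read() copy into a private
    input buffer.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            mv = memoryview(view)  # iterate the mapping in place, no slicing copies
            try:
                return sum(mv) & 0xFFFFFFFF
            finally:
                mv.release()  # must release before the mapping is closed
```

Whether this beats plain buffered reads is exactly the question under discussion here - page-fault overhead can eat the saved copy.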

Christian

May 4, 2008, 4:23:00 PM


Sorry for still using the old forum, but I didn't receive an activation mail yet.

Quoting: Bulat Ziganshin
on my box, thor shows the most efficient i/o (i.e. global time minus user time). afair, tests on Sportsman site shows the same results

Sadly, Metacompressor is down. But I'm confident that the latest Slug has similar efficiency (global-user). Still, on Werner's test I 'fear' results won't change too much - I think Slug is too hard on his Athlon-XP's small 512K L2-cache.

Quoting: Bulat Ziganshin
but i will be really interested to know which fast compressors supports advanced i/o (m/m files, background i/o threads, async i/o calls)


Quoting: Bulat Ziganshin
fastest overall mode depends, in the first place, on the speed of your cpu. although c2d 6400 is rather common denominator these days

I think in my case the fastest compressors are screaming for more IO (C...@3.4GHz). Compression to nul is a lot faster, but useless.

Here is a nice comparison of the different modes (even using compression). For HDDs simple IO is almost as fast as async I/O. Memory mapped does not perform well. Well, if you check it out yourself then please post some results.

joey

May 4, 2008, 4:23:00 PM


@christian

thank you

very interesting analysis of the buffer size

you wrote:

6PACK -1 , 6pack reads 128K -> 574.618.151
6PACK_OPT -1 , 6pack_opt seems to be reading 64K -> 577.201.364

6PACK_OPT is faster but worse in compression?

does this mean a smaller buffer
speeds things up but lowers the compression ratio?

i have also done several tests (with another program) on buffer size

and the surprising result for me was:
the speed/buffersize relation seems to depend on the hardware
this means
on system A buffer size A gave the shortest time
on system B buffer size B gave the shortest time
on system C buffer size C gave the shortest time

maybe the difference between the systems was too big...

i am not a compression expert like you, but

what about making it testable for us?

would it be possible to have a commandline parameter here?

like -buffer=xxxxx
where
xxxxx = [16K,32K,64K,128K,256K,512K,1024K,2048K]

then every user can do his own test runs
and we can compare the results
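If such a switch existed, parsing it would be trivial. The sketch below is purely hypothetical - `-buffer=` is joey's proposed syntax, not an option of any existing tool - and treats 'K' as KiB, accepting only the sizes listed above:

```python
def parse_buffer_arg(arg: str) -> int:
    """Parse a hypothetical '-buffer=64K' switch into a byte count."""
    prefix = "-buffer="
    if not arg.startswith(prefix):
        raise ValueError("expected -buffer=<size>")
    size = arg[len(prefix):].upper()
    if not size.endswith("K") or not size[:-1].isdigit():
        raise ValueError("size must look like 64K")
    kib = int(size[:-1])
    if kib not in (16, 32, 64, 128, 256, 512, 1024, 2048):
        raise ValueError("unsupported buffer size")
    return kib * 1024  # 'K' taken as KiB
```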

and would it be possible (in a further version)
to add directory support? - maybe only a dream?

Christian

May 4, 2008, 4:25:00 PM


Quoting: Bulat Ziganshin
i've compiled tornado 0.5alpha with 128k and 256k input blocks. can you try them to compare?

I'll check it out and post results later.

Bulat Ziganshin

May 4, 2008, 4:53:00 PM


for my box, 256k is still much faster:
>timer tor-128k.exe -2 dll700.dll 

Kernel Time = 6.289 = 00:00:06.289 = 9%
User Time = 32.767 = 00:00:32.767 = 50%
Process Time = 39.056 = 00:00:39.056 = 59%
Global Time = 65.204 = 00:01:05.204 = 100%


>timer tor-256k.exe -2 dll700.dll

Kernel Time = 5.848 = 00:00:05.848 = 9%
User Time = 32.526 = 00:00:32.526 = 52%
Process Time = 38.375 = 00:00:38.375 = 62%
Global Time = 61.559 = 00:01:01.559 = 100%



i think that this depends on HDD firmware

Quoting: Christian
I didn't receive a activation mail yet.

nor me

Quoting: Christian
Memory mapped does not perform well.

for me too, i've tested it also (actually, i once wrote a rather universal i/o library)

Quoting: Christian
For HDDs simple IO is almost as fast as async I/O

well, i've compared it to tornado; maybe it's really just not optimized. i will experiment with slug if you claim that it uses simple i/o

Christian

May 4, 2008, 5:03:00 PM


Here are the results:

------------------ 
tor-128k -3
------------------
User Time = 12.343
Global Time = 21.938

User Time = 12.125
Global Time = 20.828

User Time = 12.750
Global Time = 24.250

User Time = 12.187
Global Time = 22.437

-> 425.802.874
------------------
tor-256k -3
------------------
User Time = 12.109
Global Time = 22.094

User Time = 12.140
Global Time = 21.844

User Time = 12.265
Global Time = 23.750

User Time = 12.218
Global Time = 22.031

-> 425.802.874



Hmmm, maybe Filemon doesn't work correctly, but it seems that tor-128k is reading 128k blocks for the first 8M only. After that, it always reads 4k followed by 124k. It still outputs 8M blocks. It's the same with tor-256k - 256k on the first 8M, then 4k + 252k. Maybe this helps. Additionally, maybe the 8M are hurting, too.

Black_Fox

May 4, 2008, 5:12:00 PM


Quoting: Christian
I didn't receive a activation mail yet
I got it instantly, it just landed in the spam folder (gmail).

Christian

May 4, 2008, 5:14:00 PM


Quoting: joey
you wrote:

6PACK -1 , 6pack reads 128K -> 574.618.151
6PACK_OPT -1 , 6pack_opt seems to be reading 64K -> 577.201.364

6PACK_OPT is faster but worse in compression?


Its compression is slightly worse and its speed is really close. But compression speed (process time) might be better on systems with a smaller cache. Maybe LovePimple can shed some light on the improvements.

Quoting: joey
what about to make it testable for us ?

would it be possible to have here a commandline parameter ?

If you look at my tools you'll notice that I don't like too many commandline parameters. Maybe I'll add a switch, but please don't count on it. Additionally, I chose 64k because many well-known tools use this size, too.

Quoting: joey
and would it be possible (in a further version)
to add a directory-support? - may be only a dream?

No plans for that, sorry. I'd really like to, but I don't have enough time. And the little spare time I have I spend with my girlfriend/friends/hobbies and sometimes developing compression algorithms.

Christian

May 4, 2008, 5:29:00 PM


Quoting: Bulat Ziganshin
i will experiment with slug if you claim that it uses simple i/o

I don't even know if Slug's IO works well on other systems (except mine and my girlfriend's laptop). But yes, it uses simple IO (e.g. fread and fwrite). The idea is to let the OS do all the async stuff. If we read/write only small blocks the OS will do the async work with its read-ahead and write-behind systems. We just have to make it easy for the OS to guess - so, Slug always reads/writes exactly 64K. FYI, Thor always reads/writes exactly 32K. Maybe this is all bogus, but explorer, copy and some other tools work like this, too. Just use Filemon and look around a little bit.
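The strategy Christian describes - blocking reads and writes of one fixed size, letting the OS's read-ahead and write-behind caching provide the overlap - can be sketched like this (an illustration of the idea, not Slug's source; the 64K constant matches the size named above):

```python
BLOCK = 64 * 1024  # fixed request size, Slug-style

def filter_file(src_path, dst_path, transform=lambda b: b):
    """Read the input in 64K requests and write each processed block
    out immediately.

    No threads, no async calls: every request hits the OS at the same
    predictable size, which makes its read-ahead/write-behind heuristics
    easy to trigger.  `transform` stands in for the compression step.
    """
    with open(src_path, "rb", buffering=0) as src, \
         open(dst_path, "wb", buffering=0) as dst:
        while True:
            block = src.read(BLOCK)
            if not block:
                break
            dst.write(transform(block))
```

With the identity transform this is just a block copy; plugging in a real compression function turns it into the single-threaded pipeline under discussion.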

Bulat Ziganshin

May 4, 2008, 5:59:00 PM


Quoting: Christian
Hmmm, maybe Filemon doesn't work correctly, but it seems that tor-128k is reading 128k blocks the first 8M only. After that, it always reads 4k followed by 124k. It still outputs 8M blocks. It's the same with tor-256k - 256k on the first 8M, than 4k + 252k. Maybe this helps. Additionally, maybe the 8M are hurting, too.


thanks, i'll check it

Quoting: Christian
Maybe this is all bogus, but explorer, copy and some other tools work like this, too. Just use Filemon and look around a little bit.

well, i mean that i thought thor at least used some b/g thread or async calls, even with the same 32k blocks - i think it's hard to check this with Filemon?

Christian

May 4, 2008, 6:06:00 PM


Quoting: Bulat Ziganshin
well, i mean that i thought that thor at least used some b/g thread or async calls, even with the same 32k blocks - i think it's hard to check this with Filemon?


Yes, you cannot check this with Filemon. But Process Explorer shows only one thread.

I'm off to bed. Goodnight everyone!

Bulat Ziganshin

May 4, 2008, 6:23:00 PM


Quoting: Christian
But Process Explorer shows only one thread.

there is also async i/o :)

Christian

May 5, 2008, 3:28:00 AM


Quoting: Bulat Ziganshin
there is also async i/o

Where do you find this in PE?

Btw., my own tests showed that overlapped IO is not worth it when reading/writing small blocks (e.g. 32k, 64k). The OS does the same thing anyway - at least when you're not using things like "FILE_FLAG_NO_BUFFERING". Well, you might lose a little bit of memory bandwidth from the in-memory copying, but it does not make any difference here.

Bulat Ziganshin

May 5, 2008, 3:42:00 AM


Quoting: Christian
Where do you find this in PE?

i mean that such apis exist and they may be used even in a 1-threaded program. but if this can be checked by analyzing the executable, it's great

Quoting: Christian
Well, you might loose a little bit of memory bandwidth from the in-memory-copying, but it does not make any difference here.

this depends. memcpy is 230mb/sec and tor -1 is >50mb/sec on my box

Christian

May 5, 2008, 3:55:00 AM


Quoting: Bulat Ziganshin
i mean that such apis exist and they may be used even in 1-threaded program. but if this is can be checked by analyzing executable, it's great

I don't know a way to check if a program uses overlapped IO. But I can tell that Thor does not use threads and reads/writes 32k all of the time. Further, I checked overlapped IO and it didn't make a difference while being more 'complex'.

Quoting: Bulat Ziganshin
this depends. memcpy is 230mb/sec and tor -1 is >50mb/sec on my box

Right, but L2-cache is coming into play here. And cached copying is much much faster.

Nonetheless, imo, the 'best' solution is to use threading. It's platform independent and you don't have to rely on the OS. But since simple IO works surprisingly well, I prefer it - well, because it's simple.

Christian

May 5, 2008, 4:04:00 AM


Quoting: Bulat Ziganshin
but if this is can be checked by analyzing executable, it's great

Actually you can use a disassembler, check the fopen/CreateFile/whatever calls and look at the flags used. But I strongly dislike such practices.
Btw., I'm hoping that Metacompressor goes online again. I'd love to have some more detailed results for Slug. Testing on my system all the time is useless - because Slug was tailored to work well on it.
