BALZ v1.04 is here!


encode

Apr 25, 2008, 11:37:00 AM


OK, a very special version of BALZ is here!

Briefly, what's new:
+ Enlarged window size to 4 MB and block size to 32 MB
+ Improved match finder
+ Improved parsing. The default "e" mode uses greedy parsing; the optimized "ex" mode uses advanced lazy matching with a two-byte lookahead. During parsing, the encoder also checks additional conditions, such as whether the current offset is in the Rep() state and whether the current offset is good enough.
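
For illustration, a rough sketch of the general idea behind lazy matching with a two-byte lookahead; the helper names (find_match, emit_literal, emit_match) are hypothetical, not BALZ's actual code:

int dist0, dist1, dist2;
int len0 = find_match(i,     &dist0); // best match at the current position
int len1 = find_match(i + 1, &dist1); // peek one byte ahead
int len2 = find_match(i + 2, &dist2); // peek two bytes ahead
if (len1 > len0 || len2 > len0 + 1) {
    emit_literal(buf[i]); // a better match starts just ahead, so defer
    i += 1;
} else {
    emit_match(len0, dist0); // the greedy choice wins
    i += len0;
}

A greedy parser would always take len0; the lookahead lets the encoder spend one or two literals when that uncovers a substantially longer match.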

Enjoy!

;)

http://encode.ru/balz/index.htm



Nania Francesco Antonio

Apr 25, 2008, 12:26:00 PM


SFC Test
option [ex]: 13,077,423 bytes, comp. 169.703 s, dec. 2.687 s
option [e]: 13,278,748 bytes, comp. 68.060 s, dec. 2.705 s

LovePimple

Apr 25, 2008, 6:13:00 PM


Thanks Ilia! :)

Mirror: Download

Nania Francesco Antonio

Apr 26, 2008, 12:21:00 PM


Ilia, I have become really interested in this BALZ LZ77 compressor. Getting below 100,000,000 bytes in the MFC test at Maximum Compression is no small achievement! I am curious to know whether you have inserted a delta pre-filter for BMP, TIFF, WAVE, etc.?

encode

Apr 26, 2008, 12:28:00 PM


Nope, BALZ has exactly the same E8/E9 transformer as LZPM (an improved version of QUAD's) and nothing else. BALZ is simply much stronger on binary data, thanks to LZ77! I'm hoping that the new BALZ v1.04 will have MUCH higher compression on ALL test sets, including MFC, Squeeze Chart, Black_Fox's and of course yours! ;)
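
For context, the E8/E9 transform is a standard x86 preprocessing trick: CALL (0xE8) and JMP (0xE9) opcodes are followed by a 32-bit little-endian relative displacement, and rewriting it as an absolute target makes repeated calls to the same function byte-identical, so the LZ stage can match them. A minimal sketch of the general idea (not BALZ's or LZPM's exact code):

#include <stdint.h>
#include <stddef.h>

void e8e9_forward(uint8_t *buf, size_t n) {
    for (size_t i = 0; i + 5 <= n; ++i) {
        if (buf[i] == 0xE8 || buf[i] == 0xE9) {
            uint32_t rel = (uint32_t)buf[i + 1]
                         | ((uint32_t)buf[i + 2] << 8)
                         | ((uint32_t)buf[i + 3] << 16)
                         | ((uint32_t)buf[i + 4] << 24);
            uint32_t abs = rel + (uint32_t)(i + 5); // relative -> absolute
            buf[i + 1] = (uint8_t)abs;
            buf[i + 2] = (uint8_t)(abs >> 8);
            buf[i + 3] = (uint8_t)(abs >> 16);
            buf[i + 4] = (uint8_t)(abs >> 24);
            i += 4; // skip the displacement just rewritten
        }
    }
}

The decoder applies the inverse (subtracting i + 5) to restore the original bytes.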

LovePimple

Apr 26, 2008, 12:41:00 PM


Quick test...

BALZ [e]

A10.jpg > 843,382
AcroRd32.exe > 1,473,688
english.dic > 872,448
FlashMX.pdf > 3,751,136
FP.LOG > 895,287
MSO97.DLL > 1,916,423
ohs.doc > 844,279
rafale.bmp > 1,089,156
vcfiu.hlp > 731,261
world95.txt > 233,472

Total = 12,650,532 bytes


ENWIK8 > 30,279,021 bytes

Elapsed Time: 00:45:14.517 (2714.517 Seconds)


BALZ [ex]

A10.jpg > 843,382
AcroRd32.exe > 1,449,276
english.dic > 962,560
FlashMX.pdf > 3,738,823
FP.LOG > 855,849
MSO97.DLL > 1,885,008
ohs.doc > 836,783
rafale.bmp > 1,071,154
vcfiu.hlp > 698,529
world95.txt > 604,981

Total = 12,946,345 bytes


ENWIK8 > 29,230,841 bytes

Elapsed Time: 02:04:27.840 (7467.840 Seconds)

encode

Apr 26, 2008, 12:43:00 PM


LovePimple
Re-check the results, there is something wrong!

LovePimple

Apr 26, 2008, 12:50:00 PM


The results are correct for my machine. It seems that BALZ still fails to work correctly on my old P3 @ 750 MHz machine.

LovePimple

Apr 26, 2008, 2:09:00 PM


Here are the results from the same test on my AMD Sempron 2400+ machine...


BALZ [e]

A10.jpg > 843,382
AcroRd32.exe > 1,473,688
english.dic > 1,095,449
FlashMX.pdf > 3,751,136
FP.LOG > 895,287
MSO97.DLL > 1,916,423
ohs.doc > 844,279
rafale.bmp > 1,089,156
vcfiu.hlp > 731,261
world95.txt > 638,687

Total = 13,278,748 bytes



BALZ [ex]

A10.jpg > 843,382
AcroRd32.exe > 1,449,276
english.dic > 1,093,638
FlashMX.pdf > 3,738,823
FP.LOG > 855,849
MSO97.DLL > 1,885,008
ohs.doc > 836,783
rafale.bmp > 1,071,154
vcfiu.hlp > 698,529
world95.txt > 604,981

Total = 13,077,423 bytes

Compression of ENWIK8 is far too slow to keep retesting.

EDIT: Here are the results for the fastest [e] setting...


ENWIK8 > 30,279,021 bytes

Elapsed Time: 00:42:10.449 (2530.449 Seconds)

encode

Apr 26, 2008, 2:10:00 PM


Thank you!

LovePimple

Apr 26, 2008, 3:19:00 PM


We had this problem before.

encode

Apr 26, 2008, 3:29:00 PM


Quoting: LovePimple
We had this problem before.

Yep! It's a compiler-related problem... Anyway, BALZ is for modern PCs. P3 is for museums. For example, some time ago I went to a PC center to get RAM for my sampler; the RAM type is the same one used in old laptops. The sellers said that this type of RAM is from the P3 era and should be placed in a museum. I finally found one chip and bought it at a very high price, because it's a museum piece, a very rare RAM chip...
I just don't know what's wrong... ;)

As always, you may play with the Visual Studio compile:
balz104cl.zip



encode

Apr 26, 2008, 3:43:00 PM


Some testing results:

textures.tar (Textures from the Doom 3 game, 604,218,368 bytes)

PKZIP 2.50, -exx: 233,240,852 bytes
TOR 0.4, -5: 216,888,608 bytes
TOR 0.4, -11: 210,794,900 bytes
CABARC 1.00, -m LZX:21: 193,234,553 bytes
LZPM 0.15, 1: 187,609,193 bytes
LZPM 0.15, 9: 185,038,816 bytes
BALZ 1.04, e: 184,031,123 bytes
BALZ 1.04, ex: 183,003,496 bytes



LovePimple

Apr 26, 2008, 3:50:00 PM


Quoting: encode
P3 is for museums.

I don't agree. Just because something (or someone) is old, we should not dismiss it as a "museum" piece. ;)


Quoting: encode
As always, you may play with the Visual Studio compile:

Thank You! :)

LovePimple

Apr 27, 2008, 12:15:00 PM


BALZ 1.04 is now added to SFC and MFC tests. :)

http://www.maximumcompression.com/

encode

Apr 27, 2008, 3:10:00 PM


However, the new version has lower compression even with a 4 MB dictionary. Note that BALZ v1.03 has MINMATCH=3, while BALZ v1.04 has MINMATCH=4; the newer version apparently loses too many short matches. Maybe BALZ v1.05 will have a 1 MB dictionary, MINMATCH=3, and improved LZ-output coding. :)

Bulat Ziganshin

Apr 27, 2008, 3:52:00 PM


In order to make compression both fast and good, you need to use separate hash tables for short strings. Say, Tornado uses 3 separate tables: for 2-byte, 3-byte and 4+-byte strings. The first two tables are rather large and addressed directly, without chains. LZMA 4.43 used the same scheme, and current versions use a separate table for 4-byte strings and the last table only for 5+-byte strings. The same is true for RAR. Note that the size of a table should be much larger than the maximum distance for this type of match. Say, LZMA uses one million entries for searching 4-byte strings, while distances are probably limited to something about 50-200 KB.
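
A minimal sketch of the table layout described above, with illustrative sizes and names (not Tornado's, LZMA's or RAR's actual code):

enum { H2_BITS = 16, H3_BITS = 18, H4_SIZE = 1 << 20, WIN = 4 << 20 };

static int head2[1 << H2_BITS]; // newest position per 2-byte hash, no chain
static int head3[1 << H3_BITS]; // newest position per 3-byte hash, no chain
static int head4[H4_SIZE];      // chain heads for 4+-byte strings
static int prev4[WIN];          // chain links for 4+-byte strings

// At position i: head2[hash2(i)] and head3[hash3(i)] each yield at most
// one short-match candidate, checked directly with no chain walk, while
// longer matches follow prev4[] links starting from head4[hash4(i)].
// Each short-string table is kept much larger than the maximum distance
// allowed for that match length, so hash collisions stay rare.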

encode

Apr 27, 2008, 3:57:00 PM


Yep, I will try to implement such multi-level hashing in BALZ. :)

encode

Apr 28, 2008, 9:34:00 AM


I just carefully tested such a scheme with BALZ. Well, it works! However, I'm in no hurry to add it.
 
// Look up position i in both tables; the 3-byte heads are stored
// above the 4-byte heads in the same array, hence the HSIZE offset.
int pos=head[HSIZE+gethash3(i)];
if (pos) {
  // check the single 3-byte candidate for a short match
}
pos=head[gethash4(i)];
while (pos) {
  // do a hash chained search, keeping the longest match found
  pos=prev[pos];
}
// ...
// Update both tables with the current position i:
head[HSIZE+gethash3(i)]=i; // the 3-byte slot keeps only the newest position
int h=gethash4(i);
prev[i]=head[h]; // link i at the front of the 4-byte chain
head[h]=i;
// ...
// ...


Even with a large HSIZE, the match finder does not find all short strings. At the same time, such a scheme may allow a slightly deeper search, since we limit the hash chain length to 8192 and a 4-byte hash is better than a 3-byte one. Will do more tests... :)

encode

Apr 28, 2008, 10:49:00 AM


Another cool idea, which works, is to allow a MINMATCH-length match from Rep() (the most recent offset) only. In this case we may encode a MINMATCH match WITHOUT an offset, and MINMATCH can then freely be as low as 2.
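
A minimal sketch of the idea with hypothetical names (rep_offset is the most recently used match offset; buf, i and n are the input buffer, current position and input size): since the offset is implicit, the token costs only a flag plus a length, which is what makes a 2-byte minimum affordable:

int rep_len = 0;
if (rep_offset > 0 && i >= rep_offset) {
    while (i + rep_len < n &&
           buf[i + rep_len] == buf[i + rep_len - rep_offset])
        ++rep_len;
}
if (rep_len >= 2) {
    // emit a rep-match token: a flag and rep_len only, no offset bits
} else {
    // fall back to the regular match finder (normal MINMATCH applies)
}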

Bulat Ziganshin

Apr 28, 2008, 1:05:00 PM


Quoting: encode
Even with a large HSIZE, the match finder does not find all short strings.

HSIZE here is the size of the 4-byte hash :)

As I said before, for 3-byte strings whose offsets are limited to 4096, LZMA used 64K entries.

BTW, Kadach wrote that it's better to check the rep distances first, before searching the hash tables. Look at LZMA for implementation details.
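
A hedged sketch of that check order, with hypothetical names (rep_dist[] holds the recently used offsets; match_len_at() counts matching bytes at a given distance): try the rep distances before any hash lookup, and skip the expensive chain walk when a rep match is already long enough:

int best_len = 0, best_dist = 0;
for (int r = 0; r < NUM_REPS; ++r) { // LZMA, for example, keeps 4 rep distances
    int len = match_len_at(buf, i, rep_dist[r], n);
    if (len > best_len) { best_len = len; best_dist = rep_dist[r]; }
}
if (best_len < GOOD_ENOUGH_LEN) {
    // only now walk the hash chains for a regular match
}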

encode

Apr 28, 2008, 1:09:00 PM


Quoting: Bulat Ziganshin
HSIZE here is the size of the 4-byte hash

...and of the 3-byte hash as well... ;)

Quoting: Bulat Ziganshin

BTW, Kadach wrote that it's better to check the rep distances first, before searching the hash tables. Look at LZMA for implementation details.

Will look again at Kadach.

encode

May 1, 2008, 1:14:00 PM


A new BALZ v1.05 is on its way! What's new:
+ New match finder: HC5, i.e. hash chains with 3-5-byte hashing!
+ Slightly improved parsing
All in all, the new version is *MUCH* faster and has higher compression; in some cases the compression improvement is really huge!

It will be released within one or two weeks...

encode

May 1, 2008, 2:48:00 PM


S&S parsing rules, though! I have compared various parsing schemes against S&S many times, and S&S is the best: even with a smaller dictionary it achieves higher compression than, say, 2-byte-lookahead lazy matching. Maybe I should combine the new match finder (HC5) with such parsing...
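
For illustration, a rough sketch of the shortest-path (dynamic-programming) optimal parse that S&S-style schemes boil down to; the cost model and names (LIT_COST, match_cost, find_longest_match, from[]) are hypothetical. It also shows where the time goes: the match finder must be queried at every single position:

cost[0] = 0;
for (int i = 1; i <= n; ++i) cost[i] = INT_MAX; // INT_MAX = "unreached"
for (int i = 0; i < n; ++i) {
    if (cost[i] == INT_MAX) continue;
    if (cost[i] + LIT_COST < cost[i + 1]) {      // step 1: take a literal
        cost[i + 1] = cost[i] + LIT_COST;
        from[i + 1] = i;
    }
    int dist;
    int len = find_longest_match(i, &dist);      // queried at EVERY position
    for (int l = MINMATCH; l <= len; ++l) {      // step 2: take a match
        if (cost[i] + match_cost(l, dist) < cost[i + l]) {
            cost[i + l] = cost[i] + match_cost(l, dist);
            from[i + l] = i;
        }
    }
}
// backtrack through from[] starting at n to recover the cheapest tokens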

LovePimple

May 1, 2008, 3:05:00 PM


Quoting: encode
All in all, the new version is *MUCH* faster and has higher compression; in some cases the compression improvement is really huge!



encode

May 1, 2008, 3:43:00 PM


Quoting: encode
Maybe I should combine the new match finder (HC5) with such parsing...

Tested the idea...
Well, even such match finder is extremely slow with SS parsing. That means that with this kind of parsing we should use binary tree or similar stuff, maybe we may build a tree, like with some LZW implementations, and instead of a direct buffer search, just traverse thru this structure.
Anyway, by now, let assume that BALZ is a fast LZ77 encoder. New BALZ v1.05 with "e" option is fast enough indeed.

LovePimple

May 1, 2008, 3:54:00 PM


As long as it's still '*MUCH* faster' than previous versions!

encode

May 1, 2008, 4:41:00 PM


I can't get S&S parsing's performance out of my head. For example, BALZ with a 1 MB window and S&S parsing may beat BALZ with a 4 MB window and 2-byte-lookahead lazy matching. That said, with S&S parsing the encoder is dead slow (starting at 18X slower compared to my special lazy matching). Well, at least I can see how much "air" is left by the current scheme.

Note that in some cases and on some files the large dictionary makes sense: even an S&S-based encoder with a smaller dictionary may not compete with its larger-dictionary brother using a much simpler parsing strategy. And S&S is still far from optimal; as I said, in some cases, such as 'canterbury.tar', lazy matching provides significantly higher compression than S&S.

I tested LZMA with both optimal and simple parsing schemes, and I can see how much 'real' optimal parsing helps: with the same settings (dictionary size, match finder and, most importantly, the simple parsing strategy), LZMA and BALZ are close together; of course, they both utilize LZ77.

Concluding, I will release what I currently have, and then we will see... something... Anyway, BALZ v1.05 is something special, believe me... ;)


