BALZ v1.04 is here!


encode

Apr 25, 2008, 11:37:00 AM


OK, a very special version of BALZ is here!

Briefly, what's new:
+ Enlarged window size to 4 MB and block size to 32 MB
+ Improved match finder
+ Improved parsing. The default "e" mode uses greedy parsing; the optimized "ex" mode uses advanced lazy matching with a two-byte lookahead. During parsing, the encoder also checks additional conditions, such as whether the current offset is in the Rep() state and whether the current offset is good enough.
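
For illustration, a rough sketch of the general idea behind lazy matching with a two-byte lookahead; the helper names (find_match, emit_literal, emit_match) are hypothetical, not BALZ's actual code:

int dist0, dist1, dist2;
int len0 = find_match(i,     &dist0); // best match at the current position
int len1 = find_match(i + 1, &dist1); // peek one byte ahead
int len2 = find_match(i + 2, &dist2); // peek two bytes ahead
if (len1 > len0 || len2 > len0 + 1) {
    emit_literal(buf[i]); // a better match starts just ahead, so defer
    i += 1;
} else {
    emit_match(len0, dist0); // the greedy choice wins
    i += len0;
}

A greedy parser would always take len0; the lookahead lets the encoder spend one or two literals when that uncovers a substantially longer match.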

Enjoy!

;)

http://encode.ru/balz/index.htm



Nania Francesco Antonio

Apr 25, 2008, 12:26:00 PM


SFC Test
option [ex]: 13,077,423 bytes, comp. 169.703 s, dec. 2.687 s
option [e]: 13,278,748 bytes, comp. 68.060 s, dec. 2.705 s

LovePimple

Apr 25, 2008, 6:13:00 PM


Thanks Ilia! :)

Mirror: Download

Nania Francesco Antonio

Apr 26, 2008, 12:21:00 PM


Ilia, I have become really interested in this BALZ LZ77 compressor. Getting below 100,000,000 bytes in the MFC test at Maximum Compression is no small achievement! I am curious to know whether you have inserted a delta pre-filter for BMP, TIFF, WAVE, etc.?

encode

Apr 26, 2008, 12:28:00 PM


Nope, BALZ has exactly the same E8/E9 transformer as LZPM (an improved version of QUAD's) and nothing else. BALZ is simply much stronger on binary data, thanks to LZ77! I'm hoping that the new BALZ v1.04 will have MUCH higher compression on ALL test sets, including MFC, Squeeze Chart, Black_Fox's and of course yours! ;)
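
For context, the E8/E9 transform is a standard x86 preprocessing trick: CALL (0xE8) and JMP (0xE9) opcodes are followed by a 32-bit little-endian relative displacement, and rewriting it as an absolute target makes repeated calls to the same function byte-identical, so the LZ stage can match them. A minimal sketch of the general idea (not BALZ's or LZPM's exact code):

#include <stdint.h>
#include <stddef.h>

void e8e9_forward(uint8_t *buf, size_t n) {
    for (size_t i = 0; i + 5 <= n; ++i) {
        if (buf[i] == 0xE8 || buf[i] == 0xE9) {
            uint32_t rel = (uint32_t)buf[i + 1]
                         | ((uint32_t)buf[i + 2] << 8)
                         | ((uint32_t)buf[i + 3] << 16)
                         | ((uint32_t)buf[i + 4] << 24);
            uint32_t abs = rel + (uint32_t)(i + 5); // relative -> absolute
            buf[i + 1] = (uint8_t)abs;
            buf[i + 2] = (uint8_t)(abs >> 8);
            buf[i + 3] = (uint8_t)(abs >> 16);
            buf[i + 4] = (uint8_t)(abs >> 24);
            i += 4; // skip the displacement just rewritten
        }
    }
}

The decoder applies the inverse (subtracting i + 5) to restore the original bytes.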

LovePimple

Apr 26, 2008, 12:41:00 PM


Quick test...

BALZ [e]

A10.jpg > 843,382
AcroRd32.exe > 1,473,688
english.dic > 872,448
FlashMX.pdf > 3,751,136
FP.LOG > 895,287
MSO97.DLL > 1,916,423
ohs.doc > 844,279
rafale.bmp > 1,089,156
vcfiu.hlp > 731,261
world95.txt > 233,472

Total = 12,650,532 bytes


ENWIK8 > 30,279,021 bytes

Elapsed Time: 00:45:14.517 (2714.517 Seconds)


BALZ [ex]

A10.jpg > 843,382
AcroRd32.exe > 1,449,276
english.dic > 962,560
FlashMX.pdf > 3,738,823
FP.LOG > 855,849
MSO97.DLL > 1,885,008
ohs.doc > 836,783
rafale.bmp > 1,071,154
vcfiu.hlp > 698,529
world95.txt > 604,981

Total = 12,946,345 bytes


ENWIK8 > 29,230,841 bytes

Elapsed Time: 02:04:27.840 (7467.840 Seconds)

encode

Apr 26, 2008, 12:43:00 PM


LovePimple
Re-check the results, there is something wrong!

LovePimple

Apr 26, 2008, 12:50:00 PM


The results are correct for my machine. It seems that BALZ still fails to work correctly on my old P3 @ 750 MHz machine.

LovePimple

Apr 26, 2008, 2:09:00 PM


Here are the results from the same test on my AMD Sempron 2400+ machine...


BALZ [e]

A10.jpg > 843,382
AcroRd32.exe > 1,473,688
english.dic > 1,095,449
FlashMX.pdf > 3,751,136
FP.LOG > 895,287
MSO97.DLL > 1,916,423
ohs.doc > 844,279
rafale.bmp > 1,089,156
vcfiu.hlp > 731,261
world95.txt > 638,687

Total = 13,278,748 bytes



BALZ [ex]

A10.jpg > 843,382
AcroRd32.exe > 1,449,276
english.dic > 1,093,638
FlashMX.pdf > 3,738,823
FP.LOG > 855,849
MSO97.DLL > 1,885,008
ohs.doc > 836,783
rafale.bmp > 1,071,154
vcfiu.hlp > 698,529
world95.txt > 604,981

Total = 13,077,423 bytes

Compression of ENWIK8 is far too slow to keep retesting.

EDIT: Here are the results for the fastest [e] setting...


ENWIK8 > 30,279,021 bytes

Elapsed Time: 00:42:10.449 (2530.449 Seconds)

encode

Apr 26, 2008, 2:10:00 PM


Thank you!

LovePimple

Apr 26, 2008, 3:19:00 PM


We had this problem before.

encode

Apr 26, 2008, 3:29:00 PM


Quoting: LovePimple
We had this problem before.

Yep! It's a compiler-related problem... Anyway, BALZ is for modern PCs. P3 is for museums. For example, some time ago I went to a PC center to get RAM for my sampler; the RAM type is the same one used in old laptops. The sellers said that this type of RAM is from the P3 era and should be placed in a museum. I finally found one chip and bought it at a very high price, because it's a museum piece, a very rare RAM chip...
I just don't know what's wrong... ;)

As always, you may play with the Visual Studio compile:
balz104cl.zip



encode

Apr 26, 2008, 3:43:00 PM


Some testing results:

textures.tar (Textures from the Doom 3 game, 604,218,368 bytes)

PKZIP 2.50, -exx: 233,240,852 bytes
TOR 0.4, -5: 216,888,608 bytes
TOR 0.4, -11: 210,794,900 bytes
CABARC 1.00, -m LZX:21: 193,234,553 bytes
LZPM 0.15, 1: 187,609,193 bytes
LZPM 0.15, 9: 185,038,816 bytes
BALZ 1.04, e: 184,031,123 bytes
BALZ 1.04, ex: 183,003,496 bytes



LovePimple

Apr 26, 2008, 3:50:00 PM


Quoting: encode
P3 is for museums.

I don't agree. Just because something (or someone) is old, we should not dismiss it as a "museum" piece. ;)


Quoting: encode
As always, you may play with the Visual Studio compile:

Thank You! :)

LovePimple

Apr 27, 2008, 12:15:00 PM


BALZ 1.04 is now added to SFC and MFC tests. :)

http://www.maximumcompression.com/

encode

Apr 27, 2008, 3:10:00 PM


However, the new version has lower compression even with a 4 MB dictionary. Note that BALZ v1.03 has MINMATCH=3, while BALZ v1.04 has MINMATCH=4; the newer version apparently loses too many short matches. Maybe BALZ v1.05 will have a 1 MB dictionary, MINMATCH=3, and improved LZ-output coding. :)

Bulat Ziganshin

Apr 27, 2008, 3:52:00 PM


In order to make compression both fast and good, you need to use separate hash tables for short strings. Say, Tornado uses 3 separate tables: for 2-byte, 3-byte and 4+-byte strings. The first two tables are rather large and addressed directly, without chains. LZMA 4.43 used the same scheme, and current versions use a separate table for 4-byte strings and the last table only for 5+-byte strings. The same is true for RAR. Note that the size of a table should be much larger than the maximum distance for this type of match. Say, LZMA uses one million entries for searching 4-byte strings, while distances are probably limited to something about 50-200 KB.
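
A minimal sketch of the table layout described above, with illustrative sizes and names (not Tornado's, LZMA's or RAR's actual code):

enum { H2_BITS = 16, H3_BITS = 18, H4_SIZE = 1 << 20, WIN = 4 << 20 };

static int head2[1 << H2_BITS]; // newest position per 2-byte hash, no chain
static int head3[1 << H3_BITS]; // newest position per 3-byte hash, no chain
static int head4[H4_SIZE];      // chain heads for 4+-byte strings
static int prev4[WIN];          // chain links for 4+-byte strings

// At position i: head2[hash2(i)] and head3[hash3(i)] each yield at most
// one short-match candidate, checked directly with no chain walk, while
// longer matches follow prev4[] links starting from head4[hash4(i)].
// Each short-string table is kept much larger than the maximum distance
// allowed for that match length, so hash collisions stay rare.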

encode

Apr 27, 2008, 3:57:00 PM


Yep, I will try to implement such multi-level hashing in BALZ. :)

encode

Apr 28, 2008, 9:34:00 AM


I just carefully tested such a scheme with BALZ. Well, it works! However, I'm in no hurry to add it.
 
// Look up position i in both tables; the 3-byte heads are stored
// above the 4-byte heads in the same array, hence the HSIZE offset.
int pos=head[HSIZE+gethash3(i)];
if (pos) {
  // check the single 3-byte candidate for a short match
}
pos=head[gethash4(i)];
while (pos) {
  // do a hash chained search, keeping the longest match found
  pos=prev[pos];
}
// ...
// Update both tables with the current position i:
head[HSIZE+gethash3(i)]=i; // the 3-byte slot keeps only the newest position
int h=gethash4(i);
prev[i]=head[h]; // link i at the front of the 4-byte chain
head[h]=i;
// ...
// ...


Even with a large HSIZE, the match finder does not find all short strings. At the same time, such a scheme may allow a slightly deeper search, since we limit the hash chain length to 8192 and a 4-byte hash is better than a 3-byte one. Will do more tests... :)

encode

Apr 28, 2008, 10:49:00 AM


Another cool idea, which works, is to allow a MINMATCH-length match from Rep() (the most recent offset) only. In this case we may encode a MINMATCH match WITHOUT an offset, and MINMATCH can then freely be as low as 2.
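
A minimal sketch of the idea with hypothetical names (rep_offset is the most recently used match offset; buf, i and n are the input buffer, current position and input size): since the offset is implicit, the token costs only a flag plus a length, which is what makes a 2-byte minimum affordable:

int rep_len = 0;
if (rep_offset > 0 && i >= rep_offset) {
    while (i + rep_len < n &&
           buf[i + rep_len] == buf[i + rep_len - rep_offset])
        ++rep_len;
}
if (rep_len >= 2) {
    // emit a rep-match token: a flag and rep_len only, no offset bits
} else {
    // fall back to the regular match finder (normal MINMATCH applies)
}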

Bulat Ziganshin

Apr 28, 2008, 1:05:00 PM


Quoting: encode
Even with a large HSIZE, the match finder does not find all short strings.

HSIZE here is the size of the 4-byte hash :)

As I said before, for 3-byte strings whose offsets are limited to 4096, LZMA used 64K entries.

BTW, Kadach wrote that it's better to check the rep distances first, before searching the hash tables. Look at LZMA for implementation details.
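
A hedged sketch of that check order, with hypothetical names (rep_dist[] holds the recently used offsets; match_len_at() counts matching bytes at a given distance): try the rep distances before any hash lookup, and skip the expensive chain walk when a rep match is already long enough:

int best_len = 0, best_dist = 0;
for (int r = 0; r < NUM_REPS; ++r) { // LZMA, for example, keeps 4 rep distances
    int len = match_len_at(buf, i, rep_dist[r], n);
    if (len > best_len) { best_len = len; best_dist = rep_dist[r]; }
}
if (best_len < GOOD_ENOUGH_LEN) {
    // only now walk the hash chains for a regular match
}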

encode

Apr 28, 2008, 1:09:00 PM


Quoting: Bulat Ziganshin
HSIZE here is the size of the 4-byte hash

...and of the 3-byte hash as well... ;)

Quoting: Bulat Ziganshin

BTW, Kadach wrote that it's better to check the rep distances first, before searching the hash tables. Look at LZMA for implementation details.

Will look again at Kadach.

encode

May 1, 2008, 1:14:00 PM


A new BALZ v1.05 is on its way! What's new:
+ New match finder: HC5, i.e. hash chains with 3-5-byte hashing!
+ Slightly improved parsing
All in all, the new version is *MUCH* faster and has higher compression; in some cases the compression improvement is really huge!

It will be released within one or two weeks...

encode

May 1, 2008, 2:48:00 PM


S&S parsing rules, though! I have compared various parsing schemes against S&S many times, and S&S is the best: even with a smaller dictionary it achieves higher compression than, say, 2-byte-lookahead lazy matching. Maybe I should combine the new match finder (HC5) with such parsing...
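
For illustration, a rough sketch of the shortest-path (dynamic-programming) optimal parse that S&S-style schemes boil down to; the cost model and names (LIT_COST, match_cost, find_longest_match, from[]) are hypothetical. It also shows where the time goes: the match finder must be queried at every single position:

cost[0] = 0;
for (int i = 1; i <= n; ++i) cost[i] = INT_MAX; // INT_MAX = "unreached"
for (int i = 0; i < n; ++i) {
    if (cost[i] == INT_MAX) continue;
    if (cost[i] + LIT_COST < cost[i + 1]) {      // step 1: take a literal
        cost[i + 1] = cost[i] + LIT_COST;
        from[i + 1] = i;
    }
    int dist;
    int len = find_longest_match(i, &dist);      // queried at EVERY position
    for (int l = MINMATCH; l <= len; ++l) {      // step 2: take a match
        if (cost[i] + match_cost(l, dist) < cost[i + l]) {
            cost[i + l] = cost[i] + match_cost(l, dist);
            from[i + l] = i;
        }
    }
}
// backtrack through from[] starting at n to recover the cheapest tokens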

LovePimple

May 1, 2008, 3:05:00 PM


Quoting: encode
All in all, the new version is *MUCH* faster and has higher compression; in some cases the compression improvement is really huge!



encode

May 1, 2008, 3:43:00 PM


Quoting: encode
Maybe I should combine the new match finder (HC5) with such parsing...

Tested the idea...
Well, even such match finder is extremely slow with SS parsing. That means that with this kind of parsing we should use binary tree or similar stuff, maybe we may build a tree, like with some LZW implementations, and instead of a direct buffer search, just traverse thru this structure.
Anyway, by now, let assume that BALZ is a fast LZ77 encoder. New BALZ v1.05 with "e" option is fast enough indeed.

LovePimple

May 1, 2008, 3:54:00 PM


As long as it's still '*MUCH* faster' than previous versions!

encode

May 1, 2008, 4:41:00 PM


I can't get S&S parsing's performance out of my head. For example, BALZ with a 1 MB window and S&S parsing may beat BALZ with a 4 MB window and 2-byte-lookahead lazy matching. That said, with S&S parsing the encoder is dead slow (starting at 18X slower compared to my special lazy matching). Well, at least I can see how much "air" is left by the current scheme.

Note that in some cases and on some files the large dictionary makes sense: even an S&S-based encoder with a smaller dictionary may not compete with its larger-dictionary brother using a much simpler parsing strategy. And S&S is still far from optimal; as I said, in some cases, such as 'canterbury.tar', lazy matching provides significantly higher compression than S&S.

I tested LZMA with both optimal and simple parsing schemes, and I can see how much 'real' optimal parsing helps: with the same settings (dictionary size, match finder and, most importantly, the simple parsing strategy), LZMA and BALZ are close together; of course, they both utilize LZ77.

Concluding, I will release what I currently have, and then we will see... something... Anyway, BALZ v1.05 is something special, believe me... ;)


