BALZ v1.03 is here!


encode

Apr 17, 2008, 3:22:00 PM
to encode_ru_f...@googlegroups.com


OK, a new version has been released. This version introduces a new greedy encoder, which is *FAST*. I hope you enjoy it. I also hope that BALZ becomes a fast LZ77 coder, so it will be interesting to compare it with TORNADO and others. ;)

http://encode.ru/balz/index.htm
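For readers new to the terminology: a greedy LZ77 encoder always takes the longest match available at the current position and moves on. The sketch below shows only that idea, using a brute-force match finder; the token format, the `MIN_MATCH`/`MAX_MATCH` values and the function names are illustrative assumptions, not BALZ's actual design. A practical coder would also need a fast match finder and an entropy coder for the tokens.

```python
# Minimal greedy LZ77 parser (illustrative sketch, not BALZ's actual code).
MIN_MATCH = 3    # assumed shortest match worth coding
MAX_MATCH = 255  # assumed longest match length

def longest_match(data, pos, window):
    """Brute-force search of the previous `window` bytes for the longest match
    starting at `pos`. Returns (length, offset), or (0, 0) if nothing usable.
    Candidates are scanned nearest-first, so ties keep the smallest offset."""
    best_len, best_off = 0, 0
    limit = min(MAX_MATCH, len(data) - pos)
    for cand in range(pos - 1, max(0, pos - window) - 1, -1):
        n = 0
        while n < limit and data[cand + n] == data[pos + n]:
            n += 1
        if n > best_len:
            best_len, best_off = n, pos - cand
    return (best_len, best_off) if best_len >= MIN_MATCH else (0, 0)

def greedy_parse(data, window=1 << 16):
    """Greedy parsing: take the longest match at the current position, if any."""
    tokens, pos = [], 0
    while pos < len(data):
        length, offset = longest_match(data, pos, window)
        if length:
            tokens.append(("match", length, offset))
            pos += length
        else:
            tokens.append(("lit", data[pos]))
            pos += 1
    return tokens

def decode(tokens):
    """Rebuild the input; byte-by-byte copying handles overlapping matches."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, length, offset = tok
            for _ in range(length):
                out.append(out[-offset])
    return bytes(out)
```

For example, `greedy_parse(b"aaaaaaaaaa")` yields one literal followed by a single overlapping match of length 9 at offset 1.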



LovePimple

Apr 17, 2008, 3:44:00 PM


Thanks Ilia! :)

Vacon

Apr 17, 2008, 3:51:00 PM


Hello everyone,

see my comment here:
URL

Best regards!

Bulat Ziganshin

Apr 17, 2008, 4:50:00 PM


How about sending it to http://www.metacompressor.com/submit.aspx ?

encode

Apr 17, 2008, 4:56:00 PM


Quoting: Bulat Ziganshin
How about sending it to http://www.metacompressor.com/submit.aspx ?

It throws an error message!

Bulat Ziganshin

Apr 17, 2008, 5:00:00 PM


What do you mean? I successfully tested 4x4 there.

encode

Apr 17, 2008, 5:02:00 PM


Quoting: Bulat Ziganshin

What do you mean? I successfully tested 4x4 there.

It throws:
You have not uploaded a 'zip' file but application/zip!

I tried many times with differently packed ZIP archives - nothing works!

Bulat Ziganshin

Apr 17, 2008, 5:11:00 PM


Are you using IE? This form doesn't work with Opera, at least.

Bulat Ziganshin

Apr 17, 2008, 5:16:00 PM


At least 1.02 was tested there by someone.

encode

Apr 17, 2008, 5:33:00 PM


Quoting: Bulat Ziganshin
Are you using IE?

Firefox! :)

Bulat Ziganshin

Apr 17, 2008, 5:37:00 PM


>timer balz.exe e dll100.dll balz
User Time = 394.837 = 00:06:34.837 = 78%
>timer balz.exe d balz nul
User Time = 20.609 = 00:00:20.609 = 82%
36,720,996 bytes


>tor -5 dll100.dll -otor
User Time = 25.797 = 00:00:25.797 = 78%
>timer tor -d tor -o
User Time = 5.708 = 00:00:05.708 = 82%
32,634,030 bytes

although my computer is by no means modern: 64k+64k cache size
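Since the original size of dll100.dll isn't given, the timings above are easiest to compare as ratios. This small calculation uses only the user times and output sizes from the post; the variable names are illustrative.

```python
# Figures copied from the post above (user times in seconds, sizes in bytes).
balz = {"comp_s": 394.837, "decomp_s": 20.609, "size": 36_720_996}
tor = {"comp_s": 25.797, "decomp_s": 5.708, "size": 32_634_030}

comp_ratio = balz["comp_s"] / tor["comp_s"]        # how many times slower BALZ compresses
decomp_ratio = balz["decomp_s"] / tor["decomp_s"]  # how many times slower BALZ decompresses
size_overhead = balz["size"] / tor["size"] - 1     # how much larger the BALZ output is

print(f"compression:   {comp_ratio:.1f}x slower")
print(f"decompression: {decomp_ratio:.1f}x slower")
print(f"output size:   {size_overhead:.1%} larger")
```

With these figures BALZ compresses about 15.3 times slower and decompresses about 3.6 times slower than TORNADO -5, with output about 12.5% larger.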

encode

Apr 17, 2008, 5:40:00 PM


Quoting: Bulat Ziganshin
Are you using IE?

With IE it throws:

Your code is wrong!

What is CODE??

Quoting: Bulat Ziganshin
dll100.dll

Try others... ;)

Bulat Ziganshin

Apr 17, 2008, 6:31:00 PM


The code is LongNightsDebugging.

Black_Fox

Apr 17, 2008, 7:07:00 PM


Tested ex mode.

For "e" mode the resulting total was 16,859,360, compression speed was 836 kB/s, and decompression speed was a little bit lower than for ex mode.

LovePimple

Apr 17, 2008, 8:28:00 PM


Quick test...

Test file: ENWIK8

Test Machine: AMD Sempron 2400+, Windows XP SP2

mode: ex

Compression Time: 10814.922s

Compressed Size: 30,604,477 bytes

Decompression Time: 17.408s


mode: e

Compression Time: 1381.077s

Compressed Size: 32,406,406 bytes

Decompression Time: 18.962s
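Because enwik8 has a known size (exactly 100,000,000 bytes), the results above can be turned into bits per character and throughput. A quick calculation using only the numbers posted; the dictionary layout and function name are illustrative.

```python
# Figures from the post above (AMD Sempron 2400+, Windows XP SP2).
ENWIK8_SIZE = 100_000_000  # enwik8 is the first 10^8 bytes of an English Wikipedia dump

results = {
    "ex": {"comp_s": 10814.922, "size": 30_604_477, "decomp_s": 17.408},
    "e": {"comp_s": 1381.077, "size": 32_406_406, "decomp_s": 18.962},
}

def summarize(r):
    """Derive bits per character and throughput from one result row."""
    return {
        "bpc": 8 * r["size"] / ENWIK8_SIZE,               # output bits per input byte
        "comp_kBps": ENWIK8_SIZE / r["comp_s"] / 1000,     # compression, kB/s
        "decomp_MBps": ENWIK8_SIZE / r["decomp_s"] / 1e6,  # decompression, MB/s
    }

summary = {mode: summarize(r) for mode, r in results.items()}
for mode, s in summary.items():
    print(f"{mode:>2}: {s['bpc']:.3f} bpc, {s['comp_kBps']:.1f} kB/s comp, "
          f"{s['decomp_MBps']:.2f} MB/s decomp")
```

This gives about 2.448 bpc for "ex" and 2.593 bpc for "e", with "e" compressing roughly 7.8 times faster (about 72.4 kB/s versus 9.2 kB/s).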

encode

Apr 18, 2008, 6:09:00 AM


By the way, in future versions I may increase the dictionary size to, say, 1..4 MB. Also, "ex" may use lazy parsing instead of SS, which may be removed completely. In addition, I will reduce memory usage to ~70 MB, even with 4 MB and larger dictionaries.

:)

I think the future is in such *FAST* modes, since SS' compression speed is unacceptable. What do you think?

LovePimple

Apr 18, 2008, 8:12:00 AM


Quoting: encode
I think the future is in such *FAST* modes, since SS' compression speed is unacceptable. What do you think?

I agree! :)

encode

Apr 18, 2008, 8:24:00 AM


Furthermore, in most cases the difference in compression is really small, even compared to greedy (unoptimized) parsing, which BALZ currently uses. With lazy matching we close the gap even further, while being just slightly slower.
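Lazy matching postpones a match by one byte when a longer match starts at the next position. Below is a minimal sketch with one byte of lookahead; the brute-force match finder and the simple "strictly longer" rule are assumptions for illustration, not BALZ's actual heuristics.

```python
MIN_MATCH = 3    # assumed shortest codable match
MAX_MATCH = 255  # assumed longest match

def longest_match(data, pos, window=1 << 16):
    """Brute-force longest match at pos, nearest candidate first; (0, 0) if none."""
    best_len, best_off = 0, 0
    limit = min(MAX_MATCH, len(data) - pos)
    for cand in range(pos - 1, max(0, pos - window) - 1, -1):
        n = 0
        while n < limit and data[cand + n] == data[pos + n]:
            n += 1
        if n > best_len:
            best_len, best_off = n, pos - cand
    return (best_len, best_off) if best_len >= MIN_MATCH else (0, 0)

def lazy_parse(data, window=1 << 16):
    """Lazy matching with one byte of lookahead: before emitting a match at pos,
    check pos + 1; if a strictly longer match starts there, emit a literal instead."""
    tokens, pos = [], 0
    while pos < len(data):
        length, offset = longest_match(data, pos, window)
        if length and pos + 1 < len(data):
            next_len, _ = longest_match(data, pos + 1, window)
            if next_len > length:
                length = 0  # defer: one literal buys a longer match at pos + 1
        if length:
            tokens.append(("match", length, offset))
            pos += length
        else:
            tokens.append(("lit", data[pos]))
            pos += 1
    return tokens
```

On `b"abc_bcdef_abcdef"`, a greedy parser would code the final "abcdef" as two 3-byte matches, while `lazy_parse` emits the literal "a" and then one 5-byte match "bcdef".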



encode

Apr 19, 2008, 4:36:00 PM


What I've already done with BALZ v1.04:
+ Removed SS parsing
+ Reduced memory usage to ~80 MB
+ Changed dictionary size to 1 MB
+ Mode "e" uses greedy parsing
+ Mode "ex" uses lazy matching with 2-byte lookahead
Thanks to the larger dictionary and the new, simpler parsing, the new BALZ often outperforms the old one (the current version), while being incomparably faster and having a smaller memory footprint. Very cool!
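The "ex" mode above uses lazy matching with 2-byte lookahead. One possible way to extend one-byte lazy matching is to also check the match starting at pos + 2 and defer if it is longer by more than one byte, since deferring twice costs an extra literal. The deferral rule `> length + k - 1` and all names here are hypothetical; the post does not say which rule BALZ uses.

```python
MIN_MATCH = 3    # assumed shortest codable match
MAX_MATCH = 255  # assumed longest match

def longest_match(data, pos, window=1 << 16):
    """Brute-force longest match at pos, nearest candidate first; (0, 0) if none."""
    best_len, best_off = 0, 0
    limit = min(MAX_MATCH, len(data) - pos)
    for cand in range(pos - 1, max(0, pos - window) - 1, -1):
        n = 0
        while n < limit and data[cand + n] == data[pos + n]:
            n += 1
        if n > best_len:
            best_len, best_off = n, pos - cand
    return (best_len, best_off) if best_len >= MIN_MATCH else (0, 0)

def lookahead_parse(data, lookahead=2, window=1 << 16):
    """Lazy matching with `lookahead` bytes: drop the match at pos if a match
    starting k bytes later (1 <= k <= lookahead) is longer than length + k - 1.
    This deferral rule is a hypothetical heuristic, not BALZ's documented one."""
    tokens, pos = [], 0
    while pos < len(data):
        length, offset = longest_match(data, pos, window)
        for k in range(1, lookahead + 1):
            if not length or pos + k >= len(data):
                break
            if longest_match(data, pos + k, window)[0] > length + k - 1:
                length = 0  # emit one literal now and re-parse at pos + 1
        if length:
            tokens.append(("match", length, offset))
            pos += length
        else:
            tokens.append(("lit", data[pos]))
            pos += 1
    return tokens
```

On `b"abc_cdefgh_abcdefgh"`, one byte of lookahead codes the tail as a 3-byte and a 5-byte match, while two bytes of lookahead spend two literals and then take a single 6-byte match. Which parse is cheaper depends on how the entropy coder prices literals and matches.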

Bulat Ziganshin

Apr 19, 2008, 4:51:00 PM


Ilya, what's your email??

encode

Apr 19, 2008, 4:54:00 PM


ilia_muraviev # yahoo . com :)

Hope the bots won't harvest my address...

encode

Apr 20, 2008, 3:12:00 PM


Continuing to improve BALZ:
+ Changed the dictionary size to 2 MB. It looks like 2 MB is a kind of standard value for modern LZ77 coders (CABARC, QUANTUM, etc.)
+ Tested parsing with 2-byte lookahead more deeply. In some cases it may slightly hurt compression compared to simple lazy matching with 1-byte lookahead, but overall it helps, especially on text files.
+ I'm stuck in the middle of working out a formula for len/offset limits, i.e. which offset should be the maximum for each match length. With a 2 MB dictionary I've found that I should restrict offsets for 3-, 4- and 5-byte matches:
3 - ~256
4 - ~4k
5 - ~512k
Still digging...
P.S.
This new beast, thanks to its larger dictionary, has higher compression, in some cases notably higher, even with such a simple parsing scheme. At the same time it's faster...
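The length/offset limits above can be read as a table mapping match length to the largest offset worth coding, presumably because a short match with a distant offset costs more bits than coding its bytes as literals. The sketch below encodes that rule; the thresholds are the approximate ("~") values from the post, while `match_allowed`, `MAX_OFFSET_FOR_LEN`, and the choice to limit 6+ byte matches only by the 2 MB window are assumptions for illustration.

```python
# Approximate per-length offset limits from the post (values marked "~" there).
MAX_OFFSET_FOR_LEN = {3: 256, 4: 4 * 1024, 5: 512 * 1024}
DICT_SIZE = 2 * 1024 * 1024  # 2 MB window, as in the post
MIN_MATCH = 3

def match_allowed(length, offset):
    """Return True if a match of this length at this offset is worth coding.
    Lengths of 6 and more are assumed to be limited only by the window size."""
    if length < MIN_MATCH or offset < 1 or offset > DICT_SIZE:
        return False
    return offset <= MAX_OFFSET_FOR_LEN.get(length, DICT_SIZE)
```

A parser would call such a check on each candidate and treat a rejected short match as "no match", falling back to literals.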

Bulat Ziganshin

Apr 20, 2008, 4:48:00 PM


Quoting: encode
Looks like 2 MB is some kind of standard value for modern LZ77 coders (CABARC,QUANTUM,etc.)

Yes, if you live in the 90s :) RAR/ACE already had 4 MB dicts.

Quoting: encode
This new beast, due to a larger dictionary, has a higher compression, in some cases notable higher, even with such simpler parsing scheme. At the same time it's faster...

you have a lot of room for improvement

encode

Apr 20, 2008, 5:03:00 PM


Quoting: Bulat Ziganshin
RAR/ACE already had 4 MB dicts

Actually, I can set ANY dictionary size. Larger dictionary = slower compression. Well, I'll test BALZ with a 4 MB dictionary. Maybe I should keep 4 MB, at least for the "pht.psd" file. ;)

encode

Apr 21, 2008, 1:15:00 PM


OK, I tested BALZ with various window sizes (1..4 MB). Well, maybe 4 MB is too heavy: a hash chain based match finder is not so efficient on such large dictionaries. The main question is what BALZ should be: a fast and efficient LZ77 coder or an LZMA competitor. Even with 4 MB, BALZ cannot compete with LZMA; for that we need optimal parsing and a binary tree based match finder. Therefore, I think I should keep it somewhere in the middle between Deflate and LZMA: a fast LZ77. In conclusion, a 1 MB dictionary is enough, although I will run additional tests with a 2 MB one.

In the new BALZ I also improved the parsing. The new "ex" mode uses advanced lazy matching with 2-byte lookahead; when deciding whether to drop a match, it also looks at the offset of the current match (is it a closer/better match, is the current offset in a rep state, and so on). Another parameter I tested is the hash chain length: 4k, 8k and 16k. 16k is too large a value and, with large dictionaries (say 4 MB), may heavily hurt compression speed. 8k is a very deep search; 4k is OK with a 1 MB dictionary but not really enough on larger ones.

Anyway, you may post your own thoughts about what kind of BALZ you really want to see, i.e. fast or not, favoring compression or speed, etc.
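A hash chain match finder keeps, for every hash of the next 3 bytes, the most recent position with that hash, and links each position to the previous one with the same hash. The chain length limit (4k, 8k, 16k in the post) caps how many candidates are checked per position, trading compression for speed. This is a minimal sketch; the hash function, `HASH_BITS`, and the class and method names are hypothetical, and BALZ's real match finder is not reproduced here.

```python
MIN_MATCH = 3
HASH_BITS = 16  # assumed hash table size (2^16 buckets), illustrative only

class HashChainMatcher:
    """head[h] holds the most recent position whose first MIN_MATCH bytes hash
    to h; prev[p] links p to the previous position with the same hash."""

    def __init__(self, data, window, max_chain):
        self.data, self.window, self.max_chain = data, window, max_chain
        self.head = {}  # dict instead of a fixed array, for brevity
        self.prev = [-1] * len(data)

    def _hash(self, pos):
        a, b, c = self.data[pos:pos + 3]
        return ((a << 10) ^ (b << 5) ^ c) & ((1 << HASH_BITS) - 1)

    def insert(self, pos):
        """Add pos to its chain (call after find() for the same position)."""
        if pos + MIN_MATCH <= len(self.data):
            h = self._hash(pos)
            self.prev[pos] = self.head.get(h, -1)
            self.head[h] = pos

    def find(self, pos):
        """Walk at most max_chain candidates; return (length, offset) or (0, 0).
        Byte comparison also filters out hash collisions."""
        if pos + MIN_MATCH > len(self.data):
            return 0, 0
        best_len, best_off = 0, 0
        cand, steps = self.head.get(self._hash(pos), -1), 0
        while cand >= 0 and pos - cand <= self.window and steps < self.max_chain:
            n = 0
            while pos + n < len(self.data) and self.data[cand + n] == self.data[pos + n]:
                n += 1
            if n > best_len:
                best_len, best_off = n, pos - cand
            cand = self.prev[cand]
            steps += 1
        return (best_len, best_off) if best_len >= MIN_MATCH else (0, 0)
```

For `data = b"abcdefg0abc1abc2abc3abcdefg"` with positions 0..19 inserted, `max_chain=1` only examines the most recent "abc" and returns a 3-byte match at offset 4 for position 20, while a longer chain reaches the 7-byte match at offset 20. That is the compression/speed trade-off discussed above.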



Christian

Apr 21, 2008, 1:37:00 PM


Quoting: encode
Anyway, you may post your own thought about what BALZ do you really want to see – i.e. fast or not, favor compression or speed, etc.

I'd like to see stronger or much faster compression. If you go for stronger compression, an increase in dictionary size would be great. But frankly speaking, I'd love to see an improved version of TC combining your know-how from QUAD and LZPM. I first found this forum because I was looking for some info on TC.
;)

encode

Apr 21, 2008, 2:10:00 PM


Yep, TC is one of the craziest things I've ever made. Another cool compressor is a closed-source version of QUAD, which is an order-2 fast CM+LZP layer. The performance of this compressor is crazy! However, starting with the QUAD idea, I've been looking more carefully at asymmetric things like ROLZ and LZ77. Indeed, the new BALZ in some cases has identical or better compression than my old PIMPLE, while being incomparably faster. In addition, my new LZ77 easily outperforms LZPM on binary files. But the coolest part of BALZ is its simplicity: I think it's one of the simplest compressors ever made. BALZ v1.04 has ~7 KB of source code (encoder, decoder, interface, all the stuff), whereas things like TC have high complexity (large sources, lots of classes, etc.), and it's hard to work on such large projects. Anyway, things like fast CM and LZP are well known to me, and to release a new compressor I could just copy and paste my own code. It's just that currently I'm more interested in an area that's relatively new to me: pure LZ77. OK, I will look at MFC's results and decide what stuff and tricks to insert into BALZ; maybe I'll even go back to CM+LZP. ;)

encode

Apr 22, 2008, 6:43:00 AM


Added a match finder trick which drastically increases compression speed with large dictionaries; text compression speed in particular is hugely improved. In addition, I found that 2 MB is in most cases much better than 1 MB, for example:

driver.tar (driver cache, 320,136,192 bytes)

BALZ v1.04, ex, 512k: 82,271,902 bytes
BALZ v1.04, ex, 1m: 76,534,871 bytes
BALZ v1.04, ex, 2m: 66,462,989 bytes
BALZ v1.04, ex, 4m: 63,979,864 bytes

Anyway, with the new match finder the compression speed is OK even with a 4 MB window.

Continue testing!
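To see where the dictionary size pays off most, the driver.tar sizes above can be turned into compression ratios and per-step savings. This uses only the numbers posted; the variable names are illustrative.

```python
# Figures from the post: driver.tar (320,136,192 bytes), BALZ v1.04 "ex".
ORIGINAL = 320_136_192
sizes = {  # dictionary size -> compressed size in bytes
    "512k": 82_271_902,
    "1m": 76_534_871,
    "2m": 66_462_989,
    "4m": 63_979_864,
}

# Compressed size as a fraction of the original.
ratios = {d: s / ORIGINAL for d, s in sizes.items()}

# Relative saving of each dictionary size over the previous (half as large) one.
keys = list(sizes)
gains = {b: 1 - sizes[b] / sizes[a] for a, b in zip(keys, keys[1:])}

for d in keys:
    extra = f", {gains[d]:.1%} smaller than the previous size" if d in gains else ""
    print(f"{d:>4}: {ratios[d]:.1%} of the original{extra}")
```

By these numbers, going from 1 MB to 2 MB saves about 13.2%, more than either the 512 KB to 1 MB step (about 7.0%) or the 2 MB to 4 MB step (about 3.7%).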

Bulat Ziganshin

Apr 22, 2008, 8:06:00 AM


Try it with a 512 MB dict.

encode

unread,
Apr 22, 2008, 8:44:00 AM4/22/08
to encode_ru_f...@googlegroups.com


TORNADO already has a 512 MB dict ;)

Bulat Ziganshin

Apr 22, 2008, 10:33:00 AM


And what are the results?

Fallon

Apr 22, 2008, 12:31:00 PM


Quoting: encode
Anyway, you may post your own thought about what BALZ do you really want to see – i.e. fast or not, favor compression or speed, etc.

High efficiency, maybe giving up some in exchange for compression,
but in truth, I will peek in and follow the ride anywhere.
Quoting: encode
the coolest part of BALZ is its simplicity – I think it's one of the simplest compressors ever made – BALZ v1.04 has ~7 KB source code

Attractive!
I take it that your interest in asymmetry has to do with practical use. Typical users probably won't give a rat's ass (bulat: DGARA) about a 7 kb or 700 kb source.

encode

Apr 22, 2008, 4:08:00 PM


Quoting: Bulat Ziganshin
And what are the results?

TOR v0.4, -11: 56,116,831 bytes

Christian

Apr 22, 2008, 4:39:00 PM


Quoting: Fallon
Typical users probably won't give a rat's ass (bulat: DGARA) about a 7 kb or 700 kb source

Still, a tiny source is always nice: it's easy to maintain. Damn, even Slug has 8 kb of source code. CCM and RZM both have around 20 kb.

Fallon

Apr 22, 2008, 6:23:00 PM


Quoting: Christian
Still, a tiny source is always nice: it's easy to maintain.

:)
