
New BWT compressor


Uwe Herklotz

Jul 14, 2003, 6:32:55 AM
Hi all!

For anyone interested, I've released version 1.0 of my new BWT
compressor at ftp://ftp.elf.stuba.sk/pub/pc/pack/uhbc10.zip

UHBC uses recent research results by S.Deorowicz and J.Abel for
improved second stage processing after BWT (Burrows and Wheeler
Transform). Some extensions and sophisticated modeling provide
top compression ratios. Speed isn't as good as with Bzip2, but
still a lot faster than most PPM.

Some results (-m3):
                             average bpb
calgary corpus (14 files)    2.208        (calgary.tar 776 kb)
canterbury corpus            2.002        (canterbury.tar 473 kb)
canterbury corpus (large)    1.564

Note that all results are without any filters or preprocessing
before BWT. Additional filters may help a lot for some data.

Please read "uhbc.doc" for usage and legal info.
I'll appreciate any comments, hints, bug reports...

Regards,
Uwe

Falk Hueffner

Jul 14, 2003, 8:03:06 AM
"Uwe Herklotz" <_no_s...@yahoo.com> writes:

> For anyone interested, I've released version 1.0 of my new BWT
> compressor at ftp://ftp.elf.stuba.sk/pub/pc/pack/uhbc10.zip

Sounds interesting. Are there any plans to make it free software, by
chance?

--
Falk

Uwe Herklotz

Jul 14, 2003, 9:32:53 AM
Falk Hueffner <falk.h...@student.uni-tuebingen.de> wrote:

> Sounds interesting. Are there any plans to make it free software, by
> chance?

UHBC is free for non-commercial use. The current version is still in an
experimental state. Therefore it is intended for testing and evaluation
only. There are no plans to make it open source.

Uwe

berto

Jul 14, 2003, 12:48:26 PM
"Uwe Herklotz" <_no_s...@yahoo.com> wrote in message news:<beu12c$8obm2$1...@ID-76993.news.uni-berlin.de>...

> Hi all!
> For anyone interested, I've released version 1.0 of my new BWT
> compressor at ftp://ftp.elf.stuba.sk/pub/pc/pack/uhbc10.zip
> Some results (-m3):
> average bpb
> calgary corpus (14 files) 2.208 (calgary.tar 776 kb)
> canterbury corpus 2.002 (canterbury.tar 473 kb)
> canterbury corpus (large) 1.564
> Regards,
> Uwe

Hi

I've tested UHBC 1.0 and I can confirm that on some files (BMP2 on
my sheet) the results are GREAT.

For example:

TYPE BMP2
UNCOMPRESSED 2,038,490
ERI32 5.1 fre (2002) 1,028,906
UHBC v 1.0 (30-06-2003) 1,099,576
GCA V0.9k (2001) 1,105,946
BWIC (2003) 1,108,519
CTW (Context Tree Weighting) 0.1 1,111,677
DGCA Beta4 (2003) 1,113,888
Epm r8-pre2 (not public) (2003) 1,128,454
QLFC V6.6W (07-02-2001) 1,128,817
YBS 0.03f (09-2000) 1,129,291
Compressia v1.0b 1,135,245
Epm r7 1,138,165
Compressia v0.98beta (2002) 1,140,782
ABC 2.3 (2002) 1,141,002
YBS 0.03e (09-2000) 1,141,047
SLIM13 (2002) 1,150,835
GRZip v 0.7.3 (26-06-2003) 1,151,335
GRZip v 0.6.1 (28-05-2003) 1,151,335
PPMII Monstrous Variant I rev. 1 (30-04-2002) 1,153,963
Durilca v 0.2 (30-6-2003) 1,153,973
Durilca v 0.0 (16-5-2003) 1,153,973
Durilca v 0.1 (30-5-2003) 1,153,975
SBC v0.950 beta (27-03-2002) 1,154,502
SZIP V1.12 (03-2000) 1,155,611
PAQ1SSE (2003) (PAQ2) 1,155,869
RKUC 1.04 (24-06-1999) 1,156,076
BA v1.01 beta5 (2000) 1,163,564
Rkc' (OVERALL) (2003) 1,164,198
Rkc' (only PPMD) (2003) 1,164,198
Bee 0.7.4i (21-12-2002) 1,166,474
ZZIP v0.36C (04-07-2001) 1,169,900
BEE 0.6.3 (18-11-2001) 1,170,306
BEE 0.7.6.d (04-2003) 1,171,519
ENC (Enhanced Compressor) v0.15 (17-12-2002) 1,175,135
PPMII Monstrous Variant H (21-04-2001) 1,178,498
SBC v0.969 beta rev 1 (28-08-2002) 1,179,557
M99 (1999) (After RadixBWT) 1,180,019
PAQ1 (2002) 1,186,403
PPMII Variant I (28-04-2002) 1,187,314
Durilca light v 0.0 (16-5-2003) 1,187,323
Durilca light v 0.2 (30-6-2003) 1,187,323
Durilca light v 0.1 (30-5-2003) 1,187,325
PPMII Variant H (21-04-2001) 1,187,956
Winrar 3.00 beta4 (2002) 1,188,206
Stuffix 8.0.0.148 (2003) (sit x) 1,192,668
CTXf 0.69 archiver (03-2003) 1,195,375
7-ZIP 2.30 beta19 (7Z - PPMD) (11-04-2002) 1,196,543
7-ZIP 2.30 beta19 (OVERALL) (11-04-2002) 1,196,543
Bioarc v1.9 (2001) 1,197,032
777 v0.04 beta 1 (01-03-1998) 1,197,980
UFA 0.04 Beta 1 (01-03-1998) 1,197,986
WINIMP 1.1 (2000) 1,198,839
7-ZIP 2.24 (BZIP2) (21-03-2001) 1,199,381
Experimental Archiver (EXP) 1 (25-02-1998) 1,199,454
PAR v2.00 build 61 (28-01-2001) 1,199,547
RKIVE v1.92 beta1 (1997) 1,205,465
PPMZ2 v0.8 (07-1999) 1,212,679
LZAP v0.20.0 beta (06-1998) 1,215,131
Bwtzip 1,215,546
MAR (Melting-pot archiver) (1999) 1,217,805
Ship in a bottle 1.0 b15 1,218,490
BIJECTIVE COMPRESSOR V1.01 (2000) 1,220,331
Rkc' (only ROLZ) 1,223,217
PPMY v2.02(3c+SSE) (2003) 1,224,051
PPMZ V9.1 (1997) 1,226,096
PPMY v2.0 (2001) 1,229,366
ACB 2.00c (25-4-1997) 1,234,872
UHARC 0.5np (15-10-2002) 1,234,930
UHARC 0.4 (28-12-2001) 1,238,193
RK v1.04.1a (2000) 1,247,412
X1 v0.95a (X) (03-05-1997) 1,252,503
P12 (NNTC) 1,253,118
AI 1,274,656
RDMC v0.06b (09-04-2001) 1,280,857
WinAce 2.11 (12-2001) 1,297,669
PPMN v1.00b1+ km build4 (10-08-2002) 1,301,662
ARHANGEL v1.40 (18-01-2000) 1,304,403
P6 (NNTC) 1,310,346
DLC (Digilinear Compression) 0.6.1. (1999) 1,331,696
BA v1.00 beta (2000) 1,332,798
HA 0.999b (01-1995) 1,360,387
LGHA v1.1g (20-04-1999) 1,360,387
P5 (NNTC) 1,387,333
7-ZIP 2.30 beta19 (7Z - LZMA) (11-04-2002) 1,388,774
RK v 1.02 build 4 alpha (1999) 1,390,140
DISINTEGRATOR 0.9b (DST) (05-1998) 1,396,901
UHARC 0.2 (21-12-1997) 1,402,811
BOA Constrictor v0.58b (02-1998) 1,414,616
DC Archiver 0.98b (2000) 1,428,766
BSA 2.0 (1994) 1,440,295
CAB archive 1.3 1,440,712
Bix 1.00 Beta7 (23-10-1999) 1,445,771
SEMONE 0.6 b0 (1999) 1,476,434
GSA 0.01b (15-08-2000) 1,494,016
ESP v1.92 (11-1997) 1,514,576
QUANTUM 0.97 (1995) 1,548,040
UC2 v.2.37 (1995) 1,555,164
7-ZIP 2.30 beta19 (ZIP) (11-04-2002) 1,567,513
ARJZ 0.15 alpha (07-04-1995) 1,569,235
KZIP (08-06-2003) 1,574,928
HIT 2.10 (1994) 1,580,956
SQZ 1.8.3 (24-01-1993) 1,582,014
Quark 1.0beta (01-05-1993) 1,583,601
Pkzip v 2.50 (03-01-1999) 1,584,880
Yamazaki Zippe v 1.06.1 (03-12-2000) 1,592,911
AMG 2.3 (08-1995) 1,593,545
Sky 1.15 (09-07-1997) 1,593,545
OOP 2.3 (15-11-1997) 1,593,545
AIN V2.30 (1994) 1,596,479
YAC 1.02 (1995) 1,599,437
SQWEZ V2.3 (1995) 1,601,921
LZDS 2.1 (1999) 1,602,682
XTREME v1.06 1,604,364
ZET v 0.1 beta (09-03-1994) 1,612,973
HAP TM 3.00 (1992) 1,618,118
THAP 1.02c (1995) 1,618,118
ARJ 2.55 c (08-05-1997) 1,619,183
BSArc v 1.9.3 (03-09-1992) 1,637,855
Winzip 9 beta (2003) 1,640,426
NRV v0.10 (27-04-1998) 1,653,181
LIMIT 1.2 (02-1994) 1,657,754
LHARK 0.4d (12-1996) 1,658,195
WINZIP 2.00 (Winzip 7.0) 1,660,440
COMPRESS 1.5 (22-10-2001) 1,660,488
CABARC 1.00 (1996) 1,661,194
Wingzip V1.0.0 build 21 (2002) 1,662,686
Dzip v2.9 (2002) 1,662,718
Vuzip 1.7 build2.167 (29-01-2003) 1,662,771
MSXIE v1.40 Pro (30 -12-1997) 1,664,642
LHA 2.13 (20-07-1991) 1,664,672
ZOO 2.1 (09-07-1991) 1,664,806
Crossepac v 1.35 (10-10-1994) 1,664,976
GZIP 1.2.4 (18-08-1993) 1,669,800
Hpack v 0.79a0 (1-05-1993) 1,671,213
AKT V70 Beta7 (08-01-2000) 1,676,423
RAX 1.0.2 (25-02-1998) 1,682,389
PAK 2.51 (08-10-1990) 1,685,161
JAM (08-1996) 1,688,534
ARQ Crusher! V3.2 (02-03-1997) 1,700,666
KTY Archive 1.3 (03-11-2002) 1,707,727
COMP16 1,709,387
ICE 1.02c (1995) 1,709,510
ARX 1.0 (17-07-1994) 1,709,585
ELI 5750 1,709,646
Hyper 2.6 (5-1992) 1,724,163
XPA32 1.0.2 (25-5-1999) 1,735,637
SQUISH v1.0 (1992) 1,743,936
M99 (1999) 1,755,635
ARIDEMO v 1.03 (2001) 1,778,564
Turbo Compressor v0.1 (31-12-1990) 1,779,037
ChArc v 1.2 (1990) 1,780,005
SCRNCH 1.02 (1988) 1,781,970
BVI 1.70 (02-1998) 1,790,122
LZOP 1.01 (27-04-2003) 1,804,930
ARG 1.00.001 BETA (05-06-1994) 1,842,937
Ditpack v 1.0 (1991) 1,843,330
GAS 2.0 (1993) 1,843,626
SQPC File Squezeer v1.28 (10-01-1985) 1,848,490
SPLINT 2.1 (12-04-1989) 1,878,616
Reduq v2.1 (2001) 1,878,877
InstallSHIELD Compressor 2.00.053 (1993) 1,881,385
NASHRINK v5.0 (1997) 1,881,893
DWC v5.10 (07-03-1990) 1,908,339
LZSS v1.0 1,932,961
Causeway Compressor v 3.01 (1996) 1,933,262
TERSE 2.1d 16bit (1994) 1,936,384
MPC 3.00 (Power compressor 3) (1996) 1,939,477
RESOF V2.0B (15-02-1993) 1,939,554
NULIB v3.24 (01-1993) 1,952,909
KABOOM v 1.1 (1992) 1,956,348
TOP 4 v1.03 (02-08-1996) 1,968,215
ASD v 0.1.4 (01-1997) 1,981,133
LZK v0.01Beta 1 (2002) 1,981,421
Squash v 1.21 (05-1997) 1,981,465
MS COMPRESS 2.0 (1992) 1,981,804
Blink v 2.55 (03-1999) 1,983,470
FOXSQZ v 1.9c (11-11-1997) 1,986,257
ARS 2.1 1,993,785
MDCD v1.0 (23-10-1988) 1,997,910
BriefLZ pack v1.0 (2003) 2,007,976
Run Length Encoding 8 bit (1991) 2,038,490
AR7 v1.1 (1991) 2,038,490
ARC 6.02 (01-1989) 2,038,490
ARCA 1.29 (09-12-1987) 2,038,490
Lzcomp (1986) 2,038,490
OPAQue (1998) 2,038,490
BCOMP v0.1 (28-01-1999) 2,038,490
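
To compare these byte counts with the bpb figures quoted at the start of the thread, a size can be converted to bits per byte. The following is only a small sketch: the function name `bits_per_byte` is made up for illustration, and the two sizes are copied from the BMP2 list above.

```python
def bits_per_byte(compressed_size: int, original_size: int) -> float:
    """Return the compressed size expressed in bits per original byte (bpb)."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return 8.0 * compressed_size / original_size


# Sizes taken from the BMP2 list above (uncompressed: 2,038,490 bytes).
uhbc_bpb = bits_per_byte(1_099_576, 2_038_490)
gzip_bpb = bits_per_byte(1_669_800, 2_038_490)

print(f"UHBC v1.0:  {uhbc_bpb:.3f} bpb")
print(f"GZIP 1.2.4: {gzip_bpb:.3f} bpb")
```

Note that an "average bpb" over a corpus is normally the mean of the per-file values, so it will generally differ from the bpb of a single tar archive of the same files.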

Bob Mariotti

Jul 15, 2003, 10:16:35 AM
On 14 Jul 2003 09:48:26 -0700, b.des...@virgilio.it (berto) wrote:

>I've tested UHBC 1.0 and I can confirm that on some files (BMP2 on
>my sheet) the results are GREAT
>

Hey berto;

Seeing how you have an enormous library of compressors - would you be
willing to try and uncompress a file that I have to see if any of your
tools will do so? Then I will know what program I will need to
further process the files???

Thanks

Bob

berto

Jul 16, 2003, 12:25:59 PM
> Hey berto;
>
> Seeing how you have an enormous library of compressors - would you be
> willing to try and uncompress a file that I have to see if any of your
> tools will do so? Then I will know what program I will need to
> further process the files???
>
> Thanks
>
> Bob

Hi Bob
No problem, I can do that.
Send me the file (not too big!).

If you want, you can download only the best ones
to do the job and avoid wasting time.

Bye
Berto

Bob Mariotti

Jul 16, 2003, 10:20:57 PM


Thanks, Berto;

I can send you a small sample attached to an email. Either post your
address or email me at r.mar...@financialdatacorp.com

Bob

Binh Vo

Jul 23, 2003, 12:56:12 PM
Hi, I downloaded a copy of your compressor, and was quite impressed with the
results it was getting, especially using options like -m0, which showed that it
could do very well using only BWT->RLE->entropy, with no MTF step. Could you
let me know which algorithms you are using for each step, in particular the RLE?

Thanks,
-Binh

"Uwe Herklotz" <_no_s...@yahoo.com> wrote in message news:<beu12c$8obm2$1...@ID-76993.news.uni-berlin.de>...

Uwe Herklotz

Jul 24, 2003, 7:06:15 AM
Binh Vo <b...@mit.edu> wrote in message:
fbc02b54.0307...@posting.google.com...

> Hi, I downloaded a copy of your compressor, and was quite impressed with the
> results it was getting, especially using options like -m0, which showed that it
> could do very well using only BWT->RLE->entropy, with no MTF step. Could you
> let me know which algorithms you are using for each step, in particular the RLE?

BWT - Combination of radix sort, ternary quicksort and shellsort.
RLE - Similar to the RLE-BIT algorithm as described in the article
      "Improvements to the Burrows-Wheeler Compression Algorithm:
      After BWT Stages" by J.Abel. The run lengths are coded with
      a binary prefix code. Run length prediction is improved by
      context information: the run symbol as well as the previous
      MTF rank, i.e. MTF is also performed in -m0 mode, but here it's
      used for RLE prediction only. It's possible to remove this, but
      I added -m0 mode just to show the results of direct entropy
      coding used for the adaptive switching. It was not intended as
      a full replacement for MTF/WFC.
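
For readers who want to see how the stages fit together, here is a minimal sketch of BWT followed by run extraction, plus a plain MTF pass. It is not UHBC's code: the BWT below sorts whole rotations naively instead of using the radix/ternary quicksort/shellsort combination, the run-length step just lists (symbol, length) pairs rather than implementing RLE-BIT, and all function names are made up for illustration.

```python
from typing import List, Tuple


def bwt(data: bytes) -> Tuple[bytes, int]:
    """Naive BWT: sort all rotations, return the last column and the
    row index of the original string (needed for the inverse)."""
    n = len(data)
    if n == 0:
        return b"", 0
    doubled = data + data
    order = sorted(range(n), key=lambda i: doubled[i:i + n])
    last = bytes(doubled[i + n - 1] for i in order)
    return last, order.index(0)


def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Invert the BWT using a stable sort of the last column."""
    n = len(last)
    if n == 0:
        return b""
    # nxt[j] = position in the last column of the symbol in row j
    # of the (sorted) first column.
    nxt = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = nxt[row]
        out.append(last[row])
    return bytes(out)


def mtf_encode(data: bytes) -> List[int]:
    """Move-to-front: emit the current rank of each symbol."""
    table = list(range(256))
    ranks = []
    for b in data:
        r = table.index(b)
        ranks.append(r)
        table.insert(0, table.pop(r))
    return ranks


def run_lengths(data: bytes) -> List[Tuple[int, int]]:
    """Split data into (symbol, run length) pairs."""
    runs: List[Tuple[int, int]] = []
    for b in data:
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)
        else:
            runs.append((b, 1))
    return runs


last, primary = bwt(b"banana")
print(last, primary)            # BWT output and primary index
print(run_lengths(last))        # runs that an RLE stage would code
print(mtf_encode(last))         # MTF ranks, usable as context
assert inverse_bwt(last, primary) == b"banana"
```

In a scheme like the one described above, the run symbols and run lengths would then go to the entropy coder, with the MTF rank serving only as context for predicting the run lengths.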

The entropy coding of the remaining symbols after RLE, as well as of
the run lengths, is done via bitwise arithmetic encoding. Probability
modeling is similar to the method presented in the paper "Improvements
to Burrows-Wheeler Compression Algorithm" by S.Deorowicz.
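
As a rough illustration of what "bitwise" coding means, the sketch below splits each byte into 8 binary decisions, gives every decision its own adaptive probability, and adds up the ideal arithmetic-coding cost of -log2(p) bits per decision. This is a generic shift-based model chosen for brevity; it is an assumption for illustration and is neither the probability estimation from the Deorowicz paper nor UHBC's actual model, and it only estimates the output size instead of producing a bitstream.

```python
import math


def adaptive_bitwise_cost(data: bytes, shift: int = 4) -> float:
    """Estimate the coded size in bits when each byte is coded as
    8 binary decisions with adaptive probabilities.

    The bits of a byte already seen select a node in a binary tree
    (nodes 1..255); each node holds a 12-bit probability that the
    next bit is 1. An ideal arithmetic coder spends -log2(p) bits
    on an event of probability p.
    """
    probs = [2048] * 256  # 2048/4096 = 0.5 initially
    total_bits = 0.0
    for byte in data:
        node = 1
        for k in range(7, -1, -1):
            bit = (byte >> k) & 1
            p1 = probs[node] / 4096.0
            total_bits += -math.log2(p1 if bit else 1.0 - p1)
            # Move the estimate a fraction 1/2**shift towards the
            # observed bit; it never reaches exactly 0 or 1.
            if bit:
                probs[node] += (4096 - probs[node]) >> shift
            else:
                probs[node] -= probs[node] >> shift
            node = (node << 1) | bit
    return total_bits


repetitive = b"a" * 1024
varied = bytes(range(256)) * 4
print(f"repetitive: {adaptive_bitwise_cost(repetitive) / 8:.1f} bytes")
print(f"varied:     {adaptive_bitwise_cost(varied) / 8:.1f} bytes")
```

A real coder would feed the same probabilities into a binary arithmetic coder and would pick the model by context (for example the run symbol or MTF rank mentioned above) rather than by bit position alone.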

Regards,
Uwe

apm

Jul 29, 2003, 5:24:06 PM
"Uwe Herklotz" <_no_s...@yahoo.com> wrote in message news:<beubh8$8no04$1...@ID-76993.news.uni-berlin.de>...

By 'Free Software' I mean GPL'd. Free as in free speech, not free
beer. So UHBC is released currently free of charge, binary release
only. Source closed, peer review not possible. Pity.

-apm

Mark van Leeuwen

Jul 30, 2003, 4:26:59 AM
apm wrote:

> By 'Free Software' I mean GPL'd.

Well, I do not want to continue with the troll you just started, but
the GPL is not really the one and only "absolute" free software
licence. But I guess Uwe does not want to put UHBC under an even more
liberal licence or ... no I don't think he will! ;-)

--
Mark
