Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.

Dismiss

Need Shareware Compression Library

0 views

Skip to first unread message

dave robbins

unread,

Mar 19, 1996, 3:00:00 AM3/19/96

I'm in need of a shareware compression library that will be used with
Borland C/C++ 3.1, anyone have any suggestions.

---
Dave Robbins drob...@servtech.com
Utica NY

Joseph H Allen

unread,

Mar 19, 1996, 3:00:00 AM3/19/96

In article <4il3od$o...@murphy2.servtech.com>,

dave robbins <drob...@cyber2.servtech.com> wrote:
>I'm in need of a shareware compression library that will be used with
>Borland C/C++ 3.1, anyone have any suggestions.

Here's a freeware compression library. It's a variable order adaptive
huffman coder. Zero order huffman coders base their compression on the
frequencies of occurance of each letter- the more frequent the letter, the
fewer number of bits that are used to encode it. Higher order coders
compile statistics on each letter, but also take into account the characters
which preceed the letter. For example, the frequency of occurance of 'e'
might be 20%, but if it's preceeded with 'th' is goes up to 95%, and thus
needs fewer bits to encode. The number of characters preceeding the letter
which are taken into account is the order of the coder. Higher order is
generally better, but there is a penalty for overhead. The best order for
text seems to be 3.

It compresses about the same as gnu zip for text, but somewhat worse for
binary. It is slower- by about 5 times, but that may not matter for your
application. On the other hand it doesn't violate any patents, it's in the
public domain and the code's tiny enough (about 10K) so that I can just post
it here. If you have any ideas for improving the speed, let me know :-)

You could improve compression ratios (but not speed) by using arithmetic
coding instead, but there would then be a patent problem.

You can compress blocks:

int comp(char *dst,char *src,int len);
/*
* Compress the block at src/len into buffer at dst. Returns compressed
* size (no. characters written to dst).
*/

int dcomp(char *dst,char *src,int len);
/*
* Uncompress a block at src into buffer at dst. 'len' should be the
* original uncompressed size of the block. dcomp() returns the compressed
* size of the block (no. characters read from src).
*/

Also there's an example main() at the end which can be used to compress from
stdin to stdout (so you can just compile everything which follows to try
it). Note that there's an option for setting the order of the statistical
model used. Give a -help option for the list of options when you run the
program.

I haven't tried this in MS-DOS, but it should work if you lower MAXBUFSIZ
and HTSIZE, change the stdin and stdout file modes to binary and compile
with large model.

/* Variable order adaptive huffman coder
* by: Joe Allen
* Released into public domain 19, 1996
*/

#include <stdio.h>
#include <string.h>

#define MAXBUFSIZ 65535 /* Buffer size for example main() */
#define HTSIZE 16384 /* Hash table size (must be power of 2) */

/* The order of the statistical model */

int order=3;

/* Bit assembler/dis-assembler */

static int accu;
static int nbits;

/* I/O buffer */

static unsigned char *orgbuf;
static unsigned char *buf;
static int buflen;

/* Initialize I/O buffer */

void iziobuf(srcdst)
char *srcdst;
{
accu=0;
nbits=0;
buflen=0;
buf=srcdst;
orgbuf=srcdst;
}

/* Output n bits of c */

void emit(c,n)
{
accu|=c<<nbits;
nbits+=n;
while(nbits>=8) buf[buflen++]=accu, nbits-=8, accu>>=8;
}

/* Output any remaining bits */

int flsh()
{
if(nbits) buf[buflen++]=accu, nbits=0;
return buflen;
}

/* Get n bits */

int get(n)
{
int c;
while(nbits<n) accu|=*buf++<<nbits, nbits+=8;
c=accu&((1<<n)-1);
accu>>=n; nbits-=n;
return c;
}

/* Return number of bytes we've gotten with 'get' so far */

int got()
{
return buf-orgbuf;
}

/* Huffman node pool */

struct huff_node
{
struct huff_node *zero; /* These point towards the leaves */
struct huff_node *one; /* (they equal 0 if this is a leaf) */

struct huff_node *parent; /* This points towards the root (0 for root) */

struct huff_node *sort; /* This points to node with next higher freq */

int count; /* Frequency of occurance of this leaf. Or
sum of freqs of children */

struct entry *entry; /* Points to hash table entry for this char */
};

static struct huff_node *free_huff_node=0;
static struct huff_node *alloced_huff_node=0;
static int nfree_huff_nodes=0;

static struct huff_node *allocate_huff_node()
{
if(!nfree_huff_nodes)
{
free_huff_node=(struct huff_node *)malloc(sizeof(struct huff_node)*
(nfree_huff_nodes=1024));
free_huff_node->zero=alloced_huff_node;
alloced_huff_node=free_huff_node++;
--nfree_huff_nodes;
}
++free_huff_node;
--nfree_huff_nodes;
return free_huff_node-1;
}

static void free_huff_nodes()
{
struct huff_node *h;
while(h=alloced_huff_node)
{
alloced_huff_node=h->zero;
free(h);
}
free_huff_node=0;
nfree_huff_nodes=0;
}

/* Hash table entry pool */

struct entry
{
unsigned char str[8]; /* String */
int len; /* Length */
struct entry *next; /* Next node with same hash value */

/* If this entry refers to a table */
struct huff_node *root; /* Root node for the table */
struct huff_node *last; /* Last node added to table */
struct huff_node *esc; /* Node for ESC symbol */

/* If this entry refers to a character in a table */
struct huff_node *node;
};

static struct entry *free_entry=0;
static int nfree_entries=0;
static struct entry *alloced_entry=0;

static struct entry *allocate_entry()
{
if(!nfree_entries)
{
free_entry=(struct entry *)malloc(sizeof(struct entry)*(nfree_entries=1024));
free_entry->next=alloced_entry;
alloced_entry=free_entry++;
--nfree_entries;
}
++free_entry;
--nfree_entries;
return free_entry-1;
}

static void free_entries()
{
struct entry *e;
while(e=alloced_entry)
{
alloced_entry=e->next;
free(e);
}
free_entry=0;
nfree_entries=0;
}

/* Hash table */

static struct entry *htab[HTSIZE];

/* Add a new node to a huffman table */

static void add(base,ent)
struct entry *base, *ent;
{
struct huff_node *new=allocate_huff_node();
struct huff_node *coppied=allocate_huff_node();
struct huff_node *last=base->last;

new->entry=ent;
new->count=0;
new->sort=coppied;
new->parent=last;
new->zero=0;
new->one=0;
ent->node=new;

coppied->entry=last->entry;
coppied->count=last->count;
coppied->sort=last;
coppied->zero=last->zero;
coppied->one=last->one;
coppied->parent=last;
if(coppied->entry==base) base->esc=coppied;
else if(coppied->entry) coppied->entry->node=coppied;

last->entry=0;
last->zero=coppied;
last->one=new;

base->last=new;
}

/* Increment count on a huffman node and update huffman tree */

static void update(base,i)
struct entry *base;
struct huff_node *i;
{
while(i)
{
++i->count;
if(i->sort && i->count>i->sort->count)
{
struct huff_node *e;
struct huff_node tmp;
for(e=i->sort;e->sort && e->count==e->sort->count;e=e->sort);
tmp= *e;

e->count=i->count;
e->entry=i->entry;
e->zero=i->zero;
e->one=i->one;

i->count=tmp.count;
i->entry=tmp.entry;
i->zero=tmp.zero;
i->one=tmp.one;

if(i->zero)
{
i->zero->parent=i;
i->one->parent=i;
}
else
{
if(i->entry==base) i->entry->esc=i;
else i->entry->node=i;
}

if(e->zero)
{
e->zero->parent=e;
e->one->parent=e;
}
else
{
if(e->entry==base) e->entry->esc=e;
else e->entry->node=e;
}

i=e;
}
i=i->parent;
}
}

/* Create a new huffman table */

static void mklevel(base,ent)
struct entry *base, *ent;
{
struct huff_node *root=allocate_huff_node();
struct huff_node *esc=allocate_huff_node();
struct huff_node *first=allocate_huff_node();

root->parent=0;
root->count=2;
root->entry=0;
root->zero=esc;
root->one=first;
root->sort=0;

esc->parent=root;
esc->count=1;
esc->entry=base;
esc->zero=0;
esc->one=0;
esc->sort=root;

first->parent=root;
first->count=1;
first->entry=ent;
first->zero=0;
first->one=0;
first->sort=esc;

base->last=first;
base->esc=esc;
base->root=root;
ent->node=first;
}

/* Output a huffman code */

static void up(i)
struct huff_node *i;
{
if(i->parent) up(i->parent), emit(i->parent->one==i,1);
}

/* Decode a huffman code */

static struct huff_node *down(i)
struct huff_node *i;
{
while(i->zero)
if(get(1)) i=i->one;
else i=i->zero;
return i;
}

/* Lookup a string in the hash table */

static struct entry *find(str,len)
unsigned char *str;
{
struct entry *p;
unsigned long hval=0;
int x;
for(x=0;x!=len;++x) hval=(hval>>28)+(hval<<4)+str[x];
hval&=(HTSIZE-1);
for(p=htab[hval];p;p=p->next)
if(len==p->len)
{
for(x=0;x!=len;++x) if(str[x]!=p->str[x]) break;
if(x==len) return p;
}
p=allocate_entry();
p->len=len;
if(len) memcpy(p->str,str,len);
p->next=htab[hval];
htab[hval]=p;
p->node=0;
p->esc=0;
p->root=0;
p->last=0;
return p;
}

/* Encode last byte of string. All but the last byte of the string is used
* to select the huffman table to use */

void encode(str,len)
unsigned char *str;
{
struct entry *base, *ent;
base=find(str,len-1);
ent=find(str,len);
if(ent->node)
{ /* Table and character exist */
up(ent->node);
update(base,ent->node);
}
else if(base->last)
{ /* Only table exists */
up(base->esc);
update(base,base->esc);
if(len-1) encode(str+1,len-1);
else emit(str[0],8);
add(base,ent);
update(base,ent->node);
}
else
{ /* Neither table nor character exist */
if(len-1) encode(str+1,len-1);
else emit(str[0],8);
mklevel(base,ent);
}
}

/* Decode one character. Adds the character to right after end of string.
* Uses characters already in the string to select huffman table */

unsigned char decode(str,len)
unsigned char *str;
{
struct entry *base=find(str,len);
struct entry *ent;
struct huff_node *t;
if(base->root)
{
t=down(base->root);
if(t->entry==base)
{ /* Received escape character */
update(base,t);
if(len)
{
decode(str+1,len-1);
}
else
{
str[len]=get(8);
}
add(base,ent=find(str,len+1));
update(base,ent->node);
}
else
{ /* Received a real character */
unsigned char c=t->entry->str[t->entry->len-1];
update(base,t);
str[len]=c;
}
}
else
{ /* A new table */
if(len)
{
decode(str+1,len-1);
}
else
{
str[len]=get(8);
}
mklevel(base,find(str,len+1));
}
}

/* Free allocated data structures */

void cleanup()
{
int x;
free_huff_nodes();
free_entries();
for(x=0;x!=HTSIZE;++x) htab[x]=0;
}

/* Encode a block (src/len) into an output buffer (dst). Returns encoded size
*/

int comp(dst,src,len)
unsigned char *dst, *src;
{
int x;

iziobuf(dst);

for(x=0;x!=order && x!=len;++x) encode(src,x+1);
while(x!=len) encode(src+x++-order,order+1);
cleanup();

return flsh();
}

/* Decode a block (dst/len) from a buffer (src). 'len' should be the
* original decoded size. dcomp() returns the encoded size (no. bytes
* scanned in src).
*/

int dcomp(dst,src,len)
unsigned char *dst, *src;
{
int x;

iziobuf(src);

for(x=0;x!=order && x!=len;++x) decode(dst,x);
while(x!=len) decode(dst+x++-order,order);
cleanup();

return got();
}

/* Example main() */

int main(argc,argv)
char *argv[];
{
unsigned char ibuf[MAXBUFSIZ];
unsigned char obuf[MAXBUFSIZ*2];
int bksize=MAXBUFSIZ;
int olen;
int ilen;
int dcmp=0;
int x;

/* Uncompress if name starts with 'u' */
for(x=strlen(argv[0])-1;x>0;--x) if(argv[0][x-1]=='/') break;
if(argv[0][x]=='u') dcmp=1;

/* Parse options */
for(x=1;x!=argc;++x)
if(!strcmp(argv[x],"-d")) dcmp=1;
else if(!strcmp(argv[x],"-c")) dcmp=0;
else if(!strcmp(argv[x],"-o"))
{
sscanf(argv[x+1],"%d",&order);
++x;
}
else if(!strcmp(argv[x],"-b"))
{
sscanf(argv[x+1],"%d",&bksize);
++x;
}
else
{
fprintf(stderr,"Compress/Decompress from stdin to stdout\n");
fprintf(stderr,"-d Decompress\n");
fprintf(stderr,"-c Compress\n");
fprintf(stderr,
"-o order Set ORDER of statistical model (0-7, default=3)\n");
fprintf(stderr,"-b blksize Set block size (max=65535, default=65535)\n");
exit(1);
}

if(dcmp)
{ /* Uncompress */
bksize=(unsigned)getchar(); /* Get block size */
bksize+=(unsigned)getchar()<<8;
while((ilen=getchar())!= -1) /* Get compressed block size */
{
ilen+=getchar()<<8;
if(!ilen) /* If last block, get final block size */
{
bksize=(unsigned)getchar();
bksize+=(unsigned)getchar()<<8;
fread(ibuf,1,MAXBUFSIZ,stdin);
}
else fread(ibuf,1,ilen,stdin);
dcomp(obuf,ibuf,bksize); /* Uncompress a block */
fwrite(obuf,1,bksize,stdout); /* Output the block */
if(!ilen) break;
}
}
else
{ /* Compress */
putchar(bksize); /* Output block size */
putchar(bksize>>8);
do
{
ilen=fread(ibuf,1,bksize,stdin); /* Input a block */
if(ilen>0)
{
olen=comp(obuf,ibuf,ilen); /* Compress a block */
if(ilen!=bksize)
putchar(0), /* Indicate end of file */
putchar(0),
putchar(ilen&255), /* Output final block size */
putchar(ilen>>8);
else
putchar(olen&255), /* Output compressed block size */
putchar(olen/256);
fwrite(obuf,1,olen,stdout);
}
} while(ilen==bksize);
}
}
--
/* jha...@world.std.com (192.74.137.5) */ /* Joseph H. Allen */
int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0)
+r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79:0:p>158?-79:0,q?!a[p+q*2
]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}

Tom Lane

unread,

Mar 19, 1996, 3:00:00 AM3/19/96

jha...@world.std.com (Joseph H Allen) writes:
> You could improve compression ratios (but not speed) by using arithmetic
> coding instead, but there would then be a patent problem.

I believe that not *all* forms of arithmetic coding are patented.
If you just implement the basic arith coder discussed by, eg, Witten,
Neal, and Cleary ("Arithmetic Coding for Data Compression", Comm. ACM
June 1987), you should be pretty safe. Slow, but safe. The patented
variants are things like IBM's "Q-coder", whose main claim to fame is
avoiding any multiplications or divisions.

The comp.compression FAQ lists several free implementations of
various flavors of arithmetic coding in its question 13.

regards, tom lane
organizer, Independent JPEG Group

Mark Nelson

unread,

Mar 19, 1996, 3:00:00 AM3/19/96

Joseph H Allen wrote:
> On the other hand it doesn't violate any patents, it's in the
> public domain and the code's tiny enough (about 10K) so that I can
> just post it here. If you have any ideas for improving the speed,
> let me know :-)
>
> You could improve compression ratios (but not speed) by using
> arithmetic coding instead, but there would then be a patent problem.
>

I noticed that DCC 96 will be having a presentation on data compression
patents. It would be really great if an attendee would volunteer to
summarize the material and post it to this newsgroup. That way those
of us not lucky enough to can get the latest info on this all-important
topic. Any volunteers?

Mark Nelson
http://web2.airmail.net/markn

Joseph H Allen

unread,

Mar 19, 1996, 3:00:00 AM3/19/96

In article <TGL.96Ma...@netcom16.netcom.com>,

Tom Lane <t...@netcom.com> wrote:
>jha...@world.std.com (Joseph H Allen) writes:

>> You could improve compression ratios (but not speed) by using arithmetic
>> coding instead, but there would then be a patent problem.

>I believe that not *all* forms of arithmetic coding are patented.

>If you just implement the basic arith coder discussed by, eg, Witten,
>Neal, and Cleary ("Arithmetic Coding for Data Compression", Comm. ACM
>June 1987), you should be pretty safe. Slow, but safe. The patented
>variants are things like IBM's "Q-coder", whose main claim to fame is
>avoiding any multiplications or divisions.

>The comp.compression FAQ lists several free implementations of
>various flavors of arithmetic coding in its question 13.

Does 'ha' use the Q-coder? 'ha' shows how impressive arithmetic coding with
higher order statistical modelling actually is.

Does LZ77 with a brute force string search violate any patents? If not,
then another possible free algorithm would be LZ77 using a skip-list
database for the dictionary.

Skip-lists are fast ordered databases which were only recently invented (I
don't have a paper reference handy- see the comp.programming faq I think).
They use a random number generator to choose the size of each new node which
is inserted into the list. Larger nodes have more forward 'skip' pointers
and hence have more reach. Larger nodes occur less frequently than smaller
ones due to a weighting on the random number genertor. The list is always
nearly balanced because random numbers are used when inserting.

Anyway, they are not trees so they may not be covered under any of the LZ77
patents. You could use them to store all three or four character strings
along with pointers to where the string originated in the input. I.E.,
use them to replace the hash table as used in zip.

0 new messages