
Compression in Perl


Igor Vulfson

Jul 1, 1997

Does anyone have a code that would compress a string of upper case
characters into a shorter string? - will need to decompress later!

Thanks,
iv
--
Igor Vulfson | Email: mailto:ivul...@fedex.com
Senior Scientific Programmer | URL: http://www.magibox.net/~unabest/
Operations Research, FedEx | Work: (901)395-7358 Home: (901)624-0776

Quentin Fennessy

Jul 2, 1997

In article <33B990...@styx.or.fedex.com>,
Igor Vulfson <ig...@styx.or.fedex.com> wrote:
>Does anyone have a code that would compress a string of upper case
>characters into a shorter string? - will need to decompress later!

Check out CPAN - there is a Compress::Zlib module that interfaces
with the zlib compression library.
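
For readers who can install the module, a minimal sketch of an in-memory
round trip follows. It assumes Compress::Zlib is installed and uses its
exported compress() and uncompress() functions; the sample string is made
up for illustration. Very short strings may not shrink at all, because the
zlib format adds a few bytes of header and checksum.

```perl
#!/usr/bin/perl -w
use strict;
use Compress::Zlib;    # exports compress() and uncompress() by default

# Made-up, repetitive sample data (an assumption, not from the thread).
my $string = 'AAAAABBBBBCCCCCAAAAABBBBBCCCCC' x 4;

my $packed   = compress($string);      # zlib-format data, built in memory
my $unpacked = uncompress($packed);    # back to the original string

printf "%d bytes -> %d bytes\n", length($string), length($packed);
print "round trip ok\n" if $unpacked eq $string;
```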


--
Quentin Fennessy AMD, Austin Texas

Tom Phoenix

Jul 2, 1997, to Igor Vulfson

On Tue, 1 Jul 1997, Igor Vulfson wrote:

> Does anyone have a code that would compress a string of upper case
> characters into a shorter string? - will need to decompress later!

There are many compression algorithms, depending upon your needs. But any
publicly-available modules should be on CPAN. Tell us more if you need
something more. Hope this helps!

--
Tom Phoenix http://www.teleport.com/~rootbeer/
root...@teleport.com PGP Skribu al mi per Esperanto!
Randal Schwartz Case: http://www.rahul.net/jeffrey/ovs/


Igor Vulfson

Jul 2, 1997, to Tom Phoenix

Tom Phoenix wrote:
> There are many compression algorithms, depending upon your needs. But any
> publicly-available modules should be on CPAN. Tell us more if you need
> something more. Hope this helps!

I can't use Compress::Zlib because it requires zlib, which in turn
needs to be compiled on the machine. That is out of the question,
because I don't have a C compiler on Irix 5.3 (our web server),
which sucks.

What I need is a quick and dirty way to shrink a string in size,
either using pack/unpack or some other means that I am not aware of.

iv ;)

Lloyd Zusman

Jul 3, 1997

On Wed, 2 Jul 1997 07:53:26 -0700, Tom Phoenix <root...@teleport.com> wrote:
> On Tue, 1 Jul 1997, Igor Vulfson wrote:
>
> > Does anyone have a code that would compress a string of upper case
> > characters into a shorter string? - will need to decompress later!
>
> There are many compression algorithms, depending upon your needs. But any
> publicly-available modules should be on CPAN. Tell us more if you need
> something more. Hope this helps!

I have invented a compression algorithm that will compress any data
down to one bit.

I'm still working on the decompression algorithm, however.


--
Lloyd Zusman
l...@asfast.com

Randy J. Ray

Jul 3, 1997

Igor Vulfson <ig...@styx.or.fedex.com> writes:
> Does anyone have a code that would compress a string of upper case
> characters into a shorter string? - will need to decompress later!

There is a module to interface to the libz compression code, called
Compress::Zlib, and available at your nearest CPAN site.

If you are compressing smaller strings, with a more tightly-defined alphabet
(such as only A-Z), you can devise a much more efficient scheme using Huffman
codes or something similar.
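
As a rough illustration of the Huffman idea, here is a pure-Perl sketch
that needs no compiled modules. It is only a sketch under assumptions: the
sub name huffman_codes and the sample string are made up, and the code table
is built from the very string being compressed. For real use the table would
have to be fixed in advance (say, from typical letter frequencies) or stored
alongside the data, together with the bit count.

```perl
#!/usr/bin/perl -w
use strict;

# Build a Huffman code table (letter => bit string) from a sample string.
# huffman_codes is a hypothetical helper name, not part of any module.
sub huffman_codes {
    my ($sample) = @_;
    my (%freq, %code);
    $freq{$_}++ for split //, $sample;

    # Each node is [weight, letter-or-undef, left child, right child].
    my @nodes = map { [ $freq{$_}, $_ ] } sort keys %freq;
    return \%code unless @nodes;             # empty input, empty table
    if (@nodes == 1) {                       # one distinct letter still needs a bit
        $code{ $nodes[0][1] } = '0';
        return \%code;
    }

    # Repeatedly merge the two lightest nodes until one tree remains.
    while (@nodes > 1) {
        @nodes = sort { $a->[0] <=> $b->[0] } @nodes;
        my ($left, $right) = splice @nodes, 0, 2;
        push @nodes, [ $left->[0] + $right->[0], undef, $left, $right ];
    }

    # Walk the tree: a left branch appends 0, a right branch appends 1.
    my @stack = ([ $nodes[0], '' ]);
    while (my $item = pop @stack) {
        my ($node, $prefix) = @$item;
        if (defined $node->[1]) {
            $code{ $node->[1] } = $prefix;
        } else {
            push @stack, [ $node->[2], $prefix . '0' ],
                         [ $node->[3], $prefix . '1' ];
        }
    }
    return \%code;
}

my $string = 'MISSISSIPPIRIVER';             # made-up sample
my $code   = huffman_codes($string);
my %decode = reverse %$code;                 # bit string => letter

# Encode. Keep the bit count, since the zero padding added by pack
# could otherwise be mistaken for real code bits.
my $bits   = join '', map { $code->{$_} } split //, $string;
my $nbits  = length $bits;
my $packed = pack 'B*', $bits;

# Decode: collect bits until they match a complete code, then emit its letter.
my ($out, $buf) = ('', '');
for my $bit (split //, substr(unpack('B*', $packed), 0, $nbits)) {
    $buf .= $bit;
    if (exists $decode{$buf}) {
        $out .= $decode{$buf};
        $buf = '';
    }
}

printf "%d letters -> %d bytes\n", length($string), length($packed);
print "$out\n";
```

The greedy matching in the decoder works because no Huffman code is a
prefix of another one.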

Randy
--
===============================================================================
Randy J. Ray -- U S WEST Technologies IAD/CSS/DPDS Phone: (303)595-2869
Denver, CO rj...@uswest.com
"It's not denial. I'm just very selective about the reality I accept." --Calvin

Andrew M. Langmead

Jul 3, 1997

Igor Vulfson <ig...@styx.or.fedex.com> writes:

>Does anyone have a code that would compress a string of upper case
>characters into a shorter string? - will need to decompress later!

Since in another post you mentioned that compilers are unobtainable,
here are some other ideas.

1. Since there are only uppercase letters, you only have 26
combinations, so the entire set fits into 6 bits (5 would do, since
2**5 = 32). At 6 bits per letter this saves about 25%.

#!/usr/bin/perl -w

use strict;

my $string = 'FATSTRING';
my (%compress, %uncompress);
my ($compressed_data, $compressed_chunk, $uncompressed_data);

# set up the mapping between characters and bit patterns.
# start at 1: 000000 is reserved for padding, so it never decodes to a letter.
my $val = 1;
my $letter;
for $letter ('A' .. 'Z') {
    $compress{$letter} = substr(unpack("B*", pack "C", $val++), 2); # each letter encoded in 6 bits
}

%uncompress = reverse %compress; # each 6 bit pattern corresponds to a letter.

# compressing
my $bitstring = '';
for $letter (split //, $string) {      # for each character in the string.
    $bitstring .= $compress{$letter};  # find the corresponding bit pattern.
}
$bitstring .= "0" x ((8 - length($bitstring) % 8) % 8); # pad to byte boundary (0 to 7 bits)
$compressed_data = pack 'B*', $bitstring;               # pack into bytes.

# uncompressing
$uncompressed_data = '';
for $compressed_chunk (unpack          # for each six bit chunk.
        'A6' x int(length($compressed_data) * 8 / 6),
        unpack 'B*', $compressed_data) {
    last if $compressed_chunk eq '000000';                # trailing padding, not a letter.
    $uncompressed_data .= $uncompress{$compressed_chunk}; # find its character.
}

print "$uncompressed_data\n";

2. You might be able to compress further by making the bitpatterns
variable length. More frequent characters get a smaller bitpattern.

3. For anything smaller, check with the people hanging around
comp.compression.
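
Coming back to idea 1: at 5 bits per letter the saving rises to about
37.5%. A minimal sketch follows, assuming the input holds only A-Z and
a Perl new enough to have sprintf '%b' and oct "0b..." (5.6 or later).
The sub names pack5 and unpack5 are made up for illustration. Letters map
to 1..26, and 0 is kept free so that the zero padding in the last byte
can never be read back as a letter.

```perl
#!/usr/bin/perl -w
use strict;

# A => 1 ... Z => 26, five bits each; 0 is reserved for padding.
# pack5 and unpack5 are hypothetical helper names.
sub pack5 {
    my ($text) = @_;
    my $bits = join '',
        map { sprintf '%05b', ord($_) - ord('A') + 1 } split //, $text;
    return pack 'B*', $bits;    # pack fills the last byte with zero bits
}

sub unpack5 {
    my ($data) = @_;
    my $text = '';
    # Take the bit string five bits at a time; leftover bits are ignored.
    for my $chunk (unpack('B*', $data) =~ /([01]{5})/g) {
        my $val = oct "0b$chunk";
        last if $val == 0;      # reached the padding
        $text .= chr($val + ord('A') - 1);
    }
    return $text;
}

my $string = 'FATSTRING';
my $packed = pack5($string);
printf "%d bytes -> %d bytes\n", length($string), length($packed);
print unpack5($packed), "\n";
```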

--
Andrew Langmead
