I need to compress an Oracle database as it is being backed up to tape.
gzip gives me 10x ~ 12x compression (80 GB to ~7 GB) but takes 2h30 to
do so.
I could live with a weaker compressor, as long as it still does better
than 4x and runs faster than gzip (1h?).
Unfortunately, using the hardware compressor in the tape device is not
an option.
Thanks for comments and ideas,
Toni
Try gzip -1.
mark
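For anyone scripting the backup rather than piping it through the
command-line tool, here is a minimal sketch of the same idea (gzip at
level 1, applied to a stream) using Python's standard gzip module. The
function name gzip_stream and the 1 MiB chunk size are illustrative
assumptions, not anything from the original setup.

```python
import gzip
import io
import shutil

def gzip_stream(src, dst, level=1, chunk_size=1 << 20):
    """Copy file-like src to file-like dst through gzip.

    level=1 corresponds to `gzip -1` (fastest, weakest compression).
    """
    with gzip.GzipFile(fileobj=dst, mode="wb", compresslevel=level) as gz:
        # Stream in chunks so the whole dump never has to fit in memory.
        shutil.copyfileobj(src, gz, chunk_size)

if __name__ == "__main__":
    raw = b"some highly repetitive database page\n" * 10000
    out = io.BytesIO()
    gzip_stream(io.BytesIO(raw), out)
    print(len(raw), "->", len(out.getvalue()), "bytes")
```

In a real backup, src would be the dump stream and dst the tape device
or a pipe feeding it.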
Thanks Mark,
I have tried. This gives me a 1.5 ~ 1.8x increase in speed, but I'd need
some more.
Right now I'm also experimenting with compress and bzip2. Other ideas I
want to try are zip and rar (if rar exists for unix).
Toni
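To put numbers on "some more": the figures quoted so far (2h30 with
default gzip, a 1h target, a 1.5 ~ 1.8x gain from gzip -1) work out as
in the short calculation below. It only restates the arithmetic; the
variable names are made up for the sketch.

```python
BASELINE_MIN = 150   # default gzip: 2h30 for the 80 GB backup
TARGET_MIN = 60      # the hoped-for "1h?"

def minutes_at(speedup):
    """Backup time in minutes if compression runs `speedup` times faster."""
    return BASELINE_MIN / speedup

# Speed-up over default gzip needed to hit the one-hour target.
required_speedup = BASELINE_MIN / TARGET_MIN

if __name__ == "__main__":
    for s in (1.5, 1.8):
        print(f"gzip -1 at {s}x: about {minutes_at(s):.0f} min")
    print(f"speed-up needed for 1h: {required_speedup}x")
```

So gzip -1 lands somewhere between roughly 83 and 100 minutes, and the
one-hour target needs about 2.5x.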
Maybe this could help you :
http://www.maximumcompression.com/data/summary_mf3.php#data
If gzip -1 doesn't do it for you, then try lzop.
mark
Yep, LZOP is the answer here!
Toni
I went through your dilemma last year and came up with three options:
LZRW1+RLE (which you've already worked with)
LZO
LZP (Charles Bloom)
Each of them has advantages and disadvantages, so it is best to
implement all three and see which one works best for your environment.
It just dawned on me that you might not be aware of the
advantages/disadvantages. Here are my opinions, based on some informal
research last year (my focus was primarily DEcompression speed on
embedded 8088 platforms with low speed/low memory):
LZRW1+RLE
Pros: Decompression of runs is the fastest of all three if your CPU has
a repeating store opcode like x86's REP STOS
Cons: Compression is worst of all three
LZO
Pros: Pre-written for you as the LZO C library; decade-old library is
thoroughly tested and optimized; speed is excellent for a "generic"
multi-platform library
Cons: No official documentation of exactly *what* the LZO algorithms do,
so if you don't use the C library you have a lot of porting work ahead
of you (however, a simple, easy-to-understand Java port exists if you
understand Java)
LZP
Pros: Simple, clever concept; easy to implement; compression speed
excellent.
Cons: Requires same amount of memory for decompression as compression
(due to offsets being stored in the hash table)
BTW, all of the above assumes you are using byte-aligned or
nybble-aligned codes (i.e. by "LZP" I am *not* talking about the
PPM-like order-2 variable-bit code stuff). Also, if you plan to
re-implement any of the above for a specific language or platform, you
have to try to understand how they work so you can pick the best one.
In my case, my 8088 environment only has three memory pointers
available, 64KB total, slow memory speed, and a 4-byte prefetch queue
so I had to take these factors into account when choosing what I was
going to use.
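To make the LZP memory point concrete, here is a toy byte-aligned
LZP-style coder in Python. It is only a sketch under loud assumptions:
it is not Charles Bloom's code or format, the token layout (a length
byte followed by one literal) is invented for illustration, and a
Python dict keyed on the previous two bytes stands in for a real hash
table. What it does show is that the decompressor has to rebuild
exactly the same table as the compressor, because the stream contains
no offsets.

```python
ORDER = 2        # bytes of context used to predict the next match
MAX_LEN = 255    # longest match that fits in one length byte

def lzp_compress(data):
    out = bytearray(data[:ORDER])        # first bytes have no context
    table = {}                           # context -> last position seen
    i, n = len(out), len(data)
    while i < n:
        ctx = bytes(data[i - ORDER:i])
        p = table.get(ctx)               # predicted match position
        table[ctx] = i
        if p is not None:
            length = 0
            while (length < MAX_LEN and i + length < n
                   and data[p + length] == data[i + length]):
                length += 1
            out.append(length)           # may be 0: prediction failed
            i += length
            if i >= n:
                break
        out.append(data[i])              # literal after every token
        i += 1
    return bytes(out)

def lzp_decompress(comp):
    out = bytearray(comp[:ORDER])
    table = {}                           # same table as the compressor
    pos = len(out)
    while pos < len(comp):
        ctx = bytes(out[-ORDER:])
        p = table.get(ctx)
        table[ctx] = len(out)
        if p is not None:
            length = comp[pos]
            pos += 1
            for k in range(length):      # byte-wise: copies may overlap
                out.append(out[p + k])
            if pos >= len(comp):
                break
        out.append(comp[pos])
        pos += 1
    return bytes(out)

if __name__ == "__main__":
    sample = b"abc" * 400
    packed = lzp_compress(sample)
    print(len(sample), "->", len(packed), "bytes")
```

Note that a length byte is emitted only when the table already holds an
entry for the current context; the decoder can tell, because its table
is in the same state at every step.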
Starting with a file of 248,046,595 bytes:

command    compressed bytes   (ratio)     time
lzop -1    87790913           (0.354)      4.1
lzop -2    87565567           (0.353)      3.9
lzop -7    68455488           (0.276)     56.2
lzop -9    67886068           (0.274)    160.7
gzip -1    70038655           (0.282)     16.8
gzip -6    60523978           (0.244)     32.0
gzip -9    60025123           (0.242)    112.6
bzip2      48528470           (0.196)    214.8

Decompression times in seconds, on the default lzop -2, gzip -6, and
bzip2 output:
lzop 1.7
gun 2.2
gzip 3.6
bzip2 37.7
"gun" is a gzip decompressor that uses the latest version of zlib and
as a result is faster than gzip. One of these days gzip will be
updated accordingly.
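As an aside on reading gzip files through zlib itself: the sketch below
is not gun.c, just a Python analogue using the standard zlib binding.
Passing 16 + MAX_WBITS as the window-bits argument asks zlib to expect
a gzip header and trailer instead of its own zlib wrapper. The function
name is an assumption made up for the example.

```python
import gzip
import zlib

def gunzip_with_zlib(blob):
    """Decompress a gzip-format byte string using zlib directly."""
    # 16 + MAX_WBITS selects gzip framing rather than the zlib wrapper.
    return zlib.decompress(blob, 16 + zlib.MAX_WBITS)

if __name__ == "__main__":
    original = b"tape backup block\n" * 1000
    packed = gzip.compress(original)
    print(gunzip_with_zlib(packed) == original)
```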
To first order, lzop got 3:1, gzip 4:1, and bzip2 5:1 compression.
gzip took eight times as long as lzop, and bzip2 took seven times as
long as gzip.
By the way, lzop -3 through lzop -6 produced the same compressed size
as lzop -2. I don't know why lzop -2 got better compression in less
time than lzop -1, but perhaps that's why lzop -2 is the default.
What's clear is that you should just stick with the default for lzop,
which is lzop -2. For higher compression levels, you should use gzip
instead. Similarly, you needn't bother with gzip -9. For higher
compression than the default gzip -6, you should probably use bzip2.
mark
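A comparison along the same lines can be run on your own data with
nothing but the Python standard library, as in the rough sketch below.
Some caveats: lzop is left out because the standard library has no
binding for it, zlib.compress writes a zlib wrapper rather than a gzip
file (so sizes differ from gzip by a few header bytes), and timings
will not match the figures above. The bench function and the sample
data are assumptions made up for the example.

```python
import bz2
import time
import zlib

def bench(data):
    """Return (name, compressed size, size ratio, seconds) per compressor."""
    candidates = [
        ("zlib -1", lambda d: zlib.compress(d, 1)),
        ("zlib -6", lambda d: zlib.compress(d, 6)),
        ("zlib -9", lambda d: zlib.compress(d, 9)),
        ("bz2 -9", lambda d: bz2.compress(d, 9)),
    ]
    rows = []
    for name, fn in candidates:
        start = time.perf_counter()
        packed = fn(data)
        elapsed = time.perf_counter() - start
        rows.append((name, len(packed), len(packed) / len(data), elapsed))
    return rows

if __name__ == "__main__":
    # Hypothetical stand-in for a database dump.
    sample = b"".join(
        b"INSERT INTO orders VALUES (%d, 'customer', 19.99);\n" % i
        for i in range(20000)
    )
    for name, size, ratio, secs in bench(sample):
        print(f"{name:8} {size:9d} ({ratio:.3f}) {secs:7.3f}")
```

Run it on a representative slice of the real dump; synthetic text like
the sample here compresses far better than most production data.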
In the end I've stayed with gzip -1. The client did not want to hear
about any compression tool which was not universally known (...by them).
I like compression algorithms and have experimented a bit with them,
mostly of the RLE kind when I worked on real-time embedded systems in a
past life (collecting servo data and similar). Now, with clients who
only use computers as a resource and don't know anything about them, it
is much more difficult to do anything at all.
Thanks again,
Toni
Please give the absolute version number of zlib that was tested.
"The latest version of zlib" is a relative designation whose meaning
may vary, especially from the point of view of the developer of zlib
decompression [Mark Adler] in contrast to an ordinary user.
According to http://www.zlib.net , as of August 7th, 2005 the latest
version of zlib is 1.2.3, dated July 18, 2005. Is that the one?
> One of these days gzip will be updated accordingly.
A similar proposal for re-implementing gzip using zlib has been
around for several years, but I can find little evidence of action.
--
Yes, per the web site that is the latest version of zlib. gun.c is in
the examples directory.
> A similar proposal for re-implementing gzip using zlib has been
> around for several years, but I can find little evidence of action.
That would be because there has been no action.
mark
This however runs both ways.
If the programmer gets too fancy with his code he could well end up
developing a system that is hard/expensive to maintain if the original
programmer is no longer available. Simpler systems are better in that sense.
However, the client is not the programmer or the available market of
programmers out there. What would you say if the client insists on using a
Bubble Sort because that is what they understand? And no matter how much you
warn them, they insist on that code being used?
I had the same happen to me when I had written an inventory program that used
fixed indexes for speed. The customer insisted on making the indexes as
small as possible to get max. speed (yes, this was a few years ago), while I
kept asking to double the size of the indexes to be safe for future expansion.
Guess which company does not exist today? No, it was not because of the
program, but it really was because they did not plan for radical changes in
the boat building industry, they got left in the dust in the last industry
dip - all the other companies around them still exist today.
Earl Colby Pottinger
--
I make public email sent to me! Hydrogen Peroxide Rockets, OpenBeos,
SerialTransfer 3.0, RAMDISK, BoatBuilding, DIY TabletPC. What happened to
the time? http://webhome.idirect.com/~earlcp