ZLIB, GZIP, COMPRESS

Stephen Howe

Oct 27, 2009, 6:33:38 PM
I have a task where I need to write in Win32 some code that can uncompress GZIP files.
Ideally it leaves the original compressed file alone and writes the uncompressed file(s) to a specified directory.

1) But I am unsure if the files have been created by
UNIX compress
UNIX pack
UNIX GZIP
ZLIB?
Are there any utilities that can display what the compressed file type is?
If I use a Win32 version of GZIP, it reveals that the file is "deflated" but not the compressed file type.
GZIP is unfriendly in that the original compressed file is deleted on decompression (which it succeeds at).

2) I have tried using ZLIB and got nowhere.
It may be that the file has been created by COMPRESS (I don't know), and the FAQ indicates that this is not supported.

Adapting inf(), regardless of whether I call inflateInit() or inflateInit2() (and I have no idea what windowBits should be
set to), it dies in inflate() with Z_DATA_ERROR returned.

3) If I try running minigzip_d.exe, which is distributed with ZLIB1.DLL, it fails to uncompress what GZIP manages.
That might be because UNIX compress has been used.

4) WINZIP succeeds in uncompressing the same file.

So right now I am not sure where to go.
I have no way of diagnosing the ZIP file type.
And if UNIX compress has been used, I have not seen any equivalent libraries in the Win32 world that support uncompression.

Any advice or help welcome

Thanks

Stephen Howe

Stephen Howe

Oct 28, 2009, 7:21:49 AM
Hi

I don't really understand why ZLIB cannot uncompress files produced by COMPRESS.
Both use and understand the deflate method.
I am wondering whether, if I use inflateSync() to skip over COMPRESS's header, inflate() will then work.

Cheers

Stephen Howe

Mark Nelson

Oct 28, 2009, 8:18:46 AM

Stephen,

I'm confused about a couple of things.

First, when you posted this question to my blog, you reported that you
had a file that you could decompress with gzip, and yet minigzip would
not work.

But in the post to comp.compression you say that it decompresses
using WinZip.

I'm going to assume the answer is WinZip, which leads to the following
comments:

1) WinZip is a general purpose archiver that works with all sorts of
containers. Gzip is a special purpose container that only works with
just a couple of formats. Gzip is very limited in scope and will only
work with that specific (and fairly simple) file type.

2) Gunzip will decompress files created by compress, but minigzip
doesn't.

3) Files produced by compress are compressed using an implementation
of LZW. It has literally nothing in common with the deflate algorithm.

4) If you need to decompress files that are created by compress, pack,
and gzip (there is no such thing as a file created by zlib), your best
bet is simply to shell out to the appropriate decompressor once you
identify the file type. I believe that each of these file types can
be easily identified by a magic number.

5) Don't complain so much about what zlib doesn't do. It does what it
is supposed to do, and does it very well. However, it will not give
you a magic pony that flies you to the moon. To suppose that it should is
presumptuous.

- Mark

Stephen Howe

Oct 28, 2009, 11:34:05 AM
On Wed, 28 Oct 2009 05:18:46 -0700 (PDT), Mark Nelson <snork...@gmail.com> wrote:

>On Oct 28, 6:21 am, Stephen Howe <sjhoweATdialDOTpipexDOTcom> wrote:
>> Hi
>>
>> I don't really understand why ZLIB cannot uncompress files produced by COMPRESS.
>> Both use and understand the deflate method.
>> I am wondering if I use inflateSync() to skip over COMPRESS's header, whether inflate() will then work
>>
>> Cheers
>>
>> Stephen Howe
>
>Stephen,
>
>I'm confused about a couple of things.
>
>First, when you posted this question to my blog, you reported that you
>had a file that you could decompress with gzip, and yet minigzip would
>not work.

Yes that is correct.
I have to say I have been through a learning curve.
For a while I thought it was a file that had been generated by UNIX COMPRESS (which would make it Lempel-Ziv-Welch (LZW) under
the hood). But now I think differently. It really is a GZIP-generated file.
(The team that generates these files did some checks).

>But in the post to comp.compression you say that it decompresses
>using WinZip.

That's right. That is because WinZip will load foreign compressed file formats and unzip them.
WinZip will uncompress GZIP files. And it understands GZIP headers.
It will display the modified date and packed size but not the original size or compression ratio.

The slightly irritating thing is their command line tool WZZIP will not uncompress GZIP files.
So you have to use their interactive WINZIP to uncompress GZIP files.
But it definitely works.
However I am looking for a programmatic solution.

>I'm going to assume the answer is WinZip, which leads to the following
>comments:
>
>1) WinZip is a general purpose archiver that works with all sorts of
>containers. Gzip is a special purpose container that only works with
>just a couple of formats. Gzip is very limited in scope and will only
>work with that specific (and fairly simple) file type.
>
>2) Gunzip will decompress files created by compress, but minigzip
>doesn't.

That is interesting.
That means Gunzip not only understands Deflate (from PKZIP originally) but also LZW.

>3) Files produced by compress are compressed using an implementation
>of LZW. It has literally nothing in common with the deflate algorithm.

Yup.

>4) If you need to decompress files that are created by compress, pack,
>and gzip (there is no such thing as a file created by zlib)...

I thought there was.
The documentation in ZLIB mentions that it has its own lightweight header, even lighter than GZIP.
The basic difference between ZLIB files and GZIP files is just the header.
After that, it is the file with the deflate method.
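
The wrapper difference described here can be checked directly. The following is a minimal sketch, assuming Python's standard zlib module (chosen because its wbits argument mirrors the windowBits parameter of the C API); the helper name deflate() and the sample payload are made up for illustration. It compresses the same bytes three ways and shows that the deflate data in the middle is identical, while the header and the trailer differ (zlib ends with a 4-byte Adler-32, gzip with a CRC-32 plus the uncompressed length). It also shows that a zlib-only inflate rejects a gzip stream, which is consistent with the Z_DATA_ERROR reported earlier in the thread.

```python
import zlib

payload = b"the same payload, three wrappers " * 20

def deflate(data: bytes, wbits: int) -> bytes:
    """Compress data; the wbits value selects which wrapper is written."""
    c = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=wbits)
    return c.compress(data) + c.flush()

raw = deflate(payload, -15)      # raw deflate: no header, no trailer
zl = deflate(payload, 15)        # zlib wrapper: 2-byte header, 4-byte Adler-32
gz = deflate(payload, 16 + 15)   # gzip wrapper: 10-byte header, 8-byte trailer

# First two bytes of each wrapped stream.
print(zl[:2].hex(), gz[:2].hex())

# Strip header and trailer: what remains is the same raw deflate data.
print(zl[2:-4] == raw, gz[10:-8] == raw)

# The default (zlib-only) setting does not accept a gzip header.
try:
    zlib.decompress(gz)
except zlib.error as exc:
    print("zlib-only inflate failed on gzip data:", exc)

# Adding 16 selects gzip; adding 32 auto-detects zlib or gzip.
print(zlib.decompress(gz, 16 + 15) == payload)
print(zlib.decompress(zl, 32 + 15) == payload)
print(zlib.decompress(gz, 32 + 15) == payload)
```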

I have just discovered PACK (and COMPACT, and BZIP, BZIP2).
Unix seems to have more compress utilities than you can shake a stick at :-)

>, your best
>bet is simply to shell out to the appropriate decompressor once you
>identify the file type. I believe that each of these file types can
>be easily identified by a magic number.

Might be. ZLIB says it can uncompress GZIP files.
And I am now assured I have a GZIP file.
So all I have to do is persuade ZLIB to work.

>5) Don't complain so much about what zlib doesn't do.

I think it is fantastic at what it does do.
It is just that at the moment, I can't persuade it to work on the GZIP file that I have
(and am now 100% convinced it is a GZIP file).
But I am not high-and-dry.

I shall look at

(i) inflateSync() which skips headers and resyncs. inflate() might work after that.
(ii) all the g* functions which automate working on GZIP files. I will try these myself.

Thanks for the help Mark

Stephen Howe

Stephen Howe

Oct 28, 2009, 11:51:31 AM
On Wed, 28 Oct 2009 15:34:05 +0000, Stephen Howe <sjhoweATdialDOTpipexDOTcom> wrote:

>That's right. That is because WinZip will load foreign compressed file formats and unzip them.
>WinZip will uncompress GZIP files. And it understands GZIP headers.
>It will display the modified date and packed size but not the original size or compression ratio.

And WinZIP also displays the filename contained within.
All this is consistent with what the GZIP header provides.
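
For reference, the fields mentioned here can be read straight out of the file. This is a rough sketch in Python, assuming a single-member gzip file laid out as in RFC 1952; the function name read_gzip_info() and the returned dictionary keys are made up for illustration. Note that the 8-byte trailer also carries the uncompressed length modulo 2^32 (ISIZE), so the original size is recoverable even if a given tool does not display it.

```python
import struct

# Flag bits in the FLG byte of the gzip header (RFC 1952).
FHCRC, FEXTRA, FNAME, FCOMMENT = 0x02, 0x04, 0x08, 0x10

def read_gzip_info(blob: bytes) -> dict:
    """Return mtime, stored file name, CRC-32 and ISIZE of a gzip member."""
    # Fixed 10-byte header: magic, method, flags, mtime, extra flags, OS.
    magic, method, flags, mtime, _xfl, _os = struct.unpack("<2sBBIBB", blob[:10])
    if magic != b"\x1f\x8b" or method != 8:
        raise ValueError("not a gzip file using deflate")
    pos = 10
    if flags & FEXTRA:
        # Optional extra field: 2-byte little-endian length, then the data.
        (xlen,) = struct.unpack_from("<H", blob, pos)
        pos += 2 + xlen
    name = None
    if flags & FNAME:
        # Optional original file name: zero-terminated Latin-1 string.
        end = blob.index(b"\x00", pos)
        name = blob[pos:end].decode("latin-1")
    # FCOMMENT and FHCRC, if present, follow; they are not needed here.
    # Trailer: CRC-32 of the uncompressed data, then its length mod 2**32.
    crc32, isize = struct.unpack("<II", blob[-8:])
    return {"mtime": mtime, "name": name, "crc32": crc32, "isize": isize}
```

Usage would be along the lines of read_gzip_info(open(path, "rb").read()), where path is whatever file is being inspected.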

Cheers

Stephen Howe

Noob

Oct 28, 2009, 12:23:55 PM
Stephen Howe wrote:

> I have just discovered PACK (and COMPACT, and BZIP, BZIP2).
> Unix seems to have more compress utilities than you can shake a stick at :-)

And then some.

http://www.maximumcompression.com/index.html
http://www.maximumcompression.com/data/summary_mf2.php

Mark Adler

Oct 31, 2009, 12:12:30 PM
On 2009-10-27 15:33:38 -0700, Stephen Howe <sjhoweATdialDOTpipexDOTcom> said:
> 1) But I am unsure if the files have been created by
> UNIX compress
> UNIX pack
> UNIX GZIP
> ZLIB?
> Are there any utilities that can display what the compressed file type is?

Unix has a utility called "file" that looks at the first few bytes to
tell what type it is:

% file test.tar.Z test.tar.gz test.tar.zip test.tar.bz2 test.tar.zz
test.tar.Z: compress'd data 16 bits
test.tar.gz: gzip compressed data, from Unix
test.tar.zip: Zip archive data, at least v2.0 to extract
test.tar.bz2: bzip2 compressed data, block size = 900k
test.tar.zz: data

That last one is zlib compressed data, which the file command doesn't
recognize.

You can also look at the first two bytes yourself:

1f 9d: compress
1f 8b: gzip
50 4b ("PK"): zip
42 5a ("BZ"): bzip2

The probability of seeing these very old compressed formats is
vanishingly small:

1f 1e: pack
1f a0: LZH
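
The two-byte check is easy to automate. Below is a small sketch in Python, assuming only the compress, gzip, zip, bzip2 and pack signatures listed above; the names MAGIC, sniff() and sniff_file() are invented for this example. Because a zlib stream has no fixed signature, it is only reported as a candidate, using the rule that the low four bits of the first byte are 8 (deflate) and the two bytes taken as a big-endian number are a multiple of 31.

```python
# Two-byte signatures taken from the list above.
MAGIC = {
    b"\x1f\x9d": "compress",
    b"\x1f\x8b": "gzip",
    b"PK": "zip",
    b"BZ": "bzip2",
    b"\x1f\x1e": "pack",
}

def sniff(head: bytes) -> str:
    """Guess a compressed format from the first two bytes of a file."""
    kind = MAGIC.get(head[:2])
    if kind:
        return kind
    # zlib: compression method 8, window code 0..7, header divisible by 31.
    if (len(head) >= 2
            and head[0] & 0x0F == 8
            and head[0] >> 4 <= 7
            and ((head[0] << 8) | head[1]) % 31 == 0):
        return "zlib (candidate)"
    return "unknown"

def sniff_file(path: str) -> str:
    """Read the first two bytes of the file at path and classify them."""
    with open(path, "rb") as f:
        return sniff(f.read(2))
```

A "zlib (candidate)" result still needs an attempted decompression to confirm it, since arbitrary data passes this test by chance about once in a few hundred tries.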

You will not normally see a zlib stream sitting all by itself in a
file, but you can use the first two bytes to search for candidates in a
file with embedded zlib streams. (You would follow this with an
attempted decompression to see if it really is a zlib stream.) Most
commonly, a zlib stream will start with one of these pairs:

78 01
78 5e
78 9c
78 da

Much less commonly, a zlib stream can start with one of these pairs
(with the above, this is the exhaustive list of all 32 current
possibilities):

08 3c, 08 7a, 08 b8, 08 f6, 18 38, 18 76, 18 b4,
18 f2, 28 34, 28 72, 28 b0, 28 ee, 38 30, 38 6e,
38 ac, 38 ea, 48 2c, 48 6a, 48 a8, 48 e6, 58 28,
58 66, 58 a4, 58 e2, 68 24, 68 62, 68 bf, 68 fd

> I don't really understand why ZLIB cannot uncompress files produced by COMPRESS.
> Both use and understand the deflate method.

No, compress does not use deflate, it uses an entirely different LZW
method. gzip and pigz have separate code to decompress legacy compress
files. zlib only provides code to deal with the deflate format.

> And I am now assured I have a GZIP file.
> So all I have to do is persuade ZLIB to work.

Read the documentation in zlib.h, especially about the options for the
inflateInit2() function.
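
The relevant part of that documentation: a windowBits value of 8..15 selects the zlib wrapper, adding 16 selects gzip decoding, adding 32 enables automatic zlib/gzip detection, and a negative value means raw deflate. In C that would be a call such as inflateInit2(&strm, 16 + MAX_WBITS). The sketch below shows the same idea with Python's standard zlib module, whose wbits argument takes the same values; the function name gunzip_to_dir() and the output naming rule are assumptions made for the example. It reads the source in chunks, writes the result into a chosen directory, and never modifies the original file. It handles a single gzip member only.

```python
import os
import zlib

def gunzip_to_dir(src_path: str, out_dir: str, chunk_size: int = 64 * 1024) -> str:
    """Decompress a gzip file into out_dir, leaving src_path untouched."""
    # Hypothetical naming rule: drop a trailing ".gz", else append ".out".
    name = os.path.basename(src_path)
    name = name[:-3] if name.endswith(".gz") else name + ".out"
    dst_path = os.path.join(out_dir, name)

    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(d.decompress(chunk))
        dst.write(d.flush())

    # eof is True only if the gzip trailer was reached and verified.
    if not d.eof:
        raise ValueError("gzip stream ended prematurely")
    return dst_path
```

Using 32 + zlib.MAX_WBITS instead would accept either a zlib or a gzip stream, which is convenient when the wrapper is not known in advance.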

On 2009-10-28 05:18:46 -0700, Mark Nelson <snork...@gmail.com> said:
> However, it will not give you a magic pony that flies you to the moon.

I am working on the magic-pony-to-the-moon thing for the next release.

Mark

Mark Adler

Oct 31, 2009, 12:47:46 PM
On 2009-10-31 09:12:30 -0700, Mark Adler <mad...@alumni.caltech.edu> said:
> Much less commonly, a zlib stream can start with one of these pairs
> (with the above, this is the exhaustive list of all 32 current
> possibilities):

Oops, I forgot another option for zlib headers. Here is the full set of 64:

Common:

78 01, 78 5e, 78 9c, 78 da

Rare:

08 1d, 08 5b, 08 99, 08 d7, 18 19, 18 57, 18 95, 18 d3,
28 15, 28 53, 28 91, 28 cf, 38 11, 38 4f, 38 8d, 38 cb,
48 0d, 48 4b, 48 89, 48 c7, 58 09, 58 47, 58 85, 58 c3,
68 05, 68 43, 68 81, 68 de

Very rare:

08 3c, 08 7a, 08 b8, 08 f6, 18 38, 18 76, 18 b4, 18 f2,
28 34, 28 72, 28 b0, 28 ee, 38 30, 38 6e, 38 ac, 38 ea,
48 2c, 48 6a, 48 a8, 48 e6, 58 28, 58 66, 58 a4, 58 e2,
68 24, 68 62, 68 bf, 68 fd, 78 3f, 78 7d, 78 bb, 78 f9
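
The 64 pairs follow mechanically from the header layout in RFC 1950, so they can be regenerated rather than memorized. Here is a short sketch in Python (the function name zlib_headers() is made up). The first byte holds the window-size code (0..7) in the high four bits and the method (8, deflate) in the low four; the second byte holds a two-bit compression level hint, a preset-dictionary bit, and five check bits chosen so that the two bytes, read as a big-endian number, are divisible by 31. The "very rare" group is the one with the dictionary bit set.

```python
def zlib_headers():
    """Enumerate the two-byte zlib headers as (CMF, FLG, has_dict) tuples."""
    out = []
    for cinfo in range(8):                  # window size is 2**(cinfo + 8)
        cmf = (cinfo << 4) | 8              # low nibble 8 = deflate
        for fdict in (0, 1):                # preset dictionary flag
            for flevel in range(4):         # compression level hint
                flg = (flevel << 6) | (fdict << 5)
                # Fill in the check bits so that CMF*256 + FLG is a
                # multiple of 31; when the remainder is already zero this
                # adds 31 rather than 0, matching the 68 bf entry above.
                flg += 31 - ((cmf << 8) | flg) % 31
                out.append((cmf, flg, bool(fdict)))
    return out

for cmf, flg, has_dict in zlib_headers():
    print("%02x %02x%s" % (cmf, flg, "  (preset dictionary)" if has_dict else ""))
```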

Mark
