inflate returns Z_STREAM_END before end of file

GH

Mar 7, 2012, 10:04:10 AM

I am using inflate() from zlib to uncompress a file that was compressed
by a third party with some unknown program, very likely gzip (inflate
partially succeeds when I call inflateInit2() with the 0x10 bit of the
second argument set). I am now puzzled by the following:
1. gzip -d uncompresses the file successfully.
2. gzip -d followed by gzip -1, gzip -9, or plain gzip produces a file
different from the original compressed file.
3. My code using inflate uncompresses the files produced in step 2
without a problem.
4. The same code uncompresses only the beginning of the original file,
with inflate() returning Z_STREAM_END while ds.avail_in and ds.avail_out
are both still positive. The successfully uncompressed content is about
half a GB.
5. I have a number of other files from the same third party for which
I observed 1 through 4.
Now I have run out of ideas about what is going on and how to modify my
code to handle the original compressed files. Can someone shed some
light?

GH

Mar 7, 2012, 10:41:56 AM

I searched the group, and an old post mentioned "reset ZLib", which I
think may help in my case. Can someone elaborate on what "reset ZLib"
means and how to do that?

John Reiser

Mar 7, 2012, 4:44:14 PM

On 03/07/2012 07:04 AM, GH wrote:
> I am using inflate() from zlib to uncompress a file that was compressed
> by a third party with some unknown program, very likely gzip (inflate
> partially succeeds when I call inflateInit2() with the 0x10 bit of the
> second argument set). I am now puzzled by the following:
> 1. gzip -d uncompresses the file successfully.

Good.

> 2. gzip -d followed by gzip -1, gzip -9, or plain gzip produces a file
> different from the original compressed file.

Usually there are zillions of possible encodings, and many possible
resulting lengths. Choosing which matches and which encodings to use is
an art, often driven by how much time the compressor is willing to spend
making those choices.
The distance backwards may be limited, the length of a match may be limited,
the amount of searching at any position may be limited, some matches may be
easier than others to find or remember, the Huffman encoding need not be
canonical [canonical is *not* always best!], etc.

> 3. My code using inflate uncompresses the files produced in step 2
> without a problem.

Good.

> 4. The same code uncompresses only the beginning of the original file,
> with inflate() returning Z_STREAM_END while ds.avail_in and ds.avail_out
> are both still positive. The successfully uncompressed content is about
> half a GB.

See ADVANCED USAGE in the gzip manual page. In some ways gzip commutes
with concatenation: decompressing a concatenation of gzip files yields
the concatenation of their contents. A single inflate stream stops at
the end of one member of such a concatenation, returning Z_STREAM_END.

> 5. I have a number of other files from the same third party for which
> I observed 1 through 4.

Systematic exploitation of known properties can be good.

--

Mark Adler

Mar 9, 2012, 9:31:28 PM

On 2012-03-07 07:04:10 -0800, GH said:
> 2. gzip -d followed by gzip -1, gzip -9, or plain gzip produces a file
> different from the original compressed file.

Perfectly normal. There is no guarantee that even different versions
of the same program will produce the same output, only that the result
of decompression is exactly what was provided to the compressor. Small
improvements in compression algorithms can result in different
compressed data for the same uncompressed input.

> 4. The same code uncompresses only the beginning of the original file,
> with inflate() returning Z_STREAM_END while ds.avail_in and ds.avail_out
> are both still positive. The successfully uncompressed content is about
> half a GB.

Just run inflate again like you did the first time and keep going from
where you left off. You have a file that is a concatenation of a
series of gzip streams. This is permitted by the standard, and handled
automatically by gzip and the gz* functions in zlib.

To restart inflate, use inflateReset() instead of inflateEnd() and
inflateInit2(), so as to avoid unnecessary memory release and
reallocation.

Mark

GH

Mar 13, 2012, 10:06:18 AM

Many thanks! Everything turned out exactly as you described.

lander...@gmail.com

Jul 24, 2017, 4:45:48 AM

On Thursday, March 8, 2012 at 5:44:14 AM UTC+8, John Reiser wrote:
> In some ways gzip commutes with concatenation.

That is exactly my case! Thanks!