On 03/07/2012 07:04 AM, GH wrote:
> I am using inflate() from zlib to uncompress a file that was compressed
> by a third party with some unknown program, very likely gzip (inflate
> partially succeeds when I call inflateInit2 with the 0x10 bit of the
> second argument set). I am now puzzled by the following:
> 1. gzip -d uncompresses the file successfully.
Good.
> 2. gzip -d followed by gzip -1, or gzip -9, or gzip produces a
> different file than the original compressed file.
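Usually there are zillions of possible encodings of the same input, and many
possible resulting lengths. Choosing which matches to use and how to encode
them is an art, often depending on how much time the compressor is willing to
spend on those choices (gzip -1 and gzip -9 differ mainly in how hard they search).
The distance backwards may be limited (deflate allows at most 32768 bytes), the
length of a match may be limited (at most 258 bytes in deflate), the amount of
searching at any position may be limited, some matches may be easier than others
to find or remember, block boundaries may be placed differently, the Huffman code
lengths need not be the ones the classic Huffman algorithm would produce [that is
*not* always best!], etc.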
> 3. my code using inflate uncompresses the files produced in step 2
> without a problem.
Good.
> 4. The same code, applied to the original file, uncompresses only the
> beginning of it, with inflate() returning Z_STREAM_END while
> ds.avail_in and ds.avail_out are both still positive. The size of the
> successfully uncompressed content is about half a GB.
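See ADVANCED USAGE in the gzip manual page. In some ways gzip commutes
with concatenation: "gzip -c file1 > foo.gz; gzip -c file2 >> foo.gz"
gives a valid multi-member foo.gz, and gzip -d decompresses it to the
concatenation of file1 and file2. inflate() stops at the end of each
member: it returns Z_STREAM_END with input still left over, and you must
call inflateReset() and keep inflating to decode the next member.
Recompressing with gzip writes a single member, which is why step 3 works.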
> 5. I have a number of other files from the same third party for which
> I observed 1 to 4.
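Systematic exploitation of known properties can be good: if all their files
behave this way, the producer very likely writes multi-member gzip files,
and resetting and continuing after each Z_STREAM_END should handle them all.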
--