Zlib: correct checksum but error decompressing

Andre

Aug 26, 2009, 10:19:42 AM

I have been trying to solve this issue for a while now. I receive data
from a TCP connection which is compressed. I know the correct checksum
for the data and both the client and server generate the same
checksum. However, in Python when it comes to decompressing the data I
get the exception: "Error -5 while decompressing data"! I would assume
that if the string in Python produces the correct checksum, then the
decompress function should also work on the same string, but
that's clearly not the case.

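import zlib
from array import array

# convert data to a byte array
data = array('b', raw_data)
# print checksum for visual inspection
print zlib.crc32(data.tostring())
# try to decompress, but fails!
# (named "decompressed" to avoid shadowing the built-in str)
decompressed = zlib.decompress(data.tostring())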

Does anyone know what's going on?

InvisibleRoads Patrol

Aug 26, 2009, 11:01:37 AM

to Andre, pytho...@python.org

On Wed, 26 Aug 2009 07:19:42 -0700 (PDT), Andre <andre...@gmail.com>
wrote:

Hi Andre,

Hmm. Can you decompress the string on the server before it is sent?
Maybe the zipfile or gzip module will work.
Reference:
http://bytes.com/topic/python/answers/42131-zlib-decompress-cannot-gunzip-can
from cStringIO import StringIO
from gzip import GzipFile
body = GzipFile('', 'r', 0, StringIO(raw_data)).read()
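
For readers on Python 3, the same idea can be sketched roughly as follows. This is only an illustration: raw_data here is a locally built stand-in for the bytes received from the socket, and the sketch assumes the payload really is gzip-framed, which the original post does not confirm.

```python
import gzip
import io
import zlib

# Stand-in for the received payload (assumption: gzip-framed data).
raw_data = gzip.compress(b"example string")

# Python 3 equivalent of the GzipFile/StringIO approach, using bytes.
body = gzip.GzipFile(fileobj=io.BytesIO(raw_data), mode="rb").read()

# zlib can also read the gzip wrapper when 16 is added to wbits.
body_via_zlib = zlib.decompress(raw_data, 16 + zlib.MAX_WBITS)
```

If the payload is not gzip-framed, the GzipFile read fails, which is itself a useful hint about the format.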

You might want to try experimenting with the wbits parameter of
zlib.decompress()
Reference:
http://mail.python.org/pipermail/python-list/2008-December/691694.html
zlib.decompress(data, -15)

The zlib module seems to work fine with both strings and byte arrays.
import array, zlib
dataAsString = zlib.compress('example string')
dataAsArray = array.array('b', dataAsString)
zlib.decompress(dataAsString) == zlib.decompress(dataAsArray)
zlib.decompress(dataAsString) == zlib.decompress(dataAsArray.tostring())

--
http://invisibleroads.com

We train ordinary people into Python software developers and connect them
with jobs and projects for local businesses.

Paul Rubin

Aug 26, 2009, 5:57:25 PM

Andre <andre...@gmail.com> writes:
> I have been trying to solve this issue for a while now. I receive data
> from a TCP connection which is compressed.

Are you sure it is compressed with zlib? If yes, does it include the
standard zlib header? Some applications save a few bytes by stripping
the header. See the zlib doc page for how to deal with that: passing a
negative wbits value to zlib.decompress() causes the header check to be
skipped on decompression. That's the first thing I would try.
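
A minimal sketch of what that looks like, written for Python 3. The headerless stream is built locally to imitate an application that strips the header; whether Andre's server actually does this is an assumption.

```python
import zlib

original = b"example string"

# Build a headerless (raw deflate) stream, as some applications send.
compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
raw_deflate = compressor.compress(original) + compressor.flush()

# The default call expects a zlib header and rejects the raw stream.
try:
    zlib.decompress(raw_deflate)
    default_call_failed = False
except zlib.error:
    default_call_failed = True

# A negative wbits value tells zlib not to look for the header or trailer.
recovered = zlib.decompress(raw_deflate, -15)
```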

John Machin

Aug 26, 2009, 7:53:28 PM

to pytho...@python.org

Paul Rubin <http> writes:

Short answer:

Try this:
zlib.decompress(incoming_data, -15)
If that doesn't work:
print repr(incoming_data[:30])
# post the results here

Longer answer:

A zlib stream consists of a deflate stream preceded by
a 2-byte header and followed by a 4-byte Adler32
checksum of the original data.
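
That layout can be checked by hand. The following is a small Python 3 sketch that takes apart a stream produced locally by zlib.compress(); the variable names are only illustrative.

```python
import struct
import zlib

original = b"example string"
stream = zlib.compress(original)

# Split into 2-byte header, deflate body, and 4-byte trailer.
header, body, trailer = stream[:2], stream[2:-4], stream[-4:]

# The header, read as a big-endian 16-bit number, is a multiple of 31.
header_ok = (header[0] * 256 + header[1]) % 31 == 0

# The trailer is the big-endian Adler-32 of the uncompressed data.
trailer_ok = trailer == struct.pack(">I", zlib.adler32(original))

# What remains in the middle is a plain deflate stream.
body_ok = zlib.decompress(body, -15) == original
```

Note that the trailer covers the uncompressed data, whereas a CRC-32 computed over the compressed bytes only shows that those bytes arrived intact. It says nothing about whether they carry a zlib header.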

The problem occurs not out of a desire to save 6 bytes
but through compounding of 2 mistakes:

Mistake (1) is in the HTTP protocol.
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html
The "deflate" content coding should have been called "zlib".
Read this and weep:
"""deflate The "zlib" format defined in RFC 1950 [31] in
combination with the "deflate" compression mechanism
described in RFC 1951 [29]."""

Mistake (2) happens when software implementers read only
the first word of the above quote and provide only a
deflate stream.

A reader can handle both possibilities by checking for a
(usual, default) zlib header:

data[0] == '\x78' and (ord(data[1]) + 0x7800) % 31 == 0
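
In Python 3, where indexing bytes yields integers, the same check could be wrapped up roughly like this. The function names looks_like_zlib and inflate_either are made up for the illustration, and the check is a heuristic rather than a guarantee.

```python
import zlib

def looks_like_zlib(data):
    """Return True if data starts with a usual (0x78) zlib header."""
    return (len(data) >= 2
            and data[0] == 0x78
            and (data[0] * 256 + data[1]) % 31 == 0)

def inflate_either(data):
    """Decompress a zlib stream or a bare deflate stream."""
    # Positive wbits expects the header and trailer; negative skips both.
    wbits = zlib.MAX_WBITS if looks_like_zlib(data) else -zlib.MAX_WBITS
    return zlib.decompress(data, wbits)

# Usage: one wrapped stream and one bare deflate stream, built locally.
payload = b"example string"
wrapped = zlib.compress(payload)
compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED,
                              -zlib.MAX_WBITS)
bare = compressor.compress(payload) + compressor.flush()
```

Both inflate_either(wrapped) and inflate_either(bare) should give back the payload.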

HTH,
John