from array import array
import zlib

# convert data to a byte array
data = array('b', raw_data)
# print checksum for visual inspection
print zlib.crc32(data.tostring())
# try to decompress, but fails!
text = zlib.decompress(data.tostring())
Does anyone know what's going on?
Hi Andre,
Hmm. Can you decompress the string on the server before it is sent?
Maybe the zipfile or gzip module will work.
Reference:
http://bytes.com/topic/python/answers/42131-zlib-decompress-cannot-gunzip-can
from cStringIO import StringIO
from gzip import GzipFile
body = GzipFile('', 'r', 0, StringIO(raw_data)).read()
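For readers on Python 3, where cStringIO no longer exists, the same idea can be sketched with io.BytesIO. This is only an illustration: the raw_data here is a stand-in built with gzip.compress, not the data from the original question, and it only helps if the payload really is in gzip format.

```python
import gzip
import io

# Stand-in for the bytes received from the server (an assumption for this sketch).
raw_data = gzip.compress(b"example body")

# Wrap the bytes in a file-like object and let GzipFile parse the gzip framing.
body = gzip.GzipFile(fileobj=io.BytesIO(raw_data), mode="rb").read()
```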
You might want to try experimenting with the wbits parameter of
zlib.decompress()
Reference:
http://mail.python.org/pipermail/python-list/2008-December/691694.html
zlib.decompress(data, -15)
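A minimal sketch of why the negative wbits value matters, written for Python 3 rather than the Python 2 used elsewhere in this thread. The sample data is made up: it builds a raw deflate stream (no zlib header or trailer) and shows that the default call rejects it while wbits=-15 accepts it.

```python
import zlib

original = b"example string" * 10

# Build a raw deflate stream: negative wbits means no zlib header and no trailer.
compressor = zlib.compressobj(6, zlib.DEFLATED, -15)
raw_deflate = compressor.compress(original) + compressor.flush()

# The default call expects a zlib header and raises zlib.error on a raw stream.
try:
    zlib.decompress(raw_deflate)
    default_ok = True
except zlib.error:
    default_ok = False

# With wbits=-15 the header check is skipped and the data is inflated directly.
recovered = zlib.decompress(raw_deflate, -15)
```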
The zlib module seems to work fine with both strings and byte arrays.
import array, zlib
dataAsString = zlib.compress('example string')
dataAsArray = array.array('b', dataAsString)
zlib.decompress(dataAsString) == zlib.decompress(dataAsArray)
zlib.decompress(dataAsString) == zlib.decompress(dataAsArray.tostring())
Are you sure it is compressed with zlib? If yes, does it include the
standard zlib header? Some applications save a few bytes by stripping
the header. See the zlib doc page for how to deal with that: passing a
negative number as the wbits argument causes the header check to be
skipped on decompression. That's the first thing I would try.
Short answer:
Try this:
zlib.decompress(incoming_data, -15)
If that doesn't work:
print repr(incoming_data[:30])
# post the results here
Longer answer:
A zlib stream consists of a deflate stream preceded by
a 2-byte header and followed by a 4-byte Adler32
checksum of the original data.
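The layout described above can be checked by hand. The following is a sketch in Python 3 using a made-up sample string: it splits a stream produced by zlib.compress into header, deflate body, and trailer, then verifies each part separately.

```python
import struct
import zlib

original = b"example string"
stream = zlib.compress(original)

# Split into the 2-byte header, the deflate body, and the 4-byte trailer.
header, body, trailer = stream[:2], stream[2:-4], stream[-4:]

# Read as a big-endian 16-bit integer, a valid header is a multiple of 31.
(header_value,) = struct.unpack(">H", header)
header_is_valid = header_value % 31 == 0

# The trailer is the big-endian Adler-32 checksum of the uncompressed data.
(stored_checksum,) = struct.unpack(">I", trailer)
checksum_matches = stored_checksum == (zlib.adler32(original) & 0xFFFFFFFF)

# The body alone is a bare deflate stream, readable with negative wbits.
body_roundtrips = zlib.decompress(body, -15) == original
```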
The problem occurs not out of a desire to save 6 bytes
but through compounding of 2 mistakes:
Mistake (1) is in the HTTP protocol.
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html
The "deflate" content coding should have been called "zlib".
Read this and weep:
"""deflate The "zlib" format defined in RFC 1950 [31] in
combination with the "deflate" compression mechanism
described in RFC 1951 [29]."""
Mistake (2) happens when software implementers read only
the first word of the above quote and provide only a
deflate stream.
A reader can handle both possibilities by checking for a
(usual, default) zlib header:
data[0] == '\x78' and (ord(data[1]) + 0x7800) % 31 == 0
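One way a reader might wrap that check, sketched in Python 3 (where indexing bytes gives integers, so ord() is not needed). The function names looks_like_zlib and inflate are hypothetical, and the test is slightly broader than the one above: it accepts any header whose compression method is 8 (deflate), not only a first byte of 0x78. Because a raw deflate stream could in principle start with bytes that pass the test, the sketch falls back to raw inflation if the zlib attempt fails.

```python
import zlib


def looks_like_zlib(data):
    """Return True if data starts with a plausible zlib header."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    # Low nibble 8 means "deflate"; the two bytes together must divide by 31.
    return (cmf & 0x0F) == 8 and ((cmf << 8) | flg) % 31 == 0


def inflate(data):
    """Decompress data that is either a zlib stream or a raw deflate stream."""
    if looks_like_zlib(data):
        try:
            return zlib.decompress(data)
        except zlib.error:
            pass  # header test was a false positive; try raw deflate instead
    return zlib.decompress(data, -15)
```

Usage would be a single call such as inflate(incoming_data), regardless of which of the two formats the server actually sent.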
HTH,
John