
zipped socket


John

Aug 8, 2005, 12:14:01 AM

Is there any way to open a socket so that every send/listen/recv
goes through a zipping/unzipping process automatically?

Thanks,
--j

jep...@unpythonic.net

Aug 8, 2005, 7:51:50 AM
to John, pytho...@python.org
As far as I know, there is no prefabricated solution for this problem. You
have to solve two issues. One is buffering: when must data you've written to
the compressor really go out to the other side? The other is what to do when
a read() or recv() reads gzipped bytes that don't produce any additional
unzipped bytes. This is a problem because a read() that returns '' normally
indicates end-of-file.
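
A minimal sketch of the empty-read problem, assuming Python 3 and the standard
zlib module. The helper name recv_decompressed and the fake_recv stand-in for
socket.recv are hypothetical, made up for illustration. The idea is to keep
reading until the decompressor yields something, and to return an empty result
only when the underlying recv itself reports end-of-stream:

```python
import zlib

def recv_decompressed(raw_recv, decompressor, bufsize=4096):
    """Return decompressed bytes; return b"" only at real end-of-stream."""
    while True:
        chunk = raw_recv(bufsize)
        if not chunk:
            return b""          # the peer really closed the connection
        data = decompressor.decompress(chunk)
        if data:
            return data         # got usable output
        # Compressed bytes arrived but produced no output yet: read again.

# Simulate a socket whose first recv() delivers only the 2-byte zlib header.
payload = b"example payload " * 8
wire = zlib.compress(payload)
pieces = iter([wire[:2], wire[2:], b""])

def fake_recv(bufsize):
    # Hypothetical stand-in for sock.recv(bufsize).
    return next(pieces)

decompressor = zlib.decompressobj()
first = recv_decompressed(fake_recv, decompressor)
second = recv_decompressed(fake_recv, decompressor)
```

Here the header-only read produces no output on its own, so a naive wrapper
that returned it directly would look like end-of-file to the caller.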

If you only work with whole files at a time, then one easy thing to do is use
the 'zlib' encoding:
>>> "abc".encode("zlib")
"x\x9cKLJ\x06\x00\x02M\x01'"
>>> _.decode("zlib")
'abc'
... but because zlib isn't self-delimiting, this won't work if you want to
write() multiple times, or if you want to read() less than the full file.
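
For readers on Python 3, where the "zlib" codec is not available on str, a
rough equivalent of the whole-file approach uses zlib.compress and
zlib.decompress on bytes. This is only a sketch; it also shows why partial
reads are a problem, since a truncated stream cannot be decompressed in one
shot:

```python
import zlib

# Whole-message round trip: compress everything, then decompress everything.
packed = zlib.compress(b"abc")
restored = zlib.decompress(packed)

# A stream that is missing its tail is rejected by the one-shot API.
try:
    zlib.decompress(packed[:-1])
    truncated_ok = True
except zlib.error:
    truncated_ok = False
```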

Jeff

Peter Hansen

Aug 8, 2005, 8:19:22 AM
John wrote:
>
> Is there any way to open a socket so that every send/listen/recv
> goes through a zipping/unzipping process automatically?

You ought to be able to do this easily by wrapping a bz2 compressor
around the socket (maybe using socket.makefile() to return a file object
first) and probably using a generator as well:

http://effbot.org/librarybook/bz2.htm includes relevant examples (not
specifically with sockets though).
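
A sketch of the generator idea, assuming Python 3 and the standard bz2 module.
The function names compress_chunks and decompress_chunks are hypothetical. One
caveat: BZ2Compressor.flush() finishes the stream and the compressor cannot be
used afterwards, so this fits a one-way bulk transfer better than an
interactive request/response socket:

```python
import bz2

def compress_chunks(chunks):
    """Yield bz2-compressed pieces for an iterable of byte chunks."""
    compressor = bz2.BZ2Compressor()
    for chunk in chunks:
        piece = compressor.compress(chunk)
        if piece:               # the compressor may buffer and return b""
            yield piece
    yield compressor.flush()    # ends the stream; compressor is done after this

def decompress_chunks(pieces):
    """Yield decompressed data for an iterable of compressed pieces."""
    decompressor = bz2.BZ2Decompressor()
    for piece in pieces:
        data = decompressor.decompress(piece)
        if data:
            yield data

chunks = [b"first line\n" * 50, b"second line\n" * 50]
restored = b"".join(decompress_chunks(compress_chunks(chunks)))
```

With a real socket, each piece from compress_chunks would go to something like
sock.sendall(piece), and the receiving side would feed each recv() result into
the decompressor.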

Googling for "python incremental compression" ought to turn up any other
alternatives.

-Peter

Bryan Olson

Aug 9, 2005, 9:30:56 PM

That's basically a solved problem; zlib does have a kind of
self-delimiting. The key is the 'flush' method of the
compression object:

    some_send_function(compressor.flush(zlib.Z_SYNC_FLUSH))

The Python module doc is unclear/wrong on this, but zlib.h
explains:

    If the parameter flush is set to Z_SYNC_FLUSH, all pending
    output is flushed to the output buffer and the output is
    aligned on a byte boundary, so that the decompressor can get
    all input data available so far.


There's also Z_FULL_FLUSH, which also resets the compression
dictionary. For a stream socket, we'd usually want to keep the
dictionary, since that's what gives us the compression. The
Python doc states:

    Z_SYNC_FLUSH and Z_FULL_FLUSH allow compressing further
    strings of data and are used to allow partial error recovery
    on decompression

That's not correct. Z_FULL_FLUSH allows recovery after errors,
but Z_SYNC_FLUSH is just to allow pushing all the compressor's
input to the decompressor's output.
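
A minimal sketch of the Z_SYNC_FLUSH approach, assuming Python 3. The helper
names pack and unpack are hypothetical, and the round trip is done in memory
rather than over a real socket. One compressor and one decompressor live for
the whole connection, so the dictionary is kept across messages:

```python
import zlib

# One long-lived pair per connection direction.
compressor = zlib.compressobj()
decompressor = zlib.decompressobj()

def pack(message: bytes) -> bytes:
    """Compress a message and flush so the peer can decode all of it now."""
    return compressor.compress(message) + compressor.flush(zlib.Z_SYNC_FLUSH)

def unpack(wire_bytes: bytes) -> bytes:
    """Decompress whatever bytes have arrived so far."""
    return decompressor.decompress(wire_bytes)

messages = [b"hello " * 20, b"hello again " * 20, b"bye"]
# Each message is fully recoverable as soon as its packed bytes arrive.
received = [unpack(pack(m)) for m in messages]
```

Over a real connection, the result of pack() would be passed to the socket's
send function, and every recv() result would be fed to unpack().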


--
--Bryan
