zlib and SSH2 compression

yawnmoth

Apr 28, 2009, 12:34:10 PM
I'm trying to use zlib to inflate / deflate SSH messages and am having
some difficulty.

zlib produces the following for the initial SSH_MSG_SERVICE_REQUEST:

63:65:60:60:e0:29:2e:ce:d0:2d:2d:4e:2d:4a:2c:2d:c9:00:00

...which corresponds to the following:

05:00:00:00:0c:73:73:68:2d:75:73:65:72:61:75:74:68

...or ".....ssh-userauth", where the non-printable bytes have been
replaced with .'s.

If, however, I don't prepend 78:9c: to that I don't get any response
from the server. So I add it and get this in response:

78:9c:62:63:60:60:e0:29:2e:ce:d0:2d:2d:4e:2d:4a:2c:2d:c9:00:08

zlib doesn't like that, however (gets me a "data error"), unless I
remove the first three bytes. So I remove them and get this:

00:00:00:0c:73:73:68:2d:75:73:65:72:61:75:74:68

...which corresponds to "....ssh-userauth". Problem with this is that
it's not a valid response. There should be five bytes preceding the
'ssh-userauth' - one for the message type - presumably
SSH_MSG_SERVICE_ACCEPT - and four for the length of the string. The
message type byte is missing.

My question is twofold.

Why is prepending 78:9c: to the compressed (deflated)
SSH_MSG_SERVICE_REQUEST necessary and why is the uncompressed
(inflated) SSH_MSG_SERVICE_ACCEPT missing its first byte?

(I got the idea of prepending 78:9c: from PuTTY's SSHZLIB.C - the
"Provide missing zlib header if -d was specified." part of it)

Simon Tatham

Apr 28, 2009, 1:54:00 PM
yawnmoth <terr...@yahoo.com> wrote:
> Why is prepending 78:9c: to the compressed (deflated)
> SSH_MSG_SERVICE_REQUEST necessary

There are three very similar data formats based on the same kind of
compression, two of which are wrappers on the third.

Deflate (RFC 1951) is a bare compressed-data format with no header
or checksum information. The GZIP file format (RFC 1952) and the
ZLIB data stream format (RFC 1950) are different wrappers on
Deflate. GZIP includes a longish header containing various fields
such as file name, and the compressed data is followed by a CRC.
ZLIB replaces GZIP's long header with a short header fixed at two
bytes (typically the bytes 78 9C, though not necessarily in all
cases) and replaces GZIP's CRC with an Adler32 checksum. Both
wrapper formats in principle contain header fields which could
select a core compression format other than Deflate, but in practice
neither one has defined any value for that field other than the one
that selects Deflate.
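
A minimal sketch of the three formats, assuming Python's standard zlib module as a stand-in (the thread itself concerns PHP and the C library). In that module the wbits argument selects the wrapper: -15 for bare Deflate, 15 for ZLIB, 31 for GZIP. The helper name pack() is made up for illustration.

```python
import zlib

payload = b"\x05\x00\x00\x00\x0cssh-userauth"

def pack(data: bytes, wbits: int) -> bytes:
    """Compress data in one shot, using the wrapper selected by wbits."""
    compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, wbits)
    return compressor.compress(data) + compressor.flush()

raw = pack(payload, -15)          # RFC 1951: bare Deflate, no header, no checksum
zlib_stream = pack(payload, 15)   # RFC 1950: 2-byte header, Deflate, Adler-32
gzip_stream = pack(payload, 31)   # RFC 1952: 10-byte header, Deflate, CRC-32, length

for name, blob in (("deflate", raw), ("zlib", zlib_stream), ("gzip", gzip_stream)):
    print(f"{name:8}{blob.hex(':')}")

# The ZLIB trailer is the big-endian Adler-32 of the uncompressed data.
adler = int.from_bytes(zlib_stream[-4:], "big")
# The GZIP trailer begins with the little-endian CRC-32 of the uncompressed data.
crc = int.from_bytes(gzip_stream[-8:-4], "little")
```

Stripping the wrapper from either the ZLIB or the GZIP output leaves a bare Deflate stream that a raw decompressor (wbits=-15) accepts.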

The format of the zlib compressed data in SSH is defined to be the
ZLIB format, i.e. RFC 1950. Thus all SSH zlib-compressed data
streams should begin with the 2-byte ZLIB header, although the
Adler32 checksum never appears in SSH (since SSH compressed data
streams never formally terminate - at rekey they are simply chopped
off unceremoniously).

So the 78 9C on the front is correct; your question should not be
'why does the ssh server require it?' but rather 'why is zlib not
putting it there?'. I'm afraid I can't comment on why zlib is not
putting it there, since you haven't given details of how you're
calling zlib (and in any case my personal experience is with writing
and using my own equivalent code, and I have relatively little
knowledge of the API of zlib proper).

Applying my personal Deflate/ZLIB/GZIP analysis tool to your example
streams tells me a couple more important points:

Firstly, the data stream you sent to the server _terminates_. The
first and only compressed block within it has the 'final' flag set
in its 3-bit header, meaning that the decompressor expects to see no
blocks after that. This is invalid in an SSH connection: the
compressed data stream should be nonterminating, so no block should
ever have its 'final' flag set at all. So there's something else
wrong with your use of zlib, though again you haven't provided
enough information to tell what. My guess is that the server won't
immediately complain at this treatment, but you'd get an unfriendly
response when you sent on your _next_ packet.
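
The 'final' flag can be read straight off the first byte of each stream, since the first block starts on a byte boundary. A small sketch in Python (the function name block_header() is invented here):

```python
def block_header(first_byte: int) -> tuple:
    """Return (BFINAL, BTYPE) for a Deflate block that starts on a byte boundary."""
    bfinal = first_byte & 1         # bit 0: set only on the last block of a stream
    btype = (first_byte >> 1) & 3   # bits 1-2: 0 stored, 1 fixed Huffman, 2 dynamic Huffman
    return bfinal, btype

client_first = 0x63   # first byte of the stream sent to the server
server_first = 0x62   # first byte after the 78 9c header in the server's reply

print(block_header(client_first))   # (1, 1): final block, fixed Huffman
print(block_header(server_first))   # (0, 1): non-final block, fixed Huffman
```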

Secondly, decoding the data you quote from the server says that it
does decompress to the bytes 06 00 00 00 0C "ssh-userauth"; the
compressed data you give here _does_ include the leading 06
[SSH_MSG_SERVICE_ACCEPT] byte as it should have. Again, the problem
is therefore not with the data you've received but with your
decoding of it, and I'm presuming that it's because you're using
zlib wrong rather than that zlib itself is buggy.
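
As a rough check of the above, here is a sketch using Python's standard zlib module (an assumption; any binding with a streaming inflate should behave the same way). It feeds the server's bytes, header included, to a streaming decompressor, and then repeats the strip-three-bytes approach for comparison.

```python
import zlib

# The server's reply exactly as quoted earlier in the thread.
reply = bytes.fromhex("789c62636060e0292eced02d2d4e2d4a2c2dc90008")

# A streaming decompressor accepts a stream that has not terminated yet.
inflater = zlib.decompressobj()
packet = inflater.decompress(reply)
print(packet)      # b'\x06\x00\x00\x00\x0cssh-userauth'

# Dropping three bytes removes the ZLIB header AND the byte 0x62, which holds
# the block header plus the first bits of the 06 literal. The remainder happens
# to parse as a different, terminating raw Deflate stream.
truncated = zlib.decompress(reply[3:], -15)
print(truncated)   # b'\x00\x00\x00\x0cssh-userauth'
```

The second result is the 16-byte output reported in the original question, which suggests the message type byte was lost together with the third stripped byte.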
--
Simon Tatham         "Selfless? I'm so selfless I
<ana...@pobox.com>    don't even know who I am."

yawnmoth

Apr 28, 2009, 5:24:02 PM
On Apr 28, 12:54 pm, Simon Tatham <ana...@pobox.com> wrote:

> yawnmoth  <terra1...@yahoo.com> wrote:
> Secondly, decoding the data you quote from the server says that it
> does decompress to the bytes 06 00 00 00 0C "ssh-userauth"; the
> compressed data you give here _does_ include the leading 06
> [SSH_MSG_SERVICE_ACCEPT] byte as it should have. Again, the problem
> is therefore not with the data you've received but with your
> decoding of it, and I'm presuming that it's because you're using
> zlib wrong rather than that zlib itself is buggy.

I'm using PHP's bindings to zlib. There are two functions that
provide zlib compression to PHP - gzcompress() and gzdeflate().
gzdeflate() omits the two byte header and the four byte checksum
whereas gzcompress() includes them.

I could use the output of gzcompress() and remove the last four
bytes, but there'd still be the issue of the 'final' flag in the
last block's 3-bit header. I could use a bitmask to adjust the value
of that individual byte, but there's still the question of where
the last block is.

According to RFC1951, "256 indicates end-of-block". ie. 0x100. Only
problem is that can be split up across multiple bytes. You could have
0x00 as one byte with the least significant bit of the previous byte
equal to 1 or you could have 0x10 with the most significant bit of the
next byte being 0. Or maybe the end-of-block is always supposed to
end with the null byte 0x00?

I guess one thing that's unclear to me... say 0x100 were split up
over two bytes as 1000 0000 0xxx xxxx. Could those last 7 bits be
anything?

Anyway, thanks for your input - I appreciate it!

Simon Tatham

Apr 28, 2009, 7:18:12 PM
yawnmoth <terr...@yahoo.com> wrote:
> I'm using PHP's bindings to zlib. There are two functions that
> provide zlib compression to PHP - gzcompress() and gzdeflate().
> gzdeflate() omits the two byte header and the four byte checksum
> whereas gzcompress() includes them.

<hasty google> You mean the bindings documented at
http://php.net/zlib ?

It doesn't look to me as if those bindings have enough
expressiveness to get zlib to generate a stream appropriate for SSH,
unfortunately.

A compressed SSH data stream consists of a collection of compressed
packets which _when concatenated end to end_ produce a single
ongoing Deflate stream that decompresses to the right plaintext. So
you have to be able to generate that Deflate stream incrementally,
by handing zlib a few extra bytes and having it output the next
piece of the compressed stream; but then you also have to _flush_
the output, i.e. make sure that the compressed bytes output so far
contain enough data to decode all the plaintext. This is typically
done by transmitting the end-of-block code (symbol 256) and then
starting a fresh block, which generates enough bits that the last
bit before that must have been output in a full byte.

In the underlying C zlib, this is done by calling the deflate()
function with Z_PARTIAL_FLUSH or Z_SYNC_FLUSH as its second
argument, but it doesn't look as if the PHP bindings expose that
interface at all. There's also a gzflush() which might work (I
haven't looked closely enough to be sure), but apparently that isn't
exposed in the PHP interface either even though most of the other
gz*() functions are.
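
For comparison, a minimal sketch of that incremental pattern in Python, whose standard zlib module does expose flush modes. The second payload and the helper name compress_packet() are hypothetical, and a real SSH implementation would keep one compressor per direction for the life of the connection.

```python
import zlib

deflater = zlib.compressobj()     # ZLIB format (RFC 1950), default level
inflater = zlib.decompressobj()   # stands in for the peer's decompressor

def compress_packet(payload: bytes) -> bytes:
    """Compress one packet payload and flush so the peer can decode all of it."""
    return deflater.compress(payload) + deflater.flush(zlib.Z_SYNC_FLUSH)

payloads = [
    b"\x05\x00\x00\x00\x0cssh-userauth",
    b"\x05\x00\x00\x00\x0essh-connection",   # hypothetical second request
]

wire = [compress_packet(p) for p in payloads]
decoded = [inflater.decompress(chunk) for chunk in wire]

for chunk in wire:
    print(chunk.hex(":"))
```

Only the first chunk carries the two-byte ZLIB header, no block has its 'final' flag set, and no Adler32 trailer is ever written because the stream is never finished. Z_SYNC_FLUSH pads each chunk with an empty stored block (00 00 ff ff) to reach a byte boundary.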

> According to RFC1951, "256 indicates end-of-block". ie. 0x100.

You're missing a layer of encoding. Remember that everything in the
Deflate stream is Huffman-encoded, so you won't see the literal
nine-bit value 100000000 stored anywhere in the stream. For a block
with static Huffman trees (which will be most of them, in an SSH
stream with many small packets) it'll be seven zero bits (starting
at the right point - not just any seven consecutive zero bits in the
stream); with dynamic trees, all bets are off and there's no fixed
appearance for the end-of-block code.
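
To make the static-tree case concrete, here is a sketch of a decoder for one fixed-Huffman block that contains only literals, applied to the server's reply quoted earlier in the thread. It is written in Python for illustration, is nowhere near a full inflater, and the function name is invented.

```python
def decode_fixed_literal_block(data: bytes):
    """Decode one fixed-Huffman Deflate block that holds only literals.

    Returns (bfinal, plaintext, bit offset just past the end-of-block code).
    """
    # Deflate fills each byte starting from its least significant bit.
    bits = [(byte >> i) & 1 for byte in data for i in range(8)]
    bfinal, btype = bits[0], bits[1] | (bits[2] << 1)
    if btype != 1:
        raise ValueError("this sketch handles fixed Huffman blocks only")
    pos = 3
    out = bytearray()
    while True:
        # Huffman codes are read most significant bit first.
        code = 0
        for _ in range(7):
            code = (code << 1) | bits[pos]
            pos += 1
        if code == 0:                  # 0000000 is symbol 256, end of block
            return bfinal, bytes(out), pos
        if code < 0b0011000:           # other 7-bit codes are match lengths
            raise ValueError("length codes are not handled in this sketch")
        code = (code << 1) | bits[pos]
        pos += 1
        if code <= 0xBF:               # 8-bit codes 0x30-0xBF: literals 0-143
            out.append(code - 0x30)
        elif code >= 0xC8:             # prefix of 9-bit codes 0x190-0x1FF: literals 144-255
            code = (code << 1) | bits[pos]
            pos += 1
            out.append(code - 0x190 + 144)
        else:                          # 0xC0-0xC7 are match lengths 280-287
            raise ValueError("length codes are not handled in this sketch")

reply = bytes.fromhex("789c62636060e0292eced02d2d4e2d4a2c2dc90008")
bfinal, text, end = decode_fixed_literal_block(reply[2:])   # skip the ZLIB header
print(bfinal, text, end)
```

The seven zero bits of the end-of-block code are found only because the decoder has walked every preceding code from the start of the block. Scanning the bytes for zeros would not locate it.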

> I guess one thing that's unclear to me... say 0x100 were split up
> over two bytes as 1000 0000 0xxx xxxx. Could those last 7 bytes be
> anything?

Ignoring the above for the moment and supposing the block-end code
_was_ encoded 100000000:

The zlib compressor will have a layer that outputs arbitrary
collections of bits, and then those bits are passed on to a final
layer that collects those bits into bytes and outputs a byte each
time it fills one up.

So zlib, when passed some cleartext and told to Z_PARTIAL_FLUSH,
would output (after everything else) those nine bits 100000000 into
its byte-collecting layer. The byte-collecting layer might then
output the first eight bits as a byte, and would hold on to the
final zero bit and wait for seven more bits to go with it. So you'd
transmit an SSH packet whose compressed data ended in that "1000
0000" byte, and then the first byte of the compressed data in the
_next_ packet would contain that last zero bit followed by the next
three-bit block header and more data. And the point is that by doing
this, you ensure that the zlib byte-collecting layer isn't holding
back any bits that are vital to understanding the data in the actual
SSH packet, because anything it's cached is part of the block-end
and block-start codes that don't encode any actual uncompressed
text.
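
A toy model of that byte-collecting layer, in Python, under the same hypothetical nine-bit block-end code. The class name and the bit strings are made up for illustration. To match the "1000 0000" notation above, the first bit collected becomes the top bit of the byte, whereas real Deflate fills each byte from its least significant bit.

```python
class ByteCollector:
    """Collect bits and release a byte each time eight have accumulated."""

    def __init__(self):
        self.pending = []           # bits still waiting for a full byte
        self.ready = bytearray()    # complete bytes not yet handed out

    def put(self, bits: str) -> None:
        for ch in bits:
            self.pending.append(int(ch))
            if len(self.pending) == 8:
                # Illustrative choice: first bit in becomes the top bit.
                self.ready.append(int("".join(map(str, self.pending)), 2))
                self.pending.clear()

    def take(self) -> bytes:
        data, self.ready = bytes(self.ready), bytearray()
        return data

collector = ByteCollector()
collector.put("100000000")            # hypothetical nine-bit block-end code
first_packet_tail = collector.take()  # the complete byte that goes out now
held_back = list(collector.pending)   # the one zero bit still being held

collector.put("010" + "0000")         # made-up next block header plus four more bits
next_packet_head = collector.take()

print(first_packet_tail.hex(), held_back, next_packet_head.hex())
```

The held-back bit belongs to the block-end code, not to any plaintext, so the receiver already has everything it needs to decode the first packet.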

Does any of that help make it clear? I'm afraid the news doesn't
seem to be good about the PHP zlib bindings, but I hope I've at
least managed to impart some understanding of what's needed.
--
Simon Tatham         "Every person has a thinking part that wonders what
<ana...@pobox.com>    the part that isn't thinking isn't thinking about."

yawnmoth

Apr 28, 2009, 8:38:21 PM
On Apr 28, 6:18 pm, Simon Tatham <ana...@pobox.com> wrote:

> yawnmoth <terra1...@yahoo.com> wrote:
> > I'm using PHP's bindings to zlib.  There are two functions that
> > provide zlib compression to PHP - gzcompress() and gzdeflate().
> > gzdeflate() omits the two byte header and the four byte checksum
> > whereas gzcompress() includes them.
>
> <hasty google> You mean the bindings documented at http://php.net/zlib ?
>
> It doesn't look to me as if those bindings have enough
> expressiveness to get zlib to generate a stream appropriate for SSH,
> unfortunately.
>
> A compressed SSH data stream consists of a collection of compressed
> packets which _when concatenated end to end_ produce a single
> ongoing Deflate stream that decompresses to the right plaintext.

Sounds just like how SSH handles encryption - treating everything as
one continuous block as opposed to a bunch of discontinuous ones.
PHP's mcrypt bindings don't support this, per se, but with CBC
encryption, you can always work around it by setting the IV for the
next block to the last block of the ciphertext.

Of course, IVs don't really serve much point in compression, heh.

Anyway, your explanations have indeed improved my understanding and
I'm very appreciative of them - thanks! :)

Simon Tatham

Apr 29, 2009, 2:37:53 AM
yawnmoth <terr...@yahoo.com> wrote:
> Sounds just like how SSH handles encryption - treating everything as
> one continuous block as opposed to a bunch of discontinuous ones.
> PHP's mcrypt bindings don't support this, persay, but with CBC
> encryption, you can always work around it by setting the IV for the
> next block to the last block of the ciphertext.

Yes. Unfortunately, with compression there's a lot more state you'd
have to restart (the last 32K of data for spotting backward matches
in, any hash table set up over that data to make it efficient, and
then other little details like any collection of bits left over from
the previous block), so there isn't really any practical alternative
_but_ to have a compression library that supports that mode of use.
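
The dependence on earlier data is easy to see in a sketch with Python's standard zlib module (used here as an assumption; the payload is a made-up 256-byte message). The second packet is compressed almost entirely as a back-reference into the first, so a decompressor without that history cannot decode it.

```python
import zlib

deflater = zlib.compressobj()
message = bytes(range(256))   # hypothetical payload with no repetition inside it

first = deflater.compress(message) + deflater.flush(zlib.Z_SYNC_FLUSH)
second = deflater.compress(message) + deflater.flush(zlib.Z_SYNC_FLUSH)
print(len(first), len(second))   # the second packet is far smaller

# A fresh raw-Deflate decompressor has no 32K window to look back into.
try:
    zlib.decompressobj(-15).decompress(second)
    fresh_ok = True
except zlib.error as exc:
    fresh_ok = False
    print("fresh decompressor failed:", exc)

# One decompressor kept alive across both packets has the needed history.
inflater = zlib.decompressobj()
replayed = inflater.decompress(first) + inflater.decompress(second)
```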
--
Simon Tatham         "infinite loop _see_ loop, infinite"
<ana...@pobox.com>    - Index, Borland Pascal Language Guide
