inflate/deflate

Lehi Toskin

Jan 11, 2017, 10:23:47 PM
to Racket Users
I'm looking at some zlib-compressed data and thought I'd inflate it with file/gunzip's `inflate` function, but I get "inflate: error in compressed data". I then tried the opposite direction, deflating some data in Racket and asking zlib-flate to inflate it, but that fails with "flate: inflate: data: incorrect header check".

Are the inflate/deflate functions incompatible with zlib-compressed data? That doesn't seem right, since the documentation says the format is pkzip, and everything I've read says pkzip is what zip uses.

Alex Harsanyi

Jan 11, 2017, 10:54:25 PM
to Racket Users
Perhaps you should use `gunzip-through-ports` which also parses the required header?

Alex.

Lehi Toskin

Jan 11, 2017, 11:29:41 PM
to Racket Users
If I run `gunzip-through-ports`, it errors out with "gnu-unzip: bad header"

Lehi Toskin

Jan 12, 2017, 1:01:04 AM
to Racket Users
Interesting... If I prepend `(bytes #x78 #x9c)` to the compressed data created by deflate, zlib-flate will uncompress it. Same thing happens in reverse where I skip the first two bytes of the zlib-flate'd data and process it with inflate.
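
The experiment above can be sketched roughly as follows. This is only an illustration, assuming the data fits in a byte string and that the zlib header is exactly two bytes (no preset dictionary). The helper names `deflate-bytes`, `inflate-bytes`, `add-zlib-header`, and `drop-zlib-header` are made up for this example; `deflate` comes from `file/gzip` and `inflate` from `file/gunzip`.

```racket
#lang racket/base
(require file/gzip file/gunzip)

;; Hypothetical helper: compress a byte string into a raw DEFLATE stream.
(define (deflate-bytes bs)
  (define out (open-output-bytes))
  (deflate (open-input-bytes bs) out)
  (get-output-bytes out))

;; Hypothetical helper: expand a raw DEFLATE stream back into a byte string.
(define (inflate-bytes bs)
  (define out (open-output-bytes))
  (inflate (open-input-bytes bs) out)
  (get-output-bytes out))

;; Prepend the two bytes that zlib-flate accepted in the experiment above.
(define (add-zlib-header raw)
  (bytes-append (bytes #x78 #x9c) raw))

;; Going the other way: skip the two header bytes before calling inflate.
(define (drop-zlib-header bs)
  (subbytes bs 2))

;; Round trip: deflate, add the header, drop it again, inflate.
(define text #"hello hello hello hello")
(define raw (deflate-bytes text))
(equal? (inflate-bytes (drop-zlib-header (add-zlib-header raw))) text)
```

Note that this only adds and removes the header; it says nothing about any trailing bytes a strict zlib reader may expect.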

Ethan Estrada

Jan 12, 2017, 1:59:13 PM
to Racket Users
On Wednesday, January 11, 2017 at 11:01:04 PM UTC-7, Lehi Toskin wrote:
> Interesting... If I prepend `(bytes #x78 #x9c)` to the compressed data created by deflate, zlib-flate will uncompress it. Same thing happens in reverse where I skip the first two bytes of the zlib-flate'd data and process it with inflate.

Then, is this a bug, or something that should be an optional argument on the function, like `#:ignore-initial-bytes #t`? I am not deeply familiar with the DEFLATE file format, but this seems like a bug, since it doesn't interoperate with other implementations of zlib/zip/DEFLATE.

Lehi Toskin

Jan 12, 2017, 6:17:03 PM
to Racket Users
On Thursday, January 12, 2017 at 10:59:13 AM UTC-8, Ethan Estrada wrote:
> Then, is this a bug, or something that should be an optional argument on the function, like `#:ignore-initial-bytes #t`? I am not deeply familiar with the DEFLATE file format, but this seems like a bug, since it doesn't interoperate with other implementations of zlib/zip/DEFLATE.

If it isn't quite compatible with other DEFLATE implementations, I'd say it is a bug...

For reference, what I'm trying to do is take zTXt and iTXt data from a PNG file and uncompress/compress it. Whenever I do compression from Racket, pngfix complains that the data isn't quite right:

iTXt SKP default 15 Z_BUF_ERROR 853 [truncated]
zTXt SKP default 15 Z_BUF_ERROR 402
IDAT OK maximum 15 15 25411 1000938

This is why I started looking into the details of the compression in the first place... Now that I think about it, it's probably naive of me to simply add those two bytes and expect everything to work. While the data decompresses as I expect, there might be some special cases inside zlib-flate that allow it to be decompressed anyway. I'm assuming the Z_BUF_ERROR is reported because pngfix is not as flexible and the data is missing some information, such as a length. That's wild speculation, though: I actually have no idea what's wrong, and I'm trying to make sense of something I barely understand.

Tony Garnock-Jones

Jan 12, 2017, 8:09:42 PM
to Lehi Toskin, Racket Users
Hi Lehi,

On 01/12/2017 06:17 PM, Lehi Toskin wrote:
> Now that I think about it, it's probably naive of
> me to simply add those two bytes and expect everything to actually be
> working as expected [...] I'm assuming the Z_BUF_ERROR is being
> reported because it's not as flexible and the structure of the data
> is missing some information like length (or something).

See FAQs 18 and 19 here: http://www.gzip.org/zlib/zlib_faq.html#faq18

See also RFC 1950, which explains why the bytes you need are 0x78 0x9c
and describes the kind of trailer you should probably also have:
https://www.ietf.org/rfc/rfc1950.txt

I wonder if the Z_BUF_ERROR is related to a missing checksum trailer.

(I remember running into this a couple of years ago. I've forgotten the
details, but the gist of it is that the DEFLATE format is the underlying
compression method, and that it can be placed in either a zlib envelope
(RFC 1950) or a gzip envelope (separately specified, probably not what
you want here).)
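
As a rough illustration of why those two bytes are 0x78 0x9c, here is a small sketch that splits a zlib header into the fields RFC 1950 defines (CM, CINFO, FCHECK, FDICT, FLEVEL). The function name `describe-zlib-header` and the hash keys are invented for this example.

```racket
#lang racket/base

;; Hypothetical helper: decode the two-byte zlib header (CMF, FLG) per RFC 1950.
(define (describe-zlib-header cmf flg)
  (hash
   ;; CM, low 4 bits of CMF: 8 means DEFLATE.
   'method (bitwise-and cmf #x0F)
   ;; CINFO, high 4 bits of CMF: log2 of the window size, minus 8.
   'window-bits (+ 8 (arithmetic-shift cmf -4))
   ;; FCHECK makes CMF*256 + FLG a multiple of 31.
   'check-ok? (zero? (remainder (+ (* cmf 256) flg) 31))
   ;; FDICT, bit 5 of FLG: set when a preset dictionary id follows.
   'preset-dict? (bitwise-bit-set? flg 5)
   ;; FLEVEL, top 2 bits of FLG: 2 is the "default" compression level.
   'level (arithmetic-shift flg -6)))

(describe-zlib-header #x78 #x9c)
```

For 0x78 0x9c this gives method 8 (DEFLATE), a 15-bit (32K) window, no preset dictionary, and level 2; 0x789c is 30876, which is 31 × 996, so the check passes.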

Tony

Lehi Toskin

Jan 12, 2017, 11:32:59 PM
to Racket Users, lehi....@gmail.com
Thanks to Tonyg's links, I figured out that the part I was missing was the Adler-32 checksum of the uncompressed data, appended to the end of the byte string.

That makes the total byte string composition look like this:
(bytes #x78 #x9c) compressed-data-from-deflate (number->bytes (adler32 uncompressed-text))

`number->bytes` is a function I made; its definition is:
(define (number->bytes num)
  (define hex (number->string num 16))
  ; from file/sha1
  (hex-string->bytes (if (even? (string-length hex)) hex (string-append "0" hex))))

P.S. I didn't see an implementation of ADLER32 anywhere, so I had to write my own, which took a little longer than expected, but oh well.
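
The implementation from this post is not shown in the thread, so the following is only a sketch of how the composition above might look, written from the Adler-32 description in RFC 1950. The name `zlib-compress` is hypothetical, and the trailer is encoded with the built-in `integer->integer-bytes` rather than the hex-string helper, an assumption made here so that the checksum always occupies four bytes.

```racket
#lang racket/base
(require file/gzip)

;; Adler-32 per RFC 1950: two running sums modulo 65521,
;; combined as (b << 16) | a.
(define (adler32 bs)
  (define-values (a b)
    (for/fold ([a 1] [b 0]) ([byte (in-bytes bs)])
      (define next-a (modulo (+ a byte) 65521))
      (values next-a (modulo (+ b next-a) 65521))))
  (bitwise-ior (arithmetic-shift b 16) a))

;; Hypothetical helper: wrap raw DEFLATE output in a zlib envelope.
(define (zlib-compress bs)
  (define out (open-output-bytes))
  ;; Two-byte header: DEFLATE, 32K window, default level, no dictionary.
  (write-bytes (bytes #x78 #x9c) out)
  ;; Raw DEFLATE stream produced by file/gzip.
  (deflate (open-input-bytes bs) out)
  ;; Trailer: Adler-32 of the uncompressed data, 4 bytes, big-endian.
  (write-bytes (integer->integer-bytes (adler32 bs) 4 #f #t) out)
  (get-output-bytes out))
```

As a sanity check on the checksum, the Adler-32 of the ASCII string "Wikipedia" is commonly given as 0x11E60398.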

Tony Garnock-Jones

Jan 13, 2017, 8:32:51 AM
to Lehi Toskin, Racket Users
On 01/12/2017 11:32 PM, Lehi Toskin wrote:
> P.S. I didn't see an implementation of ADLER32 anywhere, so I had to write my own, which took a little longer than expected, but oh well.

Oh, cool. That'd probably be a useful thing for Racket's
net/git-checkout module, which has a piece of code in `zlib-inflate`
that reads:

...
(inflate i o)
;; Verify checksum?
(read-bytes-exactly 'adler-checksum 4 i)
...

Perhaps you could contribute your adler32 implementation, and it could
be used there. (And perhaps that `zlib-inflate` function deserves being
pulled out into a separate module. Hmm.)
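
To make the "Verify checksum?" idea concrete, here is a sketch of what a checked unwrapping step could look like. It is not the net/git-checkout code: it works on a complete byte string instead of a port, the name `zlib-decompress/verify` is hypothetical, and it assumes no preset dictionary is present.

```racket
#lang racket/base
(require file/gunzip)

;; Adler-32 per RFC 1950: two running sums modulo 65521.
(define (adler32 bs)
  (define-values (a b)
    (for/fold ([a 1] [b 0]) ([byte (in-bytes bs)])
      (define next-a (modulo (+ a byte) 65521))
      (values next-a (modulo (+ b next-a) 65521))))
  (bitwise-ior (arithmetic-shift b 16) a))

;; Hypothetical helper: unwrap a complete zlib stream held in a byte
;; string and check its Adler-32 trailer.
(define (zlib-decompress/verify bs)
  (define len (bytes-length bs))
  (unless (>= len 6)
    (error 'zlib-decompress/verify "input too short for a zlib stream"))
  (unless (= 8 (bitwise-and (bytes-ref bs 0) #x0F))
    (error 'zlib-decompress/verify "compression method is not DEFLATE"))
  (when (bitwise-bit-set? (bytes-ref bs 1) 5)
    (error 'zlib-decompress/verify "preset dictionaries are not handled here"))
  ;; Inflate everything between the 2-byte header and the 4-byte trailer.
  (define out (open-output-bytes))
  (inflate (open-input-bytes (subbytes bs 2 (- len 4))) out)
  (define data (get-output-bytes out))
  ;; The trailer is an unsigned, big-endian 32-bit integer.
  (define expected (integer-bytes->integer bs #f #t (- len 4) len))
  (unless (= expected (adler32 data))
    (error 'zlib-decompress/verify "Adler-32 mismatch"))
  data)
```

Slicing the byte string avoids relying on how much input `inflate` consumes from a port, at the cost of needing the whole stream in memory.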

Cheers,
Tony

Tony Garnock-Jones

Jan 13, 2017, 8:47:40 AM
to Lehi Toskin, Racket Users
On 01/12/2017 11:32 PM, Lehi Toskin wrote:
> `number->bytes` is a function I made; its definition is:
> (define (number->bytes num)
>   (define hex (number->string num 16))
>   ; from file/sha1
>   (hex-string->bytes (if (even? (string-length hex)) hex (string-append "0" hex))))

You might be able to use Racket's built-in `integer->integer-bytes`
here:
http://docs.racket-lang.org/reference/generic-numbers.html?q=integer%20bytes#%28def._%28%28quote._~23~25kernel%29._integer-~3einteger-bytes%29%29
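
A short comparison of the two approaches, as a sketch: the hex-string helper drops leading zero bytes, so a checksum whose top byte is zero comes out shorter than the four bytes RFC 1950 expects, while `integer->integer-bytes` with a size of 4 always pads. The name `adler32->bytes` is made up for this example.

```racket
#lang racket/base
(require file/sha1)

;; The hex-string helper quoted above, reproduced for comparison.
(define (number->bytes num)
  (define hex (number->string num 16))
  (hex-string->bytes (if (even? (string-length hex)) hex (string-append "0" hex))))

;; Hypothetical helper: always 4 bytes, unsigned, big-endian.
(define (adler32->bytes n)
  (integer->integer-bytes n 4 #f #t))

;; #x00620062 is the Adler-32 of the one-byte input #"a".
(bytes-length (number->bytes #x00620062))   ; 3: the leading zero byte is lost
(bytes-length (adler32->bytes #x00620062))  ; 4
```

Short inputs are the likely trouble spot, since their checksums tend to have a zero high byte.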

Lehi Toskin

Jan 13, 2017, 3:01:19 PM
to Racket Users, lehi....@gmail.com
On Friday, January 13, 2017 at 5:32:51 AM UTC-8, Tony Garnock-Jones wrote:
>
> Oh, cool. That'd probably be a useful thing for Racket's
> net/git-checkout module, which has a piece of code in `zlib-inflate`
> that reads:
>
> ...
> (inflate i o)
> ;; Verify checksum?
> (read-bytes-exactly 'adler-checksum 4 i)
> ...
>
> Perhaps you could contribute your adler32 implementation, and it could
> be used there. (And perhaps that `zlib-inflate` function deserves being
> pulled out into a separate module. Hmm.)
>
> Cheers,
> Tony

Sounds good to me! I just made a PR.

reilithion

Dec 10, 2021, 5:29:17 PM
to Racket Users
Yes, please do this. I was in need of a zlib-inflate while working on a script to inspect Minecraft save data.

Stephen De Gabrielle

Dec 11, 2021, 8:36:29 AM
to reilithion, Racket Users
Please share!

S.


