How to debug "Not a valid zip file" from zip.OpenReader

1,982 views
Skip to first unread message

David Karr

unread,
Feb 23, 2022, 4:44:35 PM2/23/22
to golang-nuts
A while ago, I wrote a small Go app that reads things from zip files.  I tested it with various zip files (usually Java jar files), and it has always worked perfectly fine.

Today I'm looking at another jar file.  The Java "jar" command likes it perfectly fine. I could list the contents and extract all the files. Java had no particular trouble running with the jar file.

When I try to call "zip.OpenReader()" on this jar file, it just gives me ""zip: not a valid zip file"".  Well, great.

I then ran a command to run the same command on every jar file in my home directory tree, where there are a couple of thousand from various tool distributions.  I counted how many failed with that same error. I found just the one I already have.

This jar file is a little over 12mb.  Several of the jar files that don't fail are quite a bit larger than this. Most of the ones that don't fail are smaller than this. The permissions on this jar are the same as the others.

I then tried passing this jar file to "unzip". It was able to unzip it, but it did print this:

    warning [<jar file name>]:  3624 extra bytes at beginning or within zipfile
      (attempting to process anyway)

I tried this on some of the working jar files, and it didn't print this warning.

So, it appears that if anything, this jar file is "nonstandard", but both unzip and jar have no problem with it, and jar didn't even say there was an issue with it.  The zip package, however, bombs on this with no information.

I'm going to go to the people who gave me that jar file and ask them how they produced it, but I think it's clear it's not fatally broken.

I haven't submitted a bug report for this package yet, as I wanted some feedback on this first.

Ian Lance Taylor

unread,
Feb 23, 2022, 7:22:34 PM2/23/22
to David Karr, golang-nuts
The error is zip.ErrFormat. From a quick look at the sources it can
be returned if

- a file header does not start with the required signature (0x04034b50)
- a directory header does not start with the required signature (0x02014b50)
- a directory entry is too short to contain the additional information
it claims to hold
- a directory entry fails to contain the additional information it
claims to hold
- the directory end signature (0x06054b50) is not found
- the offset in the directory is invalid (negative or larger than the file size)

There is no especially easy way to find out exactly which one is
happening. You could add some print statements to
archive/zip/reader.go to narrow it down.

I'm not sure what to make of the "3624 extra bytes".

Not sure how much this helps. If you can share the zip file, you
could open a bug report.

Ian

David Karr

unread,
Feb 23, 2022, 7:44:05 PM2/23/22
to Ian Lance Taylor, golang-nuts

Pablo Caballero

unread,
Feb 23, 2022, 9:17:34 PM2/23/22
to David Karr, Ian Lance Taylor, golang-nuts
The file you are trying to unzip contains "garbage" at the beginning. Other implementations seem to look for the start signature and discard the noise (and complaint (unzip) /remain silent (jar) about it). I don't know whether there is a "standard" saying how an unzip implementation should behave. Golang's implementation seems to treat it as an error.

Look what your file looks like if you open it with a hex editor (noise from the download process I guess...)

image.png

Best!

Pablo

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/CAA5t8VpRkTniYnMKhQ%2BDfL8v57wM8Gj-4vHFq5i887Tw2Rwaeg%40mail.gmail.com.

Pablo Caballero

unread,
Feb 23, 2022, 9:22:53 PM2/23/22
to David Karr, Ian Lance Taylor, golang-nuts
Side note, be very very careful while uploading files... the HTTP conversation might have had sensible information.

Kurtis Rader

unread,
Feb 23, 2022, 9:37:25 PM2/23/22
to Pablo Caballero, David Karr, Ian Lance Taylor, golang-nuts
On Wed, Feb 23, 2022 at 6:17 PM Pablo Caballero <pdc...@gmail.com> wrote:
The file you are trying to unzip contains "garbage" at the beginning.

That garbage looks like the sort of HTTP transaction information you'll get from `curl -v` or something similar. In other words, someone inadvertently inserted "garbage" either when uploading the zip file that David downloaded or by someone, or some tool, on David's end when they downloaded the zip file.  Regardless, I don't think the Go zip package should silently ignore the unexpected bytes and would argue it's wrong for the Java implementation to do so. Whether the Go zip package should search for the start of the zip signature by skipping the unexpected prefix bytes and returning some indication it had done so is debatable. My vote is no. That sort of behavior is far too easy to result in an exploitable security vulnerability.

--
Kurtis Rader
Caretaker of the exceptional canines Junior and Hank

Mine GO BOOM

unread,
Feb 23, 2022, 11:02:08 PM2/23/22
to golang-nuts
The core library should be safe and strict. It wouldn't be hard to clone the library and make a more lenient version to be used by people who like to append zip files to the end of images.

Kurtis Rader

unread,
Feb 23, 2022, 11:40:34 PM2/23/22
to Mine GO BOOM, golang-nuts
On Wed, Feb 23, 2022 at 8:04 PM Mine GO BOOM <mineg...@gmail.com> wrote:
The core library should be safe and strict. It wouldn't be hard to clone the library and make a more lenient version to be used by people who like to append zip files to the end of images.

That should only be supported by whatever code opens the image file -- not a Go package (whether part of the Go stdlib or a third-party package) that reads "zip" files. The code reading that image file should only pass the "zip" content to the Zip format decoder if it has some confidence the content is in the "zip" format. And the Zip format decoder should still report an error if the content is not valid according to the specification.
 
On Wednesday, February 23, 2022 at 6:37:25 PM UTC-8 Kurtis Rader wrote:
On Wed, Feb 23, 2022 at 6:17 PM Pablo Caballero <pdc...@gmail.com> wrote:
The file you are trying to unzip contains "garbage" at the beginning.

That garbage looks like the sort of HTTP transaction information you'll get from `curl -v` or something similar. In other words, someone inadvertently inserted "garbage" either when uploading the zip file that David downloaded or by someone, or some tool, on David's end when they downloaded the zip file.  Regardless, I don't think the Go zip package should silently ignore the unexpected bytes and would argue it's wrong for the Java implementation to do so. Whether the Go zip package should search for the start of the zip signature by skipping the unexpected prefix bytes and returning some indication it had done so is debatable. My vote is no. That sort of behavior is far too easy to result in an exploitable security vulnerability.
Reply all
Reply to author
Forward
0 new messages