wxZipStream - 4GB limit?


Volker Wichmann

Jan 20, 2018, 10:42:55 AM
to wx-...@googlegroups.com
Hi,

we (www.saga-gis.org) are using wxZipInputStream and wxZipOutputStream
to read and write compressed datasets. Now a user reported a problem
with large zip files. Writing those files does not result in an error,
but once the files are read back in, the following error is reported:
"Can't read inflate stream: unexpected EOF in underlying stream."

This all seems to happen once the zip files get larger than 4GB. After
the error, only a part of the dataset is loaded, most likely the part
above the 4GB. This reminds me of some 32bit limit. Does such a limit
exist? Or do you have any pointer where to have a look in the sources to
find this out?
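For context on why 4GB is the suspicious threshold: classic (non-ZIP64) ZIP headers store sizes and offsets in 32-bit fields, so a value of 4 GiB or more cannot be represented there, and ZIP64 exists to carry 64-bit values instead. The minimal sketch below is plain C++ rather than wxWidgets code, and `NarrowToHeaderField` is a hypothetical helper. It only shows what a 32-bit field would hold if a 64-bit size were narrowed into it without any ZIP64 handling.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: what a classic 32-bit ZIP header field would hold
// if a 64-bit size or offset were narrowed into it without ZIP64 handling.
// Unsigned narrowing keeps only the low 32 bits, i.e. the value modulo 2^32.
std::uint32_t NarrowToHeaderField(std::uint64_t value)
{
    return static_cast<std::uint32_t>(value);
}
```

For example, a 5 GiB entry would be recorded as 1 GiB, the kind of silent mismatch that only surfaces when the archive is read back.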

This happens with wxWidgets 3.1.1 (latest git master) on 64-bit MSW (Win7).

Thanks and best regards,
Volker

Vadim Zeitlin

Jan 20, 2018, 10:52:03 AM
to wx-...@googlegroups.com
On Sat, 20 Jan 2018 16:42:46 +0100 Volker Wichmann wrote:

VW> we (www.saga-gis.org) are using wxZipInputStream and wxZipOutputStream
VW> to read and write compressed datasets. Now a user reported a problem
VW> with large zip files. Writing those files does not result in an error,

Just to confirm, can they be read successfully by the other programs?

VW> but once the files are read back in, the following error is reported:
VW> "Can't read inflate stream: unexpected EOF in underlying stream."
VW>
VW> This all seems to happen once the zip files get larger than 4GB. After
VW> the error, only a part of the dataset is loaded, most likely the part
VW> above the 4GB. This reminds me of some 32bit limit. Does such a limit
VW> exist?

Well, it's not supposed to, since Tobias Taschner added support for ZIP64
a couple of years ago, but apparently it does...

VW> Or do you have any pointer where to have a look in the sources to
VW> find this out?

All the relevant code is in src/common/zipstrm.cpp, but the puzzling part
is that I don't see the error message you give in it, but only in
src/common/zstream.cpp which contains wxZlibInputStream class, which is
quite different from wxZipInputStream. So now I wonder which one you
actually use.

Regards,
VZ

Volker Wichmann

Jan 20, 2018, 11:28:50 AM
to wx-...@googlegroups.com
On 01/20/2018 04:52 PM, Vadim Zeitlin wrote:
> Just to confirm, can they be read successfully by the other programs?

No, it seems they cannot. I just tried with unzip (under Linux) and got:

1266952293 extra bytes at beginning or within zipfile
(attempting to process anyway)
error [file.sg-pts-z]: start of central directory not found;
zipfile corrupt.
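unzip locates the central directory through the End Of Central Directory (EOCD) record at the end of the file, which stores the central directory's offset as a 32-bit value. When that offset does not fit, the field is set to 0xFFFFFFFF and the real offset has to be found via the separate ZIP64 end-of-central-directory locator and record. The sketch below follows the record layout from PKWARE's APPNOTE; the helper names are hypothetical and this is not how unzip or wxWidgets implement the search.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Read a little-endian 32-bit value (ZIP stores all integers little-endian).
std::uint32_t ReadLE32(const std::vector<std::uint8_t>& b, std::size_t pos)
{
    return std::uint32_t(b[pos]) | (std::uint32_t(b[pos + 1]) << 8) |
           (std::uint32_t(b[pos + 2]) << 16) | (std::uint32_t(b[pos + 3]) << 24);
}

// Scan backwards for the EOCD signature "PK\5\6" (0x06054b50).
// The EOCD record is at least 22 bytes long and may be followed by a
// comment of up to 65535 bytes, so only that tail needs to be searched.
std::optional<std::size_t> FindEocd(const std::vector<std::uint8_t>& b)
{
    const std::size_t kMinEocd = 22;
    if (b.size() < kMinEocd)
        return std::nullopt;
    const std::size_t last = b.size() - kMinEocd;
    const std::size_t first = last > 65535 ? last - 65535 : 0;
    for (std::size_t pos = last + 1; pos-- > first; )
    {
        if (ReadLE32(b, pos) == 0x06054b50u)
            return pos;
    }
    return std::nullopt;
}

// Central directory offset as stored in the classic EOCD record (byte 16).
// 0xFFFFFFFF means "use the ZIP64 end of central directory record instead".
std::uint32_t CentralDirOffset(const std::vector<std::uint8_t>& b, std::size_t eocd)
{
    return ReadLE32(b, eocd + 16);
}
```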

> All the relevant code is in src/common/zipstrm.cpp, but the puzzling part
> is that I don't see the error message you give in it, but only in
> src/common/zstream.cpp which contains wxZlibInputStream class, which is
> quite different from wxZipInputStream. So now I wonder which one you
> actually use.

This is strange; we include <wx/zipstrm.h> and use
wxZipInput/OutputStream. Are we missing a build option or something like
that? BTW, we are still using MSVC 2010.


Regards,
Volker

Vadim Zeitlin

Jan 20, 2018, 11:32:53 AM
to wx-...@googlegroups.com
On Sat, 20 Jan 2018 17:28:47 +0100 Volker Wichmann wrote:

VW> On 01/20/2018 04:52 PM, Vadim Zeitlin wrote:
VW> > Just to confirm, can they be read successfully by the other programs?
VW>
VW> No, it seems they cannot. I just tried with unzip (under Linux) and got:
VW>
VW> 1266952293 extra bytes at beginning or within zipfile
VW> (attempting to process anyway)
VW> error [file.sg-pts-z]: start of central directory not found;
VW> zipfile corrupt.

I really wonder if you're creating ZIP files at all. Does it look like
one? E.g. does it start with "PK"?
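This can also be checked programmatically: a non-empty ZIP archive normally begins with a local file header, whose signature is the bytes 'P' 'K' 0x03 0x04, whereas a zlib stream produced with zlib's default settings typically begins with 0x78. The helper names below are hypothetical. Note that an empty archive starts with the end-of-central-directory signature "PK\5\6" instead, which this sketch does not accept.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Checks whether a byte buffer begins with a ZIP local file header
// signature: 'P' 'K' 0x03 0x04 (0x04034b50 read little-endian).
bool StartsWithZipLocalHeader(const unsigned char* data, std::size_t size)
{
    return size >= 4 && data[0] == 'P' && data[1] == 'K' &&
           data[2] == 0x03 && data[3] == 0x04;
}

// Reads the first four bytes of a file and applies the check above.
// A missing or short file simply yields false.
bool FileLooksLikeZip(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    unsigned char head[4] = {};
    in.read(reinterpret_cast<char*>(head), sizeof head);
    return StartsWithZipLocalHeader(head, static_cast<std::size_t>(in.gcount()));
}
```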

VW> > All the relevant code is in src/common/zipstrm.cpp, but the puzzling part
VW> > is that I don't see the error message you give in it, but only in
VW> > src/common/zstream.cpp which contains wxZlibInputStream class, which is
VW> > quite different from wxZipInputStream. So now I wonder which one you
VW> > actually use.
VW>
VW> This is strange; we include <wx/zipstrm.h> and use
VW> wxZipInput/OutputStream. Are we missing a build option or something like
VW> that? BTW, we are still using MSVC 2010.

No, there are no build options that could replace wxZipInputStream with
wxZlibInputStream (although there are options that could disable one or
both of them). And, at least with the current master, the error you're
seeing can only come from wxZlibInputStream::OnSysRead(), so something
seems very wrong here.

Regards,
VZ

Volker Wichmann

Jan 20, 2018, 12:08:38 PM
to wx-...@googlegroups.com
On 01/20/2018 05:32 PM, Vadim Zeitlin wrote:
> I really wonder if you're creating ZIP files at all. Does it look like
> one? E.g. does it start with "PK"?

Yes, it starts with "PK". And everything works fine with smaller files.

> No, there are no build options that could replace wxZipInputStream with
> wxZlibInputStream (although there are options that could disable one or
> both of them). And, at least with the current master, the error you're
> seeing can only come from wxZlibInputStream::OnSysRead(), so something
> seems very wrong here.

It is possible that the SAGA version that showed the error was
built with wxWidgets 3.1.0 and not the latest master; I will check that
again. But we verified that it still does not work with 3.1.1,
maybe without the error message. When I load the file with my Linux
build, I do not get the message, and loading fails silently.

Vadim Zeitlin

Jan 20, 2018, 12:14:58 PM
to wx-...@googlegroups.com
On Sat, 20 Jan 2018 18:08:35 +0100 Volker Wichmann wrote:

VW> On 01/20/2018 05:32 PM, Vadim Zeitlin wrote:
VW> > I really wonder if you're creating ZIP files at all. Does it look like
VW> > one? E.g. does it start with "PK"?
VW>
VW> Yes, it starts with "PK". And everything works fine with smaller files.

OK, this rules out accidentally using zlib, thanks.

VW> > No, there are no build options that could replace wxZipInputStream with
VW> > wxZlibInputStream (although there are options that could disable one or
VW> > both of them). And, at least with the current master, the error you're
VW> > seeing can only come from wxZlibInputStream::OnSysRead(), so something
VW> > seems very wrong here.
VW>
VW> It is possible that the SAGA version that showed the error was
VW> built with wxWidgets 3.1.0 and not the latest master; I will check that
VW> again.

After a quick look at this file's history, I don't see any trace of this
error message ever having been in it. But I could be missing
something.

VW> But we verified that it still does not work with 3.1.1, maybe
VW> without the error message. When I load the file with my Linux
VW> build, I do not get the message, and loading fails silently.

Well, this is bad enough :-( If you could please debug it, it would
definitely be very welcome.

TIA,
VZ

Tobias T

Jan 20, 2018, 5:48:50 PM
to wx-dev
If I remember correctly (it has been a while), ZIP64 output can only work if
a size has been specified when using PutNextEntry(), which means the size has to
be known in advance.
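This is consistent with the ZIP layout: an entry's local header, including any ZIP64 extra field, is written before the entry's data, so a writer has to decide whether to reserve room for 64-bit sizes before the final sizes are known. As a sketch based on PKWARE's APPNOTE (not on wxWidgets' code; `MakeLocalZip64Extra` and `PutLE` are hypothetical names), the ZIP64 extra field that goes into a local header looks like this:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends a value as `bytes` little-endian bytes (ZIP integers are little-endian).
void PutLE(std::vector<std::uint8_t>& out, std::uint64_t value, int bytes)
{
    for (int i = 0; i < bytes; ++i)
        out.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
}

// Builds the ZIP64 extended information extra field (header ID 0x0001)
// as it appears in a *local* file header. The APPNOTE requires both the
// original and the compressed size there, 8 bytes each, so the payload
// is 16 bytes. The local header's own 32-bit size fields are then set
// to 0xFFFFFFFF to point readers at this field.
std::vector<std::uint8_t> MakeLocalZip64Extra(std::uint64_t uncompressedSize,
                                              std::uint64_t compressedSize)
{
    std::vector<std::uint8_t> extra;
    PutLE(extra, 0x0001, 2);           // header ID: ZIP64 extended information
    PutLE(extra, 16, 2);               // size of the data that follows
    PutLE(extra, uncompressedSize, 8); // original (uncompressed) size
    PutLE(extra, compressedSize, 8);   // compressed size
    return extra;
}
```

Because the field has a fixed size, a seekable writer can reserve it up front and overwrite the two sizes once the entry is finished.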

Volker Wichmann

Jan 21, 2018, 10:15:37 AM
to wx-...@googlegroups.com
Hi,

On 01/20/2018 11:48 PM, Tobias T wrote:
> If I remember correctly (it has been a while), ZIP64 output can only work if
> a size has been specified when using PutNextEntry(), which means the
> size has to be known in advance.

thanks for the hint; we definitely do not set a size when we create a
new entry with PutNextEntry(). I will try to test this soon and report
back on whether this was the problem.

Best regards,
Volker

Volker Wichmann

Feb 5, 2018, 12:23:10 PM
to wx-...@googlegroups.com
On 01/20/2018 11:48 PM, Tobias T wrote:
> If I remember correctly (it has been a while), ZIP64 output can only work if
> a size has been specified when using PutNextEntry(), which means the
> size has to be known in advance.

Hi,

I'm having a hard time debugging this because of the large file sizes
and absolutely no experience with the ZIP file format. So I can just
report some of the observations I made in the hope they will give
someone some hints on what is going on.

- I did specify a size when using PutNextEntry(), and this triggers
ZIP64 when the size exceeds 0xffffffff. Nevertheless, I still get errors
when reading back the created files.

Side note: debugging makes me think that it is not necessary to specify
the exact file size to trigger ZIP64 as long as the size is big enough
because both "size" and "compressed_size" seem to be recalculated in
CloseEntry() called from ~wxZipOutputStream(). If that is true, it would
facilitate the usage of ZIP64 (and could possibly be made a boolean choice).
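The observation that the declared size only needs to be large enough can be modelled like this: the declared size decides, up front, whether a ZIP64 slot is reserved, and the real sizes are filled in when the entry is closed. This is a hypothetical sketch (`EntryPlan`, `PlanEntry` and `CanRecordActualSizes` are invented names, not wxWidgets internals). It uses 0xFFFFFFFF as the threshold because ZIP64-aware readers treat that value in a 32-bit field as a pointer to the ZIP64 extra field, so that exact value already requires ZIP64.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// A 32-bit field holding 0xFFFFFFFF means "see the ZIP64 extra field".
constexpr std::uint64_t kZip64Sentinel = 0xFFFFFFFFull;

// Hypothetical writer-side bookkeeping for one entry.
struct EntryPlan
{
    bool zip64Reserved; // was room for the ZIP64 extra field reserved up front?
};

// Decided when the entry is started, from the declared size only.
// An unknown size (std::nullopt) reserves no ZIP64 field in this sketch.
EntryPlan PlanEntry(std::optional<std::uint64_t> declaredSize)
{
    return EntryPlan{declaredSize.has_value() && *declaredSize >= kZip64Sentinel};
}

// When the entry is closed the real sizes are known. They can be recorded
// only if they fit the slots that were reserved earlier.
bool CanRecordActualSizes(const EntryPlan& plan,
                          std::uint64_t uncompressed, std::uint64_t compressed)
{
    const bool needsZip64 =
        uncompressed >= kZip64Sentinel || compressed >= kZip64Sentinel;
    return plan.zip64Reserved || !needsZip64;
}
```

Under this model, declaring any size of at least 0xFFFFFFFF acts as a boolean "use ZIP64" switch, since the exact declared value is overwritten when the entry is closed.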


However, there are two different scenarios, both resulting in errors,
but interestingly in different ones:

- the data to zip is > 4GB, but the zipped archive is < 4GB:
in this case wxZipInputStream reports "reading zip stream (entry %s):
bad length"; extracting the file with 7z reports "Headers Error" and
"CRC failed"

Interestingly the extracted file is complete and correct despite the
errors reported.


- both the data to zip and the zipped archive are > 4GB:
in this case, wx reports "Can't read inflate stream: unexpected EOF in
underlying stream"; 7z reports "Headers Errors; Unconfirmed start of
archive; Warnings: There are some data after the end of the payload
data" and "CRC failed"

The extracted file only contains a portion of the original file and is
incorrect.


So it seems that all data gets written (as also indicated by the file sizes
of the ZIP archives), but that there are some problems with the header
entries / offsets. There are checks for both cases in many places, and
unfortunately I cannot tell where it is failing.
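One way to read the two scenarios is to look at which 32-bit fields overflow in each. If only the uncompressed data exceeds 4GB, only the uncompressed size of the large entry needs ZIP64. Once the archive itself exceeds 4GB, the compressed size of a large entry and the offsets of everything stored past the 4GB mark (local header offsets in the central directory, and the central directory offset in the end record) need it as well. The sketch below is a hypothetical classification helper, not code from wxWidgets, and it ignores other ZIP64 triggers such as entry counts or the central directory size.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kSentinel32 = 0xFFFFFFFFull;

// Which classic 32-bit fields of one entry need a ZIP64 replacement.
struct Zip64Needs
{
    bool uncompressedSize;
    bool compressedSize;
    bool localHeaderOffset; // stored in the entry's central directory record
};

Zip64Needs ClassifyEntry(std::uint64_t uncompressed, std::uint64_t compressed,
                         std::uint64_t localHeaderOffset)
{
    return Zip64Needs{uncompressed >= kSentinel32,
                      compressed >= kSentinel32,
                      localHeaderOffset >= kSentinel32};
}

// The end of central directory record stores the central directory offset
// in 32 bits; past 4GB the ZIP64 end of central directory record is needed.
bool NeedsZip64EndRecord(std::uint64_t centralDirOffset)
{
    return centralDirOffset >= kSentinel32;
}
```

For example, a small metadata file stored after a 6 GiB entry has small sizes but an offset that only fits in a ZIP64 field.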

Regards,
Volker

Tobias T

Feb 5, 2018, 4:09:44 PM
to wx-dev
Thanks for the update, I'll try looking into this.

Tobias T

Feb 7, 2018, 3:40:39 PM
to wx-dev
I've looked into this and there are definitely some issues in the current implementation if single files in the archive are larger than 4GB.
I'll try to come up with a solution.


On Monday, February 5, 2018 at 18:23:10 UTC+1, Volker Wichmann wrote:

Tobias T

Feb 14, 2018, 2:56:40 PM
to wx-dev
Hi Volker,
have a look at my PR here:
https://github.com/wxWidgets/wxWidgets/pull/730

This should fix the creation of ZIP files containing files larger than 4GB.


On Monday, February 5, 2018 at 18:23:10 UTC+1, Volker Wichmann wrote:

Volker Wichmann

Feb 15, 2018, 11:05:03 AM
to wx-...@googlegroups.com
Hi Tobias,

On 02/14/2018 08:56 PM, Tobias T wrote:
> Hi Volker,
> have a look at my PR here:
> https://github.com/wxWidgets/wxWidgets/pull/730
>
> This should fix the creation of ZIP files containing files larger than 4GB.

thanks a lot for your work on this! I just finished some tests and most
things seem to work, I found only one case in which I encounter problems.

Note: all tests are done by setting the format to wxZIP_FORMAT_ZIP64
(thanks much for that option, facilitates usage a lot!) and using plain
PutNextEntry(pEntry), i.e. without specifying output file size.

- file(s) to zip < 4GB and zip archive < 4GB: saving and loading OK

- file(s) to zip > 4GB and zip archive > 4GB: saving and loading OK

- file(s) to zip > 4GB and zip archive < 4GB: saving OK, on loading I
get an error

Details on the last case: the archive contains a large data file (>4GB)
and a small XML metadata file. On loading, the data file is correct, but
I get an error with the XML file:

Can't read from inflate stream: invalid stored block lengths
XML parsing error: 'no element found' at line 1

Interestingly, unzipping the archive with 7z shows no problems and the
XML file is correct. So maybe something is missing in the
wxZipInputStream implementation for this particular case?
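For reference, zlib reports "invalid stored block lengths" when a stored (uncompressed) deflate block's 16-bit LEN field is not the one's complement of the NLEN field that follows it (RFC 1951, section 3.2.4). Hitting it on a small entry suggests, though the thread does not confirm this, that inflation started at the wrong position in the archive, so unrelated bytes were parsed as a block header. The check itself is tiny; `StoredBlockLengthsValid` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// In a stored deflate block, the byte-aligned header is LEN followed by
// NLEN, both 16-bit little-endian, with NLEN == ~LEN (RFC 1951, 3.2.4).
// zlib rejects the block with "invalid stored block lengths" otherwise.
bool StoredBlockLengthsValid(const std::uint8_t hdr[4])
{
    const std::uint16_t len  = static_cast<std::uint16_t>(hdr[0] | (hdr[1] << 8));
    const std::uint16_t nlen = static_cast<std::uint16_t>(hdr[2] | (hdr[3] << 8));
    return len == static_cast<std::uint16_t>(~nlen);
}
```

Arbitrary bytes, such as the start of an XML declaration, almost never satisfy this relation, which is why a misplaced read position tends to fail here.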


Minor: I came across two typos:

interface/wx/zipstrm.h, line 602:
limits are exceeded (instead of exceed)

src/common/zipstrm.cpp, line 2602:
file too large (instead of to)


Thanks again for your help,
Volker

Tobias T

Feb 15, 2018, 2:12:17 PM
to wx-dev
There was an issue when reading 64-bit values from ZIP headers, which I have now fixed.
It might explain the issue with the third case you were having.
Please retest with the latest version of the PR.
If it still does not work, maybe send the file sizes and order of the files in the failing ZIP, but the fact
that it worked with 7-Zip makes it plausible that the 64-bit values were the issue.
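The thread does not describe the exact bug, so the following is only a generic sketch of how 64-bit little-endian header fields are read correctly, and how a read limited to 32 bits loses everything above 4GB. Both function names are hypothetical; `ReadLE64Truncating` is deliberately wrong, for comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reads an unsigned little-endian integer of `bytes` bytes (ZIP64 fields are 8).
// Each byte is widened to 64 bits before shifting: shifting a 32-bit type
// by 32 or more bits is undefined behaviour and a classic source of
// truncated 64-bit values.
std::uint64_t ReadLE(const std::uint8_t* p, std::size_t bytes)
{
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < bytes; ++i)
        value |= static_cast<std::uint64_t>(p[i]) << (8 * i);
    return value;
}

// Deliberately wrong variant: it only accumulates the low four bytes into
// a 32-bit variable, so anything above 4GB is silently dropped.
std::uint64_t ReadLE64Truncating(const std::uint8_t* p)
{
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i)
        value |= static_cast<std::uint32_t>(p[i]) << (8 * i);
    return value;
}
```

With a field holding 5 GiB (0x140000000), the correct reader returns 5 GiB while the truncating one returns 1 GiB, so any offset computed from it points into the wrong part of the archive.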


On Saturday, January 20, 2018 at 16:42:55 UTC+1, Volker Wichmann wrote:

Volker Wichmann

Feb 16, 2018, 5:26:17 AM
to wx-...@googlegroups.com
On 02/15/2018 08:12 PM, Tobias T wrote:
> There was an issue when reading 64-bit values from ZIP headers, which I
> have now fixed.
> It might explain the issue with the third case you were having.
> Please retest with the latest version of the PR.
> If it still does not work, maybe send the file sizes and order of the
> files in the failing ZIP, but the fact that it worked with 7-Zip makes
> it plausible that the 64-bit values were the issue.
I can confirm that this commit fixes the problem and makes all three use
cases work. Great!

Thanks a lot for your work!

Looking forward to getting the pull request merged,
best regards,
Volker

