
java.util.zip Limitations


Gerry Wheeler

Apr 21, 2004, 9:11:00 PM
I have some ZIP files that apparently can't be opened by
java.util.zip.ZipFile, although they work fine with various Linux zip
utilities. ZipFile throws a ZipException in <init>, but the message in the
exception doesn't say exactly why. (And, of course, I've closed the window
where it used to be. I can get it again if needed.)

A) Does anyone have experience with various flavors of zip files, and any
hints?

B) What utilities might I use to get more information about my zip files
to see if they are, in fact, incompatible with java.util.zip? (I'm playing
on Linux, but I have Windows available if needed.)
--
Gerry Wheeler
Naples, FL

Andrew Thompson

Apr 22, 2004, 9:41:28 AM
On Thu, 22 Apr 2004 01:11:00 GMT, Gerry Wheeler wrote:

> I have some ZIP files that apparently can't be opened by
> java.util.zip.ZipFile, although they work fine with various Linux zip
> utilities. ZipFile throws a ZipException in <init>, but the message in the
> exception doesn't say exactly why. (And, of course, I've closed the window
> where it used to be. I can get it again if needed.)

Well ..D'uh!
<http://www.physci.org/codes/javafaq.jsp#exact>

> A) Does anyone have experience with various flavors of zip files, and any
> hints?

Yes. Ask smart qns. quoting the exact error..

Oh, what? You mean tips w/ zips?

1 or 2

> B) What utilities might I use to get more information about my zip files
> to see if they are, in fact, incompatible with java.util.zip? (I'm playing
> on Linux, but I have Windows available if needed.)

I am unfamiliar w/ Linux (I'm sure one
of the many Linux folks will wander by
in a moment) but would recommend you
have a look at the zip file in WinZip
on the Windows PC to check it is made
with standard compression.

Also check the compression level, I
know Java cannot write in anything but
default compression level, but am
not sure whether it can read higher levels.

If the file is either, make a new one
with standard compression type/level
and try that.

HTH

--
Andrew Thompson
http://www.PhySci.org/ Open-source software suite
http://www.PhySci.org/codes/ Web & IT Help
http://www.1point1C.org/ Science & Technology

Roedy Green

Apr 22, 2004, 1:21:41 PM
On Thu, 22 Apr 2004 01:11:00 GMT, "Gerry Wheeler"
<gwhe...@qolpublishing.com> wrote or quoted :

>
>A) Does anyone have experience with various flavors of zip files, and any
>hints?

Java can only read Zip files it created itself. It does not support
the wide variety of packing algorithms, some proprietary that PkZip
and WinZip use. For that you must exec.

see http://mindprod.com/jgloss/zip.html

--
Canadian Mind Products, Roedy Green.
Coaching, problem solving, economical contract programming.
See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.

Andrew Thompson

Apr 22, 2004, 2:38:08 PM
On Thu, 22 Apr 2004 17:21:41 GMT, Roedy Green wrote:

> Java can only read Zip files it created itself.

Not so. I have several applets that read
data from zip files created with WinZip.

Gerry Wheeler

Apr 23, 2004, 6:00:47 PM
On Thu, 22 Apr 2004 13:41:28 +0000, Andrew Thompson wrote:

> If the file is either, make a new one with standard compression type/level
> and try that.

I can't change the zip file. It is one of a series of files distributed by
the National Weather Service via a data stream, and I'm hoping to be able
to unzip it on the fly as I receive the data stream. I have to handle
whatever they send.

Andrew Thompson

Apr 23, 2004, 9:05:45 PM

O...K..... but what about the information..

(A.T. previously)
[ ....would recommend you

have a look at the zip file in WinZip
on the Windows PC to check it is made
with standard compression.

Also check the compression level, I
know Java cannot write in anything but
default compression level, but am

not sure whether it can read higher levels. ]

So.. what are the compression level/types
of the supplied Zips?

If the Zip files supplied are standard,
the problem is your code.

How big are the Zips? If you can find one
under 50Kb I'll have a look at it myself..

Gerry Wheeler

Apr 23, 2004, 10:46:40 PM
On Sat, 24 Apr 2004 01:05:45 +0000, Andrew Thompson wrote:
> O...K..... but what about the information..

Two things: a) I recompiled my program with the zip code in place so I
could recreate the exception, and b) I found the Linux zipinfo program
which gives oodles of information about the zip file.

So, the exception is:

java.util.zip.ZipException: error in opening zip file
    at java.util.zip.ZipFile.open(Native Method)
    at java.util.zip.ZipFile.<init>(ZipFile.java:112)
    at java.util.zip.ZipFile.<init>(ZipFile.java:128)
    at kg4nbb.emwin.EmwinPacketProcessor.decompressProduct(EmwinPacketProcessor.java:258)
    at kg4nbb.emwin.EmwinPacketProcessor.processPacket(EmwinPacketProcessor.java:120)
    at kg4nbb.emwin.EmwinIngestor.ingest(EmwinIngestor.java:188)
    at kg4nbb.emwin.EmwinIngestor.main(EmwinIngestor.java:74)

[That might wrap a little, but I think it's readable.] As I mentioned
previously, it's not particularly informative. I thought at first it might
have to do with permissions, or looking in the wrong directory or
something. But the ZipFile object is instantiated with a File object, and
tests on that indicate the file really does exist.

Here's the information about one of the offending zip files, produced by
zipinfo:

Archive: AFMMHXNC.ZIS 3072 bytes 1 file

End-of-central-directory record:
-------------------------------

Actual offset of end-of-central-dir record: 2477 (000009ADh)
Expected offset of end-of-central-dir record: 2477 (000009ADh)
(based on the length of the central directory and its expected offset)

This zipfile constitutes the sole disk of a single-part archive; its
central directory contains 1 entry. The central directory is 58
(0000003Ah) bytes long, and its (expected) offset in bytes from the
beginning of the zipfile is 2419 (00000973h).

There is no zipfile comment.

Central directory entry #1:
---------------------------

AFMMHXNC.TXT

offset of local header from start of archive: 0 (00000000h) bytes
file system or operating system of origin: MS-DOS, OS/2 or NT FAT
version of encoding software: 2.0
minimum file system compatibility required: MS-DOS, OS/2 or NT FAT
minimum software version required to extract: 2.0
compression method: deflated
compression sub-type (deflation): normal
file security status: not encrypted
extended local header: no
file last modified on (DOS date/time): 2004 Apr 22 00:38:20
32-bit CRC value (hex): 75e0f21d
compressed size: 2377 bytes
uncompressed size: 15643 bytes
length of filename: 12 characters
length of extra field: 0 bytes
length of file comment: 0 characters
disk number on which file begins: disk 1
apparent file type: text
non-MSDOS external file attributes: 000000 hex
MS-DOS file attributes (20 hex): arc

There is no file comment.

(Yes, for whatever reason, the National Weather Service uses the extension
ZIS on their zip files; I dunno why.) From what I understand about zip
files, it all looks pretty standard. I'm not sure, though, about the need
for version 2.0 software. This is on Linux, so I must presume they're
referring to the version of zip, unzip, etc. Do the Java classes meet that
requirement?

Gerry Wheeler

Apr 23, 2004, 10:48:42 PM
On Sat, 24 Apr 2004 02:46:40 +0000, Gerry Wheeler wrote:
> Here's the information about one of the offending zip files, produced by
> zipinfo:

Hmmm, the rewrapping on that (in my newsreader) makes it rather hard to
read. I can repost or send the output directly if needed.

Andrew Thompson

Apr 23, 2004, 11:19:24 PM
On Sat, 24 Apr 2004 02:46:40 GMT, Gerry Wheeler wrote:

> On Sat, 24 Apr 2004 01:05:45 +0000, Andrew Thompson wrote:
>> O...K..... but what about the information..

> So, the exception is:

NOW we are cooking with gas!

> java.util.zip.ZipException: error in opening zip file

....
> kg4nbb.emwin.EmwinPacketProcessor.decompressProduct(EmwinPacketProcessor.java:258)
> at

OK. Did you write 'EmwinPacketProcessor'?

What's at line 258?

> [That might wrap a little, but I think it's readable.]

It wraps, but it is fine.

>..As I mentioned


> previously, it's not particularly informative.

On the contrary, you just have to figure how
to read them and work out what to do with the
information gained.

>..I thought at first it might


> have to do with permissions,

Almost certainly not. Those generally
mention SecurityAccessExceptions,
SecurityManagers and such, rather than
'..ZipException: error in opening zip file'

>..or looking in the wrong directory or


> something. But the ZipFile object is instantiated with a File object, and
> tests on that indicate the file really does exist.

Tests inside your code?

Are they before/after line 258?

> Here's the information about one of the offending zip files, produced by
> zipinfo:

...


> Archive: AFMMHXNC.ZIS 3072 bytes 1 file

.....


> compression method: deflated
> compression sub-type (deflation): normal
> file security status: not encrypted

That is all looking pretty standard..

....


> (Yes, for whatever reason, the National Weather Service uses the extension
> ZIS on their zip files; I dunno why.)

"We are individuals"? ;-)

>...From what I understand about zip


> files, it all looks pretty standard.

Agreed.

>..I'm not sure, though, about the need
> for version 2.0 software.

I do not think that is relevant. But am not sure.

Next step.

Try this with a Java program that
_can_ read Zip files.
<http://www.physci.org/applet/ziplet.jsp>

Drop the 3/4Kb zip to 'physci' at the main domain,
and I'll give it a go with Ziplet (I'll put it
up at my site if you like)

Roedy Green

Apr 24, 2004, 1:32:37 AM
On Fri, 23 Apr 2004 22:00:47 GMT, "Gerry Wheeler"
<gwhe...@qolpublishing.com> wrote or quoted :

>I have to handle
>whatever they send.

If they are using compression algorithms Java does not support, look
into the PkZip api. It has support for dozens of algorithms including
PkZip's proprietary super compressors.

see http://mindprod.com/jgloss/zip.html
http://mindprod.com/jgloss/pkzip.html

Roedy Green

Apr 24, 2004, 1:36:34 AM
On Thu, 22 Apr 2004 18:38:08 GMT, Andrew Thompson
<SeeMy...@www.invalid> wrote or quoted :

>> Java can only read Zip files it created itself.
>
>Not so. I have several applets that read
>data from zip files created with WinZip.

It depends. If you are lucky, Winzip and Pkzip-created files will use
only the compression algorithms that Java supports. IN GENERAL that
will not be so. However, you will often luck out, especially if you
have control of creating the zip and did not let it use anything
fancy.

I went through all this with The Replicator and finally decided my
best bet was to create and access zips only with Java to avoid WinZip
or PkZip getting tricky on me every once in a while.

See http://mindprod.com/products.html#REPLICATOR

Andrew Thompson

Apr 24, 2004, 1:56:04 AM
On Sat, 24 Apr 2004 05:36:34 GMT, Roedy Green wrote:

I got completely sidetracked by this,
but just thought I would add..

It does seem like the OP's Zip (or
rather .ZIS) files are standard algorithm/
default compression. AFAIU Java should
be able to read them.

_Now_ to the sidetrack...

(problems with zips)


> I went through all this with The Replicator

[ ;-) ]
Yes. I have played with Zips a bit
and fallen prey to most of the traps
you mention.

Fortunately I usually make the Zip files
I need, so I can ensure they are Java
compatible whether I roll a class to do
it in Java, use the jar tool, ..or get
lazy and drag'n'drop the files in my
(standard compression/algorithm default)
version of WinZip.

> ...and finally decided my


> best bet was to create and access zips only with Java to avoid WinZip
> or PkZip getting tricky on me every once in a while.

I still like to get stung occasionally
to warn me of problems I might face,
..for example, if I offered a signed
applet that advertised "Drag'n'Drop
Zip/UnZip".

But I can appreciate that if you have
a wealth of knowledge of a topic, that
would be of more hindrance than help.

Chris Uppal

Apr 24, 2004, 6:28:00 AM
Gerry Wheeler wrote:

This all looks normal to me (on the assumption that zipinfo converts the "disk
number" field to 1-based indexing for human consumption, I'd expect the value
in the file to be 0).

As Roedy has said, there are several other compression schemes (4 by my count,
although they come in several sub-flavours) that "real" Zip utilities from
PKWare can use (actually I've never seen any of them in the wild), but this
file advertises itself to use "deflate" compression (type 8) which *everyone*
can read.
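
As a cross-check that does not depend on zipinfo, the method code can also be read from Java itself. What follows is only a minimal sketch: the class ZipMethods and its two methods are made-up names for illustration, and it names only the two methods for which java.util.zip defines constants (STORED, code 0, and DEFLATED, code 8). It walks the entries with ZipInputStream and reports what each local header says.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper (not from the thread): reports each entry's compression method. */
final class ZipMethods {

    /** Maps the method code from the entry header to a readable name. */
    static String methodName(int method) {
        switch (method) {
            case ZipEntry.STORED:   return "stored";    // code 0, no compression
            case ZipEntry.DEFLATED: return "deflated";  // code 8, the common case
            default:                return "unknown (" + method + ")";
        }
    }

    /** Walks the entries front to back and returns one "name: method" line per entry. */
    static List<String> describe(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                lines.add(entry.getName() + ": " + methodName(entry.getMethod()));
            }
        }
        return lines;
    }
}
```

Calling ZipMethods.describe(new FileInputStream("AFMMHXNC.ZIS")) on the file discussed here should, if zipinfo is right, report a single deflated entry.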

The version field is also apparently OK. I have routinely used Java's Zip
stuff to read Zip files with versions of 2.0 and 2.3.

All of which makes:

> java.util.zip.ZipException: error in opening zip file
>     at java.util.zip.ZipFile.open(Native Method)
>     at java.util.zip.ZipFile.<init>(ZipFile.java:112)
>     at java.util.zip.ZipFile.<init>(ZipFile.java:128)

look extremely odd. Can you use Java to unzip *anything* on that machine ?
Maybe there's a problem with the installation of the native library. Also
worth testing to see if you can unzip the same file using Java on a Windows box
(most people have one they can reach, even if they are not willing to admit it
in public ;-)

If you like, and if you are allowed to, you can email the offending ZIS file to
me (note the spam blocker). I'm currently working on Zip file encoding and
decoding (in a different language) so I'm pretty well placed to say if there is
*really* anything wrong with the file. (And another test case never hurts ;-)


> (Yes, for whatever reason, the National Weather Service uses the extension
> ZIS on their zip files; I dunno why.)

Possibly to avoid under-developed email filters that refuse *all* attachments
with a .zip extension.

-- chris


Gerry Wheeler

Apr 24, 2004, 8:34:32 AM
On Sat, 24 Apr 2004 03:19:24 +0000, Andrew Thompson wrote:

> On Sat, 24 Apr 2004 02:46:40 GMT, Gerry Wheeler wrote:
>> java.util.zip.ZipException: error in opening zip file
> ....
>> kg4nbb.emwin.EmwinPacketProcessor.decompressProduct(EmwinPacketProcessor.java:258)
>> at
>
> OK. Did you write 'EmwinPacketProcessor'? What's at line 258?

Yes, that's my code. Line 258 is where I instantiate the ZipFile, like
this:

    File f = <get a File for the data received over the stream>;
    if (f.exists()) {
        ZipFile z = new ZipFile(f); // line 258
        // other stuff to deal with the ZipEntries
    }
}

>>..As I mentioned
>> previously, it's not particularly informative.
>
> On the contrary, you just have to figure how to read them and work out
> what to do with the information gained.

Yeah, but the message is "error in opening zip file". Well, yeah, thanks a
lot. :-) A little more detail would have been useful.



> Drop the 3/4Kb zip to 'physci' at the main domain, and I'll give it a go
> with Ziplet (I'll put it up at my site if you like)

OK, I'll send one over. Thanks.

Gerry Wheeler

Apr 24, 2004, 8:51:00 AM
On Sat, 24 Apr 2004 12:34:32 +0000, Gerry Wheeler wrote:
> OK, I'll send one over. Thanks.

As I was looking for a good sample to send to Andrew, I remembered
something. I don't think this will (or should) affect this, but you never
know...

The NWS data stream is sent as packets of information. But the people who
wrote it were not good at packet stuff, so all the packets are exactly
1024 bytes, padded with 0's if necessary. And there's no indicator in the
packet header of the length of the actual data. So, the last packet of the
file (including zip files) has a bunch of extra 0's on the end. My code
removes the 0's if it recognizes (from the file name) that the data is a
text file. But I didn't want to start messing with the various binary file
types, so I leave the 0's in place for all of them.

If I understand zip files correctly, this should be OK because there's an
offset given to the zip directory, so the program should be able to find
all the parts and ignore the extra junk. But, if the Java code is being
extra picky, perhaps it's rejecting the file because of this. I'll whip up
a test version of the program to remove all the trailing 0's from zip
files to see what happens. I'll keep it dumb and hope I don't remove part
of the file's data.

Gerry Wheeler

Apr 24, 2004, 9:42:56 AM
On Sat, 24 Apr 2004 12:51:00 +0000, Gerry Wheeler wrote:

> I'll whip up
> a test version of the program to remove all the trailing 0's from zip
> files to see what happens. I'll keep it dumb and hope I don't remove part
> of the file's data.

Result: it doesn't work. Not only doesn't it fix my original problem, it
renders the zip file unreadable by other utilities as well. Apparently a
zip file has some information at the end that may be zero, so my quicky
test removed that from the file. Oh well.

Andrew Thompson

Apr 24, 2004, 10:00:27 AM
On Sat, 24 Apr 2004 12:34:32 GMT, Gerry Wheeler wrote:
> On Sat, 24 Apr 2004 03:19:24 +0000, Andrew Thompson wrote:
>> On Sat, 24 Apr 2004 02:46:40 GMT, Gerry Wheeler wrote:
>>> java.util.zip.ZipException: error in opening zip file

>> Drop the 3/4Kb zip to (me), and I'll give it a go


>> with Ziplet (I'll put it up at my site if you like)
>
> OK, I'll send one over. Thanks.

http://www.physci.org/test/scrap/121TestZip/

Java reads it just fine, I made no
changes to the file whatsoever.

>> ....
>>> kg4nbb.emwin.EmwinPacketProcessor.decompressProduct(EmwinPacketProcessor.java:258)
>>> at
>>
>> OK. Did you write 'EmwinPacketProcessor'? What's at line 258?
>
> Yes, that's my code. Line 258 is where I instantiate the ZipFile, like
> this:

We are narrowing it down,

Strip out _all_ the stuff you can
before it stops failing*, then post that.
<http://www.physci.org/codes/sscce.jsp>

* You should be able to do it
in under 40 lines.

Andrew Thompson

Apr 24, 2004, 10:12:09 AM
On Sat, 24 Apr 2004 11:28:00 +0100, Chris Uppal wrote:

> I'm currently working on Zip file encoding and
> decoding (in a different language) so I'm pretty well placed to say if there is
> *really* anything wrong with the file. (And another test case never hurts ;-)

<http://www.physci.org/test/scrap/121TestZip/ZFPPPGHI.ZIS>
(..just in case Gerry missed your post)

Gerry Wheeler

Apr 24, 2004, 10:47:24 AM
On Sat, 24 Apr 2004 14:00:27 +0000, Andrew Thompson wrote:
> Java reads it just fine, I made no
> changes to the file whatsoever.

OK, that's good to know. (And the sample I sent you had the extra 0's on
the end, so I know I don't need to worry about them.) So now I need to
look at my code. I'll mess with it a bit and let you know what I get.

Thanks!

Gerry Wheeler

Apr 24, 2004, 11:08:02 AM
On Sat, 24 Apr 2004 14:00:27 +0000, Andrew Thompson wrote:
> Strip out _all_ the stuff you can
> before it stops failing*, then post that.
> <http://www.physci.org/codes/sscce.jsp>
>
> * You should be able to do it
> in under 40 lines.

Well, it's more than 40 lines, but I did put in some comments and white
space. :-)

On my system, this test program throws the same exception as my real
program. Andrew -- you already have the sample zip file. Others who may be
following along can try this on any random zip file, or I can send you one
of my samples for testing. I'm going to take this with me to work today to
try it out on a Windows computer to see if there's a difference.

----------- sample code here --------------

package kg4nbb.emwin;

import java.io.File;
import java.util.Enumeration;
import java.util.zip.*;

/**
 * This is a test program to attempt to unzip a file from the
 * National Weather Service's EMWIN data stream.
 *
 * Under normal use, the file would arrive in packets within
 * the EMWIN stream. This simplified test assumes the file has
 * already been saved and is ready to be unzipped.
 *
 * @author gwheeler
 */
public class TestZipFile {

    /** Creates a new instance of TestZipFile */
    public TestZipFile() {
    }

    /**
     * The main code, where everything starts.
     *
     * @param args[0] the name of the zip file
     */
    public static void main(String[] args) {
        if (args.length == 1) {
            new TestZipFile().test(args[0]);
        }
        else {
            System.err.println("usage: TestZipFile <filename>");
        }
    }

    /**
     * Attempts to unzip the specified file.
     */
    private void test(String filename) {
        try {
            File f = new File(filename);
            if (f.exists()) {
                // Here's where the problem usually occurs.
                System.out.println("attempting to open " + f);

                ZipFile z = new ZipFile(f);

                System.out.println("zip file " + f + " opened");

                // Just show what's in it.
                Enumeration entries = z.entries();
                while (entries.hasMoreElements()) {
                    ZipEntry entry = (ZipEntry)entries.nextElement();
                    System.out.println(" found entry " + entry.getName());
                }
            }
            else {
                System.err.println("can't find " + f);
            }
        }
        catch (Exception e) {
            // For testing purposes, just catch and report all exceptions.
            e.printStackTrace();
        }
    }
}

----------- sample code ends --------------

Andrew Thompson

Apr 24, 2004, 11:36:26 AM
On Sat, 24 Apr 2004 15:08:02 GMT, Gerry Wheeler wrote:

> On Sat, 24 Apr 2004 14:00:27 +0000, Andrew Thompson wrote:

...
>> <http://www.physci.org/codes/sscce.jsp>
...


> Well, it's more than 40 lines,

100 lines for the entire post is pretty good.

> ----------- sample code here --------------
>
> package kg4nbb.emwin;

Hey, ..what's that in it for?

Did you read the SSCCE link _carefully_? ;-)

(Some of those lines are pretty long too)

I'll try to have a look at your code
tomorrow, but am secretly hoping somebody
else will do it for you before then..

Joseph Dionne

Apr 24, 2004, 1:37:55 PM
Andrew Thompson wrote:
> On Sat, 24 Apr 2004 15:08:02 GMT, Gerry Wheeler wrote:
>
>
>>On Sat, 24 Apr 2004 14:00:27 +0000, Andrew Thompson wrote:
>
> ...
>
>>><http://www.physci.org/codes/sscce.jsp>
>
> ...
>
>>Well, it's more than 40 lines,
>
>
> 100 lines for the entire post is pretty good.
>
>
>>----------- sample code here --------------
>>
>>package kg4nbb.emwin;
>
>
> Hey, ..what's that in it for?
>
> Did you read the SSCCE link _carefully_? ;-)
>
> (Some of those lines are pretty long too)
>
> I'll try to have a look at your code
> tomorrow, but am secretly hoping somebody
> else will do it for you before then..
>

I'll just throw this into the mix. jar tvf file.zis works fine.
Perhaps there is a bug in the java.util.zip package.

Chris Uppal

Apr 24, 2004, 3:04:31 PM
Andrew Thompson wrote:

> <http://www.physci.org/test/scrap/121TestZip/ZFPPPGHI.ZIS>
> (..just in case Gerry missed your post)

Thanks Andrew. It looks as if I can't reach your machine just now, I'll
try again later.

-- chris

Chris Uppal

Apr 25, 2004, 5:04:13 AM
Gerry Wheeler wrote:

> The NWS data stream is sent as packets of information. But the people who
> wrote it were not good at packet stuff, so all the packets are exactly
> 1024 bytes, padded with 0's if necessary. And there's no indicator in the
> packet header of the length of the actual data. So, the last packet of the
> file (including zip files) has a bunch of extra 0's on the end. My code
> removes the 0's if it recognizes (from the file name) that the data is a
> text file. But I didn't want to start messing with the various binary file
> types, so I leave the 0's in place for all of them.

That's where your problem lies. The Zip structure ends with a record that
itself finishes with a comment encoded as a 2-byte size (in Intel byte order)
followed by that many bytes of comment. So if the Zip file has no comment (the
typical case) the last two bytes will both be 0.

There are two ways of reading Zip files. One is to start at the end, where
there is a "table of contents" which allows random access to the elements of
the Zip. That's what a java.util.zip.ZipFile does, and it reads the table of
contents somewhere as part of its constructor. The 0 padding added by the
weather people is buggering that up, and your attempts to remove the padding
aren't quite right either. You could fix that by analysing the end of the
data yourself, to find how many 0s it *should* have on the end, but that's
messy. And fortunately you don't need to do it.
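
For the curious, the "messy" route could look roughly like the sketch below. It assumes the whole padded file is already in memory and that the archive starts at offset 0 (as zipinfo reports for the file in this thread); the class ZipTail and its methods are invented names, not part of java.util.zip. It scans backwards for the end-of-central-directory signature, sanity-checks that the central directory ends where that record begins, and then adds the 22 fixed bytes of the record plus the comment length. With the zipinfo figures quoted earlier (record at offset 2477, no comment, 3072-byte file) the archive proper would end at 2477 + 22 = 2499, leaving 573 bytes of padding.

```java
import java.util.Arrays;

/** Hypothetical helper (not from the thread): finds where a padded Zip archive really ends. */
final class ZipTail {
    private static final int EOCD_SIGNATURE = 0x06054b50; // bytes 'P' 'K' 5 6 on disk
    private static final int EOCD_FIXED_SIZE = 22;        // record size without the comment

    /** Reads an unsigned 16-bit value in Intel (little-endian) byte order. */
    private static int u16(byte[] b, int off) {
        return (b[off] & 0xff) | ((b[off + 1] & 0xff) << 8);
    }

    /** Reads a 32-bit little-endian value as a non-negative long. */
    private static long u32(byte[] b, int off) {
        return u16(b, off) | ((long) u16(b, off + 2) << 16);
    }

    /**
     * Returns the length the archive should have, or -1 if no plausible
     * end-of-central-directory record is found.
     */
    static int trueLength(byte[] data) {
        for (int pos = data.length - EOCD_FIXED_SIZE; pos >= 0; pos--) {
            if (u32(data, pos) != EOCD_SIGNATURE) {
                continue;
            }
            long dirSize = u32(data, pos + 12);       // length of the central directory
            long dirOffset = u32(data, pos + 16);     // its offset from the start of the file
            int commentLength = u16(data, pos + 20);  // the 2-byte size that ends the record
            long end = (long) pos + EOCD_FIXED_SIZE + commentLength;
            // Assumption: nothing is prepended, so the directory ends where this record starts.
            if (dirOffset + dirSize == pos && end <= data.length) {
                return (int) end;
            }
        }
        return -1;
    }

    /** Returns a copy without the bytes after the archive; unchanged if no record is found. */
    static byte[] trim(byte[] data) {
        int length = trueLength(data);
        return length < 0 ? data : Arrays.copyOf(data, length);
    }
}
```

Unlike stripping every trailing 0, this keeps the two zero bytes of an empty comment length, which is the part a naive trim removes.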

The other way of reading a Zip file is to start at the beginning and iterate
over each element, ignoring the table of contents. To do that in Java you use
a java.util.zip.ZipInputStream. ZipInputStream is a rather weird class
(because of the weird nature of the Zip file format). Here's an example of how
to use it (very hacky but I hope it's clear enough) that I've just tested with
the file that Andrew posted a link to (which has 124 bytes of padding on the
end), and it seems to work fine.

-- chris

========== Dezip.java =============
import java.util.zip.*;
import java.io.*;

public class Dezip
{
    public static void
    main(String[] args)
    throws Exception
    {
        ZipInputStream zis
            = new ZipInputStream(
                new BufferedInputStream(
                    new FileInputStream(
                        args[0])));

        ZipEntry entry;
        byte[] buffer = new byte[1024];

        while ((entry = zis.getNextEntry()) != null)
        {
            int got;
            System.out.println(entry);
            while ((got = zis.read(buffer)) >= 0)
                System.out.write(buffer, 0, got);
            System.out.println("========");
            zis.closeEntry();
        }
    }
}
=======================


Andrew Thompson

Apr 25, 2004, 7:20:27 AM
On Sun, 25 Apr 2004 10:04:13 +0100, Chris Uppal wrote:

> Here's an example of how
> to use it (very hacky but I hope it's clear enough) that I've just tested with
> the file that Andrew posted a link to (which has 124 bytes of padding on the
> end), and it seems to work fine.

Ahhh. Good. After you reported problems
with that link I went back to double check
it. At first it was not working, (even more
oddly, the applet loaded and also failed
to get the Zip).

Next minute it was fine.

The only thing I can put it down to, is my
flakey server being a bit sleepy (shrugs).

If I remember I might remove those files
tonight, no need of them any longer, and
I am sure the world does not need a web-page
in which one can read the weather in ..Samoa
was it? on some particular date!

> ========== Dezip.java =============
> import java.util.zip.*;

Nice example.

I was about to point the OP towards the code
for Ziplet, then realised that not only is
it spread across multiple classes, and contains
a lot of floss he does not need, but ultimately
it _entirely_ relies on the JEditorPane to
actually get the content..

No help at all! ;-)

Gerry Wheeler

Apr 25, 2004, 8:40:05 AM
On Sun, 25 Apr 2004 10:04:13 +0100, Chris Uppal wrote:

> The other way of reading a Zip file is to start at the beginning and
> iterate over each element, ignoring the table of contents. To do that in
> Java you use a java.util.zip.ZipInputStream.

Bingo! That works great!

OK, now I can get back to my original program and get these things
unzipped.

Many, many thanks to everyone who has contributed!
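
For the on-the-fly case described earlier in the thread, the file does not have to be saved first. A rough sketch, assuming the 1024-byte packets for one product have already been concatenated into a byte array (zero padding included); ProductUnzipper and unzip are hypothetical names chosen for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper (not from the thread): unzips a product held in memory. */
final class ProductUnzipper {

    /** Returns entry name -> uncompressed bytes, in the order the entries appear. */
    static Map<String, byte[]> unzip(byte[] paddedZip) throws IOException {
        Map<String, byte[]> contents = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(paddedZip))) {
            ZipEntry entry;
            byte[] buffer = new byte[1024];
            // getNextEntry() returns null at the first header that is not a local
            // file header, so bytes after the entries are never interpreted.
            while ((entry = zis.getNextEntry()) != null) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int got;
                while ((got = zis.read(buffer)) >= 0) {
                    out.write(buffer, 0, got);
                }
                contents.put(entry.getName(), out.toByteArray());
            }
        }
        return contents;
    }
}
```

Holding each product in memory seems reasonable for files of a few KB like the ones here; for large products, streaming each entry to its destination would be the safer design.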

Joseph Dionne

Apr 25, 2004, 10:04:06 AM

I am very glad to see your application is now working, but I believe
it to be only a workaround. Modifying Mr. Wheeler's sample code to use
ZipInputStream instead of ZipFile avoids the native ZipFile.open()
exception, and ZipInputStream seems to be able to deal with the
missing comment correctly. (Modified version follows)

My concern is that there exists a bug in the ZipFile.open(), believing
it should be able to handle the same data the ZipInputStream does.

However, I might be being too picky.


import java.io.File;
import java.io.FileInputStream;
import java.util.Enumeration;
import java.util.Date;
import java.util.zip.*;
import java.util.jar.*;

/*
        // Won't work (sometimes)!
        ZipFile z = new ZipFile(f);
*/
        ZipInputStream z = new ZipInputStream(
            new FileInputStream(f));

        System.out.println("zip file " + f + " opened");

        // Just show what's in it.
/*
        Enumeration entries = z.entries();
        while (entries.hasMoreElements()) {
            ZipEntry entry = (ZipEntry)entries.nextElement();
            System.out.println(" found entry " +
                entry.getName());
        }
*/
        // Works (always)!
        ZipEntry ze ;

        while (null != (ze = z.getNextEntry())) {
            Date dt = new Date(ze.getTime());
            System.out.println(
                ze.getSize() + " "
                + dt.toString() + " "
                + ze.getName());

Chris Uppal

Apr 25, 2004, 11:06:06 AM
Joseph Dionne wrote:

> I am very glad to see your application is now working, but I believe
> it to be only a workaround. Modifying Mr. Wheeler's sample code to use
> ZipInputStream instead of ZipFile avoids the native ZipFile.open()
> exception, and ZipInputStream seems to be able to deal with the
> missing comment correctly. (Modified version follows)
>
> My concern is that there exists a bug in the ZipFile.open(), believing
> it should be able to handle the same data the ZipInputStream does.

I don't think so. As I tried to explain in my earlier post, ZipFile uses the
table of contents at the end of the zip file, and hence is unhappy to find that
the table is corrupt. The ZipInputStream, OTOH, makes no use of the table (in
fact it never even reads that far in the input), and so is unfazed.

The point is that there are two *completely* different ways of accessing the
Zip file structure -- it's designed to work for either sequential access *or*
random access, and has two different internal data-structures to support the
two different patterns. The padding (and stripping off the padding) damaged
one of these structures, but left the other untouched. By a happy chance, the
undamaged structure was the one that was best suited to Gerry's application.

java.util.zip.ZipFile and java.util.zip.ZipInputStream are the APIs corresponding to the
two different access methods. It would be much better if the documentation
*explained* that and also explained what the algorithms were (in this case it
is *not* an "implementation detail") and the tradeoffs between them, but...

-- chris


Joseph Dionne

Apr 25, 2004, 11:32:10 AM
Chris Uppal wrote:
> Joseph Dionne wrote:
>

[snip]

>>My concern is that there exists a bug in the ZipFile.open(), believing
>>it should be able to handle the same data the ZipInputStream does.
>

>
> I don't think so. As I tried to explain in my earlier post, ZipFile uses the
> table of contents at the end of the zip file, and hence is unhappy to find that
> the table is corrupt. The ZipInputStream, OTOH, makes no use of the table (in
> fact it never even reads that far in the input), and so is unfazed.
>

And because of this "fact," one can work around the differences in
behavior between ZipFile and ZipInputStream. This is a good thing.

> The point is that there are two *completely* different ways of accessing the
> Zip file structure -- it's designed to work for either sequential access *or*
> random access, and has two different internal data-structures to support the
> two different patterns. The padding (and stripping off the padding) damaged
> one of these structures, but left the other untouched. By a happy chance, the
> undamaged structure was the one that was best suited to Gerry's application.
>

Agreed, while the manner by which ZipFile and ZipInputStream approach the
compressed data is different, is it not true that a bad zip file is
always a bad file? If unzip, zipinfo, jar, and other zip file commands
deal with the NWS zip file/stream, would it not be consistent for
ZipFile and ZipInputStream to both do the same, accept or reject it?

> java.util.zip.ZipFile and java.util.zip.ZipInputStream are the APIs corresponding to the
> two different access methods. It would be much better if the documentation
> *explained* that and also explained what the algorithms were (in this case it
> is *not* an "implementation detail") and the tradeoffs between them, but...
>

The fact that the behavior of ZipFile and ZipInputStream differ causes
the dilemma of knowing when to use one over the other. Since I know
this behavior exists, I for one will never again use ZipFile, creating
instead a MyZipFile class that incorporates the logic I know to work
(see the working example in previous post). If developers regularly
using zip files from outside sources do the same, has not the usefulness
of ZipFile been diminished to the point of irrelevancy, and even worse,
continued confusion? IMHO.

I guess the real question is "when is a bug a bug?" M$ has answered
this question many times by saying "when we say so."

Andrew Thompson

Apr 25, 2004, 11:56:16 AM
On Sun, 25 Apr 2004 15:32:10 GMT, Joseph Dionne wrote:

>> The point is that there are two *completely* different ways of accessing the
>> Zip file structure -- it's designed to work for either sequential access *or*
>> random access, and has two different internal data-structures to support the
>> two different patterns. The padding (and stripping off the padding) damaged
>> one of these structures, but left the other untouched. By a happy chance, the
>> undamaged structure was the one that was best suited to Gerry's application.

So, to clarify (for my own understanding)

ZipFile, uses TOC at end, allows random access - fast

ZipInputStream, goes through entries sequentially,
not use TOC - slower

Is that right?
[ OK an oversimplification,
but on the right track? ]
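
For anyone following along, here is a minimal sketch of the two access styles summarised above. The class and method names are made up for illustration, and the try-with-resources syntax needs a much newer JDK than the 1.4.2 discussed in this thread. It only shows the shape of each API, not their relative speed.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of java.util.zip.
final class TwoWaysToRead {

    // Random access: ZipFile consults the TOC and goes straight to one entry.
    static String readViaZipFile(File zip, String name) throws IOException {
        try (ZipFile zf = new ZipFile(zip)) {
            ZipEntry e = zf.getEntry(name);
            if (e == null) return null;
            try (InputStream in = zf.getInputStream(e)) {
                return drain(in);
            }
        }
    }

    // Sequential access: ZipInputStream walks the entries in file order.
    static String readViaStream(File zip, String name) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new FileInputStream(zip))) {
            ZipEntry e;
            while ((e = zis.getNextEntry()) != null) {
                if (e.getName().equals(name)) return drain(zis);
            }
            return null;
        }
    }

    // Reads until end of stream (for ZipInputStream: end of the current entry).
    static String drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Demo helper: writes a two-entry archive to a temporary file.
    static File writeSample() throws IOException {
        File f = File.createTempFile("sample", ".zip");
        f.deleteOnExit();
        try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(f))) {
            zos.putNextEntry(new ZipEntry("a.txt"));
            zos.write("first".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("b.txt"));
            zos.write("second".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return f;
    }
}
```

Both methods return the same text for the same entry name; the difference is that the stream version has to pass over "a.txt" before it reaches "b.txt".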

....


> Agreed, while the manner in which ZipFile and ZipInputStream approach the
> compressed data is different, is it not true that a bad zip file is
> always a bad file? If unzip, zipinfo, jar, and other zip file commands
> deal with the NWS zip file/stream, would it not be consistent for
> ZipFile and ZipInputStream to both do the same, accept or reject it?

No. It may be that those tools use
the sequential method, or that they
try the TOC first, and fall back to
doing a sequential read if need be.

>> java.util.zip.ZipFile and java.util.zip.ZipInputStream are the APIs corresponding to the
>> two different access methods. It would be much better if the documentation
>> *explained* that and also explained what the algorithms were (in this case it
>> is *not* an "implementation detail") and the tradeoffs between them, but...

Yep, I agree with that.

Sometimes I think the high level classes
offered by Sun obscure too much of what
is actually happening.

Not that I'd want to have to code all
that stuff myself.. ;-)

Joseph Dionne

Apr 25, 2004, 12:35:54 PM

And here is where ZipFile should be "fixed," if one agrees that the inability
to open a "good" (albeit debatably so) zipfile is a bug. If the TOC is
invalid, or corrupt, the random access method should throw an exception,
i.e. "ZipException: TOC not available" or better yet, the TOC should be
created from the file, since it is an existing file and not a stream.
By ZipFile failing to merely open a zipfile, such as provided by NWS,
the implication is the entire file is bad, i.e. not a zip file, not just
a lesser part of the zipfile.

[snip]

Andrew Thompson

Apr 25, 2004, 1:14:21 PM
On Sun, 25 Apr 2004 16:35:54 GMT, Joseph Dionne wrote:

> Andrew Thompson wrote:
...
>>>Agreed, while the manner in which ZipFile and ZipInputStream approach the
>>>compressed data is different, is it not true that a bad zip file is
>>>always a bad file? If unzip, zipinfo, jar, and other zip file commands
>>>deal with the NWS zip file/stream, would it not be consistent for
>>>ZipFile and ZipInputStream to both do the same, accept or reject it?
>>
>> No. It may be that those tools use
>> the sequential method, or that they
>> try the TOC first, and fall back to
>> doing a sequential read if need be.
>
> And here is where ZipFile should be "fixed," if one agrees that the inability
> to open a "good" (albeit debatably so) zipfile is a bug. If the TOC is
> invalid, or corrupt, the random access method should throw an exception,
> i.e. "ZipException: TOC not available"

A better description in the exception
would be a good idea, combined with
better documentation of the methods

>...or better yet, the TOC should be

> created from the file, since it is an existing file and not a stream.

I disagree.

I have an 11 Meg file on the net
at the moment. Let's say I want to
pull any _particular_ single file out
for inspection (as I do right here
<http://www.physci.org/source.jsp>).

IIUC, the random access method (ZipFile)
would be far better for that. In the event
that the TOC becomes corrupted, I would prefer
to have the program throw exceptions, which
can then allow me to catch them and proceed in
the best manner.

For a file on the local filesystem, I might
be tempted to simply revert to the ZIS method,
it would not matter if it required the entire
11 Meg of the file be read.
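
A rough sketch of that "catch the exception and revert to the ZIS method" idea, assuming the file is local and that reading it end to end is acceptable. FallbackReader and its methods are hypothetical names. Exactly which damaged files ZipFile rejects may differ between JDK releases, so treat this as an illustration rather than a guarantee.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

// Hypothetical helper class, not part of java.util.zip.
final class FallbackReader {

    // Tries the TOC first; if ZipFile refuses the file, scans it sequentially.
    static List<String> listNames(File zip) throws IOException {
        try (ZipFile zf = new ZipFile(zip)) {
            List<String> names = new ArrayList<>();
            Enumeration<? extends ZipEntry> en = zf.entries();
            while (en.hasMoreElements()) names.add(en.nextElement().getName());
            return names;
        } catch (ZipException tocUnusable) {
            // The caller decided a full read is affordable for this file.
            return listNamesSequentially(zip);
        }
    }

    // Forward-only pass that never looks at the TOC.
    static List<String> listNamesSequentially(File zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new FileInputStream(zip))) {
            ZipEntry e;
            while ((e = zis.getNextEntry()) != null) names.add(e.getName());
        }
        return names;
    }
}
```

The fallback is explicit in the caller's code, so a server-side caller can simply not catch the ZipException and fail fast instead.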

The server is a different matter, and I would
probably remove the page that accesses the zip
until I could get a fresh zip with a valid
TOC uploaded.

Now.. you might reason..

OK, so they should allow fall-back to sequential
_unless_ the programmer specifically requests
FAIL_ON_BAD_TOC.

Yes they _could_ do that, but then that would
be just as, if not more, reliant on effective
documentation, which brings us straight back to
the concern expressed by chris.

The existing docs are not good enough, if
they were, this problem would simply be
'options'. If you need to add further
back-up methods etc., that would seem to
complicate matters, as _those_ methods
would then require effective documentation.

> By ZipFile failing to merely open a zipfile, such as provided by NWS,
> the implication is the entire file is bad, i.e. not a zip file, not just
> a lesser part of the zipfile.

I am not sure what you mean by 'implication'
here, but feel that any vagueness could be
removed with better documentation and error
information.

Joseph Dionne

Apr 25, 2004, 2:37:57 PM

Forgive my ignorance, but I believe my HTTP client app cannot open a zip
file via the Internet with ZipFile, and thus would need to access it via a
ZipInputStream. So, this reason does not refute my premise that, on a
local filesystem, ZipFile and ZipInputStream should exhibit similar behaviors.

Upon further review, ZipInputStream cannot even identify a non-zip
stream; it simply returns no entries. So I, as an application
developer, cannot tell that case apart from a valid zip file that is simply empty.

> The server is a different matter, and I would
> probably remove the page that accesses the zip
> until I could get a fresh zip with a valid
> TOC uploaded.
>

As the webmaster, that is something you would do for any failing link,
and rightly so. Yes, the TOC is mandatory, and I assume the zip files
on your website were either created by you, or at least validated by you
prior to being placed on your servers. So, again, this argument does not
address the case at hand, i.e. downloading a zip file created by another
entity, and using pure Java to read what the app expected as a valid zip.

> Now.. you might reason..
>
> OK, so they should allow fall-back to sequential
> _unless_ the programmer specifically requests
> FAIL_ON_BAD_TOC.
>
> Yes they _could_ do that, but then that would
> be just as, if not more, reliant on effective
> documentation, which brings us straight back to
> the concern expressed by chris.
>
> The existing docs are not good enough, if
> they were, this problem would simply be
> 'options'. If you need to add further
> back-up methods etc., that would seem to
> complicate matters, as _those_ methods
> would then require effective documentation.
>

Perhaps it is just my feeling, but I find most Java docs provide only
the most basic information. Thank goodness for newsgroups and the
community sharing our learned experiences. Sometimes, using Java is
very much like using the M$ VC API -- keep trying things until one of them
gets the desired effect.

>
>>By ZipFile failing to merely open a zipfile, such as provided by NWS,
>>the implication is the entire file is bad, i.e. not a zip file, not just
>>a lesser part of the zipfile.
>
>
> I am not sure what you mean by 'implication'
> here, but feel that any vagueness could be
> removed with better documentation and error
> information.
>

Obviously a zip file/stream has certain "milestones" that must meet
certain reasonable data checks. Using the modified version of the
TestZipFile sample in a previous post, I get the same behavior reading
an empty .zip file as I would get reading a non-.zip file, e.g. a .java
file, let's say. ZipFile should do the milestone checks, best guess
checks, that the file is indeed a zip file, returning an open error if
the file is obviously not a .zip file, or returning a ZipFile object if
it is valid. If the TOC is not available, throw an exception from the
random access method, allowing me, the application developer, the choice
of whether to continue or not.

As it stands now I have no way to "know" if the .zip file is valid or
not. As I said, an invalid .zip file and a non .zip file behave the same
using ZipInputStream.
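
One way an application could do such a "milestone" check itself is to peek at the first four bytes before handing the data to ZipInputStream. The following is only a sketch: ZipSniffer and its Kind values are invented names, and it assumes the archive starts at the very first byte (no self-extractor stub or other prefix). The two signatures tested are the local file header ("PK" 3 4) and the end record ("PK" 5 6) from the ZIP format.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.zip.ZipException;
import java.util.zip.ZipInputStream;

// Hypothetical helper class, not part of java.util.zip.
final class ZipSniffer {
    enum Kind { HAS_ENTRIES, EMPTY_ARCHIVE, NOT_ZIP }

    // Classifies the leading bytes by the ZIP record signature they carry.
    static Kind classify(byte[] head, int len) {
        if (len < 4 || head[0] != 'P' || head[1] != 'K') return Kind.NOT_ZIP;
        if (head[2] == 3 && head[3] == 4) return Kind.HAS_ENTRIES;   // local file header
        if (head[2] == 5 && head[3] == 6) return Kind.EMPTY_ARCHIVE; // end record only
        return Kind.NOT_ZIP;
    }

    // Peeks at four bytes, pushes them back, then wraps the stream.
    static ZipInputStream open(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 4);
        byte[] head = new byte[4];
        int got = 0;
        while (got < head.length) {
            int n = in.read(head, got, head.length - got);
            if (n < 0) break;
            got += n;
        }
        if (got > 0) in.unread(head, 0, got);
        if (classify(head, got) == Kind.NOT_ZIP) {
            throw new ZipException("data does not start with a ZIP signature");
        }
        return new ZipInputStream(in);
    }
}
```

With this in front, a .java file is rejected with an exception, while an archive with no entries is still distinguishable via classify().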

Sir, I don't wish to belabour this point, but I believe the "hard work"
should be on the language developers, not those using their language. I
apply this same philosophy to my applications -- I get paid the big
bucks to ease the burdens of my users, not to add to their miseries.

Roedy Green

Apr 25, 2004, 5:20:41 PM
On Sun, 25 Apr 2004 10:04:13 +0100, "Chris Uppal"
<chris...@metagnostic.REMOVE-THIS.org> wrote or quoted :

>The other way of reading a Zip file is to start at the beginning and iterate
>over each element, ignoring the table of contents. To do that in Java you use
>a java.util.zip.ZipInputStream. ZipInputStream is a rather weird class
>(because of the weird nature of the Zip file format).

One thing you have to watch out for is Java created Zip files are
incomplete. Not all the fields are filled in. This is because it does
no back patching to the embedded index in its strictly sequential
creation pass.

see http://mindprod.com/jgloss/zip.html for details.

Chris Uppal

Apr 26, 2004, 5:54:58 AM
Andrew Thompson wrote:

> So, to clarify (for my own understanding)
>
> ZipFile, uses TOC at end, allows random access - fast
>
> ZipInputStream, goes through entries sequentially,
> not use TOC - slower
>
> Is that right?
> [ OK an oversimplification,
> but on the right track? ]

That's right.

It's worth adding that the thing you are trading-off for speed is the
ability to read or write the Zip format without arbitrarily seeking in
the file. As a case in point, the OPs application was reading the data
off the network, so it's natural to consume it as a single forward-only
scan.

-- chris

Chris Uppal

Apr 26, 2004, 7:57:44 AM
Joseph Dionne wrote:

> > I don't think so. As I tried to explain in my earlier post, ZipFile
> > uses the table of contents at the end of the zip file, and hence is
> > unhappy to find that the table is corrupt. The ZipInputStream, OTOH,
> > makes no use of the table (in fact it never even reads that far in the
> > input), and so is unfazed.
> >
>
> And because of this "fact," one can work around the differences in
> behavior between ZipFile and ZipInputStream. This is a good thing.

The way you've put "fact" in quotes suggests to me that you don't understand
the way the ZIP file format is designed. It is not an accident of
implementation, it's a feature of the format's design that is very properly
reflected in the Java classes that implement that design.


> > The point is that there are two *completely* different ways of
> > accessing the Zip file structure -- it's designed to work for either
> > sequential access *or* random access, and has two different internal
> > data-structures to support the two different patterns. The padding
> > (and stripping off the padding) damaged one of these structures, but
> > left the other untouched. By a happy chance, the undamaged structure
> > was the one that was best suited to Gerry's application.
> >
>
> Agreed, while the manner in which ZipFile and ZipInputStream approach the
> compressed data is different, is it not true that a bad zip file is
> always a bad file?

No.

The ZIP format is a *data format* not a file format. It is designed to support
a "stream like" approach, where the only data that an application has is what
it has read *so far* (off the network, or from a tape, or whatever), and so it
*has* to be able to interpret what it has seen without waiting for the end of
the input. Hence "errors" in the file that occur after any given entry are
*irrelevant* to that entry, and hence irrelevant to any application that is
doing a forward-only pass over the data.

Other applications don't restrict themselves to forwards-only, but allow
themselves to "know" that the data is held in a seekable format (such as a normal
disk file). Such applications, and such applications *only*, will use the
table-of-contents and will be sensitive to errors in it.
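
To make that concrete, here is a small sketch (ForwardOnlyDemo is an invented name) that builds an archive in memory, appends junk after the table of contents, and then does a forward-only pass with ZipInputStream. The pass stops once it reaches the first record that is not a local file header, so the padding is never examined.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of java.util.zip.
final class ForwardOnlyDemo {

    // Builds a two-entry archive followed by `padding` zero bytes of junk.
    static byte[] paddedZip(int padding) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ZipOutputStream zos = new ZipOutputStream(bos);
        for (String name : new String[] {"a.txt", "b.txt"}) {
            zos.putNextEntry(new ZipEntry(name));
            zos.write(name.getBytes(StandardCharsets.US_ASCII));
            zos.closeEntry();
        }
        zos.finish();                              // writes the table of contents
        bos.write(new byte[padding], 0, padding);  // junk after the end record
        return bos.toByteArray();
    }

    // Forward-only scan: collects entry names in the order they appear.
    static List<String> namesSeenByStream(byte[] data) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry e;
            while ((e = zis.getNextEntry()) != null) names.add(e.getName());
        }
        return names;
    }
}
```

The list of names comes out the same with or without the padding, which is the sense in which errors after an entry are irrelevant to a sequential reader.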


> If unzip, zipinfo, jar, and other zip file commands
> deal with the NWS zip file/stream, would it not be consistent for
> ZipFile and ZipInputStream to both do the same, accept or reject it?

No.

Those programs are only some of the applications of the ZIP format. Any
individual one may use only a fraction of the power in the format, or they may
try to expose all the power. I don't know which do and which don't. One way
to see is to ask which of them can read/write ZIP-encoded data to/from a tape.
Any that can't must be relying on the random-access features of hard-disk
files. Equally you can check to see how efficient they are at retrieving data
from an entry near the end of a big ZIP file. Any that are slow are presumably
doing a forward-only scan and failing to make use of the random-access
features.

The Java classes expose both feature-sets.

If you want life to be simple, and for there just to be an easy-to-use and
easy-to-understand class library, then either you will have to drop some of the
features (a bad thing) or you will have to design something that is better than
the Sun-supplied stuff.

Take it from me, that's hard. I've recently been going through the exercise
of designing a class library for manipulating ZIP formatted data (in a
different language) and it is *not easy* to find a workable compromise, let
alone something that is both simple and comprehensive.

I don't particularly admire the design Sun have come up with (but then I don't
think much of the design of the ZIP format either). But it is genuinely
difficult to do better (my own attempt uses more layering).

I think that the real problems are that:

a) not everyone realises what the ZIP format is *for* (they've only ever used
it for files).

b) the documentation is *terrible*.


> The fact that the behavior of ZipFile and ZipInputStream differ causes
> the dilemma of knowing when to use one over the other. Since I know
> this behavior exists, I for one will never again use ZipFile, creating
> instead a MyZipFile class that incorporates the logic I know to work
> (see the working example in previous post).

No, no, no, no, no. NO ! You have it all wrong.

The two ZIP classes in the Java library are intended for different purposes.
In some cases there's an overlap and either could be used (albeit with
different performance tradeoffs). In most cases one is at least clearly
preferable to the other; and in some cases only one or the other can possibly
be used. It's *your* job, as a programmer, to understand the issues and make
an intelligent choice based on your understanding. To the extent that you
don't understand the issues then you are not doing your job properly (and to
the extent that that is the fault of the Sun documentation -- 100% I suspect --
the Sun programmers weren't doing their job properly either).


> I guess the real question is "when is a bug a bug?" M$ has answered
> this question many times by saying "when we say so."

<grin>

Either of the two classes may have bugs or deficiencies, of course, in any
given implementation. But bugs are not the issue here. For the data under
discussion, both classes (in the JDK 1.4.2 implementation) are working
perfectly.

-- chris


Chris Uppal

Apr 26, 2004, 7:50:28 AM
Roedy Green wrote:

> One thing you have to watch out for is Java created Zip files are
> incomplete. Not all the fields are filled in. This is because it does
> no back patching to the embedded index in its strictly sequential
> creation pass.

Agreed about the facts.

I'm not sure that I agree with the phrasing though. The files it creates are
not "incomplete"; all the data is there, and it is in full conformance with the
spec.

However the *way* that it's written can confuse a badly-written decoder since
the information on a file's size, compressed size, and CRC is placed *after*
the file. A sufficiently badly-written (or deliberately incomplete) decoder
might have difficulty coping with that. I.e. the spec is *designed* to allow
Zip format data to be written without back-patching, but some broken readers
may not be able to cope.
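
For the curious, the "sizes after the data" layout can be seen by looking at the general purpose flag word of the first local file header, where bit 3 (mask 0x0008) means that the CRC and sizes follow the compressed data. This is a sketch with invented names (DataDescriptorDemo); the offset of 6 comes from the ZIP local header layout (4-byte signature, 2-byte version, then the 2-byte little-endian flag word).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of java.util.zip.
final class DataDescriptorDemo {

    // Writes one deflated entry without telling ZipOutputStream its size or CRC up front.
    static byte[] writeOneEntry() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry("hello.txt"));
            zos.write("hello".getBytes(StandardCharsets.US_ASCII));
            zos.closeEntry();
        }
        return bos.toByteArray();
    }

    // Flag word of the first local file header: 16 bits, little-endian, at offset 6.
    static int generalPurposeFlags(byte[] zip) {
        return (zip[6] & 0xFF) | ((zip[7] & 0xFF) << 8);
    }

    // Bit 3 set means a data descriptor follows the entry's compressed data.
    static boolean usesDataDescriptor(byte[] zip) {
        return (generalPurposeFlags(zip) & 0x0008) != 0;
    }
}
```

A reader that only trusts the size fields in the local header would be confused by such an entry, since the real values only appear after the data.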

AFAIK, the only reader software that *does* suffer from that inability is Sun's
own implementation. At least, I've seen comments to that effect on this
NG -- I haven't actually tested it myself.

-- chris

Chris Uppal

Apr 26, 2004, 7:55:32 AM
Joseph Dionne wrote:

> As it stands now I have no way to "know" if the .zip file is valid or
> not. As I said, an invalid .zip file and a non .zip file behave the same
> using ZipInputStream.

I haven't tested this myself, but I'll take your word for it. That is indeed
a deficiency (or perhaps a bug -- fairly major either way) in the
implementation of ZipInputStream.

It should read until it either finds a correctly formatted entry representing
the start of the ToC (and hence the end of the real data), and then return
null. Or it should throw an exception indicating that the data is corrupt
beyond some point (from the very start of the data if it isn't in ZIP format).
It should *not* fail silently.

-- chris

Chris Uppal

Apr 26, 2004, 7:54:48 AM
Joseph Dionne wrote:

> And here is where ZipFile should be "fixed," if one agrees that the inability
> to open a "good" (albeit debatably so) zipfile is a bug. If the TOC is
> invalid, or corrupt, the random access method should throw an exception,
> i.e. "ZipException: TOC not available"

That's exactly what does happen (though the exception could be clearer). If
you try to use the ToC you get an exception. If you don't try to use the ToC
(ie, if you use ZipInputStream) then you don't. What could be simpler ?

> or better yet, the TOC should be
> created from the file, since it is an existing file and not a stream.

You can't. The ToC contains data that is not included in the body of the file
(which is a fairly major design error IMO -- presumably something to do with
maintaining compatibility as the format has evolved).

It might help if you thought of the class java.util.zip.ZipFile as being named
java.util.zip.ZipFileIndex -- in many ways that would be a better name, since it is
really the index that is represented, not the whole file.

Of course, there could be an option to "fix" (heuristically) broken ZIP files.
Some of the ZIP applications do have options to do such things. There's no
obvious reason why the Java library should provide such options, though,
anymore than it provides an option to try to recover compressed data that has
been corrupted by a CR <-> CR/LF conversion.

-- chris

Gerry Wheeler

Apr 26, 2004, 7:59:48 PM
On Mon, 26 Apr 2004 10:54:58 +0100, Chris Uppal wrote:
> As a case in point, the OPs application was reading the data off the
> network, so it's natural to consume it as a single forward-only scan.

It's more complicated than that, but you're basically correct. In my
particular application, the data arrives in formatted packets, possibly
intermingled with packets of other NWS products. So I have to save all the
packets to a file before starting the decompression.

Nevertheless, your point is well-made. An application could unzip a file
as it arrived in a stream if necessary, and ZipInputStream facilitates
that.

Andrew Thompson

Apr 27, 2004, 2:33:02 AM
On Mon, 26 Apr 2004 10:54:58 +0100, Chris Uppal wrote:

>> ZipFile, uses TOC at end, allows random access - fast
>>
>> ZipInputStream, goes through entries sequentially,
>> not use TOC - slower

...
>> ..on the right track? ]
..


> It's worth adding that the thing you are trading-off for speed is the
> ability to read or write the Zip format without arbitrarily seeking in
> the file. As a case in point, the OPs application was reading the data
> off the network, so it's natural to consume it as a single forward-only
> scan.

A very good point Chris. When I was doing
one computer course long ago the person running
it asked whether HD or _magnetic tape_ was the
quickest to 'read a file'.

Of course, we all incorrectly guessed HD,
the instructor then proceeded to demonstrate
that the huge file, sequential read from the
tape drive, was actually a tad quicker than
reading off HD.

The moral of that story: data storage medium
appropriate to the circumstance and intended use.

Joseph Dionne

Apr 27, 2004, 3:50:47 AM
Chris Uppal wrote:
> Joseph Dionne wrote:
>
>
>>And here is where ZipFile should be "fixed," if one agrees that the inability
>>to open a "good" (albeit debatably so) zipfile is a bug. If the TOC is
>>invalid, or corrupt, the random access method should throw an exception,
>>i.e. "ZipException: TOC not available"
>
>
> That's exactly what does happen (though the exception could be clearer). If
> you try to use the ToC you get an exception. If you don't try to use the ToC
> (ie, if you use ZipInputStream) then you don't. What could be simpler ?
>

I will avoid the temptation to write "no" repeatedly <grin>. Actually,
when a zip file like the one of concern to the OP, one without a valid
TOC, is "opened" by ZipFile, it fails in the native open method of
ZipFile, preventing one from getting the exception you mentioned.
Perhaps a corruption in the zip file TOC, a few bad bytes, might
generate the exception mentioned. I will attempt to verify this
possibility.

>>or better yet, the TOC should be
>>created from the file, since it is an existing file and not a stream.
>
>
> You can't. The ToC contains data that is not included in the body of the file
> (which is a fairly major design error IMO -- presumably something to do with
> maintaining compatibility as the format has evolved).
>

Perhaps "you can't," however I can. Admittedly, some data will need to
be supplied, and as Mr. Thompson has previously noted, this should not
be an automatic feature applied to all files, like his large zip
repository driving his website; however, for some, like the OP,
this would be a useful feature.

> It might help if you thought of the class java.util.zip.ZipFile as being named
> java.util.zip.ZipFileIndex -- in many ways that would be a better name, since it is
> really the index that is represented, not the whole file.
>
> Of course, there could be an option to "fix" (heuristically) broken ZIP files.
> Some of the ZIP applications do have options to do such things. There's no
> obvious reason why the Java library should provide such options, though,
> anymore than it provides an option to try to recover compressed data that has
> been corrupted by a CR <-> CR/LF conversion.
>
> -- chris
>

Zip file technology has been evolving for some twenty years. I have had
the (dis)pleasure of using every evolution over that same period, at
times a very painful experience. My intent of this discussion is to
head off a repeat of history. Jar is the only zip utility that cannot
read a stream, i.e. "jar tvf - < file.ZIS" is invalid syntax.

Again, reading a stream, one would use ZipInputStream and not ZipFile,
and not take advantage of the TOC. However, identifying a non-zip file
is not based on the presence or lack of a valid TOC. Run unzip on one of
your .java files, and it will report that the file "might not" be a zip
file. I am sure there is no TOC in your .java file. Yet run unzip on
the .ZIS file of the OP, and you will find it is accepted as a
valid zip file. This means that unzip uses other criteria before giving
up. I am only suggesting that the technology learned over the last
twenty years regarding zip files should be part of Java's zip classes,
at a minimum.

As a final note, my thanks to all who have posted. It is only my hope
that in continuing this conversation that we as developers ask, nay
demand, better from the developers of the languages we use, just like
our customers ask/demand more from us. I for one want to spend time
creating new applications, not finding the flaws of the languages I
selected to create those applications.

Chris Uppal

Apr 27, 2004, 3:27:12 AM
Gerry Wheeler wrote:

> In my
> particular application, the data arrives in formatted packets, possibly
> intermingled with packets of other NWS products. So I have to save all the
> packets to a file before starting the decompression.

Well, you don't /have/ to. It sounds easy enough to create a custom
InputStream that pulls the "packets" off the network (passing intermingled data
off to other handling) and provides the zipped data to the ZipInputStream on
demand. Not that it's necessarily /worth/ doing that -- my guess is that it
wouldn't be -- it depends on the rest of your app (and on your "style" as a
programmer).
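
A very rough sketch of such a custom InputStream, under the assumption that something else has already separated the packets belonging to the zipped product and can hand over their payloads one at a time through an Iterator. The name PacketInputStream is invented, and a real version would block on the network inside the iterator rather than use an in-memory list.

```java
import java.io.InputStream;
import java.util.Iterator;

// Hypothetical adapter: presents a sequence of packet payloads as one byte stream.
final class PacketInputStream extends InputStream {
    private final Iterator<byte[]> packets;
    private byte[] current = new byte[0];
    private int pos = 0;

    PacketInputStream(Iterator<byte[]> packets) {
        this.packets = packets;
    }

    // Pulls the next payload on demand; returns false once the packets run out.
    private boolean fill() {
        while (pos >= current.length) {
            if (!packets.hasNext()) return false;
            current = packets.next();
            pos = 0;
        }
        return true;
    }

    @Override
    public int read() {
        return fill() ? (current[pos++] & 0xFF) : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) {
        if (len == 0) return 0;
        if (!fill()) return -1;
        // Hand back at most what is left of the current payload (short reads are allowed).
        int n = Math.min(len, current.length - pos);
        System.arraycopy(current, pos, b, off, n);
        pos += n;
        return n;
    }
}
```

It would be used as `new ZipInputStream(new PacketInputStream(payloads))`, with the entries then read through getNextEntry() as usual and no intermediate file.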

-- chris


Chris Uppal

Apr 27, 2004, 8:22:36 AM
Joseph Dionne wrote:

> I will avoid the temptation to write "no" repeatedly <grin>.

I'm glad you took all those "no"s as I had intended. I realised afterwards
that I should at least have scattered a few smilies around -- it could have been
read as hostile when I only meant it jokingly.


> Actually,
> when a zip file like the one of concern to the OP, one without a valid
> TOC, is "opened" by ZipFile, it fails in the native open method of
> ZipFile, preventing one from getting the exception you mentioned.
> Perhaps a corruption in the zip file TOC, a few bad bytes, might
> generate the exception mentioned. I will attempt to verify this
> possibility.

I think it's quite likely that it doesn't realise that there /is/ a ToC at
all. I know that if I use my own code to try to open a Zip file with extra
stuff added on at the end, or truncated, then that's what happens. The reason
sheds an interesting side-light on the quality of design in the ZIP format
;-) The problem is that the "starting point" for finding the ToC is to
find the "central directory [=ToC] end record". That entry is at the end of
the file, but it is of variable size. And you can only work out how big it is
by finding its start. Stupid, eh ? So, what you have to do is scan backwards
through the file looking for a special marker-sequence of four bytes. If you
find one (in the last 64K or so), then you have to verify that it isn't just an
accidental occurrence of that pattern. There are various tests you can make to
see if the apparent "end record" is plausible, and one of the easiest and most
obvious is that it does indeed extend to exactly the end of the file. If the
file has been added-to or truncated, then the "record" will be deemed invalid,
and -- this is the important point -- the search will continue looking further
backwards in the file.
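
The backward scan described above can be sketched roughly like this. It is not the JDK's code: EndRecordFinder is an invented name, it works on a byte array for simplicity, and it applies only the one plausibility test mentioned (the record plus its comment must reach exactly the end of the data). The constants come from the ZIP format: the fixed part of the end record is 22 bytes, its signature is "PK" 5 6, and the 2-byte little-endian comment length sits at offset 20.

```java
// Hypothetical helper class illustrating a strict end-record search.
final class EndRecordFinder {
    static final int END_HEADER_SIZE = 22;  // fixed part of the end record
    static final int MAX_COMMENT = 0xFFFF;  // comment length is a 16-bit field

    // Returns the offset of an end record that extends exactly to the end
    // of the data, or -1 if no candidate passes that test.
    static int findEndRecord(byte[] data) {
        int lowest = Math.max(0, data.length - END_HEADER_SIZE - MAX_COMMENT);
        for (int pos = data.length - END_HEADER_SIZE; pos >= lowest; pos--) {
            boolean marker = data[pos] == 'P' && data[pos + 1] == 'K'
                    && data[pos + 2] == 5 && data[pos + 3] == 6;
            if (!marker) continue;
            int commentLen = (data[pos + 20] & 0xFF) | ((data[pos + 21] & 0xFF) << 8);
            if (pos + END_HEADER_SIZE + commentLen == data.length) {
                return pos;  // plausible: the record ends exactly at end of file
            }
            // Otherwise treat it as an accidental match and keep looking further back.
        }
        return -1;
    }
}
```

With an intact archive and no comment, the record is found 22 bytes from the end. Append a few bytes of padding and the same search comes back empty-handed, which is why a strict reader reports "not a Zip file" rather than "corrupt Zip file".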

Since the search never finds an acceptable "end record", the software can only
conclude that it's not reading a Zip file at all, rather than thinking it's got
a corrupt one. (I'm considering modifying my own implementation so that it
can optionally be told to use the "best available candidate" if none is
perfect -- though I'm not yet convinced that's a feature I'll need).


> Perhaps "you can't," however I can. Admittedly, some data will need to
> be supplied, and as Mr. Thompson has previously noted, this should not
> be an automatic feature applied to all files, like his large zip
> repository driving his website; however, for some, like the OP,
> this would be a useful feature.

As above, I'm not really convinced that the ability to repair damaged files
should be part of the standard API for manipulating those files. It seems to
me that that is better handled by a separate tool. Or at least a separate API.
For one reason, it's less complicated for the developer/user to learn. For
another the "forgiving" parser probably won't share a lot of code with the
"correct" parser.


> Zip file technology has been evolving for some twenty years. I have had
> the (dis)pleasure of using every evolution over that same period, at
> times a very painful experience.

Displeasure is right. I've not been using ZIP stuff for as long as you, but I
remember when I first started programming on DOS / early Windows and being
gob-struck at how utterly brain-dead the ZIP utilities were after I'd been used
to *NIX tar and compress.


> I am only suggesting that the technology learned over the last
> twenty years regarding zip files should be part of Java's zip classes,
> at a minimum.

I think I understand your point. But I don't really agree with it.

If an API is defined with abilities roughly like the Zip utilities (i.e. making
best efforts to retrieve data, even if it means falling back to another access
method; pretending that it is possible to replace one file in a Zip file as a
single operation, and so on) then I think that should be a higher-level API,
and separate from the relatively low-level APIs that exist. As far as I can
see, the current low-level APIs are just about sufficient to allow Sun to
create a higher level one on top of them[*]. Perhaps you are right and some
such API should be added to the existing stuff (I can certainly think of uses
for it), but it should be separate from, and in addition to, the current stuff.
I would hate to see it all rolled-up into one big ball where I didn't have
control of (and perhaps didn't even know) what the code is doing.

-- chris

[*] One big nuisance is that the natural name for such a high level class would
be java.util.zip.ZipFile, which is already in use :-(


Joseph Dionne

May 22, 2004, 10:02:49 AM
Chris Uppal wrote:

{snip}

Sorry my response is so delayed, I've been heads down in project without
time to play. At times the theoretical must make room for the practical.

All your points are well taken, and in keeping with OOP and Java
principles. Perhaps this is where my issue derives -- I have
fundamental disagreements with OOP and the direction Java is taking. As
a c language programmer from K&R forward, I love the simplicity of that
language, believing less is more.

With Java, the "API" is so large, and growing, it is approaching the
point where effective use will be diminished because of its immense
size. So with that said, I shall address your following points.

>
> If an API is defined with abilities roughly like the Zip utilities (i.e. making
> best efforts to retrieve data, even if it means falling back to another access
> method; pretending that it is possible to replace one file in a Zip file as a
> single operation, and so on) then I think that should be a higher-level API,
> and separate from the relatively low-level APIs that exist.

While I agree in principle with this, utilities build up from the
lower, simpler components to higher, more advanced functionality.
Since opening a file is, IMHO, the simplest of operations, a higher
level utility to "repair" a zip file would still need to open that same
zip file. (Remember that the zip file in question is still valid, just
without one valid structure, the TOC.) However, ZipFile cannot be used
to open this file, so a complete replacement of ZipFile is needed, not
an extension of ZipFile.

> As far as I can
> see, the current low-level APIs are just about sufficient to allow Sun to
> create a higher level one on top of them[*].

The functionality I added in myZipFile class, using InputStream and
ZipInputStream, has the ability to 1) differentiate between an invalid zip
file, for example myZipFile.java, and one such as the OP's zip file
missing the TOC, and 2) simulate the functionality of ZipFile, less
direct access of course (which I intend to add in the future). The
replacement is functional, and demonstrates my point that the native
method ZipFile.open() could do the same, thus making ZipFile more
useful, not less. (Look mom, no utilities!)
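
For readers who want to see the shape of this idea, here is a rough
sketch. It is not the actual myZipFile class (which is not posted in
this thread); the class name ZipProbe, the Kind enum and both method
names are made up for illustration, and it uses try-with-resources, so
it assumes a Java newer than the one current at the time of this
thread. It tries ZipFile first and, if that fails, reads the local file
headers sequentially with ZipInputStream. A file with a readable first
entry is treated as "probably missing or damaged TOC"; other kinds of
corruption could land in the same bucket.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

// Hypothetical helper for illustration only; not the myZipFile class from the post.
final class ZipProbe {
    enum Kind { VALID, MISSING_TOC, NOT_A_ZIP }

    // Classify a file by trying random access first, then sequential access.
    static Kind classify(File f) throws IOException {
        try (ZipFile zf = new ZipFile(f)) {
            return Kind.VALID;              // central directory (TOC) was readable
        } catch (ZipException e) {
            // ZipFile refused the file; fall through to the sequential probe.
        }
        try (ZipInputStream in = new ZipInputStream(new FileInputStream(f))) {
            // getNextEntry() returns null when no local file header is found.
            return in.getNextEntry() != null ? Kind.MISSING_TOC : Kind.NOT_A_ZIP;
        } catch (ZipException e) {
            return Kind.NOT_A_ZIP;          // header present but unusable
        }
    }

    // List entry names without the TOC, in the order they appear in the file.
    static List<String> entryNames(File f) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new FileInputStream(f))) {
            for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
                names.add(e.getName());
            }
        }
        return names;
    }
}
```

As noted above, this gives sequential access only; looking up one entry
by name means reading through everything before it.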

> Perhaps you are right and some
> such API should be added to the existing stuff (I can certainly think of uses
> for it), but it should be separate from, and in addition to, the current stuff.
> I would hate to see it all rolled-up into one big ball where I didn't have
> control of (and perhaps didn't even know) what the code is doing.
>

I see no loss of control because ZipFile.open() can detect a truly bad
zip file, or open the OP's limited zip file. And, by extending ZipFile,
the OOP/Java way, to create your higher level utility, control is once
again restored -- you could fail to open a zip file that does not
contain a valid TOC for instance, by throwing an exception.
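
To make the "strict" option concrete, here is a minimal sketch of a TOC
check a higher level utility could run before opening a file. The class
and method names (TocCheck, hasEndOfCentralDirectory, requireToc) are
invented for this example. It only looks for the end-of-central-directory
signature "PK\005\006" in the tail of the file, which is where the ZIP
format puts it; it does not validate the directory contents, so treat it
as a first filter rather than a full integrity check.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.zip.ZipException;

// Hypothetical strictness check; names are illustrative, not part of java.util.zip.
final class TocCheck {
    private static final int EOCD_MIN = 22;        // fixed part of the end-of-central-directory record
    private static final int MAX_COMMENT = 0xFFFF; // longest possible archive comment after it

    // True if the end-of-central-directory signature appears in the file's tail.
    static boolean hasEndOfCentralDirectory(File f) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            long len = raf.length();
            if (len < EOCD_MIN) {
                return false;                      // too short to hold the record
            }
            int tail = (int) Math.min(len, (long) EOCD_MIN + MAX_COMMENT);
            byte[] buf = new byte[tail];
            raf.seek(len - tail);
            raf.readFully(buf);
            // Scan backwards so a signature nearest the end of the file wins.
            for (int i = tail - EOCD_MIN; i >= 0; i--) {
                if (buf[i] == 'P' && buf[i + 1] == 'K' && buf[i + 2] == 5 && buf[i + 3] == 6) {
                    return true;
                }
            }
            return false;
        }
    }

    // Strict policy: refuse files without a TOC by throwing, as suggested above.
    static void requireToc(File f) throws IOException {
        if (!hasEndOfCentralDirectory(f)) {
            throw new ZipException("no central directory (TOC) found: " + f);
        }
    }
}
```

A caller who wants the current strict behavior would call requireToc()
first; a caller who wants best-effort reading would skip it.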

It remains my opinion that as a "user" of the Java language, I have the
right to "demand" reasonable behavior from the language developers, just
like my customers/users demand reasonable behavior from my work.
Perhaps I am just getting too picky in my old age, but when I began in
software development, the concept of open source, community developed
OSes and languages was just that -- a concept many of us desired. Now
that these concepts are being realized, I wish to push the envelope
further, by raising the awareness of language developers that their
"customers" are we that use the fruits of their labor.

joseph

Chris Uppal

May 25, 2004, 5:53:18 AM
Joseph Dionne wrote:

> Sorry my response is so delayed, I've been heads down in a project without
> time to play. At times the theoretical must make room for the practical.

It's Good to eat...


> All your points are well taken, and in keeping with OOP and Java
> principles. Perhaps this is where my issue derives -- I have
> fundamental disagreements with OOP and the direction Java is taking. As
> a C language programmer from K&R forward, I love the simplicity of that
> language, believing less is more.

I agree that Java is over-complex, but I suspect that we are thinking of
different things...


> With Java, the "API" is so large, and growing, it is approaching the
> point where effective use will be diminished because of its immense
> size.

Yes, we are. The class library is only an application of Java -- albeit one
that is standardised. I prefer to keep the concepts of the language and the
library separate.

The API is huge, and growing. It has some nasty kludges due to early design
flaws. But I don't see how to avoid that and still have a lot of pre-packaged
functionality. It's nothing to do with OO really, just the sheer number of
application domains that Sun have added to the mix.

Perhaps it'd be better if there were a clearer separation between a small
"core" library that could be thought of as part of the Java language (akin to
the C standard library), and a much larger set of code that is only part of the
Java "platform".


> > As far as I can
> > see, the current low-level APIs are just about sufficient to allow Sun
> > to create a higher level one on top of them[*].
>

> The functionality I added in myZipFile class, using InputStream and
> ZipInputStream, has the ability to 1) differentiate between an invalid zip
> file, for example myZipFile.java, and one such as the OP's zip file
> missing the TOC, and 2) simulate the functionality of ZipFile, less
> direct access of course (which I intend to add in the future). The
> replacement is functional, and demonstrates my point that the native
> method ZipFile.open() could do the same, thus making ZipFile more
> useful, not less. (Look mom, no utilities!)

Well, you can look at this in another way, and say that the fact that you have
done this shows that it's not necessary to package this functionality as part
of the /standard/ class library. (Which, after all, we have just agreed is
dauntingly big).


> It remains my opinion that as a "user" of the Java language, I have the
> right to "demand" reasonable behavior from the language developers, just
> like my customers/users demand reasonable behavior from my work.

I very much agree with this in general. There is a specific point though: in
my experience, users tend to want solutions that are "too" simple -- that is
the solution they ask for (first) is actually simpler than the task itself,
which means that they wouldn't always be able to do their work with it. Part
of our job as designers/programmers is to ensure that we know /all/ the
/necessary/ complexities of their task, and that we don't forget that
the users will have to deal with "odd" cases, even though the users themselves
tend to forget them.

I'm of the opinion that there's a similar "pressure" from consumers of a class
library for X, for it to be simplified to reflect the limited concept of X that
is most common. It's the job of the class library designer to ensure that the
process of dropping/hiding "inconvenient" aspects of X doesn't go too far.

Of course, it's a matter of opinion whether Sun's Zip library has gone too far
in the opposite direction. I think you've made a fair case that it has. (But,
I don't agree so strongly that I've changed the API of my own Zip library to
include the features you are seeking ;-)

-- chris

Michael Borgwardt

May 25, 2004, 6:08:03 AM
Joseph Dionne wrote:

> With Java, the "API" is so large, and growing, it is approaching the
> point where effective use will be diminished because of its immense
> size.

Nobody forces you to use it. You can always roll your own. I just
don't see how that is more effective.
