IO.unzip errors on Mac/Windows when it finds case-insensitive duplicates

eugene yokota

Aug 27, 2013, 1:08:17 PM
to simple-b...@googlegroups.com
Hi,

I was researching [sbt/sbt-assembly#90] and found that:

1. HFS+ on Mac OS X is by default case insensitive.
2. Some jars like avro-tools-1.7.3.jar contain case-insensitive duplicates.

In this case the duplicates were the `META-INF/LICENSE` file and the `META-INF/license/` directory.
The current implementation of IO.unzip blows up when this happens,
which seems logical but is inconvenient in practice.
The command-line unzip on Mac, for example, seems to skip the latter.
I think the default should be to skip duplicates, perhaps with a parameter to configure it if blowing up is preferred.
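
To make the proposal concrete, here is a minimal sketch of the selection logic only, written against plain entry names rather than sbt's actual IO.unzip code. The class `LenientUnzip`, its methods, and the `failOnDuplicate` flag are hypothetical names used for illustration; they are not part of sbt.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not sbt's implementation.
final class LenientUnzip {

    // Approximates how a case-insensitive filesystem sees an entry name:
    // drop a trailing slash (directory entry) and fold case.
    static String key(String entryName) {
        String n = entryName.endsWith("/")
                ? entryName.substring(0, entryName.length() - 1)
                : entryName;
        return n.toLowerCase(Locale.ROOT);
    }

    // Returns the entry names that would be extracted, in archive order.
    // A later entry that collides with an earlier one is skipped, or
    // causes a failure when failOnDuplicate is true.
    static List<String> selectEntries(List<String> names, boolean failOnDuplicate) {
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (seen.add(key(name))) {
                kept.add(name);
            } else if (failOnDuplicate) {
                throw new IllegalStateException(
                        "Case-insensitive duplicate entry: " + name);
            }
        }
        return kept;
    }
}
```

With the names from avro-tools above, `selectEntries` keeps `META-INF/LICENSE` and skips `META-INF/license/` when `failOnDuplicate` is false. Note that files stored under a skipped directory would still be selected by this sketch, so a real extractor would need to handle those as well.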

-eugene


Paolo Giarrusso

Aug 27, 2013, 3:38:04 PM
to simple-b...@googlegroups.com
Maybe you're right about IO.unzip, but I don't think you should use that for sbt-assembly.

Shouldn't assembly preserve the original zipfile content by assembling the output zip file directly (through java.util.zip.ZipOutputStream), without going through the filesystem, even though it's probably more code to write?
I'd guess a JAR which breaks when expanded on case-insensitive filesystems is still valid (unless the JAR specification disagrees), and it's easy to create one by naming resource files appropriately.

If the JAR built by sbt-assembly (with this fix) were unpacked, the same problem would appear, but being unpacked does not seem to be the point of JAR files.
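
As a rough sketch of what assembling the output directly could look like, the following copies entries from several archives into one `ZipOutputStream` without touching the filesystem. The class `DirectZipMerge` and its methods are hypothetical, archives are passed as in-memory byte arrays for brevity, and the "first occurrence wins" rule for exact-name duplicates is an assumption, not sbt-assembly's merge strategy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not sbt-assembly's implementation.
final class DirectZipMerge {

    // Streams every entry of every input archive into one output archive.
    // Names that differ only by case are both kept; only exact-name
    // duplicates are skipped, because ZipOutputStream rejects those.
    static byte[] merge(List<byte[]> archives) {
        Set<String> written = new HashSet<>();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            byte[] buffer = new byte[8192];
            for (byte[] archive : archives) {
                try (ZipInputStream zis =
                         new ZipInputStream(new ByteArrayInputStream(archive))) {
                    ZipEntry entry;
                    while ((entry = zis.getNextEntry()) != null) {
                        if (!written.add(entry.getName())) {
                            continue; // exact duplicate: first occurrence wins
                        }
                        zos.putNextEntry(new ZipEntry(entry.getName()));
                        int n;
                        while ((n = zis.read(buffer)) != -1) {
                            zos.write(buffer, 0, n);
                        }
                        zos.closeEntry();
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Lists entry names in archive order, for inspecting a result.
    static List<String> entryNames(byte[] archive) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis =
                 new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

Because entry names are compared exactly, `META-INF/LICENSE` and `META-INF/license/` can coexist in the output, which is the point of bypassing a case-insensitive filesystem.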

On Tuesday, August 27, 2013 7:08:17 PM UTC+2, eugene yokota wrote:
> 1. HFS+ on Mac OS X is by default case insensitive.

Windows FSs count there as well.

eugene yokota

Aug 27, 2013, 4:35:48 PM
to simple-b...@googlegroups.com
Hi Paolo,

sbt-assembly is the battleground of version conflicts and file duplicates that we normally
take for granted at runtime. X depends on servlet, jetty depends on another version of servlet, etc.

If we look at license files, for instance, the default merge strategy is to rename them, since they are
almost always going to conflict with license files from other jars.
In principle, I agree with your sentiment on inspecting the index, using ZipOutputStream, etc.,
but it sounds like more work for little payoff, plus a performance penalty (we cache the unzipped result).
I'd be happy to review pull requests if someone else wants to take it further.

-eugene