tar/zip archives larger than 8GB

Cowles, Esme

Jun 14, 2011, 2:15:57 PM
to digital-...@googlegroups.com
I've been trying to generate BagIt packages of our digital objects, wrapped in either tar or zip archives to make it easier to move them around over HTTP. Although the zip and tar formats both support files larger than 8GB, and command-line tools support that, I haven't been able to generate archives that large with the Java libraries I've found:

JDK 1.6 java.util.zip: 4GB max file size

Apache Commons Compress (http://commons.apache.org/compress/): 7.99GB for tar, inherits java.util.zip 4GB limit for zip.

ICE JTar (http://www.trustice.com/java/tar/): accepts files larger than 8GB, but produces archives incompatible with GNU tar.

Google Code jtar (http://code.google.com/p/jtar/): haven't actually tried this, but there are no docs and the code looks like it's using the old 11-digit octal (7.99GB) limit.
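For reference, both ceilings in the list above follow from header field widths. This assumes the conventional ustar layout (a 12-byte size field holding 11 octal digits plus a terminator) and classic zip's unsigned 32-bit size fields. The sketch below only does that arithmetic; the class and method names are made up for illustration.

```java
final class ArchiveSizeLimits {
    // Assumed ustar layout: 12-byte size field = 11 octal digits + terminator.
    static final int USTAR_SIZE_DIGITS = 11;

    // Classic zip (no ZIP64) stores sizes and offsets in unsigned 32-bit fields.
    static final long ZIP32_MAX = 0xFFFFFFFFL;

    // Largest value expressible in `digits` octal digits: 8^digits - 1.
    static long maxOctal(int digits) {
        return (1L << (3 * digits)) - 1;
    }

    static boolean fitsUstarOctal(long size) {
        return size >= 0 && size <= maxOctal(USTAR_SIZE_DIGITS);
    }

    static boolean fitsZip32(long size) {
        return size >= 0 && size <= ZIP32_MAX;
    }

    public static void main(String[] args) {
        System.out.println("ustar octal size limit: " + maxOctal(USTAR_SIZE_DIGITS) + " bytes");
        System.out.println("zip 32-bit size limit: " + ZIP32_MAX + " bytes");
    }
}
```

Running `main` prints 8589934591 bytes for ustar (8 GiB minus one byte, the "7.99GB" above) and 4294967295 bytes for zip (4 GiB minus one byte).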


Does anybody have experience working with tar/zip archives larger than 8GB in Java? Should I just fall back on command line tools with better large file support?

-Esme
--
Esme Cowles <esco...@ucsd.edu>

"Many that live deserve death. And some that die deserve life. Can you give
it to them? Then do not be too eager to deal out death in judgement."
-- J.R.R. Tolkien, The Fellowship of the Ring

Mark A. Matienzo

Jun 14, 2011, 2:27:49 PM
to digital-...@googlegroups.com
My understanding is that ZIP64 support has been added to OpenJDK, and
it will be included in Java 7.
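If that ships as described, writing a large zip should need nothing ZIP64-specific in user code. A minimal sketch, assuming Java 7 or later and assuming java.util.zip adds the ZIP64 records itself once an entry or the archive outgrows the 32-bit fields. It has only been exercised on small files here, and `ZipWriter` and `zip` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class ZipWriter {
    // Hypothetical helper: stores each file as a top-level entry named after the file.
    // Assumption: on Java 7+ ZipOutputStream switches to ZIP64 on its own when needed,
    // so no ZIP64-specific calls appear here.
    static void zip(Path target, Path... files) throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(target))) {
            for (Path f : files) {
                zos.putNextEntry(new ZipEntry(f.getFileName().toString()));
                Files.copy(f, zos); // streams the file; never holds it all in memory
                zos.closeEntry();
            }
        }
    }
}
```

For example, `ZipWriter.zip(Paths.get("bag.zip"), Paths.get("video.mov"))`. Streaming through `Files.copy` keeps memory use flat regardless of file size.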

Mark A. Matienzo
Digital Archivist, Manuscripts and Archives
Yale University Library

Chris Prom

Jun 14, 2011, 2:34:53 PM
to digital-...@googlegroups.com
Esme,

You may want to try the tool that Hartwig Thomas has developed to zip large SQL files for the Swiss National Archives SIARD project:

http://sourceforge.net/projects/zip64file/

I have not used it, but I believe the intent was to allow unlimited zip sizes.

Thanks,

Chris
____

Chris Prom
Assistant University Archivist
University of Illinois at Urbana-Champaign
chris...@gmail.com

Ben O'Steen

Jun 14, 2011, 3:05:31 PM
to digital-...@googlegroups.com
This may sound like a naive question, but does the archive have to be a
single file? I mean, there is only poor* support for very large zip
archives.

* well, narrow support in a limited range of tools.

If it is important that the archive behaves like a single 'zip' then you
could just split the zip file into multiple segments - I know spanning
an archive across multiple zip files is pretty old school, but it should
work.

That said, I would recommend manually splitting the archive into
multiple archives if possible.

Tar is also a format that works well in terms of handling sequential
data - it should be amenable to being created, chopped into segments and
then individually compressed.
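The chop-into-segments step can be sketched in Java as a plain byte-level splitter, much like the Unix `split` command. This is an illustration under our own assumptions: the `Segmenter` class, the `.000`, `.001`, ... naming and the 64KB buffer are all invented here. Because the split is byte-based it doesn't care what is inside the tar; concatenating the parts in order restores the original file, and each part could then be compressed on its own.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class Segmenter {
    // Hypothetical helper: copies `source` into sibling files of at most
    // `segmentSize` bytes named <source>.000, <source>.001, ... and returns them in order.
    static List<Path> split(Path source, long segmentSize) throws IOException {
        if (segmentSize <= 0) {
            throw new IllegalArgumentException("segmentSize must be positive");
        }
        List<Path> parts = new ArrayList<>();
        byte[] buf = new byte[64 * 1024];
        try (InputStream in = Files.newInputStream(source)) {
            boolean eof = false;
            while (!eof) {
                Path part = source.resolveSibling(
                        String.format("%s.%03d", source.getFileName(), parts.size()));
                long written = 0;
                try (OutputStream out = Files.newOutputStream(part)) {
                    while (written < segmentSize) {
                        int want = (int) Math.min(buf.length, segmentSize - written);
                        int n = in.read(buf, 0, want);
                        if (n < 0) {
                            eof = true;
                            break;
                        }
                        out.write(buf, 0, n);
                        written += n;
                    }
                }
                if (written > 0) {
                    parts.add(part);
                } else {
                    Files.delete(part); // nothing left to write into this segment
                }
            }
        }
        return parts;
    }
}
```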

There are a number of benefits to not wielding >8GB files around after
all!

Ben

Cowles, Esme

Jun 16, 2011, 8:06:02 AM
to digital-...@googlegroups.com
Thanks for the pointers, everyone.

In my testing so far, command-line tar has been most reliable: GNU and BSD tar both handle 10GB files, can read each other's archives, and work on all platforms. zip64file is the best all-Java option I've found. The only drawback is that the built-in zip command on MacOSX (infozip) doesn't support zip64 archives. When JDK 1.7 is released, I'll test the java.util.zip support for zip64 too.
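Falling back on the command line from Java can be as simple as shelling out with `ProcessBuilder`. A sketch, assuming a `tar` on the PATH (GNU tar on Linux, bsdtar on MacOSX) that accepts `-c`, `-f` and `-C`; the `TarCli` class and its method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

final class TarCli {
    // Builds: tar -cf <archive> -C <parent of bagDir> <bagDir name>
    // so paths inside the archive are relative to the bag's parent directory.
    // Assumes bagDir is not a filesystem root (getParent() would be null).
    static List<String> createCommand(Path archive, Path bagDir) {
        Path abs = bagDir.toAbsolutePath().normalize();
        return Arrays.asList("tar", "-cf", archive.toAbsolutePath().toString(),
                "-C", abs.getParent().toString(), abs.getFileName().toString());
    }

    // Runs the command, sharing this JVM's stdout/stderr, and fails on a non-zero exit.
    static void createTar(Path archive, Path bagDir) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(createCommand(archive, bagDir)).inheritIO().start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("tar exited with status " + exit);
        }
    }
}
```

Keeping `createCommand` separate from `createTar` makes the argument list easy to check without running tar. Passing the arguments as a list, rather than one string, means no shell is involved, so odd file names need no quoting.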

The 8GB+ archives come mostly from large video files (I think the largest file is 16GB). So segmented archives would still need to support files that large. It's frustrating that the format specs have supported larger files for some time, but the tool support is still catching up years and years later.
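As for how the tar formats get past the 11-digit octal limit: a POSIX pax archive can carry the real size in an extended header record, and GNU tar can store a base-256 value in the size field itself, flagged by the high bit of the field's first byte. The sketch below reflects our understanding of the GNU base-256 variant for non-negative sizes only; `TarSizeField` is a made-up name and this is not a complete tar writer.

```java
import java.nio.charset.StandardCharsets;

final class TarSizeField {
    static final int FIELD_LEN = 12;
    static final long OCTAL_MAX = 077777777777L; // 11 octal digits = 8589934591

    // Encodes a non-negative size into a 12-byte header field.
    static byte[] encode(long size) {
        if (size < 0) throw new IllegalArgumentException("negative size");
        byte[] field = new byte[FIELD_LEN];
        if (size <= OCTAL_MAX) {
            // Classic ustar form: 11 zero-padded octal digits, then NUL.
            String octal = String.format("%011o", size);
            System.arraycopy(octal.getBytes(StandardCharsets.US_ASCII), 0, field, 0, 11);
            field[11] = 0;
        } else {
            // Base-256 form (GNU extension, as we understand it):
            // byte 0 = 0x80 marks binary, bytes 1..11 hold the value big-endian.
            field[0] = (byte) 0x80;
            long v = size;
            for (int i = FIELD_LEN - 1; i >= 1; i--) {
                field[i] = (byte) (v & 0xFF);
                v >>>= 8;
            }
        }
        return field;
    }

    // Decodes either form produced by encode().
    static long decode(byte[] field) {
        if ((field[0] & 0x80) != 0) {
            long v = 0;
            for (int i = 1; i < FIELD_LEN; i++) {
                v = (v << 8) | (field[i] & 0xFF);
            }
            return v;
        }
        String digits = new String(field, 0, 11, StandardCharsets.US_ASCII).trim();
        return Long.parseLong(digits, 8);
    }
}
```

In this sketch a 10GB size encodes as 0x80 followed by 11 big-endian bytes, while anything up to 8589934591 bytes stays in plain octal so that older readers can still parse it.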

-Esme
--
Esme Cowles <esco...@ucsd.edu>

"There is always an easy solution to every human problem -- neat, plausible,
and wrong." -- H. L. Mencken

Andy Boyko

Jun 16, 2011, 12:02:29 PM
to digital-...@googlegroups.com
Esme,

The zip included in Mac OS X has supported zip64 since at least Mac OS X 10.6 (Snow Leopard); have you seen a problem?

-Andy

Cowles, Esme

Jun 16, 2011, 3:17:19 PM
to digital-...@googlegroups.com
Andy-

My main test was using zip64file (http://zip64file.sourceforge.net/) to generate zip64 zip files. My standard test was creating an archive containing a 10GB file and a handful of smaller files, and I was able to read them with zip64file and with infozip on Linux. However, I've had problems with a larger archive (50GB) created by zip64file, which I haven't been able to get working with any zip implementation. The 50GB archive works fine with GNU tar on the same system.

On MacOSX 10.6, I could create and read 10GB archives with zip64file, but they weren't readable with the built-in infozip or Archive Utility. The zip64file docs specifically say they were targeting compatibility with PKZip:

http://sourceforge.net/projects/zip64file/files/zip64%20Documentation.pdf/download (p3).

So perhaps the files created by zip64file are incompatible with infozip on MacOSX?

infozip will create 10GB archives, but not unzip them. I tried with both the built-in unzip (5.52, dated 28 February 2005) and the latest from fink (6.00, dated 20 April 2009). Archive Utility will create and read its own 10GB archives without problems, but can't unpack the infozip archive either.
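When several zip tools disagree like this, it can help to check which archives actually carry ZIP64 end records. A rough Java probe based on the PKWARE APPNOTE layout: find the end-of-central-directory record (signature 0x06054b50) near the end of the file, then check whether the 20-byte ZIP64 locator (signature 0x07064b50) sits immediately before it. This is a heuristic sketch with hypothetical names. A signature inside an archive comment can fool it, and it doesn't inspect per-entry ZIP64 extra fields.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

final class Zip64Probe {
    private static final int EOCD_SIG = 0x06054b50;      // end of central directory
    private static final int ZIP64_LOC_SIG = 0x07064b50; // ZIP64 end-of-central-directory locator
    private static final int EOCD_MIN = 22;              // EOCD size without a comment
    private static final int LOC_LEN = 20;               // locator size
    private static final int MAX_COMMENT = 0xFFFF;

    // Reads a little-endian 32-bit int.
    static int le32(byte[] b, int off) {
        return (b[off] & 0xFF) | (b[off + 1] & 0xFF) << 8
                | (b[off + 2] & 0xFF) << 16 | (b[off + 3] & 0xFF) << 24;
    }

    // True if the last EOCD record found in `tail` is preceded by a ZIP64 locator.
    static boolean hasZip64Locator(byte[] tail) {
        for (int i = tail.length - EOCD_MIN; i >= 0; i--) {
            if (le32(tail, i) == EOCD_SIG) {
                return i >= LOC_LEN && le32(tail, i - LOC_LEN) == ZIP64_LOC_SIG;
            }
        }
        throw new IllegalArgumentException("no end-of-central-directory record found");
    }

    // Reads just enough of the file's tail to cover a maximal comment plus the locator.
    static boolean isZip64(Path zip) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(zip.toFile(), "r")) {
            long len = raf.length();
            int tailLen = (int) Math.min(len, EOCD_MIN + MAX_COMMENT + LOC_LEN);
            byte[] tail = new byte[tailLen];
            raf.seek(len - tailLen);
            raf.readFully(tail);
            return hasZip64Locator(tail);
        }
    }
}
```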

-Esme
--
Esme Cowles <esco...@ucsd.edu>

"In Lydia's imagination, a visit to Brighton comprised every possibility of
earthly happiness." -- Jane Austen, Pride and Prejudice
