Nested zip files


mg51...@gmail.com

Jun 23, 2016, 12:01:40 PM
to scala-user
Hi there, new to Scala, not new to coding. So far I've been able to unzip one level of files with no problems using Scala. I'm having trouble unzipping nested zip files. Has anyone here dealt with this problem, and could you give an example or some pointers? Thanks!


Original.zip contains
file1.txt
file2.txt
file3.zip contains
              file3.txt
file4.zip contains
              file4.txt

I want to be able to extract all the text files automatically/recursively.

Thanks!

Rex Kerr

Jun 23, 2016, 12:36:28 PM
to mg51...@gmail.com, scala-user
`java.util.zip.ZipFile` gives you a list of entries in the original (as a `ZipEntry`).

The `getInputStream` method gives you a `java.io.InputStream` for each of those entries.  If it's a text file, you can read that in as lines in any number of ways (including `scala.io.Source.fromInputStream`).  If it's a zip file, wrap it in `java.util.zip.ZipInputStream`.  Now you have to use the `getNextEntry` and `read` methods (after checking how much to read by querying the entry) until the zip file is empty. If you want to do it recursively, you probably need to buffer that `read` and make an `InputStream` out of it (I've called this `ProxyInputStream` in the past; it just takes an input stream and a maximum size and forwards all calls to the underlying input stream except that it fixes up the positions and sizes).  Then you can wrap recursively in `ZipInputStream` if you need to.
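A minimal sketch of the first steps described above, not a definitive implementation. The object and method names (`OneLevelNested`, `readTextEntries`), the `.txt`/`.zip` suffix checks, and the UTF-8 encoding are assumptions for illustration. It walks the outer archive with `ZipFile` and descends one level into nested zips with `ZipInputStream`; rather than querying each entry's size, it simply reads every nested entry to its end.

```scala
import java.io.File
import java.util.zip.{ZipFile, ZipInputStream}
import scala.collection.mutable.ListBuffer
import scala.io.Source

object OneLevelNested {
  /** Returns (path, lines) for each .txt entry in `archive` and in zips nested one level down. */
  def readTextEntries(archive: File): List[(String, List[String])] = {
    val found = ListBuffer.empty[(String, List[String])]
    val zf = new ZipFile(archive)
    try {
      val entries = zf.entries()
      while (entries.hasMoreElements) {
        val entry = entries.nextElement()
        val name = entry.getName
        if (name.endsWith(".txt")) {
          // Top-level text file: ZipFile hands out a dedicated stream per entry.
          val in = zf.getInputStream(entry)
          try {
            val lines = Source.fromInputStream(in, "UTF-8").getLines().toList
            found += ((name, lines))
          } finally in.close()
        } else if (name.endsWith(".zip")) {
          // Nested archive: wrap the entry's stream and step through its entries.
          val zin = new ZipInputStream(zf.getInputStream(entry))
          try {
            var inner = zin.getNextEntry
            while (inner != null) {
              if (inner.getName.endsWith(".txt")) {
                // toList forces the whole entry to be read before getNextEntry is called.
                // The Source is deliberately not closed, since that would close zin.
                val lines = Source.fromInputStream(zin, "UTF-8").getLines().toList
                found += ((s"$name/${inner.getName}", lines))
              }
              inner = zin.getNextEntry
            }
          } finally zin.close()
        }
      }
    } finally zf.close()
    found.toList
  }
}
```

For the archive in the question, `OneLevelNested.readTextEntries(new File("Original.zip"))` would be expected to return entries keyed `file1.txt`, `file2.txt`, `file3.zip/file3.txt`, and `file4.zip/file4.txt`. Deeper nesting is not handled by this sketch.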

  --Rex


Stephen Compall

Sep 10, 2016, 4:37:50 AM
to Rex Kerr, mg51...@gmail.com, scala-user
There's no need to do the offset math; `ZipInputStream` itself resets EOF and stream position on `getNextEntry`, and treats the end of the current entry as EOF.  (Cf. the "Returns" note for `ZipInputStream#available`.)  So you just have to make sure to read each entry strictly, all the way to its end, before calling `getNextEntry` again.  For example, in this code, `is` will be the same `InputStream` for multiple calls, but `copyIO` strictly reads to EOF before we return and get the next step.  Just don't call `close` too early and it'll be fine.
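The code referred to above is not preserved in this copy of the thread. As a stand-in, here is a minimal sketch of the technique, not the original code. The names `copyIO` and `is` echo the ones mentioned above, but their bodies are assumptions, as are `NestedUnzip`, `extractAll`, the `.zip` suffix test, and the choice to unpack `file3.zip` into a `file3` directory.

```scala
import java.io.{File, FileOutputStream, IOException, InputStream, OutputStream}
import java.util.zip.ZipInputStream

object NestedUnzip {
  /** Copies `is` to `os` until EOF. For a ZipInputStream, EOF is the end of the current entry. */
  def copyIO(is: InputStream, os: OutputStream): Unit = {
    val buf = new Array[Byte](8192)
    var n = is.read(buf)
    while (n != -1) {
      os.write(buf, 0, n)
      n = is.read(buf)
    }
  }

  /** Extracts all files under `destDir`, descending into nested .zip entries. Does not close `is`. */
  def extractAll(is: InputStream, destDir: File): Unit = {
    val zin = new ZipInputStream(is)
    var entry = zin.getNextEntry
    while (entry != null) {
      val target = new File(destDir, entry.getName)
      // Refuse entry names such as "../x" that would escape the destination directory.
      if (!target.getCanonicalPath.startsWith(destDir.getCanonicalPath + File.separator))
        throw new IOException(s"Entry escapes destination: ${entry.getName}")
      if (entry.isDirectory) {
        target.mkdirs()
      } else if (entry.getName.endsWith(".zip")) {
        // Recurse on the same stream: the inner ZipInputStream sees EOF at the end of this entry.
        // The inner stream is intentionally never closed, because closing it would close zin.
        extractAll(zin, new File(destDir, entry.getName.stripSuffix(".zip")))
      } else {
        Option(target.getParentFile).foreach(_.mkdirs())
        val os = new FileOutputStream(target)
        // copyIO reads strictly to the end of the entry before we move on.
        try copyIO(zin, os) finally os.close()
      }
      entry = zin.getNextEntry
    }
  }
}
```

A caller would own the outermost stream and close it only after `extractAll` returns, for example `val in = new java.io.FileInputStream("Original.zip"); try NestedUnzip.extractAll(in, new File("out")) finally in.close()`.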

Greetings,

Stephen.

Rex Kerr

Sep 10, 2016, 3:58:56 PM
to Stephen Compall, mg51...@gmail.com, scala-user
I think I tried something like that but it ended up being too fragile.  Maybe I just didn't try hard enough.  It looks like it might be simpler.

  --Rex