extracting the corpus zip files


Suzan Verberne

Apr 7, 2014, 8:39:42 AM
to clef-ehea...@googlegroups.com
Dear all,

Maybe I missed another thread on this, but I'm having problems extracting the corpus zip files (on Linux).
I first tried 'unzip part1.zip', but got:
End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.

After googling, I found this solution: 'jar xvf part1.zip'. That seemed to work at first, but then I got:

created: part1/
 inflated: part1/attra0843_12.dat
 inflated: part1/bestb0834_12.dat
 inflated: part1/cadth0844_12.dat
 inflated: part1/cks.n0835_12.dat
java.io.EOFException: Unexpected end of ZLIB input stream
        at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
        at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
        at java.util.zip.ZipInputStream.read(ZipInputStream.java:163)
        at java.util.zip.ZipInputStream.closeEntry(ZipInputStream.java:109)
        at sun.tools.jar.Main.extractFile(Main.java:954)
        at sun.tools.jar.Main.extract(Main.java:870)
        at sun.tools.jar.Main.run(Main.java:260)
        at sun.tools.jar.Main.main(Main.java:1167)


The same happens for the other parts, although some get further than others.

What am I doing wrong?

Thanks!
Suzan Verberne

-- 
Suzan Verberne, postdoctoral researcher
Information Foraging Lab, Institute for Computing and Information Sciences
Radboud University Nijmegen
Tel: 0031 24 36 53431/11668
Email: s.ver...@cs.ru.nl
http://sverberne.ruhosting.nl
--
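Editor's note: the unzip message above refers to the end-of-central-directory record, which a zip archive keeps at the very end of the file (22 bytes, optionally followed by an archive comment of up to 65,535 bytes). If a download stops early, that record is the first thing lost. The stack traces also suggest why jar got further: they show it reading through java.util.zip.ZipInputStream, which walks the entries from the start of the file, so it would inflate the leading members before running out of data. The following is a minimal Python sketch of the check unzip performs; the function name `find_eocd` is hypothetical, and it is only a heuristic that searches for the signature bytes without validating the record's fields.

```python
import os

EOCD_SIG = b"PK\x05\x06"   # signature of the end-of-central-directory record
EOCD_FIXED_LEN = 22         # length of the record without the trailing comment
MAX_COMMENT_LEN = 0xFFFF    # the archive comment may add up to 65535 bytes


def find_eocd(path):
    """Return the byte offset of the end-of-central-directory record, or None.

    Heuristic: takes the last occurrence of the signature that still leaves
    room for the 22-byte fixed part before end of file.
    """
    size = os.path.getsize(path)
    # The record can only sit within the last 22 + 65535 bytes of the file.
    window = min(size, EOCD_FIXED_LEN + MAX_COMMENT_LEN)
    with open(path, "rb") as f:
        f.seek(size - window)
        tail = f.read(window)
    idx = tail.rfind(EOCD_SIG)
    # Skip hits too close to EOF for the fixed part of the record to fit.
    while idx != -1 and len(tail) - idx < EOCD_FIXED_LEN:
        idx = tail.rfind(EOCD_SIG, 0, idx)
    if idx == -1:
        return None
    return size - window + idx
```

For example, `find_eocd("part1.zip")` returning `None` would point to a truncated (or non-zip) file, matching the message unzip printed.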

Johannes Leveling

Apr 7, 2014, 9:26:06 AM
to Suzan Verberne, clef-ehea...@googlegroups.com
You might want to check whether you downloaded the complete file
(i.e., compare the size of the file on your disk with the size of the original file).
Extracting an incomplete zip archive fails with
the error message you got.

Best
Johannes
--
Johannes Leveling, Research Fellow, CNGL, School of Computing, DCU, Ireland
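Editor's note: Johannes's suggestion can be scripted. The following is a minimal Python sketch using only the standard library; `check_download` is a hypothetical helper, and `expected_size` is an assumed parameter that you would fill in with the byte count of the original file from wherever it is offered for download. `zipfile.is_zipfile()` additionally checks for the end-of-central-directory record that a truncated file lacks.

```python
import os
import zipfile


def check_download(path, expected_size=None):
    """Return a list of problems found with a downloaded zip (empty if none).

    expected_size: byte count of the original file (hypothetical parameter;
    pass None to skip the size comparison).
    """
    problems = []
    actual = os.path.getsize(path)
    if expected_size is not None and actual != expected_size:
        problems.append(f"{path}: {actual} bytes on disk, expected {expected_size}")
    # is_zipfile() looks for the end-of-central-directory record, which is
    # missing from a download that stopped early.
    if not zipfile.is_zipfile(path):
        problems.append(f"{path}: not a readable zip archive (possibly truncated)")
    return problems
```

An empty list from `check_download("part1.zip", expected_size=...)` means both checks passed; a size mismatch alone is already a strong hint to download the file again.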

Suzan Verberne

Apr 7, 2014, 10:38:08 AM
to Johannes Leveling, clef-ehea...@googlegroups.com
Thanks! That does seem to be the problem!
Centre for Language and Speech Technology / Information Foraging Lab
Radboud University Nijmegen
Tel: 0031 24 36 53431/11134

Suzan Verberne

Apr 8, 2014, 8:38:06 AM
to Johannes Leveling, clef-ehea...@googlegroups.com
I have now succeeded in extracting all files except part2.zip.
'jar xvf part2.zip' gives:

created: part2/
java.io.IOException: Push back buffer is full
at java.io.PushbackInputStream.unread(PushbackInputStream.java:232)
at java.util.zip.ZipInputStream.readEnd(ZipInputStream.java:374)
at java.util.zip.ZipInputStream.read(ZipInputStream.java:165)
at java.util.zip.ZipInputStream.closeEntry(ZipInputStream.java:109)
at sun.tools.jar.Main.extractFile(Main.java:954)
at sun.tools.jar.Main.extract(Main.java:870)
at sun.tools.jar.Main.run(Main.java:260)
at sun.tools.jar.Main.main(Main.java:1167)

The only file that gets inflated is clini0836_12.dat.

Does anyone recognize this problem?

Suzan

Lorraine Goeuriot

Apr 9, 2014, 7:35:19 AM
to clef-ehea...@googlegroups.com
Hi Suzan,

Sorry for the delay in replying. We just tried again to unzip it with the "unzip" command on Ubuntu, and it worked fine (two of us tried this).
On another computer running openSUSE I got an error similar to the one in your first message, but I managed to unzip the file with the graphical interface (i.e., not from the command line).

Let us know if that helps. 

Best,
Lorraine
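Editor's note: besides unzip, jar and a graphical tool, Python's standard `zipfile` module is another way to extract and check the parts. The following is a minimal sketch; `extract_verified` is a hypothetical helper, and it cannot repair a truncated download. `zipfile.ZipFile()` raises `zipfile.BadZipFile` when the end-of-central-directory record is missing, and `testzip()` reads every member and returns the name of the first bad one (or `None`) before anything is written to disk.

```python
import zipfile


def extract_verified(path, dest):
    """Extract the archive at path into directory dest after checking it.

    Raises zipfile.BadZipFile if the archive cannot be opened or if
    testzip() reports a bad member; returns the extracted member names.
    """
    with zipfile.ZipFile(path) as zf:
        # Reads all members once and checks CRCs and headers.
        bad = zf.testzip()
        if bad is not None:
            raise zipfile.BadZipFile(f"{path}: check failed for member {bad}")
        # Reads the members a second time to write them out.
        zf.extractall(dest)
        return zf.namelist()
```

For example, `extract_verified("part2.zip", "part2")` would either extract everything or stop with an error naming the archive or member at fault. Reading the data twice is the price of not leaving a half-extracted directory behind.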