Lucene index of the ICWSM 2009 collection


Ashish

Feb 2, 2010, 11:01:14 PM
to icwsm-data
I downloaded Lucene index of the ICWSM 2009 collection from
http://www.icwsm.org/2010/data.shtml.

I am unable to untar the index-all.tar file because it appears to be
corrupt. Please let me know if anyone else has encountered the same
problem, and where I can get the correct files.

Thanks.

Manirupa Das

Feb 3, 2010, 4:37:57 PM
to icwsm...@googlegroups.com
Yes, I've faced a similar problem where the index does not unarchive completely. (I am on a Mac running Snow Leopard.) The downloaded copy occupies 11 GB, but since I was downloading overnight I would not have noticed any interruptions in the transfer.






Ian Soboroff

Feb 5, 2010, 9:15:14 AM
to icwsm...@googlegroups.com
When I get back into town I will double check against my master copy,
and update the download site if necessary. Thanks for the heads up.
Ian

Denzil

Mar 11, 2010, 4:55:04 AM
to icwsm-data
Hi Ian,

Have you updated the Lucene index? I still receive an error while
extracting the file. The 10.3 GB "index-all.tar.gz" file extracts to a
3.23 GB "index" folder (after ignoring the error). The folder contains
three files: "_0.cfx" (3.23 GB), "_30q.cfs" (0 KB), and
"segments_2" (0 KB).
I tried opening the index using Luke, the Lucene Index Toolbox
(http://www.getopt.org/luke/), but was unable to do so.

Regards,
--Denzil


Ian Soboroff

Mar 11, 2010, 10:01:20 AM
to Denzil Correa, Ashish Sureka, icwsm...@googlegroups.com
I just downloaded the index, and was able to unpack both the
newly-downloaded copy and my master copy. Furthermore, I checked the
MD5 checksums and the files match (338b300beb7470a041c9235079a422ca).
All I can conclude is that you have a corrupted download.

When I unpack the tarball, I get:

$ tar tzvf index-all.tar.gz
drwxr-xr-x dknights/staff 0 2009-02-20 11:29:10 index/
-rw-r--r-- dknights/staff 64 2009-02-20 11:29:10 index/segments_2
-rw-r--r-- dknights/staff 3475190168 2009-02-20 11:29:10 index/_0.cfx
-rw-r--r-- dknights/staff 13601844835 2009-02-20 11:27:24 index/_30q.cfs
-rw-r--r-- dknights/staff 20 2009-02-20 11:29:10 index/segments.gen

Ian
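Ian's checksum makes it straightforward to rule out a corrupted download before attempting another extract. A minimal shell sketch, assuming the file is named index-all.tar.gz as on the download site (the md5sum/md5 commands are standard, but the exact invocation differs between Linux and macOS, as noted in the comments):

```shell
# MD5 checksum Ian posted for index-all.tar.gz.
expected="338b300beb7470a041c9235079a422ca"

f="index-all.tar.gz"
if [ -f "$f" ]; then
    # md5sum is the GNU/Linux tool; on macOS use: md5 -q "$f"
    actual=$(md5sum "$f" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "checksum OK"
    else
        echo "checksum mismatch: re-download the file"
    fi
else
    echo "file not found: $f"
fi
```

If the checksums disagree, the local copy is damaged and no extraction tool will recover it; re-downloading is the only fix.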


Correa Denzil

Mar 12, 2010, 9:38:17 AM
to Ian Soboroff, Ashish Sureka, icwsm...@googlegroups.com
Hi Ian,

The problem was with the WinRAR archiver. I untarred the file using the 'tar' command on Linux and was able to extract all the files.

Thanks !

--Regards,
Denzil
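Denzil's fix, swapping WinRAR for GNU tar, can be sketched as below. Listing the archive first (-t) is a cheap sanity check: a truncated .tar.gz fails there before any partial extract is written. The sample archive built here is fabricated purely for illustration; in practice substitute index-all.tar.gz:

```shell
# Build a tiny sample archive so the pipeline below is demonstrable.
# (Stand-in for index-all.tar.gz; contents are made up.)
mkdir -p index
echo "segments" > index/segments_2
tar czf sample.tar.gz index
rm -r index

# List first, then extract: a corrupt gzip stream fails the -t pass
# and the && short-circuits, so nothing is partially unpacked.
tar tzf sample.tar.gz > /dev/null && tar xzf sample.tar.gz

cat index/segments_2
```

The same list-then-extract pattern applied to the real tarball would have surfaced the corruption (or WinRAR's mishandling of the large entries) before producing the misleading 3.23 GB partial folder.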