Message from discussion The C-Prize
Newsgroups: comp.compression
From: jim_bow...@hotmail.com
Date: 17 Jun 2005 15:30:41 -0700
Local: Fri, Jun 17 2005 6:30 pm
Subject: Re: The C-Prize
Well obviously I was thinking too big for an early contest.

Perhaps Wikipedia would suffice:

http://download.wikimedia.org/#wikipedia

The current English-language edition, gzipped, is about 1GB, but as you
can see, there are many Wikipedias in different languages.  I doubt they
total over 4GB gzipped.

Compressing multiple languages would have a very good side effect:
machine translation technology could be dramatically enhanced, and
properly constructed compressors would discover things about the
various cultures that might not otherwise be obvious.

Part of my reason for picking a big corpus to begin with is that if the
contest goes big time -- which it would if proper funding levels were
applied to it -- rules could be constructed to split it into divisions:
some for large computer systems that crunch away doing more brute-force
statistics such as Turney used, and others that work on a fairly
sampled subset of the total corpus.  But this is something we can deal
with later, when the competition (and prize) gets much bigger.
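
To make "fairly sampled subset" a little more concrete, here is a
rough sketch of one way a division could draw its subset.  It uses
reservoir sampling with a published seed, so every entrant can
reproduce the same subset without loading the whole corpus.  This is
my own illustration, not a proposed contest rule: the function name
fair_sample, the seed value, and the idea of sampling whole articles
are all assumptions.

```python
import random


def fair_sample(articles, k, seed=2005):
    """Pick k articles uniformly at random from an iterable.

    Hypothetical helper: uses reservoir sampling (Algorithm R), so the
    corpus is read once and never has to fit in memory.  A fixed seed
    makes the chosen subset reproducible for every contestant.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, article in enumerate(articles):
        if i < k:
            # Fill the reservoir with the first k articles.
            reservoir.append(article)
        else:
            # Keep the new article with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = article
    return reservoir


if __name__ == "__main__":
    # Stand-in corpus: article titles only, purely for illustration.
    corpus = ("article-%d" % n for n in range(100000))
    for title in fair_sample(corpus, 5):
        print(title)
```

Sampling whole articles rather than raw byte ranges keeps each document
intact, which seems to matter if the compressor is supposed to model
the text rather than just the byte stream.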

PS:  Another reason I'd like to see huge corpora ultimately is that I'd
really like to see this system beat humans by a substantial amount:
We're so stupid we desperately need all the help we can get.


 