Newsgroups: comp.compression
From: "Matt Mahoney" <matmaho...@yahoo.com>
Date: 17 Jun 2005 12:13:08 -0700
Local: Fri, Jun 17 2005 3:13 pm
Subject: Re: The C-Prize
jim_bow...@hotmail.com wrote:
> As for the text corpus I would choose:
> That was one of the main questions I wanted to discuss because it is so [...]
> In this regard, you may be interested in my recently published article:
> "AI Breakthrough or the Mismeasure of Machine?"
> http://www.kuro5hin.org/story/2005/5/26/192639/466
> I'm thinking of writing another article on the C-Prize.

It would be hard to conduct a test with a 1 TB corpus. Few people have enough computing power to do this, and I am sure you don't want to host the corpus on your website. I would suggest something on the order of 1 GB. Turing estimated in [...]

Even this may be too high. The average human vocabulary is 30K common [...]

Semantic models using LSA have used about 250 MB, like the WSJ corpus.

A ~ A (reflexive, used by most data compressors)

LSA uses the transitive property to predict that C follows A even if [...]

You might be able to construct a high quality (diverse) corpus by [...]

-- Matt Mahoney