Newsgroups: comp.compression
From: "Matt Mahoney" <matmaho...@yahoo.com>
Date: 19 Jun 2005 11:38:04 -0700
Local: Sun, Jun 19 2005 2:38 pm
Subject: Re: The C-Prize
jim_bow...@hotmail.com wrote:
> Matt Mahoney writes:
> > Nice choice. Lots of contributors, lots of topics, high quality.
>
> Here are the drawbacks I see to using the "cur" download of Wikipedia:
> 1) The purported "neutral point of view" is subject to systemic bias.
> 2) If a snapshot of the "cur" downloads is delayed, it will be subject
> 3) The entire history of edits is about a factor of 10 larger than the
>
> For a discussion of the downloads see:
> http://en.wikipedia.org/wiki/Wikipedia:Database_download#Weekly_datab...

I think as long as there is enough text in the data that a compressor has to be able to learn semantics, syntax, coherence, etc. to compress it effectively, then it should be a good test of AI. As long as everyone works with the same data, it should be fair. Whatever version is used will become outdated, but this should not matter to a compressor that starts with no knowledge.

Wikipedia is not completely representative of all possible human communication, but I think it is close enough.

I don't think the edits are that useful because you end up with lots of nearly identical texts that are easy to compress.

-- Matt Mahoney
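The point about edit histories (many nearly identical texts are cheap to compress) can be illustrated with a small experiment. This is a minimal sketch, assuming Python's standard `lzma` module and a synthetic "article" built from a made-up vocabulary as a stand-in for a real Wikipedia page; the helpers `make_base_text`, `make_revisions`, and `compressed_size` are hypothetical names for illustration only. Each simulated revision changes a single word of the previous one, roughly like a small wiki edit.

```python
import lzma
import random


def make_base_text(n_words: int, seed: int = 0) -> list[str]:
    """Hypothetical stand-in for an article: random words from a 500-word vocabulary."""
    rng = random.Random(seed)
    vocab = [f"word{i}" for i in range(500)]
    return [rng.choice(vocab) for _ in range(n_words)]


def make_revisions(base: list[str], n_revs: int, seed: int = 1) -> list[str]:
    """Simulated edit history: each revision replaces one word of the previous revision."""
    rng = random.Random(seed)
    current = list(base)
    revisions = []
    for _ in range(n_revs):
        i = rng.randrange(len(current))
        current[i] = f"edit{rng.randrange(1000)}"
        revisions.append(" ".join(current))
    return revisions


def compressed_size(text: str) -> int:
    """Size in bytes of the text after LZMA compression (default preset)."""
    return len(lzma.compress(text.encode("utf-8")))


if __name__ == "__main__":
    base = make_base_text(2000)
    revisions = make_revisions(base, 10)
    latest = compressed_size(revisions[-1])
    history = compressed_size("\n".join(revisions))
    print(f"latest revision only: {latest} bytes")
    print(f"all 10 revisions:     {history} bytes")
    print(f"history / latest:     {history / latest:.2f}")
```

The full history should compress to only slightly more than the latest revision alone: after the first revision, each later one is almost entirely a long match against text the compressor has already seen, so it costs only a few bytes. LZMA is chosen here because its large dictionary window can reach back across whole revisions; a compressor with a small window (such as zlib's 32 KB) would see less benefit on longer articles.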