Law Firms As Potential Underwriters

James A. Bowery

Mar 14, 2011, 1:38:57 PM

to Hutter Prize

The recent New York Times story "Armies of Expensive Lawyers, Replaced
by Cheaper Software"
(http://www.nytimes.com/2011/03/05/science/05legal.html) indicates that
there may be visible profit to the owners of law firms in automating
verbal intelligence. I'm not well connected with highly placed law
firms, but it may be worthwhile finding out how close their owners are
to seeing the value, to their bottom lines, of competitions like the
Hutter Prize. That value has got to be more apparent to them than the
value of the Loebner Prize, for example, even though the Loebner Prize
is far better known to the public.

Matt Mahoney

Mar 14, 2011, 2:27:56 PM

to hutter...@googlegroups.com

Isn't this assuming that lawyers understand the relationship between data
compression and AI?
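
The relationship in question is that a model which predicts text well also compresses it well: an ideal coder spends about -log2(p) bits on a symbol the model assigned probability p. A minimal sketch of that accounting follows, assuming a deliberately crude adaptive order-0 character model with add-one smoothing. The function name and the alphabet size of 256 are illustrative assumptions, not part of any Hutter Prize entry.

```python
import math
from collections import Counter


def ideal_code_length_bits(text: str, alphabet_size: int = 256) -> float:
    """Bits an ideal arithmetic coder would need for `text` under an
    adaptive order-0 model (hypothetical toy model, add-one smoothing)."""
    counts = Counter()  # how often each character has been seen so far
    seen = 0            # total characters seen so far
    bits = 0.0
    for ch in text:
        # Probability the model assigns to the next character before seeing it.
        p = (counts[ch] + 1) / (seen + alphabet_size)
        bits -= math.log2(p)  # better prediction -> fewer bits
        counts[ch] += 1
        seen += 1
    return bits


if __name__ == "__main__":
    for sample in ("aaaaaaaaaaaaaaaaaaaa", "the quick brown fox "):
        print(repr(sample), round(ideal_code_length_bits(sample), 1), "bits")
```

Any model that assigns higher probabilities to what actually comes next, whether by counting characters or by understanding the text, lowers the total.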

Maybe the Enron corpus (400 MB) would be a more appropriate data set?

But it seems to me that the real problem is that AI requires a lot more
computation than the Hutter Prize allows. If you look at my large text
benchmark, the top programs are those that use the most time and memory, and
the algorithm doesn't matter much.

1 GB is about the amount of language processed by an adult since birth. 100 MB
is the amount processed by a 3-year-old child. I wonder what kind of language model this
provides an incentive for. In any case, the top performers on both benchmarks
are still low level models. I think that the Hutter prize is near the limit of
what software alone can achieve without more computing power.

-- Matt Mahoney, matma...@yahoo.com

James Bowery

Mar 14, 2011, 2:57:10 PM

to hutter...@googlegroups.com

When there is this much potential profit on the line there is certainly the incentive to reach greater understanding.

It seems reasonable to have a prize including not only the Enron corpus, but the body of law then-applicable plus the entirety of Wikipedia to try to maintain consilience. Computational resources required for decompression would have to be paid for by the entrants under some reasonable cloud configuration.

Matt Mahoney

Mar 14, 2011, 4:53:57 PM

to hutter...@googlegroups.com

How could we judge a contest where the entrants supply their own supercomputers?

I know there are other programming contests that accomplish this. You have to supply a result that is hard to compute but easy to check. How could this be done with text prediction or equivalent AI?

-- Matt Mahoney, matma...@yahoo.com

James Bowery

Mar 14, 2011, 5:05:25 PM

to hutter...@googlegroups.com, Matt Mahoney

What I meant by "under some reasonable cloud configuration" is not that the entrants would own the cloud upon which the decompression runs, but that their algorithm would run within a cloud infrastructure they pay for but do not control.

It is getting to be pretty standard practice nowadays to set up programs to run in clouds, within which they initialize new systems for additional resources as needed. The executable archives would just have to be written to that API specification. This would add an insignificant amount to the binary size given an appropriate API standard.

Matt Mahoney

Mar 14, 2011, 8:59:47 PM

to hutter...@googlegroups.com

What about just relaxing the time and memory requirements? In a sense, the contest already has the properties of a contest where the problem is hard to solve but easy to verify. It is hard to find a short program that outputs a string, but easy to verify that it does.
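
The "easy to verify" half can be sketched in a few lines: run the submitted program, hash what it prints, and compare against the known hash of the target string. This is a minimal sketch, assuming the entry is a command that writes the decompressed data to standard output. The names `verify_entry` and `timeout_s` are hypothetical and this is not the contest's actual checking procedure.

```python
import hashlib
import subprocess
import sys
from typing import Optional, Sequence


def verify_entry(command: Sequence[str], expected_sha256: str,
                 timeout_s: Optional[float] = None) -> bool:
    """Run a submitted decompressor and check that its output matches the target.

    `command` is assumed to write the full decompressed data to stdout.
    `timeout_s` is where a time limit would apply; None relaxes it entirely.
    """
    result = subprocess.run(list(command), capture_output=True,
                            timeout=timeout_s, check=True)
    return hashlib.sha256(result.stdout).hexdigest() == expected_sha256


if __name__ == "__main__":
    # Stand-in "entry": a tiny program whose output is the target string.
    target = b"hello world"
    entry = [sys.executable, "-c",
             "import sys; sys.stdout.buffer.write(b'hello world')"]
    print(verify_entry(entry, hashlib.sha256(target).hexdigest()))
```

Verification still costs one full run of the entry, so relaxing the time and memory limits raises the cost of checking along with the cost of competing.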

I'd like to see something along the lines of http://www.mailcom.com/challenge/ where source code is allowed and all files must be packed into one of several common archive formats.

Or perhaps there is a fast way to prove that a decompressor is correct without requiring full decompression. I can't think of such a test, however. I did try some experiments where it is possible to test a compressor using a public data set without including the decompression program. http://mattmahoney.net/dc/uiq/
