Jan 15, 2007, 5:15:21 AM

to Hutter Prize

Hi Ian,

This is cross-posted from Creat...@googlegroups.com to comp.compression and Hutter Pr...@googlegroups.com. It seems relevant.

Ian Parker wrote:

>

> Either natural language is compressible or it is not. You now seem to

> be saying it is compressible but not by very much - am I right?

You can think of it as an attempt to explain why text can't be compressed as much as we expect, if you like. In particular, why we can't compress it by as much as human estimates of the entropy of text suggest we should (e.g. Matt Mahoney on Shannon).

Something like... that the sum of human knowledge needed to compress a given text that much will always be bigger than the text itself.

We can understand this if we accept that knowledge is coded in the

different ways of ordering text (or the different ways of ordering any

sufficiently complex/random set.)

It will always take more information to code all the different ways of

ordering a random set than it takes to code the set itself. (It is only

a corollary that you can't use information about the different possible

orderings of a text to compress text, because your knowledge of all the

possible orders of a text will always be bigger than the text itself.)

It is important to note that while language is statistically random (permitting the same tokens to specify more orders), language is not indeterminate (cf. Adrian Barnett's rather amusing "bitmap" Theory Of Everything, http://www.abarnett.demon.co.uk/theories.html#BITMAP). With language the different orders are precisely determined by the set itself. It is just that many are equally likely.

Which IMHO is the insight we have been looking for to model AI (NLP

etc.)

-Rob

Jan 15, 2007, 1:54:29 PM

to Hutter Prize

Rob Freeman wrote:

> Hi Ian,

>

> This cross-posted from Creat...@googlegroups.com to comp.compression,

> and Hutter Pr...@googlegroups.com. It seems relevant.

>

> Ian Parker wrote:

> >

> > Either natural language is compressible or it is not. You now seem to

> > be saying it is compressible but not by very much - am I right?

>

> You can think of it as an attempt to explain why text can't be

> compressed as much as we expect if you like. In particular why we can't

> compress it by as much as human estimates for the entropy of text

> suggest we should (e.g. Matt Mahoney on Shannon.)

>

> Something like... that the sum of human knowledge needed to compress a

> given text that much, will always be bigger than the text itself.

No, the Kolmogorov complexity of a string x cannot be significantly

more than the size of x. You can always write a program to store and

print x verbatim.
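Matt's upper bound is easy to see in code: a compressor can always fall back to storing the input verbatim behind a small header (one flag byte in this hypothetical sketch), so the compressed size never exceeds the input by more than a constant:

```python
def compress(x: bytes) -> bytes:
    # A real compressor would look for structure in x; if it finds
    # none, it can always fall back to storing x verbatim behind a
    # one-byte flag, so output is never more than len(x) + 1 bytes.
    return b"\x00" + x  # flag 0x00 means "stored, not compressed"

def decompress(y: bytes) -> bytes:
    assert y[0] == 0  # only the "stored" mode exists in this sketch
    return y[1:]

data = b"any string at all, however random"
packed = compress(data)
assert decompress(packed) == data
assert len(packed) == len(data) + 1
```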

> We can understand this if we accept that knowledge is coded in the

> different ways of ordering text (or the different ways of ordering any

> sufficiently complex/random set.)

>

> It will always take more information to code all the different ways of

> ordering a random set than it takes to code the set itself. (It is only

> a corollary that you can't use information about the different possible

> orderings of a text to compress text, because your knowledge of all the

> possible orders of a text will always be bigger than the text itself.)

No, if you have a set of k items out of a possible n, then there are (n

choose k) = n!/(k!(n-k)!) possible sets, but there are k! ways to order

the elements. It is not true that k! > (n choose k) when n is much

larger than k.
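The combinatorics here can be checked directly with Python's math module:

```python
from math import comb, factorial

n, k = 1000, 5
subsets = comb(n, k)        # n!/(k!(n-k)!) distinct k-element subsets
orderings = factorial(k)    # k! orderings of the k chosen elements

# When n is much larger than k, choosing the set dominates ordering it,
# so k! is NOT bigger than (n choose k):
assert orderings < subsets
# Note also that (n choose k) dwarfs n itself:
assert subsets > n
```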

>

> It is important to note that while language is statistically random

> (permitting the same tokens to specify more orders), language is not

> indeterminate (c.f. Adrian Barnett's rather amusing "bitmap" Theory Of

> Everything (http://www.abarnett.demon.co.uk/theories.html#BITMAP.) With

> language the different orders are precisely determined by the set

> itself. It is just many are equally likely.

I like the idea of launching satellites by building a giant ramp into

space, driving them up by truck and pushing them over the edge. :-)

> Which IMHO is the insight we have been looking for to model AI (NLP

> etc.)

>

> -Rob

I didn't see any great insights there. Ian's compression algorithm at

http://ipcomp.blogspot.com/ looks like an attempt at random data

compression.

-- Matt Mahoney

Jan 15, 2007, 11:12:14 PM

to Hutter Prize

Matt Mahoney wrote:

> Rob Freeman wrote:

> >

> > ...the sum of human knowledge needed to compress a

> > given text that much, will always be bigger than the text itself.

>

> No, the Kolmogorov complexity of a string x cannot be significantly

> more than the size of x. You can always write a program to store and

> print x verbatim.

You misunderstand me, Matt. I am talking about attempts to compress text using "human knowledge".

I define "human knowledge" to be different ways of ordering a system (i.e. distributions of relations). The information needed to specify different ways of ordering a (random) system will always be bigger than the information needed to specify the system itself.

If you choose to define "human knowledge" another way you will get

another result. But on the basis of past discussions I think you agree

that distributions of relations are the best model.

> > We can understand this if we accept that knowledge is coded in the

> > different ways of ordering text (or the different ways of ordering any

> > sufficiently complex/random set.)

> >

> > It will always take more information to code all the different ways of

> > ordering a random set than it takes to code the set itself. (It is only

> > a corollary that you can't use information about the different possible

> > orderings of a text to compress text, because your knowledge of all the

> > possible orders of a text will always be bigger than the text itself.)

>

> No, if you have a set of k items out of a possible n, then there are (n

> choose k) = n!/(k!(n-k)!) possible sets, but there are k! ways to order

> the elements. It is not true that k! > (n choose k) when n is much

> larger than k.

You misunderstand me again. My point is that "n choose k" is always much larger than n (or k! is always >> k).

> > It is important to note that while language is statistically random

> > (permitting the same tokens to specify more orders), language is not

> > indeterminate (c.f. Adrian Barnett's rather amusing "bitmap" Theory Of

> > Everything (http://www.abarnett.demon.co.uk/theories.html#BITMAP.) With

> > language the different orders are precisely determined by the set

> > itself. It is just many are equally likely.

>

> I like the idea of launching satellites by building a giant ramp into

> space, driving them up by truck and pushing them over the edge. :-)

The guy is inspired :-)

> > Which IMHO is the insight we have been looking for to model AI (NLP

> > etc.)

>

> I didn't see any great insights there. Ian's compression algorithm at

> http://ipcomp.blogspot.com/ looks like an attempt at random data

> compression.

I thought you analyzed Ian's algorithm well. It seems to come down to a re-invention of compression based on first-order probability distributions. All credit to Ian for rediscovering that from first principles.

But Ian's algorithm was not the insight I was talking about.

I think the insight we have been looking for to model AI/NLP is that the information needed to code different ways of ordering a system (knowledge) is always greater than the information needed to code the system itself (for a random system).

In the context of AI/NLP it is important to note that random need not mean indeterminate. I hope I demonstrated this in our earlier thread on comp.compression. In the case of language, relational distributions can precisely determine a syntax which nevertheless has random statistics.

To be clear, I am talking once again about our toy example: "AX...DX...DB...AZ...YZ...YC...A_". Here different relational distributions (A,D), (A,Y) (corresponding to "knowledge": in some ways "A is close to D but not Y", but in other ways "A is close to Y but not D"?) deterministically specify syntax, but still produce a random language model A_ -> AB/AC (probably because the distributions are contradictory).

These distributions specify the system, but it is more expensive to

enumerate all the different possible relational distributions than it

is to enumerate the system itself. (So they can't be used to compress

it.)

The way to model NLP/AI is to specify the underlying random (but

determinate) system (e.g. text), and generate different possible

distributions over that system to model knowledge, or limit syntax (the

same thing), as required.
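Rob's toy language can be worked through mechanically. In this sketch (variable names are mine, not Rob's), each token's right-hand contexts are collected and the paradigmatic classes read off from shared contexts:

```python
from collections import defaultdict

# Rob's toy corpus as two-token sequences: AX, DX, DB, AZ, YZ, YC
pairs = ["AX", "DX", "DB", "AZ", "YZ", "YC"]

contexts = defaultdict(set)
for first, second in pairs:
    contexts[first].add(second)

# Tokens that share a context form a paradigmatic class:
class_via_X = {t for t, ctx in contexts.items() if "X" in ctx}
class_via_Z = {t for t, ctx in contexts.items() if "Z" in ctx}
assert class_via_X == {"A", "D"}   # "A is close to D" (shared X)
assert class_via_Z == {"A", "Y"}   # "A is close to Y" (shared Z)

# For the gap "A_", class (A,D) predicts AB (since DB occurs) while
# class (A,Y) predicts AC (since YC occurs): a determinate system
# whose two predictions are nevertheless equally likely.
predictions = (contexts["D"] | contexts["Y"]) - contexts["A"]
assert predictions == {"B", "C"}
```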

-Rob

Jan 16, 2007, 12:01:21 AM

to Hutter...@googlegroups.com

On 1/15/07, Rob Freeman <gro...@chaoticlanguage.com> wrote:

I define "human knowledge" to be different ways of ordering a system

(i.e. distributions of relations.) The information needed to specify

different ways of ordering a (random) system will always be bigger than

the information needed to specify the system itself.

When you talk about "different ways of ordering" you are really talking about the equivalence class of strings under a grammar.

Take a very simple grammar where two strings are equivalent:

X == "not not "X

We can see from this that you can index this infinite equivalence class with an integer i that places 2i "not " strings in front of X. But not all members of this equivalence class are equally likely. Therefore arithmetic coding can be used for the index.
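James's class can be indexed mechanically: every member is the core string prefixed by i copies of "not not " (i.e. 2i "not " tokens). A minimal sketch:

```python
def to_index(s: str):
    """Factor s as ("not not " * i) + core; return (core, i)."""
    i = 0
    while s.startswith("not not "):
        s = s[len("not not "):]
        i += 1
    return s, i

def from_index(core: str, i: int) -> str:
    """Reconstruct the i-th member of core's equivalence class."""
    return "not not " * i + core

core, i = to_index("not not not not it rains")
assert (core, i) == ("it rains", 2)
assert from_index(core, i) == "not not not not it rains"
```

Since small i is far more likely than large i, the index itself is a good target for arithmetic coding, as the post says.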

What you're claiming is that the expression of knowledge being compressed has an index pretty close to 0 -- in other words, there isn't much you can do to simplify the expression.

I think most of us who believe the Hutter Prize will result in breakthroughs (AI, epistemological, scientific, philosophical, social, political, economic, etc.) believe that there is a _lot_ more to be done to the knowledge expressed in Wikipedia for it to have an index anywhere near 0.

Jan 16, 2007, 7:17:50 AM

to Hutter Prize

I have no objections to cross posting under the new moderation scheme.

Hutter is in fact completely unmoderated.

It may well be that natural language is less compressible than Matt and Co. imagine. However, it certainly is compressible. Obviously, without a definite benchmark you cannot tell how compressible.

I would like to make a number of remarks.

1) Primavera, Resorte, Manantial. We know that English can be translated into Spanish using a human translator. It would be hard for me to concede, therefore, that techniques do not exist to eliminate at least 2/3 of words using context. Primavera etc. implies log2(3) bits per word (at least).

2) This is a much more mathematical point. We know that the present prize-winner has compressed to 17MB. If we allocate a number to each word there must be a technique (we will ignore Kolmogorov for the moment) which will compress our sequence to at most 17MB. We can at this stage view punctuation marks as words. Algorithms like 7-Zip take raw text and compress using fragments, as does the current prize-winner. Now if a sequence is random to the extent that the probability of a word occurring is simply the mean frequency of that word, there cannot be any better method than factorials, the Gosper approximation giving the minimum compressed size. Now if V is our vector of words, VP is going to represent a fragment representation. We can get to fragments by a simple matrix multiplication. VP = 17MB compressed. A matrix X can be (and has been) found which will give us VPX. The probability decomposition of VPX MUST be 17MB. Of course multiplying one matrix by another simply produces another matrix. Why not take the bull by the horns, apply LSA and produce the 17MB-minus matrix? The answer, of course, is Kolmogorov. If I were not concerned with Hutter, but were simply concerned with GALE and translating languages, I would not care a damn about Kolmogorov but would simply demand a larger training set.
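Ian's "factorials give the minimum compressed size" remark can be made concrete: if a sequence's only structure is its word frequencies, no code can beat log2(N! / prod n_i!) bits, which the factorial approximations (Stirling, and presumably Gosper's refinement) estimate, and which is close to the Shannon bound N*H. A stdlib sketch with invented counts:

```python
from math import lgamma, log, log2

def log2_factorial(n: int) -> float:
    # lgamma(n+1) = ln(n!), converted to base 2
    return lgamma(n + 1) / log(2)

# Hypothetical word counts for a tiny corpus:
counts = {"the": 50, "of": 30, "spring": 20}
N = sum(counts.values())

# Minimum bits to code any ordering of these words, given the counts:
min_bits = log2_factorial(N) - sum(log2_factorial(c) for c in counts.values())

# Compare with the Shannon bound N * H over the same frequencies;
# the multinomial bound is always at most N * H:
H = -sum((c / N) * log2(c / N) for c in counts.values())
assert 0 < min_bits <= N * H
```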

This raises yet another point. We can get an initial semantic assessment by simply associating fragments to words. This tends to be what is done in assessing relationships between different languages: cerradura does indeed have a very similar fragment structure to serrure (French), as do araña and araignée (papers, please!). If we were to take the prize-winning program and construct a matrix P linking it to a word list, this could be a springboard to further compression.

- Ian Parker

Jan 16, 2007, 5:42:03 AM

to Hutter Prize

James Bowery wrote:

> On 1/15/07, Rob Freeman <gro...@chaoticlanguage.com> wrote:

> >

> > I define "human knowledge" to be different ways of ordering a system

> > (i.e. distributions of relations.) The information needed to specify

> > different ways of ordering a (random) system will always be bigger than

> > the information needed to specify the system itself.

>

> When you talk about "different ways of ordering" you are really talking

> about the equivalence class of strings under a grammar.

>

> X == "not not "X

If I understand you correctly, James, I do not mean "the equivalence class of strings under a grammar" when I say "different ways of ordering." I don't think the assumption of a grammar is justified at all.

Why do you think the assumption of a grammar is justified?

I'm considering equivalence classes yes, but not classes under a

grammar. You could say I let the classes govern the grammar. In some

cases the two come to the same thing, but they are not the same.

I don't think there is any reason to assume that natural language

limits itself to paradigmatic relationships of the kind X == "not not

"X. I would go as far as to say it is demonstrably not the case.

A case in point. My example. For a language

"AX...DX...DB...AZ...YZ...YC...A_" you have two paradigmatic classes

(A,D) and (A,Y). You cannot reduce those to a grammar of the kind X ==

"not not "X. Is there any mapping (A,D) == "not not "(A,Y)?

For any real language there are many paradigmatic classes of this sort.

Just look at how many you can get.

Calculate them out for yourself. Expand my example. If A_1, A_2,... A_n

are the contexts of A in some text, and X_1, X_2,...X_n are contexts of

other tokens, then the number of ways A can have common contexts with

other tokens in the text, and thus uniquely specify some new

paradigmatic class, are just Matt's "(n choose k) = n!/(k!(n-k)!)

possible sets", where k is the number of common contexts between A and

some other token.

The syntagmatic distribution of sequences AX_? specified by these

classes in the text can be random, because many different paradigmatic

distributions (A_i,...A_i+k) can be equally likely (and must be,

because many of the "n choose k" possible classes will overlap, and

thus form complementary distributions with each other??) But the

relationship between any given syntax and its corresponding

paradigmatic distribution is not random. And the different paradigmatic

distributions (knowledge?) governing that random syntax are not random

either, just very numerous. Much more numerous than the sequence of 2n

or so tokens needed to specify them.
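Rob's counting claim in the preceding paragraphs can be sanity-checked numerically: the number of possible shared-context classes, "n choose k", explodes relative to the roughly 2n tokens that specify the text. A quick sketch with hypothetical sizes:

```python
from math import comb

n = 40   # contexts of A in some text (hypothetical size)
k = 10   # contexts shared between A and some other token

possible_classes = comb(n, k)   # distinct shared-context sets
tokens_in_text = 2 * n          # roughly the sequence that specifies them

# Enumerating the possible classes costs far more than enumerating
# the text itself (comb(40, 10) is about 8.5e8):
assert possible_classes > tokens_in_text ** 2
```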

I don't know if any of that makes sense. Perhaps I have misunderstood

your objection.

-Rob

Jan 17, 2007, 9:03:45 AM

to Hutter Prize

Rob Freeman wrote:

> If I understand you correctly James, I do not mean "the equivalence

> class of strings under a grammar" when I say "different ways of

> ordering." I don't think the assumption of a grammar is justified at

> all.

>

> Why do you think the assumption of a grammar is justified?

Because you can think of a compressed file as a grammar specification which has a single well-formed string: the original file.

This is most obvious in the case of simple hierarchical grammars like those generated by Nevill-Manning's Sequitur compressor.
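Sequitur proper builds its grammar incrementally under digram-uniqueness and rule-utility constraints; a much cruder offline sketch of the same idea (Re-Pair style: keep replacing the most frequent repeated digram with a fresh rule) still shows how a grammar whose only well-formed string is the original file acts as its compressed form:

```python
from collections import Counter

def build_grammar(s):
    """Crude Re-Pair-style sketch: repeatedly replace the most
    frequent repeated digram with a fresh nonterminal, until no
    digram repeats. Returns (start sequence, rules)."""
    seq, rules, fresh = list(s), {}, 0
    while True:
        digrams = Counter(zip(seq, seq[1:]))
        if not digrams:
            return seq, rules
        (a, b), freq = digrams.most_common(1)[0]
        if freq < 2:
            return seq, rules
        nt = f"<{fresh}>"
        fresh += 1
        rules[nt] = (a, b)
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == (a, b):
                out.append(nt)   # replace the digram with the rule
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out

def expand(sym, rules):
    """The grammar generates exactly one well-formed string."""
    if sym in rules:
        a, b = rules[sym]
        return expand(a, rules) + expand(b, rules)
    return sym

seq, rules = build_grammar("abcabcabcabc")
assert "".join(expand(x, rules) for x in seq) == "abcabcabcabc"
```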

Jan 17, 2007, 10:43:01 AM

to Hutter Prize

jabo...@gmail.com wrote:

> Rob Freeman wrote:

> > If I understand you correctly James, I do not mean "the equivalence

> > class of strings under a grammar" when I say "different ways of

> > ordering." I don't think the assumption of a grammar is justified at

> > all.

> >

> > Why do you think the assumption of a grammar is justified?

>

> Because you can think of a compressed file as a grammar specification

> which has a single well formed string

So you feel justified in assuming grammar to make compression possible,

on the grounds that if compression is possible you will have a grammar?

Surely your argument is circular.

As a case in point, what are the equivalence classes which compress my

example string: "AX...DX...DB...AZ...YZ...YC"?

In particular how do you capture the information that "in some ways A

is close to D but not Y, but in other ways A is close to Y but not D"?

-R

Jan 17, 2007, 11:38:32 AM

to Hutter Prize

Rob Freeman wrote:

> So you feel justified in assuming grammar to make compression possible,

> on the grounds that if compression is possible you will have a grammar?

>

> Surely your argument is circular.

No, I was merely using grammar as an example of a knowledge representation language. Then you said it isn't powerful enough to compress natural language, and I said that there is an existence proof that it is: Sequitur, with its hierarchical grammar output.

> As a case in point, what are the equivalence classes which compress my

> example string: "AX...DX...DB...AZ...YZ...YC"?

>

> In particular how do you capture the information that "in some ways A

> is close to D but not Y, but in other ways A is close to Y but not D"?

Sorry, but you're not making sense to me. Find some better way of defining "information" (i.e. algorithmic information theory), and we can continue this discussion here. Otherwise, keep it in comp.compression.
