Join the interesting discussion there concerning the generalization of relational similarity to subsume attribute similarity and its impact on artificial intelligence.
He's challenged the validity of using compression as an AI benchmark.
> Dr. Turney's work is in extracting verbal analogies from text in order to achieve AI.
Not exactly. I believe that analogical reasoning is a central part of human intelligence and I believe that research on algorithms that can handle verbal analogies will contribute to the field of AI. But I do not believe that merely extracting verbal analogies from text will achieve AI.
> He's challenged the validity of using compression as an AI benchmark.
Not exactly. I was trying to explain why compression as an AI benchmark does not appeal to me personally. I was expressing a personal opinion, not issuing a challenge.
He has a new blog entry providing details of the data structures used in finding relational similarity as well as thoughts on unifying those with the data structures used in finding attribute similarity:
... LSA and LRA have much in common and it is natural to wonder whether they can be unified. I've been trying to think of an elegant scheme that combines them, allowing us to calculate both attributional and relational similarity in the same framework. Patterns can subsume words, chunks, and pairs, so one possibility is a pattern-pattern matrix, but there are so many possible patterns that this matrix could easily exceed the capability of today's computers, unless the patterns are constrained in some way. I haven't yet figured out a clean way to constrain the patterns.
More exotic ideas for unification involve some kind of multi-resolution data structure, with chunks arranged in layers, corresponding to their sizes. A related idea comes from Gentner's paper, Why We're So Smart. Gentner points out that many words that at first seem to refer to objects, on closer examination actually refer to relations. For example, the word weapon seems to refer to an object, but whether an object is a weapon depends on the intention of an agent towards the object. A stone can be simply a stone, or it can be a weapon if an agent intends to use it as such. A gun can be a weapon, or it may be sport equipment in the hands of a sport shooter. By mapping relations between pairs of words (e.g., instrument and aggressor) to single words (e.g., weapon), it may be possible to unify attributional similarity (similarity between single words) and relational similarity (similarity between word pairs).
jabow...@gmail.com wrote:
> He has a new blog entry providing details of the data structures used in finding relational similarity as well as thoughts on unifying those with the data structures used in finding attribute similarity:
My understanding of LRA from the verbal analogies test was in terms of LSA. Thus, to solve "carpenter is to mason as wood is to ___?" is to compute (wood + mason - carpenter) on the vector representations of these words in a word-document matrix (or an approximated matrix computed by SVD or a neural network as described by Gorrell).
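Here is a sketch in Python of the vector-arithmetic reading of the analogy test. It is a minimal toy built on several assumptions: a made-up four-document corpus, raw word-document counts with no SVD or neural-network smoothing, and cosine similarity to pick the answer. The helper names (`word_doc_vectors`, `cosine`, `solve_analogy`) are hypothetical. It illustrates the LSA-style interpretation described above, not Turney's LRA implementation.

```python
import math
from collections import defaultdict

# Made-up toy corpus: each string is one "document".
DOCS = [
    "carpenter wood saw",
    "mason stone chisel",
    "carpenter mason build",
    "wood stone material",
]

def word_doc_vectors(docs):
    """Word-document count matrix as {word: [count in doc 0, doc 1, ...]}."""
    vecs = defaultdict(lambda: [0] * len(docs))
    for j, doc in enumerate(docs):
        for w in doc.split():
            vecs[w][j] += 1
    return dict(vecs)

def cosine(u, v):
    """Cosine similarity of two dense count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def solve_analogy(vecs, a, b, c):
    """'a is to b as c is to ?': return the word whose vector is closest to c + b - a.

    No SVD step here; the raw counts stand in for a reduced matrix.
    """
    target = [x + y - z for x, y, z in zip(vecs[c], vecs[b], vecs[a])]
    candidates = [w for w in vecs if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vecs[w], target))

if __name__ == "__main__":
    print(solve_analogy(word_doc_vectors(DOCS), "carpenter", "mason", "wood"))
```

In this toy matrix, wood + mason - carpenter equals the count vector of "stone" exactly, so `solve_analogy(..., "carpenter", "mason", "wood")` returns "stone". With real text, you would apply the SVD step before taking cosines.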
We can replace documents with immediate context, and count how often one word appears before or after another. A pair-pattern matrix to relate terms like "carpenter-wood" with "builds with" seems to me to be the product of bigram matrices, one that relates "carpenter" with "builds with" and another that relates "builds with" with "wood".
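The product of bigram tables can be sketched in Python as follows. This toy rests on several assumptions: whitespace tokenization, patterns fixed at exactly two words, and a made-up three-sentence corpus. The names `bigram_tables` and `pair_pattern` are hypothetical, not part of LRA or any library. The score for pair (a, b) and pattern p is the count of "a p" times the count of "p b". Summing those scores over p would give the ordinary matrix product of the two tables.

```python
from collections import Counter, defaultdict

# Made-up toy corpus; whitespace tokenization.
CORPUS = [
    "the carpenter builds with wood",
    "the mason builds with stone",
    "the carpenter works with wood",
]

def bigram_tables(sentences):
    """Count word -> two-word pattern and pattern -> word adjacencies.

    left[p][a]  = times word a is immediately followed by pattern p
    right[p][b] = times pattern p is immediately followed by word b
    """
    left = defaultdict(Counter)
    right = defaultdict(Counter)
    for s in sentences:
        tok = s.split()
        for i in range(len(tok) - 2):
            left[tok[i + 1] + " " + tok[i + 2]][tok[i]] += 1
            right[tok[i] + " " + tok[i + 1]][tok[i + 2]] += 1
    return left, right

def pair_pattern(left, right):
    """Pair-pattern scores M[(a, b)][p] = left[p][a] * right[p][b]."""
    m = defaultdict(dict)
    for p, lefts in left.items():
        for a, ca in lefts.items():
            for b, cb in right.get(p, {}).items():
                m[(a, b)][p] = ca * cb
    return m

if __name__ == "__main__":
    left, right = bigram_tables(CORPUS)
    m = pair_pattern(left, right)
    print(m[("carpenter", "wood")])
```

The two tables are counted independently, so ("carpenter", "stone") also gets a nonzero "builds with" score even though no sentence says so. The product is an independence approximation to counting the full "a p b" sequence directly.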
Matt Mahoney wrote:
> jabow...@gmail.com wrote:
> > He has a new blog entry providing details of the data structures used in finding relational similarity as well as thoughts on unifying those with the data structures used in finding attribute similarity:
> My understanding of LRA from the verbal analogies test was in terms of LSA. Thus, to solve "carpenter is to mason as wood is to ___?" is to compute (wood + mason - carpenter) on the vector representations of these words in a word-document matrix (or an approximated matrix computed by SVD or a neural network as described by Gorrell).
> We can replace documents with immediate context, and count how often one word appears before or after another. A pair-pattern matrix to relate terms like "carpenter-wood" with "builds with" seems to me to be the product of bigram matrices, one that relates "carpenter" with "builds with" and another that relates "builds with" with "wood".
I think you are exactly right about this, Matt.
You could find the "pair-pattern matrix" for "carpenter-wood" by filtering the prior contexts of "wood" (the word sequences just before it) with the post contexts of "carpenter" (the word sequences just after it).
That is how Peter should do it.
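The filtering idea can be sketched as a set intersection in Python. This is a minimal sketch under several assumptions: a context is any run of 1 to `max_len` words directly adjacent to the target word, membership is tracked with sets rather than counts, and the corpus is made up. `shared_contexts` and `max_len` are hypothetical names.

```python
# Made-up toy corpus; whitespace tokenization.
CORPUS = [
    "the carpenter builds with wood",
    "the mason builds with stone",
    "the carpenter works with wood",
]

def shared_contexts(sentences, x, y, max_len=3):
    """Word sequences (1..max_len words) seen right after x AND right before y."""
    post_x = set()   # post contexts of x
    prior_y = set()  # prior contexts of y
    for s in sentences:
        tok = s.split()
        for i, w in enumerate(tok):
            for k in range(1, max_len + 1):
                if w == x and i + 1 + k <= len(tok):
                    post_x.add(" ".join(tok[i + 1:i + 1 + k]))
                if w == y and i - k >= 0:
                    prior_y.add(" ".join(tok[i - k:i]))
    return post_x & prior_y

if __name__ == "__main__":
    print(sorted(shared_contexts(CORPUS, "carpenter", "wood")))
```

On this corpus, the filter for "carpenter" and "wood" keeps "builds with" and "works with". Keeping counts instead of sets would be needed to weight the surviving patterns.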
You could project distributions representing all kinds of attributes or relationships, all kinds of meaning in a word, from text in the same way.
Note that you can also turn these distributions around and use them to govern syntax. Given a representation of a relationship as a pair-pattern matrix, you have a very precise specification of the possible syntax appropriate to this meaning.
The question of relevance to this group is whether you can reduce the dimensions of these representations so we can store them all in less space than the texts they come from. Can you cluster the pair-pattern matrix for "carpenter-wood" with that of "build-mason" and all other similar pairs, to find a general relation representing "instrument" or "tool" in all its subtlety?
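One way to pose the clustering question in code is to treat each word pair's row of the pair-pattern matrix as a sparse vector and group pairs whose rows have high cosine similarity. The sketch below makes several assumptions: the counts are made-up toy values, the clustering is a single-pass "leader" method with an arbitrary threshold, and there is no dimension reduction (Turney's LRA applies SVD at this point). `ROWS`, `cosine`, and `cluster_pairs` are hypothetical names.

```python
import math

# Made-up toy rows of a pair-pattern matrix: {pair: {pattern: count}}.
ROWS = {
    ("carpenter", "wood"): {"builds with": 1, "works with": 1},
    ("mason", "stone"): {"builds with": 1, "works with": 1},
    ("cook", "knife"): {"cuts with": 1},
}

def cosine(u, v):
    """Cosine similarity of two sparse {pattern: count} rows."""
    dot = sum(c * v.get(p, 0) for p, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster_pairs(rows, threshold):
    """Single-pass leader clustering: join the first cluster whose seed row
    is at least `threshold` similar, otherwise start a new cluster."""
    clusters = []  # list of (seed_row, member_pairs)
    for pair, row in rows.items():
        for seed, members in clusters:
            if cosine(seed, row) >= threshold:
                members.append(pair)
                break
        else:
            clusters.append((row, [pair]))
    return [members for _, members in clusters]

if __name__ == "__main__":
    print(cluster_pairs(ROWS, 0.5))
```

This toy contains two cleanly separated relations, so it cannot show the overlapping, contradictory clusters that real text produces. It only shows what the question is asking.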
I think examples like my toy AX...DX...DB...AZ...YZ...YC illustrate that we can't, because there are so many possible clusters of contexts, and many of them contradict each other.
But I think this is a good thing. It means we can use the same words to project out many more subtle distinctions of meaning (different distributions).
That is why the meaning of language is so subtle and syntax so idiosyncratic.
The goal of compressing text using meaning turns out to be wrong, but the link between compression and meaning is not wrong. Meaning still corresponds to a distribution on the text (similar to a compression). It is just that this relationship turns out to expand the power of text, not compress it.