A relevant entry in Peter Turney's weblog is here:
http://apperceptual.wordpress.com/2007/01/05/hello-world/
Join the interesting discussion there concerning the generalization of
relational similarity to subsume attribute similarity and its impact on
artificial intelligence.
He's challenged the validity of using compression as an AI benchmark.
Not exactly. I believe that analogical reasoning is a central part of
human intelligence and I believe that research on algorithms that can
handle verbal analogies will contribute to the field of AI. But I do
not believe that merely extracting verbal analogies from text will
achieve AI.
> He's challenged the validity of using compression as an AI
> benchmark.
Not exactly. I was trying to explain why compression as an AI benchmark
does not appeal to me personally. I was expressing a personal opinion,
not issuing a challenge.
...
LSA and LRA have much in common and it is natural to wonder whether
they can be unified. I've been trying to think of an elegant scheme
that combines them, allowing us to calculate both attributional and
relational similarity in the same framework. Patterns can subsume
words, chunks, and pairs, so one possibility is a pattern-pattern
matrix, but there are so many possible patterns that this matrix could
easily exceed the capability of today's computers, unless the
patterns are constrained in some way. I haven't yet figured out a
clean way to constrain the patterns.
More exotic ideas for unification involve some kind of multi-resolution
data structure, with chunks arranged in layers, corresponding to their
sizes. A related idea comes from Gentner's paper, Why We're So
Smart. Gentner points out that many words that at first seem to refer
to objects, on closer examination actually refer to relations. For
example, the word weapon seems to refer to an object, but whether an
object is a weapon depends on the intention of an agent towards the
object. A stone can be simply a stone, or it can be a weapon if an
agent intends to use it as such. A gun can be a weapon, or it may be
sport equipment in the hands of a sport shooter. By mapping relations
between pairs of words (e.g., instrument and aggressor) to single words
(e.g., weapon), it may be possible to unify attributional similarity
(similarity between single words) and relational similarity (similarity
between word pairs).
http://apperceptual.wordpress.com/2007/01/13/unified-latent-analysis/
My understanding of LRA from the verbal analogies test was in terms of
LSA. Thus, to solve "carpenter is to mason as wood is to ___?" is to
compute (wood + mason - carpenter) on the vector representations of
these words in a word-document matrix (or an approximated matrix
computed by SVD or a neural network as described by Gorrell).
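To make the vector arithmetic concrete, here is a minimal sketch. It assumes a tiny word-document count matrix with invented counts and picks the answer by cosine similarity; the function names and the data are hypothetical, and this is not Turney's LRA or any code from the thread.

```python
import numpy as np

# Toy word-document count matrix (rows: words, columns: documents).
# All counts are invented purely for illustration.
words = ["carpenter", "mason", "wood", "stone", "water"]
counts = np.array([
    [3, 0, 2, 0],   # carpenter
    [0, 3, 2, 0],   # mason
    [4, 0, 0, 1],   # wood
    [0, 4, 0, 1],   # stone
    [0, 0, 1, 3],   # water
], dtype=float)


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def solve_analogy(a, b, c):
    """Return the word whose vector is closest to (c + b - a).

    The three query words themselves are excluded from the candidates.
    """
    vec = {w: counts[i] for i, w in enumerate(words)}
    target = vec[c] + vec[b] - vec[a]
    candidates = [w for w in words if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vec[w]))


# "carpenter is to mason as wood is to ___?"
answer = solve_analogy("carpenter", "mason", "wood")
print(answer)
```

In a real setting the rows would come from a large corpus, possibly after SVD, rather than from hand-written counts.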
We can replace documents with immediate context, and count how often
one word appears before or after another. A pair-pattern matrix to
relate terms like "carpenter-wood" with "builds with" seems to me to be
the product of bigram matrices, one that relates "carpenter" with
"builds with" and another that relates "builds with" with "wood".
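One way to read "the product of bigram matrices" is sketched below, under the assumption that the two bigram counts are combined by simple multiplication: the entry for a pair (w1, w2) and a pattern p is A[w1, p] * B[p, w2]. The counts are invented, and real trigram counts need not factor this way.

```python
import numpy as np

left_words = ["carpenter", "mason"]
patterns = ["builds with", "cuts"]
right_words = ["wood", "stone"]

# A[i, p]: how often left_words[i] is followed by patterns[p] (invented counts).
A = np.array([[5, 2],
              [4, 1]], dtype=float)
# B[p, j]: how often patterns[p] is followed by right_words[j] (invented counts).
B = np.array([[6, 3],
              [2, 1]], dtype=float)

# M[i, j, p] = A[i, p] * B[p, j]: one score per (pair, pattern).
M = np.einsum("ip,pj->ijp", A, B)

# Flatten the (i, j) axes so each row is a word pair and each column a pattern.
pair_pattern = M.reshape(len(left_words) * len(right_words), len(patterns))
pairs = [f"{l}-{r}" for l in left_words for r in right_words]
rows = dict(zip(pairs, pair_pattern))

print(rows["carpenter-wood"])  # scores for "builds with" and "cuts"
```

Summing each row over the patterns gives the ordinary matrix product A @ B, which is the sense in which the pair-pattern matrix refines that product.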
-- Matt Mahoney
I think you are exactly right about this, Matt.
You could find the "pair-pattern matrix" for "carpenter-wood" by
filtering the prior contexts of "wood" with the post contexts of
"carpenter".
That is how Peter should do it.
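A rough sketch of that filtering step follows. It assumes contexts are fixed-width token windows and that "filtering" means keeping only the contexts found on both sides; the corpus, the window width, and the function names are all made up for illustration.

```python
from collections import Counter

# Tiny invented corpus.
corpus = [
    "the carpenter builds with wood",
    "the carpenter works with wood",
    "the carpenter builds with care",
    "the carpenter sleeps at night",
    "the mason builds with stone",
    "she builds with wood",
    "he burns dry wood",
]


def post_contexts(word, sentences, width=2):
    """Count the `width`-token sequences immediately following `word`."""
    counts = Counter()
    for s in sentences:
        toks = s.split()
        for i, t in enumerate(toks):
            ctx = toks[i + 1:i + 1 + width]
            if t == word and len(ctx) == width:
                counts[" ".join(ctx)] += 1
    return counts


def prior_contexts(word, sentences, width=2):
    """Count the `width`-token sequences immediately preceding `word`."""
    counts = Counter()
    for s in sentences:
        toks = s.split()
        for i, t in enumerate(toks):
            if t == word and i >= width:
                counts[" ".join(toks[i - width:i])] += 1
    return counts


after_carpenter = post_contexts("carpenter", corpus)
before_wood = prior_contexts("wood", corpus)

# Counter intersection keeps only shared contexts, with the smaller count.
shared = after_carpenter & before_wood
print(shared)
```

Note that the shared contexts need not come from sentences containing both words; the filter only asks that a pattern follows "carpenter" somewhere and precedes "wood" somewhere.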
You could project distributions representing all kinds of attributes or
relationships, all kinds of meaning in a word, from text in the same
way.
Note you can also turn these distributions around and use them to
govern syntax. Given a representation for a relationship as a
pair-pattern matrix you have a very precise specification for possible
syntax appropriate to this meaning.
The question of relevance to this group is whether you can reduce the
dimensions of these representations so we can store them all in less
space than the texts they come from. Can you cluster the pair-pattern
matrix for "carpenter-wood" with that of "build-mason" and all other
similar pairs, to find a general relation representing "instrument" or
"tool" in all its subtlety?
I think examples like my toy AX...DX...DB...AZ...YZ...YC illustrate
that we can't, because there are so many possible clusters of contexts,
and many of them contradict each other.
But I think this is a good thing. It means we can use the same words to
project out many more subtle distinctions of meaning (different
distributions).
That is why the meaning of language is so subtle and syntax so
idiosyncratic.
The goal of compressing text using meaning turns out to be wrong, but
the link between compression and meaning is not wrong. Meaning still
corresponds to a distribution on the text (similar to a compression).
It is just that this relationship turns out to expand the power of
text, not compress it.
-Rob