Solution Using Analogy Space


akshay bhat

Nov 6, 2010, 3:27:48 AM
to MetaOptimize Challenge [discuss]
A good way of defining semantic similarity is through the concept of
analogy. I would suggest the following paper for a definition of
similar concepts:
Robert Speer, Catherine Havasi, and Henry Lieberman. AnalogySpace:
Reducing the Dimensionality of Common Sense Knowledge, AAAI 2008 [1]

Start with a network of common-sense concepts linked to each other via
simple assertions such as IsA or HasA, and apply a sparse singular
value decomposition to it: this yields an Analogy Space in which
finding similar concepts is as simple as taking dot products between
their vectors.
You can construct a simple network using co-occurrence of terms from
your data, and perhaps add data from ConceptNet (common sense) +
WordNet (lexicon) + OpenCyc (common sense) + DBpedia (real-world
entities) + Freebase (real-world entities).
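To make the pipeline concrete (assertion matrix, sparse SVD, dot-product similarity), here is a minimal sketch using scipy in place of divisi; the concepts, relations, and assertions below are invented toy data, not from any real knowledge base:

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import svds

# Toy knowledge: rows are concepts, columns are (relation, object) features.
concepts = ["dog", "cat", "car"]
features = [("IsA", "animal"), ("HasA", "tail"),
            ("IsA", "vehicle"), ("HasA", "wheel")]
assertions = [
    ("dog", "IsA", "animal"), ("dog", "HasA", "tail"),
    ("cat", "IsA", "animal"), ("cat", "HasA", "tail"),
    ("car", "IsA", "vehicle"), ("car", "HasA", "wheel"),
]

c_idx = {c: i for i, c in enumerate(concepts)}
f_idx = {f: j for j, f in enumerate(features)}

# Sparse concept-by-feature assertion matrix.
A = lil_matrix((len(concepts), len(features)))
for c, rel, obj in assertions:
    A[c_idx[c], f_idx[(rel, obj)]] = 1.0

# Truncated sparse SVD (k must be < min(A.shape)); the rows of U*s are
# the concept vectors in the reduced "analogy space".
U, s, Vt = svds(A.tocsc(), k=2)
vectors = U * s

def similarity(a, b):
    """Cosine similarity between two concepts in analogy space."""
    va, vb = vectors[c_idx[a]], vectors[c_idx[b]]
    return float(np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb)))

print(similarity("dog", "cat"))  # high: identical feature rows
print(similarity("dog", "car"))  # near zero: no shared features
```

The same dot-product lookup scales to the real matrices; only the construction of `assertions` changes.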

The divisi package used to compute such similarities is open source
and can be found at http://csc.media.mit.edu/analogyspace

After around a million nodes, though, one runs into problems: a single
SVD model can no longer approximate the network correctly. I proposed
a solution in which the network is divided into multiple communities
(overlapping or non-overlapping) using community detection algorithms
from the fields of network analysis and complex systems.
One can then create a model for each community and get significantly
better results. Note that this idea can be applied to all sorts of
networks, such as social networks.
Another advantage is that if a concept exists in multiple communities,
one gets a different set of similar terms from each community,
representing the different meanings of that concept.
I presented this idea as a poster at the International Semantic Web
Conference last year and am currently working on it. [2]
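The per-community scheme can be sketched as follows. The communities and assertion matrices here are hand-coded toy examples (a real system would obtain the communities from a detection algorithm such as Louvain); the point is that a polysemous concept gets a different vector, and different neighbours, in each community it belongs to:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

def fit_community(concepts, rows):
    """Fit a separate truncated SVD for one community's
    concept-by-feature matrix; return per-concept vectors."""
    A = csr_matrix(np.array(rows, dtype=float))
    k = min(2, min(A.shape) - 1)
    U, s, Vt = svds(A, k=k)
    return {c: U[i] * s for i, c in enumerate(concepts)}

def neighbours(vectors, query):
    """Nearest neighbour of `query` by cosine similarity."""
    q = vectors[query]
    scores = {c: float(np.dot(q, v) /
                       (np.linalg.norm(q) * np.linalg.norm(v) + 1e-12))
              for c, v in vectors.items() if c != query}
    return max(scores, key=scores.get)

# "apple" occurs in two overlapping communities; each community's
# model gives it a different meaning.
fruit = fit_community(["apple", "banana", "car"],
                      [[1, 1, 0], [1, 1, 0], [0, 0, 1]])
tech = fit_community(["apple", "microsoft", "banana"],
                     [[1, 1, 0], [1, 1, 0], [0, 0, 1]])

print(neighbours(fruit, "apple"))  # banana
print(neighbours(tech, "apple"))   # microsoft
```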


References:
[1] Robert Speer, Catherine Havasi, and Henry Lieberman. AnalogySpace:
Reducing the Dimensionality of Common Sense Knowledge, AAAI 2008
http://analogyspace.media.mit.edu/media/speerhavasi.pdf

[2] Akshay Bhat. Analogy Engines for The Semantic Web, ISWC 2009
http://www.akshaybhat.com/Poster.pdf

---
Akshay Bhat
Student, Cornell University
www.akshaybhat.com



Dragon Silicon

Nov 6, 2010, 4:04:20 AM
to metaoptimize-ch...@googlegroups.com
Hi Akshay,

Interesting approach, but if you take even a quick look at the corpus, you'll see that a linguistics-specific optimization hack isn't applicable to this particular dataset: the actually meaningful sentences are massively outnumbered by semi-random tokens.

Of course, the real data might turn out to be much closer to reality :)

Looking forward to what you can build based on this
-SDr

akshay bhat

Nov 6, 2010, 4:53:41 AM
to MetaOptimize Challenge [discuss]
There is no linguistics-specific optimization hack as such, and the
method can be applied to other domains as well.

Most approaches will rely on creating a large co-occurrence matrix (or
some variant of it), which becomes hard to model using SVD after a few
million tokens.
The Mahout project has code that can perform parallel SVD on sparse
matrices, but it is still limited by the RAM of the driving machine.

Another way to look at it is as a recommendation problem, and to
construct a Slope One recommendation system.
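To illustrate that recommendation framing, here is a minimal weighted Slope One predictor (after Lemire and Maclachlan); the users, items, and ratings are invented toy data:

```python
from collections import defaultdict

# Toy user-item ratings.
ratings = {
    "alice": {"item1": 5.0, "item2": 3.0},
    "bob":   {"item1": 3.0, "item2": 4.0, "item3": 3.0},
    "carol": {"item1": 4.0, "item3": 5.0},
}

# Precompute dev[j][i] = average of (r_j - r_i) and support counts
# freq[j][i] over all users who rated both items.
dev = defaultdict(lambda: defaultdict(float))
freq = defaultdict(lambda: defaultdict(int))
for user_ratings in ratings.values():
    for j, rj in user_ratings.items():
        for i, ri in user_ratings.items():
            if i != j:
                dev[j][i] += rj - ri
                freq[j][i] += 1
for j in dev:
    for i in dev[j]:
        dev[j][i] /= freq[j][i]

def predict(user, item):
    """Weighted Slope One prediction for an item the user hasn't rated."""
    num = den = 0.0
    for i, ri in ratings[user].items():
        if i != item and freq[item].get(i):
            num += (dev[item][i] + ri) * freq[item][i]
            den += freq[item][i]
    return num / den if den else None

print(predict("alice", "item3"))  # 13/3, about 4.33
```

The pairwise-deviation tables are cheap to update incrementally, which is why Slope One is attractive once a single global SVD stops being feasible.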