After CompareTerms, can you retrieve related documents?


Heidi McClure

Dec 8, 2011, 11:57:59 AM
to semanti...@googlegroups.com
(Hopefully this will go through to the list, now - tried to send
yesterday but was sending from a non-registered email addr - trying
again from gmail...)

Thanks for this great set of code!

I have written some code that compares phrases using CompareTerms, and
the scores help me understand whether two phrases are related based on
their term vectors.  Now, once I have a pair of phrases, I would like to
gather all the documents that are related to the two phrases.  Is there
an easy way to do this?  That is, is there an API like CompareTerms for
retrieving the related documents when you have a phrase?  By phrase I
mean a multi-term phrase, usually 2-4 words.

Thinking about this a bit more, I think perhaps a better solution is
to identify the phrases I'm interested in BEFORE doing the Lucene and
SV indexing.  I think this was suggested in a recent thread.  Is this
a better approach when trying to correlate multi-word phrases?
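
A minimal sketch of that preprocessing idea, assuming the phrases of
interest are already known as a plain list. The function name
join_phrases and the sample phrases are hypothetical and not part of
Lucene or SV; whether the joined token survives tokenization depends
on the analyzer used at indexing time.

```python
import re


def join_phrases(text, phrases):
    """Rewrite each listed multi-word phrase as one underscore-joined token."""
    # Handle longer phrases first so a short phrase cannot split a longer one.
    for phrase in sorted(phrases, key=len, reverse=True):
        words = phrase.split()
        # Match whole words separated by any run of whitespace.
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
        # The replacement uses the casing from the phrase list, not from the text.
        text = re.sub(pattern, "_".join(words), text, flags=re.IGNORECASE)
    return text


if __name__ == "__main__":
    sample = "The semantic vector model of New  York City"
    print(join_phrases(sample, ["semantic vector", "new york", "new york city"]))
```

The rewritten text would then be indexed as usual, so that each phrase
gets its own term vector instead of one vector per word.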

thanks!

-heidi

heidi

Dec 8, 2011, 5:51:11 PM
to Semantic Vectors
In trying to answer my own questions, I modified my corpus of data so
that the phrases I was interested in had an underscore instead of
spaces. Then I indexed with Lucene and built my vectors with SV.
Now, when I get two phrases that are related, I can superpose the two
vectors so I get one vector that represents both phrases and can be
compared against the other entries in the vector store.
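
For illustration, superposition of two real-valued vectors can be
sketched as a normalized sum. This is a standalone sketch with
hypothetical helper names (normalize, superpose), not SV's own vector
classes, and it assumes both phrase vectors have already been read
into plain lists of floats.

```python
import math


def normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def superpose(a, b):
    """Add two vectors of equal dimension and normalize the sum."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    # Normalizing the inputs first gives both phrases equal weight.
    a_unit, b_unit = normalize(a), normalize(b)
    return normalize([x + y for x, y in zip(a_unit, b_unit)])


if __name__ == "__main__":
    phrase_one = [2.0, 0.0, 0.0]
    phrase_two = [0.0, 1.0, 0.0]
    print(superpose(phrase_one, phrase_two))
```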

Now - my next question - given a term vector, is there a way to know
which documents are significant? Do the columns mean anything? Do
the columns refer to a document? Or does SV obscure the column
meanings?

Any help or guidance appreciated!
thanks,
-heidi


Dominic

Dec 10, 2011, 11:54:31 AM
to Semantic Vectors
Hi Heidi,

Have you seen the page on document search
(http://code.google.com/p/semanticvectors/wiki/DocumentSearch)? There
are some options here that might get you started, though much of this
space remains relatively unexplored as far as I know.
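
As a rough illustration of the general idea (not the code behind that
wiki page), document search in a vector model can be sketched as
ranking document vectors by cosine similarity to a query vector. The
names rank_documents and cosine, the document IDs, and the toy vectors
are all hypothetical; the sketch assumes term and document vectors
live in the same space and have been loaded as lists of floats.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_documents(query_vec, doc_vectors, top_n=10):
    """Return up to top_n (score, doc_id) pairs, highest similarity first."""
    scored = [(cosine(query_vec, vec), doc_id)
              for doc_id, vec in doc_vectors.items()]
    scored.sort(reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    docs = {
        "doc_a.txt": [0.9, 0.1, 0.0],
        "doc_b.txt": [0.0, 0.2, 0.9],
        "doc_c.txt": [0.5, 0.5, 0.1],
    }
    query = [1.0, 0.2, 0.0]  # e.g. a superposed phrase vector
    for score, doc_id in rank_documents(query, docs):
        print(f"{score:.4f}  {doc_id}")
```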

Apologies for being slow to reply at the moment, my family is in the
middle of moving house and there is way too much to organize :-(

Best wishes,
Dominic

Heidi McClure

Dec 11, 2011, 9:12:56 AM
to Semantic Vectors
thanks!  I'll look there. Am I correct that with SV the columns in the
vector files may refer to some random document or documents?  And even
using LSA to build vectors, the columns don't refer to specific
documents?
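
To make the question concrete, here is a toy random-indexing sketch.
It is an assumption about the general technique, not SV's actual
code: each document gets a sparse random vector of +1/-1 entries, and
a term vector is the sum of the random vectors of the documents that
contain the term. Under that assumption, with more documents than
dimensions, every column ends up shared by several documents. All
names and sizes below are hypothetical.

```python
import random


def elemental_vector(dim, seed_length, rng):
    """Sparse random vector: seed_length entries set to +1 or -1, rest 0."""
    vec = [0] * dim
    for i, pos in enumerate(rng.sample(range(dim), seed_length)):
        vec[pos] = 1 if i % 2 == 0 else -1
    return vec


def build_term_vectors(docs, dim, seed_length, rng):
    """Sum each document's random vector into the vectors of its terms."""
    doc_elemental = {doc_id: elemental_vector(dim, seed_length, rng)
                     for doc_id in docs}
    term_vectors = {}
    for doc_id, text in docs.items():
        for term in set(text.split()):
            tv = term_vectors.setdefault(term, [0] * dim)
            for i, x in enumerate(doc_elemental[doc_id]):
                tv[i] += x
    return doc_elemental, term_vectors


def docs_touching_column(doc_elemental, column):
    """List the documents whose random vector is nonzero in this column."""
    return sorted(d for d, v in doc_elemental.items() if v[column] != 0)


if __name__ == "__main__":
    rng = random.Random(42)
    docs = {f"doc{i}": "semantic_vectors example text" for i in range(20)}
    doc_elemental, term_vectors = build_term_vectors(docs, 8, 4, rng)
    for column in range(8):
        print(column, docs_touching_column(doc_elemental, column))
```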
Good luck with your move!
-heidi
