Re: BuildIndex command line

Dominic Widdows

Jul 26, 2016, 2:36:33 PM
to semanti...@googlegroups.com
On Tue, Jul 26, 2016 at 10:28 AM, Hadeel Maryoosh <hadeel....@gmail.com> wrote:

I have three questions about SV that confuse me:


1. I tried to run the command "pitt.search.semanticvectors.BuildIndex" from the terminal but got the error "Could not find or load main class pitt.search.semanticvectors.BuildIndex". I ran it from the directory where the jar file is, and I also ran it from the source path of my project (which has the classpath set and does not report that pitt.search.semanticvectors cannot be found). Do I need to do something specific to make this command work?

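Please set your classpath appropriately, using either -cp at the command line or setting a $CLASSPATH environment variable, so that it includes the SemanticVectors jar and its dependencies (for example, java -cp <path-to-jar> pitt.search.semanticvectors.BuildIndex followed by your usual arguments).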
 
2. These lines:
    VectorSearcher searcher = new VectorSearcher.VectorSearcherCosine(
        searchVectorStore, searchVectorStore, luceneUtils, defaultFlagConfig, new String[] {queryTerm});
    LinkedList<SearchResult> results = searcher.getNearestNeighbors(10);

Do they use cosine similarity? Will the whole term vectors file be searched, or, as in Lucene, only the terms that are in common with the query (which is what makes the Lucene search procedure fast)?

Cosine similarity.
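
To make the term concrete, here is a minimal sketch of cosine similarity and a brute-force nearest-neighbor search over an in-memory map of vectors. The class and method names (CosineSketch, cosine, nearest) are hypothetical and are not part of SemanticVectors, and the fact that this sketch scores every stored vector is an assumption of the illustration, not a statement about how VectorSearcher is implemented.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical names, not SemanticVectors classes.
class CosineSketch {

    /** Cosine similarity of two equal-length vectors; 0 if either has zero norm. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Scores every entry in the store against the query and returns the top k keys. */
    static List<String> nearest(Map<String, double[]> store, double[] query, int k) {
        List<String> keys = new ArrayList<>(store.keySet());
        // Sort by descending similarity to the query vector.
        keys.sort(Comparator.comparingDouble((String key) -> -cosine(store.get(key), query)));
        return keys.subList(0, Math.min(k, keys.size()));
    }

    public static void main(String[] args) {
        Map<String, double[]> store = new LinkedHashMap<>();
        store.put("cat", new double[] {1.0, 0.1});
        store.put("dog", new double[] {0.9, 0.3});
        store.put("car", new double[] {0.0, 1.0});
        System.out.println(nearest(store, new double[] {1.0, 0.0}, 2)); // prints [cat, dog]
    }
}
```

Because the score depends only on the angle between vectors, a term can rank highly even when it shares no surface words with the query, which is different from an inverted-index lookup.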
 

3. Can the behavior be changed so that we can query a sentence instead of just a single word? The example currently considers only the first word of the query. Can we use the class that Lucene uses to search, and apply it to the term vector file instead of the usual index files?

Turns out that Java's Scanner by default reads a word (Scanner.next), but Scanner.nextLine reads a line. Fixed here (but you will need to build from source to get this change).
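
For reference, the difference between reading a single token and reading a whole line with java.util.Scanner can be shown with a minimal sketch. The class and method names below are hypothetical, and this is not the code from the fix itself.

```java
import java.util.Scanner;

// Illustrative only: hypothetical names, not the SemanticVectors example code.
class ScannerSketch {

    /** Returns the first whitespace-delimited token of the input. */
    static String firstToken(String input) {
        try (Scanner scanner = new Scanner(input)) {
            return scanner.next();
        }
    }

    /** Returns everything up to the first line break. */
    static String firstLine(String input) {
        try (Scanner scanner = new Scanner(input)) {
            return scanner.nextLine();
        }
    }

    public static void main(String[] args) {
        String typed = "semantic vectors search\nanother line";
        System.out.println(firstToken(typed)); // prints semantic
        System.out.println(firstLine(typed));  // prints semantic vectors search
    }
}
```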

 

Thanks



Dominic Widdows

Jul 26, 2016, 4:56:41 PM
to semanti...@googlegroups.com
The tokenization (or rather, basic string splitting) is done in this line: https://github.com/semanticvectors/semanticvectors/commit/951a81d237d15ad2a05cf72dafdbee9623a03939#diff-a9fbd7f09877cbbbf4954262ea08d9dbR44

The diff on GitHub isn't that clear because for some reason all the line endings changed, so the actual change is not as obvious as it should have been.
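
As a rough illustration of what "basic string splitting" means here, the sketch below turns a typed query line into an array of terms by splitting on whitespace. The class and method names are hypothetical, and the exact expression used in the linked commit may differ; the resulting array is the kind of value that could be passed where the earlier example passed new String[] {queryTerm}.

```java
import java.util.Arrays;

// Illustrative only: hypothetical names; the linked commit may split differently.
class QuerySplitSketch {

    /** Splits a query line into terms on runs of whitespace; empty input gives no terms. */
    static String[] splitQuery(String queryLine) {
        String trimmed = queryLine.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String[] queryTerms = splitQuery("  semantic   vectors search ");
        System.out.println(Arrays.toString(queryTerms)); // prints [semantic, vectors, search]
    }
}
```

Note that this is plain splitting, not Lucene analysis: there is no lowercasing, stemming, or stop-word removal unless it is added separately.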

-Dominic

On Tue, Jul 26, 2016 at 11:44 AM, Hadeel Maryoosh <hadeel....@gmail.com> wrote:
Thanks for your answer.
I fixed the scanner so that it reads a whole line. But the problem is that the query is not tokenized: it is searched as one whole sentence. Usually, Lucene tokenizes the query into terms and returns any documents that contain any one of the words in the query. Can we do that with SV here?