--
You received this message because you are subscribed to the Google Groups "Semantic Vectors" group.
To unsubscribe from this group and stop receiving emails from it, send an email to semanticvecto...@googlegroups.com.
To post to this group, send email to semanti...@googlegroups.com.
Visit this group at http://groups.google.com/group/semanticvectors.
For more options, visit https://groups.google.com/groups/opt_out.
To view this discussion on the web visit https://groups.google.com/d/msgid/semanticvectors/a2eca312-4266-4ec0-aafb-2825a7858703n%40googlegroups.com.
I've used pitt.search.lucene.IndexFlatFilePositions to process enwiki-corpus.txt, a file with 2 million sentences, so it's like having 2 million documents.

I then ran pitt.search.semanticvectors.BuildIndex -minfrequency 2 -luceneindexpath positional_index -elementalmethod orthographic.

Questions:

Does using the option -elementalmethod orthographic help when searching sentences?
Do I need to use the searchtype option, set to proximity?
In your example, you put the query in quotes, does that cause it to represent a 'sentence'?
What is the SentenceVectors.java class used for? Is it relevant for computing sentence similarity?
On Fri, Oct 23, 2020 at 4:56 PM Ron King <ronc...@gmail.com> wrote:

> I've used pitt.search.lucene.IndexFlatFilePositions to process enwiki-corpus.txt, a file with 2 million sentences, so it's like having 2 million documents.
> I then ran pitt.search.semanticvectors.BuildIndex -minfrequency 2 -luceneindexpath positional_index -elementalmethod orthographic.
> Questions:
> Does using the option -elementalmethod orthographic help when searching sentences?

No, this affects word vectors only - word vectors at the start of training will be similar if they represent words that are orthographically similar to one another.

> Do I need to use the searchtype option, set to proximity?

It's been quite a long time since I looked at this part of the codebase (I'd forgotten it existed), but if memory serves the idea with the proximity search was to find documents where two terms occur close to one another. The SentenceVectors class encodes the relative position of words into the document vector representation, and the proximity search tries to leverage this encoding by inferring the distance between a pair of words.

> In your example, you put the query in quotes, does that cause it to represent a 'sentence'?
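
For readers unfamiliar with the idea of encoding word positions into a document vector, here is a toy sketch of the general principle described in the reply above. It is not taken from the Semantic Vectors source and does not claim to match what SentenceVectors or the proximity search actually do. It assumes random +1/-1 word vectors and uses a cyclic shift by word position as a stand-in encoding; the function names and the dimension of 1024 are hypothetical.

```python
import random

DIM = 1024  # hypothetical vector dimension, chosen only for this sketch


def elemental_vector(word, dim=DIM):
    """Random +1/-1 vector, reproducible per word (seeded by the word itself)."""
    rng = random.Random(word)
    return [rng.choice((-1, 1)) for _ in range(dim)]


def shift(vec, k):
    """Cyclically rotate vec to the right by k places."""
    k %= len(vec)
    return vec[-k:] + vec[:-k] if k else list(vec)


def encode_sentence(words):
    """Sum each word's vector after shifting it by the word's position."""
    doc = [0] * DIM
    for pos, word in enumerate(words):
        for i, x in enumerate(shift(elemental_vector(word), pos)):
            doc[i] += x
    return doc


def estimate_position(doc, word, max_len):
    """Return the shift whose probe vector best matches the document vector."""
    base = elemental_vector(word)
    scores = [
        sum(d * x for d, x in zip(doc, shift(base, p)))
        for p in range(max_len)
    ]
    return max(range(max_len), key=scores.__getitem__)


def estimate_distance(doc, word_a, word_b, max_len):
    """Infer how many positions apart two words are, using only the doc vector."""
    pos_a = estimate_position(doc, word_a, max_len)
    pos_b = estimate_position(doc, word_b, max_len)
    return abs(pos_a - pos_b)


sentence = "the quick brown fox jumps over lazy dog".split()
doc = encode_sentence(sentence)
print(estimate_distance(doc, "quick", "jumps", len(sentence)))
```

Because the shifted random vectors are nearly orthogonal to one another, probing the summed vector with a word's vector at each candidate shift gives a clear peak at the position where that word was added. This is what lets the distance between two words be read back from a single document vector.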