Random Indexing

Teodor Dimov

Oct 5, 2016, 12:29:13 PM

to Semantic Vectors

Hi, passing 1 as the number of passes for Reflective Random Indexing means it's just Random Indexing, right?

Dominic Widdows

Oct 5, 2016, 12:33:57 PM

to semanti...@googlegroups.com

It varies a lot depending on what dataset you're using and what problem you're trying to solve. In this article we found that 2 cycles worked well when starting with terms, so try 2 or 3 if you start with documents: http://www.sciencedirect.com/science/article/pii/S1532046409001208

Best wishes,

Dominic
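
To make the idea of a "cycle" concrete, here is a minimal NumPy sketch of reflective training on a term-by-document count matrix. It is not the Semantic Vectors implementation: the function names, the defaults (`dim=200`, `nonzero=10`), and the way a cycle is counted are all assumptions made for illustration. In this sketch, starting from documents with one cycle is plain Random Indexing, and each further cycle rebuilds document vectors from term vectors and then term vectors from document vectors. Whether the package counts passes the same way is not confirmed here.

```python
import numpy as np


def normalize_rows(matrix):
    """Scale each row to unit length; all-zero rows are left as zeros."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return matrix / norms


def random_index_vectors(count, dim, nonzero, rng):
    """Sparse random vectors: a few +1/-1 entries, zeros elsewhere."""
    vectors = np.zeros((count, dim))
    for row in vectors:  # each row is a view, so assignment is in place
        positions = rng.choice(dim, size=nonzero, replace=False)
        row[positions] = rng.choice([-1.0, 1.0], size=nonzero)
    return vectors


def train(term_doc, cycles=1, start="docs", dim=200, nonzero=10, seed=0):
    """Hypothetical helper: returns (term_vectors, doc_vectors).

    term_doc is an (n_terms, n_docs) matrix of counts.
    How a "cycle" is counted below is an assumption of this sketch.
    """
    if cycles < 1:
        raise ValueError("cycles must be at least 1")
    term_doc = np.asarray(term_doc, dtype=float)
    n_terms, n_docs = term_doc.shape
    rng = np.random.default_rng(seed)

    if start == "docs":
        # Cycle 1 from documents: random document vectors, then
        # term vectors as weighted sums of them (plain Random Indexing).
        doc_vecs = random_index_vectors(n_docs, dim, nonzero, rng)
        term_vecs = normalize_rows(term_doc @ doc_vecs)
        remaining = cycles - 1
    else:
        # Starting from terms: random term vectors, and every cycle
        # is a full reflective step.
        term_vecs = random_index_vectors(n_terms, dim, nonzero, rng)
        doc_vecs = None
        remaining = cycles

    for _ in range(remaining):
        # Reflective step: documents from terms, then terms from documents.
        doc_vecs = normalize_rows(term_doc.T @ term_vecs)
        term_vecs = normalize_rows(term_doc @ doc_vecs)
    return term_vecs, doc_vecs


# Toy counts: 4 terms (rows) by 3 documents (columns).
counts = np.array([[1, 0, 0],
                   [1, 1, 0],
                   [0, 2, 1],
                   [0, 0, 3]])
term_vectors, doc_vectors = train(counts, cycles=2, start="terms")
print(term_vectors.shape, doc_vectors.shape)
```

Comparing the output of `cycles=1` and `cycles=2` on the same matrix is a cheap way to see how much the extra reflective step changes the term neighbourhoods.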


Teodor Dimov

Oct 5, 2016, 2:55:52 PM

to Semantic Vectors

OK, thanks Dominic. I am looking at Semantic Vectors to create pairwise phrases for autocomplete recommendations, and then mix them with actual search logs to enrich the suggestion dataset. I will try with two, but my concern is that the dataset is small to begin with (only a few hundred to 1-2k words per document) and that using RRI will make them all similar (I think increasing the number of passes will do that). Any thoughts?

Dominic Widdows

Oct 5, 2016, 2:58:17 PM

to semanti...@googlegroups.com

My main thought for such a small corpus is to try various options and see what results you get. This includes trying out positional indexes.

-Dominic
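
One way to compare options, and to check the worry that extra passes make everything look alike, is to track the average pairwise cosine similarity of the vectors produced by each setting. The sketch below assumes the vectors have already been loaded into a NumPy array; `mean_pairwise_cosine` is a hypothetical helper, not part of Semantic Vectors.

```python
import numpy as np


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all distinct pairs of rows."""
    v = np.asarray(vectors, dtype=float)
    if len(v) < 2:
        raise ValueError("need at least two vectors")
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # keep all-zero rows as zeros
    unit = v / norms
    sims = unit @ unit.T             # cosine similarity of every pair
    n = len(unit)
    # Drop the diagonal (each vector compared with itself).
    off_diagonal = sims.sum() - np.trace(sims)
    return off_diagonal / (n * (n - 1))


# Two extreme toy cases: mutually orthogonal vectors and identical vectors.
spread_out = np.eye(3)
collapsed = np.ones((3, 4))
print(mean_pairwise_cosine(spread_out))
print(mean_pairwise_cosine(collapsed))
```

If this number climbs towards 1 as the number of passes goes up, the vectors are converging on each other and the nearest-neighbour lists are likely to become less discriminating.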
