problem with Random Indexing


Luca Piras

Mar 10, 2016, 11:06:32 AM
to s-space-re...@googlegroups.com
Good evening.

I'm a developer doing research on Random Indexing.
I'm trying to use the RandomIndexing class in order to process a set of documents and get a vector for each word that appears in them.

After calling the processDocument method for each document in my dataset, I call the processSpace method on the RandomIndexing object, using a java.util.Properties object as parameter.

The function I wrote is as follows:

public RandomIndexing trainRandomIndexing(Collection<String> texts) throws Exception
{
    RandomIndexing ri = new RandomIndexing();
    for (String t : texts)
    {
        StringReader reader = new StringReader(t);
        ri.processDocument(new BufferedReader(reader));
    }
    Properties properties = new Properties();
    properties.setProperty("WINDOW_SIZE_PROPERTY", "2");
    properties.setProperty("USE_PERMUTATIONS_PROPERTY", "false");
    properties.setProperty("VECTOR_LENGTH_PROPERTY", "500");
    ri.processSpace(properties);
    return ri;
}
However, when I get the vector corresponding to a word, like this:

Vector v = ri.getVector("some_word");

the result is a vector of 0s, 1s and -1s (more precisely, I believe it is the index vector for that word).


Moreover, the documentation for the processSpace method in the RandomIndexing class reads "Does nothing."

I would like to know what I should do in order to get the context vectors with the properly processed values.

Thank you very much in advance.
Have a nice day,
Luca


David Jurgens

Mar 10, 2016, 12:27:27 PM
to s-space-re...@googlegroups.com

Hi Luca,

  Random Indexing is an online algorithm. It doesn't perform any kind of matrix operation the way LSA or the deep learning algorithms do, so the processSpace() method should "do nothing" and you don't have to call it. The getVector() method you're referring to is correctly returning the vector for the word (i.e., the summed values from it being used in context), not the index vector. To obtain the index vector, you would need to use ri.getWordToIndexVector().get("some_word").
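
  To make the difference between the two kinds of vector concrete, here is a minimal, self-contained sketch of the general Random Indexing idea. It is a toy written for illustration and is not the S-Space implementation: the class name ToyRandomIndexing, its constructor parameters, and the whitespace tokenization are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Toy illustration of the general technique; NOT the S-Space implementation.
class ToyRandomIndexing {
    private final int dims;     // length of every vector
    private final int nonZero;  // number of +1/-1 entries in each index vector
    private final int window;   // neighbours considered on each side of a word
    private final Random rng;
    private final Map<String, int[]> indexVectors = new HashMap<>();
    private final Map<String, int[]> contextVectors = new HashMap<>();

    ToyRandomIndexing(int dims, int nonZero, int window, long seed) {
        if (nonZero > dims) {
            throw new IllegalArgumentException("nonZero must not exceed dims");
        }
        this.dims = dims;
        this.nonZero = nonZero;
        this.window = window;
        this.rng = new Random(seed);
    }

    // Sparse ternary index vector: created once per word, never updated.
    int[] indexVector(String word) {
        int[] v = indexVectors.get(word);
        if (v == null) {
            v = new int[dims];
            int placed = 0;
            while (placed < nonZero) {
                int i = rng.nextInt(dims);
                if (v[i] == 0) {
                    v[i] = rng.nextBoolean() ? 1 : -1;
                    placed++;
                }
            }
            indexVectors.put(word, v);
        }
        return v;
    }

    // Context vector: running sum of the index vectors of neighbouring words.
    int[] contextVector(String word) {
        int[] v = contextVectors.get(word);
        if (v == null) {
            v = new int[dims];
            contextVectors.put(word, v);
        }
        return v;
    }

    // Online update: each document is folded in as soon as it is read,
    // so no separate post-processing step is needed afterwards.
    void processDocument(String text) {
        if (text.trim().isEmpty()) {
            return;
        }
        String[] tokens = text.trim().toLowerCase().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            int[] ctx = contextVector(tokens[i]);
            int from = Math.max(0, i - window);
            int to = Math.min(tokens.length - 1, i + window);
            for (int j = from; j <= to; j++) {
                if (j == i) {
                    continue; // a word is not its own neighbour
                }
                int[] idx = indexVector(tokens[j]);
                for (int d = 0; d < dims; d++) {
                    ctx[d] += idx[d];
                }
            }
        }
    }

    // Helper for inspecting how sparse a vector is.
    static int countNonZero(int[] v) {
        int n = 0;
        for (int x : v) {
            if (x != 0) {
                n++;
            }
        }
        return n;
    }
}
```

  In this sketch, a word that occurs only once, with few neighbours, ends up with a context vector that is a sum of just a handful of sparse index vectors, so it can itself consist mostly of 0s, 1s and -1s and look much like an index vector.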

  The properties you're setting need to be specified in the constructor to take effect during the processDocument method, so the code should be:

public RandomIndexing trainRandomIndexing(Collection<String> texts) throws Exception
{
    Properties properties = new Properties();
    properties.setProperty("WINDOW_SIZE_PROPERTY", "2");
    properties.setProperty("USE_PERMUTATIONS_PROPERTY", "false");
    properties.setProperty("VECTOR_LENGTH_PROPERTY", "500");
    RandomIndexing ri = new RandomIndexing(properties);

    for (String t : texts)
    {
        StringReader reader = new StringReader(t);
        ri.processDocument(new BufferedReader(reader));
    }
    return ri;
}
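
One related thing that may be worth checking, offered as an assumption since nothing in this thread confirms it: names such as WINDOW_SIZE_PROPERTY look like Java constant names, and java.util.Properties matches on the exact key string. If a library defines a constant with that name whose value is some other string, then passing the quoted name "WINDOW_SIZE_PROPERTY" would silently leave the default in place. The sketch below shows the effect with a hypothetical ToySpaceConfig class; the class, the key "toy.space.windowSize", and the default of 4 are invented for illustration and are not taken from S-Space.

```java
import java.util.Properties;

// Hypothetical class for illustration only; it is not part of S-Space.
class ToySpaceConfig {
    // The constant's NAME is WINDOW_SIZE_PROPERTY; its VALUE is the real key.
    static final String WINDOW_SIZE_PROPERTY = "toy.space.windowSize";
    static final int DEFAULT_WINDOW_SIZE = 4;

    // Reads the window size, falling back to the default when the key is absent.
    static int windowSize(Properties props) {
        String value = props.getProperty(WINDOW_SIZE_PROPERTY);
        return value == null ? DEFAULT_WINDOW_SIZE : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        Properties byLiteralName = new Properties();
        // The key here is the literal text, which the reader never looks up.
        byLiteralName.setProperty("WINDOW_SIZE_PROPERTY", "2");

        Properties byConstant = new Properties();
        // The key here is the constant's value, which the reader does look up.
        byConstant.setProperty(WINDOW_SIZE_PROPERTY, "2");

        System.out.println(windowSize(byLiteralName)); // prints 4 (default kept)
        System.out.println(windowSize(byConstant));    // prints 2
    }
}
```

If RandomIndexing exposes these names as public constants (again an assumption; the Javadoc would confirm it), passing the constant itself rather than its quoted name would be the safer form.
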
If you're only seeing a few dimensions that are non-zero in the getVector() return value, it could just be that the word does not occur many times in the corpus. You might check with a very common word to see whether it is returning what is expected.

Let me know if you continue to see issues with this however.

  Thanks,
  David


Lu P.

Mar 15, 2016, 12:54:08 PM
to s-space-re...@googlegroups.com
Good evening,

I'm forwarding to you an email I sent you last Thursday about your project on Random Indexing.

I would really appreciate your help.
Thank you for your attention.

Have a nice day,
Luca
