Train against word frequency


siddharth iyer

May 24, 2017, 9:59:34 AM
to gensim
In the skip-gram algorithm, it doesn't matter how many times a word occurs in the context: if it occurs more than 0 times, we train against it as 1. I want to train against the frequency. Is that possible?

Gordon Mohr

May 24, 2017, 11:17:54 AM
to gensim
That's not quite true. A word is trained for each of its occurrences. For example, if you have a context like the following, with 5 words on either side of the target word [meow]...

    the black cat gave a [meow] louder than the white cat

...then the skip-gram training example "with input word 'cat', try to predict 'meow'" will be presented to the network twice. And conversely, when the 1st 'cat' is the target word, 'meow' will be one of the trained inputs, then when the 2nd 'cat' is the target word, 'meow' will again be a trained input for target 'cat'.
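
As a rough illustration of the point above, here is a minimal sketch that enumerates (input word, predicted word) pairs for the example sentence and counts them. The helper `skipgram_pairs` is hypothetical and is not gensim's code; it assumes a fixed window of 5 and ignores details such as random window shrinking and frequent-word downsampling.

```python
from collections import Counter

def skipgram_pairs(tokens, window):
    """Yield (input_word, predicted_word) pairs for every target position.

    Hypothetical helper for illustration only: each neighbor within
    `window` positions of the target is used as an input that tries to
    predict the target word.
    """
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield tokens[j], target

tokens = "the black cat gave a meow louder than the white cat".split()
pair_counts = Counter(skipgram_pairs(tokens, window=5))

print(pair_counts[("cat", "meow")])    # 2: both 'cat' tokens are near 'meow'
print(pair_counts[("meow", "cat")])    # 2: 'meow' is near each 'cat' target
print(pair_counts[("black", "meow")])  # 1: 'black' occurs only once
```

Repeated neighbors show up as repeated training pairs, so they are not collapsed into a single 0/1 flag.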

(Similarly, if using negative-sampling, more-frequent words will more often be used to create the synthetic negative examples. So if 'cat' appears more often than 'meow', when some other word like 'toaster' is encountered, a synthetic negative-example pairing like "with input word 'toaster', try NOT to predict 'cat'" will be used more often than a similar negative-example with the target word 'meow'.)
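
To make the negative-sampling remark concrete, here is a small sketch of a frequency-based sampling distribution. The function name `negative_sampling_probs` is hypothetical, and the 0.75 exponent is an assumption taken from the original word2vec implementation; it is not stated anywhere in this thread.

```python
from collections import Counter

def negative_sampling_probs(tokens, exponent=0.75):
    """Return a dict mapping each word to its chance of being drawn as a negative.

    Illustrative sketch: each word is weighted by count ** exponent and the
    weights are normalized to sum to 1. The exponent value is an assumption.
    """
    counts = Counter(tokens)
    weights = {word: count ** exponent for word, count in counts.items()}
    total = sum(weights.values())
    return {word: weight / total for word, weight in weights.items()}

tokens = "the black cat gave a meow louder than the white cat".split()
probs = negative_sampling_probs(tokens)

# 'cat' occurs twice and 'meow' once, so 'cat' is the likelier negative example.
print(probs["cat"] > probs["meow"])
```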

To do other sorts of weighting isn't supported by the existing code, but as with anything else, could be patched in with enough effort.

- Gordon

siddharth iyer

May 24, 2017, 12:21:02 PM
to gensim
I think I wasn't able to express myself clearly before. Take the sentence "I love cats and cats are cute." If I want to predict the context words given the word "and", I want the expected output of all words to be 1/7, except "cats", which should be 2/7 since it appears twice. I think what actually happens is that every word in the context is given a 1 and all others are 0. Were you referring to the same problem in the previous post? If so, I didn't understand it, sorry.

Gordon Mohr

May 24, 2017, 12:49:46 PM
to gensim
Even though skip-gram word2vec goes through the motions of predicting words as a way to train up useful vectors, that's not its primary application. Many word2vec implementations don't offer any interface for making predictions. (Gensim only got a method for this recently; it only works in some modes, and it's very slow.)

If what you really want to do is predict words, you may just want to build giant frequency tables, and those could record that 'cats' appears twice as often as the other words in the neighborhood of 'and'. (In the scikit-learn world, the class CountVectorizer for turning texts into co-occurrence vectors has a parameter 'binary' which controls whether you want full counts, or just 1 for at-least-1 and 0 for none.)
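
A minimal sketch of such a frequency table, using only the standard library. The helper `cooccurrence_table` and its `binary` flag are hypothetical names chosen to mirror the full-counts versus 0/1 distinction described above, and the sketch assumes the whole sentence is the neighborhood.

```python
from collections import Counter, defaultdict

def cooccurrence_table(sentences, binary=False):
    """Map each word to a Counter of the other words in its sentence.

    Hypothetical helper: with binary=True, a neighbor counts at most once
    per sentence; otherwise every occurrence is counted.
    """
    table = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            neighbors = tokens[:i] + tokens[i + 1:]
            if binary:
                neighbors = set(neighbors)
            table[word].update(neighbors)
    return table

sentence = "i love cats and cats are cute".split()
counts = cooccurrence_table([sentence])
flags = cooccurrence_table([sentence], binary=True)

print(counts["and"]["cats"])  # 2: full counts keep the repetition
print(flags["and"]["cats"])   # 1: binary mode records only presence
```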

But it is also the case that in word2vec skip-gram training, the two occurrences of 'cats' mean the 'and'->'cats' pair is trained twice, and thus the (usually ignored except in training) neural-network prediction for 'and'->'cats', after training, is likely to be stronger than the predictions for the other words. So multiple occurrences in the same context are relevant. 

But you still probably don't want to use word2vec as a mere word-predictor.

- Gordon

siddharth iyer

May 24, 2017, 1:13:14 PM
to gensim
I am not using gensim for prediction. I want to use it to embed some information into a vector space. For this project I am setting the context window to its maximum. I have a lot of repetitions of certain objects in each sentence, and since the context window is the size of the sentence itself, skip-gram doesn't take into account the number of times an object has occurred. It only cares about its occurrence. It puts a 1 if it occurs and a 0 if it doesn't.

That's why I want to train against the _frequency_ of an object occurring in a context rather than 1 if it occurs and 0 if it doesn't. Is that possible in gensim?

Gordon Mohr

May 24, 2017, 1:53:22 PM
to gensim
Your assumption about how skip-gram works is not quite correct. When a word appears more than once in a context, it's trained as more than one example in the skip-gram model, which means multiple occurrences already have more influence than a single occurrence. Higher frequencies affect the results in that way, just not through explicit counting. 

Why do you think you need a variation from the usual definition of word2vec skip-gram?

There's no option in gensim to parameterize skip-gram's behavior without patching the code. (But another way to overweight certain words could be to preprocess your text to insert extra word occurrences. In either approach, though, I'm unsure what benefit that might achieve over the 'natural' weighting of more-frequent neighbors that already occurs.)
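
The preprocessing idea could be sketched roughly as follows. The helper `overweight` and the `weights` mapping are hypothetical; the resulting token lists would then be passed to training in place of the original sentences. Note that repeating tokens also lengthens the sentence, which shifts which words fall inside a fixed window.

```python
def overweight(tokens, weights):
    """Repeat each token weights.get(token, 1) times, preserving order.

    Hypothetical preprocessing step: tokens not listed in `weights`
    are kept exactly once.
    """
    out = []
    for tok in tokens:
        out.extend([tok] * weights.get(tok, 1))
    return out

sentence = "i love cats and cats are cute".split()
weighted = overweight(sentence, {"cats": 2})
print(weighted)
# ['i', 'love', 'cats', 'cats', 'and', 'cats', 'cats', 'are', 'cute']
```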

- Gordon

siddharth iyer

May 24, 2017, 1:57:17 PM
to gensim
You are probably right, thanks for your help. 