Central word and context



Jul 15, 2024, 7:25:01 PM
to Gensim
Excuse me, if I want the window to cover the whole document and set it large enough, will Gensim treat every word in turn as the central word, with all the other words as its context?

Gordon Mohr

Jul 16, 2024, 2:16:43 PM
to Gensim
See the documentation for the parameters `window` & `shrink_windows` for ways to arrange for a much-larger (including full-length-of-text) window where all words have equal weight.

In particular, if `window` is larger than your largest text (and the `Word2Vec`/`Doc2Vec`/`FastText` models only support texts of up to 10,000 tokens), every word could be considered as context for every other word in the same text.

And, if `shrink_windows` is turned off (set to `0` or `False`), then the default technique of using some random smaller-than-`window` window for each context, as an efficient way to weight nearer words more, will be turned off. That means every word within `window` token-positions will be considered an equal part of the context.
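To make the difference concrete, here is a minimal pure-Python sketch of how the context positions for one central word could be chosen. It is an illustration only, not Gensim's actual code: the helper name `context_positions` is hypothetical, and drawing the effective window uniformly from 1..`window` is an assumption about the shrinking behavior.

```python
import random

def context_positions(n_tokens, center, window, shrink_windows=True, rng=random):
    """Return the token positions treated as context for `center`.

    Hypothetical helper for illustration; not part of Gensim.
    """
    # With shrinking on, pick a random effective window in 1..window, so
    # nearer words end up in more contexts (i.e. get more weight on average).
    # With shrinking off, always use the full window.
    effective = rng.randint(1, window) if shrink_windows else window
    start = max(0, center - effective)
    stop = min(n_tokens, center + effective + 1)
    return [i for i in range(start, stop) if i != center]

tokens = "the quick brown fox jumps over the lazy dog".split()

# window >= len(tokens) and no shrinking: every other token is context.
# Roughly what Word2Vec(..., window=10000, shrink_windows=False) asks for.
full = context_positions(len(tokens), center=3, window=len(tokens),
                         shrink_windows=False)
print([tokens[i] for i in full])

# Default-style shrinking: the context varies from call to call.
shrunk = context_positions(len(tokens), center=3, window=4, shrink_windows=True)
print([tokens[i] for i in shrunk])
```

With shrinking off and a window at least as long as the text, every position becomes a central word whose context is all the other positions in that text.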

Note that such large & never-shrunk windows will result in relatively-longer runtimes. 

- Gordon


Jul 29, 2024, 2:05:29 PM
to Gensim
Thank you very much. By the way, what optimizer does `gensim.models.Word2Vec` use? Is it SGD?

Gordon Mohr

Aug 3, 2024, 9:33:03 PM
to Gensim
Yes, the training loop of Gensim's Word2Vec (closely following the original word2vec.c released by the Google researchers with the word2vec paper) is essentially SGD.
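As a rough illustration of what "essentially SGD" means here, the following pure-Python sketch applies one gradient step per (center, context) training pair in skip-gram mode with negative sampling. It is a simplified sketch, not Gensim's optimized code: the names `sgd_step` and `pair_loss`, the random initialization, the fixed learning rate, and the hand-picked negative samples are all assumptions made for the example.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pair_loss(w_in, w_out, center, context, negatives):
    """Negative-sampling loss for a single (center, context) pair."""
    v = w_in[center]
    loss = -math.log(sigmoid(dot(v, w_out[context])))
    for n in negatives:
        loss -= math.log(sigmoid(-dot(v, w_out[n])))
    return loss

def sgd_step(w_in, w_out, center, context, negatives, lr):
    """One plain SGD update: no momentum, no per-parameter adaptation."""
    v = w_in[center]
    update_v = [0.0] * len(v)
    # The true context word has label 1, each negative sample has label 0.
    for target, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = w_out[target]
        g = lr * (label - sigmoid(dot(v, u)))  # learning rate times error
        for k in range(len(v)):
            update_v[k] += g * u[k]  # accumulate the input-vector update
            u[k] += g * v[k]         # update the output vector in place
    for k in range(len(v)):
        v[k] += update_v[k]          # apply the accumulated update last

rng = random.Random(0)
dim, vocab_size = 8, 5
w_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]
w_out = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]

center, context, negatives = 0, 1, [2, 3]  # word indices chosen by hand
loss_before = pair_loss(w_in, w_out, center, context, negatives)
for _ in range(100):
    sgd_step(w_in, w_out, center, context, negatives, lr=0.025)
loss_after = pair_loss(w_in, w_out, center, context, negatives)
print(loss_before, loss_after)
```

In Gensim itself the learning rate is not fixed: it decays linearly from `alpha` to `min_alpha` over the course of training.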

- Gordon
