Central word and context

刘士渤

Jul 15, 2024, 7:25:01 PM

to Gensim

Excuse me, if I set `window` large enough to cover the whole document, will Gensim treat every word as a central word, with all the other words in the document as its context?

Gordon Mohr

Jul 16, 2024, 2:16:43 PM

to Gensim

See the documentation for the parameters `window` & `shrink_windows` for ways to arrange for a much-larger (including full-length-of-text) window where all words have equal weight.

In particular, if `window` is larger than your longest text (and the `Word2Vec`/`Doc2Vec`/`FastText` models only support texts of up to 10,000 tokens, with anything beyond that silently ignored), every word can be considered part of every other word's context.

And, if `shrink_windows` is turned off (set to `0` or `False`), then the default technique of using some randomly shrunk window for each context, as an efficient way to weight nearer words more, will be turned off. That means every word within `window` token-positions will be considered an equal part of the context.

Note that such large & never-shrunk windows will result in noticeably longer runtimes, since each target word is then trained against many more context words.
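To make the difference concrete, here is a minimal pure-Python sketch of how the context positions for one central word could be chosen with and without shrinking. It is an illustration only, not Gensim's actual code: the helper name `context_indices` is hypothetical, and drawing the effective window uniformly from 1 to `window` is my assumption about the shrinking behavior.

```python
import random

def context_indices(n_tokens, center, window, shrink_windows=True, rng=random):
    """Return the positions treated as context for the token at `center`.

    Hypothetical helper for illustration; not part of Gensim.
    """
    # Assumption: with shrinking on, the effective window is drawn uniformly
    # from 1..window, so nearer neighbors are included more often than
    # distant ones. With shrinking off, the full window is always used.
    effective = rng.randint(1, window) if shrink_windows else window
    start = max(0, center - effective)
    stop = min(n_tokens, center + effective + 1)
    return [i for i in range(start, stop) if i != center]

# A 5-token text with a window far larger than the text:
full = context_indices(5, center=2, window=10, shrink_windows=False)
# Every other position in the text is context for the central word.
```

In Gensim itself, the matching setup would be along the lines of `Word2Vec(corpus, window=10000, shrink_windows=False)`, assuming a release recent enough to have the `shrink_windows` parameter.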

- Gordon

刘士渤

Jul 29, 2024, 2:05:29 PM

to Gensim

Thank you very much. By the way, what optimizer does `gensim.models.Word2Vec` use? Is it SGD?

Gordon Mohr

Aug 3, 2024, 9:33:03 PM

to Gensim

Yes, the training loop of Gensim's Word2Vec (closely following the original word2vec.c released by the Google researchers with the word2vec paper) is essentially SGD.
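As a rough picture of what "essentially SGD" means here, the following is a minimal pure-Python sketch of one update in the negative-sampling style of word2vec.c: a single input vector is nudged toward its true output vector and away from sampled negatives, with the step size scaled by a learning rate. The names `sgd_step`, `in_vec`, `out_vecs`, and `labels` are hypothetical, and this is not Gensim's optimized code (which also supports other modes, such as hierarchical softmax).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sgd_step(in_vec, out_vecs, labels, alpha):
    """One SGD update of `in_vec` against several output vectors, in place.

    labels[i] is 1 for the true (positive) pair and 0 for a negative sample.
    Hypothetical helper for illustration; not part of Gensim.
    """
    in_grad = [0.0] * len(in_vec)
    for out_vec, label in zip(out_vecs, labels):
        # Prediction error for this pair, scaled by the learning rate.
        g = (label - sigmoid(dot(in_vec, out_vec))) * alpha
        for d in range(len(in_vec)):
            in_grad[d] += g * out_vec[d]   # accumulate using the old output value
            out_vec[d] += g * in_vec[d]    # update the output vector immediately
    # Apply the accumulated correction to the input vector once, at the end.
    for d in range(len(in_vec)):
        in_vec[d] += in_grad[d]

word = [0.1, -0.2]
positive = [0.3, 0.1]     # true context pair
negative = [-0.1, 0.4]    # randomly sampled "noise" pair
sgd_step(word, [positive, negative], labels=[1, 0], alpha=0.1)
```

After the step, the dot product with the positive vector has gone up and the one with the negative vector has gone down. There is no momentum or per-parameter adaptive state: each training example immediately adjusts the vectors it touches.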

- Gordon
