Exact meaning of the parameter "window"


이준호

Mar 6, 2022, 5:06:02 PM
to Gensim
Hi.
I am a junior NLP researcher, and I want to compare *2vec models.
I want to fix the same hyperparameters across the *2vec models, but I can't find a clear statement of what the "window" size means.

The documentation says:
  • window (int, optional) – Maximum distance between the current and predicted word within a sentence.

As far as I understand, the "window" value corresponds to k in the equation window size = 2k + 1.
This interpretation seems plausible, since the value of "window" does not have to be odd. In addition, a hint can be found in doc2vec's code:

if dm and dm_concat:
    self.layer1_size = (dm_tag_count + (2 * window)) * vector_size
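For concreteness, here is that formula evaluated with example numbers I chose myself (dm_tag_count=1, window=5, vector_size=100 are illustrative assumptions):

```python
# Worked example of the layer1_size formula above, with example values:
# the concatenated input layer holds the tag vector(s) plus 2*window
# context-word vectors, each of size vector_size.
dm_tag_count, window, vector_size = 1, 5, 100
layer1_size = (dm_tag_count + (2 * window)) * vector_size
print(layer1_size)  # (1 + 10) * 100 = 1100
```

So in dm_concat mode, 2 * window context vectors are concatenated, which is consistent with "window" being the one-side count k.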
But I can't find a corresponding part in the code of word2vec and fasttext. So these are my questions:

1) In these two models, does the "window" hyperparameter have the same meaning as in doc2vec?
2) If so, where in the code can the evidence be found?

Please lend your wisdom to a junior researcher. :)

Gordon Mohr

Mar 7, 2022, 6:10:16 PM
to Gensim
`window` is always the one-side count of neighboring words that could become part of a (context)->(center-word) prediction. So a default `window=5` involves up to 10 words, 5 on each side, in predictions of a center target word.

In skip-gram, each word is individually used as input to an attempted prediction: 10 separate 1-word-to-1-word predictions. In CBOW all words within the effective-window are combined and used as input: 10 words combined to make one 10-word-to-1-word prediction.
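The difference in how those 10 context words are consumed can be sketched in a few lines of Python (illustrative only; the helper and token names are my own, not Gensim code), assuming the full window is used with no random shrinking:

```python
# Illustrative sketch (not Gensim's actual code) of how the same window=5
# neighborhood is consumed differently by skip-gram and CBOW.

def context_indices(pos, window, length):
    """Indices of words within `window` positions of `pos`, excluding `pos`."""
    return [i for i in range(max(0, pos - window), min(length, pos + window + 1))
            if i != pos]

sentence = [f"w{i}" for i in range(11)]  # 11 tokens; center word at index 5
ctx = context_indices(5, 5, len(sentence))

# Skip-gram: one separate 1-word -> 1-word prediction per neighbor.
sg_pairs = [(sentence[i], sentence[5]) for i in ctx]

# CBOW: all neighbors combined into a single 10-word -> 1-word prediction.
cbow_input = [sentence[i] for i in ctx]

print(len(sg_pairs))    # 10 separate predictions
print(len(cbow_input))  # 10 words feeding one combined prediction
```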

Note that in both cases, the default behavior during training is to pick a random effective window – named `reduced_windows` in the source code – from 1 up to the configured value, each time a center word is considered. So even with the default `window=5`, 20% of the time the actual window used is just 1 neighbor on either side. This manages to effect a higher weighting of nearer neighbors by doing fewer total calculations (rather than by applying some sort of more expensive positional scaling).
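A quick simulation of that sampling (my own sketch of the described behavior, not Gensim's implementation) shows both effects:

```python
# Sketch of the reduced-window sampling described above: for each center
# word, the effective one-side window is drawn uniformly from 1..window.
import random

random.seed(0)
window = 5
draws = [random.randint(1, window) for _ in range(100_000)]

# About 1/window = 20% of the time the effective window is just 1.
frac_one = sum(d == 1 for d in draws) / len(draws)
print(round(frac_one, 2))  # roughly 0.2

# Mean effective window is (window + 1) / 2 = 3, so nearer neighbors are
# included far more often than distant ones -- cheap positional weighting.
avg = sum(draws) / len(draws)
print(round(avg, 2))  # roughly 3.0
```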

If you haven't found the relevant word2vec & fasttext code, you may not be looking in the relevant Cython (`.pyx`) extensions. For example, there is a key `window`-sensitive loop inside the skip-gram code of `word2vec_inner.pyx` that applies the effective window.
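As a rough paraphrase in plain Python (an assumption about the loop's shape, not the actual Cython source), the effective-window clamping works like this:

```python
# Rough paraphrase of the window-clamping logic: the scan runs from
# `window` positions before the center to `window` positions after,
# shrunk by the per-word reduced-window value and clamped to the
# sentence bounds.

def window_span(i, window, reduced, sent_len):
    """Half-open [start, end) span of context positions around index i."""
    start = max(0, i - window + reduced)
    end = min(sent_len, i + window + 1 - reduced)
    return start, end

start, end = window_span(5, 5, 2, 11)  # reduced=2 -> effective window of 3
neighbors = [p for p in range(start, end) if p != 5]
print(neighbors)  # [2, 3, 4, 6, 7, 8]
```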


The `window` (& calculated `reduced_windows`) influence on other modes/models can be found by looking for similar references to `window` & `reduced_windows` in `word2vec_inner.pyx`, `fasttext_inner.pyx`, & `doc2vec_inner.pyx`.

- Gordon

이준호

Mar 8, 2022, 5:38:16 PM
to Gensim
Thanks for your very, very kind answer.
Thanks to your answer, I now fully understand. I hadn't even thought of looking at the Cython code.

Hope you have a nice day, and also hope you don't get COVID-19
(because I currently have COVID-19, even after getting the vaccine three times. XD It feels very bad.).


On Tuesday, March 8, 2022 at 8:10:16 AM UTC+9, Gordon Mohr wrote: