Pre-trained word-vectors, such as those in the `model.wv` property of a Word2Vec or Doc2Vec model, or in an export file like the `GoogleNews` vectors, are usually equivalent to what's in the `syn0` array. It wouldn't make sense to load them into `syn1` (for HS mode) or `syn1neg` (for negative-sampling mode).
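For example, a file of exported vectors loads as plain per-word 'input' vectors, with no hidden-layer weights included at all. A minimal sketch, assuming a local copy of the GoogleNews file and a gensim recent enough to expose the array as `.vectors` (older versions call it `.syn0`):

```python
from gensim.models import KeyedVectors

# The export file contains only the per-word 'input' vectors (the
# syn0-equivalent); no syn1/syn1neg hidden->output weights are in it.
goog = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin', binary=True)

print(goog.vectors.shape)  # (vocab_size, 300): one input-vector row per word
```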
Whether and when it might make any sense to load pre-trained word-vectors before Doc2Vec training at all remains, to me, a murky question. Doc2Vec does not require word-vectors as an input or a 1st stage; the modes that use word-vectors at all train them simultaneously with the doc-vectors, from the same training corpus.
In pure-DBOW mode (`dm=0, dbow_words=0`), loading word-vectors into `syn0` can't have any effect either way – the `syn0` values aren't used at all.
If you added skip-gram word-training to DBOW (`dm=0, dbow_words=1`), having prior word-vectors in `syn0`, either locked against changes or free to change with training, would have an indirect effect on the doc-vectors: the interleaved skip-gram training shapes the hidden->output weights, and the doc-vectors' own adjustments are then calculated against those same weights.
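If you wanted to experiment with that, some gensim versions offer an experimental `intersect_word2vec_format()` method (inherited from Word2Vec) that overwrites `syn0` rows for words already in the model's vocabulary; its `lockf` parameter chooses between frozen imported vectors (0.0) and freely-adjusting ones (1.0). A rough sketch, assuming that method is available on your Doc2Vec version and that `documents` is your own `TaggedDocument` iterable:

```python
from gensim.models.doc2vec import Doc2Vec

model = Doc2Vec(dm=0, dbow_words=1, size=300, negative=5, min_count=2)
model.build_vocab(documents)  # 'documents': your TaggedDocument corpus

# Overwrite syn0 rows for words shared with the pre-trained file; words not
# in the file keep their random initialization. lockf=0.0 freezes the
# imported rows against further training; lockf=1.0 lets them keep adjusting.
model.intersect_word2vec_format('GoogleNews-vectors-negative300.bin',
                                binary=True, lockf=1.0)

model.train(documents)  # newer gensim also wants total_examples & epochs here
```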
In DM mode, any pre-loaded word-vectors would be averaged with doc-vectors for every training-example, and so have the most-direct influence.
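To illustrate why, here's a toy sketch of the PV-DM (averaging) forward step, not gensim's actual code: the hidden-layer input for each prediction is the mean of the doc-vector and the context word-vectors, so whatever is sitting in `syn0` enters every forward pass directly.

```python
import numpy as np

vector_size = 100
doc_vector = np.random.rand(vector_size)          # one row of the doc-vector array
context_vectors = np.random.rand(4, vector_size)  # syn0 rows for the context words

# PV-DM (mean) combines them into a single hidden-layer input ...
hidden_input = np.vstack([doc_vector, context_vectors]).mean(axis=0)

# ... which is scored against syn1/syn1neg rows to predict the target word,
# with the error gradient flowing back into both doc-vector and word-vectors.
```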
In cases where such word-vectors might have influence, my hunch is that they might speed or improve results marginally, if the new training data is thin and the prior word-vectors come from a compatible domain. But for larger datasets, or datasets from a different language domain than the pre-trained vectors, the impact could be negligible or even negative.
Separately, it might make sense to retain the `syn1` (HS mode) or `syn1neg` (negative-sampling mode) weights from a prior Word2Vec or Doc2Vec training session, to see if they speed or improve a later session. But such weights are not typically saved as 'word-vectors'. (They are interpretable as one-vector-per-word in negative-sampling mode, but not cleanly in HS mode.) And they'd only be meaningful in a follow-up session with careful attention to ensuring vocabulary-correspondence: in HS mode, perhaps by retaining an identical vocabulary & encoding-tree; in negative-sampling mode, you could possibly synthesize a new `syn1neg` from a mix of imported vectors, for shared words, and fresh vectors, for novel words. Whether you'd want this layer frozen (as during Doc2Vec inference, and possible by calling `train_document_MODE(..., learn_hidden=False, ...)`) or updated with new training examples would be an open question.
There's no existing facility for saving, loading, or correlating vocabularies with respect to the `syn1` or `syn1neg` weights; you'd have to code that up yourself, and ensure the model remains in a self-consistent, usable state.
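As a very rough sketch of the vocabulary-correlation part, assuming negative-sampling models and the attribute names of this era (`model.wv.vocab`, `model.syn1neg`; later gensim versions relocate these, e.g. `model.trainables.syn1neg` in 3.x):

```python
def copy_shared_syn1neg(old_model, new_model):
    """Copy hidden->output (syn1neg) rows from old_model into new_model for
    words present in both vocabularies; novel words keep the fresh
    initialization they received from new_model.build_vocab()."""
    copied = 0
    for word, new_entry in new_model.wv.vocab.items():
        old_entry = old_model.wv.vocab.get(word)
        if old_entry is not None:
            new_model.syn1neg[new_entry.index] = old_model.syn1neg[old_entry.index]
            copied += 1
    return copied
```

Both models would need the same vector size, and as noted, you'd still have to verify the rest of the model stays self-consistent and usable afterward.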
My hunches for that kind of re-use would be similar to those for re-use of traditional 'input' (syn0) vectors: possibly helpful for small datasets in well-matched domains, possibly wasteful or harmful if the dataset is larger or from a contrasting domain.
- Gordon