The reasons that these algorithms don't give the exact same results in subsequent training or inference runs on the same data are described in the project FAQ answers:
"Q11: I've trained my Word2Vec / Doc2Vec / etc model repeatedly using the exact same text corpus, but the vectors are different each time. Is there a bug or have I made a mistake? (*2vec training non-determinism)"
Generally, if your data & parameters are sufficient, each training run will result in a model that's about as capable on downstream tasks as any other run's model – even though the coordinate spaces differ, so individual words/texts wind up at different positions.
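For example, here's a minimal sketch (assuming Gensim 4.x's Word2Vec API, with a made-up toy corpus) of how two back-to-back trainings on the same data yield different raw coordinates, while the learned relationships remain roughly comparable:

    from gensim.models import Word2Vec
    import numpy as np

    # Tiny illustrative corpus, repeated so training has something to work with
    corpus = [
        ["human", "interface", "computer"],
        ["survey", "user", "computer", "system", "response", "time"],
        ["eps", "user", "interface", "system"],
        ["system", "human", "system", "eps"],
        ["user", "response", "time"],
    ] * 50

    def train():
        return Word2Vec(corpus, vector_size=50, min_count=1, epochs=20, workers=3)

    m1, m2 = train(), train()

    # The raw coordinates differ between runs...
    print(np.allclose(m1.wv["user"], m2.wv["user"]))  # almost certainly False

    # ...but the neighbor relationships should be broadly similar
    print(m1.wv.most_similar("user", topn=3))
    print(m2.wv.most_similar("user", topn=3))

(With a corpus this tiny the neighbors will still be noisy; on a realistically-sized corpus the neighbor lists from the two runs should largely agree.)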
With regard to inference:
"Q12: I've used Doc2Vec infer_vector() on a single text, but the resulting vector is different each time. Is there a bug or have I made a mistake? (doc2vec inference non-determinism)"
Within a single model, each inference of a sufficient text (with adequate parameters & epochs) should result in *similar* vectors, not identical vectors – and so substantive evaluations of the usefulness of these inferred vectors should be stable, even as the exact coordinates jitter a bit.
If the coordinates from repeated inferences of the same text are very different from each other, that can be a useful hint that something else is wrong. An extremely overfit or undertrained model – too little training data or very inappropriate parameters – might not give stable re-inferences. Too few epoch passes during inference can also cause instability that more epochs would improve. And a text with very few (or no) words known to the model may get fairly arbitrary/meaningless vectors.
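As a quick sanity check, something like this minimal sketch works (assuming an already-trained Doc2Vec model in a variable `model` and a tokenized text in `tokens` – both hypothetical names here): re-infer the same text several times and compare the results pairwise.

    import numpy as np

    def cosine(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # Re-infer the same tokens several times with the same settings
    vectors = [model.infer_vector(tokens, epochs=50) for _ in range(5)]

    # Pairwise self-similarity: unusually low or wildly varying values
    # (there's no hard threshold) can hint at an undertrained model, too
    # few inference epochs, or a text with few in-vocabulary words
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            print(i, j, round(float(cosine(vectors[i], vectors[j])), 3))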
So: evaluate the vectors by whether they prove useful in downstream tests, and whether they're self-similar (and about equally useful) across runs, rather than checking for identical results.
- Gordon