This is expected behavior, per Q11 in the FAQ. Every model run creates a new "space". Vectors from one run/space are not meaningfully comparable to vectors from other runs/spaces, even vectors for the exact same word.
Think of it this way:
There are no inherently correct or best coordinates for a single word, like say 'apple', in a generic 300-dimensional space. There's only a *useful* place, with regard to the distances/angles to other related words, as can be learned from a corpus showing examples of all relevant words' contextual usages. And because of inherent randomness in how the algorithm runs, including its sensitivity to tiny changes in training order or slight vocabulary/text changes, even attempts at stability, like always initializing a word to the same random starting vector before training, don't reliably force the whole run to land it in a similar place. Rather, the word 'apple', and all words related to it, will land in some 300-dimensional constellation that, from run to run, has similarly useful neighborhoods/directions - but potentially at very different coordinates in the giant high-dimensional volume.
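One way to build intuition for "same neighborhoods, different coordinates" is to note that any rotation of the whole space preserves all within-space distances and angles while changing every coordinate. The toy sketch below (with made-up 2-D "word vectors", purely for illustration) shows that within-run cosine similarities survive an arbitrary rotation, but comparing the "same" word's vector across the two spaces gives an essentially arbitrary value:

```python
import math

def cosine(u, v):
    # cosine similarity between two equal-length vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rotate(v, theta):
    # rotate a 2-D vector by theta radians
    x, y = v
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

# hypothetical "run 1" vectors (toy values, not from a real model)
run1 = {'apple': (1.0, 0.2), 'fruit': (0.9, 0.4), 'car': (-0.3, 1.0)}

# "run 2": the same relative geometry, at arbitrarily rotated coordinates
theta = 1.234
run2 = {w: rotate(v, theta) for w, v in run1.items()}

# within each run, similarities are identical...
sim1 = cosine(run1['apple'], run1['fruit'])
sim2 = cosine(run2['apple'], run2['fruit'])
print(sim1, sim2)  # same value in both spaces

# ...but comparing 'apple' across runs just measures the arbitrary rotation
cross = cosine(run1['apple'], run2['apple'])
print(cross)
```

Real runs differ by more than a clean rotation (stochastic updates distort the constellation too), but the core point holds: coordinates only mean something relative to the other coordinates produced by the same run.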
If you need word-vectors to be reliably comparable, they should come from the same training run, so that they went through a common iterative tug-of-war with all other words. This might mean that instead of churning your training texts by some fraction each run, you do a larger composite training run over all texts from the last N epochs, covering all words that need to have compatible vectors.
- Gordon