For word vectors, a common evaluation (following the original word2vec paper) is to measure how well the resulting vectors solve analogy problems. The `accuracy()` method of the gensim Word2Vec class will check a model against a list of questions in the same format as the original researchers used (and you can grab their questions files from the original word2vec.c distribution). But note: top scores on those questions (or analogies in general) might not correlate with the best word-vectors for other purposes – so it's best to devise your own project/goal-specific evaluation methods.
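To illustrate the analogy-solving idea itself (not the gensim API), here's a minimal sketch with a tiny hypothetical vocabulary of hand-made vectors – the "A is to B as C is to ?" question is answered by finding the word closest (by cosine similarity) to (B - A + C):

```python
import math

# Hypothetical toy vectors; a real evaluation would use vectors from a trained model.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.5, 0.9, 0.2],
    "woman": [0.5, 0.2, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def solve_analogy(a, b, c, vocab):
    """Return the word whose vector is closest to (b - a + c), excluding the inputs."""
    target = [vb - va + vc for va, vb, vc in zip(vocab[a], vocab[b], vocab[c])]
    best, best_sim = None, -2.0
    for word, vec in vocab.items():
        if word in (a, b, c):
            continue
        sim = cosine(target, vec)
        if sim > best_sim:
            best, best_sim = word, sim
    return best

# "man is to king as woman is to ?"
print(solve_analogy("man", "king", "woman", vectors))  # -> queen
```

An evaluation like gensim's just runs thousands of such questions and reports the fraction answered correctly.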
Sometimes, for word- or doc-similarity tasks, this means somehow creating your own sets of three items A-B-C, where your assumption/goal is that the similarity between A and B should be larger than the similarity between A and C. For example, A & B might be known to be related via some prior method, and C is randomly chosen (and thus overwhelmingly likely to be less-related).
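Scoring such a triple set is straightforward – here's a minimal sketch (the triples are hypothetical hand-made 2-d vectors, just to show the shape of the check):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def triple_accuracy(triples):
    """Fraction of (vec_a, vec_b, vec_c) triples where sim(A, B) > sim(A, C)."""
    wins = 0
    for va, vb, vc in triples:
        if cosine(va, vb) > cosine(va, vc):
            wins += 1
    return wins / len(triples)

# Hypothetical triples: A should be closer to B than to the random distractor C.
triples = [
    ([1.0, 0.9], [0.9, 1.0], [-0.8, 0.2]),
    ([0.7, 0.7], [0.8, 0.6], [0.1, -0.9]),
]
print(triple_accuracy(triples))  # -> 1.0
```

A higher fraction means the vector space better respects your prior notion of relatedness.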
In the original 'Paragraph Vectors' paper (https://arxiv.org/abs/1405.4053), section 3.3, they bootstrap such an evaluation set from an existing search engine, and want their vectors on search-result-snippets to be closer-to-each-other for search-results that co-appear from the existing system (versus to other random documents). In the "Document Embedding with Paragraph Vectors" paper (http://arxiv.org/abs/1507.07998), the existing 'category' system of Wikipedia or 'subject' labeling of Arxiv articles are used to hint which pairs-of-documents should be 'closer' (versus other randomly-selected documents).
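Bootstrapping triples from existing labels, as in those papers, can be sketched like this – the doc ids and category labels here are hypothetical, standing in for (say) Wikipedia categories:

```python
import random

# Hypothetical doc ids with category labels (stand-ins for Wikipedia categories
# or Arxiv subjects).
doc_categories = {
    "doc_physics_1": "physics",
    "doc_physics_2": "physics",
    "doc_cooking_1": "cooking",
    "doc_cooking_2": "cooking",
    "doc_music_1":   "music",
}

def make_triples(doc_categories, rng):
    """Yield (A, B, C) ids: A and B share a category; C is a random doc from
    another category (and thus presumed less-related)."""
    docs = list(doc_categories)
    triples = []
    for a in docs:
        same = [d for d in docs if d != a and doc_categories[d] == doc_categories[a]]
        other = [d for d in docs if doc_categories[d] != doc_categories[a]]
        if same and other:
            triples.append((a, rng.choice(same), rng.choice(other)))
    return triples

for a, b, c in make_triples(doc_categories, random.Random(0)):
    print(a, b, c)
```

You'd then look up each id's learned doc-vector and check that sim(A, B) > sim(A, C) across the whole set.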
- Gordon