I'm currently using gensim to reproduce the results of the word2vec example that Google provides. The problem is that gensim's accuracy test doesn't match Google's result. For example, Google's accuracy on capital-common-countries is 82.02%, while the best result I get from gensim across different parameter sets is 64.4%. That is a big gap. Google's demo was run without changing any parameters.
Could anyone help with this? (I've also posted this question on Stack Overflow.)
Here is the code snippet I use to train word2vec and compute the accuracy with gensim:
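from gensim.models.word2vec import Word2Vec, Text8Corpus

# Forward slashes avoid the "\t" in ".\text8" being read as a tab character.
sentences = Text8Corpus("./text8")
model = Word2Vec(sentences, size=200, sg=0, window=8, alpha=0.05, min_count=5, workers=12, iter=15, cbow_mean=1, hs=0, negative=25)
model.accuracy("./questions-words.txt")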
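When comparing per-section numbers such as capital-common-countries, it can help to check that both tools count the same questions. The following is a minimal sketch of a per-section tally over lines in the questions-words.txt format. It is not gensim's or Google's implementation: `section_accuracy`, `predict`, and `vocab` are hypothetical names, and the lowercasing and the skipping of questions with out-of-vocabulary words are assumptions that may differ from what either tool does.

```python
from collections import OrderedDict


def section_accuracy(lines, predict, vocab):
    """Tally analogy accuracy per section for questions-words.txt-style lines.

    predict(a, b, c) is a hypothetical callable returning the guessed fourth
    word; vocab is the set of words the model knows.
    """
    results = OrderedDict()
    section = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(":"):
            # Section headers look like ": capital-common-countries".
            section = line[1:].strip()
            results[section] = [0, 0]  # [correct, attempted]
            continue
        # Assumption: lowercase everything, since text8 is lowercase.
        words = line.lower().split()
        if section is None or len(words) != 4:
            continue
        a, b, c, expected = words
        # Assumption: questions with out-of-vocabulary words are skipped,
        # so they count neither as correct nor as attempted.
        if any(w not in vocab for w in words):
            continue
        results[section][1] += 1
        if predict(a, b, c) == expected:
            results[section][0] += 1
    return {
        name: (100.0 * ok / n if n else 0.0, ok, n)
        for name, (ok, n) in results.items()
    }


# Toy data, made up purely to exercise the function.
sample = [
    ": capital-common-countries",
    "Athens Greece Berlin Germany",
    "Athens Greece Paris France",
    "Athens Greece Oslo Norway",
]
vocab = {"athens", "greece", "berlin", "germany", "paris", "france"}
toy_answers = {
    ("athens", "greece", "berlin"): "germany",
    ("athens", "greece", "paris"): "spain",
}
report = section_accuracy(sample, lambda a, b, c: toy_answers.get((a, b, c)), vocab)
print(report)
```

Each entry in the report is (accuracy in percent, correct, attempted). Comparing the attempted counts from the two tools shows whether a gap comes from the vectors themselves or from a different set of questions being evaluated.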