This is about the GitHub tutorial notebook gensim/docs/notebooks/doc2vec-lee.ipynb.
I have copied the code verbatim, but I have been unable to reproduce anything near the 96% accuracy rate.
I am using gensim 0.13.4 in a Jupyter 4.3.1 notebook, with Anaconda Navigator.
In the tutorial, the model assessment step is:
In [12]: collections.Counter(ranks)
Out[12]: Counter({0: 292, 1: 8})   <-- what the tutorial got
I am getting Counter({0: 31,
1: 24,
2: 16,
3: 19,
4: 16,
5: 8,
6: 8,
7: 10,
8: 7,
9: 10,
10: 12,
11: 12,
12: 5,
13: 9,
...
What could the problem be? I am just copying and pasting the code.
Thanks
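
To put the two Counters on the same scale, here is a minimal sketch that turns a rank Counter into a "most similar document is itself" rate. The helper name top1_rate is hypothetical and not part of gensim or the notebook. The Counter from the question is truncated, so its total is assumed to be 300 documents, the same as the tutorial's; that is an assumption, not something the listing confirms.

```python
import collections


def top1_rate(rank_counts, total=None):
    """Fraction of documents whose own training vector came back at rank 0."""
    if total is None:
        # Use the Counter's own total when the full listing is available.
        total = sum(rank_counts.values())
    if total <= 0:
        raise ValueError("total number of documents must be positive")
    return rank_counts.get(0, 0) / total


# Counter reported by the tutorial (Out[12]).
tutorial_counts = collections.Counter({0: 292, 1: 8})

# Only the rank-0 entry from the question; the rest of that listing is
# truncated, so total=300 below is an ASSUMPTION borrowed from the tutorial.
question_counts = collections.Counter({0: 31})

print(f"tutorial: {top1_rate(tutorial_counts):.1%}")
print(f"question: {top1_rate(question_counts, total=300):.1%}")
```

With these inputs the tutorial's figure is 292/300, about 97.3%, while 31 hits out of an assumed 300 is about 10.3%.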
Wall time: 12.5 s
Counter({0: 292, 1: 8})
Wall time: 12 s
Counter({0: 291, 1: 9})
Wall time: 16.4 s
Counter({0: 290, 1: 10})
Wall time: 20.6 s
Counter({0: 295, 1: 5})
Wall time: 21.3 s
Counter({0: 292, 1: 8})
Wall time: 20.6 s
Counter({0: 292, 1: 8})
Wall time: 16.7 s
Counter({0: 296, 1: 4})
Wall time: 15.4 s
Counter({0: 292, 1: 8})
Wall time: 15.3 s
Counter({0: 295, 1: 5})
Wall time: 14.8 s
Counter({0: 292, 1: 8})
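
To compare the ten runs above at a glance, here is a small sketch that summarizes them. The wall times and rank-0 counts are copied from the output above; each Counter there sums to 300 documents. The variable names are just for illustration.

```python
import statistics

# Each Counter in the runs above sums to 300 documents.
DOCS_PER_RUN = 300

# (wall time in seconds, number of documents at rank 0), copied from the output.
runs = [
    (12.5, 292), (12.0, 291), (16.4, 290), (20.6, 295), (21.3, 292),
    (20.6, 292), (16.7, 296), (15.4, 292), (15.3, 295), (14.8, 292),
]

wall_times = [seconds for seconds, _ in runs]
rank0_counts = [hits for _, hits in runs]

mean_hits = statistics.mean(rank0_counts)
mean_time = statistics.mean(wall_times)

print(f"rank-0 hits: min={min(rank0_counts)} max={max(rank0_counts)} mean={mean_hits:.1f}")
print(f"rank-0 rate: {mean_hits / DOCS_PER_RUN:.1%} on average")
print(f"wall time:   {mean_time:.1f} s on average")
```

The spread between runs (290 to 296 hits) gives a rough idea of how much run-to-run variation to expect before reading anything into a single result.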
1. Do you have suggestions for additional parameters, and for how to tune them, that could yield better performance?
2. Given the current state of doc2vec, is a perfect score too much to hope for, i.e. that the most similar document is the document itself 100 percent of the time (with this Lee dataset)?
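
For context on what a "perfect score" means in question 2, here is a toy sketch of the self-similarity check, written in plain Python so it runs without gensim. It is not the notebook's code: the function names and the tiny 2-d vectors are made up for illustration, and cosine similarity is assumed as the ranking measure. A perfect score would mean every entry in the resulting list is 0.

```python
import collections
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def self_ranks(trained, inferred):
    """For each document, rank of its own trained vector among all trained
    vectors when sorted by similarity to its re-inferred vector."""
    ranks = []
    for doc_id, query in enumerate(inferred):
        order = sorted(
            range(len(trained)),
            key=lambda j: cosine(query, trained[j]),
            reverse=True,
        )
        ranks.append(order.index(doc_id))
    return ranks


# Made-up vectors: "trained" stands in for the model's stored document
# vectors, "inferred" for vectors re-inferred from the same documents.
trained = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
inferred = [[0.9, 0.1], [0.1, 0.9], [0.2, 1.0]]

ranks = self_ranks(trained, inferred)
print(collections.Counter(ranks))
```

In this toy data the third document's inferred vector lands closer to the second document's trained vector, so it gets rank 1 instead of 0, which is the kind of miss counted under key 1 in the Counters above.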