For this GitHub tutorial: gensim/docs/notebooks/doc2vec-lee.ipynb
I have copied the code verbatim, and I have been unable to reproduce anything near the 96% accuracy rate.
I am using gensim 0.13.4 in a Jupyter 4.3.1 notebook, with Anaconda Navigator.
In the tutorial, the model assessment step shows:
In [12]: collections.Counter(ranks)
Out[12]: Counter({0: 292, 1: 8})   <-- what the tutorial got
I am getting:
Counter({0: 31, 1: 24, 2: 16, 3: 19, 4: 16, 5: 8, 6: 8, 7: 10, 8: 7, 9: 10, 10: 12, 11: 12, 12: 5, 13: 9, ...
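For reference, here is a minimal sketch of how I read such a Counter as an accuracy figure: the count at rank 0 (the document's most similar match is itself) divided by the total number of documents. The helper name `top1_accuracy` is my own and is not part of gensim or the tutorial.

```python
import collections


def top1_accuracy(rank_counts):
    """Fraction of documents whose most similar document is itself (rank 0).

    `rank_counts` maps a rank to how many documents landed on that rank,
    e.g. the result of collections.Counter(ranks) in the notebook.
    """
    total = sum(rank_counts.values())
    if total == 0:
        raise ValueError("rank_counts is empty")
    return rank_counts.get(0, 0) / total


# The Counter reported by the tutorial.
tutorial_counts = collections.Counter({0: 292, 1: 8})
print(f"{top1_accuracy(tutorial_counts):.1%}")  # prints 97.3%
```

My own Counter is cut off above, so I have not run it through this helper here.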
What could possibly be the problem? I am just copying and pasting.
Thanks
Wall time: 12.5 s   Counter({0: 292, 1: 8})
Wall time: 12 s     Counter({0: 291, 1: 9})
Wall time: 16.4 s   Counter({0: 290, 1: 10})
Wall time: 20.6 s   Counter({0: 295, 1: 5})
Wall time: 21.3 s   Counter({0: 292, 1: 8})
Wall time: 20.6 s   Counter({0: 292, 1: 8})
Wall time: 16.7 s   Counter({0: 296, 1: 4})
Wall time: 15.4 s   Counter({0: 292, 1: 8})
Wall time: 15.3 s   Counter({0: 295, 1: 5})
Wall time: 14.8 s   Counter({0: 292, 1: 8})
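To put the run-to-run variation in the ten runs above into numbers, here is a small sketch that summarizes the rank-0 counts. The counts are copied from the output above; the variable names are my own, and the total of 300 documents is taken from each Counter summing to 300.

```python
import statistics

# Each run's Counter sums to 300 documents (e.g. 292 + 8).
TOTAL_DOCS = 300

# Rank-0 counts from the ten runs above, in order.
rank0_counts = [292, 291, 290, 295, 292, 292, 296, 292, 295, 292]

# Share of documents whose most similar document is itself, per run.
accuracies = [count / TOTAL_DOCS for count in rank0_counts]

summary = {
    "min": min(accuracies),
    "max": max(accuracies),
    "mean": statistics.mean(accuracies),
}

for name, value in summary.items():
    print(f"{name}: {value:.1%}")
```

So the spread between runs is a handful of documents out of 300, not the kind of gap described in the original question.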
1. Do you have suggestions for additional parameters, and for how to tune them, that could yield better performance?
2. Given the current state of doc2vec, is a perfect score too much to hope for, i.e. that the most similar document is itself 100 percent of the time (with this Lee dataset)?