Hi Gordon,
I trained Doc2Vec in PV-DBOW mode on my Wikipedia corpus (14 GB) with two configurations:
1) Doc2Vec(dm=0, dbow_words=1, size=200, window=8, min_count=20, iter=5, workers=cores)
2) Doc2Vec(dm=0, dbow_words=1, size=200, window=5, min_count=12, iter=8, workers=cores)
Both models trained successfully and gave relevant results for many phrases, but the results were less satisfactory when I tried "artificial intelligence".
With the first model, I got the following suggestions:
1) [('Existential risk from artificial general intelligence', 0.7284922003746033),
('Ethics of artificial intelligence', 0.7267584800720215),
("Turing's Wager", 0.7224212884902954),
('Oracle (AI)', 0.7094788551330566),
('AI aftermath scenarios', 0.703824520111084),
('AI control problem', 0.6999846696853638),
('Superintelligence: Paths, Dangers, Strategies', 0.691785454750061),
('Murray Shanahan', 0.6860222220420837),
('Artificial empathy', 0.6842677593231201),
('Explainable Artificial Intelligence', 0.682081937789917),
('Iyad Rahwan', 0.681956946849823),
('Moral Machine', 0.6816681027412415),
('Timeline of artificial intelligence', 0.676627516746521),
('Susan Schneider (philosopher)', 0.6764435768127441),
('From Bacteria to Bach and Back', 0.6752616167068481),
('AI-complete', 0.6739200353622437),
('David A. McAllester', 0.673627495765686),
('Knowledge acquisition', 0.6730433702468872),
('OpenAI', 0.6718262434005737),
('Open Letter on Artificial Intelligence', 0.6698791980743408)]
With the second model, I got:
2) [('Existential risk from artificial general intelligence', 0.7561817765235901),
('History of artificial intelligence', 0.734763503074646),
('Ethics of artificial intelligence', 0.7274946570396423),
('Oracle (AI)', 0.7165532112121582),
("Turing's Wager", 0.7119142413139343),
('Artificial general intelligence', 0.7059307098388672),
('Deep learning', 0.7024167776107788),
('AI takeover', 0.701856791973114),
('AI aftermath scenarios', 0.6950700879096985),
('Cognitive science', 0.6925462484359741),
('Symbolic artificial intelligence', 0.6894776821136475),
('AI-complete', 0.6873871088027954),
("Hubert Dreyfus's views on artificial intelligence", 0.6849253177642822),
('Moral Machine', 0.6835113167762756),
('Artificial neural network', 0.6826612949371338),
('Mind uploading', 0.6812909841537476),
('Cognitive bias mitigation', 0.6788017749786377),
('Explainable Artificial Intelligence', 0.6765998601913452),
('Bayesian cognitive science', 0.6736477017402649),
('Intelligence explosion', 0.671064019203186)]
Model two seems to give better results, but I want to eliminate suggestions like 'Existential risk from artificial general intelligence' and 'History of artificial intelligence'.
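Until the model itself does better, one stopgap is to post-filter the results. The sketch below is only an assumption about what counts as unwanted: it drops any suggestion whose title contains every word of the query. `drop_titles_containing_query` is a hypothetical helper, not part of gensim, and the sample list is just the top of the second model's output.

```python
import re


def drop_titles_containing_query(results, query):
    """Keep (title, score) pairs whose title does NOT contain every query word.

    Hypothetical helper for illustration; not a gensim function.
    """
    query_words = set(re.findall(r"\w+", query.lower()))
    kept = []
    for title, score in results:
        title_words = set(re.findall(r"\w+", title.lower()))
        # Drop the suggestion only if all query words appear in the title.
        if not query_words <= title_words:
            kept.append((title, score))
    return kept


# First seven suggestions from the second model, copied from above.
second_model_top = [
    ('Existential risk from artificial general intelligence', 0.7561817765235901),
    ('History of artificial intelligence', 0.734763503074646),
    ('Ethics of artificial intelligence', 0.7274946570396423),
    ('Oracle (AI)', 0.7165532112121582),
    ("Turing's Wager", 0.7119142413139343),
    ('Artificial general intelligence', 0.7059307098388672),
    ('Deep learning', 0.7024167776107788),
]

filtered = drop_titles_containing_query(second_model_top, "artificial intelligence")
print([title for title, _ in filtered])
# → ['Oracle (AI)', "Turing's Wager", 'Deep learning']
```

This also removes titles such as 'Ethics of artificial intelligence', so it may be too aggressive, and it does not change what the model has learned.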
Is there a way I can tune the parameters to get better results? Also, should I try PV-DM with averaging so that I can get better phrases, and if so, what window size and min_count should I use?
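For reference, a PV-DM with averaging configuration could be sketched as follows. It assumes gensim's `dm=1` and `dm_mean=1` switches. The `window` and `min_count` values are simply copied from the second model as placeholders, not values known to work well. The names `size` and `iter` match the calls above; gensim 4.x renamed them to `vector_size` and `epochs`.

```python
import multiprocessing

cores = multiprocessing.cpu_count()

# Keyword arguments for a PV-DM model that averages the context vectors.
# dm=1 selects PV-DM; dm_mean=1 uses the mean of the context vectors
# instead of their sum.
# window and min_count are placeholders copied from model 2, not tuned values.
pv_dm_mean_params = dict(
    dm=1,
    dm_mean=1,
    size=200,      # vector_size in gensim 4.x
    window=5,
    min_count=12,
    iter=8,        # epochs in gensim 4.x
    workers=cores,
)

# With gensim installed, the model would be built like this:
#   from gensim.models.doc2vec import Doc2Vec
#   model = Doc2Vec(**pv_dm_mean_params)
print(pv_dm_mean_params)
```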
Any help will be appreciated. Thank you in advance.