Pretrained model for doc2vec


pradeep t

Jun 29, 2023, 3:55:46 AM
to Gensim
Is there any pretrained model available for the doc2vec algorithm (similar to GoogleNews-vectors-negative300.bin for word2vec)?

Gordon Mohr

Jun 29, 2023, 5:45:39 PM
to Gensim
I don't know of any that I'd recommend & that work with recent Gensim versions. (The ones I've seen before were based on outdated or custom Gensim versions, made iffy choices of parameters, and had final sizes that made their completeness/utility suspect.)

Note also that a generic model trained on public sources – like say Wikipedia articles – might not provide the best modeling capabilities for other kinds of documents that use different words & word-senses & topics. 

Still, it's not too hard to train your own such model – though it may take tens of hours of runtime, and a machine with 16GB+ RAM – using Wikipedia or a document set closer to your own problem domain.
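
For a sense of what training your own model involves, here is a minimal sketch, assuming Gensim 4.x and a corpus small enough to fit in memory. The `tokenize` and `train_doc2vec` helpers are hypothetical names made up for illustration, and the parameter values are placeholders rather than recommendations. A Wikipedia-scale run would stream documents from disk instead of building a list.

```python
import re


def tokenize(text):
    """Lowercase and split on non-word characters (a deliberately simple, hypothetical tokenizer)."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def train_doc2vec(texts, vector_size=100, epochs=20, min_count=2):
    """Train a Doc2Vec model on an in-memory list of strings (illustrative parameters only)."""
    # Imported here so that tokenize() stays usable without Gensim installed.
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    # Each document gets its list index as its tag.
    corpus = [TaggedDocument(words=tokenize(t), tags=[i]) for i, t in enumerate(texts)]

    model = Doc2Vec(vector_size=vector_size, min_count=min_count, epochs=epochs, workers=4)
    model.build_vocab(corpus)
    model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)
    return model


# Example use (assumes `my_texts` is your own list of document strings):
#   model = train_doc2vec(my_texts)
#   vec = model.infer_vector(tokenize("some new, unseen document"))
#   print(model.dv.most_similar([vec], topn=3))
#   model.save("my_doc2vec.model")
```

Inferring a vector for new text uses the same tokenization as training, which is why the tokenizer is kept as a separate function here.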

There's a notebook demonstrating the process with a Wikipedia dump in the Gensim source code at:


- Gordon