Hello to whomsoever it may concern,
I recently trained a gensim Doc2Vec model on approximately 7k text documents. I mainly use it to find similar documents in my corpus and flag them. Doc2Vec works well for this, and the results look good.
In my project, anyone can upload a new document, which means I then have a new document to add to my model, bringing the count to 7,001 documents. More documents will keep arriving in the future, and the similarity check should cover these new documents as well.
So I wanted to find out whether there is a way to update the model without retraining on all 7,001 documents, that is, to reuse the existing 7k-document Doc2Vec model and train on only the new document. Is such a solution possible?
If not, could you suggest an alternative approach that I could take?
I have already tried build_vocab(update=True) and other approaches, and I have searched a lot, but I could not find this documented as a solution anywhere. Can it be done?
Cheers,
Nandan