Following the model of the original word2vec.c release, gensim's gradient descent is very simple: a fixed number of passes, a linear decay of the learning rate, and no early stopping or learning-rate adaptation.
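For concreteness, the word2vec.c-style schedule can be sketched like this, a minimal illustration only (the function name and signature are hypothetical, not gensim's actual API):

```python
def linear_lr(alpha, min_alpha, words_seen, total_words):
    """Hypothetical helper: linearly decay the learning rate from alpha
    down to min_alpha over the whole training run, as in word2vec.c."""
    progress = min(1.0, words_seen / total_words)
    return max(min_alpha, alpha - (alpha - min_alpha) * progress)
```

The rate falls in lockstep with the count of training words processed, regardless of how the model error is actually evolving, which is exactly why fancier schemes are appealing.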
More advanced methods, such as Adagrad, are a wishlist item.
Also, the internal neural network's training loss on its word predictions currently isn't accumulated or reported. It could be, and that would be another nice-to-have feature, enabling other gradient-descent options or meta-optimizations. The `score()` code, motivated by another purpose, does some similar calculations when testing model error on new texts, but of course tracking the running error during the original training would be necessary for such training optimizations.
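The bookkeeping itself would be cheap, something like this hypothetical accumulator (the class and method names are mine, not gensim's), fed the predicted probability of each correct word during training:

```python
import math

class LossTracker:
    """Hypothetical running-loss accumulator: sums the negative
    log-likelihood of each correct-word prediction during training."""
    def __init__(self):
        self.total_loss = 0.0
        self.examples = 0

    def record(self, predicted_prob):
        # clamp to avoid log(0) on a pathological zero-probability prediction
        self.total_loss += -math.log(max(predicted_prob, 1e-12))
        self.examples += 1

    def mean_loss(self):
        return self.total_loss / max(self.examples, 1)
```

A running mean like this is what an early-stopping rule or an adaptive learning-rate scheme would consult between passes.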
- Gordon