Following the model of the original word2vec.c release, gensim's gradient descent is very simple: a fixed number of passes, a linear decay of the learning rate, and no early stopping or learning-rate adaptation.
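For concreteness, the word2vec.c-style schedule can be sketched like this, a minimal illustration only (the function name and signature are hypothetical, not gensim's actual API):

```python
def linear_lr(alpha, min_alpha, words_seen, total_words):
    """Hypothetical helper: linearly decay the learning rate from alpha
    down to min_alpha over the whole training run, as in word2vec.c."""
    progress = min(1.0, words_seen / total_words)
    return max(min_alpha, alpha - (alpha - min_alpha) * progress)
```

The rate falls in lockstep with the count of training words processed, regardless of how the model error is actually evolving, which is exactly why fancier schemes are appealing.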
More advanced methods, such as Adagrad, are a wishlist item.
Also, the internal neural network's training loss on its word predictions currently isn't accumulated or reported. It could be, and that would be another nice-to-have feature, enabling other gradient-descent options or meta-optimizations. The `score()` code, motivated by another purpose, does some similar calculations when testing model error on new texts, but of course tracking the running error during the original training would be necessary for such training optimizations.
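The bookkeeping itself would be cheap, something like this hypothetical accumulator (the class and method names are mine, not gensim's), fed the predicted probability of each correct word during training:

```python
import math

class LossTracker:
    """Hypothetical running-loss accumulator: sums the negative
    log-likelihood of each correct-word prediction during training."""
    def __init__(self):
        self.total_loss = 0.0
        self.examples = 0

    def record(self, predicted_prob):
        # clamp to avoid log(0) on a pathological zero-probability prediction
        self.total_loss += -math.log(max(predicted_prob, 1e-12))
        self.examples += 1

    def mean_loss(self):
        return self.total_loss / max(self.examples, 1)
```

A running mean like this is what an early-stopping rule or an adaptive learning-rate scheme would consult between passes.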
- Gordon