14k docs (of only ~47 words each on average, and fewer after rare words are discarded) is on the small side for training a Doc2Vec model, but I'd still expect you to see some useful modeling from training. (Increasing `epochs` & decreasing `vector_size` can help a bit with thinner corpora, but getting more compatible data would be best.)
Your main error is trying to manage the training iterations and the `alpha` learning rate yourself, with multiple calls to `train()` inside your own loop – and further severely mismanaging `alpha`, so that it reaches a nonsensical negative value long before training ends.
Doing this yourself is almost always a mistake – unnecessary and error-prone. You should even have seen a `WARNING`-level message in your logs about the atypical situation: "Effective 'alpha' higher than previous training cycles"
If you can let whatever tutorial/example/etc. site you copied this practice from know that it's steering users astray, please do! (And if by chance it offered exactly these wrong values for `alpha`, the looping range, and the `alpha`-decrement, note that its authors didn't know what they were doing, or check their results effectively, before offering this example – so take any of their other guidance with a big grain of salt in the future.)
Separate comments:
* enabling logging at the INFO level will give you far more insight into the process, & progress, of model training
* when you do call `.train()` with the same corpus that was just passed to `build_vocab()`, it's sufficient to use the `.corpus_count` that's cached inside the model (as per the code in the SO answer) – there's no need for the extra complexity & error-risk of supplying your own calculation.
* using a non-default `sample` parameter is typically most beneficial with much-larger corpora, where you might want to make it more aggressive (even smaller than the default `1e-3`) – but starting out, with a tiny corpus, I'd not specify anything at all. (And, contrary to the comment in your code, I've never noticed a good reason to make it different between PV-DM and PV-DBOW modes.)
* setting `workers` to a number equal to, or just under, the count of CPU cores is only a good policy up to about 8-core processors; if by chance you're on a machine with even more cores, a `workers` value somewhere in the 6-12 range is likely to achieve the highest training throughput (though with a tiny corpus, no choice here will make much of a running-time difference).
- Gordon