Don't fear trying more `epochs` – if that ever hurts, it may be a sign of overfitting, but more epochs aren't the true cause, nor is reducing epochs an appropriate way to prevent overfitting. Instead, fight overfitting with a smaller vector `size`, or a smaller vocabulary – so the model has fewer 'free parameters' with which to memorize idiosyncrasies of the training data.
Wondering what your classes are – sentiment, or something else? And, what's the quality/balance of your training data? Is the 55% accuracy on a randomly held-back test set?
Do you hold back the test set from both `Doc2Vec` training & the classifier, or just the supervised classifier? (It's somewhat defensible to use all available data, without labels, for the unsupervised Doc2Vec training, even if it's held back from supervised classifier training. If you can collect other unlabeled data from the same domain, adding it to Doc2Vec training to improve the model's general vocabulary can also make sense.)
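As a minimal sketch of that split discipline – using an invented toy corpus of `(tokens, label)` pairs, where `None` marks unlabeled texts – the unsupervised step may see every text, while the classifier only ever sees the labeled training split:

```python
import random

# Toy corpus: (tokens, label); label None means unlabeled.
corpus = [
    (["good", "fast", "service"], "pos"),
    (["terrible", "slow", "support"], "neg"),
    (["great", "quality"], "pos"),
    (["awful", "experience"], "neg"),
    (["some", "unlabeled", "text"], None),
]

random.seed(0)
labeled = [(t, y) for t, y in corpus if y is not None]
random.shuffle(labeled)
split = int(0.75 * len(labeled))
train_labeled, test_labeled = labeled[:split], labeled[split:]

# Unsupervised Doc2Vec training may use every text, including
# unlabeled ones and the texts (not labels!) of held-back items...
doc2vec_texts = [t for t, _ in corpus]

# ...but the supervised classifier is fit only on train_labeled,
# and scored only on test_labeled.
print(len(doc2vec_texts), len(train_labeled), len(test_labeled))
```

The key invariant: no *label* from `test_labeled` influences anything before final evaluation.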
I'd usually expect a higher accuracy, even from very simple techniques, but perhaps the problem is really hard and/or the training data very thin. Still, I'd double check all steps for process errors like mismatches of item-ids across any shuffling/sampling, or mistakenly creating imbalanced training/testing sets, etc.
The silver lining of a small (& quick) dataset is you can run a broader search across metaparameters. I'd especially try smaller vectors, adding `dbow_words` to `dm=0` training or trying `dm=1`, varied `negative` and `window` sizes, alternate `ns_exponent` values, & a larger `min_count`.
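A sketch of what such a sweep might enumerate – the parameter names match gensim `Doc2Vec` keyword arguments, but the candidate values here are just plausible starting points, and the actual train/evaluate call per configuration is left out:

```python
from itertools import product

# Candidate values for gensim Doc2Vec parameters worth sweeping
# on a small corpus (names match Doc2Vec's kwargs).
grid = {
    "vector_size": [25, 50, 100],
    "dm": [0, 1],                # 0 = PV-DBOW, 1 = PV-DM
    "dbow_words": [0, 1],        # only meaningful with dm=0
    "window": [3, 5, 10],
    "negative": [5, 10],
    "ns_exponent": [0.75, 0.0],  # 0.0 samples negatives uniformly
    "min_count": [2, 5],
}

keys = list(grid)
combos = [dict(zip(keys, vals)) for vals in product(*(grid[k] for k in keys))]
# Drop dbow_words=1 when dm=1, where it has no effect.
combos = [c for c in combos if not (c["dm"] == 1 and c["dbow_words"] == 1)]
print(len(combos), "configurations to evaluate")
# Each combo would then feed Doc2Vec(**combo, epochs=...) plus a
# downstream classifier scored on the same held-back test set.
```

With a small dataset each configuration trains in seconds, so even a few hundred combinations is a feasible overnight sweep.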
Though you may hate to throw anything out with such a small corpus, setting `min_count=1` (or any very-low value) can backfire, as all those rare words can't acquire powerful generalizable meanings from single (or few) usage examples, and thus serve as noise interfering with the training of other words (and reservoirs of excess model state to overfit training data).
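You can see how sharply a `min_count` floor shrinks the trainable vocabulary with a quick frequency census – the token counts below are invented for illustration, but the long tail of count-1 words (including typos like 'expediant') is typical of real corpora:

```python
from collections import Counter

# Hypothetical token frequencies from a small corpus.
freqs = Counter({
    "the": 40, "service": 12, "good": 9, "slow": 7,
    "refund": 3, "courteous": 1, "expediant": 1, "grrreat": 1,
})

for min_count in (1, 2, 5):
    kept = {w for w, n in freqs.items() if n >= min_count}
    print(f"min_count={min_count}: {len(kept)} of {len(freqs)} words kept")
```

Running the same census on your real corpus, before training, shows exactly which words each `min_count` value would sacrifice.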
Other tokenization strategies could be worth trying. With big datasets, stemming/lemmatization can be superfluous for Word2Vec/Doc2Vec/etc. algorithms – there are enough examples of every word variant that they all wind up near each other. But in tiny datasets, the extra hint provided by canonicalizing variants into one token may help.
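The idea can be illustrated with a deliberately naive suffix-stripper – real work would use a proper stemmer or lemmatizer (e.g. NLTK's `PorterStemmer`), but even this crude rule pools some inflected variants into shared tokens, giving each surviving token more training examples:

```python
# Deliberately naive: strip a few common English suffixes,
# keeping at least a 3-character stem.
def crude_stem(token: str) -> str:
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

tokens = ["ship", "ships", "shipped", "shipping", "shipper"]
print([crude_stem(t) for t in tokens])
# 'ships' joins 'ship'; 'shipped' & 'shipping' collapse together
# (as 'shipp'); 'shipper' is untouched. Imperfect, but fewer,
# better-trained tokens than the 5 raw variants.
```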
I would definitely be comparing against other sparser representations – bag of words, and also trying word n-grams or character n-grams – because there might be some individual word/multiword/subword features that are highly indicative for your specific classification, and get 'smoothed out' by the dense embedding featurization.
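A sparse featurization along those lines is only a few lines of stdlib code – this sketch builds word-bigram and boundary-padded character-trigram counts, which could then feed any linear classifier (scikit-learn's `CountVectorizer` with `analyzer='char_wb'` and an `ngram_range` does the same thing more robustly):

```python
from collections import Counter

def char_ngrams(text: str, n: int = 3) -> Counter:
    """Sparse bag of character n-grams, padded so word edges count."""
    padded = f" {text} "
    return Counter(padded[i : i + n] for i in range(len(padded) - n + 1))

def word_ngrams(tokens, n: int = 2) -> Counter:
    """Sparse bag of word n-grams over a token list."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))

print(char_ngrams("unhelpful").most_common(3))
print(word_ngrams(["not", "very", "helpful"]))
```

Unlike dense doc-vectors, such features preserve individual indicative words/subwords exactly, which is why they remain a strong baseline for small-data classification.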
You could also try FastText's supervised classification mode, where word-vectors are trained specifically to be good inputs for predicting the known labels – though that mode is not supported by `gensim`, so you'd need Facebook's `fasttext` library/tool itself.
- Gordon