The demo notebook included in gensim that Ivan mentions in another thread (https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/doc2vec-IMDB.ipynb) is a good example to study: it applies Doc2Vec to a binary positive/negative sentiment classification task.
Note that if your real goal is the best possible spam-vs-ham determination, rather than just trying out and learning Doc2Vec on an available and interesting problem, other classification techniques based on more categorical text features, even plain "bag of words", might perform better. Doc2Vec typically benefits from somewhat longer documents: for example, the IMDB texts run to hundreds of words, whereas SMS messages are only dozens or hundreds of characters long. (It's still worth trying as one among many feature-engineering techniques; I'd just avoid expectations, high or low, about how it will compare with the others.)
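To make the "bag of words" alternative concrete, here is a minimal sketch of a baseline you could compare Doc2Vec features against. It is a plain multinomial Naive Bayes with add-one smoothing written with only the standard library; the function names (`tokenize`, `train_nb`, `predict_nb`) and the four toy messages are made up for illustration and are not part of gensim or of any real dataset. In practice you would more likely reach for an existing library implementation and a proper train/test split.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split into simple word tokens (hypothetical helper).
    return re.findall(r"[a-z0-9']+", text.lower())


def train_nb(labeled_texts):
    # Count words per label and documents per label.
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in labeled_texts:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def predict_nb(model, text):
    # Score each label with log prior + smoothed log likelihoods.
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    # Words never seen in training are ignored.
    tokens = [t for t in tokenize(text) if t in vocab]
    scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for t in tokens:
            # Add-one (Laplace) smoothing over the vocabulary.
            score += math.log(
                (word_counts[label][t] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


# Toy data, invented purely for illustration.
toy_data = [
    ("win a free prize now", "spam"),
    ("free entry claim your cash prize", "spam"),
    ("are we still meeting for lunch", "ham"),
    ("call me when you get home", "ham"),
]
model = train_nb(toy_data)
print(predict_nb(model, "claim your free prize"))
print(predict_nb(model, "see you at lunch"))
```

A baseline like this gives you a number to beat: if Doc2Vec vectors fed to a classifier don't do better than word counts on your SMS data, that tells you something about whether the texts are long enough for it to help.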
- Gordon