The vocabulary-building scan can be time-consuming, and the decoding of the XML/Wikitext by WikiCorpus is more expensive than handling plain text. But your main problem is likely the swapping. With Python data structures/objects, you essentially never want to see any swapping, or otherwise-quick operations will take forever. (When you see it, you'll usually want to adjust the code to use less memory, or get more RAM, rather than wait it out.)
So you'd want to do anything possible to reduce memory use to avoid swapping, perhaps including shrinking your model (smaller `size`, larger `min_count`).
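For example, a slimmer configuration might look like the sketch below. The parameter values are just illustrative, and note that recent gensim releases call the dimensionality `vector_size` while older ones use `size`:

```python
from gensim.models.doc2vec import Doc2Vec

# Illustrative values only: smaller vectors and a higher min_count both shrink
# the model's RAM footprint (fewer dimensions per vector, fewer retained words).
model = Doc2Vec(
    vector_size=100,   # called `size` in older gensim versions
    min_count=20,      # ignore words appearing fewer than 20 times
    workers=4,
)
```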
The logging output you show actually comes from `train()`, not `build_vocab()`. That suggests the `build_vocab()` scan at least finished, and execution continued to a `train()` invocation (not shown) which would iterate over the corpus another 10 times.
Looking more at `WikiCorpus.get_texts()`, I'm a bit suspicious of its use of multiprocessing, and thus multiple forked processes. It might be OK in a Python process that's only doing one pass over the WikiCorpus, but could balloon memory use during multiple passes in a Python process that's already holding a lot in memory (the large Doc2Vec model in training).
The optimization I'd mentioned in the other thread might help avoid issues. That is: use WikiCorpus only once, to extract the titles/text and write those to a plainer-text format locally. Then, read and feed that text to Doc2Vec, without using WikiCorpus.
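A rough sketch of that two-step approach, assuming a recent gensim where `get_texts()` yields unicode tokens and `TaggedLineDocument` is available (file names and parameter values are placeholders):

```python
from gensim.corpora.wikicorpus import WikiCorpus
from gensim.models.doc2vec import Doc2Vec, TaggedLineDocument

# One-time pass: decode the XML/Wikitext dump and write each article as a
# single space-delimited line of tokens. Passing dictionary={} skips
# WikiCorpus's own vocabulary-building scan.
wiki = WikiCorpus('enwiki-latest-pages-articles.xml.bz2', dictionary={})
with open('wiki_plaintext.txt', 'w', encoding='utf-8') as fout:
    for tokens in wiki.get_texts():
        fout.write(' '.join(tokens) + '\n')

# Then stream the plain text straight into Doc2Vec (ideally in a fresh
# process, so WikiCorpus and its forked workers are out of the picture).
# TaggedLineDocument treats each line as one document, tagged by line number.
docs = TaggedLineDocument('wiki_plaintext.txt')
model = Doc2Vec(vector_size=100, min_count=20, workers=4, epochs=10)
model.build_vocab(docs)
model.train(docs, total_examples=model.corpus_count, epochs=model.epochs)
```

(If you also want the article titles, WikiCorpus has a metadata mode that makes `get_texts()` yield page ids/titles alongside the tokens; check your gensim version's docs for how to enable it.)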