Inference approximates the vector that the same text would have received if it had been part of bulk training. There's no direct way to weight words during inference (other than leaving them out), nor any side-reporting of which words most affected the inferred vector.
But some thoughts:
If you do have external signals that some words are less relevant, and your experiments suggest leaving those out improves your results, that's a reasonable approach (and it'd be interesting to hear which signals help).
Especially in the case of PV-DBOW mode (`dm=0`), word neighbors don't matter, so you could potentially also *repeat* tokens that you know are important, to effectively give them more weight (in either training or inference). (In PV-DM modes, since neighboring words do matter, inserting extra terms might also be worth trying, but exactly where you insert them would then be more likely to have other mixed effects.)
In the canonical/original definition of the Doc2Vec algorithm ("Paragraph Vector"), each text just has a single unique ID, and those IDs each receive a trained vector. However, it's possible to give texts multiple tags, some of which repeat between text examples, and thus learn doc-vectors for those other tags as well. If there are certain distinguished keywords/categories in your data, like error-codes or product-names, it *might* make sense to coerce those to a controlled-vocabulary and add them as tags in the training data. Each such tag would then get its own vector, and the closeness of inferred-text vectors to those vectors might be useful. (In a way, doc-vectors are like "super-words", that range over the entire text example – so promoting known-salient terms from a small subset to be doctags rather than words may make sense.)
It's possible that other meta-parameter choices could make the model better at your domain, or at inference. For example, sometimes fewer dimensions achieve better generalization (in addition to training/inferring faster), especially with limited data. (That might be one way to get a model that's better at ignoring 'noise' words, even without your own external word-inclusion tweaking.) So if you have (or can develop) some quantitative, repeatable way to score one model as better than another, a broad search over potential training parameters could help.
In Word2Vec, some have observed that the magnitude of raw word-vectors (before the unit-norming applied for similarity comparisons) can be an indication of the strength/unity of a word's meaning. That is, words that mean something very specific have longer vectors, while more generic words (or words with many senses) have shorter ones. That *might* be a useful signal, alone or in combination with overall corpus/document frequencies, for treating some words specially – but this is speculation for potential experimentation; I don't know any rules-of-thumb. The same might be true, in negative-sampling models, of the output weights that exist per predicted word in `syn1neg` – though I wouldn't even have a guess as to whether larger or smaller magnitudes are more meaningful.
(Note that in pure PV-DBOW – `dm=0` – traditional input word-vectors are not trained at all, so they will come out random-looking from such a model. Only by adding word-training to DBOW with `dbow_words=1`, or by switching to a PV-DM mode with `dm=1`, will a Doc2Vec model's word-vectors be meaningful.)
I know that's a big dump of half-baked ideas, but hope it helps.
- Gordon