I need to compare the similarity of a bunch of job offers.
When I compute a Mallet word embedding similarity between pairs of job offers, the score always comes out very high (> 0.95).
I suspect the reason for this is that the language used in job offers is very standard and therefore all documents seem very similar on the surface.
Would it be possible to feed the similarity evaluator a list of stopwords that includes not only generic English stopwords, but also words that are commonly used in most job offers?
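
To make the idea concrete, here is a minimal sketch in plain Python. It does not use Mallet's API, and it swaps the word embedding similarity for a simple bag-of-words cosine so that it stays self-contained. The function names, the tiny generic stopword list, the sample offers and the 0.8 document-frequency threshold are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative list; a real generic English stoplist would be much longer.
GENERIC_STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "we", "are", "our"}


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters only."""
    return re.findall(r"[a-z]+", text.lower())


def domain_stopwords(docs, max_doc_fraction=0.8):
    """Return words that occur in more than max_doc_fraction of the documents.

    The threshold is an assumed value, not a recommended one.
    """
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))  # count each word once per document
    return {w for w, n in doc_freq.items() if n / len(docs) > max_doc_fraction}


def bag_of_words(text, stopwords):
    """Count the tokens of text that are not in stopwords."""
    return Counter(w for w in tokenize(text) if w not in stopwords)


def cosine(a, b):
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[w] for w, count in a.items() if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical job offers that share the usual boilerplate.
offers = [
    "We are looking for a motivated Java developer to join our team",
    "We are looking for a motivated nurse to join our team",
    "We are looking for a motivated accountant to join our team",
]

job_stopwords = domain_stopwords(offers)
all_stopwords = GENERIC_STOPWORDS | job_stopwords

# Similarity of the first two offers without and with the extended stoplist.
raw_sim = cosine(bag_of_words(offers[0], set()), bag_of_words(offers[1], set()))
filtered_sim = cosine(
    bag_of_words(offers[0], all_stopwords), bag_of_words(offers[1], all_stopwords)
)
print(f"raw: {raw_sim:.2f}  filtered: {filtered_sim:.2f}")
```

Deriving the job-offer stopwords from document frequency avoids maintaining the list by hand, but the right threshold would have to be tuned on the actual collection.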
Thanks.
Alain Désilets