I am using an older model generated with gensim 0.13.3, running under Python 2.7.13 with gensim 2.1.0.
>>> model.most_similar(['editorial' ,'rule'])
[(u'rules', 0.6381771564483643), (u'criteria', 0.5107885003089905), (u'criterion', 0.492725133895874), (u'trigger', 0.4885510504245758), (u'policy', 0.484529972076416), (u'DRM/DMCA', 0.47995659708976746), (u'condition', 0.47984564304351807), (u'evaluation', 0.46047770977020264), (u'review', 0.4602917432785034), (u'policies', 0.4565350115299225)]
>>> [ model.n_similarity(['editorial','rule'],[w[0]]) for w in model.most_similar(['editorial','rule']) ]
[0.6019811969674348, 0.48154911757292618, 0.46389647223886166, 0.46570814004737227, 0.46567470188832039, 0.4726064578375645, 0.44584272549756965, 0.44441483919354241, 0.46469044441119817, 0.44388189457279986]
The sort orders of these two results are different, which makes me worry about using one vs. the other.
My understanding of how these are calculated is that both take the dot product of two vectors, where each vector is the normalized sum of its word vectors. Give or take an extra averaging step in one of them, which should wash out in the normalization, I would expect the two to agree.
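To make my assumption concrete, here is a small NumPy sketch of that understanding: cosine similarity as the dot product of normalized vector sums, with toy vectors standing in for the model's word vectors (the names `editorial`, `rule`, `rules` and their values are made up for illustration, not taken from my model):

```python
import numpy as np

def unit(v):
    """Scale a vector to unit (L2) length."""
    return v / np.linalg.norm(v)

def combined_similarity(vecs_a, vecs_b):
    """My understanding of both APIs: cosine similarity between two word
    sets, i.e. the dot product of the normalized sums of their vectors."""
    a = unit(np.sum(vecs_a, axis=0))
    b = unit(np.sum(vecs_b, axis=0))
    return float(np.dot(a, b))

# Toy stand-ins for model['editorial'], model['rule'], model['rules'].
editorial = np.array([0.2, 0.9, -0.1])
rule      = np.array([0.5, 0.1,  0.4])
rules     = np.array([0.4, 0.2,  0.5])

sim = combined_similarity([editorial, rule], [rules])

# Averaging instead of summing makes no difference after normalization,
# since unit(sum/n) == unit(sum):
same = np.allclose(unit(np.mean([editorial, rule], axis=0)),
                   unit(np.sum([editorial, rule], axis=0)))
```

If both methods really reduce to this, the rankings they produce over the same candidate words should be identical, so the reordering I see above suggests one of them is doing something else.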