Hello,
I have been using doc2vec for quite some time now and am very pleased with the results I get.
Recently I wanted to compare the similarity scores I get when I compute the cosine distance directly on inferred vectors, for example:
scores1 = spatial.distance.cdist(vector1, vector2, 'cosine')
where vector1 and vector2 are each obtained as:
vector1 = model.infer_vector(document1, steps=15, alpha=.09)
against the scores produced by doc2vec's most_similar method.
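For concreteness, here is a minimal sketch of the direct comparison I mean. The arrays are made-up stand-ins for what infer_vector returns (no model is loaded here). As far as I can tell, cdist expects 2-D input, so each 1-D vector is reshaped to a single row, and 'cosine' returns a distance, so the similarity is 1 minus that value.

```python
import numpy as np
from scipy import spatial

# Made-up stand-ins for the output of model.infer_vector(...).
vector1 = np.array([0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
vector2 = np.array([0.2, 0.3, 0.4, 0.5, 0.6, 0.7])

# cdist works on 2-D arrays (one row per vector), so reshape each 1-D vector.
distances = spatial.distance.cdist(
    vector1.reshape(1, -1), vector2.reshape(1, -1), 'cosine'
)

# 'cosine' is a distance: 1 - cosine similarity.
cosine_distance = distances[0, 0]
cosine_similarity = 1.0 - cosine_distance
print(cosine_similarity)
```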
The docstring of the method that implements most_similar says this:
************************
def most_similar(self, positive=[], negative=[], topn=10, clip_start=0, clip_end=None, indexer=None):
"""
Find the top-N most similar docvecs known from training. Positive docs contribute
positively towards the similarity, negative docs negatively.
This method computes cosine similarity between a simple mean of the projection
weight vectors of the given docs.
************************
What I am trying to understand is this. Say I have two projected vectors from the model:
v1 = [.4, .5, .6, .7, .8, .9]
and
v2 = [.2, .3, .4, .5, .6, .7]
What does "simple mean of the projection weight vectors of the given docs" mean?
And if I had to implement the most_similar method myself, how does the cosine similarity get calculated in doc2vec?
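To make the question concrete, here is my current reading of that docstring as a small NumPy sketch. It is only my assumption about what the method does, not code taken from gensim: the function name most_similar_sketch, the helper unit, and the extra vector v3 are all hypothetical. The idea is to unit-normalize every doc vector, average the positive ones (and the negated negative ones), normalize that mean, and rank all docs by dot product with it, which equals the cosine similarity for unit vectors.

```python
import numpy as np

def unit(v):
    """Scale a vector to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def most_similar_sketch(doc_vectors, positive, negative=(), topn=10):
    """Hypothetical re-implementation of my reading of the docstring.

    doc_vectors: 2-D array, one row per trained document.
    positive / negative: row indices of the query documents.
    """
    # Unit-normalize every row so a dot product equals cosine similarity.
    normed = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    # Positive docs count with weight +1, negative docs with weight -1.
    weighted = [normed[i] for i in positive] + [-normed[i] for i in negative]
    # "Simple mean" of the query vectors, rescaled to unit length.
    query = unit(np.mean(weighted, axis=0))
    sims = normed @ query
    # Leave the query documents themselves out of the ranking.
    exclude = set(positive) | set(negative)
    order = [i for i in np.argsort(-sims) if i not in exclude]
    return [(int(i), float(sims[i])) for i in order[:topn]]

v1 = [.4, .5, .6, .7, .8, .9]
v2 = [.2, .3, .4, .5, .6, .7]
v3 = [.9, .1, .8, .2, .7, .3]  # extra made-up vector so there is something to rank
docs = np.array([v1, v2, v3])

result = most_similar_sketch(docs, positive=[0])
print(result)
```

With a single positive doc the mean is just that doc's normalized vector, so each score should match the plain cosine similarity between v1 and the other vectors. Is that what the real method does?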
Any insight will be much appreciated.
Regards,
Rathish