I'm wondering if someone can provide some insight into the underlying theory and justification behind how gensim computes the similarity between two sets of word vectors.
For example, n_similarity(['restaurant', 'japanese'], ['sushi', 'shop']) => 0.6154
Currently, gensim takes the mean vector of each set of vectors and then computes the cosine similarity of the two resulting means. But there are other approaches.
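To make that concrete, here's roughly what I understand the mean-then-cosine computation to be, as a NumPy sketch (assuming model is a gensim KeyedVectors-style object where model[word] returns that word's vector; the helper names are just mine):

    import numpy as np

    def cosine(u, v):
        # cosine similarity between two 1-D vectors
        return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

    def mean_vector_similarity(model, words1, words2):
        # what I understand n_similarity to do: average the vectors in each
        # set, then take the cosine similarity of the two mean vectors
        v1 = np.mean([model[w] for w in words1], axis=0)
        v2 = np.mean([model[w] for w in words2], axis=0)
        return cosine(v1, v2)

    # e.g. mean_vector_similarity(model, ['restaurant', 'japanese'], ['sushi', 'shop'])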
One could create a similarity matrix of M x N dimensions, where each element is the cosine similarity of one pair of vectors, and then reduce the matrix to a score between 0 and 1 using an L2 norm or something. Another approach would be to treat the similarity matrix as a single M*N vector and then compare those vectors' similarities, or normalize them somehow.
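Here's a rough sketch of what I mean by those alternatives (again assuming model[word] returns the word vector; the aggregation choices are just illustrative, not something gensim provides):

    import numpy as np

    def pairwise_similarity_matrix(model, words1, words2):
        # M x N matrix where entry (i, j) is the cosine similarity between
        # the i-th vector of the first set and the j-th vector of the second
        V1 = np.array([model[w] for w in words1])
        V2 = np.array([model[w] for w in words2])
        V1 = V1 / np.linalg.norm(V1, axis=1, keepdims=True)
        V2 = V2 / np.linalg.norm(V2, axis=1, keepdims=True)
        return V1 @ V2.T

    def mean_of_matrix(model, words1, words2):
        # aggregate the pairwise matrix by simple averaging
        return pairwise_similarity_matrix(model, words1, words2).mean()

    def flattened_matrix_score(model, words1, words2):
        # treat the M x N matrix as one M*N vector and take its L2 norm,
        # scaled so the score stays in [0, 1] when every entry is in [-1, 1]
        s = pairwise_similarity_matrix(model, words1, words2).ravel()
        return np.linalg.norm(s) / np.sqrt(s.size)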
I'd just like some insight into the tradeoffs of each approach. For example, taking the mean of a similarity matrix gives a very different score than taking the cosine similarity of the two vector means (see the toy example below). I'm wondering whether there is a theory behind which to use when, or whether it's a trial-and-error sort of thing for each application.
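A toy example of how far apart the two scores can be (made-up 2-D vectors, not real embeddings):

    import numpy as np

    a1, a2, b1 = np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 0.0])

    mean_a = (a1 + a2) / 2
    # cosine of the two set means (the second "set" is just b1)
    cos_of_means = mean_a @ b1 / (np.linalg.norm(mean_a) * np.linalg.norm(b1))  # ~0.707

    # mean of the pairwise cosine similarities (all vectors are unit length)
    mean_of_pairwise = np.mean([a1 @ b1, a2 @ b1])  # 0.5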
Thanks.