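I am using the following Python code to generate a similarity matrix of word vectors (my vocabulary size is 77).

    similarity_matrix = []
    index = gensim.similarities.MatrixSimilarity(gensim.matutils.Dense2Corpus(model.wv.syn0))
    for sims in index:
        similarity_matrix.append(sims)
    similarity_array = np.array(similarity_matrix)
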
The dimensionality of similarity_array is 300 x 300. However, as I understand it, the dimensionality should be 77 x 77 (as my vocabulary size is 77),
i.e.,

              word1,  word2,  ...,  word77
    word1     0.2,    0.8,    ...,  0.9
    word2     0.1,    0.2,    ...,  1.0
    ...       ...,    ...,    ...,  ...
    word77    0.9,    0.8,    ...,  0.1

Please let me know what is wrong in my code.
Moreover, what is the order of the vocabulary (word1, word2, ..., word77) used to calculate this similarity matrix?
Can I obtain this order from model.wv.index2word? Please help me!

    from gensim.models import Word2Vec
    import numpy as np
    from scipy.spatial.distance import cdist

    # Note: gensim 4.x renamed size -> vector_size, iter -> epochs,
    # index2word -> index_to_key and syn0 -> vectors.
    sentences = [['cute', 'cat', 'say', 'meow'], ['cute', 'dog', 'say', 'woof']]
    model = Word2Vec(sentences=sentences, size=10, window=1, iter=2000, min_count=1)

    # Map each word to its row index in model.wv.syn0
    _ = {word: idx for (idx, word) in enumerate(model.wv.index2word)}  # for comfort
    assert len(_) == 6

    # cdist returns the cosine distance, so 1 - distance is the cosine similarity
    similarity = 1 - cdist(model.wv.syn0, model.wv.syn0, metric='cosine')
    assert similarity.shape == (6, 6)

    similarity[_["cat"], _["dog"]]  # similarity between 'cat' and 'dog'

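To spell out why the original code ends up with 300 x 300: model.wv.syn0 has one row per word and one column per dimension, i.e. shape (77, 300). As far as I can tell, gensim.matutils.Dense2Corpus defaults to documents_columns=True, so it reads each column as a document and the index compares the 300 dimensions with each other instead of the 77 words. Passing the transposed matrix (or documents_columns=False) should give 77 x 77, and row i then corresponds to model.wv.index2word[i]. The sketch below only illustrates the rows-versus-columns effect. It uses the standard library and a made-up 3 x 4 toy matrix in place of syn0; the names cosine_similarity_matrix, vocab and vectors are illustrative, not gensim API.

```python
import math


def cosine_similarity_matrix(rows):
    """Return the pairwise cosine similarity between the given row vectors."""
    rows = [list(r) for r in rows]
    norms = [math.sqrt(sum(x * x for x in r)) for r in rows]
    return [
        [sum(a * b for a, b in zip(r1, r2)) / (n1 * n2)
         for r2, n2 in zip(rows, norms)]
        for r1, n1 in zip(rows, norms)
    ]


# Toy stand-ins (assumptions): vocab plays the role of model.wv.index2word,
# vectors plays the role of model.wv.syn0 with 3 words x 4 dimensions.
vocab = ["cat", "dog", "meow"]
vectors = [
    [1.0, 0.0, 2.0, 1.0],
    [1.0, 0.5, 2.0, 0.0],
    [0.0, 3.0, 0.0, 1.0],
]

# Rows treated as items: one entry per word, which is what the question wants.
by_word = cosine_similarity_matrix(vectors)

# Columns treated as items: one entry per dimension, the 300 x 300 situation.
by_dimension = cosine_similarity_matrix(zip(*vectors))

print(len(by_word), len(by_word[0]))            # 3 3
print(len(by_dimension), len(by_dimension[0]))  # 4 4

# Look up a pair of words through their row indices.
word_to_idx = {word: idx for idx, word in enumerate(vocab)}
cat_dog = by_word[word_to_idx["cat"]][word_to_idx["dog"]]
```

The word order of the matrix is simply the row order of the input, which is why the index2word lookup in the answer above works.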
Sorry for bothering you, but how do I construct a vector for each sentence? Maybe by summing the word vectors and then normalizing the result? Do I have to normalize it?
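One common baseline for this, and not necessarily the best one, is to average the word vectors of a sentence and scale the result to unit length. Here is a minimal sketch of that idea, assuming a plain dict of toy vectors in place of model.wv; sentence_vector and toy_vectors are hypothetical names for illustration only.

```python
import math


def sentence_vector(tokens, word_vectors):
    """Average the vectors of the known tokens, then scale to unit length.

    Returns None when none of the tokens has a vector.
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    mean = [sum(v[i] for v in known) / len(known) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    # Leave an all-zero mean untouched to avoid dividing by zero.
    return [x / norm for x in mean] if norm > 0 else mean


# Toy 2-dimensional vectors (assumption); with gensim you would look up model.wv[word].
toy_vectors = {
    "cute": [1.0, 0.0],
    "cat": [0.0, 1.0],
    "dog": [1.0, 1.0],
}

cute_cat = sentence_vector(["cute", "cat", "unseen"], toy_vectors)
nothing_known = sentence_vector(["unseen"], toy_vectors)
```

As for whether normalizing is required: cosine similarity ignores vector length, so summing, averaging and normalizing all give the same cosine scores. Normalizing only matters if you later compare sentences with a plain dot product or Euclidean distance.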