SoftCosineSimilarity broken in gensim==3.7 [AssertionError (WordEmbeddingSimilarityIndex problem)]


tedo.v...@gmail.com

Jan 24, 2019, 6:47:40 PM
to Gensim
I trained a w2v model on my own corpus and am trying to use it to calculate SoftCosineSimilarity.

Following https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/soft_cosine_tutorial.ipynb, when I execute
similarity_index = WordEmbeddingSimilarityIndex(w2v_model)

I get an AssertionError:

  File "/usr/local/lib/python2.7/dist-packages/gensim/models/keyedvectors.py", line 1389, in __init__
    assert isinstance(keyedvectors, WordEmbeddingsKeyedVectors)
AssertionError
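
The assertion on line 1389 checks the type of the argument, so one plausible cause is that a full Word2Vec model was passed where its vectors object is expected (w2v_model.wv, the same attribute used by the w2v_model.wv.similarity_matrix call later in this post). The sketch below reproduces that kind of check with stand-in classes. FakeKeyedVectors, FakeWord2Vec, FakeSimilarityIndex and try_build are hypothetical names for illustration, not gensim classes, and whether passing w2v_model.wv actually resolves the problem in gensim==3.7 is an assumption that has not been tested here.

```python
class FakeKeyedVectors(object):
    """Stand-in for the vectors container (hypothetical, not gensim's class)."""


class FakeWord2Vec(object):
    """Stand-in for a trained model that keeps its vectors under .wv."""

    def __init__(self):
        self.wv = FakeKeyedVectors()


class FakeSimilarityIndex(object):
    """Stand-in index that applies the same kind of isinstance check."""

    def __init__(self, keyedvectors):
        assert isinstance(keyedvectors, FakeKeyedVectors)
        self.keyedvectors = keyedvectors


def try_build(obj):
    """Return True if the index accepts obj, False on AssertionError."""
    try:
        FakeSimilarityIndex(obj)
        return True
    except AssertionError:
        return False


model = FakeWord2Vec()
model_accepted = try_build(model)       # whole model: rejected by the check
vectors_accepted = try_build(model.wv)  # vectors attribute: accepted
print(model_accepted, vectors_accepted)
```

If the same pattern applies to the real classes, the call to try would be WordEmbeddingSimilarityIndex(w2v_model.wv) instead of WordEmbeddingSimilarityIndex(w2v_model).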

This w2v model is 100% OK; I have been using it for Word2Vec-based Cosine Similarity, Word2Vec-based Word Mover's Distance and Word2Vec-based Euclidean Distance.
Something went wrong with gensim==3.7, as I said in one of my previous posts.

If I use the old code, which worked on gensim==3.6:
similarity_matrix = w2v_model.wv.similarity_matrix(dictionary)
it now gives me this error in gensim==3.7:
  File "/usr/local/lib/python2.7/dist-packages/gensim/utils.py", line 1447, in new_func1
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/gensim/models/keyedvectors.py", line 660, in similarity_matrix
    index, dictionary, tfidf=tfidf, nonzero_limit=nonzero_limit, dtype=dtype)
  File "/usr/local/lib/python2.7/dist-packages/gensim/similarities/termsim.py", line 234, in __init__
    for term, similarity in index.most_similar(t1, num_rows)
  File "/usr/local/lib/python2.7/dist-packages/gensim/models/keyedvectors.py", line 1401, in most_similar
    for t2, similarity in most_similar:
TypeError: 'numpy.float32' object is not iterable
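
The final frame unpacks each element of most_similar into a (term, similarity) pair, so the TypeError suggests that the underlying call returned a flat sequence of similarity scores rather than a list of pairs. The traceback does not establish why gensim==3.7 would return that shape. The sketch below only reproduces the failure mode, with plain Python floats standing in for numpy.float32 values; collect_pairs is a hypothetical helper, not a gensim function.

```python
def collect_pairs(most_similar):
    """Unpack (term, similarity) pairs, as the failing loop does."""
    return [(t2, similarity) for t2, similarity in most_similar]


# Expected shape: a list of (term, similarity) pairs.
pairs = collect_pairs([("cat", 0.8), ("dog", 0.7)])

# Problem shape: a flat sequence of scores; each float cannot be unpacked.
try:
    collect_pairs([0.8, 0.7])
    flat_failed = False
except TypeError:
    flat_failed = True

print(pairs, flat_failed)
```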

If I downgrade to gensim==3.6, the old code works just fine.

tedo.v...@gmail.com

Jan 24, 2019, 7:15:55 PM
to Gensim
It probably doesn't matter, but this is on Python 2.7.
Please check this bug.