SparseTermSimilarityMatrix error, what to do with it?


Tedo Vrbanec

Sep 12, 2024, 9:31:31 AM
to Gensim
I haven't used my code in a long time, and now executing
similarity_matrix = SparseTermSimilarityMatrix(similarity_index, dictionary)
gives me an error:

    similarity_matrix = SparseTermSimilarityMatrix(similarity_index, dictionary)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/gensim/similarities/termsim.py", line 513, in __init__
    source = _create_source(*args)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/gensim/similarities/termsim.py", line 281, in _create_source
    most_similar = [
                   ^
  File "/usr/local/lib/python3.11/dist-packages/gensim/similarities/termsim.py", line 281, in <listcomp>
    most_similar = [
                   ^
  File "/usr/local/lib/python3.11/dist-packages/gensim/similarities/termsim.py", line 157, in most_similar
    if t1 not in self.keyedvectors:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Tedo Vrbanec

Sep 13, 2024, 2:37:19 AM
to Gensim
The line should be fixed as:
if t1 in map(str, self.keyedvectors):

[thanks to ChatGPT ;)]

Gordon Mohr

Sep 13, 2024, 8:37:36 AM
to Gensim
That edit avoids an error; however, I would not have confidence that it results in the intended or proper behavior.

In particular, it converts what would normally be an efficient O(1) membership check in a dict into an O(n) scan of the list of keys, which could have a big performance impact. And in normal cases, both forms should give the same single True/False result.
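
To illustrate the performance point, here is a minimal sketch in plain Python. The vocabulary and both function names are hypothetical stand-ins, not gensim code; the only assumption is that the keys are plain strings held in a dict.

```python
# Hypothetical vocabulary standing in for the keys of a word-vector model.
key_to_index = {f"word{i}": i for i in range(100_000)}


def in_vocab_hashed(term):
    # Dict membership: a single hash lookup, independent of vocabulary size.
    return term in key_to_index


def in_vocab_scanned(term):
    # map(str, ...) yields the keys one at a time, so `in` has to compare
    # the term against each key until it finds a match or runs out of keys.
    return term in map(str, key_to_index)


# With plain string keys, both checks return the same single boolean.
hits_agree = in_vocab_hashed("word99999") == in_vocab_scanned("word99999")
misses_agree = in_vocab_hashed("acidic") == in_vocab_scanned("acidic")
```

The scanned version touches every key for a term that is missing, and the similarity matrix construction performs this check once per dictionary term, so the cost grows with both vocabulary sizes.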

By any chance are your 'terms' something other than plain strings, and if so, what?

Do you have any minimal standalone way to trigger the error you saw?

- Gordon

Tedo Vrbanec

Sep 13, 2024, 6:12:31 PM
to Gensim
@Gordon
I restored the original code and added two lines before the if:
print(t1, type(t1))
print(self.keyedvectors, type(self.keyedvectors))
I also made a very small corpus of three sentences in three txt files.
Here is the output of the print commands just before the code breaks with the error:
acidic <class 'str'>
[array([ 0.05108338,  0.06412067, -0.09670051,  0.00860585,  0.06627294]), array([0.09389341, 0.06865101, 0.04285159, 0.01561402, 0.04258162]), array([ 0.02128182, -0.03809803, -0.06132105,  0.05650516, -0.08720438]), array([-0.07764339, -0.01059977, -0.02333061, -0.07759179, -0.09299952]), array([-0.08960435,  0.0279422 , -0.00464952,  0.0628281 ,  0.09856929]), array([-0.07997459,  0.02466222, -0.01713089,  0.03780125,  0.07330118]), array([ 0.00114985,  0.08646952,  0.05045629, -0.05994698,  0.04498813]), array([ 0.03900765,  0.00702658, -0.08184136,  0.04308209, -0.06924964]), array([-0.07969793, -0.02431524, -0.07010775, -0.00599253, -0.05824611]), array([-0.02898957,  0.08358514, -0.06880702, -0.04688286,  0.07443116])] <class 'list'>

Tedo Vrbanec

Sep 13, 2024, 6:14:43 PM
to Gensim
And the code that triggers it is:
        dictionary = Dictionary(input_2D_list)
        reduced_list_bow_corpus = [dictionary.doc2bow(document) for document in reduced_list]
        similarity_index = WordEmbeddingSimilarityIndex(model_vectors)
        similarity_matrix = SparseTermSimilarityMatrix(similarity_index, dictionary)
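
One way to check what is being passed to WordEmbeddingSimilarityIndex before the matrix is built is a small duck-typing helper like the sketch below. The helper name and the FakeKeyedVectors stand-in are hypothetical; the check relies only on the key_to_index attribute and the most_similar method that gensim 4.x KeyedVectors objects expose. In gensim 4.x a trained Word2Vec model exposes such an object as model.wv; whether model_vectors was meant to be that object is an assumption, since the code that builds it is not shown here.

```python
def describe_vectors_argument(obj):
    """Return a short diagnosis of whether obj resembles gensim KeyedVectors."""
    if isinstance(obj, (list, tuple)):
        # A plain sequence has no string keys: `term in obj` compares the
        # term against every item instead of looking it up by key.
        return f"plain {type(obj).__name__} of {len(obj)} items"
    has_lookup = hasattr(obj, "key_to_index")
    has_query = callable(getattr(obj, "most_similar", None))
    if has_lookup and has_query:
        return "looks like KeyedVectors"
    return f"unexpected type {type(obj).__name__}"


class FakeKeyedVectors:
    """Stand-in with the two attributes checked above; not gensim's class."""

    def __init__(self, keys):
        self.key_to_index = {key: i for i, key in enumerate(keys)}

    def __contains__(self, key):
        # Membership by string key, as the failing line in termsim.py expects.
        return key in self.key_to_index

    def most_similar(self, positive=None, topn=10):
        return []


list_diagnosis = describe_vectors_argument([[0.05, 0.06], [0.09, 0.07]])
kv_diagnosis = describe_vectors_argument(FakeKeyedVectors(["acidic"]))
```

Calling describe_vectors_argument(model_vectors) right before the WordEmbeddingSimilarityIndex line would show whether a list of arrays or a KeyedVectors-like object is being handed over.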

Tedo Vrbanec

Sep 16, 2024, 7:29:27 PM
to Gensim
Should I provide more info / data to fix this issue?

Gordon Mohr

Sep 17, 2024, 2:44:45 PM
to Gensim
The extra context helps, but what would be best (easiest for examining the issue, confirming a problem, and perhaps finding the best fix) is a totally self-contained minimal case that triggers the same error. For example, code for a single cell in a code notebook that triggers the error. (If the data can be expressed inline, that's best; if small external files are required, please include representative minimal descriptions of those files as well.)

Thanks,

- Gordon

Tedo Vrbanec

Sep 22, 2024, 2:54:56 PM
to Gensim
I am not a developer. It is obvious that the code checks whether the term (a string) is in keyedvectors, which is a list of numpy arrays but should be a dictionary, or just the list of keys from that dictionary. Whoever programmed it should know what to do. I don't. Sorry.