How to append a list of String, Int and Float values using Numba


vivek sharma

May 25, 2021, 8:01:49 AM
to Numba Public Discussion - Public
I am using Numba to speed up the loop below. Without Numba it takes 135 sec to execute, and with Numba it takes 0.30 sec :) which is very fast.

In the loop below I am comparing each entry of the array against a threshold of 0.85. If the condition is True, I insert the data into the List that the function returns.

The data which is getting inserted into the List looks like this.

``` ['Source ID', 'Source TEXT', 'Similar ID', 'Similar TEXT', 'Score'] ```

```
import numba
import numpy as np
from numba.typed import List

idd = df['ID'].to_numpy()
txt = df['TEXT'].to_numpy()

Column = 'TEXT'
df = preprocessing(dataresult, Column)  # remove special characters from the 'TEXT' column
message_embeddings = model_url(np.array(df['DescriptionNew']))  # pass df to the Universal Sentence Encoder model to create sentence embeddings
cos_sim = cosine_similarity(message_embeddings)  # len(cos_sim) > 8000

# The function below finds duplicates among rows.
@numba.jit(nopython=True)
def similarity(nid, txxt, cos_sim, threshold):

  numba_list = List()
  for i in range(cos_sim.shape[0]):
    for index in range(i, cos_sim.shape[1]):
      if (cos_sim[i][index] > threshold) & (i != index):
        numba_list.append([nid[i], nid[index], cos_sim[i][index]])  # either this works
        # numba_list.append([txxt[i], txxt[index]])  # or this works
        # numba_list.append([nid[i], txxt[i], nid[index], txxt[index], cos_sim[i][index]])  # I want this to work.

  return numba_list

print(similarity(idd, txt, cos_sim, 0.85))

```
In the above code, when appending to the List, either the numeric columns get appended or the text columns, but not both. I want all the columns, both numbers and text, to be inserted into ```numba_list```.

I am getting the error below:

```

1 frames
/usr/local/lib/python3.7/dist-packages/numba/core/dispatcher.py in error_rewrite(e, issue_type)
    359                 raise e
    360             else:
--> 361                 raise e.with_traceback(None)
    362 
    363         argtypes = []

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Poison type used in arguments; got Poison<LiteralList((int64, [unichr x 12], int64, [unichr x 12], float32))>
During: resolving callee type: BoundFunction((<class 'numba.core.types.containers.ListType'>, 'append') for ListType[undefined])
During: typing of call at <ipython-input-179-6ee851edb6b1> (14)


File "<ipython-input-179-6ee851edb6b1>", line 14:
def zero(nid, txxt, cos_sim, threshold):
    <source elided>
        # print(i+1)
        numba_list.append([nid[i], txxt[i], nid[index], txxt[index], cos_sim[i][index]])
        ^
```

Stanley Seibert

May 25, 2021, 10:11:56 AM
to Numba Public Discussion - Public
Hi Vivek!

This mailing list is being slowly deprecated in favor of our Discourse forum.  If you ask your question in the "How do I?" section, you'll get more visibility:


That said, the problem you are running into is that Numba lists are fast because they have only one element type.  If you don't declare an element type for the List, it will be the type of the first element you append, and then cannot change after that point.  People in the Discourse forum might have some ideas for how to work around that for your use case.

