FileNotFoundError when loading trained index under Similarity API

Johnny Lu

Sep 30, 2021, 9:50:09 AM

to Gensim

Hi:

I have trained a similarity index via Similarity API:

```py
from gensim.similarities import Similarity
from gensim.test.utils import get_tmpfile

# df, LsiCorpus and lsi are defined elsewhere in my code.
positive_index = df.loc[df.label == 1].index
posi_corpus = LsiCorpus(index_list=positive_index)

index_tmpfile = get_tmpfile('index')
sim_index_posi = Similarity(
    output_prefix=index_tmpfile,
    corpus=posi_corpus,
    num_features=lsi.num_topics,
    num_best=5
)
sim_index_posi.save('.....')
```

When I load it:

```py
from gensim.similarities import Similarity

pos_sim_index = Similarity.load('......')
```

and try to compute a new sample's similarity:

```py
pos_sim_index[lsi_vector]
```

it gives me this error:

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp8beo5bla/index.0'

May I know how to troubleshoot this?

Thanks

//JL

Radim Řehůřek

Oct 1, 2021, 3:15:13 AM

to Gensim

Hi Johnny,

you have set output_prefix=get_tmpfile('index'). That means the shard files of your Similarity index are stored under temp, as corroborated by the path in your error message: '/tmp/tmp8beo5bla/index.0'.

My guess is you created your Similarity index, then wiped temp somehow (e.g. by restarting your machine?), then tried to load Similarity and got the error.
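One way to avoid depending on temp is to point output_prefix at a directory you control and save the index object next to its shards. A minimal sketch, assuming a layout of my own choosing: `index_paths`, `build_and_save` and the `index.sim` filename are hypothetical names for illustration, not part of Gensim.

```python
import os


def index_paths(index_dir):
    """Return (output_prefix, saved_index_path) inside a persistent directory.

    Both file names are illustrative assumptions; what matters is that the
    shard files and the saved index object still exist at load time.
    """
    output_prefix = os.path.join(index_dir, "index")    # shards: index.0, index.1, ...
    saved_index = os.path.join(index_dir, "index.sim")  # the saved Similarity object
    return output_prefix, saved_index


def build_and_save(corpus, num_features, index_dir, num_best=5):
    """Build a Similarity index whose shards live in index_dir, then save it."""
    # Imported lazily so the path helper above is usable without Gensim installed.
    from gensim.similarities import Similarity

    os.makedirs(index_dir, exist_ok=True)
    output_prefix, saved_index = index_paths(index_dir)
    index = Similarity(
        output_prefix=output_prefix,
        corpus=corpus,
        num_features=num_features,
        num_best=num_best,
    )
    index.save(saved_index)
    return saved_index
```

With your variables this would be called as `build_and_save(posi_corpus, lsi.num_topics, some_directory)`, and `Similarity.load()` on the returned path should then find the shards as long as that directory stays where it is. If the directory is copied to a different path, the shard locations stored in the index may need updating; check the Similarity documentation for your Gensim version.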

What files are physically there, under /tmp/tmp8beo5bla ?
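A quick way to check is to list everything that starts with the shard prefix. A small standard-library sketch; `list_shard_files` is a hypothetical helper name, and the prefix is the one from your error message without the trailing `.0`:

```python
import glob
import os


def list_shard_files(output_prefix):
    """Return the sorted file paths that start with '<output_prefix>.'."""
    # glob.escape() guards against special characters in the directory name.
    pattern = glob.escape(output_prefix) + ".*"
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))
```

For example, `list_shard_files('/tmp/tmp8beo5bla/index')` returning an empty list would mean the shard files are no longer there.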

-rr

Johnny Lu

Oct 6, 2021, 3:43:20 AM

to Gensim

Hi rr:

My original intention is portability: I want to build the index, then move it to the production site to compute new similarities.

I have switched to the `MatrixSimilarity` API to get the index.

Sorry, I am new to Gensim. At the beginning I was surprised that the index was so small, but I found out that the `Similarity` API requires the temp files to come along, and `MatrixSimilarity` requires the `.npy` matrix to come along. My teammates are concerned about the file size, as the production model needs to be deployed to IoT devices with much less RAM. Well, still in debate.

So, there's no way to just use the index without the matrix file, right?

Thanks

//Johnny

Radim Řehůřek

Oct 6, 2021, 11:05:54 AM

to Gensim

Correct – these similarity indexes store the full data points. So that will use a lot of memory – both disk and RAM.
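For a rough sense of scale, a dense index holds one value per document per feature. A back-of-the-envelope sketch, assuming 4-byte float32 values and ignoring any file or object overhead; `dense_index_bytes` is a hypothetical helper and the document count below is made up for illustration:

```python
def dense_index_bytes(num_docs, num_features, bytes_per_value=4):
    """Estimate the size of a dense num_docs x num_features matrix in bytes."""
    return num_docs * num_features * bytes_per_value


# Illustrative numbers only: 1,000,000 documents with 200 LSI topics.
size = dense_index_bytes(1_000_000, 200)
print(f"{size / 1e6:.0f} MB")  # 800 MB
```

Plugging in your own corpus size and lsi.num_topics gives a first estimate of what the target device would have to hold.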

For more lightweight indexing, check out Annoy:

https://radimrehurek.com/gensim/similarities/annoy.html

-rr
