How to save the model loaded from gensim.downloader


Quantum Dreamer

Jun 11, 2019, 5:31:17 PM
to Gensim
I use
fasttext_model300 = api.load('fasttext-wiki-news-subwords-300') to load the pretrained fastText model. However, every time I run this code I spend time loading the model. Is there a way to save the model locally and load it back later?

Michael Penkov

Jun 12, 2019, 8:54:07 PM
to Gensim
Could you please clarify what you mean by "spend time on loading the model"? Do you mean the api.load call downloads the model from the network each time (and if so, how can you tell)? If yes, then it sounds like a bug, and we should look into it. Otherwise, read on.

On my machine, the following script takes a few minutes to run the first time, and I see a progress bar for the download. It then takes a few minutes (around 5) to actually load the model from disk into memory.

```python
import gensim.downloader as api
import logging
logging.basicConfig(level=logging.INFO)

fasttext_model300 = api.load('fasttext-wiki-news-subwords-300')
print(fasttext_model300)
```

When I run the same script subsequently, it takes less time: it skips the download because the file is already available locally. The loading-into-memory step is unavoidable, though, and that still takes a few minutes for this model (around 1 GB).

If you want even faster load times, you can try this:

1) load the file that's already locally stored in ~/gensim-data (using api.load), then
2) save it with Gensim's native .save(), then
3) keep loading it back with .load(mmap='r') for faster load times.

Please let me know if that helps.

Sai Srujan

May 4, 2020, 12:18:01 PM
to Gensim
I faced this problem too while working with pre-trained models. The code below worked for me on GloVe data.

Part 1: the first run takes some time to load:

```python
import gensim.downloader as api

modell = api.load('glove-wiki-gigaword-50')
modell.save('fstwk.d2v')
```

Part 2: after that, your model is saved as "fstwk.d2v" in your working directory, and you can load it directly:

```python
from gensim.models import KeyedVectors

modell = KeyedVectors.load('fstwk.d2v')
```

Note: remove the part 1 code from subsequent executions (i.e. from the second run onward).

I hope this works with your fastText data as well.