Very slow speed loading vector model

Haha

Apr 4, 2015, 10:32:45 PM
to gen...@googlegroups.com
Hi,
I'm using 'gensim' to calculate the cosine similarity between two words. I have a pre-trained model in binary format, 4.5 GB in size, which I try to load into Python using 'Word2Vec.load_word2vec_format'. But it takes a very long time, maybe more than 30 minutes. I need to use that model to calculate cosine similarities between many words in a text corpus, and I don't want to spend that much time loading the model each time I run my program. Is there a better way to do that?
Thanks.

Radim Řehůřek

Apr 5, 2015, 5:38:39 AM
to gen...@googlegroups.com
Hello Haha,

Loading 4.5GB from the C format can take a while, but 30 minutes sounds excessive. Can you post your log (at DEBUG level), just to check everything's OK?

A faster way is to use gensim's save/load functionality. It uses a binary format and supports memory-mapping the large arrays into virtual memory directly, so loading should be pretty much instant.

Best,
Radim
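
For readers following along, here is a minimal sketch of one way to get a DEBUG-level log using Python's standard `logging` module. The format string is just a common choice, and the assumption that gensim's loggers are named under `gensim` reflects my understanding of how the library names them, not something stated in this thread.

```python
import logging

# Send log records from all libraries to stderr at DEBUG level.
# force=True (Python 3.8+) replaces any handlers that were configured earlier.
logging.basicConfig(
    format="%(asctime)s : %(levelname)s : %(message)s",
    level=logging.DEBUG,
    force=True,
)

# Assumption: gensim modules log under names starting with "gensim".
gensim_logger = logging.getLogger("gensim")
gensim_logger.debug("DEBUG logging is enabled")
```

Run this before loading the model so that the load itself is captured in the log.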

Haha

Apr 5, 2015, 8:45:49 PM
to gen...@googlegroups.com
Hi Radim,
Thanks for your reply. The model file is in 'bin' format, i.e. a binary-format model, so I think I can only use Word2Vec.load_word2vec_format to load it from disk into memory. It's just one line of code: model = Word2Vec.load(model_name). And the program starts running. Actually, it takes much longer than 30 minutes, maybe 2 hours, so I wonder if something is wrong. I use OS X 10.8.3 and have never run into this before.

Radim Řehůřek

Apr 6, 2015, 4:00:54 AM
to gen...@googlegroups.com
You can try loading the model with `load_word2vec_format()` and then saving in normal format with `save()` (and then keep loading with `load(mmap='r')` ).

Let me know if that helped,
Radim

Haha

Apr 6, 2015, 5:36:41 PM
to gen...@googlegroups.com
Hi Radim,
Is this code correct?
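from gensim.models import Word2Vec

# One-time conversion: load the C binary format, normalize in place, save in gensim's own format
model = Word2Vec.load_word2vec_format('wikipedia-pubmed-and-PMC-w2v.bin', binary=True)
model.init_sims(replace=True)
model.save('bio_word')

# Later runs: load the saved model with memory-mapping
model = Word2Vec.load('bio_word', mmap='r')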

I saw in the documentation that 'init_sims' can save a lot of memory. So if I call this function first, then save the model, and later load it using mmap='r', will that save a lot of time?
Thanks a lot.
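
As background to the question above: to my understanding, `init_sims(replace=True)` overwrites the raw vectors with their unit-length (L2-normalized) versions instead of keeping both copies, which is where the memory saving comes from. After normalization, cosine similarity reduces to a plain dot product. The sketch below illustrates that idea with toy vectors in pure Python. The helper names are hypothetical and this is not gensim's implementation.

```python
import math


def l2_norm(vec):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def unit_normalize(vec):
    """Scale a vector to length 1."""
    norm = l2_norm(vec)
    return [x / norm for x in vec]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine similarity computed from raw (un-normalized) vectors."""
    return dot(a, b) / (l2_norm(a) * l2_norm(b))


# Toy "word vectors" (made-up values, for illustration only).
raw_a = [3.0, 4.0]
raw_b = [4.0, 3.0]

# Normalize once up front, as a normalize-in-place step would.
unit_a = unit_normalize(raw_a)
unit_b = unit_normalize(raw_b)

from_raw = cosine_similarity(raw_a, raw_b)
from_unit = dot(unit_a, unit_b)  # no norms needed at query time
```

Both `from_raw` and `from_unit` come out at roughly 0.96 here. The trade-off is that the original vector lengths are gone once the raw vectors have been replaced.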

Radim Řehůřek

Apr 7, 2015, 5:06:02 AM
to gen...@googlegroups.com
Yes, I think so.

Let us know how it went.

Radim

Haha

Apr 9, 2015, 6:27:24 PM
to gen...@googlegroups.com
Hi Radim,
It seems that for that code to work, the model must first be loaded into memory at least once, which takes an extremely long time, maybe several hours. Maybe the only way is to switch to another computer with more memory, maybe 16GB.

Haha

Apr 9, 2015, 10:29:01 PM
to gen...@googlegroups.com
Hi Radim,
I loaded the 4.5GB binary model and wrote out a normal model on a machine with more memory (I called model.init_sims(replace=True) before writing to file). Then I loaded the resulting model, which is now 355.4 MB, using mmap='r'. This time it took around 9 minutes.

Radim Řehůřek

Apr 10, 2015, 5:42:57 AM
to gen...@googlegroups.com
Hello,

mmap should be pretty much instant: it only loads the pages into RAM on demand, when they are accessed.

If it takes your computer 9 minutes to load 355MB from disk (355MB = the other, non-mmapped pickle structures, I assume), there's probably something wrong with the disk.

Maybe if you post a full log, at DEBUG level, it would be clearer what's happening.

Best,
Radim
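
To illustrate the on-demand behaviour described above, here is a minimal sketch using Python's standard `mmap` module on a toy binary file of float32 rows. This demonstrates the operating-system mechanism only. It is not gensim's loading code, and the file layout, sizes, and helper name are made up for the example.

```python
import mmap
import os
import struct
import tempfile

FLOAT_SIZE = 4      # bytes per float32
DIM = 4             # toy vector dimensionality (hypothetical)
NUM_VECTORS = 1000  # toy vocabulary size (hypothetical)

# Write a toy "vectors" file: row i holds DIM copies of float(i).
path = os.path.join(tempfile.mkdtemp(), "toy_vectors.bin")
with open(path, "wb") as f:
    for i in range(NUM_VECTORS):
        f.write(struct.pack(f"<{DIM}f", *([float(i)] * DIM)))


def read_row(mapped, index):
    """Decode one row straight from the mapping, touching only that row's bytes."""
    offset = index * DIM * FLOAT_SIZE
    return struct.unpack_from(f"<{DIM}f", mapped, offset)


with open(path, "rb") as f:
    # Creating the mapping does not read the file into memory;
    # the OS pages data in when a region is actually accessed.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        row = read_row(mapped, 742)
```

Because only the accessed region has to be read, opening the mapping takes about the same time regardless of file size. Anything stored outside the mapped arrays still has to be read in full.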

ankush babbar

Jun 11, 2017, 3:38:46 AM
to gensim
I tried doing this. It gives the following error:
AttributeError: 'KeyedVectors' object has no attribute 'negative'

Gordon Mohr

Jun 11, 2017, 3:28:04 PM
to gensim
Do you get this error when loading, or when trying something later? (What did you try, what is the full error stack shown?)

- Gordon

Alok Nayak

Dec 18, 2018, 5:18:37 AM
to Gensim
Try loading with KeyedVectors.load, e.g.
from gensim.models.keyedvectors import KeyedVectors
model = KeyedVectors.load('GoogleNews-vectors-gensim-normed.bin', mmap='r')
instead of 
from gensim.models import Word2Vec
model = Word2Vec.load('GoogleNews-vectors-gensim-normed.bin', mmap='r')