FastText zip


Ernst

Jun 28, 2017, 12:53:33 PM
to gensim

Hello,


thanks a lot for creating Gensim!


Just a question: is it possible to read the fastText zip model into gensim and create vectors for unknown words?

I tried to load an uncompressed fastText model, but it uses a lot of RAM (about 20 GB), and I need to have models for at least two languages in memory simultaneously...


Thanks,
Ernst

Ivan Menshikh

Jul 7, 2017, 2:13:07 AM
to gensim, er.pra...@gmail.com, jayantj...@gmail.com
Hi Ernst,
I hope Prakhar and Jayant can help you.

jayant jain

Jul 7, 2017, 9:06:22 AM
to gensim
Hi Ernst,

No, unfortunately it isn't possible to load a zipped model into gensim.
Even if it were possible, it wouldn't reduce the steady-state memory usage: once loaded, the final weight matrices are the same size whether the file on disk was compressed or uncompressed.
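
A back-of-the-envelope sketch of the point about steady-state memory: the loaded weight matrices take rows × dimensions × bytes per value, no matter how the file was stored on disk. The vocabulary size below is an assumed, illustrative figure and not a number from this thread; 2,000,000 is fastText's default bucket count for character n-grams, and 300 dimensions with 32-bit floats matches the published pretrained vectors.

```python
def weight_matrix_bytes(n_rows, dim, bytes_per_value=4):
    """Size in bytes of a dense n_rows x dim matrix (float32 by default)."""
    return n_rows * dim * bytes_per_value

vocab_size = 2_500_000  # ASSUMPTION: illustrative vocabulary size
bucket = 2_000_000      # fastText default number of n-gram hash buckets
dim = 300               # dimensionality of the pretrained vectors

# Word vectors and n-gram vectors are both dense float32 matrices.
total = weight_matrix_bytes(vocab_size + bucket, dim)
print(f"{total / 1e9:.1f} GB")
```

Zipping the file changes none of these factors, so only a smaller representation (fewer rows, fewer dimensions, or fewer bytes per value) lowers the in-memory footprint.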

In the future (suggested by Lev), it might be a useful feature to have mmap-ed (memory-mapped) numpy arrays for the weights.
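
For illustration only, memory-mapping a weight matrix with plain NumPy could look roughly like this. It is a sketch of the general technique, not gensim's loading code; the array shape and the file name `weights.npy` are made up for the example.

```python
import os
import tempfile

import numpy as np

# Stand-in for a trained weight matrix (hypothetical shape).
rng = np.random.default_rng(0)
weights = rng.random((1000, 50), dtype=np.float32)

# Save once in .npy format.
path = os.path.join(tempfile.mkdtemp(), "weights.npy")
np.save(path, weights)

# Re-open read-only as a memory map: the data stays on disk and
# rows are paged in only when they are accessed.
mapped = np.load(path, mmap_mode="r")

# Copy a single row into ordinary memory.
row = np.asarray(mapped[42])
```

With a read-only map, only the pages that are actually touched get read from disk, and several processes mapping the same file can share them.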

20 GB of memory usage sounds quite high, though. Is this number from one of the pretrained FastText models?

Emil Stenström

Dec 30, 2017, 2:17:21 AM
to gensim
It seems https://pypi.python.org/pypi/pyfasttext has support for loading the quantized models. As these are not only zipped but also made much smaller than the original version, memory usage should definitely be lower. The idea is to allow using fastText on mobile: https://fasttext.cc/blog/2017/05/02/blog-post.html

Is support for this format something that gensim would accept as a PR?

/Emil

Radim Řehůřek

Dec 30, 2017, 4:33:05 AM
to gensim
Absolutely!

Working with "full" FastText is a pain in the ass, because of the huge sizes.

This will be a very interesting and desirable PR.

Cheers,
Radim

Emil Stenström

Jan 1, 2018, 5:11:21 AM
to gensim
I've looked at this further. Pyfasttext just calls out to the fastText C++ code, which does the model loading. I was hoping that I could reuse Python code from pyfasttext, but this is not the case. My C++ is extremely rusty, so this will be way over my head. So I'm afraid I have to leave this to someone more skilled in the intricacies of FTZ files.

Ivan Menshikh

Jan 7, 2018, 11:14:28 PM
to gensim
Hi Emil,

You can try to "rewrite" the needed C++ functions in Python and create a PR. Don't be afraid, the community will try to help you.

Emil Stenström

Jan 8, 2018, 3:33:17 AM
to gen...@googlegroups.com
Thanks for the encouragement. In this case, it's more about having the needed time available, and the skills to get it right, than the courage :)

--
Emil Stenström