Fasttext wrapper in gensim

ali alk
Aug 15, 2017, 4:45:11 AM
to gensim
Hello

I would like to use gensim.models.wrappers.fasttext on a large corpus.txt to get word embeddings, but I don't really understand how to get it running.

from gensim.models.wrappers import fasttext
model = fasttext.FastText.train(ft_path,...,...,...)

I do not understand what `ft_path` is. The documentation says "`ft_path` is the path to the FastText executable, e.g. `/home/kofola/fastText/fasttext`."
But what exactly is that?


Thank you in advance

Ivan Menshikh
Aug 15, 2017, 7:59:21 AM
to gensim
Hi,

For now, gensim only has a "wrapper" around the original fastText code.
First, you need to install the original fastText from Facebook:

git clone https://github.com/facebookresearch/fastText.git
cd fastText
make

After that, find the path to the resulting `fasttext` binary and pass it as `ft_path`.
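In other words, `ft_path` is just the filesystem path to the `fasttext` executable that `make` produces inside the cloned `fastText` directory. As a minimal sketch (the helper names `resolve_ft_path` and `skipgram_command` are hypothetical, not part of gensim), you can check that the path really points at an executable file and, if you like, train directly with the binary's `skipgram` mode (`fasttext skipgram -input <corpus> -output <prefix>`). This is not necessarily the exact command line the gensim wrapper builds.

```python
import os
import shutil
from typing import List, Optional


def resolve_ft_path(ft_path: Optional[str] = None) -> str:
    """Return an absolute path to the fastText executable.

    Hypothetical helper: uses `ft_path` if given, otherwise looks for a
    `fasttext` binary on the PATH.
    """
    candidate = ft_path if ft_path is not None else shutil.which("fasttext")
    if candidate is None:
        raise FileNotFoundError("no fasttext binary found; build it with `make` first")
    candidate = os.path.abspath(os.path.expanduser(candidate))
    # The path must name a regular file that the current user may execute.
    if not (os.path.isfile(candidate) and os.access(candidate, os.X_OK)):
        raise FileNotFoundError(f"{candidate} is not an executable file")
    return candidate


def skipgram_command(ft_path: str, corpus_file: str, output_prefix: str,
                     dim: int = 100) -> List[str]:
    """Build the argv for training skipgram vectors with the fastText CLI."""
    return [ft_path, "skipgram",
            "-input", corpus_file,
            "-output", output_prefix,
            "-dim", str(dim)]
```

For example, assuming the repository was cloned into your home directory, `subprocess.run(skipgram_command(resolve_ft_path("~/fastText/fasttext"), "corpus.txt", "model"), check=True)` would train on `corpus.txt` and write `model.bin` and `model.vec`.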

ali alk
Aug 15, 2017, 8:00:57 PM
to gensim
Thank you for the reply, Ivan.
But I still have a problem compiling fastText. After cloning the git repository and running make, I get this error:


alkha@LAPTOP-07KF9VT1 MINGW64 ~
$ cd fasttext
alkha@LAPTOP-07KF9VT1 MINGW64 ~/fasttext (master)
$ make
c++ -pthread -std=c++0x -O3 -funroll-loops -c src/fasttext.cc
src/fasttext.cc: In member function 'void fasttext::FastText::quantize(std::shared_ptr<fasttext::Args>)':
src/fasttext.cc:237:19: error: 'thread' is not a member of 'std'
       std::vector<std::thread> threads;
                   ^~~
src/fasttext.cc:237:19: error: 'thread' is not a member of 'std'
src/fasttext.cc:237:30: error: template argument 1 is invalid
       std::vector<std::thread> threads;
                              ^
src/fasttext.cc:237:30: error: template argument 2 is invalid
src/fasttext.cc:239:17: error: request for member 'push_back' in 'threads', which is of non-class type 'int'
         threads.push_back(std::thread([=]() { trainThread(i); }));
                 ^~~~~~~~~
src/fasttext.cc:239:27: error: 'thread' is not a member of 'std'
         threads.push_back(std::thread([=]() { trainThread(i); }));
                           ^~~
src/fasttext.cc:241:30: error: request for member 'begin' in 'threads', which is of non-class type 'int'
       for (auto it = threads.begin(); it != threads.end(); ++it) {
                              ^~~~~
src/fasttext.cc:241:53: error: request for member 'end' in 'threads', which is of non-class type 'int'
       for (auto it = threads.begin(); it != threads.end(); ++it) {
                                                     ^~~
src/fasttext.cc: In member function 'void fasttext::FastText::train(std::shared_ptr<fasttext::Args>)':
src/fasttext.cc:631:17: error: 'thread' is not a member of 'std'
     std::vector<std::thread> threads;
                 ^~~
src/fasttext.cc:631:17: error: 'thread' is not a member of 'std'
src/fasttext.cc:631:28: error: template argument 1 is invalid
     std::vector<std::thread> threads;
                            ^
src/fasttext.cc:631:28: error: template argument 2 is invalid
src/fasttext.cc:633:15: error: request for member 'push_back' in 'threads', which is of non-class type 'int'
       threads.push_back(std::thread([=]() { trainThread(i); }));
               ^~~~~~~~~
src/fasttext.cc:633:25: error: 'thread' is not a member of 'std'
       threads.push_back(std::thread([=]() { trainThread(i); }));
                         ^~~
src/fasttext.cc:635:28: error: request for member 'begin' in 'threads', which is of non-class type 'int'
     for (auto it = threads.begin(); it != threads.end(); ++it) {
                            ^~~~~
src/fasttext.cc:635:51: error: request for member 'end' in 'threads', which is of non-class type 'int'
     for (auto it = threads.begin(); it != threads.end(); ++it) {
                                                   ^~~
make: *** [Makefile:46: fasttext.o] Error 1

Here is the information about gcc:

$ gcc -v
Using built-in specs.
COLLECT_GCC=C:\MinGW\bin\gcc.exe
COLLECT_LTO_WRAPPER=c:/mingw/bin/../libexec/gcc/mingw32/6.3.0/lto-wrapper.exe
Target: mingw32
Configured with: ../src/gcc-6.3.0/configure --build=x86_64-pc-linux-gnu --host=mingw32 --target=mingw32 --with-gmp=/mingw --with-mpfr --with-mpc=/mingw --with-isl=/mingw --prefix=/mingw --disable-win32-registry --with-arch=i586 --with-tune=generic --enable-languages=c,c++,objc,obj-c++,fortran,ada --with-pkgversion='MinGW.org GCC-6.3.0-1' --enable-static --enable-shared --enable-threads --with-dwarf2 --disable-sjlj-exceptions --enable-version-specific-runtime-libs --with-libiconv-prefix=/mingw --with-libintl-prefix=/mingw --enable-libstdcxx-debug --enable-libgomp --disable-libvtv --enable-nls
Thread model: win32
gcc version 6.3.0 (MinGW.org GCC-6.3.0-1)

Any help?

Thank you

Ivan Menshikh
Aug 17, 2017, 5:15:25 AM
to gensim
I just checked, and it builds fine (current commit: 6b74dfc3997593ec2e3c376f6510e990871acda7), but I built it on Linux.
As far as I know, fastText supports Linux and macOS.
I have heard of an unofficial fork that includes a Windows binary; you could try it.
You could also raise Windows support in the official fastText repository.
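The `gcc -v` output above points at the likely cause. As far as I know, with the `win32` thread model that this MinGW.org GCC 6.3.0 toolchain reports, libstdc++ does not provide `std::thread`, which `src/fasttext.cc` uses to run its training threads; toolchains reporting `Thread model: posix` (typical Linux GCC, macOS compilers, or MinGW-w64 builds configured with POSIX threads) do. A small sketch for checking a compiler before running `make` (the helpers `thread_model` and `gcc_v` are hypothetical):

```python
import re
import subprocess
from typing import Optional


def thread_model(gcc_v_output: str) -> Optional[str]:
    """Return the value of the 'Thread model:' line in `gcc -v` output, if any."""
    match = re.search(r"^Thread model:\s*(\S+)", gcc_v_output, re.MULTILINE)
    return match.group(1) if match else None


def gcc_v(compiler: str = "gcc") -> str:
    """Run `<compiler> -v` and return its output; GCC prints this on stderr."""
    result = subprocess.run([compiler, "-v"], capture_output=True, text=True)
    return result.stderr
```

For the compiler shown above, `thread_model(gcc_v())` would return `'win32'`; a typical Linux GCC reports `'posix'`.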

ali alk
Aug 17, 2017, 3:21:22 PM
to gensim
Thank you, Ivan.
I hope official Windows support for fastText is added soon.

Ivan Menshikh
Aug 24, 2017, 1:41:38 AM
to gensim
Another important thing: one of our students is currently implementing a custom fastText in gensim that doesn't need the binary from Facebook's fastText. You can track progress in #1525.
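For context, the core idea such an implementation has to reproduce is fastText's subword model: each word is represented by a vector for the word itself plus vectors for its character n-grams, taken after wrapping the word in `<` and `>` boundary markers, with n from 3 to 6 by default (the CLI's `-minn` and `-maxn`). A minimal sketch of only the n-gram extraction step (the function `char_ngrams` is hypothetical, not gensim's or Facebook's code):

```python
from typing import List


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """Character n-grams of `word` wrapped in '<' and '>' boundary markers.

    Sketch only: it returns the n-gram strings themselves and omits the
    hashing of n-grams into a fixed number of buckets that fastText uses.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        # Slide a window of length n across the wrapped word.
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams
```

For example, `char_ngrams("where", 3, 3)` gives `['<wh', 'whe', 'her', 'ere', 're>']`. The real implementation hashes each n-gram into a fixed number of buckets (`-bucket`) rather than storing them by name, which keeps memory bounded on a large corpus.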