Gensim is very slow training word2vec on Ubuntu 14.04


Jeffery Yee

Sep 11, 2015, 2:34:30 PM
to gensim
Hi all.
Problem: gensim is very slow training word2vec on Ubuntu 14.04.

I tried to import the fast version in IPython, which was fine. And there are indeed multiple threads running, but these threads are only using around 5% CPU each. All of this leads to:

2015-09-11 11:18:26,559 : Thread-25 : INFO : PROGRESS: at 30.03% words, alpha 0.01749, 16277 words/s

This is too slow.

Could anyone suggest some things to check? I am running out of ideas. Thanks!

Additional debug info:

python -c 'import scipy; scipy.show_config()'
lapack_info:
    libraries = ['lapack']
    library_dirs = ['/usr/lib']
    language = f77
lapack_opt_info:
    libraries = ['lapack', 'blas']
    library_dirs = ['/usr/lib']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]
umfpack_info:
    libraries = ['umfpack', 'amd']
    library_dirs = ['/usr/lib/x86_64-linux-gnu']
    define_macros = [('SCIPY_UMFPACK_H', None), ('SCIPY_AMD_H', None)]
    swig_opts = ['-I/usr/include/suitesparse', '-I/usr/include/suitesparse']
    include_dirs = ['/usr/include/suitesparse']
blas_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    language = f77
atlas_threads_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
atlas_blas_threads_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
amd_info:
    libraries = ['amd']
    library_dirs = ['/usr/lib/x86_64-linux-gnu']
    define_macros = [('SCIPY_AMD_H', None)]
    swig_opts = ['-I/usr/include/suitesparse']
    include_dirs = ['/usr/include/suitesparse']
blas_opt_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]
atlas_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE

Gordon Mohr

Sep 11, 2015, 6:09:51 PM
to gensim
If you installed recently, you probably have gensim-0.12.1 in combination with scipy-0.16.0. Unfortunately that combination prevents the Cython-accelerated training routines from working. (There should be a log message, when training starts, about the problem.)

The easiest workaround for now is to force the use of scipy-0.15.1 as described at:


- Gordon

Zheng Ye

Sep 11, 2015, 7:21:19 PM
to gen...@googlegroups.com
This is what I am using, but I will look at the post first. Thanks.
In [1]: import gensim

In [2]: gensim.__version__
Out[2]: '0.11.1-1'

In [3]: import scipy

In [4]: scipy.__version__
Out[4]: '0.13.3'


---
Best Regards!

Zheng Ye (叶正)


Jeffery Yee

Sep 11, 2015, 8:03:12 PM
to gensim
BTW, I didn't see any warning message.

Gordon Mohr

Sep 12, 2015, 1:00:28 PM
to gensim
OK, with those versions, it's not the gensim-0.12.1 & scipy-0.16.0 incompatibility. But you probably still want to be using the latest (0.12.1) version of gensim.

Also, it's still likely that it's some issue with the C extensions. What's the value of `gensim.models.doc2vec.FAST_VERSION`?

If it's '-1', I suggest...

* uninstall gensim
* make sure the machine has the Ubuntu 'build-essential' and 'python-dev' packages
* install the latest gensim

...and watch very closely for any errors during that process.
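One quick way to check whether the compiled extensions are even importable, before doing a full reinstall (the `gensim.models.*_inner` module names here are assumptions based on the 0.12.x source layout):

```python
import importlib.util

def extension_present(dotted_name):
    """Return True if the named module can be located, False otherwise.

    Handy for checking whether gensim's Cython extensions (e.g.
    'gensim.models.word2vec_inner', a module name assumed from the
    0.12.x layout) actually compiled during installation.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        return False

# e.g. extension_present("gensim.models.word2vec_inner")
#      extension_present("gensim.models.doc2vec_inner")
```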

- Gordon

Zheng Ye

Sep 12, 2015, 1:42:21 PM
to gen...@googlegroups.com
Hi Gordon,
Thanks for your reply.

It returns 1.

In [2]: import gensim

In [3]: gensim.models.doc2vec.FAST_VERSION
Out[3]: 1


On Sat, Sep 12, 2015 at 1:00 PM, Gordon Mohr <goj...@gmail.com> wrote:
gensim.models.doc2vec.FAST_VERSION

Gordon Mohr

Sep 12, 2015, 5:20:49 PM
to gensim
That's a good sign, but please be sure you're using the latest gensim (0.12.1). With that, make sure FAST_VERSION is still not -1.

Then, when timing your progress, make sure there's no bottleneck due to slow IO (the source of the tokens) or any virtual-memory swapping (as might happen if you've specified a model that's larger than your RAM).

Finally, maybe the fast-mode is working, but you've chosen particularly challenging training parameters. So if there's still a problem after checking everything above, then let us know your Word2Vec initialization parameters.

- Gordon

Zheng Ye

Oct 1, 2015, 10:43:10 PM
to gen...@googlegroups.com
Hi Gordon,
    I got the latest gensim installed. See:
In [2]: gensim.__version__
Out[2]: '0.12.2'

In [3]: import scipy

In [4]: scipy.__version__
Out[4]: '0.16.0'

But the speed is still very slow. I checked that the fast version can be used; actually that was the first thing I checked. The problem is that on other machines everything is fine. The dependencies I can think of are exactly the same. On this slow machine, we do have 26 threads running, but all of them are using just 5%~10% CPU, while the other machines with acceptable speed use 100% CPU per thread.

  Initialization is as follows:

model = Doc2Vec(min_count=10, window=10, size=embedding_size, workers=26)
model.build_vocab(corpus)
for i in range(train_iter):
    model.train(corpus)
model.save("./doc2vec.model")

Below are some logs you might want to see:
[word2vec.py:1335 - estimate_memory() ] estimated required memory for 156372 words and 200 dimensions: 38246515600 bytes
[word2vec.py:463 - create_binary_tree() ] constructing a huffman tree from 156372 words
[word2vec.py:487 - create_binary_tree() ] built huffman tree with maximum node depth 26
[word2vec.py:927 - reset_weights() ] resetting layer weights
[word2vec.py:677 - train() ] training model with 26 workers on 156372 vocabulary and 200 features, using sg=0 hs=1 sample=0 and negative=0

[word2vec.py:778 - train() ] PROGRESS: at 30.81% examples, 43653 words/s
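(For context, the "maximum node depth 26" line above is the longest Huffman code length in the tree gensim builds for hierarchical softmax (hs=1): frequent words get short codes, rare words long ones. A toy sketch of how that number arises:)

```python
import heapq
import itertools

def huffman_max_depth(freqs):
    """Longest code length of a Huffman tree over the given word
    frequencies -- the figure gensim logs as 'maximum node depth'."""
    tie = itertools.count()  # tiebreaker so equal frequencies never compare payloads
    heap = [(f, next(tie), 0) for f in freqs]  # (subtree freq, tiebreak, subtree height)
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees makes a node one level above the taller one.
        heapq.heappush(heap, (f1 + f2, next(tie), max(d1, d2) + 1))
    return heap[0][2]

# Four equally frequent words -> a balanced tree of depth 2:
#   huffman_max_depth([1, 1, 1, 1]) == 2
# A heavily skewed (Zipf-like) distribution -> a deeper tree:
#   huffman_max_depth([1, 2, 4, 8, 16]) == 4
```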



On Sat, Sep 12, 2015 at 5:20 PM, Gordon Mohr <goj...@gmail.com> wrote:
maybe the fast-mode is working, but you've chosen particularly challenging training parameters. So if there's still a problem after checking everything above, then let us know your Word2Vec initialization parameters.



Zheng Ye

Oct 1, 2015, 10:46:34 PM
to gen...@googlegroups.com
BTW, I saw this in the group:
"""
 I think 'conda install' will still get an older gensim version; you should prefer the latest, 0.12.1, though see another thread for another potential C-extension-interfering problem that arises with just-released scipy 0.16.0.)
"""
https://groups.google.com/forum/#!searchin/gensim/speed/gensim/fENp96rHVhQ/BrNPlMyTBgAJ

---
Best Regards!

Zheng Ye (叶正)

Gordon Mohr

Oct 2, 2015, 2:15:35 AM
to gensim
If you've got gensim 0.12.2, then there should be no problem with scipy 0.16.0. If FAST_VERSION is not -1, then the optimized code is being used. 

Are you saying that on identical machines – same CPUs, same OS, same RAM, same disk, same python/gensim versions, same virtualization (if relevant), etc – one is processing the exact same dataset much faster? What is the "words/sec" rate on the faster machine?

Comparing the information in 'top' or other activity-tools, while both are processing the same data, might offer some clues as to why one is underperforming. (With other things ruled out, I would pay special attention to potential virtual-memory swapping or IO bottlenecks.)

It's normally optimal to choose a number of worker threads about equal to the number of CPU cores available... so a count of 26 workers is atypical. What motivated that choice?
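A common pattern is to derive the worker count from the machine's core count rather than hard-coding it, e.g.:

```python
import os

# One worker per core, minus one to leave headroom for other
# users of the machine; fall back to 1 if the count is unknown.
workers = max(1, (os.cpu_count() or 1) - 1)
```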

- Gordon

Zheng Ye

Oct 2, 2015, 11:21:33 AM
to gen...@googlegroups.com
The other machine is different from the slow one. Maybe the speeds are not comparable, because when I train on the other machine I use less data due to a memory limit. But all other machines (including my laptop) are fully using their CPUs, while the slow machine uses very, very little CPU according to htop.

The slow machine has 32 cores. The reason to use 26 is that I don't want to fully occupy the machine in case other people want to use it. I don't think this matters, because I also used the C version with 26 threads, which only takes 8 minutes to train exactly the same corpus. The slow machine took 30 hours, which obviously indicates a problem.
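(Back-of-envelope, using the times reported above for the same corpus and 26 threads:)

```python
# Times reported in the thread for the same corpus and thread count:
c_word2vec_minutes = 8        # original C word2vec
gensim_minutes = 30 * 60      # gensim run: 30 hours

slowdown = gensim_minutes / c_word2vec_minutes
print(slowdown)  # 225.0 -- far beyond any plausible Python-vs-C overhead,
                 # so something other than language speed must be wrong
```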

Thanks for your reply.

---
Best Regards!

Zheng Ye (叶正)


Jianbin Lin

Dec 14, 2015, 5:44:25 AM
to gensim
Hi, did you solve your problem? I encountered the same issue here: https://groups.google.com/forum/#!topic/gensim/7YgY9I9ywSQ

On Friday, October 2, 2015 at 11:21:33 PM UTC+8, Jeffery Yee wrote:

Zheng Ye

Jan 28, 2016, 4:15:59 PM
to gen...@googlegroups.com
No, I didn't.
But for some reason it only happens on some Ubuntu machines.
---
Best Regards!

Zheng Ye (叶正)

