Dave Challis
Jul 29, 2015, 5:35:19 AM
to gen...@googlegroups.com
Just wondering if anyone could share some of the training speeds
they've been seeing for doc2vec?
It seems fairly slow in my environment, so I just wanted to check whether
there might be something wrong with my setup, or whether it's the sort
of performance I should be expecting.
Running on an Ubuntu host with 16 cores and ~64GB RAM (with numpy
compiled against openblas), I'm running doc2vec with:
'iter': 10,
'size': 1000,
'alpha': 0.025,
'window': 8,
'min_count': 5,
'max_vocab_size': 2e8,
'sample': 0,
'seed': 1,
'min_alpha': 1e-4,
'dm': 1,
'hs': 1,
'negative': 0,
'dbow_words': 0,
'dm_mean': 0,
'dm_concat': 0,
'dm_tag_count': 1,
'workers': 1
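Concretely, those settings get passed to the Doc2Vec constructor roughly like this (a sketch; `corpus` is a stand-in for my actual TaggedDocument stream, and the call itself is commented out since the corpus isn't shown here):

```python
# Training parameters, as passed to gensim.models.doc2vec.Doc2Vec
params = {
    'iter': 10,
    'size': 1000,
    'alpha': 0.025,
    'window': 8,
    'min_count': 5,
    'max_vocab_size': int(2e8),
    'sample': 0,
    'seed': 1,
    'min_alpha': 1e-4,
    'dm': 1,
    'hs': 1,
    'negative': 0,
    'dbow_words': 0,
    'dm_mean': 0,
    'dm_concat': 0,
    'dm_tag_count': 1,
    'workers': 1,
}

# corpus is an iterable of gensim TaggedDocument objects (not shown here):
# from gensim.models.doc2vec import Doc2Vec
# model = Doc2Vec(documents=corpus, **params)
```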
Gensim's logs show training going at ~16500 words/s, e.g.:
INFO:gensim.models.word2vec:PROGRESS: at 52.00% examples, 16536 words/s
Increasing the above to 'workers=8' results in only a tiny speedup to
~18500 words/s, e.g.:
INFO:gensim.models.word2vec:PROGRESS: at 0.47% examples, 18310 words/s
So I was wondering, do these speeds seem about normal? Also, why
doesn't increasing the number of workers result in much performance
gain? (or is the words/s metric per worker rather than overall?)
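One thing I could try myself, in case it helps others answer: timing a bare pass over the corpus, to see whether iteration/tokenisation alone is the bottleneck rather than training (a sketch with an in-memory stand-in corpus; `words_per_sec` is my own helper, not a gensim function):

```python
import time
from collections import namedtuple

# Stand-in for gensim's TaggedDocument; only the .words field is used here.
Doc = namedtuple('Doc', ['words', 'tags'])

def words_per_sec(corpus):
    """Time a bare iteration over the corpus, with no training at all,
    to measure how fast the input stream alone can be consumed."""
    start = time.time()
    n_words = sum(len(doc.words) for doc in corpus)
    elapsed = time.time() - start
    return n_words / elapsed if elapsed > 0 else float('inf')

# Tiny in-memory example; a real corpus would stream from disk.
corpus = [Doc(words=['some', 'example', 'tokens'], tags=[str(i)])
          for i in range(1000)]
rate = words_per_sec(corpus)
```

If that bare rate is close to the ~16500 words/s gensim reports, the corpus iterator would be the limiting factor rather than the training maths.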
Thanks,
Dave