Accelerate via CUDA / GPU

Alex Lane

May 18, 2017, 3:00:25 PM
to gensim
Hey all,

I'm a bit lost with my first setup for training with gensim on my GPU. Googling and Stack Overflow turn up people asking about tensorflow-gpu, but not much on gensim. How can I use my NVIDIA GPU with gensim to improve training times?

Ivan Menshikh

May 19, 2017, 9:19:31 AM
to gensim
Hi Alex,

Alex Lane

May 19, 2017, 1:00:46 PM
to gensim
Thanks Ivan,

Alright, so gensim on CPU is the best we've got so far then?

Alex Lane

May 19, 2017, 3:17:38 PM
to gensim
I am trying to use Doc2Vec.

Ivan Menshikh

May 20, 2017, 3:46:51 AM
to gensim
Yep, you can read this benchmark

Alex Lane

May 21, 2017, 5:51:28 PM
to gensim
Would a benchmark on an EVGA 1080 Ti FTW3 help? I'm building a TensorFlow Word2Vec/Doc2Vec program, as that hasn't been tested.

Radim Řehůřek

May 22, 2017, 2:03:12 AM
to gensim, Lev Konstantinovskiy, ivan
Absolutely! Thanks for the offer Alex.

A proper, thorough benchmark on some proper HW would be awesome :)

Ivan, Lev -- can you please assist Alex there?

I'm thinking -- let's also throw our AWS and IBM SoftLayer machines at it, and compare CPU/GPU across these managed services as well (cost/benefit/perf on w2v/d2v). Should be an interesting data point for others.

Best,
Radim

Alex Lane

May 23, 2017, 2:33:02 PM
to gensim, l...@rare-technologies.com, mensh...@gmail.com
Thanks so much for all of your and your team's work, Radim. I have a simple question. I need to compare certain docvecs with each other. The tags look like tags=[HeaderKey + u'_' + str(idx)]. I need to compare docvecs that share the same HeaderKey with each other. If I can infer_vector() and get a similarity, that would also work, but it appears it's still a TODO?

Ivan Menshikh

May 23, 2017, 3:49:42 PM
to gensim, l...@rare-technologies.com, mensh...@gmail.com
You can check this tutorial to see how you can work with tags.

> I can infer_vector() and get a similarity, that would also work

You are right, this is a more intuitive and cleaner way.

Alex Lane

May 23, 2017, 11:07:05 PM
to gensim, l...@rare-technologies.com, mensh...@gmail.com
> I can infer_vector() and get a similarity, that would also work

So I can do that? It looks like it's still a TODO: model.similarity(model.infer_vector('sentence and words and stuff'), model.infer_vector('another sentence and words and stuff'))

Sorry, the tutorial didn't really have what I was looking for.

Gordon Mohr

May 24, 2017, 1:36:21 AM
to gensim, l...@rare-technologies.com, mensh...@gmail.com
You can `infer_vector()` on new texts – but they should be preprocessed/tokenized in the same way that texts were for training the model. (Don't pass strings, as in your post, but lists-of-tokens.) 

Also, note that better inference may be achieved with far more `steps` (an optional parameter to `infer_vector()`, default 5) or a lower starting `alpha` (default 0.1, but the same default as used in training, 0.025, may be more appropriate). 

- Gordon
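
A minimal sketch of the advice above, assuming `model` is an already trained Doc2Vec model (or any object with an `infer_vector()` method). The helpers `tokenize`, `cosine` and `inferred_similarity` are hypothetical names made up for illustration, and the whitespace tokenizer is only a stand-in for whatever preprocessing was used on the training texts.

```python
import math


def tokenize(text):
    # Stand-in preprocessing (assumption): lowercase and split on whitespace.
    # Replace this with the same tokenization used for the training corpus.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Similarity is undefined for a zero vector; report 0.
    return dot / (norm_a * norm_b)


def inferred_similarity(model, text_a, text_b, **infer_kwargs):
    # infer_vector() expects a list of tokens, not a raw string.
    # Extra keyword arguments (e.g. steps, alpha) are passed through untouched.
    vec_a = model.infer_vector(tokenize(text_a), **infer_kwargs)
    vec_b = model.infer_vector(tokenize(text_b), **infer_kwargs)
    return cosine(vec_a, vec_b)
```

With a real model this might be called as `inferred_similarity(model, text_a, text_b, steps=50, alpha=0.025)`, where `steps=50` is only an illustrative value. In newer gensim releases the `steps` keyword is named `epochs`. Because inference is randomized, repeated calls on the same text will generally give slightly different vectors.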

Alex Lane

May 24, 2017, 2:02:05 PM
to gensim, l...@rare-technologies.com, mensh...@gmail.com
Thanks Gordon! I'll tokenize and pass that, but my issue is that these documents have already been trained into the model. How do I grab a set of docvecs with similar tags?
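
One possible way to approach this, sketched under assumptions rather than taken from the thread: if the tags follow the `HeaderKey + '_' + str(idx)` pattern, they can be grouped by the part before the last underscore, and the trained vectors in each group compared pairwise. The function names here are hypothetical. `vector_for` is any callable mapping a tag to its vector; with a gensim model of that era, `lambda tag: model.docvecs[tag]` should serve (newer releases expose the same lookup as `model.dv`).

```python
import math
from collections import defaultdict
from itertools import combinations


def group_tags_by_header(tags):
    # Assumes tags look like "<HeaderKey>_<idx>"; splits on the LAST underscore,
    # so header keys may themselves contain underscores.
    # A tag with no underscore at all ends up under the empty-string header.
    groups = defaultdict(list)
    for tag in tags:
        header, _sep, _idx = tag.rpartition("_")
        groups[header].append(tag)
    return dict(groups)


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Similarity is undefined for a zero vector; report 0.
    return dot / (norm_a * norm_b)


def pairwise_similarities(tags, vector_for):
    # Compare every unordered pair of tags within one group.
    return {
        (a, b): cosine(vector_for(a), vector_for(b))
        for a, b in combinations(tags, 2)
    }
```

For example, `pairwise_similarities(group_tags_by_header(all_tags)['SomeHeader'], lambda t: model.docvecs[t])` would compare only the documents sharing that header, where `all_tags` is the list of tags used when the training documents were built and `'SomeHeader'` is a placeholder key.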