Accelerate via CUDA / GPU

Alex Lane

May 18, 2017, 3:00:25 PM

to gensim

Hey all,

I'm a bit lost with my first setup using my GPU to train with gensim. Googling and Stack Overflow turn up people looking for tensorflow-gpu, but not much on gensim. How can I use my NVIDIA GPU with gensim to improve training times?

Ivan Menshikh

May 19, 2017, 9:19:31 AM

to gensim

Hi Alex,

Please check the issue "Word2vec to use GPU" (#449) and the PR "[WIP] TensorFlow wrapper for using GPU" (#1033).

Alex Lane

May 19, 2017, 1:00:46 PM

to gensim

Thanks Ivan,

Alright, so gensim on CPU is the best we've got so far, then?

Alex Lane

May 19, 2017, 3:17:38 PM

to gensim

I am trying to use Doc2Vec.

Ivan Menshikh

May 20, 2017, 3:46:51 AM

to gensim

Yep, you can read this benchmark

Alex Lane

May 21, 2017, 5:51:28 PM

to gensim

Would a benchmark on an EVGA 1080 Ti FTW3 help? I'm building a TensorFlow Word2Vec/Doc2Vec program, as it hasn't been tested.

Radim Řehůřek

May 22, 2017, 2:03:12 AM

to gensim, Lev Konstantinovskiy, ivan

Absolutely! Thanks for the offer Alex.

A proper, thorough benchmark on some proper HW would be awesome :)

Ivan, Lev -- can you please assist Alex there?

I'm thinking -- let's also throw our AWS and IBM SoftLayer machines at it, and compare CPU/GPU across these managed services as well (cost/benefit/perf on w2v/d2v). Should be an interesting data point for others.

Best,

Radim

Alex Lane

May 23, 2017, 2:33:02 PM

to gensim, l...@rare-technologies.com, mensh...@gmail.com

Thanks so much for all of your and your team's work, Radim. I have a simple question. I need to compare certain docvecs with each other. The tags look like tags=[u'HeaderKey' + '_' + str(idx)]. I need to compare docvecs that share the same header key with each other. If I can infer_vector() and get a similarity, that would also work, but it appears it's still a TODO?

Ivan Menshikh

May 23, 2017, 3:49:42 PM

to gensim, l...@rare-technologies.com, mensh...@gmail.com

You can check this tutorial to see how you can work with tags.

> I can infer_vector() and get a similarity, that would also work

You are right, this is a more intuitive and cleaner way.

Alex Lane

May 23, 2017, 11:07:05 PM

to gensim, l...@rare-technologies.com, mensh...@gmail.com

> I can infer_vector() and get a similarity, that would also work

So I can do that? It looks like it's still a TODO: model.similarity(model.infer_vector('sentence and words and stuff'), model.infer_vector('another sentence and words and stuff'))

Sorry, the tutorial didn't really have what I was looking for.

Gordon Mohr

May 24, 2017, 1:36:21 AM

to gensim, l...@rare-technologies.com, mensh...@gmail.com

You can `infer_vector()` on new texts – but they should be preprocessed/tokenized in the same way that texts were for training the model. (Don't pass strings, as in your post, but lists-of-tokens.)

Also, note that better inference may be achieved with far more `steps` (an optional parameter to `infer_vector()`, default 5) or a lower starting `alpha` (default 0.1, but the same default as used in training, 0.025, may be more appropriate).
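
A minimal sketch of that comparison, assuming a trained Doc2Vec model is already available as `model` (it is not built here). `tokenize` is a hypothetical stand-in for whatever preprocessing was used at training time, and `cosine_similarity` is a plain-Python helper, not a gensim function. The `infer_vector()` calls appear only in comments, using the `steps` and `alpha` parameters named above; other gensim releases may name them differently.

```python
import math


def tokenize(text):
    """Hypothetical preprocessing: lowercase and split on whitespace.

    Assumption: replace this with the exact tokenization used for training.
    """
    return text.lower().split()


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # no direction to compare against
    return dot / (norm_a * norm_b)


# With a trained model (assumed, not constructed in this sketch):
#   vec_a = model.infer_vector(tokenize("sentence and words and stuff"),
#                              alpha=0.025, steps=50)
#   vec_b = model.infer_vector(tokenize("another sentence and words and stuff"),
#                              alpha=0.025, steps=50)
#   print(cosine_similarity(vec_a, vec_b))
```

The values `alpha=0.025` and `steps=50` are illustrative only; the point is that both texts go through the same tokenizer and that the similarity is computed on the inferred vectors directly.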

- Gordon

Alex Lane

May 24, 2017, 2:02:05 PM

to gensim, l...@rare-technologies.com, mensh...@gmail.com

Thanks Gordon! I'll tokenize and pass that, but my issue is that these documents were already trained into the model. How do I grab a set of docvecs with similar tags?
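
One possible way to approach this, sketched under assumptions: if the tags really follow the `HeaderKey_idx` pattern described earlier in the thread, they can be grouped by header key in plain Python before any vectors are looked up. The helper names `group_tags_by_header` and `same_header_pairs` are hypothetical, and the lookup of a trained vector by tag (for example `model.docvecs[tag]`) is an assumption shown only in a comment.

```python
from collections import defaultdict
from itertools import combinations


def group_tags_by_header(tags, sep="_"):
    """Group tags of the form '<header><sep><idx>' by their header part."""
    groups = defaultdict(list)
    for tag in tags:
        header, sep_found, _idx = tag.rpartition(sep)
        if not sep_found:
            header = tag  # tag has no separator; treat it as its own group
        groups[header].append(tag)
    return dict(groups)


def same_header_pairs(tags, sep="_"):
    """Yield (tag_a, tag_b) pairs that share the same header key."""
    for members in group_tags_by_header(tags, sep).values():
        for tag_a, tag_b in combinations(members, 2):
            yield tag_a, tag_b


# Assumed usage with a trained model (not constructed in this sketch):
#   for tag_a, tag_b in same_header_pairs(all_training_tags):
#       vec_a, vec_b = model.docvecs[tag_a], model.docvecs[tag_b]
#       ...compare vec_a and vec_b, e.g. by cosine similarity...
```

Splitting on the last separator keeps header keys that themselves contain an underscore intact.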
