Need Doc2Vec Example

Varman

Oct 23, 2015, 2:32:15 PM

to gensim

Hi Guys,

I am new to Doc2Vec. Can anyone recommend a good post or example to help me get started? I found a few examples on Google, but those articles are outdated.

Thanks,

Varman

Gordon Mohr

Oct 26, 2015, 1:36:36 PM

There's an IPython Notebook in gensim that steps through one of the sentiment experiments from the original "Paragraph Vectors" paper. It's a bit advanced in its Python usage, but serves as a working example of gensim's Doc2Vec class and options.

In the gensim install directory, look in `docs/notebooks/doc2vec-IMDB.ipynb`, or you can view it on GitHub at:

https://github.com/piskvorky/gensim/blob/develop/docs/notebooks/doc2vec-IMDB.ipynb

- Gordon

Kevin L

Oct 27, 2015, 6:35:46 AM

Well, here is a short script I used, if that helps...

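from gensim import utils
from gensim.models.doc2vec import LabeledSentence
from gensim.models import Doc2Vec
import numpy
import random

# Map each input file to the label prefix used for its lines.
sources = {'training.txt': 'TRAINING'}
# NOTE: LabeledLineSentence is not a gensim class, and word2vec is not imported
# above; the helper has to be defined separately for this line to work.
sentences = word2vec.LabeledLineSentence(sources)

# Set the model parameters first; no training data is passed yet.
model = Doc2Vec(min_count=5, window=8, size=100, sample=1e-4, negative=5, workers=4)
model.build_vocab(sentences.to_array())

sentences_list = sentences.to_array()
# list() so that the indices can be shuffled in place on Python 3 as well.
Idx = list(range(len(sentences_list)))

# Train for 20 epochs, each over a fresh random permutation of the documents.
for epoch in range(20):
    random.shuffle(Idx)
    perm_sentences = [sentences_list[i] for i in Idx]
    model.train(perm_sentences)
    print(epoch)

# Save the trained model and load it back.
model.save('example.model')
model = Doc2Vec.load('example.model')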

This way, you first specify the parameters your model should have and then feed it the training data over a number of epochs (here 20), using a new random permutation of your sentences each time. This assumes your training data has one sentence/paragraph/document per line.

Kevin L

Oct 27, 2015, 6:41:21 AM

I took the "LabeledLineSentence" class from http://rare-technologies.com/doc2vec-tutorial/, I think...
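For reference, a helper of this kind can be sketched in a few lines. The version below is a stand-in and not the tutorial's exact code. It assumes one document per line, tags each line as PREFIX_lineNumber, and uses a local namedtuple (the hypothetical name `TaggedLine`) in place of gensim's labeled-sentence type so that it runs without gensim installed.

```python
from collections import namedtuple

# Stand-in for gensim's labeled document type (assumption: a plain
# (words, tags) pair is enough for illustration).
TaggedLine = namedtuple("TaggedLine", ["words", "tags"])


class LabeledLineSentence:
    """Reads one document per line from each source file and tags it."""

    def __init__(self, sources):
        # sources maps a file path to a tag prefix,
        # e.g. {'training.txt': 'TRAINING'}
        self.sources = sources
        self.sentences = []

    def __iter__(self):
        for path, prefix in self.sources.items():
            with open(path, encoding="utf-8") as handle:
                for line_no, line in enumerate(handle):
                    # Whitespace tokenization; the tag is unique per line.
                    yield TaggedLine(line.split(), ["%s_%d" % (prefix, line_no)])

    def to_array(self):
        # Materialize everything in memory so it can be shuffled between epochs.
        self.sentences = list(self)
        return self.sentences
```

With gensim installed, `TaggedLine` would presumably be swapped for the library's own document type (`LabeledSentence` in the script above), and the class would be called directly as `LabeledLineSentence(sources)` rather than through `word2vec`.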

Varman

Oct 27, 2015, 5:33:29 PM

Thanks, Kevin. That helped me get started, but now I have a question.

I trained the model on 100 documents, but I want to find the similarity of a document that is separate from the documents used to train the model. How do I do that?

This is what I am doing. Am I doing it the correct way? Any advice would be helpful.

train_model = gensim.models.Doc2Vec(size=300, window=10, min_count=1, workers=11,alpha=0.025, min_alpha=0.025) # use fixed learning rate

train_model.build_vocab(train_sentences)

for epoch in range(10):
    train_model.train(train_sentences)
    train_model.alpha -= 0.002 # decrease the learning rate
    train_model.min_alpha = train_model.alpha # fix the learning rate, no decay

train_model.train(train_sentences)

test_model = gensim.models.Doc2Vec(test_sentences, size=300, window=10, min_count=1, workers=11, alpha=0.025, min_alpha=0.025)

print(train_model.docvecs.most_similar([test_model.docvecs[0]]))

Thanks,

Varman
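To make the effect of the manual learning-rate loop in this post concrete, the sketch below replays only its arithmetic, with no gensim calls. It assumes the values from the snippet (start at 0.025, subtract 0.002 after each of 10 passes). The function name `alpha_schedule` is made up for illustration.

```python
def alpha_schedule(start=0.025, step=0.002, epochs=10):
    """Return the learning rate used in each pass of the manual-decay loop."""
    # Computing start - step * epoch avoids accumulating floating-point
    # error from repeated subtraction.
    return [start - step * epoch for epoch in range(epochs)]


rates = alpha_schedule()
for epoch, rate in enumerate(rates):
    print("epoch %d: alpha = %.3f" % (epoch, rate))
```

The first pass runs at 0.025 and the tenth at 0.007, so the rate stays positive for this loop. At the same step size it would turn negative from the 14th pass onward, which is worth checking before raising the epoch count.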


Kevin L

Oct 28, 2015, 7:13:07 AM

Sorry, I'm also only a beginner in NLP and neural networks...

I never used the alpha value.

When I want to calculate a similarity (with a word2vec or doc2vec model), I use the n_similarity(ws1, ws2) function like this:

>>> trained_model.n_similarity(['sushi', 'shop'], ['japanese', 'restaurant'])
0.61540466561049689

But I don't know how to get document/paragraph vectors for new documents from a doc2vec model, or whether it is even possible to get them or reasonable to use them...
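As far as I understand it, `n_similarity` compares two word lists by averaging the word vectors of each list and taking the cosine similarity of the two averages. A dependency-free sketch of that computation follows. The toy 3-dimensional vectors are invented for illustration and do not come from any trained model, and the helper names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise average of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def n_similarity_sketch(word_vectors, words1, words2):
    """Cosine similarity between the mean vectors of two word lists."""
    mean1 = mean_vector([word_vectors[w] for w in words1])
    mean2 = mean_vector([word_vectors[w] for w in words2])
    return cosine(mean1, mean2)


# Invented toy vectors, for illustration only.
toy_vectors = {
    "sushi": [1.0, 0.0, 0.0],
    "shop": [0.0, 1.0, 0.0],
    "japanese": [1.0, 0.0, 1.0],
    "restaurant": [0.0, 1.0, 1.0],
}

score = n_similarity_sketch(toy_vectors, ["sushi", "shop"], ["japanese", "restaurant"])
print(score)
```

Because only word vectors enter this calculation, it says nothing about the paragraph vectors a doc2vec model learns, which is the open part of the question above.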

Aries Fitriawan

Sep 22, 2016, 6:05:27 AM

Thank you for your script; it was easy to follow. Unfortunately, I get an error:

AttributeError: module 'gensim.models.word2vec' has no attribute 'LabeledLineSentence'

Any update for this code?

Lev Konstantinovskiy

Sep 22, 2016, 9:39:43 AM

Hi Aries,

Thanks for reporting it - there is an easier intro to doc2vec on our tutorials page.

See Doc2vec Quick Start on Lee Corpus - it has a smaller dataset than IMDB so it will give you results even on a laptop.

Regards

Lev

Veronica Cheng

Dec 12, 2016, 10:12:46 AM

Hi Aries, he mentioned that he thinks he took the "LabeledLineSentence" class from http://rare-technologies.com/doc2vec-tutorial/
