Doc2vec - diff between PV-DM & PV-DBOW ?

Enzo

Dec 18, 2015, 7:27:27 AM
to gensim
(please note that this is cross posted on the word2vec toolkit google group - apologies if it is not the done thing)

I want to train a model with Doc2Vec - of course using gensim.

I follow all the steps, e.g. build a model with
model = Doc2Vec(dm=0, size=vecsize, window=8, negative=5, hs=0, min_count=1, workers=cores)

And I train it.


If I've used PV-DM (the default, with dm=1), I can call `model.most_similar(word)` and get a "sensible" answer (i.e. similar to what I would get using Word2Vec).

If I've used PV-DBOW (as above, with dm=0), `model.most_similar(word)` gives very different answers than with PV-DM, and in general the words do not seem to be very "similar" at all.

Why?


I have the impression I'm missing something obvious (so apologies if the answer is obvious: it is not so for me!)...

Gordon Mohr

Dec 18, 2015, 1:24:59 PM
to gensim
As described by the paper, PV-DBOW doesn't involve any NN-input-vectors per-word. There's one vector for the text, which alone is used to predict each individual word. 

So while word-vectors still get randomly initialized (simply because of the way gensim code is shared with word2vec and other modes), they're not updated during training at all, and are still at their random initial values at the end of training. 

Now, the DBOW training is very analogous to the Word2Vec "skip-gram" mode, but using vector(s) for the text-as-a-whole to predict target words, rather than just vector(s) for nearby words... so it is very easy to combine with skip-gram word training, if you need the word-vectors too. You can turn this on with the `dbow_words=1` Doc2Vec initialization parameter. 

Note it will slow training noticeably – roughly in proportion to your `window` value. (The `window` parameter only has any effect when doing context_word_vec(s) -> target_word training, not plain DBOW doc-vec -> target_word training.) 

In comparison, Mikolov's demo '-sentence-vectors' patch to word2vec.c always includes word-training, because it uses an artificial pseudo-word at the start of each example, special-cased to participate in every 'window', as its sentence-vector-tag.

- Gordon