Using data other than words


Dan Howarth

Nov 2, 2015, 7:18:19 PM
to gensim
Hello all,

I will refer to http://arxiv.org/pdf/1405.4053v2.pdf in this post

In figure 2 there is a W matrix. Could someone please explain how the doc2vec algorithm interacts with this matrix during learning?

(1) Does doc2vec learn W if I 'preseed' the model with vectors from word2vec? 

(2) Does doc2vec learn W if I don't preseed the model with vectors? 

I am asking because I would like to use non-word time series data, e.g. videos. I was thinking of creating a vocab that maps a string identifying the video and time point to a vector that is the video at that time point. So basically a column in W would correspond to a still image in the video. 

However, if doc2vec 'better learns' W during training then that obviously won't work.

Thanks,
Dan Howarth

Gordon Mohr

Nov 10, 2015, 7:15:07 PM
to gensim
On Monday, November 2, 2015 at 4:18:19 PM UTC-8, Dan Howarth wrote:
Hello all,

I will refer to http://arxiv.org/pdf/1405.4053v2.pdf in this post

In figure 2 there is a W matrix. Could someone please explain how the doc2vec algorithm interacts with this matrix during learning?

The W matrix is essentially the `syn0` of the gensim code (and the word2vec.c code on which it was based). It supplies the (in-training) word vectors that start the predictive neural-net's forward propagation. When its predictions are incrementally corrected, the errors back-propagate to W, improving them slightly. 

(This is the case for the training mode that uses word vectors – "DM" – or if doing simultaneous skip-gram word2vec training during "DBOW" training. It's also possible to use DBOW in a mode that neither uses nor creates word vectors – figure 3 of the paper.)
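(If it helps to see those knobs concretely, here's a minimal sketch of the three setups in gensim, using the parameter names from around this era (`size` rather than a later `vector_size`), and assuming `texts` is a list of token lists you already have:)

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    # assumed: `texts` is a list of token lists, one per document
    docs = [TaggedDocument(words=tokens, tags=['doc_%d' % i])
            for i, tokens in enumerate(texts)]

    # PV-DM ("DM"): word vectors in W and doc vectors are trained together
    dm_model = Doc2Vec(docs, dm=1, size=100, window=5, min_count=2)

    # PV-DBOW with simultaneous skip-gram word training: W is still learned
    dbow_w_model = Doc2Vec(docs, dm=0, dbow_words=1, size=100, min_count=2)

    # pure PV-DBOW: only doc vectors are trained; W stays at its random init
    dbow_model = Doc2Vec(docs, dm=0, dbow_words=0, size=100, min_count=2)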
 
(1) Does doc2vec learn W if I 'preseed' the model with vectors from word2vec? 
(2) Does doc2vec learn W if I don't preseed the model with vectors? 

W is usually seeded with low-magnitude random vectors at the start of training. Seeding with word vectors from other training runs is never necessary.

Still, many seem to be drawn to the possibility, and it *might* give the model a head-start on meaningful alignments of wordvecs/docvecs. Unless you lock the seeded vectors against normal back-propagation (an experimental option in gensim, via the `syn0_lockf` array), they'll continue to be adjusted during training. 

My intuition is that with each additional pass over the new data, the influence of any preseeding is further diluted; with enough passes, the model should settle on values that are optimal for the currently-presented data, and the influence of preseeding should become arbitrarily small. 
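(If you do want to try the preseed-and-lock experiment anyway, a rough sketch using the current `syn0`/`syn0_lockf` attribute names; here `pretrained` is a hypothetical dict mapping words to vectors of the model's dimensionality, `docs` is as in the earlier sketch, and exact call signatures vary by gensim version:)

    from gensim.models.doc2vec import Doc2Vec

    model = Doc2Vec(dm=1, size=100, min_count=2)
    model.build_vocab(docs)

    for word, vec in pretrained.items():
        if word in model.vocab:
            idx = model.vocab[word].index
            model.syn0[idx] = vec        # preseed the in-training word vector
            model.syn0_lockf[idx] = 0.0  # experimental: 0.0 freezes it against updates

    model.train(docs)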
  
I am asking because I would like to use non-word time series data, e.g. videos. I was thinking of creating a vocab that maps a string identifying the video and time point to a vector that is the video at that time point. So basically a column in W would correspond to a still image in the video. 

However, if doc2vec 'better learns' W during training then that obviously won't work.

This is an interesting idea, and you could lock these pseudo-words against normal training... but doing so would prevent them from adjusting their positions in the shared-coordinate space to be oriented, in some useful way, with the other words/pseudowords in the same training session. (Such learned-orientations are the heart of word2vec/doc2vec, as far as I'm concerned.)

From my understanding of other work in image-to-language mappings – such as http://cs.stanford.edu/people/karpathy/deepimagesent/ – I doubt a raw still image would be of any use: you'd need to extract other higher-level features, perhaps via a deeper network whose output is the vector that slots into W. (And perhaps in turn, that upstream network could be influenced by the backprop training on W.)

Good luck, hope this helps!

- Gordon
 
Thanks,
Dan Howarth