word2vec loss per epoch increasing, but model training makes sense


amjass

Feb 8, 2022, 9:23:29 AM
to Gensim
Hi, 

I have seen similar posts to this one, but none really explain exactly what is happening.
I am using a callback to print the latest training loss at each epoch.

My corpus has a vocabulary of 62000 words across 102000 individual texts (tokenized as a list of lists, each inner list being one text/sentence).

Training proceeds without issue and the model actually makes sense after training: semantically, I see associations that are correct.

I can also confirm that if I instantiate an empty w2v model and only build the vocabulary, the wv.most_similar results for a word I know are nonsensical, which is a good sanity check, so training works on the corpus. Below, however, are the increasing losses that I see as training progresses (epochs 0 to 12 shown as an example):

Loss after epoch 0: 3143296.25
Loss after epoch 1: 5722781.0
Loss after epoch 2: 8161942.0
Loss after epoch 3: 10323672.0
Loss after epoch 4: 12479485.0
Loss after epoch 5: 14592014.0
Loss after epoch 6: 16683512.0
Loss after epoch 7: 18451656.0
Loss after epoch 8: 20224652.0
Loss after epoch 9: 21987936.0
Loss after epoch 10: 23745272.0
Loss after epoch 11: 25497322.0
Loss after epoch 12: 27236514.0

I am very confused by this pattern: if the model is finding correct word associations, shouldn't its loss per example be decreasing? Or is it because it also gets many other word pairs wrong when negative sampling is employed?

Thank you in advance for any help!

Gordon Mohr

Feb 8, 2022, 1:47:32 PM
to Gensim
The loss-tallying in Gensim's `Word2Vec` is pretty buggy and doesn't provide what would actually be useful: a reliable report of per-epoch or per-batch loss.
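One commonly cited reason the printed numbers only ever go up is that `get_latest_training_loss()` reports a running total for the whole training call rather than a per-epoch value. A minimal sketch, assuming that cumulative behavior applies to the figures posted above, estimates per-epoch values by differencing consecutive readings. The helper name `per_epoch_losses` is hypothetical and not part of Gensim:

```python
# Loss values copied from the epoch-by-epoch output posted earlier in this thread.
cumulative_losses = [
    3143296.25, 5722781.0, 8161942.0, 10323672.0, 12479485.0,
    14592014.0, 16683512.0, 18451656.0, 20224652.0, 21987936.0,
    23745272.0, 25497322.0, 27236514.0,
]


def per_epoch_losses(cumulative):
    """Estimate each epoch's own loss from a list of running totals.

    Assumption: every reading is the total accumulated since training
    started, so an epoch's share is the reading minus the previous one.
    """
    deltas = []
    previous = 0.0
    for total in cumulative:
        deltas.append(total - previous)
        previous = total
    return deltas


deltas = per_epoch_losses(cumulative_losses)
for epoch, delta in enumerate(deltas):
    print(f"Epoch {epoch}: approx. loss {delta:,.2f}")
```

Under that assumption, the estimated per-epoch values fall from about 3.14 million in epoch 0 to about 1.74 million in epoch 12, which is consistent with the sensible `most_similar` results. Given the tallying problems, even the differenced values are best read as a rough trend rather than an exact loss.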

An open issue describing some of the problems in tallying/interpretation, including links to other more specific issues, can be viewed at:


- Gordon

amjass

Feb 8, 2022, 3:43:44 PM
to Gensim
Thank you for clarifying! I will ignore this. I was looking for a way to use traditional metrics, as one might with a NN, to compare performance across different hyperparameters.