How to evaluate the quality of a word embedding?

andrew

Dec 3, 2020, 4:18:51 AM

to Gensim

Dear all,

I am training a word2vec model using Gensim 4.

My corpus is relatively small.

How can I evaluate the quality of the model I have obtained?

In theory, word2vec is a neural-network (NN) model, so I think I could use accuracy on an evaluation set, or something like that? I am not sure.

Maybe I can use the performance on a standard task?

Is there a standardized way to measure the quality of a word embedding?

Best regards

Andrey Kutuzov

Dec 3, 2020, 10:46:44 AM

to gen...@googlegroups.com

Hi,

The natural choices would be the semantic similarity and analogy tasks. See
more here:
https://aclweb.org/aclwiki/SimLex-999_(State_of_the_art)
https://aclweb.org/aclwiki/Analogy_(State_of_the_art)
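
To make the similarity task concrete: datasets such as SimLex-999 pair words with human similarity ratings, and a model is scored by how well its cosine similarities rank the same pairs, commonly measured with Spearman's rank correlation. The snippet below is a minimal pure-Python sketch of that idea, not Gensim's implementation; the function names, toy vectors and ratings are hypothetical, and real code would more likely use `scipy.stats.spearmanr`.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length lists (needs at least 2 items)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def evaluate_pairs(vectors, pairs):
    """Score embeddings against (word_a, word_b, human_rating) triples.

    Returns (spearman_rho, oov_share). Pairs with an unknown word are skipped,
    which matters for small corpora: check oov_share before trusting rho.
    """
    gold, predicted, oov = [], [], 0
    for a, b, rating in pairs:
        if a not in vectors or b not in vectors:
            oov += 1
            continue
        gold.append(rating)
        predicted.append(cosine(vectors[a], vectors[b]))
    # Spearman's rho is the Pearson correlation of the ranks.
    rho = pearson(ranks(gold), ranks(predicted))
    return rho, oov / len(pairs)

# Toy vectors and ratings, purely illustrative.
vectors = {
    "cat": [1.0, 0.9, 0.0], "dog": [0.9, 1.0, 0.1],
    "car": [0.0, 0.1, 1.0], "truck": [0.1, 0.0, 0.9],
}
pairs = [("cat", "dog", 9.0), ("car", "truck", 8.5), ("cat", "car", 1.5), ("dog", "zebra", 6.0)]
rho, oov = evaluate_pairs(vectors, pairs)
print(f"Spearman rho={rho:.3f}, OOV share={oov:.0%}")  # prints: Spearman rho=1.000, OOV share=25%
```

Only the ranking matters here, not the absolute similarity values, which is why rank correlation is the usual choice for this kind of benchmark.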

Gensim provides the evaluate_word_pairs() and evaluate_word_analogies()
methods for these two tasks, respectively.

--
Solve et coagula!
Andrey

Gordon Mohr

Dec 4, 2020, 7:40:06 PM

to Gensim

There's no universal measure of 'quality', only usefulness for a specific task. So you should evaluate your word-embeddings on your intended task, or on your best simulation or approximation of the full task.
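
One way to act on this is to plug each candidate set of word-vectors into a cheap stand-in for the real task and compare the scores. The sketch below is a hypothetical example that assumes the task is document classification: it averages word-vectors into document vectors and measures the leave-one-out accuracy of a nearest-centroid classifier. The function names, toy documents and toy vectors are all illustrative; substitute your actual task, data and model.

```python
import math

def doc_vector(tokens, vectors):
    """Average the vectors of in-vocabulary tokens; None if no token is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    return [sum(v[i] for v in known) / len(known) for i in range(len(known[0]))]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def leave_one_out_accuracy(docs, labels, vectors):
    """Nearest-centroid classification accuracy, holding out each document in turn."""
    feats = [doc_vector(d, vectors) for d in docs]
    correct = evaluated = 0
    for i, (x, y) in enumerate(zip(feats, labels)):
        if x is None:
            continue  # no known words, so this document cannot be classified
        # One centroid per label, built from all *other* documents.
        sums, counts = {}, {}
        for j, (f, lab) in enumerate(zip(feats, labels)):
            if j == i or f is None:
                continue
            vec_sum = sums.setdefault(lab, [0.0] * len(f))
            for k, val in enumerate(f):
                vec_sum[k] += val
            counts[lab] = counts.get(lab, 0) + 1
        centroids = {lab: [s / counts[lab] for s in vs] for lab, vs in sums.items()}
        predicted = max(centroids, key=lambda lab: cosine(x, centroids[lab]))
        correct += int(predicted == y)
        evaluated += 1
    return correct / evaluated if evaluated else 0.0

# Toy task: two topics, and two hypothetical embedding sets to compare.
docs = [["cat", "dog"], ["dog", "pet"], ["car", "road"], ["truck", "road"]]
labels = ["animal", "animal", "vehicle", "vehicle"]
vectors_a = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "pet": [0.8, 0.2],
             "car": [0.0, 1.0], "truck": [0.1, 0.9], "road": [0.2, 0.8]}
vectors_b = {"cat": [0.0, 1.0], "dog": [1.0, 0.0], "pet": [0.0, 1.0],
             "car": [1.0, 0.0], "truck": [0.0, 1.0], "road": [1.0, 0.0]}
print(leave_one_out_accuracy(docs, labels, vectors_a),
      leave_one_out_accuracy(docs, labels, vectors_b))
```

A Gensim `KeyedVectors` object (e.g. `model.wv`) supports `word in kv` and `kv[word]`, so it can be passed in place of the toy dicts.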

The original word2vec paper scored vectors on a task of solving analogies via word-vector algebra, and you can do the same for your own word-vectors using built-in Gensim methods such as: https://radimrehurek.com/gensim/models/keyedvectors.html#gensim.models.keyedvectors.KeyedVectors.evaluate_word_analogies
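
Roughly, the analogy task works like this: for a question "a is to b as c is to ?", take the unit-length vectors, compute b - a + c, and return the nearest remaining word by cosine similarity (often called "3CosAdd"). A minimal pure-Python sketch of that idea follows; the toy 3-dimensional vectors and the function names are hypothetical, and Gensim's own implementation differs in details such as restricting candidates to the most frequent words.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def solve_analogy(vectors, a, b, c):
    """Answer "a is to b as c is to ?" by vector arithmetic (3CosAdd).

    Returns the word whose unit vector has the highest cosine similarity to
    unit(b) - unit(a) + unit(c), excluding the three query words.
    """
    unit = {w: normalize(v) for w, v in vectors.items()}
    target = normalize([vb - va + vc for va, vb, vc in zip(unit[a], unit[b], unit[c])])
    candidates = (w for w in unit if w not in (a, b, c))
    return max(candidates, key=lambda w: sum(x * y for x, y in zip(unit[w], target)))

def analogy_accuracy(vectors, questions):
    """Fraction of (a, b, c, expected) questions answered correctly.

    Questions containing out-of-vocabulary words are skipped.
    """
    usable = [q for q in questions if all(w in vectors for w in q)]
    if not usable:
        return 0.0
    hits = sum(solve_analogy(vectors, a, b, c) == d for a, b, c, d in usable)
    return hits / len(usable)

# Toy dimensions: (royalty, femaleness, "person-ness"); purely illustrative.
vectors = {
    "man": [0.0, 0.0, 1.0], "woman": [0.0, 1.0, 1.0],
    "king": [1.0, 0.0, 1.0], "queen": [1.0, 1.0, 1.0],
    "apple": [0.1, 0.1, -1.0],
}
questions = [
    ("man", "woman", "king", "queen"),
    ("king", "queen", "man", "woman"),
    ("man", "woman", "king", "empress"),  # skipped: "empress" is not in the vocabulary
]
print(solve_analogy(vectors, "man", "woman", "king"))  # prints: queen
print(analogy_accuracy(vectors, questions))
```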

But unless that exact style of analogy is your real task, the word-vectors that do best on it may not do as well on other things. For example, I've seen models that scored best on an analogies evaluation do worse when used as input to a classifier. Even within the analogies, different metaparameter choices sometimes improve one kind of analogy while worsening others.
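
To see that effect on your own models, you can compare accuracy per analogy section instead of only the overall score. This sketch assumes the `sections` list returned by `evaluate_word_analogies()`, whose entries are dicts with `"section"`, `"correct"` and `"incorrect"` keys; the helper names are hypothetical, and `model_a` and `model_b` in the usage comment stand for two models trained with different metaparameters.

```python
def section_accuracies(sections):
    """Map each analogy section name to its accuracy (None if it had no usable questions)."""
    result = {}
    for s in sections:
        n = len(s["correct"]) + len(s["incorrect"])
        result[s["section"]] = len(s["correct"]) / n if n else None
    return result

def compare_sections(sections_a, sections_b):
    """Return (section, acc_a, acc_b, acc_b - acc_a) rows for sections both models could evaluate."""
    acc_a, acc_b = section_accuracies(sections_a), section_accuracies(sections_b)
    rows = []
    for name, a in acc_a.items():
        b = acc_b.get(name)
        if a is not None and b is not None:
            rows.append((name, a, b, b - a))
    return rows

# Hypothetical usage with two Gensim models trained with different metaparameters:
# from gensim.test.utils import datapath
# _, sections_a = model_a.wv.evaluate_word_analogies(datapath("questions-words.txt"))
# _, sections_b = model_b.wv.evaluate_word_analogies(datapath("questions-words.txt"))
# for name, a, b, diff in compare_sections(sections_a, sections_b):
#     print(f"{name:30s} {a:6.3f} {b:6.3f} {diff:+6.3f}")
```

Sections where one model improves while another gets worse show up directly as positive and negative differences, which a single overall score would hide.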