Pre-trained Word Embedding


Cristian Urtado Cunha Florido

Apr 20, 2022, 11:55:24 AM
to Gensim
Hello!

I'm trying to learn how to use word embeddings. I found some pre-trained models to download at <http://www.nilc.icmc.usp.br/embeddings>. I ended up downloading a .zip file (glove_s50.zip) containing a .txt file (glove_s50.txt) that looks like this:

tempo -0.408678 -0.739153 -0.795652 -3.055643 0.446275 0.109852 -0.460806 -0.884676 0.128661 0.472228 -0.416609 0.505957 -0.225755 0.282355 0.349181 -0.349351 -0.037606 -0.595970 -0.420329 1.059616 0.735213 -0.081331 -0.045673 -0.945494 -0.555012 -0.568955 0.381657 -0.453206 -0.283170 1.794502 0.384527 0.413930 0.377526 0.431059 0.592765 -1.214155 0.141168 0.005392 -1.278675 0.524249 0.038683 -0.588643 -0.648583 0.405800 0.930708 -0.052068 0.666735 0.525235 -0.056059 -0.170240

On the website mentioned above, they say to use KeyedVectors from gensim:

Install
pip install gensim==2.0.0
Run
from gensim.models import KeyedVectors
model = KeyedVectors.load_word2vec_format('model.txt')

But after doing that, I can't compare words and get their similarity, for example.

I'd like to use these pre-trained vectors to pick a word and get the other words closest to it by semantic proximity, roughly as sketched below.
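
Concretely, this is roughly what I'd like to be able to run (using 'tempo' from the file above just as an example word):

from gensim.models import KeyedVectors

# Load the downloaded vectors (this is the step I'm not sure about)
model = KeyedVectors.load_word2vec_format('glove_s50.txt')

# Pick a word and list its nearest neighbours by cosine similarity
print(model.most_similar('tempo', topn=10))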

Thanks,
Cristian.

Radim Řehůřek

Apr 23, 2022, 2:40:40 PM
to Gensim
Hi Cristian,

1) Try the latest version of gensim, not 2.0.0 – that one is 5 years old.
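
With a recent gensim (4.x), loading that plain-text file and querying neighbours is short. A minimal sketch, assuming glove_s50.txt is in word2vec text format (if its first line is already a word vector rather than a "<vocab_size> <dimensions>" header, add no_header=True, available since gensim 4.0):

from gensim.models import KeyedVectors

# Load the plain-text vectors; add no_header=True if the file lacks the
# "<vocab_size> <dimensions>" header line.
model = KeyedVectors.load_word2vec_format('glove_s50.txt')

# Words closest to 'tempo' by cosine similarity
print(model.most_similar('tempo', topn=10))

# Cosine similarity between two specific words ('momento' is just an
# example; any word in the vocabulary works)
print(model.similarity('tempo', 'momento'))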

HTH,
Radim