Hello everyone,
This might be a stupid question, but I can't figure it out. I am trying to load a local copy of the GloVe embeddings. I downloaded the model and converted it with the following command, as suggested:
python -m gensim.scripts.glove2word2vec --input glove.42B.300d.txt --output glove
When I try to load the converted file, I get an error about a Unicode character:
Traceback (most recent call last):
  File "analyse.py", line 15, in <module>
    model_google = gensim.models.KeyedVectors.load_word2vec_format('embeddings/glove-utf-8', binary=True)
  File "/usr/local/lib/python3.5/dist-packages/gensim/models/keyedvectors.py", line 1498, in load_word2vec_format
    limit=limit, datatype=datatype)
  File "/usr/local/lib/python3.5/dist-packages/gensim/models/utils_any2vec.py", line 382, in _load_word2vec_format
    word = utils.to_unicode(b''.join(word), encoding=encoding, errors=unicode_errors)
  File "/usr/local/lib/python3.5/dist-packages/gensim/utils.py", line 359, in any2unicode
    return unicode(text, encoding, errors=errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 0: invalid start byte
Just for the sake of it, I tried converting the output file to UTF-8 using iconv, but I get the same error.
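For reference, here is a minimal sketch of what the conversion step is generally understood to produce: a plain-text file in which a "<vocab_size> <dimensions>" header line is placed in front of the GloVe rows. The helper names glove_to_word2vec_text and read_word2vec_text are hypothetical and written only for illustration; this is not gensim's implementation, and the two sample vectors are made up.

```python
def glove_to_word2vec_text(glove_lines):
    """Prepend the '<vocab_size> <dimensions>' header used by the word2vec text format.

    Hypothetical helper for illustration; assumes one 'word v1 v2 ...' row per line.
    """
    rows = [line.rstrip("\n") for line in glove_lines if line.strip()]
    if not rows:
        raise ValueError("no vectors found")
    dims = len(rows[0].split(" ")) - 1  # everything after the word is a vector component
    header = "{} {}".format(len(rows), dims)
    return "\n".join([header] + rows) + "\n"


def read_word2vec_text(text):
    """Parse the text format back into a dict of word -> list of floats."""
    lines = text.splitlines()
    count, dims = (int(x) for x in lines[0].split())
    vectors = {}
    for line in lines[1:1 + count]:
        parts = line.split(" ")
        values = [float(v) for v in parts[1:]]
        if len(values) != dims:
            raise ValueError("unexpected vector length for {!r}".format(parts[0]))
        vectors[parts[0]] = values
    return vectors


# Made-up sample rows standing in for the real GloVe file.
sample_glove = ["the 0.1 0.2 0.3\n", "cat 0.4 0.5 0.6\n"]
converted = glove_to_word2vec_text(sample_glove)
vectors = read_word2vec_text(converted)
```

If that understanding is right, the converted file is text rather than binary, so the binary=True argument visible in the traceback may not match the file, and passing binary=False to load_word2vec_format would be one thing to check.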
Any suggestions?
Patrick