Loading GloVe using Gensim

Patrick Drouin

Dec 11, 2019, 8:48:58 PM
to Gensim
Hello everyone,

This might be a stupid question, but I can't figure it out. I am trying to load a local copy of the GloVe embeddings. I downloaded the model and converted it to word2vec format with the following command, as suggested:

python -m gensim.scripts.glove2word2vec --input glove.42B.300d.txt --output glove
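For reference, the conversion mainly adds a header: a GloVe .txt file has one "word v1 v2 ..." line per entry and no header, while the word2vec text format starts with a "<vocab_size> <dim>" line. The sketch below uses a hypothetical helper, glove_to_word2vec_text (not a gensim function), to perform that kind of rewrite on a tiny made-up file, assuming UTF-8 input where every word is a single space-free token. Note that its output is still plain text.

```python
import os
import tempfile


def glove_to_word2vec_text(glove_path, out_path):
    """Hypothetical helper: rewrite a GloVe .txt file in word2vec text format.

    Illustrates the format change only (a "<vocab_size> <dim>" header line
    followed by the original lines); it is not gensim's implementation.
    Assumes UTF-8 input where every word is a single space-free token.
    """
    num_words, dim = 0, None
    # First pass: count non-empty lines and infer the vector dimension.
    with open(glove_path, encoding="utf-8") as src:
        for line in src:
            if not line.strip():
                continue
            num_words += 1
            if dim is None:
                dim = len(line.split()) - 1
    if dim is None:
        raise ValueError("no vectors found in " + glove_path)
    # Second pass: write the header, then copy each entry unchanged.
    with open(glove_path, encoding="utf-8") as src:
        with open(out_path, "w", encoding="utf-8") as dst:
            dst.write(f"{num_words} {dim}\n")
            for line in src:
                if line.strip():
                    dst.write(line if line.endswith("\n") else line + "\n")
    return num_words, dim


# Usage on a tiny, made-up GloVe-style file.
with tempfile.TemporaryDirectory() as tmp:
    glove_file = os.path.join(tmp, "tiny_glove.txt")
    w2v_file = os.path.join(tmp, "tiny_word2vec.txt")
    with open(glove_file, "w", encoding="utf-8") as f:
        f.write("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
    counts = glove_to_word2vec_text(glove_file, w2v_file)
    with open(w2v_file, encoding="utf-8") as f:
        converted = f.read()

print(counts)                     # (2, 3)
print(converted.splitlines()[0])  # 2 3
```

If your gensim version is 4.0 or later, KeyedVectors.load_word2vec_format also accepts no_header=True, which reads the original GloVe file directly and makes the conversion step unnecessary.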

When I try to load the converted file, I get an error about a Unicode character:

Traceback (most recent call last):
  File "analyse.py", line 15, in <module>
    model_google = gensim.models.KeyedVectors.load_word2vec_format('embeddings/glove-utf-8', binary=True)
  File "/usr/local/lib/python3.5/dist-packages/gensim/models/keyedvectors.py", line 1498, in load_word2vec_format
    limit=limit, datatype=datatype)
  File "/usr/local/lib/python3.5/dist-packages/gensim/models/utils_any2vec.py", line 382, in _load_word2vec_format
    word = utils.to_unicode(b''.join(word), encoding=encoding, errors=unicode_errors)
  File "/usr/local/lib/python3.5/dist-packages/gensim/utils.py", line 359, in any2unicode
    return unicode(text, encoding, errors=errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 0: invalid start byte

Just for the sake of it, I tried converting the output file to UTF-8 with iconv, but I get the same error.

Any suggestions?

Patrick

Andrey Kutuzov

Dec 11, 2019, 8:53:46 PM
to gen...@googlegroups.com
Hi Patrick,

Use "binary=False".
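With that change, the call from the traceback becomes gensim.models.KeyedVectors.load_word2vec_format('embeddings/glove-utf-8', binary=False). To illustrate what the text format contains (a header line, then one "word v1 ... v_dim" line per entry), here is a minimal pure-Python reader. load_word2vec_text is a hypothetical name; this is a sketch of the format under the assumption of UTF-8 input, not gensim's loader.

```python
import os
import tempfile


def load_word2vec_text(path, encoding="utf-8"):
    """Hypothetical minimal reader for the plain-text word2vec format.

    Line 1 is "<vocab_size> <dim>"; each following line is a word followed
    by dim whitespace-separated floats. Sketch only, not gensim's loader.
    """
    vectors = {}
    with open(path, encoding=encoding) as f:
        vocab_size, dim = (int(x) for x in f.readline().split())
        for lineno, line in enumerate(f, start=2):
            if not line.strip():
                continue
            word, *values = line.split()
            if len(values) != dim:
                raise ValueError(
                    f"line {lineno}: expected {dim} values, got {len(values)}")
            vectors[word] = [float(v) for v in values]
    # The header promises a vocabulary size; check that it matches.
    if len(vectors) != vocab_size:
        raise ValueError(
            f"header says {vocab_size} words, found {len(vectors)}")
    return vectors


# Usage on a tiny, made-up file; the non-ASCII word decodes fine in text mode.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "glove_w2v.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("2 3\nthe 0.1 0.2 0.3\ncafé 0.4 0.5 0.6\n")
    vectors = load_word2vec_text(path)

print(vectors["café"])  # [0.4, 0.5, 0.6]
```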

--
Solve et coagula!
Andrey

Patrick Drouin

Dec 11, 2019, 9:07:02 PM
to Gensim
I knew it was stupid... Duh. Of course, it is a text file!
Thanks Andrey!
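The fact that it is a text file most likely also explains the traceback. Both word2vec formats begin with a text header such as "2 3", but in binary mode gensim then reads each word up to a space and takes a fixed-size block of raw bytes (dim float32 values) as its vector. On a text file those reads drift out of step with the real lines, so a "word" can start in the middle of a multi-byte UTF-8 character; 0x82 can only appear as a UTF-8 continuation byte, which matches the "invalid start byte" message. Re-encoding with iconv cannot help, because the encoding is not the root problem. If you are unsure which format a file is in, a rough check like the hypothetical sniff_word2vec_format below (a heuristic of my own, not a gensim API) inspects the first record after the header.

```python
import os
import struct
import tempfile


def sniff_word2vec_format(path):
    """Hypothetical heuristic: guess 'text' or 'binary' for a word2vec file.

    Both formats start with a "<vocab_size> <dim>" text header, so we look
    at the first record after it. A text record is "word v1 ... v_dim";
    a binary record holds raw float bytes that rarely decode cleanly.
    """
    with open(path, "rb") as f:
        dim = int(f.readline().decode("ascii").split()[1])
        record = f.readline()
    try:
        tokens = record.decode("utf-8").split()
    except UnicodeDecodeError:
        return "binary"
    if len(tokens) != dim + 1:
        return "binary"
    try:
        for token in tokens[1:]:
            float(token)
    except ValueError:
        return "binary"
    return "text"


# Build one tiny file of each kind and classify them.
with tempfile.TemporaryDirectory() as tmp:
    text_file = os.path.join(tmp, "vectors.txt")
    with open(text_file, "w", encoding="utf-8") as f:
        f.write("1 3\ncat 0.1 0.2 0.3\n")
    binary_file = os.path.join(tmp, "vectors.bin")
    with open(binary_file, "wb") as f:
        # Binary record: word, a space, then dim raw float32 values
        # (little-endian here); the trailing newline is optional.
        f.write(b"1 3\ncat " + struct.pack("<3f", 0.1, 0.2, 0.3) + b"\n")
    text_guess = sniff_word2vec_format(text_file)
    binary_guess = sniff_word2vec_format(binary_file)

print(text_guess, binary_guess)  # text binary
```

Because it only looks at one record, this is a quick sanity check rather than a validator; a text file whose first word contains spaces would be misreported as binary.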