Hi everybody,
In order to use the Google News pre-trained word and phrase vectors on my AWS Ubuntu C3 instance, I downloaded the whole large file to my Windows laptop and split it into smaller files (10 MB each).
I uploaded the first of these smaller files to my AWS Ubuntu C3 instance to try a trial load of the model.
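For reference, the splitting step amounts to something like the sketch below. This is not the tool I actually used on Windows; `split_file`, `join_files` and the `.001`, `.002`, ... naming are hypothetical, modeled on the `G.bin.001` name. Each chunk is just a raw byte range of the original file.

```python
def split_file(path, chunk_size=10 * 1024 * 1024):
    """Split `path` into numbered chunks: path.001, path.002, ...

    Hypothetical helper; chunk_size defaults to 10 MB as in my case.
    Returns the list of chunk paths in order.
    """
    chunk_paths = []
    with open(path, 'rb') as src:
        index = 1
        while True:
            data = src.read(chunk_size)
            if not data:
                break  # end of the source file
            chunk_path = '%s.%03d' % (path, index)
            with open(chunk_path, 'wb') as dst:
                dst.write(data)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths


def join_files(chunk_paths, out_path):
    """Concatenate the chunks, in the given order, back into one file."""
    with open(out_path, 'wb') as dst:
        for chunk_path in chunk_paths:
            with open(chunk_path, 'rb') as src:
                dst.write(src.read())
```

Calling `join_files(split_file(path), out_path)` should reproduce the original file byte for byte.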
python
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import gensim
>>> import logging
>>> logging.basicConfig(
... format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
>>> from gensim.models.word2vec import Word2Vec
>>> model = Word2Vec.load_word2vec_format(
... '/home/ubuntu/ggc/prove/DCNN/G.bin.001')
2014-09-16 11:33:19,068 : INFO : loading projection weights from /home/ubuntu/ggc/prove/DCNN/G.bin.001
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/local/lib/python2.7/dist-packages/gensim-0.9.0-py2.7.egg/gensim/models/word2vec.py", line 422, in load_word2vec_format
    vocab_size, layer1_size = map(int, header.split()) # throws for invalid file format
ValueError: invalid literal for int() with base 10: '\x1f\x8b\x08\x08\x07I\x17T\x02'
>>> model = Word2Vec.load_word2vec_format(
... '/home/ubuntu/ggc/prove/DCNN/G.bin.001')
2014-09-16 11:44:09,924 : INFO : loading projection weights from /home/ubuntu/ggc/prove/DCNN/G.bin.001
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/local/lib/python2.7/dist-packages/gensim-0.9.0-py2.7.egg/gensim/models/word2vec.py", line 422, in load_word2vec_format
    vocab_size, layer1_size = map(int, header.split()) # throws for invalid file format
ValueError: invalid literal for int() with base 10: '\x1f\x8b\x08\x08\x07I\x17T\x02'
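One thing I noticed: the string in the ValueError starts with `\x1f\x8b`, which as far as I know is the gzip magic number, so the chunk may still be gzip-compressed data and not the plain word2vec binary. A minimal sketch of how this could be checked (`looks_gzipped` is a hypothetical helper name, not part of gensim):

```python
# First two bytes of any gzip stream (assumption checked against the
# bytes shown in the error message above).
GZIP_MAGIC = b'\x1f\x8b'


def looks_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, 'rb') as f:
        return f.read(2) == GZIP_MAGIC
```

If this returns True for `G.bin.001`, my guess is that the chunks would have to be rejoined and decompressed before loading, but I am not sure.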
What am I doing wrong, and what can I do to solve the problem?
Looking forward to your helpful hints.
Kind regards.
Marco