Freebase-vector

Marco Ippolito

Oct 18, 2014, 6:36:58 AM
to gen...@googlegroups.com
Hi all,
While GoogleNews-vector works fine for me, I can't seem to find any words in Freebase-vector.
What am I doing wrong?

Is there a way to print the list of all words (just the words themselves, not the entire vectors) contained in Freebase-vector and in GoogleNews-vector?

For example:

marco@marco-All-Series:~/CNN_Tut$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import gensim, logging
>>> from gensim.models import Word2Vec
>>> logging.basicConfig(
... format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

>>> model_g = Word2Vec.load_word2vec_format(
... '/home/marco/CNN_Tut/GoogleNews-vectors-negative300.bin.gz', binary=True)
2014-10-18 12:26:52,354 : INFO : loading projection weights from /home/marco/CNN_Tut/GoogleNews-vectors-negative300.bin.gz
2014-10-18 12:28:14,986 : INFO : loaded (3000000, 300) matrix from /home/marco/CNN_Tut/GoogleNews-vectors-negative300.bin.gz
2014-10-18 12:28:14,988 : INFO : precomputing L2-norms of word weight vectors
>>> print model_g['water']
[ -5.83994500e-02   5.27478904e-02   4.50240932e-02  -6.89490288e-02
  -4.31402400e-02   2.87287612e-03  -1.01257116e-03  -7.94986114e-02
   3.57932132e-03   3.84306051e-02   1.62953306e-02  -3.18371207e-02
.....   
Whereas with Freebase:
>>> model = Word2Vec.load_word2vec_format(
... '/home/marco/CNN_Tut/freebase-vectors-skipgram1000.bin.gz', binary=True)
2014-10-18 11:28:45,368 : INFO : loading projection weights from /home/marco/CNN_Tut/freebase-vectors-skipgram1000.bin.gz
2014-10-18 11:29:40,085 : INFO : loaded (1422903, 1000) matrix from /home/marco/CNN_Tut/freebase-vectors-skipgram1000.bin.gz
2014-10-18 11:29:40,085 : INFO : precomputing L2-norms of word weight vectors

>>> print model['water']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/gensim/models/word2vec.py", line 722, in __getitem__
    return self.syn0[self.vocab[word].index]
KeyError: 'water'
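A minimal sketch for the question above about listing only the words. It assumes the binary layout written by the original word2vec C tool: an ASCII header line `<vocab_size> <vector_size>`, then for each entry the word, a space, `vector_size` raw 4-byte floats and a newline. The sketch reads just the words and skips the vectors, so it does not need gensim; the function names are hypothetical. With the gensim version shown in the traceback, where `model.vocab` is a dict keyed by word, `model.vocab.keys()` should give the same list once a model is loaded.

```python
import gzip
import io
import struct


def iter_word2vec_words(stream):
    """Yield the words of a word2vec binary file from an open binary stream,
    skipping each vector without decoding it."""
    # Header line: "<vocab_size> <vector_size>\n"
    vocab_size, vector_size = map(int, stream.readline().split())
    vector_bytes = 4 * vector_size  # assumes 4-byte float32 components
    for _ in range(vocab_size):
        chars = []
        while True:
            ch = stream.read(1)
            if ch == b" ":  # a space ends the word
                break
            if ch == b"":
                raise EOFError("file ended in the middle of a word")
            if ch != b"\n":  # skip the newline left after the previous vector
                chars.append(ch)
        yield b"".join(chars).decode("utf-8", errors="replace")
        if len(stream.read(vector_bytes)) != vector_bytes:
            raise EOFError("file ended in the middle of a vector")


def iter_words_in_file(path):
    """Open a plain or gzip-compressed word2vec binary file and yield its words."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        yield from iter_word2vec_words(f)


# Tiny made-up in-memory file with two 3-dimensional vectors, for illustration.
sample = b"2 3\n" + b"".join(
    word.encode("utf-8") + b" " + struct.pack("<3f", 0.1, 0.2, 0.3) + b"\n"
    for word in ["/en/water", "/en/fire"]
)
print(list(iter_word2vec_words(io.BytesIO(sample))))  # ['/en/water', '/en/fire']

# On a real file (path from the post above), e.g.:
# for word in iter_words_in_file("/home/marco/CNN_Tut/freebase-vectors-skipgram1000.bin.gz"):
#     print(word)
```

Reading one byte at a time keeps the sketch simple but will be slow on the 3-million-word GoogleNews file; in exchange it only needs constant memory.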

Looking forward to your kind help.
Kind regards.
Marco

Marco Ippolito

Oct 18, 2014, 7:23:23 AM
to gen...@googlegroups.com
Hi all,
Just a quick update on my progress.

I loaded the second Freebase file listed at https://code.google.com/p/word2vec/ as the model, i.e. the one described as "using the deprecated /en/ naming (more easily readable)".

In this case it works fine, although having to write "/en/word_to_look_for" is not very convenient:
>>> model_f2 = Word2Vec.load_word2vec_format(
... '/home/marco/CNN_Tut/freebase-vectors-skipgram1000-en-2.bin.gz', binary=True)
2014-10-18 13:12:34,791 : INFO : loading projection weights from /home/marco/CNN_Tut/freebase-vectors-skipgram1000-en-2.bin.gz
2014-10-18 13:13:41,177 : INFO : loaded (1422903, 1000) matrix from /home/marco/CNN_Tut/freebase-vectors-skipgram1000-en-2.bin.gz
2014-10-18 13:13:41,177 : INFO : precomputing L2-norms of word weight vectors
>>> model_f2["/en/water"]
array([  1.23640141e-02,  -1.21191824e-02,   6.56149685e-02,
         4.86603519e-03,  -4.65180725e-02,   1.60670979e-03,
        -1.24252215e-02,   2.20348756e-03,   9.18119866e-03,
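To avoid typing the `/en/` prefix by hand, a small wrapper can try the word as given first and then its `/en/` form. This is a sketch under assumptions: the lowercase-with-underscores rule in the hypothetical `to_en_key` is only inferred from the `/en/word_to_look_for` pattern above, not from any documentation, and passing a gensim model instead of the dict used here assumes the model supports `word in model`, which is worth checking for your gensim version.

```python
def to_en_key(phrase):
    """Build an /en/-style key from a plain word or phrase.

    Hypothetical convention inferred from keys like "/en/water":
    lowercase, with runs of whitespace replaced by underscores."""
    return "/en/" + "_".join(phrase.lower().split())


def lookup(vectors, phrase):
    """Return the vector for `phrase`, trying the key as given first and then
    its /en/ form; return None if neither key is present.

    `vectors` can be any object supporting `key in vectors` and
    `vectors[key]`, such as a dict."""
    for key in (phrase, to_en_key(phrase)):
        if key in vectors:
            return vectors[key]
    return None


# Made-up toy vectors standing in for the loaded model.
toy = {"/en/water": [0.1, 0.2], "/en/new_york": [0.3, 0.4]}
print(lookup(toy, "water"))     # [0.1, 0.2]
print(lookup(toy, "New York"))  # [0.3, 0.4]
print(lookup(toy, "fire"))      # None
```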

But the problem persists with the first downloadable Freebase file.

Any hints on how to solve it?
Kind regards.
Marco
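For the first file, one plausible explanation (an assumption, not confirmed in this thread) is that it keys entities by Freebase machine IDs starting with `/m/` rather than by readable names, which would explain the `KeyError: 'water'`. A quick way to check is to count which namespace the keys use and to search them for a substring. The helper names below are hypothetical; `keys` can be any iterable of strings, for instance `model.vocab` in the gensim version from the traceback (iterating a dict yields its keys) or `index_to_key` on a gensim 4 `KeyedVectors` object.

```python
from collections import Counter
from itertools import islice


def key_prefixes(keys, sample_size=1000):
    """Count the leading namespace (e.g. '/m/' or '/en/') among the first
    `sample_size` keys, to reveal the naming scheme a vector file uses."""
    counts = Counter()
    for key in islice(keys, sample_size):
        parts = key.split("/")  # "/m/0abc" -> ["", "m", "0abc"]
        if key.startswith("/") and len(parts) > 2:
            counts["/" + parts[1] + "/"] += 1
        else:
            counts["(none)"] += 1
    return counts


def find_keys(keys, needle, limit=20):
    """Return up to `limit` keys containing `needle`, ignoring case."""
    needle = needle.lower()
    return list(islice((k for k in keys if needle in k.lower()), limit))


# Made-up keys for illustration; a real check would pass the model's keys.
sample_keys = ["/m/0abc", "/m/0def", "/en/water", "plainword"]
print(key_prefixes(sample_keys))
print(find_keys(sample_keys, "water"))  # ['/en/water']
```

If the counts are dominated by `/m/`, the keys are machine IDs, and turning them into readable names would presumably require a separate Freebase ID-to-name mapping that is not part of the vector file.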