Accessing keys in 4.0


Swetha Pola

Apr 3, 2021, 3:07:27 AM
to Gensim
Hello,

I used to be able to run this code to create an embeddings matrix.

# build the embedding matrix and the word-to-id map:

for i, word in enumerate(model.vocab.keys()):
    embedding_vector = model[word]
    if embedding_vector is not None:
        # words not found in embedding index will be all-zeros.
        embedding_matrix[i] = embedding_vector
        vocab_dict[word] = i

However, there is now no .keys() or .vocab attribute on either a Word2Vec object or a KeyedVectors object.

Please suggest.

Many thanks.


Radim Řehůřek

Apr 3, 2021, 4:03:58 AM
to Gensim
Hi,

check out the Migration notes – namely item #4:

HTH,
Radim

Gordon Mohr

Apr 3, 2021, 1:26:32 PM
to Gensim
Further, if your `model` is an instance of `KeyedVectors`:

* There's no need to copy each vector individually, row by row, out into your own `embedding_matrix` structure. The instance's `model.vectors` is already exactly what you need. (If you need a separate structure to modify without affecting the original, you could `.copy()` it.)
* The {word: index} mapping you're creating in your `vocab_dict` already exists: in Gensim 4.0.0, a `KeyedVectors` instance's `model.key_to_index` property is a dict which maps {key: index}, where index is the relevant row in `model.vectors`.

So, while your code must adapt, it might only need to be:

    embedding_matrix = model.vectors
    vocab_dict = model.key_to_index

- Gordon

Swetha Pola

Apr 3, 2021, 5:23:14 PM
to gen...@googlegroups.com
Hi all,

This is extremely helpful. Thank you very much! 

I have one more question. I have tried searching Stack Overflow but cannot resolve this issue with the new update.

I've been struggling for some reason to structure the inputs to my CNN correctly. Currently my X data is in the form of an array of lists of ids (which correspond to embeddings).

My y labels are in the form of an array of lists of binary outcome variables (i.e., [[0],[1]]).

I keep hitting the error ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type list).

I have tried converting the data types to be 2d arrays, 2d lists, array of lists, list of arrays and keep getting the same error. What am I doing wrong?


Many thanks in advance.




Ben Reaves

Apr 4, 2021, 1:53:49 AM
to gen...@googlegroups.com
Because the error occurs while converting a NumPy array to a tensor, I think it's related to your X data, not y. It's complaining that your NumPy array holds list objects. I guess it wants a plain numeric NumPy array, not lists. See https://stackoverflow.com/questions/58636087
"The problem's rooted in using lists as inputs, as opposed to Numpy arrays; Keras/TF doesn't support former"
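
One possible cause, assuming the id lists in X have different lengths, is that NumPy can only store them as an object array whose elements are Python lists, which cannot be converted to a tensor. A minimal sketch of one way around that is to pad every list to the same length so the result is a regular 2-D integer array. The helper name `pad_id_lists`, the sample ids, and the choice of 0 as the padding id are all hypothetical; note that 0 may also be a real row index in `key_to_index`, so a dedicated padding id may be needed.

```python
import numpy as np

def pad_id_lists(id_lists, pad_id=0):
    """Right-pad variable-length id lists into a 2-D integer array."""
    max_len = max(len(ids) for ids in id_lists)
    padded = np.full((len(id_lists), max_len), pad_id, dtype=np.int64)
    for row, ids in enumerate(id_lists):
        padded[row, :len(ids)] = ids
    return padded

# Hypothetical sample data: three documents of different lengths.
X_raw = [[4, 7, 2], [5], [9, 1]]
X = pad_id_lists(X_raw)

# Labels as a regular 2-D float array, one row per document.
y = np.asarray([[0], [1], [0]], dtype=np.float32)

print(X.dtype, X.shape)
print(y.dtype, y.shape)
```

Checking `X.dtype` before training is a quick diagnostic: if it reports `object`, the array still contains Python lists.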

hope that helps,

Ben


