import numpy as np

word2vec_dict = {}
for i in model.wv.vocab.keys():
    try:
        word2vec_dict[i] = model.wv[i]
    except KeyError:
        pass
X = np.array(list(word2vec_dict.values()))  # rows follow the dict's iteration order
When I use this as X, my results are reasonably good, even after repeated testing.
But I still don't understand the difference between the two versions of X (this one and model.wv.syn0). Both arrays have the same shape, but their values are different.
Could somebody with more brainpower explain this to me? =)
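One possible explanation, sketched with NumPy only and toy data: the words, vectors and orderings below are invented, and index2word and syn0 are plain variables standing in for model.wv.index2word and model.wv.syn0. If the vocab dict iterates in a different order than index2word, a matrix stacked from the dict values holds the same vectors as syn0, just in a different row order, so X[0] and syn0[0] can belong to different words.

```python
import numpy as np

# Hypothetical stand-ins for model.wv.index2word and model.wv.syn0:
# row i of the matrix is the vector of index2word[i].
index2word = ["a", "b", "c"]
syn0 = np.array([[0.1, 0.2],
                 [0.3, 0.4],
                 [0.5, 0.6]], dtype=np.float32)

# A dict filled by iterating over a *different* key order
# (assumed here to mimic iterating over model.wv.vocab.keys()).
vocab_order = ["c", "a", "b"]
word2vec_dict = {w: syn0[index2word.index(w)] for w in vocab_order}

# Stacking the dict values gives rows in dict order, not index2word order.
X = np.array(list(word2vec_dict.values()))
print(np.array_equal(X[0], syn0[0]))  # False: X[0] is the vector of "c"

# Rebuilding X in index2word order reproduces syn0 exactly.
X_aligned = np.array([word2vec_dict[w] for w in index2word])
print(np.array_equal(X_aligned, syn0))  # True

# The unaligned X is just a row permutation of syn0.
perm = [index2word.index(w) for w in word2vec_dict]
print(np.array_equal(X, syn0[perm]))  # True
```

The practical consequence is that any word list paired with the rows of X (for example, to label clusters) has to come from the same ordering that was used to build X.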
from gensim.models import Word2Vec
import numpy as np

text = [["a", "b", "b", "a"], ["a", "b", "a", "c", "a"], ["a"] * 4, ["b"] * 4]
model = Word2Vec(sentences=text, size=30, negative=2, window=1, iter=500, min_count=1)

word2vec_dict = {}
words = model.wv.index2word  # same order as the rows of model.wv.syn0
for i in words:
    word2vec_dict[i] = model.wv[i]

X = np.array([word2vec_dict[i] for i in words])

assert model.wv.syn0.shape == X.shape
np.testing.assert_almost_equal(X, model.wv.syn0)

word2vec_dict = {}
for i in modelC.wv.vocab.keys():
    try:
        word2vec_dict[i] = modelC.wv[i]
    except KeyError:
        pass
X = np.array(list(word2vec_dict.values()))  # rows follow the dict's iteration order
Y = modelC.wv.syn0  # rows follow modelC.wv.index2word
X[0]
Out[43]:
array([ 0.18384653, -0.02168597, 0.16721378, ..., 0.50958902,
-0.70173872, -0.02091845], dtype=float32)
Y[0]
Out[44]:
array([-0.00280283, 0.00300408, 0.00404751, ..., 0.00016425,
-0.00118467, 0.00060632], dtype=float32)

from sklearn.cluster import KMeans

kmeans = KMeans().fit(X)
labels = kmeans.labels_
vocab = list(word2vec_dict.keys())  # same order as the rows of X
clusters = [list(a) for a in zip(vocab, labels)]
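Whichever matrix is clustered, the word list zipped with kmeans.labels_ must be in the same order as the rows of that matrix. Below is a small sketch of that bookkeeping using NumPy and the standard library only; the words, vectors and labels are invented placeholders standing in for the real model and for the output of KMeans().fit(X).labels_, so no clustering is actually run here.

```python
from collections import defaultdict

import numpy as np

# Hypothetical inputs: an ordered word list, the matching vector rows,
# and one cluster label per row of X (as KMeans(...).labels_ provides).
words = ["a", "b", "c", "d"]
X = np.array([[0.0, 0.1], [0.9, 1.0], [0.1, 0.0], [1.0, 0.9]])
labels = np.array([0, 1, 0, 1])

# Pairing is only meaningful if words[i] is the word whose vector is X[i].
assert len(words) == X.shape[0] == labels.shape[0]
word_to_label = dict(zip(words, labels.tolist()))

# Invert the mapping to list the members of each cluster.
clusters = defaultdict(list)
for word, label in word_to_label.items():
    clusters[label].append(word)

print(dict(clusters))  # {0: ['a', 'c'], 1: ['b', 'd']}
```

Building the word list and the matrix from one ordered sequence (a single dict, or model.wv.index2word together with model.wv.syn0) is what keeps the labels attached to the right words.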