Pointers on outputting Embedding for visualization


Shane McElligott
Nov 10, 2015, 1:16:41 PM
to Keras-users
Can anyone provide some direction on how to extract embeddings from a trained model for visualization with t-SNE, etc.?

I naively assume you start with save_weights, but my thought process gets lost after that.

mgarn...@gmail.com
Nov 11, 2015, 7:27:14 PM
to Keras-users
It may depend on your model as to what kind of information you can get out of it. You should be able to "chop" your trained network off at the encoded layer, run all of your inputs through, and collect their encodings. Run those encodings through t-SNE and reduce the dimensions down to 2D or 3D. Then use matplotlib to plot your results.

Try making a copy of your original network but literally remove every layer past the encoded layer.
Use a modified version of the model.load_weights function to only load the weights up to the encoded layer:

def load_weights(self, filepath):
    # Load weights from an HDF5 file, but only for the first
    # NUM_LAYERS_TO_LOAD layers (everything up to and including
    # the encoded layer)
    import h5py
    f = h5py.File(filepath, 'r')  # open read-only
    for k in range(NUM_LAYERS_TO_LOAD):
        g = f['layer_{}'.format(k)]
        weights = [g['param_{}'.format(p)] for p in range(g.attrs['nb_params'])]
        self.layers[k].set_weights(weights)
    f.close()
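
Then, to actually collect the encodings, you'd do something like this sketch (untested; truncated_model, the weights file name, and X are just placeholders for whatever you have):

from sklearn.manifold import TSNE

# truncated_model: your copy of the network with everything past the
# encoded layer removed, built exactly like the original up to that point
load_weights(truncated_model, 'weights.h5')  # the partial loader above, called as a plain function
truncated_model.compile(loss='mse', optimizer='sgd')  # compiling is required before predict

# run every input through the truncated network and collect its encoding
encodings = truncated_model.predict(X)

# reduce the encodings down to 2D for plotting
coords = TSNE(n_components=2).fit_transform(encodings)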


I'm pretty new to Keras, but I know this method works in the Neon framework, so I hope I'm not steering you wrong.
What's the topology of the network you're using to generate your embeddings? Maybe we can whip up some code if we know what you're dealing with.

Shane McElligott
Nov 13, 2015, 4:04:57 AM
to Keras-users, mgarn...@gmail.com
This is helpful info, I appreciate it. My model is a merged model, but I created a stand-alone topology for the NLP RNN so we can hack it.

model = Sequential()
model.add(Embedding(vocab_size, 128, mask_zero=True))
model.add(rnn.JZS3(128, return_sequences=False))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))

X_text shape is (2388, 2046)
vocab size is 28294
Y shape is (2388, 2)

Thanks again

mgarn...@gmail.com
Nov 13, 2015, 1:46:57 PM
to Keras-users, mgarn...@gmail.com
Sweet! Glad it was useful info :) 
It looks like you're already outputting in 2 dimensions, btw, so you shouldn't need to use t-SNE at all. Just plot the output of your network.
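E.g. something like this (untested, assuming model is your trained network and X_text is the input you described):

from matplotlib import pyplot

# the softmax output is already 2-D, so scatter it directly
probs = model.predict(X_text)  # shape (2388, 2)
pyplot.scatter(probs[:, 0], probs[:, 1])
pyplot.show()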
Hope you get some cool looking data!

Shane McElligott
Nov 14, 2015, 12:17:22 AM
to Keras-users, mgarn...@gmail.com
Hope so. Thanks. I was thinking t-SNE for visualizing the word vectors and their relationships, but there is definitely a lot to be explored in terms of output. It will be interesting; I'm sure I'll learn something unexpected.

Marcos Treviso
Nov 14, 2015, 10:12:34 AM
to Keras-users
I don't know if you've already solved this, but here is the way I'm doing it:

from matplotlib import pyplot
from sklearn.manifold import TSNE

def plot_embeddings(embeddings, names):
    model = TSNE(n_components=2, random_state=0)
    vectors = model.fit_transform(embeddings)
    x, y = vectors[:, 0], vectors[:, 1]
    fig, ax = pyplot.subplots()
    ax.scatter(x, y)
    for i, tname in enumerate(names):
        ax.annotate(tname, (x[i], y[i]))
    pyplot.show()



And recovering the weights of the Embedding layer depends on your model. But suppose it is a simple model like Sequential(Embedding + Flatten + Dense); then you can do:

embeddings = model.layers[0].get_weights()[0]
names = list(vocabulary.keys())
plot_embeddings(embeddings, names)
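
One caveat (assuming vocabulary maps word -> index): a dict's key order isn't guaranteed to match the row order of the embedding matrix, so it's safer to sort by index first:

# rows of the embedding matrix are ordered by word index, so order
# the names the same way (vocabulary is assumed to map word -> index)
names = [word for word, idx in sorted(vocabulary.items(), key=lambda p: p[1])]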


Shane McElligott
Nov 14, 2015, 3:50:39 PM
to Keras-users
This is fantastic.  Thank you.