Prediction with masking in RNN

Brian Miner

May 21, 2016, 9:28:25 PM
to Keras-users


I am looking at an example below of using an LSTM to predict the next element in a sequence. I am padding the X and y of both train and test, since one sequence is length 2 and the other is length 3. I think this is the proper way of doing this (please let me know if not!), and I have a question on using a mask of zero in the embedding layer. In the test prediction there is a prediction for the mask zero (the first element of the first test sequence in X_test):

[  4.75309849e-01,   5.24690151e-01]

Should this prediction be occurring if the 0 really is a mask? Or will a prediction still occur but we need to ignore it?
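
If the latter, here is roughly how I would drop those positions by hand (my own untested sketch, using the zero-padding pattern and the variable names from the code below):

# boolean mask: True at real timesteps, False where the input is the 0 pad
padding_mask = (X_test != 0)  # shape (n_samples, maxlen)

# keep only the predictions at unmasked timesteps, one array per sequence
kept = [res[i][padding_mask[i]] for i in range(len(res))]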


from keras.models import Sequential  
from keras.layers.core import TimeDistributedDense, Activation, Dropout  
from keras.layers.recurrent import GRU, LSTM
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence
from keras.optimizers import RMSprop
import numpy as np
import time

# how long to make the sequences; will pad or truncate in the pad_sequences step
maxlen = 3

# mini-batch size
batch_size = 1

# vocab size (vocab + 1 for the 0 mask)
nb_word = 4  # since 3 is the largest number below

# dim_size of the word embeddings
dim_size_embed = 128

# number of units in the LSTM hidden layer
hidden_units = 128

# this is, I believe, the length of the softmax output
nb_tag = 2

# X is a list of lists, where the inner lists are the sequences; for example the first one is [1,2]
# and the second sequence is [1,3,2]

X_train = [[1,2],[1,3,2]]  # two sequences, one is [1,2] and the other is [1,3,2]


# list of lists of lists:
# the innermost list is a one-hot encoding of the output
# the middle list is the sequence of targets for the specific training example
# so, here we have 2 inner lists, one for each of the two input sequences
# the first is [[0,1],[1,0]] and the second is [[0,1],[1,0],[1,0]]
# within each we have the targets for steps 1, 2 and 3 one-hot encoded, so the target for sequence 1 step 1 is [0,1]

Y_train = [[[0,1],[1,0]],[[0,1],[1,0],[1,0]]]  # the output should be 3D and one-hot for a softmax output with categorical_crossentropy
Y_train = sequence.pad_sequences(Y_train, maxlen=maxlen)  # pads the list so a [0,0] step is added to the front of the first sequence


X_test = [[1,2],[1,3,1]]
Y_test = [[[0,1],[1,0]],[[0,1],[1,0],[0,1]]]

X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)
Y_test = sequence.pad_sequences(Y_test, maxlen=maxlen)

Y_train = np.asarray(Y_train, dtype='int32')
Y_test = np.asarray(Y_test, dtype='int32')
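
# For reference (assuming pad_sequences' default padding='pre'): X_train is now
# [[0,1,2],[1,3,2]] and X_test is [[0,1,2],[1,3,1]], i.e. the shorter sequences
# get a leading 0, which is the value mask_zero=True below treats as masked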

model = Sequential()
model.add(Embedding(nb_word, dim_size_embed, mask_zero=True))
model.add(LSTM(hidden_units, return_sequences=True))
model.add(TimeDistributedDense(nb_tag))
model.add(Activation('softmax'))

rms = RMSprop()
model.compile(loss='categorical_crossentropy', optimizer=rms)

model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=100, show_accuracy=True)
time.sleep(0.1)  # https://github.com/fchollet/keras/issues/2110
res = model.predict(X_test)
res

array([[[  4.75309849e-01,   5.24690151e-01],
        [  3.14613567e-06,   9.99996901e-01],
        [  9.99999881e-01,   7.44754658e-08]],

       [[  3.14613567e-06,   9.99996901e-01],
        [  9.99999881e-01,   7.96038861e-08],
        [  9.99953151e-01,   4.68720391e-05]]])
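
For reference, lining this up with the padded inputs: X_test after padding should be [[0,1,2],[1,3,1]], so the first row of res[0] corresponds to the leading 0 pad in the first test sequence, and that is the [0.475..., 0.525...] prediction I am asking about.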