I am trying to do classification of sequences using an LSTM layer in my network. The sequences in my data are of different lengths.
I have defined my network as follows:
val lstm = LSTM(numFeatures, lstmLayerSize)
val recurrent = Recurrent(maskZero = true) // skip all-zero timesteps
recurrent.add(lstm)

val model = Sequential()
model.add(recurrent)
model.add(Select(2, -1)) // keep only the output at the last timestep
model.add(Linear(lstmLayerSize, numOutcomes))
Because all the input Tensors in a batch need to have the same shape, any input sequence shorter than the maximum length is padded at the beginning with zeros.
For example, with three input features and a maximum sequence length of five (one row per feature, one column per timestep), the sequence:
0 0 1
1 0 1
1 1 0
is being input as:
0 0 0 0 1
0 0 1 0 1
0 0 1 1 0
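To make the padding scheme concrete, here is a minimal sketch in plain Scala (no BigDL dependency; `prePad` is a hypothetical helper name, and it reads each column of the example above as one timestep's feature vector):

```scala
// Pre-pad a sequence of timestep feature vectors with all-zero vectors
// so every sequence reaches the same fixed length (padding at the front).
def prePad(seq: Seq[Seq[Float]], maxLen: Int, numFeatures: Int): Seq[Seq[Float]] = {
  val padding = Seq.fill(maxLen - seq.length)(Seq.fill(numFeatures)(0f))
  padding ++ seq
}

// The three-timestep example above, one vector per timestep (column):
val seq = Seq(Seq(0f, 1f, 1f), Seq(0f, 0f, 1f), Seq(1f, 1f, 0f))
val padded = prePad(seq, maxLen = 5, numFeatures = 3)
// padded begins with two all-zero timesteps, followed by the original three
```

The two leading all-zero timesteps are exactly what maskZero = true is expected to skip.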
As I understand it, setting maskZero = true in the Recurrent layer means that the all-zero timesteps introduced by the padding are ignored during training.
Furthermore, the Select(2, -1) layer picks the last entry along the time dimension, so only the LSTM output at the final (real) timestep is passed on to the Linear classifier and contributes to the loss.
Is this the correct approach?