I am trying to train a single-layer LSTM with input vector size 26 for classification into 18 categories, using the 'nnx' package (code by nicholas-leonard). I build the model with the following code:
require 'nn'
require 'nnx'   -- provides nn.LSTM
require 'cunn'  -- CUDA backend for nn

inputSize = 26
hiddenSize = 256
outputSize = 18
lr = 0.01
updateInterval = 10

l = nn.LSTM(inputSize, hiddenSize)
lstm = nn.Sequential()
lstm:add(l)
lstm:add(nn.Linear(hiddenSize, outputSize))
lstm:add(nn.LogSoftMax())
lstm:cuda()

criterion = nn.ClassNLLCriterion()
criterion:cuda()
The training loop over one sequence of examples is:
for step = 1, nSteps do
   local input = inputs[step]
   local target = targets[step]
   local output = lstm:forward(input:cuda())
   local err = criterion:forward(output:cuda(), target:cuda())
   local gradOutput = criterion:backward(output:cuda(), target:cuda())
   lstm:backward(input:cuda(), gradOutput:cuda())
   if step % updateInterval == 0 or step == nSteps then
      l:updateParameters(lr)
   end
end
The output of the forward pass becomes 'nan' immediately after the first time the parameters are updated via updateParameters. This code is similar in spirit to the RNN code at https://github.com/Element-Research/rnn
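One thing I am unsure about: I never call zeroGradParameters(), so backward() may be accumulating gradients across the entire run, and I also only update the parameters of the LSTM layer l, not the nn.Linear layer. If that is the problem, I assume the update step should instead look something like this (a sketch using the standard nn.Module zeroGradParameters/updateParameters API; lstm, l, lr, and updateInterval are as defined above):

if step % updateInterval == 0 or step == nSteps then
   -- update ALL parameters (the Linear layer too, not just the LSTM),
   -- then clear the accumulated gradients before the next interval
   lstm:updateParameters(lr)
   lstm:zeroGradParameters()
end

Is that the cause of the nan values, or is something else wrong here?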