Hi,
I was trying to follow
this blog post on LSTMs with Caffe's current "master" branch implementation (using the windows branch, though), but I can't get it to work. (As far as I can tell, the blog post's code is based on
this C++ code from caffe-lstm.)
The original network definition is at the bottom of this post. I had to make a couple of changes to it, e.g.
input_shape { dim: 320 dim: 1 }
didn't work at all, so I changed it to
input_shape { dim: 320 dim: 1 dim: 1 }
Also, the lstm_param block and its clipping_threshold parameter in
layer {
  name: "lstm1"
  type: "Lstm"
  bottom: "data"
  bottom: "clip"
  top: "lstm1"
  lstm_param {
    num_output: 15
    clipping_threshold: 0.1
    weight_filler { type: "gaussian" std: 0.1 }
    bias_filler { type: "constant" }
  }
}
don't exist in the master branch, so I tried to replace them with
recurrent_param and
clip_gradients in the solver; I also had to use the capital-letter "LSTM" layer type instead of "Lstm".
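For reference, this is roughly what my adapted layer looks like now; the recurrent_param fields are just my guess at the equivalents of the old lstm_param, and since clipping_threshold has no counterpart there, I moved that value into the solver:
layer {
  name: "lstm1"
  type: "LSTM"
  bottom: "data"
  bottom: "clip"
  top: "lstm1"
  recurrent_param {
    num_output: 15
    weight_filler { type: "gaussian" std: 0.1 }
    bias_filler { type: "constant" }
  }
}
and in solver.prototxt:
clip_gradients: 0.1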
The actual problem appears when I try to run the network using the following code (basically taken straight from the blog post):
import numpy as np
import matplotlib.pyplot as plt
import caffe

caffe.set_mode_cpu()
solver = caffe.SGDSolver('solver.prototxt')

# build the training signal: a sum of sinusoids, scaled to [-1, 1] and zero-centered
a = np.arange(0, 32, 0.01)
d = 0.5 * np.sin(2 * a) - 0.05 * np.cos(17 * a + 0.8) + 0.05 * np.sin(25 * a + 10) - 0.02 * np.cos(45 * a + 0.3)
d = d / max(np.max(d), -np.min(d))
d = d - np.mean(d)

niter = 5000
train_loss = np.zeros(niter)

# LSTM parameter initialization taken straight from the blog post (indices refer to the caffe-lstm layout)
solver.net.params['lstm1'][2].data[15:30] = 5
solver.net.blobs['clip'].data[...] = 1

for i in range(niter):
    seq_idx = i % (len(d) // 320)                      # which 320-sample chunk of d to train on
    solver.net.blobs['clip'].data[0] = seq_idx > 0     # clip = 0 on the first timestep only when starting over at the beginning of d
    solver.net.blobs['label'].data[:, 0] = d[seq_idx * 320 : (seq_idx + 1) * 320]
    solver.step(1)
    train_loss[i] = solver.net.blobs['loss'].data

plt.plot(np.arange(niter), train_loss)
When I do that, the training loss is basically a sort-of-sinusoidal shape with a period of about 10 iterations and doesn't decrease. It should look like this:
but for me, the loss looks like this instead:
I zoomed in on the start, but the pattern appears to repeat for all 5000 iterations. It also doesn't seem to matter whether I change the learning rate or the gradient clipping. To be fair, the loss actually looks a bit like the training input itself ...
Finally, the code ends with a validation part that looks like this:
solver.test_nets[0].blobs['data'].reshape(2, 1)
solver.test_nets[0].blobs['clip'].reshape(2, 1)
solver.test_nets[0].reshape()
solver.test_nets[0].blobs['clip'].data[...] = 1

# run the net forward one step at a time and record the prediction from ip1
preds = np.zeros(len(d))
for i in range(len(d)):
    solver.test_nets[0].blobs['clip'].data[0] = i > 0   # reset the LSTM state only before the very first step
    preds[i] = solver.test_nets[0].forward()['ip1'][0][0]

plt.plot(np.arange(len(d)), preds)
plt.plot(np.arange(len(d)), d)
plt.show()
When I try to execute that, Caffe crashes on the reshape() line. I have no clue what this specific reshape does or why it is required, and since the training doesn't even succeed, I can't really toy around with it to see what happens.
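My only guess is that the test-net blobs might need the same number of axes as the three-dimensional input_shape I had to use, i.e. something like the following, but since training doesn't work I can't verify that:
solver.test_nets[0].blobs['data'].reshape(2, 1, 1)
solver.test_nets[0].blobs['clip'].reshape(2, 1, 1)
solver.test_nets[0].reshape()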
Can somebody help and/or explain what's going on?
Best regards,
Markus
For completeness, this is the original network from the blog post:
name: "LSTM"
input: "data"
input_shape { dim: 320 dim: 1 }
input: "clip"
input_shape { dim: 320 dim: 1 }
input: "label"
input_shape { dim: 320 dim: 1 }
layer {
  name: "Silence"
  type: "Silence"
  bottom: "label"
  include: { phase: TEST }
}
layer {
  name: "lstm1"
  type: "Lstm"
  bottom: "data"
  bottom: "clip"
  top: "lstm1"
  lstm_param {
    num_output: 15
    clipping_threshold: 0.1
    weight_filler { type: "gaussian" std: 0.1 }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "lstm1"
  top: "ip1"
  inner_product_param {
    num_output: 1
    weight_filler { type: "gaussian" std: 0.1 }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "ip1"
  bottom: "label"
  top: "loss"
  include: { phase: TRAIN }
}
and this is the solver configuration:
net: "lstm.prototxt"
test_iter: 1
test_interval: 2000000
base_lr: 0.0001
momentum: 0.95
lr_policy: "fixed"
display: 200
max_iter: 100000
solver_mode: CPU
average_loss: 200