Getting an LSTM example for sinusoidal signal estimation to work


Markus Mayer

Nov 15, 2016, 6:18:47 AM
to Caffe Users

Hi,

I was trying to follow this blog post for LSTMs using Caffe's current "master" branch implementation (using the Windows branch, though), but I can't get it to work. (As far as I can tell, the code is related to this C++ code of caffe-lstm.)

The original network definition is at the bottom of this post. I had to make a couple of changes. For example,

input_shape { dim: 320 dim: 1 }

didn't work at all, so I changed it to 

input_shape { dim: 320 dim: 1 dim: 1 }

Likewise, the lstm_param property and the clipping_threshold parameter in

layer {
  name: "lstm1"
  type: "Lstm"
  bottom: "data"
  bottom: "clip"
  top: "lstm1"

  lstm_param {
    num_output: 15
    clipping_threshold: 0.1
    weight_filler { type: "gaussian" std: 0.1 }
    bias_filler { type: "constant" }
  }
}

didn't exist, so I tried to replace them with recurrent_param and clip_gradients in the solver. I also had to use the capital-letter "LSTM" layer type instead of "Lstm".
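
For reference, the adapted LSTM part of my prototxt now looks roughly like this (I am not sure this is the correct way to use the mainline layer, so treat it as a sketch of what I tried rather than a known-good definition):

input: "data"
input_shape { dim: 320 dim: 1 dim: 1 }   # 320 timesteps x 1 stream x 1 feature
layer {
  name: "lstm1"
  type: "LSTM"                 # mainline layer type instead of "Lstm"
  bottom: "data"
  bottom: "clip"
  top: "lstm1"
  recurrent_param {            # instead of lstm_param
    num_output: 15
    weight_filler { type: "gaussian" std: 0.1 }
    bias_filler { type: "constant" }
  }
}

and the solver gets an extra line in place of the old clipping_threshold (same value, although I doubt the two behave identically):

clip_gradients: 0.1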

The problem appears when I try to train the network using the following code (basically straight from the blog post):

import numpy as np
import matplotlib.pyplot as plt
import caffe

caffe.set_mode_cpu()
solver = caffe.SGDSolver('solver.prototxt')

# Noisy training signal: a base sine wave plus smaller high-frequency terms,
# scaled to [-1, 1] and centred around zero
a = np.arange(0, 32, 0.01)
d = 0.5*np.sin(2*a) - 0.05 * np.cos(17*a + 0.8) + 0.05 * np.sin(25 * a + 10) - 0.02 * np.cos(45 * a + 0.3)
d = d / max(np.max(d), -np.min(d))
d = d - np.mean(d)

niter = 5000
train_loss = np.zeros(niter)
# From the blog post: set indices 15:30 of LSTM parameter blob 2 to 5
# (presumably the forget-gate bias in the caffe-lstm parameter layout)
solver.net.params['lstm1'][2].data[15:30] = 5
solver.net.blobs['clip'].data[...] = 1
for i in range(niter):
    seq_idx = i % (len(d) // 320)                    # which 320-sample chunk of d to use
    solver.net.blobs['clip'].data[0] = seq_idx > 0   # reset the LSTM state at the start of each pass over d
    solver.net.blobs['label'].data[:, 0] = d[seq_idx * 320 : (seq_idx+1) * 320]
    solver.step(1)
    train_loss[i] = solver.net.blobs['loss'].data

plt.plot(np.arange(niter), train_loss)

When doing that, the training loss has a roughly sinusoidal shape with a period of about 10 iterations and doesn't decrease. It should look like this:

[expected training-loss plot from the blog post]

but for me, the loss looks like this instead:

[my training-loss plot, oscillating and not decreasing]
The plot is zoomed in on the start, but the pattern appears to repeat for all 5000 iterations. It also doesn't seem to matter whether I change the learning rate or the gradient clipping. To be fair, the loss actually looks a bit like the training signal itself ...

In addition, the code ends with a validation part that looks like this:

# Resize the input blobs to two timesteps and propagate the new shapes
# through the network
solver.test_nets[0].blobs['data'].reshape(2, 1)
solver.test_nets[0].blobs['clip'].reshape(2, 1)
solver.test_nets[0].reshape()

# Predict the signal one sample at a time, carrying the LSTM state along
# (clip stays 1 after the very first step)
solver.test_nets[0].blobs['clip'].data[...] = 1
preds = np.zeros(len(d))
for i in range(len(d)):
    solver.test_nets[0].blobs['clip'].data[0] = i > 0
    preds[i] = solver.test_nets[0].forward()['ip1'][0][0]

plt.plot(np.arange(len(d)), preds)
plt.plot(np.arange(len(d)), d)
plt.show()

When I try to execute that, Caffe crashes on the reshape() line. I don't understand what this specific reshape is for or why it is required, and since training doesn't even succeed, I can't really experiment with it to see what happens.
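
One thing I still want to try once training actually works (purely a guess on my part, based on the mainline LSTM layer wanting a third axis on "data" during training) is giving the test-net blobs the same axis layout before the reshape:

# Guess: keep "data" three-dimensional (timesteps x streams x features) and
# "clip" two-dimensional, matching what the training net required
solver.test_nets[0].blobs['data'].reshape(2, 1, 1)
solver.test_nets[0].blobs['clip'].reshape(2, 1)
solver.test_nets[0].reshape()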

Can somebody help and/or explain what's going on?

Best regards,
Markus




For completeness, this is the original network from the blog post:

name: "LSTM"
input: "data"
input_shape { dim: 320 dim: 1 }
input: "clip"
input_shape { dim: 320 dim: 1 }
input: "label"
input_shape { dim: 320 dim: 1 }
layer {
  name: "Silence"
  type: "Silence"
  bottom: "label"
  include: { phase: TEST }
}
layer {
  name: "lstm1"
  type: "Lstm"
  bottom: "data"
  bottom: "clip"
  top: "lstm1"

  lstm_param {
    num_output: 15
    clipping_threshold: 0.1
    weight_filler { type: "gaussian" std: 0.1 }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "lstm1"
  top: "ip1"

  inner_product_param {
    num_output: 1
    weight_filler { type: "gaussian" std: 0.1 }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "ip1"
  bottom: "label"
  top: "loss"
  include: { phase: TRAIN }
}

and this is the solver configuration:

net: "lstm.prototxt"
test_iter: 1
test_interval: 2000000
base_lr: 0.0001
momentum: 0.95
lr_policy: "fixed"
display: 200
max_iter: 100000
solver_mode: CPU
average_loss: 200

Sarah Adel Bargal

Dec 15, 2016, 4:48:57 PM
to Caffe Users

Did you find a solution for this? We are seeing the same problem: the loss does not decrease.

dusa

Apr 8, 2017, 11:56:43 PM
to Caffe Users

Exact same problem here. I thought it was due to the batches, but that doesn't seem to explain it. I am working on a similar project with Caffe's LSTM layer, and changing the learning rate etc. didn't help with the periodic oscillation at all. Did you happen to figure this out?

Marshall Worth

Apr 28, 2017, 4:00:56 PM
to Caffe Users
Hi all, I've started to get this working; however, take what I say and/or do with a grain of salt. I am very new to Caffe. My background is with Matlab and the Neural Network Toolbox, i.e. I'm used to being handed complete packages...

Anyway, it appears to me that the problem with the implementation of the LSTM described in the blog post above is that during training the LSTM is never explicitly given any input data. I have worked with standard ANNs as well as NARX nets and RNN derivatives, so I am used to supplying inputs to the network along with target outputs to compare the forward propagation against. If you run through the code you will see that blobs['data'] is never assigned anything. So right off the bat it seems odd that the network is expected to learn to generate a sine wave in the absence of input. To make matters worse, the network is trained over and over again to produce different time segments of the sine wave from the same input, i.e. all zeros. I would imagine this to be an impossible task.

To correct this, I thought of a different purpose for the network: the goal is now to produce the clean sine wave sin(2a) (amplitude 1), with the "noisy" wave generated at the beginning as the input. All I added was...

b = np.sin(2*a)

where 'a' is the original array from 0 to 32 in steps of 0.01. Next, I modified the blobs['label'] assignment and added a blobs['data'] assignment inside the original for loop, like this...

solver.net.blobs['data'].data[:, 0, 0] = d[seq_idx * 320 : (seq_idx+1) * 320]
solver.net.blobs['label'].data[:, 0] = b[seq_idx * 320 : (seq_idx+1) * 320]

This way the "perfect" sin wave desired output time period is in synch with the "noisy" sine wave input. I'm working on the testing part right now, however it appears to be some combination setting the data inputs with...

solver.test_nets[0].blobs['data'].data[:, 0, 0] = d[seq_idx * 320 : (seq_idx+1) * 320]
solver.test_nets[0].forward()
output = solver.test_nets[0].blobs['ip1'].data

where we give the network its inputs in blobs['data'], initiate one forward propagation, and capture the output layer at blobs['ip1'].
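
Putting my training and testing changes together, the whole script looks roughly like this (a sketch assembled from the snippets above, so treat it with the same grain of salt; the variable names are the ones from the original post):

import numpy as np
import caffe

caffe.set_mode_cpu()
solver = caffe.SGDSolver('solver.prototxt')

# Noisy input signal d (as in the blog post) and clean target signal b
a = np.arange(0, 32, 0.01)
d = 0.5*np.sin(2*a) - 0.05*np.cos(17*a + 0.8) + 0.05*np.sin(25*a + 10) - 0.02*np.cos(45*a + 0.3)
d = d / max(np.max(d), -np.min(d))
d = d - np.mean(d)
b = np.sin(2*a)                        # the clean sine wave the network should produce

niter = 5000
train_loss = np.zeros(niter)
solver.net.blobs['clip'].data[...] = 1
for i in range(niter):
    seq_idx = i % (len(d) // 320)
    solver.net.blobs['clip'].data[0] = seq_idx > 0   # reset the LSTM state at the start of each pass over d
    solver.net.blobs['data'].data[:, 0, 0] = d[seq_idx * 320 : (seq_idx+1) * 320]   # noisy input
    solver.net.blobs['label'].data[:, 0] = b[seq_idx * 320 : (seq_idx+1) * 320]     # clean target
    solver.step(1)
    train_loss[i] = solver.net.blobs['loss'].data

# Testing: run the noisy signal through the test net, 320 samples at a time,
# and collect the predictions from the ip1 output layer
preds = np.zeros(len(d))
solver.test_nets[0].blobs['clip'].data[...] = 1
for seq_idx in range(len(d) // 320):
    solver.test_nets[0].blobs['clip'].data[0] = seq_idx > 0
    solver.test_nets[0].blobs['data'].data[:, 0, 0] = d[seq_idx * 320 : (seq_idx+1) * 320]
    solver.test_nets[0].forward()
    preds[seq_idx * 320 : (seq_idx+1) * 320] = solver.test_nets[0].blobs['ip1'].data[:, 0]

preds can then be plotted against b (and d) with matplotlib to check how well the network filters out the noise.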

Will report back if I can get it working better. Please respond if my methods are tedious, unsound, or need improving.

Colin Brown

Jul 7, 2017, 4:37:08 PM
to Caffe Users
Hi Marshall,

I initially had the same thought as you: that the input data had been carelessly omitted. However, I'm now wondering if this was done intentionally in order to show (rather cryptically) that even *without* input data, the LSTM layer can learn to predict the next output based on an initial input (0?) and the current state of the LSTM memory. Unfortunately, I can neither confirm nor deny this hypothesis because I can't seem to get this example to work. (Like you, though, I'll add the disclaimer that I am new to the world of LSTM training.)

Did anyone manage to create a working example in mainline Caffe? A working helloworld for LSTMs on Caffe would be extremely useful.

Thanks,

Colin

Yasin Almalıoğlu

Aug 9, 2017, 6:36:08 AM
to Caffe Users, sunsi...@gmail.com
Why isn't there any solid example showing how to use the LSTM layer, even though it is in the Caffe release? There are many Caffe PRs with an LSTM layer, but none of them is exactly the same.

Rabia Saeed

Mar 29, 2018, 11:39:49 AM
to Caffe Users
Hello,

Was anyone able to find a solution? I am stuck on this problem as well.

Thanks