LSTM not learning


Greg Feldmann

Oct 24, 2017, 1:19:12 AM
to Keras-users
I'm using the NASA C-MAPSS turbofan data to try to predict the remaining useful life (RUL, in cycles) of an engine, given a set of sensor measurements, the previous 9 measurements for each sensor, and the operational settings. I'm feeding this into an LSTM with MSE as the loss function, but it doesn't seem to be learning anything: the MSE gets stuck at around 8000. In comparison, a feed-forward neural network with 2 hidden layers of 40 neurons each was able to reach an MSE of about 3500.

I suspect the way I've organized the data is the cause of this. 

The original data looks like this:

   unit_number  cycles  operating_setting_1  operating_setting_2  operating_setting_3  sensor_measurement_1  sensor_measurement_2
0          1.0     1.0              -0.0007              -0.0004                100.0                518.67                641.82
1          1.0     2.0               0.0019              -0.0003                100.0                518.67                642.15
2          1.0     3.0              -0.0043               0.0003                100.0                518.67                642.35
3          1.0     4.0               0.0007               0.0000                100.0                518.67                642.35
4          1.0     5.0              -0.0019              -0.0002                100.0                518.67                642.37

with sensor measurement columns continuing up to sensor_measurement_21, plus a RUL column.

As I understand it, the LSTM expects input of shape (samples, time_steps, features). What I've done is create a matrix with the following columns as an individual data point (note that the RUL column has been removed before this):

operating_setting_1    operating_setting_2    ...
(value at time t-9)    (value at time t-9)    ...
(value at time t-8)    (value at time t-8)    ...
        ...                    ...
(value at time t)      (value at time t)      ...

I then stack each of these matrices inside a numpy array, so that I end up with a 3D array of shape (samples, time_steps = 10 (the rows of the matrix above), features = 3 operational settings + 21 sensor measurements). The RUL target is simply an array of shape (samples,) that gives the RUL at time t for each matrix.
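Roughly, the windowing looks like this for a single engine (a simplified sketch rather than my exact code; the names features and rul are placeholders):

import numpy as np

window = 10
# features: hypothetical (total_cycles, 25) array for one engine
# rul: hypothetical matching (total_cycles,) array of RUL values

# Stack the measurements at t-9 ... t into one (10, 25) matrix per sample.
data = np.array([features[t - window + 1 : t + 1]
                 for t in range(window - 1, len(features))])   # (samples, 10, 25)
RUL = rul[window - 1:]                                         # (samples,)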

The code for my LSTM is below:

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(100, return_sequences=False, input_shape=(10, 25),
               recurrent_activation='relu'))
model.add(Dense(1))

model.compile(loss="mean_squared_error", optimizer="adam")

# Assigning fit()'s result to a variable named LSTM would shadow the LSTM
# layer class imported above, so it is renamed to history here.
history = model.fit(data_train, RUL_train, epochs=number_epochs, batch_size=100,
                    verbose=1, validation_data=(data_validation, RUL_validation))


It doesn't seem to matter which activation function I use, unlike with the FFNN, which would only learn when using the rectifier (ReLU) activation.


I'm new to Keras and neural nets in general, so any help would be greatly appreciated.


Greg Feldmann

Oct 24, 2017, 1:21:54 AM
to Keras-users
Sorry, I forgot to add that 'cycle_number' is also a feature I used, so the features do add up to 25 (only 24 are accounted for above).

Greg Feldmann

Oct 29, 2017, 9:40:48 PM
to Keras-users
If anyone is interested, I figured out the solution to my problem.

First, using the 'window method' (inputting x_t, x_t-1, x_t-2, etc. for one step, then x_t+1, x_t, x_t-1 for the next step) is unnecessary for an LSTM, since it has its own memory. Second, my batches should be divided by the different engines in my data. With multiple engines in one array, each sensor feature is effectively several separate time series that have been arbitrarily glued together end to end. This was causing the LSTM to treat data from a different engine as the current engine's past, which is wrong and confusing: the LSTM ends up believing a machine can fail more than once in a single lifetime. (A sketch of the per-engine split is below.)
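Concretely, the split looks roughly like this (a simplified sketch with hypothetical names, assuming the original pandas DataFrame df and a list feature_cols of the 25 feature column names):

# Build one sequence per engine so no engine's data bleeds into another's.
sequences, targets = [], []
for unit, group in df.groupby('unit_number'):
    group = group.sort_values('cycles')
    sequences.append(group[feature_cols].values)   # (engine_length, 25)
    targets.append(group['RUL'].values)            # (engine_length,)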

After implementing these changes, my LSTM was able to beat my FFNN by about 2000 (an MSE in the low 3000s for the FFNN vs ~800 for the LSTM). But this isn't an entirely fair comparison, since I had to trim rows from the start of each engine's life to get equal sequence lengths in every batch. I'm working on fixing this so that I can include all of my data.
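One common way to avoid the trimming is to pad the shorter sequences and mask the padding so the LSTM skips the padded steps; a rough, untested sketch (reusing the sequences list from the snippet above):

from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense
from keras.preprocessing.sequence import pad_sequences

# Zero-pad every engine's sequence to the longest engine's length.
padded = pad_sequences(sequences, dtype='float32', padding='pre', value=0.0)

# The Masking layer tells downstream layers to ignore all-zero timesteps.
model = Sequential()
model.add(Masking(mask_value=0.0, input_shape=(padded.shape[1], 25)))
model.add(LSTM(100))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')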

Faezeh Hajiaghajani

Oct 29, 2017, 10:30:49 PM
to Greg Feldmann, Keras-users
Hello Greg, 

Thanks for writing up the solution to your problem.
Regarding your first point, do you mean that you found the window method unnecessary and are therefore feeding only 1 step back from the value you want to predict, instead of multiple steps back?

Thanks,
Faezeh

