I'm using the NASA C-MAPSS turbofan dataset to try to predict the remaining useful life (RUL, in cycles) of an engine, given a set of sensor measurements, the previous 9 measurements for each sensor, and the operational settings. I'm feeding this into an LSTM with MSE as the loss function, but it doesn't seem to learn anything: the MSE gets stuck at around 8000. By comparison, a feed-forward neural network with two hidden layers of 40 neurons each was able to reach an MSE of about 3500.
I suspect the way I've organized the data is the cause of this.
The original data looks like this:
   unit_number  cycles  operating_setting_1  operating_setting_2  \
0          1.0     1.0              -0.0007              -0.0004
1          1.0     2.0               0.0019              -0.0003
2          1.0     3.0              -0.0043               0.0003
3          1.0     4.0               0.0007               0.0000
4          1.0     5.0              -0.0019              -0.0002

   operating_setting_3  sensor_measurement_1  sensor_measurement_2
0                100.0                518.67                641.82
1                100.0                518.67                642.15
2                100.0                518.67                642.35
3                100.0                518.67                642.35
4                100.0                518.67                642.37
with sensor measurements up to 21, and a RUL column as well.
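In case it matters, the RUL column is derived from the cycle counter: for each engine, RUL at a row is the engine's last recorded cycle minus the current cycle. A simplified sketch on toy data (the real file has all 26 columns; the trimmed frame here is just for illustration):

```python
import pandas as pd

# toy slice shaped like the training data: a per-unit cycle
# counter plus sensor columns (trimmed to one sensor here)
df = pd.DataFrame({
    "unit_number": [1, 1, 1, 2, 2],
    "cycles":      [1, 2, 3, 1, 2],
    "sensor_measurement_1": [518.67] * 5,
})

# RUL at each row = that engine's max recorded cycle minus the current cycle
df["RUL"] = df.groupby("unit_number")["cycles"].transform("max") - df["cycles"]
print(df["RUL"].tolist())  # [2, 1, 0, 1, 0]
```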
As I understand it, the LSTM expects an input of shape (samples, time_steps, features). What I've done is to build, for each data point, a matrix whose rows are the 10 consecutive time steps and whose columns are the features (the RUL column is removed beforehand):
                operating_setting_1   operating_setting_2   ...
row  1 (t-9):   value at t-9          value at t-9          ...
row  2 (t-8):   value at t-8          value at t-8          ...
 .                   .                     .
 .                   .                     .
row 10 (t):     value at t            value at t            ...
I then stack each of these matrices inside a NumPy array, so that I end up with a 3-D array of shape (samples, time_steps = 10 (the rows in the matrix above), features = operational settings (3) + sensor measurements (21)). The RUL target is simply an array of shape (samples,) that gives the RUL at time t for each matrix.
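To make the windowing concrete, here is a simplified sketch of how I build those matrices for a single engine (function and variable names are just for illustration, not my actual code):

```python
import numpy as np

def make_windows(features, rul, time_steps=10):
    """Slide a window of length time_steps over one engine's
    cycle-ordered feature rows; the target for each window is
    the RUL at the window's last cycle (time t)."""
    X, y = [], []
    for t in range(time_steps - 1, len(features)):
        X.append(features[t - time_steps + 1 : t + 1])
        y.append(rul[t])
    return np.array(X), np.array(y)

# toy check: one engine with 12 cycles and 24 features
feats = np.random.rand(12, 24)
rul = np.arange(12)[::-1]          # RUL counts down to 0
X, y = make_windows(feats, rul)
print(X.shape, y.shape)            # (3, 10, 24) (3,)
```

The windows for all engines are then concatenated along the first axis before training.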
The code for my LSTM is below:
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(100, return_sequences=False, input_shape=(10, 25),
               recurrent_activation='relu'))
model.add(Dense(1))
model.compile(loss="mean_squared_error", optimizer="adam")
history = model.fit(data_train, RUL_train, epochs=number_epochs, batch_size=100,
                    verbose=1, validation_data=(data_validation, RUL_validation))
It doesn't seem to matter which activation function I use for the LSTM, unlike the FFNN, which could only learn with the rectifier (ReLU) activation.
I'm new to Keras and neural nets in general, so any help would be greatly appreciated.