LSTM training loss decreases, but the validation loss doesn't change!


s.alia...@gmail.com

Jun 22, 2016, 2:13:04 AM
to Keras-users
Dear Keras users,

I'm trying to implement activity recognition using a CNN and an LSTM. When I train the LSTM, the training loss decreases reasonably, but the validation loss does not change.
The log looks like this:
Epoch 1/100
3000/3000 [==============================] - 11s - loss: 1.8283 - acc: 0.1883 - val_loss: 1.8100 - val_acc: 0.1726
Epoch 2/100
3000/3000 [==============================] - 10s - loss: 1.7769 - acc: 0.2213 - val_loss: 1.8189 - val_acc: 0.1854
Epoch 3/100
3000/3000 [==============================] - 10s - loss: 1.7488 - acc: 0.2430 - val_loss: 1.8194 - val_acc: 0.1905
Epoch 4/100
3000/3000 [==============================] - 10s - loss: 1.7324 - acc: 0.2553 - val_loss: 1.8168 - val_acc: 0.1777
Epoch 5/100
3000/3000 [==============================] - 10s - loss: 1.7099 - acc: 0.2840 - val_loss: 1.8204 - val_acc: 0.1841
Epoch 6/100
3000/3000 [==============================] - 10s - loss: 1.6925 - acc: 0.2967 - val_loss: 1.8240 - val_acc: 0.1867
Epoch 7/100
3000/3000 [==============================] - 10s - loss: 1.6651 - acc: 0.3237 - val_loss: 1.8353 - val_acc: 0.1777
Epoch 8/100
3000/3000 [==============================] - 10s - loss: 1.6581 - acc: 0.3213 - val_loss: 1.8458 - val_acc: 0.1701
Epoch 9/100
3000/3000 [==============================] - 10s - loss: 1.6390 - acc: 0.3473 - val_loss: 1.8578 - val_acc: 0.1586
Epoch 10/100
3000/3000 [==============================] - 10s - loss: 1.6238 - acc: 0.3577 - val_loss: 1.8446 - val_acc: 0.1944
Epoch 11/100
3000/3000 [==============================] - 10s - loss: 1.6020 - acc: 0.3800 - val_loss: 1.8435 - val_acc: 0.2008
Epoch 12/100
3000/3000 [==============================] - 10s - loss: 1.5945 - acc: 0.3807 - val_loss: 1.8515 - val_acc: 0.1701
Epoch 13/100
3000/3000 [==============================] - 10s - loss: 1.5621 - acc: 0.3990 - val_loss: 1.8521 - val_acc: 0.1931
Epoch 14/100
3000/3000 [==============================] - 10s - loss: 1.5482 - acc: 0.4053 - val_loss: 1.8607 - val_acc: 0.1829
Epoch 15/100
3000/3000 [==============================] - 10s - loss: 1.5243 - acc: 0.4367 - val_loss: 1.8788 - val_acc: 0.1816
Epoch 16/100
3000/3000 [==============================] - 10s - loss: 1.5154 - acc: 0.4410 - val_loss: 1.8903 - val_acc: 0.1662
Epoch 17/100
3000/3000 [==============================] - 10s - loss: 1.4950 - acc: 0.4557 - val_loss: 1.8932 - val_acc: 0.1573
...

Briefly, from each frame of the clip, I extract CNN features from a VGG network (pre-trained on ImageNet) up to fc7; so, the length of the feature vector for each frame is 2048. For each activity, I use 50 frames.
My LSTM model is:

INPUT_LEN = 50
INPUT_DIM = 4096
OUTPUT_LEN = 6

model = Sequential()
model.add(LSTM(256, input_dim=INPUT_DIM, input_length=INPUT_LEN))
model.add(Dense(OUTPUT_LEN))
model.add(Activation('softmax'))
 
sgd = SGD(lr=0.001, decay=0.0005, momentum=0.9, nesterov=True)
model.compile(loss='crossentropy', optimizer=sgd, metrics=['accuracy'])

and for the training:
model.fit(X_train, Y_train, batch_size=BATCH_SIZE, nb_epoch=100, validation_data = (X_test, Y_test))
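For reference, with this setup X_train needs to be a 3-D array of shape (num_clips, INPUT_LEN, INPUT_DIM) and Y_train a one-hot matrix with OUTPUT_LEN columns. A minimal numpy sketch of those shapes (random values stand in for real fc7 features; num_clips is arbitrary here):

```python
import numpy as np

INPUT_LEN = 50    # frames per clip
INPUT_DIM = 4096  # fc7 feature size per frame
OUTPUT_LEN = 6    # number of activity classes
num_clips = 8     # arbitrary, just for the sketch

# One fc7 feature vector per frame, stacked per clip:
# the LSTM expects shape (num_clips, INPUT_LEN, INPUT_DIM).
frame_features = [np.random.rand(INPUT_LEN, INPUT_DIM) for _ in range(num_clips)]
X_train = np.stack(frame_features)

# One-hot activity labels: shape (num_clips, OUTPUT_LEN).
labels = np.random.randint(0, OUTPUT_LEN, size=num_clips)
Y_train = np.eye(OUTPUT_LEN)[labels]

print(X_train.shape)  # (8, 50, 4096)
print(Y_train.shape)  # (8, 6)
```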

Does anybody have an idea why the training loss and accuracy change reasonably at each epoch, but the validation behaviour is so erratic?

Thanks

Koustav Mullick

Jun 22, 2016, 2:51:09 AM
to Keras-users, s.alia...@gmail.com
Hi,

Try tuning the hyperparameters a bit. The training accuracy improvement isn't significant either: ideally, the initial improvements should be large, then gradually taper off as training approaches a minimum. Try increasing your learning rate, or maybe using other optimizers.

Looking at the gradual increase in training accuracy, I feel you have scope for increasing the learning rate.
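For what it's worth, the SGD decay in Keras of that era shrinks the step as lr / (1 + decay * iterations), counting every batch update, so the already small lr=0.001 keeps shrinking over a long run. A quick check, assuming that decay formula and a hypothetical 6000 updates:

```python
def effective_lr(lr, decay, iterations):
    """Keras-style time-based decay: step size after `iterations` batch updates."""
    return lr / (1.0 + decay * iterations)

# lr=0.001 and decay=0.0005 are from the posted code; 6000 updates is a made-up example.
print(effective_lr(0.001, 0.0005, 6000))  # 0.00025
```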

Koustav Mullick

Jun 22, 2016, 2:52:48 AM
to Keras-users, s.alia...@gmail.com
And also try inserting regularizers such as Dropout to prevent it from over-fitting on the training data.

s.alia...@gmail.com

Jun 22, 2016, 2:59:45 AM
to Keras-users, s.alia...@gmail.com
Thank you for the reply. I've tried different learning rates, but, as you suggested, I will try a larger one.
As for the loss and training accuracy: after 100 epochs, the training accuracy reaches 99.9% and the training loss drops to 0.28, but the validation accuracy stays at 17% and the validation loss rises to 4.5.

Do you have any idea why it happens?

Koustav Mullick

Jun 22, 2016, 3:02:47 AM
to Keras-users, s.alia...@gmail.com
Then the model is purely over-fitting on the training set. But over-fitting from the very first epoch seems a bit odd. Just check whether you are providing similar data as the training and validation sets.
By similar I mean: if you have applied some sort of pre-processing to the training data before feeding it into the CNN, make sure your validation data undergoes the same steps.

s.alia...@gmail.com

Jun 22, 2016, 3:06:08 AM
to Keras-users, s.alia...@gmail.com
I agree that the model is over-fitting, but I'm not sure how to avoid it. I added a Dropout(0.5) after the LSTM layer and increased the learning rate to 0.01, but the same crazy behaviour was observed!
About the data: I first generate the CNN features from the whole dataset and then split them into training and validation sets. Moreover, I tried validation_split=0.2 in model.fit() to ensure that the data is correctly split. But the same behaviour again!
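One caveat worth noting here: Keras's validation_split takes the last fraction of the arrays without shuffling, so if the samples are ordered by activity class, the validation set can end up covering only a few classes. A small numpy sketch of shuffling before splitting (random stand-in data, with a smaller feature dimension just to keep it light):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(100, 50, 64)                 # 100 clips, 50 frames, 64-dim stand-in features
Y = np.eye(6)[rng.randint(0, 6, size=100)]  # one-hot labels for 6 classes

# Shuffle samples and labels together before carving off a validation set.
perm = rng.permutation(len(X))
X, Y = X[perm], Y[perm]

split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
Y_train, Y_test = Y[:split], Y[split:]

print(X_train.shape[0], X_test.shape[0])  # 80 20
```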

Koustav Mullick

Jun 22, 2016, 3:20:42 AM
to Keras-users, s.alia...@gmail.com
That is really strange. Maybe try some other optimizer, like Adadelta, and use the default parameters.

Also, I see you are using 'crossentropy' as your loss function. Is that the same as 'categorical_crossentropy'?

s.alia...@gmail.com

Jun 22, 2016, 3:22:52 AM
to Keras-users, s.alia...@gmail.com
In my code, I am using categorical_crossentropy. Thanks for the suggestion; I'll try different optimizers.

Koustav Mullick

Jun 22, 2016, 3:31:43 AM
to Keras-users, s.alia...@gmail.com
As a sanity check, send your training data in as the validation data as well and see whether the learning on the training data is reflected there or not.

If yes, then there is some issue with the data or your training process: the model cannot generalize whatever it is learning to new, unseen data.

If no, then you need to debug and find out the exact reason learning isn't happening at all.

s.alia...@gmail.com

Jun 22, 2016, 3:35:43 AM
to Keras-users, s.alia...@gmail.com
When I use the training data as both the training and validation sets, everything seems to be OK: both training accuracy and validation accuracy behave normally.
So, the problem is with the data?

Koustav Mullick

Jun 22, 2016, 3:37:20 AM
to Keras-users, s.alia...@gmail.com
The data, or maybe the features you are using, isn't ideal for the task at hand. Because as you can see, it trains well, but once presented with previously unseen data, it fails drastically.

s.alia...@gmail.com

Jun 22, 2016, 3:39:33 AM
to Keras-users, s.alia...@gmail.com
Thank you very much. I will try extracting other types of features to see their effect on training.
Thanks for the suggestions!

Koustav Mullick

Jun 22, 2016, 3:46:46 AM
to Keras-users, s.alia...@gmail.com
You say the length of the feature vector for each frame is 2048, but you set input_dim = 4096.
Is that a typo?

s.alia...@gmail.com

Jun 22, 2016, 3:50:54 AM
to Keras-users, s.alia...@gmail.com
Yes, it's just a typo!

s.alia...@gmail.com

Jun 22, 2016, 10:53:35 PM
to Keras-users, s.alia...@gmail.com
Solved!
The problem was with the normalization of the data: before extracting the CNN features, I had to normalize the RGB data (using the ImageNet mean RGB values). After that, everything works fine!
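For anyone hitting the same issue, here is a sketch of that normalization, assuming the per-channel ImageNet mean RGB values commonly used with VGG (roughly [123.68, 116.779, 103.939]; the exact constants and channel order depend on the pre-trained weights you use):

```python
import numpy as np

# Commonly cited ImageNet per-channel means (RGB order) for VGG-style models.
# Treat these as an assumption: match them to your pre-trained weights.
IMAGENET_MEAN_RGB = np.array([123.68, 116.779, 103.939], dtype=np.float32)

def normalize_frame(frame):
    """Subtract the ImageNet mean from an HxWx3 RGB frame before CNN feature extraction."""
    return frame.astype(np.float32) - IMAGENET_MEAN_RGB

frame = np.full((224, 224, 3), 128, dtype=np.uint8)  # dummy grey frame
out = normalize_frame(frame)
print(out.shape, round(float(out[0, 0, 0]), 2))  # (224, 224, 3) 4.32
```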
Thanks.

Koustav Mullick

Jun 23, 2016, 1:08:19 AM
to Keras-users, s.alia...@gmail.com
Hi,

Glad that it got resolved. :)

Nader Nazemi

Apr 18, 2017, 11:25:33 AM
to Keras-users, s.alia...@gmail.com
How did you normalize the RGB data?

rahulk...@gmail.com

May 15, 2018, 4:25:23 AM
to Keras-users
Using the ImageDataGenerator, you can use the 'rescale' parameter. Normally, it goes like this:

img_datagen = ImageDataGenerator(rescale=1./255)

train_gen = img_datagen.flow_from_directory(..., 'train')
val_gen = img_datagen.flow_from_directory(..., 'val')

history = model.fit_generator(train_gen, validation_data=val_gen, ...)
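rescale=1./255 simply multiplies every pixel value, mapping the [0, 255] range into [0, 1]; the numpy equivalent is:

```python
import numpy as np

img = np.array([[0, 128, 255]], dtype=np.uint8)
scaled = img * (1.0 / 255.0)  # same scaling ImageDataGenerator(rescale=1./255) applies
print(scaled.round(3).tolist())  # [[0.0, 0.502, 1.0]]
```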

fatmaels...@gmail.com

Nov 26, 2018, 7:07:44 AM
to Keras-users
Hello,

I'm also using the same CNN features, and I'm facing the same over-fitting problem. I need to know what you mean by normalizing the data before extracting the CNN features, and how you did it.

Thanks

beyr...@gmail.com

Dec 1, 2018, 9:17:12 AM
to Keras-users
Batch Normalization 