Extremely poor performance of text classification using LSTM

Arthi Venkataraman

Nov 14, 2015, 7:28:37 AM
to Keras-users
Hi,
I am trying to build a text classification system using Keras.
The input is unstructured text, and the output needs to be assigned to one of a set of pre-defined classes.

I am using an LSTM.

The classifier accuracy is extremely low (F-score < 10%).

An alternative non-deep-learning classifier, say an SVM using sklearn, gives much better performance (F-score > 60%).
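
A minimal sketch of such an sklearn baseline (the names texts, labels and test_texts are placeholders, not from the original post):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# TF-IDF features + linear SVM: a strong baseline for many-class text data
baseline = make_pipeline(TfidfVectorizer(), LinearSVC())
baseline.fit(texts, labels)            # texts: list of str, labels: class ids
predictions = baseline.predict(test_texts)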

Characteristics of the data:
1. Many classes: ~1000
2. Data is noisy
3. Imbalanced (many classes have < 10 training points)

Any ideas on what I could do to improve this?

Things I have tried so far include:
1. Batching: I am batching all the data for a class and incrementally training the model one class at a time.
2. Added a PReLU layer as well as a BatchNormalization layer.

Is (1) the correct thing to do?

What else can I try here?

Is there a benefit in using a deep learning model for this kind of task?

Thanking you 

Net architecture:

from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.layers.recurrent import LSTM

model = Sequential()
# Single LSTM layer over the word-vector inputs (old Keras 0.x API:
# the first argument is input_dim, the second is output_dim)
model.add(LSTM(wordvecSize, 512, init='glorot_uniform', inner_init='orthogonal',
               activation='tanh', inner_activation='hard_sigmoid',
               return_sequences=False))
model.add(Dropout(0.5))
# Project the 512-dim LSTM output to one score per class, then softmax
model.add(Dense(512, len_labels))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', class_mode='categorical')


Ozan Çağlayan

Nov 14, 2015, 8:27:09 AM
to keras...@googlegroups.com

Hi,

For the first question, regarding batches:

I think it's not good to train the system one class at a time. You have to shuffle your whole dataset before splitting it into batches, so that each batch contains a mix of classes.
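
A minimal sketch of that, assuming X and y are numpy arrays holding the full training set (note that model.fit also shuffles between epochs by default, via shuffle=True, when given the whole dataset in one call):

import numpy as np

# Mix all classes together before batching, instead of feeding one class at a time
perm = np.random.permutation(len(X))
X, y = X[perm], y[perm]
model.fit(X, y, batch_size=128, nb_epoch=10)   # nb_epoch: Keras 0.x argument name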

Arthi Venkataraman

Nov 15, 2015, 9:39:10 PM
to Keras-users
Hi Ozan,
Thanks for the input.

I was under the impression that Keras resets the weights on every call to fit, and hence felt that if we train one class at a time, the weights would be appropriately relearnt for that class. Is this assumption wrong?

I also read somewhere that Keras currently does not keep memory across batches.

Can someone shed some light on how this limitation can impact training an LSTM classifier for a large set of classes?



Thanks and Rgds,
Arthi

p.nec...@gmail.com

Nov 16, 2015, 7:56:05 AM
to Keras-users
Keras does not reset the weights on every fit call. What happens at every call is that each batch gets transferred to the GPU (if you have one).
Also, the optimizers you use (if I'm not mistaken) retain their parameters, so Adam, RMSprop, etc. will keep the gradient-update scaling parameters they used before.

I'd just shuffle the sequences (and hope you have reasonably balanced classes) and train them as usual with one of these fancy optimizers.
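
In other words (a small sketch, with X and y standing in for the full training arrays), successive fit calls continue from the current weights and optimizer state rather than starting over:

model.fit(X, y, batch_size=128, nb_epoch=5)
# A later call continues training: the weights and Adam's per-parameter
# statistics are kept from the previous call
model.fit(X, y, batch_size=128, nb_epoch=5)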

Arthi Venkataraman

Nov 16, 2015, 9:45:00 PM
to Keras-users
Thanks for the input.
Based on these inputs I will try training across the entire data set and check if the results improve.

Francois has pointed out that "The memory state is cleared after each batch (for now). So you would have to feed the complete sequences to the LSTM layer."

Will this impact training of the classifier in any way?

What is the impact of the memory state being cleared?

p.nec...@gmail.com

Nov 17, 2015, 4:31:09 AM
to Keras-users

That just means that you can't train the network in 'online mode', i.e. retain the hidden states across batches and keep on computing and training: for example, training on time steps [0..100], then on [101..200], and hoping the network finds a correlation between steps 180 and 5.
If your sequence dependencies are that long, you have to chop your sequences into 200-step ones, so that the gradients propagate over the whole range within one batch.
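
For a text classifier this usually just means fixing every input to one length up front; a minimal sketch using Keras' preprocessing helper, where sequences is a placeholder for your list of word-index lists:

from keras.preprocessing.sequence import pad_sequences

# Pad short sequences and truncate long ones to 200 steps, so each sample
# reaches the LSTM as one complete sequence within a single batch
X = pad_sequences(sequences, maxlen=200)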

Arthi Venkataraman

Nov 18, 2015, 2:20:41 AM
to Keras-users
Thanks for the clarification.