I am trying to do classification of sequences using an LSTM layer in my network. The sequences in my data are of different lengths.
I have defined my network as follows:
val lstm = LSTM(numFeatures, lstmLayerSize)
val recurrent = Recurrent(maskZero = true) // skip all-zero timesteps
recurrent.add(lstm)

val model = Sequential()
model.add(recurrent)
model.add(Select(2, -1)) // keep only the output at the last timestep
model.add(Linear(lstmLayerSize, numOutcomes))
Because all the input Tensors in a batch need to have the same shape, any input sequence shorter than the maximum length is padded at the beginning with zeros.
For example, with three input features and a maximum sequence length of five (one row per feature, one column per timestep), the sequence:
0 0 1
1 0 1
1 1 0
is being input as:
0 0 0 0 1
0 0 1 0 1
0 0 1 1 0
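To make the padding scheme concrete, here is a minimal sketch in plain Scala (no BigDL dependency; `prePad` is a hypothetical helper name, and it reads each column of the example above as one timestep's feature vector):

```scala
// Pre-pad a sequence of timestep feature vectors with all-zero vectors
// so every sequence reaches the same fixed length (padding at the front).
def prePad(seq: Seq[Seq[Float]], maxLen: Int, numFeatures: Int): Seq[Seq[Float]] = {
  val padding = Seq.fill(maxLen - seq.length)(Seq.fill(numFeatures)(0f))
  padding ++ seq
}

// The three-timestep example above, one vector per timestep (column):
val seq = Seq(Seq(0f, 1f, 1f), Seq(0f, 0f, 1f), Seq(1f, 1f, 0f))
val padded = prePad(seq, maxLen = 5, numFeatures = 3)
// padded begins with two all-zero timesteps, followed by the original three
```

The two leading all-zero timesteps are exactly what maskZero = true is expected to skip.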
As I understand it, setting maskZero = true in the Recurrent layer means that the all-zero timesteps introduced by the padding are ignored during training.
Furthermore, the Select(2, -1) layer picks the last entry along the time dimension, so only the LSTM output at the final (real) timestep is passed on to the Linear classifier and contributes to the loss.
Is this the correct approach?