LSTM where each feature has a different sequence length


Lee

Apr 14, 2017, 5:26:04 PM
to lasagne-users
Hello,

I've got an LSTM where a few of the features have different sequence lengths (but these are the same from example to example).

For example:

feature 1 length is always 10 samples
feature 2 length is always 20 samples


Is there a way to incorporate this into the LSTM as an input, other than by padding with zeros or something similar?

feature 1 [0:10]=real values, feature1 [10:20]=0
feature 2 [0:20]=real values

I don't want to just pad with zeros, because the whole point of having different sampling rates is to reduce the problem complexity in this case.
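For concreteness, the zero-padding scheme described above could look like the following numpy sketch (batch size and the (batch, seq length, features) input convention are made-up/assumed for illustration):

```python
import numpy as np

# Hypothetical sizes: batch of 4 examples, feature 1 has 10 samples,
# feature 2 has 20 samples per example.
batch, len1, len2 = 4, 10, 20
feat1 = np.random.randn(batch, len1)
feat2 = np.random.randn(batch, len2)

# Zero-pad feature 1 out to the longer length:
# [0:10] = real values, [10:20] = 0, as described above.
feat1_padded = np.zeros((batch, len2))
feat1_padded[:, :len1] = feat1

# Stack into a single (batch, seq length, num features) LSTM input.
x = np.stack([feat1_padded, feat2], axis=-1)
print(x.shape)  # (4, 20, 2)
```

This is exactly the scheme the question wants to avoid, since the zeros carry no information and the two features end up covering different time spans per step.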

Best,
Lee

Jan Schlüter

Apr 20, 2017, 8:32:50 AM
to lasagne-users
For example:

feature 1 length is always 10 samples
feature 2 length is always 20 samples

Is there a way to incorporate this into the LSTM as an input, other than by padding with zeros or something similar?

Padding with zeros will give you features that are out of sync. I guess I'd try processing feature2 separately at first, then downsample it to match the frame rate of feature1, concatenate the two (in the feature dimension) and continue processing them together. The preprocessing of feature1 could be an LSTM or a Conv2D layer, followed by mean or max pooling.
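The suggested approach can be sketched in plain numpy (made-up shapes; the per-feature processing producing 8 channels per step is just an assumption for illustration):

```python
import numpy as np

# Hypothetical sizes: batch of 4, 10 steps of feature 1, 20 steps of
# feature 2, each already processed into 8 channels per step.
batch, len1, len2, ch = 4, 10, 20, 8
f1 = np.random.randn(batch, len1, ch)  # processed feature 1
f2 = np.random.randn(batch, len2, ch)  # processed feature 2

# Downsample feature 2 with mean pooling over non-overlapping pairs of
# steps, so its frame rate matches feature 1 (20 -> 10 steps).
f2_down = f2.reshape(batch, len1, 2, ch).mean(axis=2)

# Concatenate in the feature dimension and continue processing jointly.
joint = np.concatenate([f1, f2_down], axis=-1)
print(joint.shape)  # (4, 10, 16)
```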

Lee

Apr 29, 2017, 12:36:06 PM
to lasagne-users
Thanks!


try processing feature2 separately at first, then downsample it to match the frame rate of feature1


I'd like to make sure I understand how to do this step properly. Is the layer usage/layout below correct?

in1 = feature seq 1       in2 = feature seq 2
        |                         |
        |                 lstm1 = LSTMLayer(in2, num_units)
        |                         |
        |                 dsl   = DimshuffleLayer(lstm1, (0, 2, 1))  # move seq length to trailing axis (next layer pools over it)
        |                         |
        |                 pl    = Pool1DLayer(dsl, 2)  # pool size 2: downsample by two to match feature 1 seq length
        |                         |
        |                 dsl2  = DimshuffleLayer(pl, (0, 2, 1))  # back to LSTM output shape (instances, seq length, features)
        |                         |
        +-----------+-------------+
                    |
            cl = ConcatLayer([in1, dsl2], axis=2)  # axis 2 is the feature dimension
                    |
        <remaining LSTM layers in network>
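As a shape sanity check, plain numpy standing in for the Lasagne layers (all sizes made up: 4 instances, 32 LSTM units, one channel for feature 1) behaves like:

```python
import numpy as np

# Stand-in for lstm1's output on the feature-2 branch:
# (instances, seq length, num_units).
lstm1_out = np.random.randn(4, 20, 32)

# DimshuffleLayer(lstm1, (0, 2, 1)): seq length to the trailing axis.
dsl = lstm1_out.transpose(0, 2, 1)            # (4, 32, 20)

# Pool1DLayer(dsl, 2): max pooling over the trailing axis, 20 -> 10.
pl = dsl.reshape(4, 32, 10, 2).max(axis=3)    # (4, 32, 10)

# DimshuffleLayer(pl, (0, 2, 1)): back to (instances, seq length, features).
dsl2 = pl.transpose(0, 2, 1)                  # (4, 10, 32)

# ConcatLayer([in1, dsl2], axis=2): merge with the 10-step feature 1 input.
in1 = np.random.randn(4, 10, 1)
cl = np.concatenate([in1, dsl2], axis=2)      # (4, 10, 33)
print(cl.shape)
```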


Lee

Apr 29, 2017, 1:19:34 PM
to lasagne-users

One thing also here: I want to predict targets at the higher sampling rate (i.e. that of feature seq 1). In the scheme above, I would be predicting at the lower rate.

Jan Schlüter

May 2, 2017, 5:01:46 AM
to lasagne-users
Is the layer usage/layout below correct?

Yes, looks good!
 
One thing also here: I want to predict targets at the higher sampling rate (i.e. that of feature seq 1). In the scheme above, I would be predicting at the lower rate.

But in the first post you said "the whole point of having different sampling rates is to reduce the problem complexity" -- that's why I assumed you want to process at the lower rate in the end. If you want to predict at the highest rate instead, then rather than downsampling the second sequence, you should probably upsample the first. You can use an Upscale1DLayer for this, again with DimshuffleLayers around it. You can also process the first sequence with one or two LSTMLayers before upsampling and merging with the high-resolution features. This would be the counterpart to the architecture you drew above.
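The upsampling counterpart can be sketched the same way in numpy (made-up shapes: 4 instances, 10 low-rate steps processed into 16 units, and a 20-step high-rate input with one channel). Upscale1DLayer with the default mode repeats each step along the scaled axis, which is what the `repeat` here mimics:

```python
import numpy as np

# Stand-in for the processed low-rate branch: (instances, seq length, units).
low = np.random.randn(4, 10, 16)

# Dimshuffle to (instances, units, seq), upscale by 2 by repeating each
# step, then dimshuffle back: 10 -> 20 steps.
up = low.transpose(0, 2, 1).repeat(2, axis=2).transpose(0, 2, 1)  # (4, 20, 16)

# Merge with the 20-step high-resolution input and predict at that rate.
high = np.random.randn(4, 20, 1)
merged = np.concatenate([high, up], axis=2)  # (4, 20, 17)
print(merged.shape)
```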

Best, Jan

Lee

May 8, 2017, 6:27:21 PM
to lasagne-users

Ok, thanks again, Jan! This helps a lot.


But in the first post you said "the whole point of having different sampling rates is to reduce the problem complexity" -- that's why I assumed you want to process at the lower rate in the end

Yeah, I wasn't being very clear in my writing: I'm using sequences that are quite long in duration (e.g. 4000 one-second time steps), with a handful of variables/features (7-9). For some of these features the full sample rate is needed, because there are fast dynamics with a large immediate impact on the output as well as slower changes that matter over long timescales. For others, only the slower changes matter, so heavily downsampling them is fine.

The full input space is large enough that I start running into difficulties converging, along with some prohibitive restrictions on batch size and network size (running out of memory). As part of preprocessing the data, reducing the input space by downsampling each feature by the same amount (e.g. the first 2000 time steps sampled at a low rate, the next 1000 at a higher rate, then the last 1000 at the full rate) helped quite a lot with this.

But because I know I can get away with many fewer samples for a few of the features in particular, I'd like to be able to cut those out. So what I really meant is that I want to reduce the overall size of the input by cutting out information I know I can throw away for several of the features, both to make the problem simpler for the network to learn and to help fit larger batches into training.

With the Upscale1DLayer, at some point I have the full sequence length again for all features, so it seems I might not avoid the memory issues that way unless processing the two sets of data separately lets me use smaller or fewer layers. I'll give it a shot regardless.

Cheers, and thanks again,
Lee