How to combine multiple convolution layers with an LSTM layer?


Mark

Jul 19, 2018, 8:50:36 AM7/19/18
to Keras-users
Hello all.  I am trying to combine several convolution layers with an LSTM layer.  I am aware of the ConvLSTM2D layer that is built into Keras.  That is a great start, but let me make up some numbers to illustrate what I'm after: I would like 6 convolution layers, then 1 LSTM layer that takes 10 time steps of data.

For example, the final structure would take individual data points of shape (10, 64, 64, 3), apply the convolution layers to each of the 10 64x64 images, then feed the resulting data (again of shape (10, x, y, z)) into the LSTM layer.

I have a feeling the answer involves the functional API, if it can be done.  Does anyone have any ideas?

Sergey O.

Jul 19, 2018, 9:49:04 AM7/19/18
to Mark, Keras-users
Is your input (None, 10, 64, 64, 3) or (10, 64, 64, 3) (Where "None" is the batch dimension)?

What are the "time" and "channel" dimensions of the input to the LSTM? By default, a Keras LSTM expects input of shape (batch, time, features), so the first dimension after the batch is treated as "time".


sn1...@gmail.com

Jul 19, 2018, 5:05:15 PM7/19/18
to Keras-users
The input will be (None, 10, 64, 64, 3).  Channels last format.  That is:  (Batch_number, time=10, image_width=64, image_height=64, channels=3).

It's the exact same input that a ConvLSTM2D layer takes, although in principle I don't care too much about the exact format of the input.

Sergey O.

Jul 19, 2018, 5:10:44 PM7/19/18
to Mark, Keras-users
This is not the most elegant solution... but perhaps you can do something along the lines of:

import tensorflow as tf
from keras.layers import *
from keras.models import Model

i_3D = Input(shape=(10,64,64,3))

# Slice out each of the 10 time steps. Binding i as a default argument
# gives each Lambda its own index; a bare closure would capture the
# final value of i (9) for every slice.
img = []
for i in range(10):
  img.append(Lambda(lambda x, i=i: x[:,i])(i_3D))

# Six shared Conv2D layers, applied in sequence to every frame.
con = []
for c in range(6):
  con.append(Conv2D(1,(3,3),padding="same",name="con_"+str(c)))

A = []
for i in range(10):
  a = img[i]
  for c in range(6):
    a = con[c](a)
  A.append(a)

# Concatenate the 10 single-channel outputs along the channel axis,
# then move that axis forward to recover (None, 10, 64, 64).
A = concatenate(A)
A = Lambda(lambda x: tf.transpose(x,[0,3,1,2]))(A)
model = Model(i_3D,A)



Sergey O.

Jul 19, 2018, 5:32:58 PM7/19/18
to Mark, Keras-users
Found a bug in the previous email. Here is the corrected version. 

Here I split out each time step (None, 64, 64, 3), apply multiple Conv2D layers (2 filters each) to it, then stack to create (None, 10, 64, 64, 2). Then I reshape/flatten the last 3 dimensions to (None, 10, 8192) and push that into an LSTM to get (None, 1)!

from keras.layers import *
from keras.models import Model
import keras.backend as K

i_3D = Input(shape=(10,64,64,3))

# Slice out each of the 10 time steps; bind i as a default argument so
# each Lambda keeps its own index (a bare closure would capture i=9).
img = []
for i in range(10):
  img.append(Lambda(lambda x, i=i: x[:,i])(i_3D))

# Six shared Conv2D layers (2 filters each), applied in sequence.
con = []
for c in range(6):
  con.append(Conv2D(2,(3,3),padding="same",name="con_"+str(c)))

A = []
for i in range(10):
  a = img[i]
  for c in range(6):
    a = con[c](a)
  A.append(a)

# Stack the frames back into (None, 10, 64, 64, 2), flatten each frame
# to (None, 10, 8192), and feed the sequence to the LSTM.
A = Lambda(lambda x: K.stack(x,1))(A)
A = Reshape((10,-1))(A)
A = LSTM(1)(A)

model = Model(i_3D,A)
print(model.summary(120))
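For what it's worth, a more compact way to express the same pattern (if your Keras version supports it) is to wrap each Conv2D in TimeDistributed, which applies the same layer to every time step for you, so the manual slice-and-stack Lambdas go away. This is a sketch of the same 10-step, 6-conv, LSTM pipeline, not a drop-in copy of the code above:

```python
from keras.layers import Input, Conv2D, TimeDistributed, Reshape, LSTM
from keras.models import Model

# (batch, time=10, height=64, width=64, channels=3)
i_3D = Input(shape=(10, 64, 64, 3))

a = i_3D
for c in range(6):
    # TimeDistributed applies the same Conv2D to each of the 10 frames,
    # sharing the filter weights across time steps.
    a = TimeDistributed(Conv2D(2, (3, 3), padding="same"),
                        name="td_con_" + str(c))(a)

# (None, 10, 64, 64, 2) -> (None, 10, 8192) -> (None, 1)
a = Reshape((10, -1))(a)
a = LSTM(1)(a)

model = Model(i_3D, a)
```

Same output shape of (None, 1), but with fewer moving parts and no Lambda layers to worry about when saving/loading the model.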