Hi,
I'm trying to use TimeDistributed and LSTM in a way that apparently isn't supported; have any of you run into the same problem? And if so, is there a way around this?
Here's the problem:
>>> from keras.layers import Highway, Input, TimeDistributed
>>> input1 = Input(shape=(3, 5))
>>> input2 = Input(shape=(1, 5))
>>> highway_layer = Highway(activation='relu', name='highway')
>>> distributed_highway_layer = TimeDistributed(highway_layer, name='distributed_highway')
>>> highway_input1 = distributed_highway_layer(input1)
>>> highway_input2 = distributed_highway_layer(input2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/mattg/anaconda3/lib/python3.5/site-packages/keras/engine/topology.py", line 494, in __call__
self.assert_input_compatibility(x)
File "/home/mattg/anaconda3/lib/python3.5/site-packages/keras/engine/topology.py", line 434, in assert_input_compatibility
str(x_shape))
Exception: Input 0 is incompatible with layer distributed_highway: expected shape=(None, 3, 5), found shape=(None, 1, 5)
The TimeDistributed layer gets built when it's applied to the first input, and at that point it records that input's shape as its expected input spec. When the same layer is then applied to a second input with a different number of timesteps, the compatibility check fails and it crashes.
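You can see the recorded spec by poking at the wrapper after the first call (if I'm reading keras/engine/topology.py right, the built layer keeps a list of InputSpec objects):
>>> distributed_highway_layer.input_spec[0].shape
(None, 3, 5)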
Where do you run into this problem? I'm trying to re-implement a reading comprehension model that uses highway layers on top of word embeddings for both a question and a passage of text. So my question tensor has shape (batch_size, num_question_words, embedding_dim), and my passage tensor has shape (batch_size, num_passage_words, embedding_dim). I want a highway layer applied to the embeddings, and I want it to be the _same_ highway layer for both the question and the passage. The code above seems like the natural way to implement this (assuming "input1" and "input2" are actually previous-layer outputs holding my word embeddings for the question and the passage). However, it doesn't work.
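For concreteness, here's a sketch of the wiring I'm after (vocab_size, embedding_dim, num_question_words, and num_passage_words are placeholders):
>>> from keras.layers import Embedding, Highway, Input, TimeDistributed
>>> question_input = Input(shape=(num_question_words,), dtype='int32')
>>> passage_input = Input(shape=(num_passage_words,), dtype='int32')
>>> embedding = Embedding(vocab_size, embedding_dim)
>>> question_embedding = embedding(question_input)  # (batch_size, num_question_words, embedding_dim)
>>> passage_embedding = embedding(passage_input)  # (batch_size, num_passage_words, embedding_dim)
>>> distributed_highway = TimeDistributed(Highway(activation='relu'))
>>> question_highway = distributed_highway(question_embedding)  # fine; the wrapper gets built here
>>> passage_highway = distributed_highway(passage_embedding)  # crashes, as above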
I can think of one workaround: instantiate two separate TimeDistributed objects, both wrapping the same underlying Highway layer. Because TimeDistributed itself doesn't have any parameters, this actually works, but it's a bit ugly.
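Concretely, something like this (a sketch; the sharing comes entirely from reusing highway_layer, since the TimeDistributed wrappers hold no weights of their own):
>>> highway_layer = Highway(activation='relu', name='highway')
>>> distributed_highway1 = TimeDistributed(highway_layer, name='distributed_highway1')
>>> distributed_highway2 = TimeDistributed(highway_layer, name='distributed_highway2')
>>> highway_input1 = distributed_highway1(input1)
>>> highway_input2 = distributed_highway2(input2)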
And it wouldn't be so bad if this only applied to TimeDistributed, but you get the same problem with any recurrent layer:
>>> from keras.layers import LSTM, Input
>>> input1 = Input(shape=(3, 5))
>>> input2 = Input(shape=(1, 5))
>>> lstm = LSTM(10)
>>> lstm(input1)
>>> lstm(input2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/mattg/anaconda3/lib/python3.5/site-packages/keras/engine/topology.py", line 494, in __call__
self.assert_input_compatibility(x)
File "/home/mattg/anaconda3/lib/python3.5/site-packages/keras/engine/topology.py", line 434, in assert_input_compatibility
str(x_shape))
Exception: Input 0 is incompatible with layer lstm_1: expected shape=(None, 3, 5), found shape=(None, 1, 5)
And for the recurrent-layer case I don't know of any workaround at all. Any ideas?
Matt