Is it possible to use stateful LSTM layer with multiple embedded inputs that are then merged?

Eddie Yolo

Jul 30, 2016, 4:28:12 AM
to Keras-users

I'm trying to create a stateful LSTM layer after I have embedded multiple distinct feature inputs, and then merged them together along with a one-hot encoded vector input.

Here is the general structure of what I want to do:

A = Input(shape=(1,), dtype='int32')
embeddedA = Embedding(input_dim=5345, output_dim=16, input_length=1)(A)
embeddedA = Flatten()(embeddedA)

B = Input(shape=(1,), dtype='int32')
embeddedB = Embedding(input_dim=9453, output_dim=16, input_length=1)(B)
embeddedB = Flatten()(embeddedB)

C = Input(shape=(392,), dtype='int32')
embeddedC = Embedding(input_dim=19240, output_dim=16, input_length=392)(C)
embeddedC = Flatten()(embeddedC)

D = Input(shape=(64,))

mergedX = merge([
    embeddedA,
    embeddedB,
    embeddedC,
    D],
    mode='concat')

mergedX = Reshape((1, 6368))(mergedX)

a = LSTM(256, stateful=True, return_sequences=True, batch_input_shape=(1,1,6368))(mergedX)
b = LSTM(256, stateful=True)(a)
c = Dense(128, activation='relu')(b)
y = Dense(16, activation="softmax")(c)

model = Model(input=[A, B, C, D], output=[y])


However, when I do this, I get an exception:
Exception: If a RNN is stateful, a complete input_shape must be provided (including batch size).

When reading the documentation, it becomes clear why this is not working:
"To enable statefulness: - specify stateful=True in the layer constructor. - specify a fixed batch size for your model, by passing a batch_input_shape=(...) to the first layer in your model. This is the expected shape of your inputs including the batch size."

I have tried adding batch_input_shape to each input, but this does not work either. The documentation is also unclear for this model, and for any model with multiple inputs built using the functional API.

Note: This model works without statefulness.
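
For readers wondering why a stateful layer insists on a known batch size, here is a minimal NumPy sketch. ToyStatefulRNN is a hypothetical stand-in written for illustration, not Keras code. The carried state has one row per sample in the batch, so it can only be allocated once the batch size is fixed:

```python
import numpy as np


class ToyStatefulRNN:
    """Hypothetical minimal recurrent cell (illustration only, not Keras)."""

    def __init__(self, batch_size, input_dim, units, seed=0):
        rng = np.random.RandomState(seed)
        self.w_in = rng.normal(scale=0.1, size=(input_dim, units))
        self.w_rec = rng.normal(scale=0.1, size=(units, units))
        # One state row per sample: the shape depends on the batch size.
        self.state = np.zeros((batch_size, units))

    def step(self, x):
        # x has shape (batch_size, input_dim); any other batch size
        # cannot line up with the stored state.
        if x.shape[0] != self.state.shape[0]:
            raise ValueError("batch size must match the stored state")
        self.state = np.tanh(x @ self.w_in + self.state @ self.w_rec)
        return self.state

    def reset_states(self):
        self.state = np.zeros_like(self.state)


rnn = ToyStatefulRNN(batch_size=1, input_dim=4, units=3)
x = np.ones((1, 4))
first = rnn.step(x).copy()
second = rnn.step(x).copy()        # differs from first: state carried over
rnn.reset_states()
after_reset = rnn.step(x).copy()   # matches first again
```

The same input gives a different output on the second call because the state persists between calls, and resetting the state restores the first result.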

Thanks for any help in advance!

Eddie Yolo

Aug 1, 2016, 2:41:59 AM
to Keras-users
OK, I suppose that may not be possible. The reason I asked that question is that I hit another wall here:

Is it possible to add masking before/during embedding, with a reshape after the embed? (It appears not; what I'm really looking for is a clever workaround.)
OR
Is it possible to embed a 2D input and retain the first axis, while the output dim changes the second?

Basically, I have four main types of features (As, Bs, Cs, and Ds) for each term of a sequence of at most 12 terms. Sequences in the sample space vary in length between 1 and 12, hence the desire for masking. For each term in the sequence, the A, B, and C features need to be embedded. D is a set of features that are easily one-hot encoded. The Cs are the problem: there are 392 of them per term in each sequence. It appears that I can neither reshape after masking on the embedding layer, nor embed a 2D input (to avoid having to reshape) and retain proper output dimensions. Of course, I could simply let the masked values pass through, but that's not ideal. I feel like any of the above use cases should be valid.

Here's some code to make it clearer what I'm talking about:

A = Input(shape=(12,), dtype='int32')
embeddedA = Embedding(input_dim=5345, output_dim=16, input_length=12, mask_zero=True)(A)

B = Input(shape=(12,), dtype='int32')
embeddedB = Embedding(input_dim=9453, output_dim=16, input_length=12, mask_zero=True)(B)

C = Input(shape=(12*392,), dtype='int32')
embeddedC = Embedding(input_dim=19240, output_dim=16, input_length=12*392, mask_zero=True)(C)
embeddedC = Reshape((12, 16*392))(embeddedC)

# Ideally C would actually look like this, so as to avoid a reshape, but embedding can't seem to handle more than 1d input properly:
# C = Input(shape=(12,392), dtype='int32')
# embeddedC = Embedding(input_dim=19240, output_dim=16, input_length=(12,392), mask_zero=True)(C)

D = Input(shape=(12, 64))
# Keep D as the Input tensor (Model needs it below); merge the masked tensor instead.
maskedD = Masking()(D)

mergedX = merge([
    embeddedA,
    embeddedB,
    embeddedC,
    maskedD],
    mode='concat')

# Can't mask here because embedding layers will obscure mask value for A, B, and Cs...

a = LSTM(256, return_sequences=True)(mergedX)
b = LSTM(256)(a)

c = Dense(128, activation='relu')(b)
y = Dense(16, activation="softmax")(c)

model = Model(input=[A, B, C, D], output=[y])

Here's a more manageable toy model with virtually the same problem. Here the sequences have length 2 instead of 12, and each term has 3 features instead of 392 (to prove a point afterward):

rx = np.random.random_integers(1, 19239, (500, 2, 3))  # upper bound is inclusive; valid indices are 0..19239
ry = np.random.random((500, 4))

x = Input(shape=(2, 3), dtype='int32')
e = Embedding(19240, 16, input_length=(2, 3), mask_zero=True)(x) # Ideally this would output shape (2, 3*16)
a = LSTM(32)(e)
y = Dense(4)(a)

model = Model(input=x, output=y)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
model.fit(rx, ry)

Now, that doesn't work. But you CAN accomplish nearly the same thing (and it does compile) by splitting x into separate inputs, one per term and feature. I'm nearly 100% certain masking doesn't work properly with the following, though. How could it?

rx11 = np.random.random_integers(1, 19239, (500, 1))  # upper bound is inclusive; valid indices are 0..19239
rx12 = np.random.random_integers(1, 19239, (500, 1))
rx13 = np.random.random_integers(1, 19239, (500, 1))

rx21 = np.zeros((500, 1), dtype='int32')
rx22 = np.zeros((500, 1), dtype='int32')
rx23 = np.zeros((500, 1), dtype='int32')

ry = np.random.random((500, 4))

x11 = Input(shape=(1,), dtype='int32')
x12 = Input(shape=(1,), dtype='int32')
x13 = Input(shape=(1,), dtype='int32')

# Either all or none are zeros
x21 = Input(shape=(1,), dtype='int32')
x22 = Input(shape=(1,), dtype='int32')
x23 = Input(shape=(1,), dtype='int32')

e = Embedding(19240, 16, input_length=1, mask_zero=True)

e11 = e(x11) # (1, 16)
e12 = e(x12) # (1, 16)
e13 = e(x13) # (1, 16)

e21 = e(x21) # (1, 16)
e22 = e(x22) # (1, 16)
e23 = e(x23) # (1, 16)

e1 = merge([e11, e12, e13], mode='concat', concat_axis=-1) # (1, 48)
e2 = merge([e21, e22, e23], mode='concat', concat_axis=-1) # (1, 48)

e = merge([e1, e2], mode='concat', concat_axis=1) # (2, 48)

a = LSTM(32)(e)
y = Dense(4)(a)

model = Model(input=[x11, x12, x13, x21, x22, x23], output=y)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
model.fit([rx11, rx12, rx13, rx21, rx22, rx23], ry)

But there has to be a better way that I'm missing, so that I don't need 12*392 separate inputs and embedding calls for feature C. I also doubt that masking still works; it's hard to tell what happens. Maybe someone could at least share some insight into that? How does the merge even work in this case? Could Reshape not be made to work with Masking as well?
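
To make the shapes in the split-input model easier to follow, here is a NumPy-only sketch. It assumes the Embedding layer acts as a plain table lookup and that any mask is tracked separately from the values. The names table and lookup are hypothetical stand-ins, not Keras objects. It shows that an all-zero (padded) term still turns into real numbers after the lookup, which is why the values alone cannot tell the LSTM to skip that timestep:

```python
import numpy as np

rng = np.random.RandomState(0)
table = rng.normal(size=(19240, 16))  # stand-in for the Embedding weights


def lookup(ids):
    # Plain gather: (batch, 1) integer ids -> (batch, 1, 16) vectors.
    return table[ids]


batch = 5
term1 = [rng.randint(1, 19240, size=(batch, 1)) for _ in range(3)]  # real term
term2 = [np.zeros((batch, 1), dtype="int32") for _ in range(3)]     # padding

e1 = np.concatenate([lookup(i) for i in term1], axis=-1)  # (batch, 1, 48)
e2 = np.concatenate([lookup(i) for i in term2], axis=-1)  # (batch, 1, 48)
merged = np.concatenate([e1, e2], axis=1)                 # (batch, 2, 48)

# The padded timestep is row 0 of the table repeated three times, not zeros.
padded_step = merged[:, 1, :]
```

Whether the merge layer carries the embedding masks through is a separate question that this sketch does not answer; it only shows what the values look like if the mask is lost.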

So to sum up:

1. Is it possible to add masking to an embedding layer (with a flattened 2d input) such that it can be reshaped properly before reaching the LSTM layer?
Or, more simply:
2. Is there a way to give 2d input to an embedding layer such that the input is (x, y) and the output is (x, y*embeddingOutputSize)? 
(instead of (x, y, embeddingOutputSize) which is what happens right now)
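
As a rough picture of the shapes asked for in question 2, here is a NumPy-only sketch using the toy sizes from above (2 terms, 3 feature ids per term, embedding size 16). It is not a Keras layer, and table is a hypothetical stand-in for the embedding weights. It also derives one mask value per timestep from the raw ids, under the assumption that a term is padding only when all of its ids are 0:

```python
import numpy as np

rng = np.random.RandomState(0)
emb_size = 16
table = rng.normal(size=(19240, emb_size))  # stand-in for Embedding weights

# 4 samples, 2 terms per sequence, 3 feature ids per term; 0 means padding.
ids = rng.randint(1, 19240, size=(4, 2, 3))
ids[2:, 1, :] = 0  # the last two samples have only one real term

vectors = table[ids]                                         # (4, 2, 3, 16)
per_term = vectors.reshape(ids.shape[0], ids.shape[1], -1)   # (4, 2, 48)

# One mask entry per timestep: a term is real if any of its ids is non-zero.
timestep_mask = (ids != 0).any(axis=-1)                      # (4, 2)
```

The reshape keeps the sequence axis and folds the per-term features into the last axis, giving (x, y*embeddingOutputSize) per sample.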

Eddie Yolo

Aug 1, 2016, 6:09:08 PM
to Keras-users
Ok, after looking through the source code I found out why this wasn't working. My suspicion that the documentation is wrong turned out to be correct (for models built with the functional API that is - see here). I'll file an issue for it. For anyone looking for a fast answer, here it is (this is for the original question, not the questions below!):

Use batch_shape on the Input "layers" to specify the size of each batch (instead of batch_input_shape), and make sure that you end up with a 3D shape coming out of your merge layer (you can check this using layer._keras_shape). You could do this any number of ways; here's one. This is copy-and-paste ready, so I hope it helps someone!

# Imports assumed for Keras 1.x (the original post omitted them)
import math
import sys

import numpy as np
from keras.layers import Input, Embedding, Reshape, LSTM, Dense, merge
from keras.models import Model

numSamples = 1337
aX = np.random.random_integers(0, 5344, (numSamples, 1))
bX = np.random.random_integers(0, 9452, (numSamples, 1))
cX = np.random.random_integers(0, 19239, (numSamples, 392))
dX = np.random.random((numSamples, 1, 64))
y = np.random.random((numSamples, 16))

A = Input(shape=(1,), batch_shape=(1, 1), dtype='int32')
embeddedA = Embedding(input_dim=5345, output_dim=16, input_length=1)(A)

B = Input(shape=(1,), batch_shape=(1, 1), dtype='int32')
embeddedB = Embedding(input_dim=9453, output_dim=16, input_length=1)(B)

C = Input(shape=(392,), batch_shape=(1, 392), dtype='int32')
embeddedC = Embedding(input_dim=19240, output_dim=16, input_length=392)(C) # (392, 16)
embeddedC = Reshape((1, 392*16))(embeddedC) # Flatten (392, 16) per batch, so (1, 392*16)

D = Input(shape=(1, 64), batch_shape=(1, 1, 64))

mergedX = merge([
    embeddedA,
    embeddedB,
    embeddedC,
    D],
    mode='concat') # (1, 394*16 + 64)

a = LSTM(256, stateful=True, return_sequences=True)(mergedX)
b = LSTM(256, stateful=True)(a)
c = Dense(128, activation='relu')(b)
Y = Dense(16, activation="softmax")(c)

model = Model(input=[A, B, C, D], output=[Y])

model.compile(loss="categorical_crossentropy",
    optimizer='rmsprop',
    metrics=["accuracy"])

for i in range(numSamples):
    # Add your state reset logic here, for example during use of extremely long sequences with high variability (that's the only good reason you'd even be doing it this way...)
    decidedToResetStateForCurrentBatchDependingOnSomeCondition = False

    if decidedToResetStateForCurrentBatchDependingOnSomeCondition:
        model.reset_states()

    # One sample per call; indexing with [[i]] keeps the batch axis.
    loss = model.train_on_batch([aX[[i]], bX[[i]], cX[[i]], dX[[i]]], [y[[i]]])

    # Simple 30-character progress bar showing loss and accuracy.
    progress = int(math.floor(30.0 * (i + 1) / numSamples))
    progressBar = '\r' + str(i + 1) + '/' + str(numSamples) + ' [' + ('=' * progress) + ('>' if 0 < progress < 30 else '') + ('.' * (30 - progress)) + '] - loss: %f - acc: %f'%tuple(loss)
    sys.stdout.write(progressBar)
    sys.stdout.flush()

One thing to note is that this doesn't take advantage of any batching, so it is pretty slow. To make it as fast as possible for your use case, set the batch size to the greatest common divisor of your sequence lengths. For my case, and in the most general case, this is 1.
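
Taking that rule of thumb at face value, and reading it as the greatest common divisor of the sequence lengths, the value can be computed as in the sketch below. common_batch_size is a hypothetical helper name, and the sketch only reproduces the arithmetic; it does not check whether the resulting size suits a given model.

```python
from functools import reduce
from math import gcd


def common_batch_size(sequence_lengths):
    # Hypothetical helper: the largest size that divides every length.
    return reduce(gcd, sequence_lengths)


# Sequence lengths from the thread: anything between 1 and 12.
batch_size = common_batch_size(range(1, 13))

# A friendlier case where every length is a multiple of 4.
batch_size_multiples = common_batch_size([4, 8, 12])
```

With lengths ranging over 1 to 12 the result is 1, which matches the post; lengths that share a factor allow a larger value.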

Emil Hovad

Apr 21, 2021, 5:44:21 AM
to Keras-users
# change merge "layer" to this and the example works with keras 2.2.4
concatenate([embeddedA, embeddedB, embeddedC, D], axis=-1)
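
As a quick sanity check on the shapes involved, here is a NumPy analogue of that concatenation. np.concatenate stands in for the Keras layer purely for illustration, and the zero arrays are placeholders with the per-batch shapes from the example above:

```python
import numpy as np

# Per-batch shapes of the four tensors fed to the merge in the example.
embedded_a = np.zeros((1, 1, 16))
embedded_b = np.zeros((1, 1, 16))
embedded_c = np.zeros((1, 1, 392 * 16))
d = np.zeros((1, 1, 64))

# Joining along the last axis keeps (batch, timesteps) and sums the widths.
merged = np.concatenate([embedded_a, embedded_b, embedded_c, d], axis=-1)
merged_shape = merged.shape
```

The last axis comes out as 394*16 + 64 = 6368, the same width used in the Reshape of the original question.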