I'm trying to build a model that labels each element in a sequence, so the output has the same length as the input. I'm using Keras 0.3.2. To handle padded values, I expected that setting "mask_zero" in the first Embedding layer would give the same results as passing "sample_weight" in temporal mode to fit(). However, the results are quite different. More specifically, running this code
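import numpy as np
from keras.models import Sequential
from keras.layers.core import TimeDistributedDense
from keras.layers.embeddings import Embedding
from keras.layers.recurrent import LSTM
from keras.preprocessing.sequence import pad_sequences
from keras.utils.np_utils import to_categorical

# 1000 sequences of length 10, left-padded with zeros to length 20
X = pad_sequences(np.random.randint(1, 100, (1000, 10)), maxlen=20)
y = pad_sequences(np.random.randint(1, 10, (1000, 10)), maxlen=20)
# one-hot labels, shape (1000, 20, 10)
Y = np.array([to_categorical(yt, 10) for yt in y])
# per-timestep weights: 1 for real labels, 0 for padding
W = (y>0).astype('float')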
print('Model1: masking inputs using mask_zero=True:')
m = Sequential()
m.add(Embedding(input_dim=100, output_dim=4, input_length=20, mask_zero=True))
m.add(LSTM(output_dim=4, return_sequences=True))
m.add(TimeDistributedDense(10, activation='softmax'))
m.compile(loss='categorical_crossentropy', class_mode='categorical', optimizer='adam')
m.fit(X, Y, nb_epoch=10, show_accuracy=True)
W1 = m.layers[-1].get_weights()[0]
print('Model2: masking outputs using sample_weight:')
m = Sequential()
m.add(Embedding(input_dim=100, output_dim=4, input_length=20))
m.add(LSTM(output_dim=4, return_sequences=True))
m.add(TimeDistributedDense(10, activation='softmax'))
m.compile(loss='categorical_crossentropy', class_mode='categorical', optimizer='adam', sample_weight_mode='temporal')
m.fit(X, Y, nb_epoch=10, show_accuracy=True, sample_weight=W)
W2 = m.layers[-1].get_weights()[0]
print('Model1 == Model2 is {0}, error is {1}'.format(np.all(W1==W2), np.sum((W1-W2)**2)))
outputs different model weights:
Model1 == Model2 is False, error is 16.270280838
Could you please tell me why "mask_zero" is not equivalent to "sample_weight" in this case?
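
To make the expectation concrete, here is a minimal NumPy sketch of what per-timestep output weighting is assumed to do to the loss. It is not Keras's internal code: the function name `weighted_temporal_crossentropy` is hypothetical, and dividing by the number of non-zero weights is an assumption about the normalization. It shows only that predictions at zero-weight (padded) timesteps have no effect on the loss. It says nothing about how the recurrent layer itself treats padded inputs.

```python
import numpy as np

def weighted_temporal_crossentropy(y_true, y_pred, weights, eps=1e-7):
    """Mean categorical cross-entropy over timesteps with non-zero weight.

    y_true, y_pred: arrays of shape (samples, timesteps, classes)
    weights:        array of shape (samples, timesteps)
    Hypothetical helper; the normalization is an assumption.
    """
    y_pred = np.clip(y_pred, eps, 1.0)
    # cross-entropy per timestep, shape (samples, timesteps)
    per_step = -np.sum(y_true * np.log(y_pred), axis=-1)
    # zero-weight (padded) timesteps drop out of the sum
    return np.sum(per_step * weights) / np.sum(weights > 0)

rng = np.random.RandomState(0)
labels = np.array([[0, 0, 3, 1], [0, 2, 1, 3]])    # 0 marks padding
y_true = np.eye(4)[labels]                          # one-hot, shape (2, 4, 4)
weights = (labels > 0).astype('float')              # same idea as W above

y_pred = rng.dirichlet(np.ones(4), size=(2, 4))     # each timestep sums to 1
loss_a = weighted_temporal_crossentropy(y_true, y_pred, weights)

# Change the predictions only at the padded timesteps
y_pred_b = y_pred.copy()
y_pred_b[labels == 0] = 0.25
loss_b = weighted_temporal_crossentropy(y_true, y_pred_b, weights)

print(np.isclose(loss_a, loss_b))
```

The two losses agree because the padded positions are multiplied by zero before summing, which is the behaviour I expected both approaches to share.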