Variational Recurrent Autoencoder (VRAE) or/and RNN Encoder-Decoder

Jean-Pie...@lip6.fr

Oct 31, 2019, 5:50:54 PM
to Keras-users
Hi !

I am trying to implement a simple Variational Recurrent Autoencoder (VRAE), i.e., a variational autoencoder whose encoder and decoder are both RNNs (LSTMs in practice). (This is also called a (variational) RNN Encoder-Decoder.)
I have already tried out RNNs, autoencoders and variational autoencoders, thanks to the nice code available in the Keras docs/blogs (e.g., the one on autoencoders is very well done).
I have also found a simple Keras implementation of a sequence-to-sequence architecture (e.g., for translation), by François Chollet (Keras' designer).
It works fine too!

However, when trying to derive an RNN Encoder-Decoder from it, I get stuck, although at first sight it looks like a simplification of sequence-to-sequence (since in a VRAE the input and output have the same nature and length).
Adding the variational extra-constraint should be easy.
The issue is managing the passing of the accumulated states properly; I get a bit lost in the connection between Keras layers and shapes/inputs...
I want an interface (input) through which I can inject values for the latent variables of the autoencoder and feed them forward into the decoder's RNN (in order to seed the RNN and then recursively create successive items/notes through some loop).
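
To make that concrete, here is how I picture the generation-time decoder: a one-step model whose states are exposed, seeded with an arbitrary latent vector, then unrolled in a loop. This is just a sketch with made-up example dimensions, not tested code from my project:

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

data_dim, latent_dim = 8, 16  # example sizes

# One-step decoder with its states exposed, so it can be driven in a loop
step_input = Input(shape=(1, data_dim))
state_h_in = Input(shape=(latent_dim,))
state_c_in = Input(shape=(latent_dim,))
step_output, state_h, state_c = LSTM(
    latent_dim, return_sequences=True, return_state=True)(
        step_input, initial_state=[state_h_in, state_c_in])
step_probs = Dense(data_dim, activation='softmax')(step_output)
generator = Model([step_input, state_h_in, state_c_in],
                  [step_probs, state_h, state_c])

# Seed the RNN with an arbitrary latent vector z, then create notes recursively
z = np.random.normal(size=(1, latent_dim)).astype('float32')
h, c = z, np.zeros((1, latent_dim), dtype='float32')
x = np.zeros((1, 1, data_dim), dtype='float32')  # all-zero "start" item
melody = []
for _ in range(32):
    probs, h, c = generator.predict([x, h, c], verbose=0)
    note = int(np.argmax(probs[0, -1]))
    melody.append(note)
    x = np.zeros((1, 1, data_dim), dtype='float32')
    x[0, 0, note] = 1.0  # feed the chosen note back as the next input
```

The loop can run for as many steps as wanted, which is what would give arbitrarily long melodies.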

The motivation is to generate arbitrarily long melodies with some control over the latent variable space. See, e.g., the VRAE or MusicVAE architectures (and papers).
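
For the variational part, I would follow the usual reparameterization trick from the Keras VAE example, projecting the encoder's final state to the mean and log-variance of the latent distribution (again only a sketch with example dimensions; z_mean / z_log_var are my own names):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, Dense, Layer
from tensorflow.keras.models import Model

data_dim, latent_dim = 8, 16  # example sizes

class Sampling(Layer):
    """z = mean + sigma * eps, with eps ~ N(0, I) (reparameterization trick)."""
    def call(self, inputs):
        z_mean, z_log_var = inputs
        # KL divergence to the unit-Gaussian prior, added to the model's loss
        kl = -0.5 * tf.reduce_mean(
            tf.reduce_sum(1. + z_log_var - tf.square(z_mean) - tf.exp(z_log_var),
                          axis=-1))
        self.add_loss(kl)
        eps = tf.random.normal(tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * eps

encoder_input = Input(shape=(None, data_dim))
_, state_h, _ = LSTM(latent_dim, return_state=True)(encoder_input)
z_mean = Dense(latent_dim)(state_h)      # parameters of q(z|x)
z_log_var = Dense(latent_dim)(state_h)
z = Sampling()([z_mean, z_log_var])
encoder = Model(encoder_input, [z_mean, z_log_var, z])
```

z (possibly via a Dense projection back to h/c size) would then seed the decoder's initial state, with the KL term added on top of the reconstruction loss.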

Here is a quick sketch (not working yet :)

from keras.layers import Input, LSTM, Dense
from keras.models import Model

# data_dim: size of one item/note; latent_dim: RNN state size (assumed defined)

# Encoder
encoder_input = Input(shape=(None, data_dim))
encoder_lstm = LSTM(latent_dim, return_state=True)
encoder_output, state_h, state_c = encoder_lstm(encoder_input)
encoder_state = [state_h, state_c]

# Decoder (fed with the target sequence, seeded with the encoder's final states)
decoder_input = Input(shape=(None, data_dim))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_output, _, _ = decoder_lstm(decoder_input, initial_state=encoder_state)
decoder_output = Dense(data_dim, activation='softmax')(decoder_output)

# Autoencoder
model = Model([encoder_input, decoder_input], decoder_output)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit([X_train, X_train], X_train)  # or feed the decoder X_train_with_offset (teacher forcing)?
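
Regarding that last question: by the offset version I mean shifting the targets one step to the right, so that at step t the decoder sees the note it should have produced at step t-1 (toy example below; X_train_with_offset is the name from my sketch):

```python
import numpy as np

data_dim = 5
# toy batch of one-hot sequences: (batch, timesteps, data_dim)
X_train = np.eye(data_dim)[np.random.randint(0, data_dim, size=(4, 10))]

# decoder input for teacher forcing: targets shifted right by one step,
# with an all-zero "start" frame at t = 0
X_train_with_offset = np.zeros_like(X_train)
X_train_with_offset[:, 1:, :] = X_train[:, :-1, :]
```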

Has anyone had experience with this, by any chance?
Thank you in advance!

jean-pierre
