How to reconstruct a spectrogram with a CAE using 1D convolutions?


mattia....@gmail.com

unread,
Apr 25, 2017, 11:43:34 AM4/25/17
to Keras-users
I've been writing a script that tries to reconstruct an input spectrogram with a convolutional autoencoder.
I'm treating the spectrogram both as a raw image and as a temporal sequence. For the latter, I'm using Conv1D.

An example of a layer in the structure is
x = Conv1D(filters=256, kernel_size=4, activation=LeakyReLU(),
           padding='causal', dilation_rate=2,
           bias_initializer=Constant(0.1),
           kernel_initializer=TruncatedNormal())(x)

Given an input:
input_img = Input(shape=(500,128))
where stft_frames=500 and mel_bins=128 in the spectrogram. After some pooling I get an encoded representation of shape
(?, 125, 512)
which is what I want. But after decoding, the reconstruction has shape
(?, 500, 1)
so it seems I've lost the mel dimension.


Does this make sense? Or am I forced to use Conv2D?
My references are [1], [2] and [3].
Thanks for helping.
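As I understand it, the last axis of a Conv1D output is always the number of filters, not the input's channel count. A plain-NumPy sketch (simplified to 'valid' padding, shapes chosen to mirror my case) of what I think is happening:

```python
import numpy as np

def conv1d_valid(x, kernels):
    """Naive 1D convolution, 'valid' padding.
    x: (steps, in_channels); kernels: (kernel_size, in_channels, filters)."""
    k, cin, filters = kernels.shape
    steps = x.shape[0] - k + 1
    out = np.zeros((steps, filters))
    for t in range(steps):
        # contract over both the kernel window and the input channels
        out[t] = np.tensordot(x[t:t + k], kernels, axes=([0, 1], [0, 1]))
    return out

x = np.random.randn(500, 128)                 # (stft_frames, mel_bins)
y = conv1d_valid(x, np.random.randn(4, 128, 1))  # filters=1
print(y.shape)  # (497, 1): the mel axis is replaced by the filter axis
```

So with filters=1 in the last layer, the 128 mel bins are contracted away regardless of padding or dilation.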

Daπid

unread,
Apr 25, 2017, 12:01:20 PM4/25/17
to mattia....@gmail.com, Keras-users
It is impossible to know what went wrong without the full code, not just one layer.


mattia....@gmail.com

unread,
Apr 28, 2017, 3:24:17 PM4/28/17
to Keras-users
Fair enough. This is the full structure:

import numpy as np

from keras.layers import Input # define the input shape for the model
from keras.layers import Conv1D, MaxPooling1D, UpSampling1D # for the convnet structure
from keras.models import Model # for the overall definition


from keras.initializers import Constant # bias initialisation
from keras.initializers import TruncatedNormal # kernel initialisation
from keras.layers.advanced_activations import LeakyReLU # activation function (from NSynth)


# define input shape
input_img = Input(shape=(500,128))
print('Some information about tensor expected shapes')
print('Input tensor shape:', input_img.shape)


# define encoder convnet
# obs: 1D convolution implemented
x = Conv1D(filters=128, kernel_size=4, activation=LeakyReLU(),
           padding='causal', dilation_rate=4,
           bias_initializer=Constant(0.1),
           kernel_initializer=TruncatedNormal())(input_img)
x = Conv1D(filters=256, kernel_size=4, activation=LeakyReLU(),
           padding='causal', dilation_rate=2,
           bias_initializer=Constant(0.1),
           kernel_initializer=TruncatedNormal())(x)
x = MaxPooling1D(pool_size=4, strides=4)(x)
encoded = Conv1D(filters=512, kernel_size=4, activation=LeakyReLU(),
                 padding='causal',
                 bias_initializer=Constant(0.1),
                 kernel_initializer=TruncatedNormal())(x)
print('Encoded representation tensor shape:', encoded.shape)


# define decoder convnet
x = Conv1D(filters=256, kernel_size=4, activation=LeakyReLU(),
           padding='causal',
           bias_initializer=Constant(0.1),
           kernel_initializer=TruncatedNormal())(encoded)
x = UpSampling1D(size=4)(x)
x = Conv1D(filters=128, kernel_size=4, activation=LeakyReLU(),
           padding='causal', dilation_rate=2,
           bias_initializer=Constant(0.1),
           kernel_initializer=TruncatedNormal())(x)
decoded = Conv1D(filters=1, kernel_size=4, activation=LeakyReLU(),
                 padding='causal', dilation_rate=4,
                 bias_initializer=Constant(0.1),
                 kernel_initializer=TruncatedNormal())(x)
print('Decoded representation tensor shape:', decoded.shape)


# define overall autoencoder model
cae = Model(inputs=input_img, outputs=decoded)
cae.compile(optimizer='adam', loss='mse')

# check for equal size
# obs: the missing value is the batch_size
if input_img.shape[1:] != decoded.shape[1:]:
    print('alert: in/out dimension mismatch')

And, unsurprisingly, I get
alert: in/out dimension mismatch

That's because the mel dimension is lost in the Conv1D operations.
There's no actual error; I'm interested in understanding whether it is possible to reconstruct a spectrogram using such a structure at all.
Thanks.
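One shape-matched variant I could try (simplified sketch: plain ReLU activations and default initialisers, no dilation) would give the final decoder layer filters=128 so the output's channel axis matches the 128 mel bins again:

```python
from keras.layers import Input, Conv1D, MaxPooling1D, UpSampling1D
from keras.models import Model

inp = Input(shape=(500, 128))                              # (stft_frames, mel_bins)
x = Conv1D(256, 4, padding='causal', activation='relu')(inp)
x = MaxPooling1D(pool_size=4)(x)                           # 500 -> 125 frames
encoded = Conv1D(512, 4, padding='causal', activation='relu')(x)

x = Conv1D(256, 4, padding='causal', activation='relu')(encoded)
x = UpSampling1D(size=4)(x)                                # 125 -> 500 frames
# filters=128 restores the mel axis instead of collapsing it to 1
decoded = Conv1D(128, 4, padding='causal', activation='relu')(x)

cae = Model(inputs=inp, outputs=decoded)
print(cae.output_shape)  # (None, 500, 128)
```

Whether treating mel bins as channels (rather than a spatial axis, as Conv2D would) gives a good reconstruction is a separate question, but at least the shapes line up.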






YS

unread,
Jun 4, 2017, 9:06:28 AM6/4/17
to Keras-users, mattia....@gmail.com
Hi,

did you succeed with the spectrogram autoencoder? Can you share your code?

Thanks 