The network consisted of two convolutional layers with 128 filters spanning 6 timesteps and the entire frequency range, yielding one-dimensional convolutions. A max-pooling layer with a pooling window of 4 timesteps followed the first convolutional layer, and one with a pooling window of 5 timesteps followed the second. A fully connected hidden layer with 400 units was stacked on top of this, and finally the output layer had as many units as there were latent factors to predict.
This is discussed further in a Reddit thread.
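The time-axis arithmetic implied by that description can be checked with a short sketch (my assumptions: 'valid' convolutions, non-overlapping pooling, and a 130-timestep input as in the code below):

```python
def conv_len(n, k):
    # Output length of a 'valid' convolution over the time axis.
    return n - k + 1

def pool_len(n, k):
    # Output length of non-overlapping max-pooling.
    return n // k

T = 130              # timesteps (assumed, matching the input shape used below)
T = conv_len(T, 6)   # conv, 6-timestep filters -> 125
T = pool_len(T, 4)   # max-pool, window 4      -> 31
T = conv_len(T, 6)   # conv, 6-timestep filters -> 26
T = pool_len(T, 5)   # max-pool, window 5      -> 5
# 128 filters x 5 remaining timesteps feed the 400-unit dense layer.
assert T == 5
```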
from keras import backend as K
from keras.models import Sequential
from keras.layers import Conv2D, Activation, MaxPooling2D, Dropout, Flatten, Dense

K.set_image_data_format('channels_first')
input_shape = (128, 130, 1)  # 128 mel bands as channels, spatial dims (130, 1)
n_filters = 128
num_classes = 50

model = Sequential()
# The kernels must span the time axis only: with channels_first the frequency
# bands are the input channels, so a square kernel such as Conv2D(n_filters, 4)
# performs a 4x4 convolution over the (130, 1) spatial dims, drives the
# singleton axis negative, and fails downstream with
# ValueError: ('The specified size contains a dimension with value <= 0', (-23400, 400))
model.add(Conv2D(n_filters, (4, 1), padding='valid', strides=1,
                 input_shape=input_shape))
model.add(Activation('relu'))
model.add(Conv2D(n_filters, (5, 1)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(4, 1)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(400))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
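A quick check of the 'valid'-padding arithmetic (a sketch; the time-only (4, 1) and (5, 1) kernels are an assumption matching the one-dimensional convolutions described above) shows where the negative dimension comes from and what the corrected shapes are:

```python
def valid_out(size, kernel, stride=1):
    # Output length of one axis under 'valid' padding.
    return (size - kernel) // stride + 1

# With channels_first, input_shape=(128, 130, 1) means
# 128 channels and spatial dims (time=130, freq=1).
t, f = 130, 1

# A square 4x4 kernel shrinks the singleton frequency axis below zero,
# which is what the ValueError about a dimension <= 0 reports.
assert valid_out(f, 4) <= 0

# Time-only kernels keep both axes positive:
t = valid_out(t, 4)     # Conv2D(n_filters, (4, 1)) -> 127
t = valid_out(t, 5)     # Conv2D(n_filters, (5, 1)) -> 123
t = valid_out(t, 4, 4)  # MaxPooling2D((4, 1))      -> 30
assert (t, f) == (30, 1)
assert 128 * t * f == 3840  # flattened size feeding Dense(400)
```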