Wavenet setup

Alexander Whillas

Oct 20, 2016, 6:59:11 AM

to Keras-users

Hi y'all,

I have just read the paper associated with this (I encourage a quick skim; the diagrams are amazing!):

https://deepmind.com/blog/wavenet-generative-model-raw-audio/

It produces very human-sounding speech synthesis (and piano synthesis!).

The English researchers seem to be talking another language when it comes to deep nets, so I'm unsure if their notion of dilated convolution nets is just another way of looking at 1D pooling:

"It is a fully convolutional neural network, where the convolutional layers have various dilation factors that allow its receptive field to grow exponentially with depth and cover thousands of timesteps."

(there is a great animation that illustrates this description).
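To make the "grows exponentially with depth" claim in the quote concrete, here is a small sketch that computes the receptive field of a stack of stride-1 dilated convolutions. The kernel size of 2 and the doubling dilation schedule (1, 2, 4, ..., 512) are assumptions chosen for illustration, and `receptive_field` is a hypothetical helper, not a Keras function.

```python
def receptive_field(kernel_size, dilations):
    """Input timesteps visible to one output of stacked stride-1 dilated convs.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Assumed schedule: dilation doubles at every layer (1, 2, 4, ..., 512).
doubling = [2 ** i for i in range(10)]
# Baseline: the same 10 layers without dilation.
plain = [1] * 10

rf_dilated = receptive_field(2, doubling)
rf_plain = receptive_field(2, plain)

print("dilated stack:", rf_dilated)  # 1024
print("plain stack:  ", rf_plain)    # 11
```

With the same ten layers, the dilated stack sees 1024 timesteps while the undilated one sees only 11, which is why a few such stacks can cover thousands of timesteps.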

Am I crazy, or is this just a series of 1D convolutional layers in parallel that feed into a final MLP layer of 1 node with a softmax?

I'm going to have a play in Keras to see if I can get speech synthesis happening (assuming I can find some data). I'm scratching my head a little on the Keras architecture but will give it a shot (any initial thoughts would be welcome).

alex

Alexander Whillas

Oct 20, 2016, 7:00:10 AM

to Keras-users

Here's that cool diagram I was talking about.

Gökçen Eraslan

Oct 20, 2016, 7:18:10 AM

to Alexander Whillas, Keras-users

It's 1D conv but a dilated one:

https://keras.io/layers/convolutional/#atrousconvolution1d

It's not doable just by using different strides in each layer.
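A rough pure-Python sketch of the difference, under assumed toy inputs: a dilated convolution keeps stride 1 and spaces the kernel taps apart, while a strided convolution keeps the taps adjacent and skips output positions. The functions `dilated_conv1d` and `strided_conv1d` are hypothetical helpers written for this illustration, not the Keras layer.

```python
def dilated_conv1d(x, weights, dilation):
    """'Valid' 1D dilated convolution (cross-correlation) with stride 1.

    Kernel taps are spaced `dilation` samples apart.
    """
    span = (len(weights) - 1) * dilation
    return [
        sum(w * x[t + k * dilation] for k, w in enumerate(weights))
        for t in range(len(x) - span)
    ]


def strided_conv1d(x, weights, stride):
    """'Valid' 1D convolution with adjacent taps, keeping every `stride`-th output."""
    k = len(weights)
    return [
        sum(w * x[t + j] for j, w in enumerate(weights))
        for t in range(0, len(x) - k + 1, stride)
    ]


x = list(range(8))  # toy signal: 0, 1, ..., 7

dil = dilated_conv1d(x, [1, 1], dilation=2)  # each output is x[t] + x[t + 2]
strd = strided_conv1d(x, [1, 1], stride=2)   # each output is x[t] + x[t + 1], t = 0, 2, 4, 6

print(dil)   # [2, 4, 6, 8, 10, 12]
print(strd)  # [1, 5, 9, 13]
```

The dilated version still produces an output at every timestep (minus the border) and combines samples that are two steps apart; the strided version downsamples the output and only ever combines neighbouring samples.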

Goekcen.

On 2016-10-20 13:00, Alexander Whillas wrote:
>
<https://lh3.googleusercontent.com/LG5dLIqDTDKNiSCsRtrt8_B0at9slkrdVxVO2BRJ6Hva6asqP2vsixIsuLZt-cS1QYy9B7Tw9mrjCviL7e1I7_sa>

Alexander Whillas

Oct 20, 2016, 8:17:26 AM

to Keras-users, whi...@gmail.com

Hey thanks Goekcen!

I'm impressed that Keras has this option already.

I'm fuzzy on AtrousConvolution1D. Is there a paper I should look at that explains it?

thanks

alex

Daπid

Oct 20, 2016, 8:31:18 AM

to Alexander Whillas, Keras-users

The reference is in the AtrousConvolution2D documentation:

https://arxiv.org/abs/1511.07122

But you can just look at Figures 2 and 3 of this paper:

https://arxiv.org/abs/1606.00915

Alexander Whillas

Oct 20, 2016, 9:54:12 AM

to Keras-users, whi...@gmail.com

Hey,

Thanks David. I was looking at the documentation and couldn't see the link to the research paper(s), which I thought strange, as there usually is one in the Keras documentation.

I'll have a read and hopefully get my head around it.

The other major problem is finding some training data. FYI, in case anyone is interested, I've found:

voxforge - lots but poor quality.

The Harvard-Haskins Database of Regularly-Timed Speech

TSP Speech Database

CMU_ARCTIC speech synthesis databases

and these guys

vocal ID - they are collecting from the public but not sharing :(

And for music there is:

magnatagatune
