Wavenet setup


Alexander Whillas

Oct 20, 2016, 6:59:11 AM
to Keras-users
Hi y'all,

I have just read the paper associated with this (I encourage a quick skim, the diagrams are amazing!):
very human-sounding speech synthesis (and piano synthesis!).

The English researchers seem to be speaking another language when it comes to deep nets, so I'm unsure whether their notion of dilated convolutional nets is just another way of looking at 1D pooling:

"It is a fully convolutional neural network, where the convolutional layers have various dilation factors that allow its receptive field to grow exponentially with depth and cover thousands of timesteps." 
(there is a great animation that illustrates this description).
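To make the "grows exponentially with depth" part of that quote concrete, here is a rough sketch of the receptive-field arithmetic. The kernel size of 2 and the dilation schedule 1, 2, 4, ..., 512 are assumptions chosen for illustration, and `receptive_field` is a made-up helper name, not something from the paper or from Keras.

```python
def receptive_field(kernel_size, dilations):
    """Number of input timesteps that one output value can see
    after a stack of dilated 1D convolutions (one layer per dilation)."""
    field = 1
    for d in dilations:
        # Each layer widens the field by (kernel_size - 1) * dilation steps.
        field += (kernel_size - 1) * d
    return field


# Assumed schedule: dilation doubles at every layer (1, 2, 4, ..., 512).
doubling = [2 ** i for i in range(10)]
# Same depth, but ordinary convolutions (dilation 1 everywhere).
plain = [1] * 10

print(receptive_field(2, doubling))  # 1024
print(receptive_field(2, plain))     # 11
```

With the same ten layers, doubling the dilation covers 1024 timesteps, while undilated layers cover only 11.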

Am I crazy, or is this just a series of 1D convolutional layers in parallel that feed into a final MLP layer of 1 node with a softmax?

I'm going to have a play in Keras to see if I can get speech synthesis happening (assuming I can find some data). I'm scratching my head a little on the Keras architecture but will give it a shot (any initial thoughts would be welcome).

alex 

Alexander Whillas

Oct 20, 2016, 7:00:10 AM
to Keras-users

Here's that cool diagram I was talking about:

Gökçen Eraslan

Oct 20, 2016, 7:18:10 AM
to Alexander Whillas, Keras-users
It's 1D conv but a dilated one:

https://keras.io/layers/convolutional/#atrousconvolution1d

It's not doable just by using different strides in each layer.
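A small sketch of the difference between dilation and stride, in plain Python rather than Keras. The function names, the causal left zero-padding, and the toy input are all assumptions made for illustration; this is not how the Keras layer is implemented.

```python
def dilated_conv1d(x, w, dilation=1):
    """Causal 1D convolution: y[t] = sum_k w[k] * x[t - k * dilation].
    Positions before the start of the signal are treated as zero."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, wk in enumerate(w):
            i = t - k * dilation
            if i >= 0:
                acc += wk * x[i]
        y.append(acc)
    return y


def strided_conv1d(x, w, stride=1):
    """Ordinary (dilation 1) causal convolution that keeps only
    every `stride`-th output value."""
    return dilated_conv1d(x, w, dilation=1)[::stride]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
w = [1.0, 1.0]

dilated = dilated_conv1d(x, w, dilation=2)
strided = strided_conv1d(x, w, stride=2)

print(dilated)  # [1.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
print(strided)  # [1.0, 5.0, 9.0, 13.0]
```

The dilated version keeps one output per input timestep and combines samples that are two steps apart. The strided version still combines adjacent samples and simply halves the output length, so stacking strides downsamples the signal instead of widening the gaps between filter taps.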

Goekcen.

On 2016-10-20 13:00, Alexander Whillas wrote:
>
<https://lh3.googleusercontent.com/LG5dLIqDTDKNiSCsRtrt8_B0at9slkrdVxVO2BRJ6Hva6asqP2vsixIsuLZt-cS1QYy9B7Tw9mrjCviL7e1I7_sa>

Alexander Whillas

Oct 20, 2016, 8:17:26 AM
to Keras-users, whi...@gmail.com
Hey thanks Goekcen!

I'm impressed that Keras has this option already.

I'm fuzzy on AtrousConvolution1D. Is there a paper I should look at that explains it?

thanks

alex

Daπid

Oct 20, 2016, 8:31:18 AM
to Alexander Whillas, Keras-users
The reference is in the AtrousConvolution2D documentation:

https://arxiv.org/abs/1511.07122

But you can just look at Figures 2 and 3 of this paper:

https://arxiv.org/abs/1606.00915

Alexander Whillas

Oct 20, 2016, 9:54:12 AM
to Keras-users, whi...@gmail.com
Hey, 
Thanks David. I was looking at the documentation and couldn't see the link to the research paper(s), which I thought strange, as there usually is one in the Keras documentation.

I'll have a read and hopefully get my head around it.

The other major problem is finding some training data. FYI, I've found the following (in case anyone is interested):

VoxForge - lots of data but poor quality.

and these guys
vocal ID - they are collecting from the public but not sharing :(

And for music there is: