Parag K. Mital, Ph.D. / Director of Machine Intelligence
We condition the vanilla WaveNet decoder with this embedding by upsampling it to the original time resolution, applying a 1x1 convolution, and finally adding this result as a bias to each of the decoder’s thirty layers. Note that this conditioning is not external as it’s learned by the model. Since the embeddings bias the autoregressive system, we can imagine it acting as a driving function for a nonlinear oscillator. This interpretation is corroborated by the fact that the magnitude contours of the embeddings mimic those of the audio itself.
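As a rough illustration of that conditioning path, here is a shape-level sketch in NumPy. All dimensions (frame count, channel widths) are hypothetical, the random weights stand in for learned ones, and only a single layer is shown; in the released model this biasing happens in each of the thirty dilated layers.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes, chosen only for illustration.
T_audio = 6144            # raw audio samples
T_emb, emb_ch = 12, 16    # coarse encoder embedding: 12 frames x 16 channels
res_ch = 32               # decoder residual channels

z = rng.standard_normal((T_emb, emb_ch))          # encoder embedding

# 1) Upsample the embedding to the original time resolution
#    (nearest-neighbor repeat; hop = T_audio // T_emb).
z_up = np.repeat(z, T_audio // T_emb, axis=0)     # -> (T_audio, emb_ch)

# 2) A 1x1 convolution over time is just a per-timestep linear map.
W = rng.standard_normal((emb_ch, res_ch)) * 0.01
cond_bias = z_up @ W                              # -> (T_audio, res_ch)

# 3) Add the result as a bias to a layer's pre-activation h
#    (the decoder repeats this for each of its thirty layers).
h = rng.standard_normal((T_audio, res_ch))        # one layer's activations
h = h + cond_bias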
On Apr 12, 2017 2:07 AM, "Leonardo O. Gabrielli" <leonardo.o.gabrielli@gmail.com> wrote:

Hi,

I'm maybe a little outdated: I started working on this research branch in 2016, when the first WaveNet paper came out, and later left it for SampleRNN, which provides similar results at a much lower computational cost (with pros and cons).

There are already a large number of WaveNet decoder implementations, but I just can't understand how people are going to handle all the computational power it needs. Unlike the previous MIDI-based Magenta work, here you need a lot of resources. We barely generated some flimsy sounds after days of training on a Titan X GPU.

What are other people's experiences? Are you able to generate sounds? Are recent implementations faster?

Nonetheless, I think the Magenta project got it right: entangled generation is much more significant to music research than the MIDI stuff, and I hope to see cool developments on this. I'm writing up my opinion on this in a Computer Music Journal letter to appear in the coming months.

Best regards
A difference in this case is that we've already done the expensive training for you. With a single Titan X and a proper sampling algorithm, the model we released should generate around two seconds of audio every minute (batch size 16).
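For a sense of scale, reading that claim as two seconds per sequence with 16 sequences generated in parallel, and assuming the 16 kHz sample rate used by the NSynth dataset, the implied sampling throughput works out as follows:

# Back-of-the-envelope throughput from "two seconds of audio every minute".
# Assumptions: 16 kHz audio and that the two seconds are per sequence,
# with 16 sequences generated in parallel.
sample_rate = 16_000           # samples per second of audio
audio_seconds_per_minute = 2.0
batch_size = 16

per_sequence = sample_rate * audio_seconds_per_minute / 60.0
print(f"per sequence: {per_sequence:.0f} samples/s")                # ~533
print(f"whole batch:  {per_sequence * batch_size:.0f} samples/s")   # ~8533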