Try my 48kHz/stereo implementation of DDSP


Josh

Aug 9, 2020, 12:40:54 AM
to Magenta Discuss
I was able to improve on the already excellent sound quality. I also added variable resynthesis length.

https://colab.research.google.com/github/DemonFlexCouncil/DDSP-48kHz-Stereo/blob/master/ddsp/colab/ddsp_train_and_timbre_transfer_48kHz_stereo.ipynb

Jesse Engel

Aug 9, 2020, 1:34:27 AM
to Josh, Magenta Discuss
Just checked it out, nice! I'd love to hear some samples if you have some that you'd like to share.


--
Magenta project: magenta.tensorflow.org
To post to this group, send email to magenta...@tensorflow.org
To unsubscribe from this group, send email to magenta-discu...@tensorflow.org
---
To unsubscribe from this group and stop receiving emails from it, send an email to magenta-discu...@tensorflow.org.

Oscar Friedman

Aug 9, 2020, 1:52:54 AM
to Jesse Engel, Josh, Magenta Discuss
Hi Josh,

Trying it out now! Could you please share any interesting examples with me as well? Also, do you happen to know whether bit depth is limited for timbre transfer? I remember NSynth was only 8-bit.

Thanks,
Oscar
--
Oscar

Jesse Engel

Aug 10, 2020, 2:59:23 AM
to Oscar Friedman, Josh, Magenta Discuss
DDSP is full bit depth (more, actually: 32-bit).
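
To make the bit-depth point concrete, here is a minimal sketch, assuming the 32 bits refers to float32 samples (an assumption, not something stated in the thread). It checks that every 16-bit PCM value survives a round trip through float32 unchanged, which follows from float32 carrying a 24-bit significand. The helper names are hypothetical.

```python
import numpy as np


def pcm16_to_float32(pcm):
    """Scale int16 samples to float32 in [-1.0, 1.0)."""
    return pcm.astype(np.float32) / np.float32(32768.0)


def float32_to_pcm16(audio):
    """Scale float32 samples back to int16, clipping to the valid range."""
    scaled = np.round(audio * np.float32(32768.0))
    return np.clip(scaled, -32768, 32767).astype(np.int16)


# Every representable 16-bit sample value, from -32768 to 32767.
all_values = np.arange(-32768, 32768, dtype=np.int32).astype(np.int16)
round_trip = float32_to_pcm16(pcm16_to_float32(all_values))

# Prints whether the round trip reproduced all 65536 values exactly.
print(np.array_equal(all_values, round_trip))
```

Dividing and multiplying by a power of two is exact in floating point, so a 16-bit source loses nothing on the way in.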

Josh

Aug 10, 2020, 4:06:33 AM
to Magenta Discuss, josh.l...@gmail.com
Hi Jesse, here are some examples of the algorithm resynthesizing single tones from a nyckelharpa sample set:

https://demonflexcouncil.wixsite.com/demonflexcouncil/wavs

I’m curious what you think of my parameters: FFT size 6144, 120 harmonics, and spectral losses at FFT sizes 4096 and 8192 in addition to the ones you had.
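
For readers unfamiliar with the spectral-loss parameter mentioned above, here is a minimal NumPy sketch of a multi-scale spectral loss with the two larger FFT sizes added. It is an illustration of the general technique, not the DDSP implementation: the smaller FFT sizes are assumed to be the library defaults, the 75% overlap and the equal weighting of the linear and log terms are assumptions, and the function names are hypothetical.

```python
import numpy as np

# Assumed set of FFT sizes: 4096 and 8192 come from the post above; the
# rest are assumed to match the library defaults.
FFT_SIZES = (8192, 4096, 2048, 1024, 512, 256, 128, 64)


def stft_magnitude(audio, fft_size, overlap=0.75):
    """Magnitude spectrogram from Hann-windowed, overlapping frames."""
    hop = int(fft_size * (1.0 - overlap))
    # Zero-pad short signals so that at least one full frame exists.
    if len(audio) < fft_size:
        audio = np.pad(audio, (0, fft_size - len(audio)))
    n_frames = 1 + (len(audio) - fft_size) // hop
    window = np.hanning(fft_size)
    frames = np.stack(
        [audio[i * hop:i * hop + fft_size] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=-1))


def multiscale_spectral_loss(target, generated, fft_sizes=FFT_SIZES, eps=1e-7):
    """Sum of L1 distances between linear and log magnitudes at each scale."""
    loss = 0.0
    for fft_size in fft_sizes:
        t = stft_magnitude(target, fft_size)
        g = stft_magnitude(generated, fft_size)
        loss += np.mean(np.abs(t - g))
        loss += np.mean(np.abs(np.log(t + eps) - np.log(g + eps)))
    return float(loss)
```

Larger FFT sizes give finer frequency resolution at the cost of time resolution, which is presumably why they help at 48kHz, where the same FFT size covers a shorter stretch of time and wider frequency bins than at 16kHz.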



Josh

Aug 10, 2020, 4:08:28 AM
to Magenta Discuss, jesse...@google.com, josh.l...@gmail.com
Hi Oscar, here are some examples of the algorithm resynthesizing single tones from a nyckelharpa sample set:

https://demonflexcouncil.wixsite.com/demonflexcouncil/wavs

Good luck and please let me know if you run into any issues.



Adam Roberts

Aug 10, 2020, 8:00:57 AM
to Josh, Magenta Discuss, Jesse Engel
Those sound quite good to my ear!


Josh

Aug 10, 2020, 5:53:39 PM
to Magenta Discuss, josh.l...@gmail.com, jesse...@google.com
Thanks Adam!

Jesse Engel

Aug 10, 2020, 8:12:01 PM
to Josh, Magenta Discuss
Yah, real nice work! Overall I'm impressed that it sounds so good with straightforward modifications. It seems to struggle a little bit with damping down the reverb and filtered noise, but your dataset is super small, so I'm not totally surprised. The values you've added are very reasonable. I took a quick look at your code. Just so I'm clear: it looks like you train an entirely separate network for each of the left, right, and mono signals. Is that right?

Josh

Aug 10, 2020, 10:49:16 PM
to Magenta Discuss, josh.l...@gmail.com
Thanks, yeah, it’s totally separate for left, right, and mono. I agree that the reverb is harsh compared to the dataset. I’m going to try some larger datasets to see if that fixes the problem. I may also try changing the reverb type and center frequency. I thought you had it set at 4kHz because Nyquist at 16kHz is 8kHz, but by the same logic it would be 12kHz at 48kHz (Nyquist 24kHz), which seems too high in the hearing range.

Jesse Engel

Aug 11, 2020, 2:11:45 PM
to Josh, Magenta Discuss
Cool, a few follow-ups.

* I think since you're training them all separately you can actually simplify your code quite a lot by just having a single-channel model, training 3 of them on 3 different datasets (left, right, mono), and then generating from the three models independently. Then the only code that's specific to your repo is a different gin config for larger sample rates, I believe.

* How do you combine the audio at the end? Do you just add left, right, and mono together? Left/right decomposition makes sense to me, but it seems like adding the mono as well would be redundant / louder than the input.

* Not sure what you mean about the center frequency for reverb. The FilteredNoiseReverb has explicit frequencies, but it tiles frequency space linearly.
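
The single-channel idea in the first two bullets can be sketched roughly as follows. This is a minimal NumPy illustration, not code from either repo: `split_stereo` and `merge_stereo` are hypothetical helpers, and the `[n_samples, 2]` array layout is an assumption. Each returned array would feed its own single-channel dataset, and only the left and right renders are merged back at the end.

```python
import numpy as np


def split_stereo(stereo):
    """Split a [n_samples, 2] array into left, right, and mono signals."""
    left = stereo[:, 0]
    right = stereo[:, 1]
    mono = 0.5 * (left + right)  # Average, so mono is no louder than L/R.
    return left, right, mono


def merge_stereo(left, right):
    """Stack two single-channel renders into a [n_samples, 2] array."""
    n = min(len(left), len(right))  # Guard against unequal render lengths.
    return np.stack([left[:n], right[:n]], axis=-1)


# Example with a dual-mono signal (identical left and right channels).
x = np.sin(2 * np.pi * 220.0 * np.arange(48000) / 48000.0)
left, right, mono = split_stereo(np.stack([x, x], axis=-1))
stereo_out = merge_stereo(left, right)

# Summing the mono render on top of a channel doubles the amplitude here,
# which is a gain of 20 * log10(2), about 6 dB.
gain_db = 20.0 * np.log10(np.max(np.abs(left + mono)) / np.max(np.abs(left)))
```

The last two lines illustrate the redundancy concern: for dual-mono material, adding mono to left and right raises the level by about 6 dB without adding information.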



Josh

Aug 11, 2020, 6:19:40 PM
to Magenta Discuss, josh.l...@gmail.com
Yes, one channel with separate datasets is a good idea. I’ll look into it and update you when I have something.

I combine the left and right channels at the end, but the strange thing is that the generated audio is about 6-12dB quieter than the dataset when training on multi-timbral material and about the same level when training on single notes. The mono renders are the same level as the stereo ones whatever dataset I use.
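
A level gap like this can be measured, and compensated for after the fact, by comparing RMS levels. The following is a minimal sketch assuming float audio arrays; `rms_db` and `match_level` are hypothetical helpers, and RMS is only one of several reasonable loudness measures.

```python
import numpy as np


def rms_db(audio, eps=1e-12):
    """RMS level in dB relative to full scale (amplitude 1.0)."""
    rms = np.sqrt(np.mean(np.square(audio)))
    return 20.0 * np.log10(rms + eps)


def match_level(generated, reference):
    """Scale `generated` so its RMS level matches that of `reference`.

    Returns the scaled audio and the gain that was applied, in dB.
    """
    gain_db = rms_db(reference) - rms_db(generated)
    return generated * 10.0 ** (gain_db / 20.0), gain_db
```

A positive `gain_db` means the generated audio was quieter than the reference. This corrects the symptom only; it says nothing about why the multi-timbral renders come out quieter.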

I think I just picked up center_frequency from 1_synths_and_effects.ipynb. Now that I’m looking at it again, it appears to be a non-issue.

Oscar Friedman

Aug 17, 2020, 2:28:37 PM
to Josh, Magenta Discuss
Hi Josh,

I tried your 48kHz/stereo implementation of DDSP trained on a female vocalist and then used the dry lead vocal from Katy Perry - Roar as a primer.
Though the left and right channels for both the training data and the primer are identical, the output has a strange stereo effect where the right channel is a vocoder-like sound and the left channel is more what I was expecting.

Any insights?
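
One way to confirm that the inputs really are dual mono, and to quantify how far the output channels diverge, is to compare the side (left minus right) energy with the mid (left plus right) energy. This is a minimal sketch, assuming audio already loaded as a float array of shape `[n_samples, 2]`; `channel_difference_db` is a hypothetical helper.

```python
import numpy as np


def channel_difference_db(stereo, eps=1e-12):
    """Side-to-mid RMS ratio in dB for a [n_samples, 2] array.

    Identical channels give a very large negative value; values near or
    above 0 dB mean the channels differ as much as they agree.
    """
    left, right = stereo[:, 0], stereo[:, 1]
    side = 0.5 * (left - right)
    mid = 0.5 * (left + right)
    side_rms = np.sqrt(np.mean(side ** 2))
    mid_rms = np.sqrt(np.mean(mid ** 2))
    return 20.0 * np.log10((side_rms + eps) / (mid_rms + eps))
```

Running this on the training data, the primer, and the output would show at which stage the channels start to differ.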




--
Oscar
Ariel Roar Stereo.mp3
Ariel Roar R.wav
Ariel Roar L.wav

Josh

Aug 17, 2020, 6:04:00 PM
to Magenta Discuss, josh.l...@gmail.com
Hey Oscar, the only thing I can think of is to try shortening your training time. One time I ran about 60-70K steps and the same thing happened, but that was without latent vectors and different data for L and R. Usually long training times are OK, but maybe there’s something in your dataset or primer that works against it.