Is it Possible to Separate Audio Layers from a Music Track using Deep Learning?

Prasanna Andoju

unread,

Jun 18, 2017, 1:25:49 AM6/18/17

to Magenta Discuss

When I listen to any old song, I sometimes feel that artist should have used a different drum, not used that instrument or increase the tempo, etc.,

While reading, I had an idea and want to run by you to see its feasibility. Can we create an application which separates a music piece into separate audio layers and allows users to edit these layers to their sensibilities to recreate new music piece?

Let's suppose take a Coldplay song, Fix you. It primarily consists of vocals, drums, guitar and bass guitar tracks. During the mixing process, an audio engineer would add audio effects like reverb, fade in, fade out, etc., to these audio tracks which would turn it into a song.

Input: fix you song

Algorithm:

should separate each layer
identify audio effects

Output:

each audio layer
audio effects used for each audio layer
start time, end time for each audio effect

With the output, a user can

edit an audio layer
replace/remove an audio layer
edit an audio effect
replace/remove an audio effect

Do you think it's possible to accomplish these things with the help of deep learning? These advances would greatly help musicians to create new music easily. Most of the music creation heavily depends on sampling the old music, this technique of separating a source into its audio layers and reuse the components would benefit the industry.Also, people can come up with new styles by fusing disparate sources of music.

Leonardo O. Gabrielli

unread,

Jun 19, 2017, 3:55:25 AM6/19/17

to Prasanna Andoju, Magenta Discuss

Hi,

the problem you are referring to is called "blind source separation" and there's a lot of scientific literature you can read about it. Applications to audio tracks also exist, far from perfect, but they exist. In the past many used regular DSP techniques to extract vocals from the rest based on some assumptions, but with deep learning we can do more. However I think we are far from what you propose, in terms of quality. But if you happen to find something that sounds very good to you please report here, I'd be glad to see that ML has advanced further in this field.

Regards

--
Magenta project: magenta.tensorflow.org
To post to this group, send email to magenta...@tensorflow.org
To unsubscribe from this group, send email to magenta-discuss+unsubscribe@tensorflow.org
---
You received this message because you are subscribed to the Google Groups "Magenta Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to magenta-discuss+unsubscribe@tensorflow.org.

Matthias Orgler

unread,

Jun 19, 2017, 6:37:14 PM6/19/17

to Magenta Discuss

Hey Prasanna, that is a great idea. Although I'm just getting started with TensorFlow and Magenta, I have decades of experience in music. I see that part of what you're wishing for should be possible, but other parts will not be possible.

What should work is to get the separate tracks from the original recording and train the system with the mixed track as input data and the separate tracks as output (training) data. If you don't have the tracks from Coldplay, you could also train it on tracks of smaller bands, where it might be easier to get a hold of the files from the studio.

Then it depends on whether we want to extract the audio of the track or "just" the notes played. To get the audio of the track, the system must get the raw waveform sample data as input and output, which would be very interesting to see (I have no feeling how feasible it is). If you want to extract the notes (i.e. MIDI) from the track, this touches the more familiar realm of extracting note frequencies from a song.

As for the audio FX and the mix: you will never be able to extract these. It's like if you bake a cake and then try to extract a raw egg from it – just too much entropy and too much information "lost". But what should be possible is to extract the general type of effect (reverb, delay, chorus, distortion), which would already help. I suppose you would have to train the system on how certain effects sound like (or "look like" in the waveform).

All in all very interesting. I'd be willing to try to produce waveform sample output from waveform sample input – if anyone wants to jump in.

Cheers,

Matthias

John Theo

unread,

Jun 19, 2017, 9:49:40 PM6/19/17

to Matthias Orgler, Magenta Discuss

Hi everyone!

It is my Mastering objective =)

So far I'm traveling around, timbre extraction and classification but further I wish I simulate the musical/musician ear behavior. Maybe using unsupervised learning or even reinforcement learning. So, if someone else interested on this topic, please get in touch and we can share some knowledge.

[]s

John Theo

--

Magenta project: magenta.tensorflow.org
To post to this group, send email to magenta...@tensorflow.org

To unsubscribe from this group, send email to magenta-discu...@tensorflow.org

---
You received this message because you are subscribed to the Google Groups "Magenta Discuss" group.

To unsubscribe from this group and stop receiving emails from it, send an email to magenta-discu...@tensorflow.org.

--

John Theo

(41) 9651-8181

John Pope

unread,

Jul 10, 2017, 10:44:38 AM7/10/17

to John Theo, Matthias Orgler, Magenta Discuss

Hi Matthias ,

Today maybe your luck day. I found this code on github - it may help people get started with this

https://github.com/andabi/music-source-separation

Looking forward to Adobe doing some of this stuff out of the box

You also maybe interested in this - Two Minute Papers youtube channel /

Text-based Editing of Audio Narration -

https://www.youtube.com/watch?time_continue=2&v=ldO7RD3s4_s

Seems like it's only a matter of time before this kind of technology spills over to music

John Pope

On Mon, Jun 19, 2017 at 9:49 PM, John Theo <john.th...@gmail.com> wrote:

Hi everyone!
It is my Mastering objective =)
So far I'm traveling around, timbre extraction and classification but further I wish I simulate the musical/musician ear behavior. Maybe using unsupervised learning or even reinforcement learning. So, if someone else interested on this topic, please get in touch and we can share some knowledge.

[]s

John Theo

To unsubscribe from this group, send email to magenta-discuss+unsubscribe@tensorflow.org

---
You received this message because you are subscribed to the Google Groups "Magenta Discuss" group.

To unsubscribe from this group and stop receiving emails from it, send an email to magenta-discuss+unsubscribe@tensorflow.org.

--
John Theo
(41) 9651-8181

--

Magenta project: magenta.tensorflow.org
To post to this group, send email to magenta...@tensorflow.org

To unsubscribe from this group, send email to magenta-discuss+unsubscribe@tensorflow.org

---
You received this message because you are subscribed to the Google Groups "Magenta Discuss" group.

To unsubscribe from this group and stop receiving emails from it, send an email to magenta-discuss+unsubscribe@tensorflow.org.

--

John Pope | Senior Software Engineer

Fieldstorm

Level 5, 70 City Road, Southbank

VIC Australia 3006, Australia

T +61423562387 | E john...@fieldstormapp.com

IN http://www.linkedin.com/in/jdpope

Lucas Castelnovo

unread,

Jul 11, 2017, 2:07:19 AM7/11/17

to Magenta Discuss, mor...@googlemail.com

Hello!

This is also something that comes to my mind everytime and also one of my biggest dreams: train a RNN to do exactly this. Let's get in touch so we can share knowledge!

lucas.ca...@gmail.com

Greetings

Message has been deleted

avin...@kakaobrain.com

unread,

Jul 11, 2017, 9:36:57 AM7/11/17

to Magenta Discuss

Recently I've worked about music source separation task using deep learning.

(Reference git repo: https://github.com/andabi/music-source-separation)

I found that separating singing voice from music was quite possible by using only 3-layer deep neural net. (deep neural net is cool!)

Further, I'm planning to apply advanced neural nets and use more music data like k-pop ;)

VivekP

unread,

Oct 9, 2019, 7:03:28 AM10/9/19

to Magenta Discuss, mor...@googlemail.com

Hey john theo, I hope you have made some research on this thing, I need your little help. Idk if youre active or not but i am working on the same project called "music instrument identification" from an mixed audio signal.

To unsubscribe from this group, send email to magenta...@tensorflow.org

---
You received this message because you are subscribed to the Google Groups "Magenta Discuss" group.

To unsubscribe from this group and stop receiving emails from it, send an email to magenta...@tensorflow.org.

Phil Reyneri

unread,

Oct 9, 2019, 3:09:34 PM10/9/19

to VivekP, Magenta Discuss, mor...@googlemail.com

Individual audio tracks are known as "stems."

Ale Koretzky has done a great bit of writing on the topic using CNNs (with code) here:

https://towardsdatascience.com/audio-ai-isolating-instruments-from-stereo-music-using-convolutional-neural-networks-584ababf69de

and here:

https://towardsdatascience.com/audio-ai-isolating-vocals-from-stereo-music-using-convolutional-neural-networks-210532383785

To unsubscribe from this group, send email to magenta-discu...@tensorflow.org
---
To unsubscribe from this group and stop receiving emails from it, send an email to magenta-discu...@tensorflow.org.

--

Phil Reyneri

philreyneri.com

CJ Carr

unread,

Oct 9, 2019, 3:53:51 PM10/9/19

to Phil Reyneri, VivekP, Magenta Discuss, mor...@googlemail.com

Try Wave-U-Net

https://github.com/f90/Wave-U-Net

The pretrained net even separates death metal growls.. it works on generative audio too

Reply all

Reply to author

Forward