Is it Possible to Separate Audio Layers from a Music Track using Deep Learning?


Prasanna Andoju

Jun 18, 2017, 1:25:49 AM6/18/17
to Magenta Discuss
When I listen to an old song, I sometimes feel the artist should have used a different drum, left out an instrument, or increased the tempo.

While reading, I had an idea I want to run by you to gauge its feasibility: can we create an application that separates a piece of music into its individual audio layers and lets users edit those layers to their own sensibilities, recreating a new piece of music?

Take a Coldplay song, say "Fix You". It primarily consists of vocals, drums, guitar, and bass guitar tracks. During the mixing process, an audio engineer adds audio effects such as reverb, fade-in, and fade-out to these tracks, which turns them into a song.


Input: the song "Fix You"

Algorithm:
  1. separate each layer
  2. identify the audio effects
Output:
  1. each audio layer
  2. the audio effects used on each layer
  3. the start and end time of each effect
With this output, a user could
  1. edit an audio layer
  2. replace or remove an audio layer
  3. edit an audio effect
  4. replace or remove an audio effect
(one possible representation of this output is sketched after the list)
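To make that output concrete, here is one way it could be represented in code. This is purely illustrative; every name below is hypothetical, not from any real library.

import python: see sketch below

# Purely illustrative sketch of the desired output schema.
from dataclasses import dataclass, field

@dataclass
class EffectSpan:
    name: str            # e.g. "reverb", "fade_in"
    start_sec: float     # when the effect starts
    end_sec: float       # when the effect ends

@dataclass
class AudioLayer:
    instrument: str      # e.g. "vocals", "drums", "bass"
    samples: list        # raw waveform of this layer
    effects: list = field(default_factory=list)   # list of EffectSpan

# A separated song is then just a list of editable layers, e.g.:
# song = [AudioLayer("vocals", vocal_samples,
#                    [EffectSpan("reverb", 0.0, 12.5)])]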

Do you think it's possible to accomplish these things with the help of deep learning? Such advances would make it much easier for musicians to create new music. Much of music creation depends heavily on sampling older music, so a technique that separates a source into its audio layers and lets you reuse the components would benefit the industry. People could also come up with new styles by fusing disparate sources of music.

Leonardo O. Gabrielli

Jun 19, 2017, 3:55:25 AM6/19/17
to Prasanna Andoju, Magenta Discuss
Hi,
the problem you are referring to is called "blind source separation", and there is a lot of scientific literature you can read about it. Applications to audio tracks also exist; they are far from perfect, but they exist. In the past, many approaches used regular DSP techniques to extract the vocals from the rest based on certain assumptions, but with deep learning we can do more. However, I think we are still far from what you propose in terms of quality. If you happen to find something that sounds very good to you, please report it here; I'd be glad to see that ML has advanced further in this field.
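For a taste of the classic DSP approach: the best-known trick is center-channel cancellation, which assumes the vocals are panned dead center while the instruments are spread across the stereo field. That assumption often breaks, which is exactly the limitation I mean. A minimal sketch, assuming the soundfile package and a hypothetical "song.wav":

import soundfile as sf

# Center-channel cancellation: anything panned dead center (usually
# the vocals) cancels out when we subtract one channel from the other.
audio, sr = sf.read("song.wav")   # shape (samples, 2) for a stereo file
assert audio.ndim == 2 and audio.shape[1] == 2, "needs a stereo file"

left, right = audio[:, 0], audio[:, 1]
instrumental = left - right       # center-panned content cancels
sf.write("instrumental_guess.wav", instrumental, sr)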

Regards


Matthias Orgler

Jun 19, 2017, 6:37:14 PM6/19/17
to Magenta Discuss
Hey Prasanna, that's a great idea. Although I'm just getting started with TensorFlow and Magenta, I have decades of experience in music. Part of what you're wishing for should be possible, but other parts will not be.

What should work is to get the separate tracks from an original recording and train the system with the mixed track as input data and the separate tracks as output (training) data. If you can't get the tracks from Coldplay, you could also train it on tracks from smaller bands, where it might be easier to get hold of the files from the studio.
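To sketch what that training setup could look like (just a rough Keras sketch; the array names, sizes, and the loader are made up), mapping mixture spectrogram frames to the stacked stem frames:

import tensorflow as tf

freq_bins, n_stems = 513, 4   # e.g. 1024-point STFT, 4 stems

model = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, activation="relu", input_shape=(freq_bins,)),
    tf.keras.layers.Dense(1024, activation="relu"),
    tf.keras.layers.Dense(freq_bins * n_stems, activation="relu"),
])
model.compile(optimizer="adam", loss="mse")

# mix:   (examples, freq_bins)            -- mixed-track magnitude frames
# stems: (examples, freq_bins * n_stems)  -- the separate tracks, stacked
# mix, stems = load_your_dataset()        # hypothetical loader
# model.fit(mix, stems, batch_size=64, epochs=10)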

Then it depends on whether we want to extract the audio of the track or "just" the notes played. To get the audio of the track, the system must take raw waveform sample data as input and output, which would be very interesting to see (I have no feel for how feasible that is). If you want to extract the notes (i.e. MIDI) from the track, that touches the more familiar realm of extracting note frequencies from a song.
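The note-extraction direction is already quite approachable for monophonic material. A minimal sketch with librosa's pYIN pitch tracker (librosa 0.8+), assuming you already have an isolated stem; a full polyphonic mix would need proper transcription methods:

import librosa

y, sr = librosa.load("vocal_stem.wav")   # hypothetical isolated stem
f0, voiced, _ = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr)
notes = [librosa.hz_to_note(f) for f in f0[voiced]]   # per-frame note names
print(notes[:20])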

As for the audio FX and the mix: you will never be able to extract these exactly. It's like baking a cake and then trying to extract a raw egg from it – just too much entropy, too much information "lost". But what should be possible is to extract the general type of effect (reverb, delay, chorus, distortion), which would already help. I suppose you would have to train the system on how certain effects sound (or "look" in the waveform).
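Recognizing the general effect type could start as an ordinary classification problem: render short clips with and without each effect, extract features, and fit an off-the-shelf classifier. A hedged sketch, with all file names and labels made up:

import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

EFFECTS = ["dry", "reverb", "delay", "distortion"]

def features(path):
    y, sr = librosa.load(path)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)   # one 20-dim vector per clip

# clips = [("clip_001.wav", "reverb"), ...]   # hypothetical labeled data
# X = np.stack([features(p) for p, label in clips])
# y = [EFFECTS.index(label) for p, label in clips]
# clf = RandomForestClassifier().fit(X, y)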

All in all very interesting. I'd be willing to try to produce waveform sample output from waveform sample input – if anyone wants to jump in.

Cheers,

Matthias

John Theo

Jun 19, 2017, 9:49:40 PM6/19/17
to Matthias Orgler, Magenta Discuss
Hi everyone! 
It is my Master's objective =)
So far I've been exploring timbre extraction and classification, but further on I'd like to simulate the behavior of a musician's ear, maybe using unsupervised learning or even reinforcement learning. So if anyone else is interested in this topic, please get in touch and we can share some knowledge.

Cheers,

John Theo


John Pope

Jul 10, 2017, 10:44:38 AM7/10/17
to John Theo, Matthias Orgler, Magenta Discuss
Hi Matthias,

Today may be your lucky day. I found this code on GitHub; it may help people get started with this.

Looking forward to Adobe doing some of this stuff out of the box. You may also be interested in the Two Minute Papers YouTube channel, in particular the "Text-based Editing of Audio Narration" video.

Seems like it's only a matter of time before this kind of technology spills over to music.

John Pope


Lucas Castelnovo

Jul 11, 2017, 2:07:19 AM7/11/17
to Magenta Discuss, mor...@googlemail.com
Hello! 
This is something that also comes to my mind all the time, and one of my biggest dreams: training an RNN to do exactly this. Let's get in touch so we can share knowledge!


Greetings

avin...@kakaobrain.com

Jul 11, 2017, 9:36:57 AM7/11/17
to Magenta Discuss
Recently I've been working on the music source separation task using deep learning.
(Reference git repo: https://github.com/andabi/music-source-separation)

I found that separating the singing voice from the music is quite possible using only a 3-layer deep neural net. (Deep neural nets are cool!)
Further, I'm planning to apply more advanced neural nets and to use more music data, like K-pop ;)
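For anyone curious, the usual recipe behind such a separator is time-frequency masking: the net looks at each mixture spectrogram frame and predicts a soft mask for the vocals. A rough sketch of that idea (not the exact code from the repo above; the file name is hypothetical):

import numpy as np
import librosa
import tensorflow as tf

y, sr = librosa.load("mixture.wav")
stft = librosa.stft(y, n_fft=1024)
mag, phase = np.abs(stft), np.angle(stft)

# A 3-layer net maps each mixture frame to a soft vocal mask in [0, 1].
n_bins = mag.shape[0]
net = tf.keras.Sequential([
    tf.keras.layers.Dense(512, activation="relu", input_shape=(n_bins,)),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(n_bins, activation="sigmoid"),
])

# ... train net on (mixture frame, ideal ratio mask) pairs, then:
mask = net.predict(mag.T).T   # back to (n_bins, n_frames)
vocals = librosa.istft(mask * mag * np.exp(1j * phase))
print(f"recovered {len(vocals) / sr:.1f}s of estimated vocals")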

VivekP

Oct 9, 2019, 7:03:28 AM10/9/19
to Magenta Discuss, mor...@googlemail.com
Hey John Theo, I hope you have done some research on this; I could use a little help. I don't know if you're still active, but I am working on the same kind of project, "musical instrument identification" from a mixed audio signal.

Phil Reyneri

Oct 9, 2019, 3:09:34 PM10/9/19
to VivekP, Magenta Discuss, mor...@googlemail.com



--
Phil Reyneri

CJ Carr

Oct 9, 2019, 3:53:51 PM10/9/19
to Phil Reyneri, VivekP, Magenta Discuss, mor...@googlemail.com
Try Wave-U-Net 
The pretrained net even separates death metal growls; it works on generative audio too.
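For a feel of the architecture, here is a toy Keras sketch of the Wave-U-Net idea: 1-D convolutions downsample the raw waveform, then upsampling layers with skip connections rebuild one output channel per source. The real model is much deeper, uses LeakyReLU, and is more careful about resampling; this is just the shape of the idea.

import tensorflow as tf
from tensorflow.keras import layers

inp = layers.Input(shape=(16384, 1))        # mono waveform chunk

# Downsampling path
d1 = layers.Conv1D(24, 15, padding="same", activation="relu")(inp)
p1 = layers.MaxPooling1D(2)(d1)
d2 = layers.Conv1D(48, 15, padding="same", activation="relu")(p1)
p2 = layers.MaxPooling1D(2)(d2)

b = layers.Conv1D(96, 15, padding="same", activation="relu")(p2)

# Upsampling path with skip connections
u2 = layers.UpSampling1D(2)(b)
u2 = layers.Concatenate()([u2, d2])
u2 = layers.Conv1D(48, 5, padding="same", activation="relu")(u2)
u1 = layers.UpSampling1D(2)(u2)
u1 = layers.Concatenate()([u1, d1])
u1 = layers.Conv1D(24, 5, padding="same", activation="relu")(u1)

out = layers.Conv1D(2, 1, activation="tanh")(u1)   # e.g. vocals + accompaniment
model = tf.keras.Model(inp, out)
model.summary()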