Regarding classification project using NSynth


Somesh Ganesh

Dec 1, 2017, 6:52:34 PM
to magenta...@tensorflow.org
Hello all,

My name is Somesh Ganesh. I am currently a master's student in the music technology program at Georgia Tech. I work in the music informatics laboratory where our work mainly involves machine learning, music information retrieval and audio DSP.

I am currently working on a musical instrument family classification project. The idea is to use the NSynth dataset for instrument family classification, to see how the use of electronic and synthetic data impacts classification as opposed to using only acoustic data.

We are using sparse coding along with an SVM classifier for our experiment. The dictionary for sparse coding is trained on 1000 randomly selected audio files, distributed evenly over the families. Our baseline model is a simple SVM using MFCCs and their first- and second-order differences.
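As a sketch, the delta stacking in the baseline could look like the following (a minimal numpy version with simple edge padding; the function name and padding choice are illustrative, and librosa's delta computation differs in smoothing):

```python
import numpy as np

def stack_mfcc_deltas(mfcc: np.ndarray) -> np.ndarray:
    """Stack MFCCs with their first- and second-order frame differences.

    `mfcc` has shape (n_coeffs, n_frames); the first frame is repeated
    before differencing so the deltas keep the same number of frames.
    """
    d1 = np.diff(mfcc, n=1, axis=1, prepend=mfcc[:, :1])  # first-order difference
    d2 = np.diff(d1, n=1, axis=1, prepend=d1[:, :1])      # second-order difference
    return np.vstack([mfcc, d1, d2])                      # (3 * n_coeffs, n_frames)
```

Averaging the stacked matrix over frames then gives one fixed-length feature vector per audio file, which is the usual input shape for an SVM.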

We have divided the instruments from NSynth into our defined families as follows:
1) Strings - contains bass, guitar and strings 
2) Woodwinds - contains flute, reed and organ
3) Brass - contains brass
4) Non-sustained - contains keyboard and mallet
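In code, this grouping might look like the following (a hypothetical mapping; the exact NSynth instrument-family label strings may differ from the family names used above):

```python
# Hypothetical mapping from NSynth instrument-family labels to the four
# classes defined in this experiment. "string"/"strings" are both included
# since the dataset's label spelling may differ from the text above.
FAMILY_MAP = {
    "bass": "strings", "guitar": "strings",
    "string": "strings", "strings": "strings",
    "flute": "woodwinds", "reed": "woodwinds", "organ": "woodwinds",
    "brass": "brass",
    "keyboard": "non-sustained", "mallet": "non-sustained",
}

def to_class(nsynth_family: str) -> str:
    """Map a NSynth instrument-family label to one of the four classes."""
    return FAMILY_MAP[nsynth_family]
```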

We are currently getting lower accuracy than expected, and we suspect the following may be a problem: our data distribution amongst the families may not be optimal. We wanted to ask the community, since there are surely many people who know the dataset much better than we do.

We also noticed a lot of variation between audio files for the same instrument while listening to a number of them. This was expected, but we want to understand how "correlated" files from the same instrument or family are to each other.

Any other suggestions, comments or questions are welcome!
You can reach me at some...@gatech.edu, or just post here so that we can all have a discussion.

Thank you for your time and consideration!
Somesh

Jesse Engel

Dec 1, 2017, 10:32:34 PM
to Somesh Ganesh, Magenta Discuss
Hi Somesh,

Sounds like a fun experiment. At a glance, your data division seems reasonable; however, "lower accuracy than expected" can come from many sources: bias, variance, label error, and generalization.

If you're just looking at training error, you can ignore generalization/variance for now. A good starting point: for a small subset of the data for which you know the labels are correct (i.e. the task could reasonably be solved by a person), how does your model perform? If you still have errors, then your model is likely too low in modeling capacity and you may need to increase the hypothesis space (more complex kernels, NNs, etc.).
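As a toy illustration of that capacity check (synthetic data, not the NSynth setup): on an XOR-style "verified subset", a low-capacity nearest-centroid model cannot reach zero training error, while a 1-NN model, which can memorise the training set, does:

```python
import numpy as np

# Toy "verified subset": XOR-like labels that no centroid/linear rule can fit.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0, 1, 1, 0])

def nearest_centroid_train_acc(X, y):
    """Low-capacity model: one centroid per class, predict the nearest one."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((classes[dists.argmin(axis=1)] == y).mean())

def one_nn_train_acc(X, y):
    """Higher-capacity model: 1-NN memorises the training set, so training
    error vanishes (each point is its own nearest neighbour)."""
    dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return float((y[dists.argmin(axis=1)] == y).mean())
```

If training accuracy stays low even on clean labels (as with the centroid model here), that points to a bias problem rather than a data problem.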

If you solve the bias problem but still have errors on the training set, it may be due to the data/labels themselves. If you find a better categorization of the data, please feel free to share it, as I think others could benefit from it.

All the best,
Jesse

--
Magenta project: magenta.tensorflow.org
To post to this group, send email to magenta...@tensorflow.org
To unsubscribe from this group, send email to magenta-discuss+unsubscribe@tensorflow.org
---
You received this message because you are subscribed to the Google Groups "Magenta Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to magenta-discuss+unsubscribe@tensorflow.org.

Somesh Ganesh

Dec 2, 2017, 1:04:23 AM
to Jesse Engel, Magenta Discuss
Hello Jesse,

Thank you for your input. I'll share my progress with everyone in this thread once I'm done!

Somesh

Abdullah

Dec 2, 2017, 4:04:29 PM
to Magenta Discuss
I might also ask you to reconsider the use of MFCCs. I don't believe they are the best acoustic features for music; I have seen people prefer mel filterbanks (MFCCs without the de-correlating DCT step), especially when working with deep learning methods. Maybe you can also look into other acoustic features that may be more appropriate (chroma, etc.).
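To make the MFCC/filterbank relationship concrete, here is a minimal numpy sketch of a triangular mel filterbank (function names and default parameters are illustrative; librosa ships tested implementations). The log-mel energies are the filterbank features; MFCCs would additionally apply a DCT on top:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=40, n_fft=512, sr=16000):
    """Triangular mel filterbank, shape (n_filters, n_fft // 2 + 1)."""
    # Filter edges equally spaced on the mel scale, mapped back to FFT bins.
    pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising slope
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling slope
    return fb

def log_mel(power_spectrum, fb):
    """Log-mel energies: the de-correlating DCT step of MFCCs is omitted."""
    return np.log(fb @ power_spectrum + 1e-10)
```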

Kıvanç Tatar

Dec 3, 2017, 4:22:53 AM
to Abdullah, Magenta Discuss
Following Abdullah's comment: we were also unhappy with MFCCs for calculating sound dissimilarity in our study on synthesizer preset generation with the OP-1, so we went with a multi-objective approach using three other features instead.

Have a look at our fitness function for sound similarity. Here is the paper:  http://www.tandfonline.com/doi/citedby/10.1080/09298215.2016.1175481

--
Kıvanç Tatar
----------------------------------
PhD Student
Interactive Arts and Technology
Simon Fraser University, Vancouver, Canada
Email: kivan...@gmail.com
Website: https://kivanctatar.wordpress.com/

Somesh Ganesh

Dec 5, 2017, 11:50:43 AM
to Kıvanç Tatar, Abdullah, Magenta Discuss
Thank you, Abdullah and Kıvanç!

I will look into this and update you all soon!

Somesh


Somesh Ganesh

Jan 20, 2018, 10:45:52 PM
to Magenta Discuss
Hello all,

We took your feedback into consideration and ran some additional experiments.
Our final project is titled "Musical instrument family classification using synthetic data". Attached is the report we wrote at the end of the class project, formatted as an ISMIR-style paper. I would like to share it with everyone since it might be an interesting read.

Please let me know if you have any questions.

Thank you,
Somesh
MIR_project_final_paper.pdf

Jesse Engel

Jan 21, 2018, 1:02:51 AM
to Somesh Ganesh, Magenta Discuss
Cool, thanks for sharing! FYI, if you're going to post this publicly somewhere, the dataset page has citation instructions at the bottom (https://magenta.tensorflow.org/datasets/nsynth).

Best,
Jesse