Hello all,
My name is Somesh Ganesh. I am currently a master's student in the music technology program at Georgia Tech. I work in the music informatics laboratory where our work mainly involves machine learning, music information retrieval and audio DSP.
I am currently working on a musical instrument family classification project. The idea is to use the NSynth dataset for this classification and to see how including electronic and synthetic data affects performance compared to using acoustic data alone.
We are using sparse coding along with an SVM classifier for our experiment. The dictionary for sparse coding is trained on 1000 randomly selected audio files, distributed evenly across the families. Our baseline was a simple SVM on MFCCs with their first- and second-order differences.
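To make the setup concrete, here is a rough sketch of that pipeline using librosa and scikit-learn. The feature choice (magnitude spectrogram frames), dictionary size, sparsity settings, max-pooling over time and the file/label lists are placeholders rather than our exact configuration; the baseline simply replaces the encoding step with MFCCs plus their deltas.

import numpy as np
import librosa
from sklearn.decomposition import MiniBatchDictionaryLearning, SparseCoder
from sklearn.svm import SVC

def spectrogram_frames(path, sr=16000, n_fft=1024, hop=512):
    # One row per STFT frame of the magnitude spectrogram.
    y, _ = librosa.load(path, sr=sr)
    return np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)).T

# dictionary_files, train_files, test_files, y_train, y_test are
# placeholders for lists built from the NSynth metadata.
train_frames = np.vstack([spectrogram_frames(p) for p in dictionary_files])

# Learn the dictionary on frames from the 1000 evenly distributed files.
dico = MiniBatchDictionaryLearning(n_components=256, alpha=1.0, batch_size=64)
dico.fit(train_frames)

# Encode each file sparsely and pool the codes over time.
coder = SparseCoder(dictionary=dico.components_,
                    transform_algorithm='omp',
                    transform_n_nonzero_coefs=8)

def encode(path):
    return coder.transform(spectrogram_frames(path)).max(axis=0)

X_train = np.array([encode(p) for p in train_files])
X_test = np.array([encode(p) for p in test_files])

# SVM on the pooled sparse codes.
clf = SVC(kernel='rbf', C=1.0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))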
We have divided the instruments from NSynth into our defined families as follows (a short mapping sketch follows the list):
1) Strings - contains bass, guitar and strings
2) Woodwinds - contains flute, reed and organ
3) Brass - contains brass
4) Non-sustained - contains keyboard and mallet
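For reference, the grouping above corresponds roughly to the mapping below over the NSynth metadata. The field name instrument_family_str and the family strings are taken from the NSynth examples.json (please correct me if your copy differs); the NSynth families not covered by our grouping (e.g. synth_lead, vocal) are simply dropped in this sketch.

import json

# Grouping of NSynth instrument families into our four defined families.
FAMILY_MAP = {
    'bass': 'strings', 'guitar': 'strings', 'string': 'strings',
    'flute': 'woodwinds', 'reed': 'woodwinds', 'organ': 'woodwinds',
    'brass': 'brass',
    'keyboard': 'non-sustained', 'mallet': 'non-sustained',
}

with open('nsynth-train/examples.json') as f:
    meta = json.load(f)

labels = {}
for note_id, info in meta.items():
    family = FAMILY_MAP.get(info['instrument_family_str'])
    if family is not None:  # families outside our grouping are left out
        labels[note_id] = family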
We are currently getting lower accuracy than expected and suspect the following may be contributing:
Our data distribution among the families may not be optimal. We wanted to ask the community, since many of you will know the dataset much better than we do.
While listening to a number of the files, we also noticed a lot of variation across recordings of the same instrument. This is to be expected, but we would like to understand how "correlated" files from the same instrument or family are to each other (one possible probe is sketched below).
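One rough way we were thinking of probing this (not part of the experiment itself): compare per-file mean MFCC vectors with cosine similarity within each instrument, and contrast that with the similarity across instruments or families. The feature choice and the files_by_instrument dict below are placeholders.

import numpy as np
import librosa
from sklearn.metrics.pairwise import cosine_similarity

def mean_mfcc(path, sr=16000, n_mfcc=20):
    y, _ = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

# files_by_instrument: placeholder dict mapping an instrument name to a
# list of audio paths, built from the NSynth metadata.
feats = {inst: np.array([mean_mfcc(p) for p in paths])
         for inst, paths in files_by_instrument.items()}

for inst, X in feats.items():
    sim = cosine_similarity(X)
    iu = np.triu_indices_from(sim, k=1)  # upper triangle, excluding self-pairs
    print(f"{inst}: mean within-instrument similarity = {sim[iu].mean():.3f}")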
Any other suggestions, comments or questions are welcome!
You can reach me at some...@gatech.edu, or just post here so that we can all have a discussion.
Thank you for your time and consideration!
Somesh