Regarding the Million Song Dataset on the UCI Machine Learning Repository

Prakhar

Oct 9, 2015, 6:20:32 AM
to millionso...@googlegroups.com
Respected members, 

I recently completed a MOOC on scalable machine learning using Apache Spark, hosted on edX. I was overwhelmed by one of the labs, which used the Million Song Dataset subset available on the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/YearPredictionMSD).

I was working on a project where I wanted to use the whole 201 MB of data available there.

I am unable to understand which 12 of the 90 features listed in the dataset are the timbre features of the song.
Also, what does a negative feature value signify?
Apart from this, in the lab that was part of the MOOC coursework, the values were between 0 and 1; here, however, the values are well above 100. What does this mean? Do we need to normalize the data before using it?
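
On the normalization question, a minimal sketch of min-max scaling may make the idea concrete. It assumes the layout described on the UCI page: the first value in each row is the year, followed by 90 features (12 timbre averages, then 78 timbre covariances). The function names split_record and min_max_scale and the toy values are hypothetical, made up for illustration; whether the lab used exactly this rescaling is also an assumption.

```python
def split_record(line):
    """Split one comma-separated record into (year, timbre_avg, timbre_cov)."""
    values = [float(v) for v in line.strip().split(",")]
    year, features = int(values[0]), values[1:]
    # Assumed layout: 12 timbre averages followed by 78 timbre covariances.
    return year, features[:12], features[12:]


def min_max_scale(rows):
    """Rescale each column of rows to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Made-up toy rows with 3 features each, not real dataset values.
toy = [[-50.0, 10.0, 300.0], [0.0, 10.0, 100.0], [50.0, 10.0, 200.0]]
scaled = min_max_scale(toy)
```

Under this scheme a negative raw value simply lands near the low end of the [0, 1] range after scaling. In practice the minimum and maximum would be computed on the training rows only and then reused for the test rows.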

These questions have been troubling me for a long time.
I would appreciate any help with them.

Thanks in advance. 

Regards,
Prakhar Mishra
+91-7610833303
Final year student
B.Tech: Computer Science and Engineering.
