Hi All,
I am working on a recommender system and was planning to extract an MSD dataset of the following form:
USER LABEL (listened or not listened) MUSIC_FEATURES
I can extract the USER-TRACK information from the Taste Profile dataset, which would give me the labels.
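To make the label part concrete, here is a minimal sketch of what I have in mind. It assumes the Taste Profile data comes as tab-separated (user, song, play count) triplets; the function names and the IDs in the sample are made up for illustration.

```python
import io
from collections import defaultdict


def read_triplets(lines):
    """Parse 'user<TAB>song<TAB>play_count' lines into {user: {song: count}}."""
    plays = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user, song, count = line.split("\t")
        plays[user][song] = int(count)
    return plays


def label_pairs(plays):
    """Yield (user, song, label), where label is 1 if the user played the song, else 0."""
    all_songs = sorted({s for songs in plays.values() for s in songs})
    for user in sorted(plays):
        for song in all_songs:
            yield user, song, int(song in plays[user])


# Made-up IDs, for illustration only (not real Taste Profile entries).
sample = io.StringIO("u1\tsA\t3\nu1\tsB\t1\nu2\tsB\t7\n")
rows = list(label_pairs(read_triplets(sample)))
```

Enumerating every user/song pair like this would not scale to the full data, so in practice the "not listened" examples would probably have to be sampled.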
My question is about the features:
-- I know I can get lyrics from the musiXmatch dataset.
-- The audio features like beat, loudness, etc. are available in the MSD database. But there are too many of them, and since I don't have much background in music research, I was wondering which ones to use. For example, I would be happy with the feature set in the YearPrediction data, but the track ID is missing there, so I cannot match these features with the other datasets.
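As far as I understand, the YearPrediction data has 90 features per track: 12 timbre averages and 78 timbre covariances taken over the track's segments. Below is a rough sketch of how I imagine recomputing such a subset myself, keyed by track ID. It assumes each track's per-segment timbre vectors (the segments_timbre field, 12 values per segment) are already loaded as a list of lists. The ordering of the covariance terms, the use of the population covariance, and the track ID "TR_EXAMPLE" are my own assumptions and may not match the YearPrediction file.

```python
def timbre_summary(segments):
    """Return the per-dimension means followed by the upper-triangle covariances.

    For 12-dimensional timbre vectors this gives 12 + 78 = 90 values.
    """
    n = len(segments)
    dim = len(segments[0])
    means = [sum(seg[i] for seg in segments) / n for i in range(dim)]
    cov = []
    for i in range(dim):
        for j in range(i, dim):  # upper triangle, diagonal included
            total = sum((seg[i] - means[i]) * (seg[j] - means[j]) for seg in segments)
            cov.append(total / n)  # population covariance (an assumption)
    return means + cov


# Toy input: 3 segments of 12 values each, not real MSD data.
toy_segments = [[float(k + d) for d in range(12)] for k in range(3)]

# Hypothetical track ID used as the join key against the other datasets.
features_by_track = {"TR_EXAMPLE": timbre_summary(toy_segments)}
```

With the track ID kept as the key, these features could then be joined to the user labels and lyrics.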
Has anyone faced a similar situation before?
Does the MSD team already have a dataset in the above format? If so, it would be great if it could be made available on the MSD site and/or the UCI repository. It could be very useful for machine learning applications such as multitask learning, which need a large-scale dataset in this format, and MSD seems to fit the bill perfectly. If the data is not already available, kindly let me know how to match the YearPrediction data to track IDs, or how to extract a subset of audio features from the database.
Thanks,
Avishek