We are happy to announce the Million Song Dataset Challenge: a large-scale, open evaluation of personalized music recommendation algorithms.
We provide listening history data for 1.1 million users (1 million for training, 110 thousand for validation and testing) and over 380 thousand songs from the Million Song Dataset. Given a partial listening history for each test user, the goal is to produce a ranking over the remaining songs for that user.
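To make the task concrete, here is a minimal sketch of a popularity baseline: it ranks the songs a test user has not yet listened to by how many training users listened to them. This is only an illustration, not the challenge's official baseline or evaluation code. The function name, the dictionary layout, and the toy user and song IDs are all assumptions made for the example and do not reflect the actual data format.

```python
from collections import Counter


def popularity_ranking(train_histories, visible_songs, top_n=3):
    """Rank songs outside the user's visible history by training-set popularity.

    train_histories: dict mapping a user ID to a list of song IDs (hypothetical layout).
    visible_songs: the partial history revealed for one test user.
    """
    # Count how many distinct training users listened to each song.
    counts = Counter(
        song for songs in train_histories.values() for song in set(songs)
    )
    seen = set(visible_songs)
    # Most popular first; ties are broken by song ID so the output is deterministic.
    ranked = sorted(
        (song for song in counts if song not in seen),
        key=lambda song: (-counts[song], song),
    )
    return ranked[:top_n]


# Toy data, made up for illustration only.
train_histories = {
    "u1": ["s1", "s2", "s3"],
    "u2": ["s2", "s3"],
    "u3": ["s3", "s4"],
}
recommendations = popularity_ranking(train_histories, visible_songs=["s3"])
print(recommendations)
```

A baseline like this ignores the individual user entirely apart from filtering out songs already heard, so a personalized method would be expected to improve on it.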
What makes this challenge unique? Openness! The songs in the dataset come with metadata (e.g., artist and title), as well as audio content analysis, semantic annotations, lyrics, etc. Because the data is open, participants are free to construct and include any additional features, or to ignore the provided features altogether. The field is wide open!
A post-analysis of the leading submissions will be performed by the Music Information Retrieval Evaluation eXchange (MIREX), and the results will be presented at the 13th International Society for Music Information Retrieval (ISMIR) conference in October 2012.