Big Data for Music Analysis and Musicology
The size of datasets available in MIR is growing, especially in commercial use (e.g. Spotify / The Echo Nest, iTunes), but also increasingly in openly available collections of audio or feature sets (e.g. at the Internet Archive or the Million Song Dataset). Beyond recommending tracks and generating playlists, the question arises of what we can learn from large datasets about music itself and its relation to culture, society and the world in general. Recent projects in the UK and elsewhere [1], [2] are creating new large datasets that provide, and sometimes integrate, different representations. In addition, computational methods for audio analysis tasks such as automatic transcription and chord extraction have progressed to a degree that makes an integrated musical analysis from audio feasible, at least on a statistical basis. Large datasets and a statistical approach enable us to ask and answer new questions about music that differ from those of traditional work- or composer-centred musicological analysis. Ethnomusicology is already embracing this approach, but on a relatively small scale. Given that large datasets are increasingly being made available, it is worth discussing which questions are worth asking and how we can use MIR technology to answer them.
[1] Digital Music Lab - Analysing Big Music Data (DML). URL: http://dml.city.ac.uk/
[2] Single Interface for Musical Score Searching and Analysis (SIMSSA). URL: http://simssa.music.mcgill.ca/