Hey all,
I look forward to joining this session on Friday afternoon. I'd be happy to clarify what I see as the differences between the Audio Structural Segmentation (ASS - great acronym) and Discovery of Repeated Themes & Sections (DRTS, not such a good acronym) tasks, since Uri alludes to both of them in his remarks. (For info, I organise the latter task.)
It is true that, unlike the ASS task, the existence of hierarchical or nested repetition (e.g., a motif might repeat within a riff or theme, which itself might repeat within some larger repeated section) is built into DRTS and its
evaluation metrics. Generally, I use the term
pattern to refer to a motif, theme, or other repetitive element, and the term
occurrence to refer to the various instances of a pattern throughout a song/piece. In DRTS, as well as pattern nesting, it's also possible for occurrences (either of two different patterns, or of the same pattern) to overlap, such as F1, F2, ..., F8
here. As well as identifying nested repetitions, algorithms submitted to DRTS are expected to identify occurrences of such overlapping patterns, and the evaluation metrics will reward success/punish failure in this regard. Overlaps aren't as common as hierarchies, but I think both are important aspects of repetitive structure, so I'd like to see examples of them included in any training and test databases. Please correct me if I'm wrong, but I don't think SALAMI contains such overlaps.
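To make the nesting/overlap distinction concrete, here is a toy sketch of my own (not the DRTS evaluation code, and the times are made up): occurrences are represented as (onset, offset) intervals, and we test whether one lies inside another (hierarchy) or whether they merely share time (overlap).

```python
# Toy illustration only: pattern occurrences as (onset, offset) intervals,
# distinguishing nested repetition from overlapping occurrences.

def is_nested(inner, outer):
    """True if `inner` lies entirely within `outer` (hierarchical repetition)."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def overlaps(a, b):
    """True if a and b share time but neither contains the other."""
    return a[0] < b[1] and b[0] < a[1] and not (is_nested(a, b) or is_nested(b, a))

# Hypothetical occurrences (times in beats): a motif repeating within a
# theme, and two occurrences of the theme that overlap one another.
motif   = (4.0, 8.0)
theme_1 = (0.0, 16.0)
theme_2 = (12.0, 28.0)

print(is_nested(motif, theme_1))   # True: the motif sits inside the theme
print(overlaps(theme_1, theme_2))  # True: the two occurrences share beats 12-16
```

An annotation format (and metric) for DRTS has to accommodate both cases, which is why a flat, non-overlapping segmentation of the ASS kind can't represent them.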
ASS has an appealing simplicity, so I say keep it for next year. But maybe we should also try to assemble some richer annotations and metrics for evaluating algorithms that are designed to extract repetitive elements from performed audio. I prepared a small amount of data for such a task in
my Audio Engineering Society paper from January this year. I would also like to maintain the audio/symbolic dual aspect in any future task (by which I mean it should be possible for researchers to submit algorithms that work on one or more of: deadpan symbolic data, audio synthesised from deadpan symbolic data, and audio from human performances), because in general we should be linking the audio and symbolic domains wherever possible.
Recently there was a post to the MIR list by Carlos Vaquero looking for a similar kind of data, and it got no responses. A research gap to be filled by new PhD student(s)? I think so! So we should post the minutes of Friday's discussion to the list, so that those who did not make it to Taipei might still play a role. Generally, I would be in favour of having at least two people annotate any given song/piece in a database, as inter-judge reliability as a function of pattern length is an interesting topic in its own right.
Cheers!