Hey all,
I look forward to joining this session on Friday afternoon. I'd be happy to clarify what I see as the differences between the Audio Structural Segmentation (ASS - great acronym) and Discovery of Repeated Themes & Sections (DRTS, not such a good acronym) tasks, since Uri alludes to both of them in his remarks. (For info, I organise the latter task.)
It is true that, unlike the ASS task, the existence of hierarchical or nested repetition (e.g., a motif might repeat within a riff or theme, which itself might repeat within some larger repeated section) is built into DRTS and its
evaluation metrics. Generally, I use the term
pattern to refer to a motif, theme, or other repetitive element, and the term
occurrence to refer to the various instances of a pattern throughout a song/piece. In DRTS, as well as pattern nesting, it's also possible for occurrences (either of two different patterns, or of the same pattern) to overlap, such as F1, F2, ..., F8
here. As well as identifying nested repetitions, algorithms submitted to DRTS are expected to identify occurrences of such overlapping patterns, and the evaluation metrics will reward success/punish failure in this regard. Overlaps aren't as common as hierarchies, but I think both are important aspects of repetitive structure, so I'd like to see examples of them included in any training and test databases. Please correct me if I'm wrong, but I don't think SALAMI contains such overlaps.
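To make the nesting/overlap distinction concrete, here is a toy sketch of my own (not the DRTS evaluation code, and the times are made up): occurrences are represented as (onset, offset) intervals, and we test whether one lies inside another (hierarchy) or whether they merely share time (overlap).

```python
# Toy illustration only: pattern occurrences as (onset, offset) intervals,
# distinguishing nested repetition from overlapping occurrences.

def is_nested(inner, outer):
    """True if `inner` lies entirely within `outer` (hierarchical repetition)."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def overlaps(a, b):
    """True if a and b share time but neither contains the other."""
    return a[0] < b[1] and b[0] < a[1] and not (is_nested(a, b) or is_nested(b, a))

# Hypothetical occurrences (times in beats): a motif repeating within a
# theme, and two occurrences of the theme that overlap one another.
motif   = (4.0, 8.0)
theme_1 = (0.0, 16.0)
theme_2 = (12.0, 28.0)

print(is_nested(motif, theme_1))   # True: the motif sits inside the theme
print(overlaps(theme_1, theme_2))  # True: the two occurrences share beats 12-16
```

An annotation format (and metric) for DRTS has to accommodate both cases, which is why a flat, non-overlapping segmentation of the ASS kind can't represent them.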
ASS has an appealing simplicity, so I say keep it for next year. But maybe we should also try to assemble some richer annotations and metrics for evaluating algorithms that are designed to extract repetitive elements from performed audio. I prepared a small amount of data for such a task in
my Audio Engineering Society paper from January this year. I would also like to maintain the audio/symbolic dual aspect in any future task (by which I mean it should be possible for researchers to submit algorithms that work on one or more of: deadpan symbolic data, audio synthesised from deadpan symbolic data, and audio from human performances), because in general we should be linking the audio and symbolic domains wherever possible.
Recently there was a post to the MIR list by Carlos Vaquero looking for a similar kind of data, and it got no responses. A research gap to be filled by new PhD student(s)? I think so! So we should post the minutes of Friday's discussion to the list, so that those who did not make it to Taipei might still play a role. Generally, I would be in favour of having at least two people annotate any given song/piece in a database, as inter-judge reliability as a function of pattern length is an interesting topic in its own right.
Cheers!