MT3 ISMIR2022 (ismir2022_base)

79 views

Skip to first unread message

Drew Edwards

unread,

May 15, 2023, 8:59:29 PM5/15/23

to Magenta Discuss, ians...@google.com

Hi,

I suppose this question is directed at the authors of "Scaling Polyphonic Transcription with Mixtures of Monophonic Transcriptions", (Ian Simon / Josh Gardner / Curtis Hawthorne / Jesse Engel / Ethan Manilow), but maybe others have tinkered with this checkpoint.

I'm eager to perform some further evaluation of the MT3 family of transcription models. I am particularly interested in the performance of the results from the paper above. I have found a checkpoint that seems like the one from this project.

Specifically, I am talking about gs://mt3/checkpoints/ismir2022_base/

➜ ~ gsutil ls -l gs://mt3/checkpoints/
0 2021-11-05T04:29:25Z gs://mt3/checkpoints/
gs://mt3/checkpoints/ismir2021/
gs://mt3/checkpoints/ismir2022_base/
gs://mt3/checkpoints/ismir2022_small/
gs://mt3/checkpoints/mt3/

Is it correct that this checkpoint has only been trained on the polyphonic mixtures?

If so, do you have the checkpoint that was subsequently fine-tuned on the mixture of the six datasets' training splits? Or should I reproduce this experiment?

I am assuming the mt3 checkpoint in the Colab notebook is the ICLR 2022 result, not the newer Monophonic Mixture result, based on ls -l times.

Ian Simon

unread,

May 16, 2023, 12:41:10 PM5/16/23

to Drew Edwards, Magenta Discuss

Oops, I actually didn't realize those checkpoints were visible! But they were in fact finetuned on the six MT3 datasets.

The results may be slightly (but hopefully not very) different from the ones in the paper as the open-source codebase is a little different from the internal codebase we used for the experiments in the paper.