Hi,
I'm eager to perform some further evaluation of the MT3 family of transcription models. I am particularly interested in the results reported in the paper above, and I have found a checkpoint that appears to correspond to this project.
Specifically, I am talking about gs://mt3/checkpoints/ismir2022_base/
➜ ~ gsutil ls -l gs://mt3/checkpoints/
0 2021-11-05T04:29:25Z gs://mt3/checkpoints/
gs://mt3/checkpoints/ismir2021/
gs://mt3/checkpoints/ismir2022_base/
gs://mt3/checkpoints/ismir2022_small/
gs://mt3/checkpoints/mt3/
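In case it is useful context: I have been pulling the checkpoint down for local evaluation with a plain recursive copy (the destination directory is just my own choice):

➜ ~ gsutil -m cp -r gs://mt3/checkpoints/ismir2022_base/ ./checkpoints/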
Is it correct that this checkpoint was trained only on the polyphonic mixtures, without any subsequent fine-tuning?
If so, do you have the checkpoint that was subsequently fine-tuned on the mixture of the six datasets' training splits? Or would I need to reproduce that fine-tuning experiment myself?
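If reproducing it is the recommended route, my rough plan would be to start from the standard T5X fine-tuning entry point, along these lines. Please treat this as a sketch only: the gin file name and bindings for the ISMIR 2022 setup are guesses on my part, and INITIAL_CHECKPOINT_PATH may need to point at a specific numbered checkpoint inside that directory.

# Sketch only: the gin file and bindings below are my assumptions, not documented config.
python -m t5x.train \
  --gin_file=mt3/gin/ismir2022.gin \
  --gin.MODEL_DIR=\"/tmp/mt3_finetune\" \
  --gin.INITIAL_CHECKPOINT_PATH=\"gs://mt3/checkpoints/ismir2022_base/\"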
I am assuming the mt3 checkpoint used in the Colab notebook is the ICLR 2022 result, not the newer Monophonic Mixture result, based on the ls -l timestamps.
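For completeness, here is how I compared them. gsutil ls -l prints per-object modification times, so I simply listed the contents of both checkpoint directories and compared the timestamps:

➜ ~ gsutil ls -l gs://mt3/checkpoints/mt3/
➜ ~ gsutil ls -l gs://mt3/checkpoints/ismir2022_base/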