Fine-tuning Onsets and Frames model


Karthik Velayutham

Apr 8, 2021, 6:00:37 PM
to Magenta Discuss
Hi all,

I'm an undergraduate working on fine-tuning the Onsets and Frames model (specifically the linked PyTorch implementation) on jazz music. I'm new to machine learning in general, so I'm a bit unsure how to go about this. Here is what I've done so far:
  • Acquired a dataset of jazz music (at least one order of magnitude smaller than the MAESTRO dataset) and converted it to MIDI
  • Trained on different mixes of the jazz data and the MAESTRO dataset, and have some F1 scores
A few things I was thinking about: one is continuing training incrementally from a checkpoint rather than training everything from scratch. Another is mixing the datasets within each training batch. Finally, I was thinking about doing a hyperparameter search. Any ideas would be much appreciated! Thank you so much for your time!
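
For anyone following along, the within-batch mixing idea could be sketched roughly as below. This is only an illustration under assumptions: `mixed_batches`, `jazz_fraction`, and the placeholder item lists are hypothetical names, not part of Magenta or of the PyTorch implementation, and a real pipeline would sample audio/MIDI segments rather than tuples.

```python
import random


def mixed_batches(maestro_items, jazz_items, batch_size=8,
                  jazz_fraction=0.25, num_batches=100, seed=0):
    """Yield batches that each contain a fixed share of jazz examples.

    Hypothetical helper: the item lists stand in for whatever the real
    data loader returns (e.g. audio/label segment pairs).
    """
    rng = random.Random(seed)
    # At least one jazz example per batch, the rest from MAESTRO.
    n_jazz = max(1, round(batch_size * jazz_fraction))
    n_maestro = batch_size - n_jazz
    for _ in range(num_batches):
        # Sample with replacement so the much smaller jazz set
        # can appear in every batch without running out.
        batch = (rng.choices(jazz_items, k=n_jazz)
                 + rng.choices(maestro_items, k=n_maestro))
        rng.shuffle(batch)
        yield batch


# Toy usage with placeholder items instead of real audio segments.
maestro = [("maestro", i) for i in range(100)]
jazz = [("jazz", i) for i in range(10)]
batches = list(mixed_batches(maestro, jazz, num_batches=5))
```

Fixing the share per batch, instead of concatenating and shuffling the two datasets, keeps the smaller set from being drowned out; the right value of `jazz_fraction` is something to tune rather than a known constant.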

Best,
Karthik

Curtis "Fjord" Hawthorne

Apr 11, 2021, 4:39:04 PM
to Karthik Velayutham, Magenta Discuss
Hi Karthik,

Sounds like a great project! Is the jazz music dataset also piano, or is it another instrument? If it's piano, the existing model might already do quite well, though it would be interesting to see what its shortcomings are and how it can be improved for a specific genre.

I think all of your ideas are worth trying. I'd be interested to hear whether training from scratch with a mix of datasets or finetuning from a checkpoint ends up working better. Because of the smaller size of your dataset, I think the big difficulty will be in how to get the model to train well for the specifics of that data without overfitting. I could see hparams related to dropout or model capacity helping there, as well as data augmentation (e.g., adding some noise or maybe even pitch shifting).
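
As a rough illustration of the noise augmentation mentioned above, here is a minimal NumPy sketch that adds Gaussian noise at a chosen signal-to-noise ratio. The function name `add_noise` and the SNR parameterization are assumptions made for this example, not something taken from the Magenta code. Pitch shifting is not shown because it would also require transposing the MIDI labels to match.

```python
import numpy as np


def add_noise(audio, snr_db, rng=None):
    """Return `audio` plus white Gaussian noise at roughly `snr_db` dB SNR.

    Hypothetical helper for illustration; `audio` is a 1-D float array.
    """
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(audio ** 2)
    # SNR (dB) = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=audio.shape)
    return audio + noise


# Toy usage: one second of a 440 Hz sine at 16 kHz, noise added at 20 dB SNR.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean = 0.5 * np.sin(2 * np.pi * 440.0 * t)
noisy = add_noise(clean, snr_db=20.0, rng=np.random.default_rng(0))
```

Because the labels are unchanged by additive noise, this kind of augmentation can be applied on the fly to the audio alone; drawing `snr_db` at random per example is one option.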

Best of luck with everything, and I'd love to hear how the project progresses!

-Fjord


Karthik Velayutham

Apr 11, 2021, 4:46:50 PM
to Curtis Fjord Hawthorne, Magenta Discuss
Hi Curtis,

Thanks for getting back to me!

The jazz music dataset is piano. The existing model seems to do decently with the dataset mix, so we're trying to see whether fine-tuning with everything frozen except the linear layer will perform better. I will keep you posted once we collect all our results. We will look into your suggestions! Thanks so much.
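
For reference, freezing everything but one layer in PyTorch usually comes down to toggling `requires_grad`. The sketch below uses a toy stand-in model: `ToyTranscriber`, its layer names and sizes, and `freeze_all_but` are hypothetical and do not mirror the actual Onsets and Frames module structure, so the prefix passed in would need to match the real parameter names.

```python
import torch
from torch import nn


class ToyTranscriber(nn.Module):
    """Toy stand-in for a transcription model (not the real architecture)."""

    def __init__(self, n_features=229, hidden=64, n_keys=88):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.linear = nn.Linear(hidden, n_keys)

    def forward(self, x):
        return self.linear(self.encoder(x))


def freeze_all_but(model, trainable_prefixes):
    """Disable gradients except for parameters whose name has a given prefix."""
    for name, param in model.named_parameters():
        param.requires_grad = any(name.startswith(p) for p in trainable_prefixes)
    return [p for p in model.parameters() if p.requires_grad]


model = ToyTranscriber()
# In practice the pretrained weights would be loaded here before freezing.
trainable = freeze_all_but(model, ["linear."])
# Only the unfrozen parameters are handed to the optimizer.
optimizer = torch.optim.Adam(trainable, lr=1e-4)

# One dummy training step on random data to show the mechanics.
frozen_before = model.encoder[0].weight.detach().clone()
x = torch.randn(4, 229)
target = torch.rand(4, 88)
loss = nn.functional.binary_cross_entropy_with_logits(model(x), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

One caveat: if the frozen part contains batch normalization, its running statistics still update in training mode unless those modules are put in eval mode.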

Best,
Karthik

Steven Smith

Apr 11, 2021, 8:50:21 PM
to Curtis Fjord Hawthorne, Karthik Velayutham, Magenta Discuss
> the existing model might already do quite well, though it would be interesting to see what its shortcomings are and how it can be improved for a specific genre.

I’ve used MAESTRO to transcribe Thelonious Monk and the results are impressively beautiful. Roger Ebert wrote that Monk “played the piano as if he knew exactly what every note should mean and be, and had known it for a long time.” It’s pretty amazing to listen to Magenta present Monk in front of you, knowing exactly what every note should be.

The comment I have about these transcriptions is that current capabilities for jazz quartets have strengths that are also shortcomings. Monk’s transcription is perfect as far as I can tell, and Charlie Rouse is rendered as a piano—not tenor-sax—accompaniment. The result I think is musical and satisfying, especially for a piano-only performance. But Monk is Monk, and Charlie Rouse is no longer Charlie Rouse.

I think fine-tuning ideas like this are interesting because they also could point a way to adding classifier layers or even a GAN that can pull out a specific instrument, or generate Monk-like performances.

Steve

