MusicTransformer reproduction


Pierre Cournut

Nov 10, 2020, 1:02:00 PM
to magenta...@tensorflow.org
Hi, 

I’m trying to reproduce your model from both your papers and the implementation, and I’m having a hard time reproducing the results in your Colab, which are truly awesome!

After looking into the code and papers extensively, there are still a few points I’m unsure of:
  • Data preprocessing: do you use the full MIDI range (1-128) as stated in your paper, or the restricted range (21-108) that is hard-coded in score2perf.py?
  • Which learning rate do you refer to in the original Music Transformer paper and in the update_small_lr hparams setup function? I’m confused because the transformer_base_v3 hparams function already defines a learning-rate constant and schedule, which, as I understand it, fully determines the learning rate throughout training.
  • Could you briefly explain the roles of these three variables: hidden_size, attention_key_channels, and filter_size? I’m confused by this sentence in the Music Transformer paper, "We found that reducing the query and key hidden size (att) to half the hidden size (hs) works well and use this relationship for all of the models", versus the values I found in the hparams (hidden_size = 384, attention_key_channels = 512, filter_size = 1024). See the sketch below this list for how I’m currently reading them.
  • I have not implemented local attention yet. Could it help me get closer to reproducing your results, or does it mainly reduce memory usage and therefore mostly help me train faster?
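
For context, here is roughly how I am wiring these hparams together at the moment. This is only a minimal sketch on my side, assuming Tensor2Tensor-style hparams and taking the paper's "query/key size = half the hidden size" rule literally; it is not meant to be your actual configuration:

    # Hypothetical hparams sketch (my current reading, not the official config).
    # Assumes tensor2tensor is installed; applies the half-hidden-size rule
    # quoted from the paper.
    from tensor2tensor.models import transformer

    def my_hparams_guess():
        hparams = transformer.transformer_base()  # generic Transformer baseline
        hparams.hidden_size = 384                 # width of embeddings / residual stream
        hparams.filter_size = 1024                # inner width of the feed-forward block
        # Total query/key projection depth, split across heads; a value of 0
        # would fall back to hidden_size. Here I apply the paper's "half the
        # hidden size" rule (192) instead of the 512 I found in the hparams.
        hparams.attention_key_channels = hparams.hidden_size // 2
        return hparams

If attention_key_channels really should be 512 with hidden_size = 384, then I am clearly misreading the quoted sentence.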

Thanks in advance!

Regards,
Pierre


Ian Simon

Nov 10, 2020, 4:39:34 PM
to Pierre Cournut, Anna Huang, Magenta Discuss
Hi Pierre, the model in the Colab is trained on piano transcriptions from YouTube rather than MAESTRO, and it doesn't use relative attention; it just uses the transformer_tpu hparams, but with 16 hidden layers.
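
Roughly speaking (this is a sketch from memory, not the exact training setup), that amounts to something like:

    # Hedged sketch of the Colab model's configuration: the stock
    # transformer_tpu hparams with the layer count raised to 16.
    from tensor2tensor.models import transformer

    def colab_model_hparams():
        hparams = transformer.transformer_tpu()  # standard TPU Transformer hparams
        hparams.num_hidden_layers = 16           # deeper stack than the default
        return hparams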

I'm actually not sure what configuration was used for the Music Transformer paper, but +Anna Huang may be able to help you.

-Ian


Pierre Cournut

Nov 12, 2020, 3:46:21 AM
to Anna Huang, Ian Simon, Magenta Discuss
Hi Ian, 

Thank you for your quick answer! 
I’ll give the transformer_tpu params a closer look then. 

Best,
Pierre

Drew Edwards

Feb 9, 2023, 7:29:21 AM
to Magenta Discuss, pierre....@mwm.io, anna...@google.com, ians...@google.com
Hi, I'm also curious about Pierre's initial questions. Additionally, I'm wondering why the blog/Colab model uses a different architecture from the one described in the paper.

Thanks in advance for any learnings you are able to share!
