I am following the code found here:
The code can be found on GitHub in a notebook here:
I have copied the code fairly closely, but have changed the names of some things. When I run my code, there is a message saying the learning rate has been set to 2.00. Is this right? Should it not be something like 0.05?
Below is some code from my 'problem.py' file:
from tensor2tensor.models import transformer
from tensor2tensor.utils import registry


@registry.register_hparams
def transformer_chat():
    """Small transformer based on transformer_base."""
    hparams = transformer.transformer_base()
    hparams.num_hidden_layers = 2
    hparams.hidden_size = 128
    hparams.filter_size = 512
    hparams.num_heads = 4
    hparams.attention_dropout = 0.6
    hparams.layer_prepostprocess_dropout = 0.6
    hparams.learning_rate = 0.05
    return hparams


# hyperparameter tuning ranges
@registry.register_ranged_hparams
def transformer_chat_range(rhp):
    rhp.set_float("learning_rate", 0.05, 0.25, scale=rhp.LOG_SCALE)
    rhp.set_int("num_hidden_layers", 2, 4)
    rhp.set_discrete("hidden_size", [128, 256, 512])
    rhp.set_float("attention_dropout", 0.4, 0.7)
This is a snippet of the output that shows the learning rate being reported as 2.0:
I0731 11:12:03.723451 140080027969344 learning_rate.py:29] Base learning rate: 2.000000
I0731 11:12:03.892279 140080027969344 optimize.py:327] Trainable Variables Total size: 1972992
I0731 11:12:03.892893 140080027969344 optimize.py:327] Non-trainable variables Total size: 5
I0731 11:12:03.893257 140080027969344 optimize.py:182] Using optimizer adam
I0731 11:12:07.278682 140080027969344 estimator.py:1147] Done calling model_fn.
I0731 11:12:07.279783 140080027969344 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
I suspect I'm doing something wrong. Can you point it out? Thanks for your time.
Good day.
I think the "Base learning rate" text is logged here: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/learning_rate.py#L29, while the value of 2 for hparams.learning_rate_constant comes from https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py#L1923.
Maybe you can try either setting hparams.learning_rate_schedule = "legacy" explicitly or inheriting hparams from transformer_base_v2 instead of transformer_base, so that it goes through https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/learning_rate.py#L114 and prints 0.05 from hparams.learning_rate.
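Concretely, something like this is what I mean (untested, just sketching both options on top of your transformer_chat; the new function names are only placeholders):

@registry.register_hparams
def transformer_chat_legacy():
    # Option 1: stay on transformer_base but force the old schedule,
    # so hparams.learning_rate (0.05) is what actually gets used.
    hparams = transformer.transformer_base()
    hparams.learning_rate_schedule = "legacy"
    hparams.learning_rate = 0.05
    return hparams


@registry.register_hparams
def transformer_chat_v2():
    # Option 2: start from transformer_base_v2, which I believe still
    # defaults to the "legacy" schedule.
    hparams = transformer.transformer_base_v2()
    hparams.learning_rate = 0.05
    return hparams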
The tutorial you're looking at was written last year. transformer_base now uses transformer_base_v3, which has hparams.learning_rate_schedule = "constant*linear_warmup*rsqrt_decay*rsqrt_hidden_size". Maybe back then it was still using transformer_base_v2, which has hparams.learning_rate_schedule = "legacy".
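If that's right, the 2.0 in your log is hparams.learning_rate_constant, not the rate actually applied at each step: the schedule multiplies that constant by the warmup, decay and hidden-size factors, so the effective rate ends up far below 2.0. Here's a rough back-of-the-envelope sketch of my reading of that schedule (not the exact t2t code; warmup_steps is just an assumed value):

import math

def approx_lr(step, constant=2.0, warmup_steps=8000, hidden_size=128):
    # constant * linear_warmup * rsqrt_decay * rsqrt_hidden_size
    linear_warmup = min(1.0, step / float(warmup_steps))
    rsqrt_decay = 1.0 / math.sqrt(max(step, warmup_steps))
    rsqrt_hidden_size = 1.0 / math.sqrt(hidden_size)
    return constant * linear_warmup * rsqrt_decay * rsqrt_hidden_size

# With your hidden_size=128, right after warmup:
# approx_lr(8000) ~= 2.0 * 1.0 * (1 / 89.4) * (1 / 11.3) ~= 0.002

So the logged 2.0 shouldn't be the learning rate your optimizer actually sees; it's the constant that the other factors scale down.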
This is just my guess though. Apologies if it doesn't work. I'm also still getting my head around tensor2tensor hehe.
Cheers,
John