learning rate 2.0


David Liebman

Sep 29, 2019, 10:31:42 AM
to tensor2tensor

I am following the code found here:


The code can be found on github in a notebook here:


I have copied the code fairly closely, but have changed some of the names. When I run my code, a log message reports that the learning rate has been set to 2.00. Is this right? Should it not be something like 0.05?

Below is some code from my 'problem.py' file.

# import needed by the snippet below
from tensor2tensor.models import transformer

def transformer_chat():
    hparams = transformer.transformer_base()
    hparams.num_hidden_layers = 2
    hparams.hidden_size = 128
    hparams.filter_size = 512
    hparams.num_heads = 4
    hparams.attention_dropout = 0.6
    hparams.layer_prepostprocess_dropout = 0.6
    hparams.learning_rate = 0.05
    return hparams

# hyperparameter tuning ranges
def transformer_chat_range(rhp):
    rhp.set_float("learning_rate", 0.05, 0.25, scale=rhp.LOG_SCALE)
    rhp.set_int("num_hidden_layers", 2, 4)
    rhp.set_discrete("hidden_size", [128, 256, 512])
    rhp.set_float("attention_dropout", 0.4, 0.7)

This is a snippet of the output that shows the learning rate being reported as 2.0:

I0731 11:12:03.723451 140080027969344 learning_rate.py:29] Base learning rate: 2.000000
I0731 11:12:03.892279 140080027969344 optimize.py:327] Trainable Variables Total size: 1972992
I0731 11:12:03.892893 140080027969344 optimize.py:327] Non-trainable variables Total size: 5
I0731 11:12:03.893257 140080027969344 optimize.py:182] Using optimizer adam
I0731 11:12:07.278682 140080027969344 estimator.py:1147] Done calling model_fn.
I0731 11:12:07.279783 140080027969344 basic_session_run_hooks.py:541] Create CheckpointSaverHook.

I suspect I'm doing something wrong. Can you point it out? Thanks for your time.

John Ed Alvinez

Sep 29, 2019, 11:16:35 AM
to tensor2tensor
Hi David,

Good day.

I think the "Base learning rate" message is logged here: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/learning_rate.py#L29. The value of 2 is hparams.learning_rate_constant, which is set here: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py#L1923.

You could try either setting hparams.learning_rate_schedule = "legacy" explicitly, or inheriting your hparams from transformer_base_v2 instead of transformer_base. Either way the code should go through https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/learning_rate.py#L114 and print 0.05 from hparams.learning_rate.
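A minimal sketch of those two options. The function names transformer_chat_legacy and transformer_chat_v2 are made up for illustration, the other overrides from transformer_chat (layers, sizes, dropout) would go alongside, and I haven't run this against your setup:

```python
def transformer_chat_legacy():
    # Imported inside the function only to keep this sketch self-contained;
    # problem.py would normally import this at module level.
    from tensor2tensor.models import transformer

    hparams = transformer.transformer_base()
    # Option 1: keep transformer_base but switch back to the legacy schedule,
    # which reads hparams.learning_rate rather than learning_rate_constant.
    hparams.learning_rate_schedule = "legacy"
    hparams.learning_rate = 0.05
    return hparams


def transformer_chat_v2():
    from tensor2tensor.models import transformer

    # Option 2: start from transformer_base_v2, which already uses "legacy".
    hparams = transformer.transformer_base_v2()
    hparams.learning_rate = 0.05
    return hparams
```

As far as I understand, the legacy schedule still applies its own warmup and decay on top of that base value, so 0.05 would be the logged base rate rather than a fixed step size.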

The tutorial you're looking at was written last year. transformer_base now uses transformer_base_v3, which has hparams.learning_rate_schedule = ("constant*linear_warmup*rsqrt_decay*rsqrt_hidden_size"). Back then it was probably still using transformer_base_v2, which has hparams.learning_rate_schedule = "legacy".
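To show why a base value of 2.0 need not mean a huge step size, here is a rough pure-Python sketch of how I read that schedule string: each name is a factor, and the factors are multiplied together. The factor formulas are my paraphrase of learning_rate.py rather than a copy, and warmup_steps=8000 is an assumed default, so please check both against your installed version. hidden_size=128 is taken from your transformer_chat.

```python
import math


def effective_learning_rate(step, constant=2.0, warmup_steps=8000, hidden_size=128):
    """Multiply the four factors named in the schedule string (assumed formulas)."""
    linear_warmup = min(1.0, step / warmup_steps)           # ramps from 0 to 1
    rsqrt_decay = 1.0 / math.sqrt(max(step, warmup_steps))  # flat, then decays
    rsqrt_hidden_size = hidden_size ** -0.5                 # fixed scaling
    return constant * linear_warmup * rsqrt_decay * rsqrt_hidden_size


# Print the effective rate at a few training steps.
for step in (100, 1000, 8000, 100000):
    print(step, effective_learning_rate(step))
```

With these assumed values the rate peaks at the end of warmup at roughly 0.002, so the logged 2.0 acts as a multiplier and is not the step size actually applied.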

This is just my guess though. Apologies if it doesn't work. I'm also still trying to get my head around learning more about tensor2tensor hehe.

