I am following the code found here:
The code can be found on github in a notebook here:
I have copied the code fairly closely, but have changed the names of some things. When I run my code, there is a message saying the learning rate has been set to 2.0. Is this right? Shouldn't it be something like 0.05?
Below is some code from my problem.py file.
@registry.register_hparams
def transformer_chat():
  hparams = transformer.transformer_base()
  hparams.num_hidden_layers = 2
  hparams.hidden_size = 128
  hparams.filter_size = 512
  hparams.num_heads = 4
  hparams.attention_dropout = 0.6
  hparams.layer_prepostprocess_dropout = 0.6
  hparams.learning_rate = 0.05
  return hparams

# hyperparameter tuning ranges
@registry.register_ranged_hparams
def transformer_chat_range(rhp):
  rhp.set_float("learning_rate", 0.05, 0.25, scale=rhp.LOG_SCALE)
  rhp.set_int("num_hidden_layers", 2, 4)
  rhp.set_discrete("hidden_size", [128, 256, 512])
  rhp.set_float("attention_dropout", 0.4, 0.7)
This is a snippet of the output that shows the learning rate being reported as 2.0:
I0731 11:12:03.723451 140080027969344 learning_rate.py:29] Base learning rate: 2.000000
I0731 11:12:03.892279 140080027969344 optimize.py:327] Trainable Variables Total size: 1972992
I0731 11:12:03.892893 140080027969344 optimize.py:327] Non-trainable variables Total size: 5
I0731 11:12:03.893257 140080027969344 optimize.py:182] Using optimizer adam
I0731 11:12:07.278682 140080027969344 estimator.py:1147] Done calling model_fn.
I0731 11:12:07.279783 140080027969344 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
I suspect I'm doing something wrong. Can you point it out? Thanks for your time.
I think the Base learning rate text is logged here: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/learning_rate.py#L29 while the value of 2 for hparams.learning_rate_constant comes from https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py#L1923.
Maybe you can either try setting hparams.learning_rate_schedule = "legacy" explicitly, or inherit your hparams from transformer_base_v2 instead of transformer_base, so that execution takes the path at https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/utils/learning_rate.py#L114 and prints 0.05 from hparams.learning_rate.
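If it helps, here's a minimal sketch of both options applied to the hparams set from the question. This assumes the transformer/registry imports from the question's problem.py, and that "legacy" is still an accepted value for learning_rate_schedule in your tensor2tensor version:

```python
from tensor2tensor.models import transformer
from tensor2tensor.utils import registry

@registry.register_hparams
def transformer_chat():
  # Option 1: stay on transformer_base, but force the legacy schedule so
  # that hparams.learning_rate (0.05) is the value actually used and logged.
  hparams = transformer.transformer_base()
  hparams.learning_rate_schedule = "legacy"

  # Option 2 (alternative): inherit from transformer_base_v2 instead,
  # which already sets learning_rate_schedule = "legacy":
  # hparams = transformer.transformer_base_v2()

  hparams.learning_rate = 0.05
  return hparams
```

Either way, the idea is the same: get off the "constant*..." schedule that reads hparams.learning_rate_constant (2.0) and back onto the path that reads hparams.learning_rate.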
The tutorial you're looking at was written last year. transformer_base now points to transformer_base_v3, which has hparams.learning_rate_schedule = "constant*linear_warmup*rsqrt_decay*rsqrt_hidden_size". Maybe back then it was still using transformer_base_v2, which has hparams.learning_rate_schedule = "legacy".
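To see why a "Base learning rate: 2.000000" log line doesn't mean the optimizer actually steps with 2.0: that schedule string is a product of factors, and the warmup, decay, and hidden-size terms shrink the constant by several orders of magnitude. Here's a rough pure-Python sketch based on my reading of learning_rate.py (the factor definitions are my assumptions, not the exact tensor2tensor code):

```python
def effective_lr(step, constant=2.0, warmup_steps=8000, hidden_size=512):
    """Approximate "constant*linear_warmup*rsqrt_decay*rsqrt_hidden_size"."""
    step = max(float(step), 1.0)
    linear_warmup = min(step / warmup_steps, 1.0)   # ramps 0 -> 1 during warmup
    rsqrt_decay = max(step, warmup_steps) ** -0.5   # 1/sqrt(step) after warmup
    rsqrt_hidden_size = hidden_size ** -0.5         # smaller LR for wider models
    return constant * linear_warmup * rsqrt_decay * rsqrt_hidden_size

# The peak (at the end of warmup) is on the order of 1e-3, not 2.0:
print(effective_lr(8000))    # peak of the schedule
print(effective_lr(100000))  # much smaller after decay
```

So the 2.0 being logged is just the constant factor of the schedule, and the per-step learning rate the optimizer sees is far smaller.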
This is just my guess though; apologies if it doesn't work. I'm also still getting my head around tensor2tensor hehe.