How to train model after INFO:tensorflow:Skipping training since max_steps has already saved.


Rodrigo Silveira

Feb 2, 2018, 7:31:50 AM
to Discuss
I'm using tf.estimator.train_and_evaluate on TF 1.4.1 and training locally. If I set tf.estimator.TrainSpec(max_steps=None) and the training input_fn declares dataset.repeat(1), the training input function never throws an OutOfRangeError. So instead I tried specifying max_steps in the TrainSpec. But now I can only train the model once.

For example,

train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=max_steps)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)

This will train the model for the expected max_steps (say, 10,000 steps). When training finishes, I'll see this, as expected:

INFO:tensorflow:Saving dict for global step 10000: auc_eval = 0.743603, global_step = 10000, loss = 0.151304
DEBUG:tensorflow:Calling exporter with the `is_the_final_export=True`.

The problem is that now if I want to train the model for an additional 10,000 steps, regardless of what value I use for max_steps (I've tried 10,000, 20,000, 1,000,000, and None), the model won't train, and I'll instead get:

INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after 600 secs (eval_spec.throttle_secs) or training is finished.
INFO:tensorflow:Skipping training since max_steps has already saved.
INFO:tensorflow:Starting evaluation at 2018-02-02-12:19:09
INFO:tensorflow:Restoring parameters from models/test_model/model.ckpt-10000
INFO:tensorflow:Finished evaluation at 2018-02-02-12:19:46
INFO:tensorflow:Saving dict for global step 10000: auc_eval = 0.743603, global_step = 10000, loss = 0.151304
DEBUG:tensorflow:Calling exporter with the `is_the_final_export=True`.

Any advice?

徐康

Apr 23, 2020, 2:04:23 AM
to Discuss, rodr...@gmail.com
Hi Rodrigo, I am now facing exactly the same issue. Would you mind sharing how you resolved it? Thanks.

On Friday, February 2, 2018 at 8:31:50 PM UTC+8, Rodrigo Silveira wrote:

徐康

Apr 23, 2020, 4:22:37 AM
to Discuss
I think I have found the solution. In tf.estimator.TrainSpec, set max_steps=None and add a tf.estimator.StopAtStepHook(num_steps=steps) hook, passed through the TrainSpec hooks argument, so that the hook decides when training stops.
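To illustrate why a relative step count might behave differently from max_steps, here is a minimal plain-Python sketch of the two stopping rules as I understand them. It does not import TensorFlow, the function names are hypothetical, and the logic is an assumption about how the estimator and the hook count steps: max_steps is treated as an absolute global-step limit, while num_steps is counted from the step at which training resumes. Under this reading a max_steps above the checkpoint step would also be expected to resume training, which does not match what was reported above, so something else may have been involved there.

```python
# Plain-Python model of the two stopping rules; no TensorFlow is imported.
# The function names are hypothetical and exist only for this illustration.


def steps_run_with_max_steps(checkpoint_step, max_steps):
    """Absolute limit: training stops once the global step reaches max_steps.

    Assumption: if the checkpoint is already at or past max_steps, training
    is skipped entirely, which matches the "Skipping training" log message.
    """
    if max_steps is None:
        raise ValueError("this sketch only models a finite max_steps")
    return max(0, max_steps - checkpoint_step)


def steps_run_with_num_steps(checkpoint_step, num_steps):
    """Relative limit: stop num_steps after the step where training resumed.

    Assumption: the stopping point is computed from the restored global step
    when the session starts, so the checkpoint step does not matter.
    """
    last_step = checkpoint_step + num_steps
    return last_step - checkpoint_step


if __name__ == "__main__":
    restored = 10000  # global step of model.ckpt-10000 in the logs above
    print("max_steps=10000 runs", steps_run_with_max_steps(restored, 10000), "steps")
    print("num_steps=10000 runs", steps_run_with_num_steps(restored, 10000), "steps")
```

In this model, rerunning with the same max_steps after a checkpoint at step 10000 runs no further steps, while num_steps=10000 runs another 10000 steps on every invocation.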

On Thursday, April 23, 2020 at 2:04:23 PM UTC+8, 徐康 wrote: