I'm using tf.estimator.train_and_evaluate on TF 1.4.1, training locally. If I set tf.estimator.TrainSpec(max_steps=None) and the training input_fn calls dataset.repeat(1), the training input function never throws an OutOfRangeError, so training never stops on its own. What I tried instead was specifying max_steps in the TrainSpec, but now I can only train the model once.
For example,
train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=max_steps)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
This will train the model for the expected max_steps (say, 10,000 steps). When training finishes, I'll see this, as expected:
INFO:tensorflow:Saving dict for global step 10000: auc_eval = 0.743603, global_step = 10000, loss = 0.151304
DEBUG:tensorflow:Calling exporter with the `is_the_final_export=True`.
The problem is that if I now want to train the model for an additional 10,000 steps, the model won't train regardless of the max_steps value I pass (I've tried 10,000, 20,000, 1,000,000, and None); instead I get:
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after 600 secs (eval_spec.throttle_secs) or training is finished.
INFO:tensorflow:Skipping training since max_steps has already saved.
INFO:tensorflow:Starting evaluation at 2018-02-02-12:19:09
INFO:tensorflow:Restoring parameters from models/test_model/model.ckpt-10000
INFO:tensorflow:Finished evaluation at 2018-02-02-12:19:46
INFO:tensorflow:Saving dict for global step 10000: auc_eval = 0.743603, global_step = 10000, loss = 0.151304
DEBUG:tensorflow:Calling exporter with the `is_the_final_export=True`.
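My understanding is that max_steps is an absolute target on the saved global_step, not a number of additional steps, which would explain why re-running with max_steps=10,000 skips. A rough pure-Python sketch of the guard I assume is being applied (illustrative only, not the actual TensorFlow internals):

```python
def should_skip_training(saved_global_step, max_steps):
    """Illustrative guard, not the real TF source: skip training when the
    checkpointed global_step has already reached max_steps."""
    if max_steps is None:
        return False
    return saved_global_step >= max_steps

# First run, fresh model: trains up to step 10,000.
print(should_skip_training(0, 10000))        # False
# Second run with the same max_steps: skipped, matching the log above.
print(should_skip_training(10000, 10000))    # True
# By this logic, a larger max_steps (or None) should train again...
print(should_skip_training(10000, 20000))    # False
```

...yet in my runs max_steps=20,000, 1,000,000, and None all still produce the "Skipping training" message, which is the part I can't explain.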
Any advice?