How to restore a model by filename in TensorFlow r0.12?

Taylor Childers

Dec 9, 2016, 9:13:50 AM

to Discuss

I have run the distributed mnist example:

https://github.com/tensorflow/tensorflow/blob/r0.12/tensorflow/tools/dist_test/python/mnist_replica.py

I have set the saver to keep every checkpoint:

saver = tf.train.Saver(max_to_keep=0)

In previous releases, such as r0.11, I was able to loop over each checkpoint and evaluate the precision of the model. This gave me a plot of precision versus global step (iteration).

Prior to r0.12, TensorFlow checkpoints were saved as two files, model.ckpt-1234 and model.ckpt-1234.meta. One could restore a model by passing the model.ckpt-1234 filename, like so: saver.restore(sess, 'model.ckpt-1234').

However, I've noticed that in r0.12 there are now three output files: model.ckpt-1234.data-00000-of-00001, model.ckpt-1234.index, and model.ckpt-1234.meta.

I see that the restore documentation says that a path such as /train/path/model.ckpt should be given to restore instead of a filename. Is there any way to load one checkpoint at a time to evaluate it? I have tried passing the model.ckpt-1234.data-00000-of-00001, model.ckpt-1234.index, and model.ckpt-1234.meta files, but I get errors like those below:

W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open logdir/2016-12-08-13-54/model.ckpt-0.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

NotFoundError (see above for traceback): Tensor name "hid_b" not found in checkpoint files logdir/2016-12-08-13-54/model.ckpt-0.index [[Node: save/RestoreV2_1 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_1/tensor_names, save/RestoreV2_1/shape_and_slices)]]

W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open logdir/2016-12-08-13-54/model.ckpt-0.meta: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
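For reference, the three files tried above share a single prefix, and the later replies in this thread report that restore accepts that prefix rather than any one file. A minimal sketch of deriving it, using a hypothetical helper `checkpoint_prefix` that is not part of TensorFlow and assuming the suffixes are exactly the ones listed in this post:

```python
import re

# Per-file suffixes of the three-file checkpoint layout described in this post.
# Assumption: shard numbers are always five digits, as in the file names above.
_SUFFIX = re.compile(r"\.(index|meta|data-\d{5}-of-\d{5})$")


def checkpoint_prefix(filename):
    """Hypothetical helper: strip the per-file suffix, leaving the shared prefix."""
    return _SUFFIX.sub("", filename)


names = [
    "model.ckpt-0.data-00000-of-00001",
    "model.ckpt-0.index",
    "model.ckpt-0.meta",
]
# All three names collapse to the same prefix.
prefixes = sorted(set(checkpoint_prefix(n) for n in names))
print(prefixes)
```

The resulting string (here `model.ckpt-0`, joined to the log directory) is the kind of value the thread ends up passing to `saver.restore`.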

I'm running on macOS Sierra with TensorFlow r0.12 installed via pip.

Any guidance would be helpful.

Thank you.

Chao Gao

Dec 9, 2016, 12:23:14 PM

to Discuss, Taylor Childers

How about model.ckpt-1234?

--

You received this message because you are subscribed to the Google Groups "Discuss" group.

To unsubscribe from this group and stop receiving emails from it, send an email to discuss+u...@tensorflow.org.

To post to this group, send email to dis...@tensorflow.org.

To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/discuss/76d760d3-1ff8-44b0-ae29-60416b180734%40tensorflow.org.

Taylor Childers

Dec 9, 2016, 12:42:23 PM

to Discuss

Thanks for the suggestion. I did try that, and I see:

NotFoundError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to find any matching files for logdir/2016-12-08-13-54//model.ckpt-0

[[Node: save/RestoreV2_4 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_4/tensor_names, save/RestoreV2_4/shape_and_slices)]]

and just to be clear, the files do exist:

> ls logdir/2016-12-08-13-54//model.ckpt-0*

logdir/2016-12-08-13-54//model.ckpt-0.data-00000-of-00001

logdir/2016-12-08-13-54//model.ckpt-0.meta

logdir/2016-12-08-13-54//model.ckpt-0.index

In addition, I tried what the documentation suggests, supplying just the checkpoint file base 'model.ckpt', but I see this:

tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for logdir/2016-12-08-13-54//model.ckpt

[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_INT32], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Thanks

Asher Newcomer

Dec 9, 2016, 12:46:05 PM

to Taylor Childers, Discuss

I know that double slashes are generally ignored in file paths, but I'd try to rerun your tests excluding those:

NotFoundError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to find any matching files for logdir/2016-12-08-13-54//model.ckpt-0

[[Node: save/RestoreV2_4 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_4/tensor_names, save/RestoreV2_4/shape_and_slices)]]

> ls logdir/2016-12-08-13-54//model.ckpt-0*

logdir/2016-12-08-13-54//model.ckpt-0.data-00000-of-00001

logdir/2016-12-08-13-54//model.ckpt-0.meta

logdir/2016-12-08-13-54//model.ckpt-0.index

Martin Wicke

Dec 9, 2016, 12:52:08 PM

to Asher Newcomer, Taylor Childers, Discuss

Can you please ask this question on StackOverflow? You won't be the last to want to know this.

mukul arora

Dec 9, 2016, 12:55:46 PM

to Taylor Childers, Discuss

Hi! Have a look at TFLearn (www.tflearn.org)

It is an abstraction on top of TensorFlow.

You can easily save and load a model with the model.save and model.load methods.

Hope it might help.

Taylor Childers

Dec 9, 2016, 1:05:19 PM

to Discuss

Hello Asher,

You had the right idea: leave no stone unturned. I also assumed the double slashes could not be the problem, but after I removed them it works!

I also had to include the step number in the name passed to restore, so this command works:

saver.restore(sess, 'model.ckpt-1234')
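To evaluate every checkpoint in turn, as described in the original question, one possible approach is to collect the prefixes from the .index files and sort them by step. This is only a sketch under assumptions: `list_checkpoint_prefixes` is a hypothetical helper (not a TensorFlow function), the base name is assumed to be model.ckpt, and `os.path.normpath` is used to collapse the double slash that caused trouble in this thread.

```python
import glob
import os
import re


def list_checkpoint_prefixes(logdir, basename="model.ckpt"):
    """Hypothetical helper: return (step, prefix) pairs sorted by step."""
    # normpath collapses 'logdir/run//' into 'logdir/run'.
    logdir = os.path.normpath(logdir)
    pattern = os.path.join(logdir, basename + "-*.index")
    step_re = re.compile(re.escape(basename) + r"-(\d+)\.index$")
    found = []
    for path in glob.glob(pattern):
        match = step_re.search(os.path.basename(path))
        if match:
            # Drop the '.index' suffix to get the prefix that restore expects.
            found.append((int(match.group(1)), path[:-len(".index")]))
    # Tuples sort by step first, so steps come out in numeric order.
    return sorted(found)


# Usage sketch (not executed here; assumes an already built graph plus
# existing `sess` and `saver` objects):
# for step, prefix in list_checkpoint_prefixes("logdir/2016-12-08-13-54//"):
#     saver.restore(sess, prefix)
#     # evaluate precision for this step here
```

Sorting on the parsed integer rather than the file name keeps step 1000 after step 200, which a plain string sort would not.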

Thanks everyone.
