How to restore a model by filename in Tensorflow r12?

Taylor Childers

Dec 9, 2016, 9:13:50 AM
to Discuss


I have run the distributed MNIST example:

https://github.com/tensorflow/tensorflow/blob/r0.12/tensorflow/tools/dist_test/python/mnist_replica.py


I have set the saver to keep every checkpoint:

saver = tf.train.Saver(max_to_keep=0)

In previous releases, such as r11, I was able to iterate over each checkpoint and evaluate the model's precision. This gave me a plot of precision versus global step (iteration).


Prior to r12, TensorFlow checkpoints were saved in two files, model.ckpt-1234 and model.ckpt-1234.meta. One could restore a model by passing the model.ckpt-1234 filename, like so: saver.restore(sess,'model.ckpt-1234').


However, I've noticed that in r12 there are now three output files: model.ckpt-1234.data-00000-of-00001, model.ckpt-1234.index, and model.ckpt-1234.meta.
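The three files share a common prefix (here model.ckpt-1234), and the thread eventually restores by passing that prefix rather than any single file. A minimal sketch of recovering the prefix from any one of the files, assuming only the three suffixes listed above; checkpoint_prefix is a hypothetical helper, not part of TensorFlow:

```python
import re

# Suffixes seen per checkpoint in this thread (assumption: only .index,
# .meta and numbered .data-NNNNN-of-NNNNN shards are written).
_SUFFIX_RE = re.compile(r"\.(index|meta|data-\d{5}-of-\d{5})$")


def checkpoint_prefix(filename):
    """Return the shared prefix of a checkpoint file, e.g. 'model.ckpt-1234'.

    Hypothetical helper: strips '.index', '.meta' or '.data-NNNNN-of-NNNNN'
    and raises ValueError if none of those suffixes is present.
    """
    prefix, count = _SUFFIX_RE.subn("", filename)
    if count == 0:
        raise ValueError("not a checkpoint file: %r" % filename)
    return prefix


if __name__ == "__main__":
    for name in ("model.ckpt-1234.data-00000-of-00001",
                 "model.ckpt-1234.index",
                 "model.ckpt-1234.meta"):
        print(checkpoint_prefix(name))  # model.ckpt-1234 each time
```

The returned prefix, together with its directory, is the kind of value the working restore call at the end of this thread receives.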

I see that the restore documentation says that a path such as /train/path/model.ckpt should be given to restore instead of a filename. Is there any way to load one checkpoint at a time to evaluate it? I have tried passing the model.ckpt-1234.data-00000-of-00001, model.ckpt-1234.index, and model.ckpt-1234.meta files, but get errors like the ones below:


W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open logdir/2016-12-08-13-54/model.ckpt-0.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?


NotFoundError (see above for traceback): Tensor name "hid_b" not found in checkpoint files logdir/2016-12-08-13-54/model.ckpt-0.index [[Node: save/RestoreV2_1 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_1/tensor_names, save/RestoreV2_1/shape_and_slices)]]


W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open logdir/2016-12-08-13-54/model.ckpt-0.meta: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?


I'm running on OSX Sierra with tensorflow r12 installed via pip.


Any guidance would be helpful.

Thank you.

Chao Gao

Dec 9, 2016, 12:23:14 PM
to Discuss, Taylor Childers
How about passing just model.ckpt-1234?
--


You received this message because you are subscribed to the Google Groups "Discuss" group.


To unsubscribe from this group and stop receiving emails from it, send an email to discuss+u...@tensorflow.org.


To post to this group, send email to dis...@tensorflow.org.


To view this discussion on the web visit https://groups.google.com/a/tensorflow.org/d/msgid/discuss/76d760d3-1ff8-44b0-ae29-60416b180734%40tensorflow.org.


Taylor Childers

Dec 9, 2016, 12:42:23 PM
to Discuss
Thanks for the suggestion. I did try that, and I see:

NotFoundError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to find any matching files for logdir/2016-12-08-13-54//model.ckpt-0
[[Node: save/RestoreV2_4 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2_4/tensor_names, save/RestoreV2_4/shape_and_slices)]]

And just to be clear, the files do exist:

> ls logdir/2016-12-08-13-54//model.ckpt-0*
logdir/2016-12-08-13-54//model.ckpt-0.data-00000-of-00001 
logdir/2016-12-08-13-54//model.ckpt-0.meta
logdir/2016-12-08-13-54//model.ckpt-0.index



In addition, I tried what the documentation suggests, supplying just the checkpoint base name 'model.ckpt', but I see this:

tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for logdir/2016-12-08-13-54//model.ckpt
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_INT32], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Thanks

Asher Newcomer

Dec 9, 2016, 12:46:05 PM
to Taylor Childers, Discuss
I know that double slashes are generally ignored in file paths, but I'd try rerunning your tests without them (for example, logdir/2016-12-08-13-54/model.ckpt-0 instead of logdir/2016-12-08-13-54//model.ckpt-0).
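Asher's suggestion can also be applied in code rather than by editing paths by hand. A minimal standard-library sketch, using the path from the error above: os.path.normpath collapses a doubled separator that is already present, and os.path.join avoids creating one in the first place.

```python
import os

# Path built by string concatenation, as in the failing run (doubled slash).
raw = "logdir/2016-12-08-13-54/" + "/model.ckpt-0"

# os.path.normpath collapses repeated separators (and "." components).
clean = os.path.normpath(raw)

# os.path.join does not add a second separator when the directory
# already ends with one.
joined = os.path.join("logdir/2016-12-08-13-54/", "model.ckpt-0")

print(raw)     # logdir/2016-12-08-13-54//model.ckpt-0
print(clean)   # logdir/2016-12-08-13-54/model.ckpt-0 on POSIX systems
print(joined)  # logdir/2016-12-08-13-54/model.ckpt-0
```

Note that os.path.join('logdir/', '/model.ckpt-0') would discard the directory entirely, because the second argument is absolute, so the name passed to join should not start with a slash.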

Martin Wicke

Dec 9, 2016, 12:52:08 PM
to Asher Newcomer, Taylor Childers, Discuss
Can you please ask this question on StackOverflow? You won't be the last to want to know this.

mukul arora

Dec 9, 2016, 12:55:46 PM
to Taylor Childers, Discuss
Hi! Have a look at TFLearn (www.tflearn.org)
It is an abstraction on top of TensorFlow.

You can easily save and load a model with the model.save and model.load methods.

Hope it helps.

Taylor Childers

Dec 9, 2016, 1:05:19 PM
to Discuss
Hello Asher,
You had the right idea: leave no stone unturned. I also assumed the double slashes could not possibly be the problem, but after I removed them, it works!
I also had to include the step number in the name passed to restore, so this command works:

saver.restore(sess,'model.ckpt-1234')

Thanks everyone.
