I'm training a model whose input vector is the output of another model. This requires restoring the first model from a checkpoint file while initializing the second model from scratch (using tf.initialize_variables()).
There is a substantial amount of code and abstraction, so I'm just pasting the relevant sections here.
The following is the restoring code:
self.variables = [var for var in all_vars if var.name.startswith(self.name)]
self.saver = tf.train.Saver(self.variables, max_to_keep=3)
self.save_path = tf.train.latest_checkpoint(os.path.dirname(self.checkpoint_path))

if should_restore:
    # Restore the previously trained weights for this model's variables.
    self.saver.restore(self.sess, self.save_path)
else:
    # Start from scratch: initialize only this model's variables.
    self.sess.run(tf.initialize_variables(self.variables))

Each model is scoped within its own graph and session, like this:
self.graph = tf.Graph()
self.sess = tf.Session(graph=self.graph)
with self.sess.graph.as_default():
    # Create variables and ops.

All the variables within each model are created within the variable_scope context manager.
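For concreteness, here is a minimal sketch of that scoping pattern (the class, variable names, and shapes are illustrative placeholders, not the actual code); creating everything under tf.variable_scope(self.name) is what makes the var.name.startswith(self.name) filter in the restore code work:

import tensorflow as tf

class Model(object):
    def __init__(self, name):
        self.name = name
        self.graph = tf.Graph()
        self.sess = tf.Session(graph=self.graph)
        with self.sess.graph.as_default():
            # Every variable created here gets a name prefixed with
            # self.name, e.g. "model_a/weights:0".
            with tf.variable_scope(self.name):
                self.weights = tf.get_variable(
                    "weights", shape=[784, 10],
                    initializer=tf.truncated_normal_initializer(stddev=0.1))
                self.biases = tf.get_variable(
                    "biases", shape=[10],
                    initializer=tf.constant_initializer(0.0))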
The feeding works as follows:

1. sess.run(inference_op) on the first model, with input scipy.misc.imread(X); the result is put into a blocking, thread-safe queue.
2. sess.run(train_op) on the second model, consuming from that queue.
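Roughly, the producer/consumer loop looks like this (a simplified sketch; the thread, queue, and attribute names such as model1, model2, inference_op, train_op and input_placeholder are placeholders for the real code):

import threading
try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2

import scipy.misc

feature_queue = queue.Queue(maxsize=32)  # blocking, thread-safe

def produce(first_model, image_paths):
    for path in image_paths:
        image = scipy.misc.imread(path)
        # Run the first model's inference in its own session/graph.
        features = first_model.sess.run(
            first_model.inference_op,
            feed_dict={first_model.input_placeholder: image})
        feature_queue.put(features)  # blocks when the queue is full

def consume(second_model, num_steps):
    for _ in range(num_steps):
        features = feature_queue.get()  # blocks until features are available
        # Train the second model on the first model's output.
        second_model.sess.run(
            second_model.train_op,
            feed_dict={second_model.input_placeholder: features})

threading.Thread(target=produce, args=(model1, image_paths)).start()
consume(model2, num_steps=1000)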
PROBLEM:

I am observing that the loss values of the training (second model) keep changing drastically from run to run, even in the very first iteration. I confirmed that the output of the first model is exactly the same every time. Commenting out the sess.run of the first model and replacing it with identical input loaded from a pickled file does not show this behaviour.
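(For reference, a sketch of how one might verify that the first model's output is bit-identical across runs; fingerprint, model1 and image are illustrative placeholders, not the actual code:)

import hashlib

import numpy as np

def fingerprint(array):
    # Hash the raw bytes of the array so outputs from two runs can be
    # compared exactly.
    return hashlib.sha256(np.ascontiguousarray(array).tobytes()).hexdigest()

features = model1.sess.run(model1.inference_op,
                           feed_dict={model1.input_placeholder: image})
print(fingerprint(features))  # the digest is identical on every run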
This is the train_op:
loss_op = tf.nn.sparse_softmax_cross_entropy_with_logits(
    logits=network.feedforward(), labels=labels)  # labels: ground-truth class indices
# Apply gradients.
with tf.control_dependencies([loss_op]):
    opt = tf.train.GradientDescentOptimizer(lr)
    grads = opt.compute_gradients(loss_op)
    apply_gradient_op = opt.apply_gradients(grads)
return apply_gradient_op

I know this is vague, but I'm happy to provide more details. Any help is appreciated!