Hello,
I am new to TensorFlow and neural networks.
I decided to use the Estimator API to start learning and practicing, since it looked intuitive enough for a novice user.
All the tutorials out there advise plotting and checking the training loss and validation loss for every epoch while training a neural network (e.g.,
http://cs231n.github.io/neural-networks-3/). However, I couldn't find any feature in the Estimator API that enables this.
I found this regression example that shows how to plot the training and validation loss for each epoch after training:
https://www.tensorflow.org/tutorials/keras/basic_regression. But it uses Keras, and it would be troublesome to switch platforms without even knowing what kind of inflexibility I would encounter there. Also, it seems that in Keras you cannot do it while training: you have to stop training to check the loss-vs-epoch plot (at least, that's how they do it in the tutorial). It would be good to be able to check it online.
One solution that I came up with is to call the train and evaluate functions in a loop. Each iteration trains for 3 epochs (to make sure enough variation of the data batches is seen while the weights are learned) and then evaluates. The weights learned in the previous, (i-1)th iteration are loaded from the last checkpoint. Then, by printing the training and validation loss at every ith iteration, I can track loss vs. epochs (or other metrics as well) for every 3 epochs. Is that correct logic?
for i in range(0, N):
    # Prepare training input.
    train_input_fn = tf.estimator.inputs.numpy_input_fn(x={"x": data_train},
                                                        y=labels,
                                                        batch_size=BATCH_SIZE,
                                                        num_epochs=3,
                                                        shuffle=True)
    # Train the model.
    fcn.train(input_fn=train_input_fn, steps=STEPS)
    # Prepare evaluation input. Due to insufficient memory, I feed the
    # validation data in batches as well.
    eval_input_fn = tf.estimator.inputs.numpy_input_fn(x={"x": data_eval},
                                                       y=labels_eval,
                                                       batch_size=BATCH_SIZE,
                                                       num_epochs=1,
                                                       shuffle=False)
    # The data is evaluated in batches over many steps.
    fcn.evaluate(input_fn=eval_input_fn,
                 steps=len(data_eval) // BATCH_SIZE)
I know that there is a train_and_evaluate function, but I don't know how to track the epoch progress there. Could it be a more correct and efficient method than the loop that I wrote above?
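For reference, this is what I understand a train_and_evaluate call would look like. I wrapped it in a function so the snippet stands alone; the estimator and the two input functions are assumed to already exist (e.g. my fcn and the numpy_input_fn wrappers above), and I haven't verified that this tracks epochs any better:

```python
def run_train_and_evaluate(estimator, train_input_fn, eval_input_fn,
                           max_steps, eval_steps):
    """Sketch of tf.estimator.train_and_evaluate with the pieces I know of.
    `estimator`, `train_input_fn`, and `eval_input_fn` are placeholders
    for objects defined elsewhere (my fcn and its input functions)."""
    import tensorflow as tf  # imported here so defining the sketch stands alone

    train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn,
                                        max_steps=max_steps)
    # throttle_secs controls how often evaluation may run, so evaluation
    # here is time-based rather than epoch-based -- which is exactly why
    # tracking epochs with this API is not obvious to me.
    eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn,
                                      steps=eval_steps,
                                      throttle_secs=60)
    return tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
```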
Or is the epoch issue a trivial matter, and is observing the loss progress over steps enough?