Tensorflow Estimator API to track training and evaluation loss vs. epoch


puren...@gmail.com

Aug 14, 2018, 7:05:27 AM
to Discuss
Hello,

I am new to TensorFlow and neural networks.
I decided to use the Estimator API to start learning and practicing, since it looked intuitive enough for a novice user.
All the tutorials out there advise plotting and checking the training loss and validation loss for every epoch while training a neural network (e.g., http://cs231n.github.io/neural-networks-3/). However, I couldn't find any feature in the Estimator API that enables this.

I found this regression example that shows how to plot the training and validation loss for each epoch after training: https://www.tensorflow.org/tutorials/keras/basic_regression. But it uses Keras, and it would be troublesome to switch platforms without knowing what kind of inflexibility I would encounter there. Also, it seems that in Keras you cannot do this while training; you have to stop training to check the loss vs. epoch plot (at least, that's how they do it in the tutorial). It would be good to be able to check it online, while training is running.

One solution that I came up with is to call the train and evaluate functions in a loop. Each iteration trains for 3 epochs (to make sure the model sees enough variation in the batches while learning the weights) and then evaluates. The weights learned in step i-1 are loaded from the last checkpoint. Then, by printing the training and validation loss at every step i, I can track the loss (or other metrics) every 3 epochs. Is that logic correct?

for i in range(N):
    # Prepare the training input.
    train_input_fn = tf.estimator.inputs.numpy_input_fn(
        x={"x": data_train},
        y=labels,
        batch_size=BATCH_SIZE,
        num_epochs=3,
        shuffle=True)

    # Train the model.
    fcn.train(input_fn=train_input_fn, steps=STEPS)

    eval_input_fn = tf.estimator.inputs.numpy_input_fn(
        x={"x": data_eval},
        y=labels_eval,
        batch_size=BATCH_SIZE,  # due to insufficient memory, I give the validation data in batches as well
        num_epochs=1,
        shuffle=False)

    # The data is evaluated in batches over many steps; steps must be an integer.
    fcn.evaluate(input_fn=eval_input_fn,
                 steps=len(data_eval) // BATCH_SIZE)

I know that there is a train_and_evaluate function, but I don't know how to track the epoch progress with it. Could it be a more correct and efficient method than the loop that I wrote above?
Or is the epoch issue a trivial matter, and is observing the loss progress through steps enough?
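
On the steps-versus-epochs question: the two are tied together by the dataset size and the batch size, so a step count can be converted to an approximate epoch count. A minimal sketch in plain Python, assuming a hypothetical dataset of 1000 examples and a batch size of 32 (the numbers and the function names are made up for illustration and are not from this post):

```python
import math


def steps_per_epoch(num_examples, batch_size):
    """Steps needed to pass over every example once (the last batch may be partial)."""
    return math.ceil(num_examples / batch_size)


def epochs_completed(global_step, num_examples, batch_size):
    """Approximate number of epochs that a global step count corresponds to."""
    return global_step / steps_per_epoch(num_examples, batch_size)


# Hypothetical numbers, for illustration only.
NUM_EXAMPLES = 1000
BATCH_SIZE = 32

print(steps_per_epoch(NUM_EXAMPLES, BATCH_SIZE))
print(epochs_completed(96, NUM_EXAMPLES, BATCH_SIZE))
```

With a conversion like this, a loss curve logged per step can be relabeled in epochs after the fact, so tracking by steps loses no information.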

John Davis

Aug 14, 2018, 8:05:46 AM
to puren...@gmail.com, Discuss
Since this is not a complete code sample, I'm not sure what kind of estimator you have. I used Google's GitHub repository and their class to learn the Estimator API. I suggest you use this sample; it's one notebook and one set of data as an example.


It's complete, simple, and it works. If you examine the other code in that repo, you will see it progressively enhance the model.


--
John F. Davis
6 Kandes Court
Durham, NC 27713
919-888-8358
Public Profile https://www.linkedin.com/in/netskink

One of a kind

puren...@gmail.com

Aug 14, 2018, 9:27:57 AM
to Discuss
Thanks for your answer. 
I implemented a Fully Convolutional Network using the TensorFlow Estimator API. Here is how I create the estimator in the main function. I can share the fcn8_model code if you want, but I don't think my question really depends on the model code. Correct me if I'm wrong.

fcn = tf.estimator.Estimator(model_fn=fcn8_model,
                             model_dir=PATH,
                             params={'num_classes': k,
                                     'data_type': d_type,
                                     'isBatch': isBatch,
                                     'isDrop': isDrop})

I just want to make sure that I evaluate the results that I get (loss, accuracy, etc.) correctly.
I can see that there is a tutorial on the train_and_evaluate function in the link that you sent. I'll check that.

Bests,
Püren

Zeynep GÖKCE

May 21, 2019, 5:48:44 AM
to Discuss
Hi, 
I have the same problem as you, and I am trying to learn how to use the Estimator API with epochs. I want to show the prediction for each sample while I train and test the model, and I want to calculate accuracy and loss.
Did you find any good tutorial or basic code on GitHub for that?

On Tuesday, August 14, 2018 at 16:27:57 UTC+3, puren...@gmail.com wrote:

Püren Güler

May 23, 2019, 9:11:50 AM
to Discuss
Hi Zeynep,

I just define the epochs in a for loop (below).
I set a fixed number of training steps. Then, after the training steps for one epoch are finished, I call the evaluate function of the Estimator API and print the evaluation metrics.

no_epochs = 20

for i in range(no_epochs):
    print("epochs-{}".format(i))

    # Training. input_fn is a function that I wrote to read from TFRecords.
    train_input_fn = functools.partial(input_fn, data_dir=path_data, subset='train', batch_size=batch_s)
    fcn.train(input_fn=train_input_fn, steps=steps_train)

    # Evaluation.
    eval_input_fn = functools.partial(input_fn, data_dir=path_data, subset='val', batch_size=batch_s)
    eval_results = fcn.evaluate(input_fn=eval_input_fn, steps=steps_eval)

    print(eval_results)
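
The loop above prints only the validation metrics. To get both curves for a loss vs. epoch plot, one option is to evaluate on the training subset as well and keep the results in a list. The following is a rough sketch, not the only way to do it: run_epochs and FakeEstimator are hypothetical names, and the fake class only stands in for a real tf.estimator.Estimator so that the snippet runs without TensorFlow. It relies on evaluate returning a dict that contains "loss" and "global_step", and on the optional name argument of evaluate, which to my knowledge keeps the two evaluations separate in TensorBoard.

```python
def run_epochs(estimator, train_input_fn, eval_input_fn,
               num_epochs, steps_train, steps_eval):
    """Alternate training and evaluation, recording the loss after each epoch."""
    history = []
    for epoch in range(1, num_epochs + 1):
        estimator.train(input_fn=train_input_fn, steps=steps_train)
        # Evaluate on both subsets so that the two curves are comparable.
        train_metrics = estimator.evaluate(
            input_fn=train_input_fn, steps=steps_eval, name="train")
        val_metrics = estimator.evaluate(
            input_fn=eval_input_fn, steps=steps_eval, name="val")
        history.append({
            "epoch": epoch,
            "global_step": int(val_metrics["global_step"]),
            "train_loss": float(train_metrics["loss"]),
            "val_loss": float(val_metrics["loss"]),
        })
    return history


class FakeEstimator:
    """Hypothetical stand-in so the sketch runs without TensorFlow."""

    def __init__(self):
        self.global_step = 0

    def train(self, input_fn, steps):
        self.global_step += steps
        return self

    def evaluate(self, input_fn, steps, name=None):
        # Pretend that the loss shrinks as training progresses.
        return {"loss": 1.0 / (1 + self.global_step),
                "global_step": self.global_step}


history = run_epochs(FakeEstimator(), train_input_fn=None, eval_input_fn=None,
                     num_epochs=3, steps_train=10, steps_eval=5)
for row in history:
    print(row)
```

With a real estimator, fcn and the two functools.partial input functions would be passed in place of the fake object and the None placeholders, and the history list can be plotted at any point during training.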

Inside the model function, this is what the EVAL branch returns, as shown in many tf.estimator tutorials:

if mode == tf.estimator.ModeKeys.EVAL:
    eval_metric_ops = {"rmse": (rmse, rmse_op)}
    return tf.estimator.EstimatorSpec(mode, loss=loss, eval_metric_ops=eval_metric_ops)
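
Because evaluation runs over many batches, a metric such as the rmse above is a pair of a value and an update op: the update op accumulates running totals across batches, and the value is computed from those totals at the end. The plain-Python sketch below illustrates that idea only; StreamingRMSE is a hypothetical class, not TensorFlow code, and the batches are made-up numbers. It also shows why averaging per-batch RMSE values would give a different answer.

```python
import math


class StreamingRMSE:
    """Accumulates squared error over batches, then takes one square root."""

    def __init__(self):
        self.total_squared_error = 0.0
        self.count = 0

    def update(self, labels, predictions):
        for y, p in zip(labels, predictions):
            self.total_squared_error += (y - p) ** 2
            self.count += 1

    def result(self):
        return math.sqrt(self.total_squared_error / self.count)


def batch_rmse(labels, predictions):
    """RMSE of a single batch, for comparison."""
    squared = [(y - p) ** 2 for y, p in zip(labels, predictions)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up (labels, predictions) batches of unequal size.
batches = [([0.0, 0.0], [3.0, 4.0]), ([0.0], [0.0])]

metric = StreamingRMSE()
for labels, predictions in batches:
    metric.update(labels, predictions)

mean_of_batch_rmse = sum(batch_rmse(l, p) for l, p in batches) / len(batches)

print(metric.result())       # RMSE over all three examples
print(mean_of_batch_rmse)    # differs from the streaming result
```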
 
I don't know if this answers your question, but this is how I do it.
The train_and_evaluate function may be more commonly used for this purpose. My problem with that function was that the start time for evaluation is set in seconds [1], while I prefer to set it in steps.
If there is a better way of doing it, I am willing to learn it.

Bests,
Puren
