Hi folks,
I would be grateful if someone could verify whether my calculations are correct:
* Let's say I have implemented a custom loss function that is a weighted sum of a cross-entropy loss (with no reduction) and an MSE-based pixel loss.
* Now, in order to use this inside distributed training, here's what I am doing:
* Calculate the loss for each replica.
* Scale the loss with tf.nn.compute_average_loss, passing the global batch size.
Notes:
* I am implementing my training logic by overriding train_step (refer here).
* The labels and the predictions that go into the cross-entropy loss are multi-dimensional. Hence, I am following what is suggested in the last point of the section "How to do this in TensorFlow?" of this guide. The final shape of the output generated by this is (replica_batch_size, 16, 16).
* The MSE-based pixel loss returns an output of shape (replica_batch_size, 256, 256).
* To make the addition compatible, I take the mean of each loss term, add the two, and then scale the result with tf.nn.compute_average_loss. In code, it looks like so:
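A minimal NumPy sketch of this arithmetic, not the actual train_step code. Everything in it is an assumption made for illustration: NumPy stands in for TensorFlow, the local compute_average_loss helper only mimics what tf.nn.compute_average_loss does (sum of the per-example losses divided by the global batch size), the loss maps are random numbers, and the weights, replica count, and batch sizes are hypothetical. It also assumes the per-replica results are summed across replicas and that every replica sees the same number of examples.

```python
import numpy as np


def compute_average_loss(per_example_loss, global_batch_size):
    # Stand-in for tf.nn.compute_average_loss (assumption: no sample weights):
    # sum the per-example losses and divide by the GLOBAL batch size.
    return np.sum(per_example_loss) / global_batch_size


rng = np.random.default_rng(0)

# Hypothetical setup: 2 replicas with 4 examples each.
num_replicas, replica_batch_size = 2, 4
global_batch_size = num_replicas * replica_batch_size
ce_weight, mse_weight = 1.0, 0.5  # hypothetical loss weights

# Random stand-ins for the unreduced loss maps described above.
ce_maps = rng.random((global_batch_size, 16, 16))     # cross-entropy, no reduction
mse_maps = rng.random((global_batch_size, 256, 256))  # MSE-based pixel loss


def per_example_loss(ce, mse):
    # Average over the spatial axes only, keeping the batch axis,
    # so both terms have shape (batch,) and can be added.
    return ce_weight * ce.mean(axis=(1, 2)) + mse_weight * mse.mean(axis=(1, 2))


# Reference: mean loss over the whole global batch on a single device.
reference = per_example_loss(ce_maps, mse_maps).mean()

# Variant A: keep the batch axis, then scale by the global batch size.
# Variant B: reduce to a scalar per replica first, then scale.
variant_a = 0.0
variant_b = 0.0
for r in range(num_replicas):
    rows = slice(r * replica_batch_size, (r + 1) * replica_batch_size)
    losses = per_example_loss(ce_maps[rows], mse_maps[rows])
    variant_a += compute_average_loss(losses, global_batch_size)
    variant_b += compute_average_loss(losses.mean(), global_batch_size)

print("reference:", reference)
print("variant A:", variant_a)
print("variant B:", variant_b)
```

Under these assumptions, variant A matches the single-device reference, while variant B comes out smaller by a factor of replica_batch_size, because the batch dimension gets averaged once by the mean and then divided again by the global batch size. So which axes the means are taken over before calling tf.nn.compute_average_loss is probably the part worth double-checking.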
Thanks in advance for your time.