TPU vs GPU: loss is higher on TPU

Gil Motta

Nov 16, 2020, 1:25:18 AM
to TPU Users

I'm training an SSD model and testing on Colab GPU and TPU at the same time.

The same dataset is on Google Storage for TPU and on Google Drive for GPU.

I noticed that at 62,000 steps the loss on GPU is lower than the loss on TPU.

The loss averages 0.17 on GPU and 0.27 on TPU.

The TPU is faster: my per-step time averages 0.10s, against 0.62s on the GPU.

So why is the TPU showing a worse loss?


Russell Power

Nov 16, 2020, 11:56:48 AM
to Gil Motta, TPU Users
Changes in loss/precision tend to be model-dependent. The TPU might be using lower precision for a portion of the network, and that could be causing the increase in loss. For object detection models like SSD, you will want to make sure you aren't using bfloat16 for the final regression layer, as that can cause precision problems.
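
To give a feel for the precision point above, here is a minimal sketch in plain Python (no TensorFlow needed) that rounds a few values to bfloat16 and prints the relative error. The helper `to_bfloat16` and the sample offsets are made up for illustration; they are not part of any TPU or object detection API, and the numbers do not come from a real model.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value (finite inputs only)."""
    # View the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits; round to nearest, ties to even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    rounded = ((bits + bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", rounded))[0]


# Hypothetical box-regression outputs, not taken from any real model.
offsets = [0.1234, -0.0567, 1.0039, 0.00421]

for value in offsets:
    approx = to_bfloat16(value)
    rel_err = abs(approx - value) / abs(value)
    print(f"{value:+.5f} -> {approx:+.5f}  relative error {rel_err:.2e}")
```

bfloat16 stores only 7 explicit significand bits (8 with the implicit leading bit), so a single rounding can be off by up to about 0.4% of the value, while float32 rounding error is on the order of 6e-8 relative. For a layer that regresses box coordinates directly, an error of that size may show up in the loss.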


Gil Motta

Nov 16, 2020, 8:50:38 PM
to TPU Users, Russell Power, Gil Motta
Hi Russell,

Thanks for the suggestion, but I have no idea where to look for bfloat16 in the final regression layer. Would you be able to provide some guidance?

My training on TPU seems to have a high loss. After 200K steps my results were as follows:
I1116 10:50:06.300898 140649977771904] Step 200000 per-step time 0.116s loss=0.139  

My goal is to get the loss below 0.07, but I don't know how to achieve that.
