[best-practice] Loss reduction in Estimator with MirroredStrategy

jonath...@gmail.com

Jun 7, 2019, 4:02:52 PM

to TensorFlow Community Testing

Hello,

I want to get an idea of the correct practice for loss reduction using Estimators with MirroredStrategy.

The distributed training guide (as well as this supplemental guide) specifies that custom training loops should use tf.reduce_sum, followed by division by GLOBAL_BATCH_SIZE. However, neither specifies the correct reduction method for loss functions in Estimator.

I tried to find a suitable practice by looking at a couple of official code examples, which both use tf.keras.losses. This module defaults to SUM_OVER_BATCH_SIZE, but it is not immediately obvious from the code whether "batch size" refers to the per-replica or the global batch size. Besides, for the sake of transparency, I would prefer not to use tf.keras.losses when defining custom loss functions (GAN hinge loss, for example).

My question is this:

When defining a model training scheme in tf.estimator, should I use tf.reduce_mean(per_example_loss), or tf.reduce_sum(per_example_loss) * (1 / GLOBAL_BATCH_SIZE) ?
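
To make the difference concrete, here is a minimal plain-Python sketch (no TensorFlow needed) of the two candidate reductions applied to one replica's shard. The replica count, batch sizes, logit values and the helper hinge_loss_real are all made up for illustration; only the two reduction formulas come from the question above.

```python
# Hypothetical setup: 2 replicas with 4 examples each, so GLOBAL_BATCH_SIZE = 8.
NUM_REPLICAS = 2
PER_REPLICA_BATCH_SIZE = 4
GLOBAL_BATCH_SIZE = NUM_REPLICAS * PER_REPLICA_BATCH_SIZE


def hinge_loss_real(logits):
    """Per-example discriminator hinge loss on real samples: max(0, 1 - logit)."""
    return [max(0.0, 1.0 - x) for x in logits]


# Discriminator logits seen by ONE replica (made-up values).
per_example_loss = hinge_loss_real([0.5, 2.0, -1.0, 0.0])

# Option A: plain-Python analogue of tf.reduce_mean(per_example_loss).
loss_mean = sum(per_example_loss) / len(per_example_loss)

# Option B: analogue of tf.reduce_sum(per_example_loss) * (1 / GLOBAL_BATCH_SIZE).
loss_sum_over_global = sum(per_example_loss) * (1.0 / GLOBAL_BATCH_SIZE)

print(loss_mean)             # 0.875
print(loss_sum_over_global)  # 0.4375
```

On a single replica the two options differ by exactly a factor of NUM_REPLICAS, which is why the choice matters once the per-replica values are combined.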

Thanks, and apologies if this is not the best place to ask.

-Jon

Pavithra Vijay

Jun 18, 2019, 3:26:52 PM

to TensorFlow Community Testing, jonath...@gmail.com

Hello,

The loss reduction to use with Estimator will depend on the optimizer you are using.

In TensorFlow 1.x with Optimizer V1, the optimizer takes care of scaling per-replica loss values. In this case, you need to reduce the loss using the SUM_OVER_BATCH_SIZE / tf.reduce_mean reduction. In TensorFlow 2.0 with Optimizer V2 and custom Estimators, you have to follow the same process as for custom training loops (use tf.reduce_sum, followed by division by GLOBAL_BATCH_SIZE).
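
The arithmetic behind the two conventions can be sketched in plain Python. This is only an illustration under stated assumptions: the per-example losses are made up, and the V1 "scaling" is modelled as a multiplication by 1/num_replicas before per-replica contributions are summed, which is one reading of the description above rather than a copy of the optimizer's code.

```python
# Hypothetical per-example losses on two replicas (global batch of 8).
replica_losses = [
    [0.5, 0.0, 2.0, 1.0],  # shard seen by replica 0
    [1.5, 0.5, 0.0, 3.0],  # shard seen by replica 1
]
num_replicas = len(replica_losses)
global_batch_size = sum(len(shard) for shard in replica_losses)

# Reference value: the mean loss over the whole global batch.
global_mean = sum(sum(shard) for shard in replica_losses) / global_batch_size

# Optimizer V1 convention: each replica reports a mean over its own shard,
# and the optimizer (assumed here) scales it by 1 / num_replicas.
v1_total = sum(
    (sum(shard) / len(shard)) / num_replicas for shard in replica_losses
)

# Optimizer V2 convention: each replica reports sum / GLOBAL_BATCH_SIZE,
# and the contributions are summed with no further scaling.
v2_total = sum(sum(shard) / global_batch_size for shard in replica_losses)

# Mixing the conventions: per-replica means summed WITHOUT any scaling
# overshoot the global mean by a factor of num_replicas.
unscaled_means_total = sum(sum(shard) / len(shard) for shard in replica_losses)

print(global_mean, v1_total, v2_total, unscaled_means_total)
# 1.0625 1.0625 1.0625 2.125
```

Both conventions recover the global-batch mean when the per-replica batch sizes are equal; the mismatch only appears when a per-replica mean is combined with an optimizer that does not rescale it.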

We are in the process of writing a guide with this information.

Thank you,

Pavithra

jonath...@gmail.com

Jun 18, 2019, 4:03:04 PM

to TensorFlow Community Testing, jonath...@gmail.com

Thank you, Pavithra, that answers my question fully. I am glad to hear you are working on an Estimator v2 guide, and I'm hopeful that the Estimator API will continue to be supported and developed throughout 2.x!

Best,
Jon

Fabien Tarrade

Jul 26, 2019, 4:07:32 PM

to Pavithra Vijay, TensorFlow Community Testing, jonath...@gmail.com

Hi Pavithra,

Is there any update on the guide, or a blog post or Colab, on how to build a custom Estimator with TensorFlow 2.0?

I am still having trouble getting a full and complete example working.

For example:
- not using tf.compat.v1 (for example tf.keras.optimizers.Adam, optimizer.minimize ..)
   https://stackoverflow.com/questions/57134808/tf-keras-optimizers-adam-with-tf-estimator-model-in-tensorflow-2-0-beta-is-crash
- export the model:
   https://github.com/tensorflow/tensorflow/issues/27345
   It seems that I am almost there, but I need to adapt the tf.estimator.export.ServingInputReceiver

Having a complete example in the documentation would be great. So far it only covers basic things (it still uses tf.compat.v1, has no model exporter, no tf.summary.scalar, ...).

I tested again this morning with the latest nightly. I don't know whether some parts of my code are still wrong or whether some components of Estimator are still not fully ready.

Thanks a lot
Cheers
Fabien

--

Dr. Fabien Tarrade

Senior Data Scientist at AXA

I am a senior Data Scientist at AXA with the mission of helping AXA become a data-driven organisation by using advanced analytics and Big Data.

I have over 10 years of experience in the management of large projects, the processing, modelling and statistical treatment of large volumes of experimental data (up to 10 petabytes), as well as the development and maintenance of advanced and complex computer programs.

Zurich, Switzerland