model:training() gives better results than model:evaluate()


Kiran Vaidhya

Oct 11, 2016, 6:50:57 AM
to torch7
Hi,

I'm training a fully convolutional U-Net with SpatialBatchNormalization and SpatialDropout layers in my network. model:training() consistently gives better predictions than model:evaluate() on my semantic segmentation tasks. Why is that? I switched off SpatialDropout and still saw the same behavior. Could it be a bug in the SpatialBatchNormalization module?

soumith

Oct 11, 2016, 11:17:07 AM
to torch7 on behalf of Kiran Vaidhya
Run :forward with 100 mini-batches in training mode, and then switch to evaluate mode. The batchnorm will then have a much better estimate of running_mean and running_var for evaluate mode. That's probably the difference.
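
Something along these lines (a minimal sketch; nextBatch() is a hypothetical placeholder for however you load a mini-batch):

model:training()             -- batchnorm uses batch statistics and updates its running estimates
for i = 1, 100 do
   local input = nextBatch() -- hypothetical mini-batch loader
   model:forward(input)      -- no backward needed; only the running stats change
end
model:evaluate()             -- batchnorm now uses the refreshed running_mean/running_var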


Kiran Vaidhya

Oct 11, 2016, 11:50:30 AM
to torch7
Using large mini-batches is practically impossible when training deep U-Nets on a Titan X, if I'm not mistaken. Is there a workaround for this? Thanks for your help.



soumith

Oct 11, 2016, 12:56:23 PM
to torch7 on behalf of Kiran Vaidhya
I meant: run your network for 100 iterations with the same mini-batch size you have now.


Kiran Vaidhya

Oct 12, 2016, 1:48:23 AM
to torch7
Do you mean that I should run 100 iterations of :forward/:backward and only then update the weights, so that batchnorm gets better estimates of running_mean and running_var?

If that's the case, I assume the updates will be much smoother and I can train with a slightly higher learning rate to compensate for the lower frequency of weight updates.
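
Something like this rough sketch is what I have in mind (model, criterion, learningRate, and nextBatch() stand in for my actual setup):

local params, gradParams = model:getParameters()
local N = 100                         -- accumulate over N mini-batches
local learningRate = 0.01             -- placeholder value
gradParams:zero()
for i = 1, N do
   local input, target = nextBatch()  -- hypothetical mini-batch loader
   local output = model:forward(input)
   criterion:forward(output, target)
   local gradOutput = criterion:backward(output, target)
   model:backward(input, gradOutput)  -- gradients accumulate into gradParams
end
gradParams:div(N)                     -- average the accumulated gradients
params:add(-learningRate, gradParams) -- one plain-SGD step for N mini-batches
gradParams:zero()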



Kiran Vaidhya

Oct 12, 2016, 1:58:36 AM
to torch7
Or do I just take a dummy set of 100 mini-batches and do :forward in :training mode to get the estimates before I switch to :evaluate?

Kiran Vaidhya

Oct 12, 2016, 3:26:50 AM
to torch7
https://github.com/torch/nn/blob/master/BatchNormalization.lua

The running_mean and running_var don't get cleared even when I call :clearState(). Given that, I'm already calling :forward throughout the entire training epoch. Won't the estimates already be good after the first epoch?



soumith

Oct 12, 2016, 9:34:09 AM
to torch7 on behalf of Kiran Vaidhya
> Or do I just take a dummy set of 100 mini-batches and do :forward in :training mode to get the estimates before I switch to :evaluate?

Yes, this is what I meant.

Even though running_mean / running_var don't get cleared, they are updated with a momentum term, so the estimates keep drifting toward the most recent batches.
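
Schematically, each training-mode :forward updates the running estimates as an exponential moving average (a sketch assuming the default momentum of 0.1 in BatchNormalization.lua; an illustration, not code you call directly):

local momentum = 0.1
local function updateRunning(running, batchStat)
   -- running <- momentum * batch_stat + (1 - momentum) * running
   return momentum * batchStat + (1 - momentum) * running
end
-- after ~100 fresh batches, the old estimate's weight has decayed to
-- about (1 - 0.1)^100, so a short calibration pass dominates the stats.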


Kiran Vaidhya

Oct 22, 2016, 9:58:09 AM
to torch7
model:training() still gives much better results; the calibration pass doesn't seem to improve things. Is this akin to the virtual batch normalization mentioned in the "Improved Techniques for Training GANs" paper?


