Batch Normalisation In Convolutional Neural Network


akshata bhat

Jul 24, 2016, 12:04:18 PM
to Keras-users

I am a newbie to convolutional neural networks and only have a basic idea of feature maps and how convolution is applied to images to extract features. I would be glad to learn some details about applying batch normalisation in a CNN.

I read this paper https://arxiv.org/pdf/1502.03167v3.pdf and could understand the BN algorithm applied to data, but near the end they mention that a slight modification is required when applying it to a CNN.

For convolutional layers, we additionally want the normalization to obey the convolutional property – so that different elements of the same feature map, at different locations, are normalized in the same way. To achieve this, we jointly normalize all the activations in a mini-batch, over all locations. In Alg. 1, we let B be the set of all values in a feature map across both the elements of a mini-batch and spatial locations – so for a mini-batch of size m and feature maps of size p × q, we use the effective mini-batch of size m′ = |B| = m · pq. We learn a pair of parameters γ(k) and β(k) per feature map, rather than per activation. Alg. 2 is modified similarly, so that during inference the BN transform applies the same linear transformation to each activation in a given feature map.

I am totally confused when they say "so that different elements of the same feature map, at different locations, are normalized in the same way".

I know what feature maps are, and that the different elements are the values in each feature map. But I could not understand what "location" or "spatial location" means here.

I could not understand the following sentence at all: "In Alg. 1, we let B be the set of all values in a feature map across both the elements of a mini-batch and spatial locations".

I would be glad if someone could elaborate and explain it to me in much simpler terms.

Thanks in advance.

Daπid

Jul 24, 2016, 1:17:07 PM
to akshata bhat, Keras-users
On 24 July 2016 at 18:04, akshata bhat <akshat...@gmail.com> wrote:
> for a mini-batch of size m and feature maps of size p × q, we use the effec-
> tive mini-batch of size m′ = |B| = m · pq.

This is how I understand it. A CNN uses a small window (filter), shared
across the whole input image, whose output is what they call a feature
map, and this sharing has the important property of being
translationally invariant. Batch normalisation per pixel would break
this invariance, as it would apply a different normalisation at each
location. So, what the authors suggest instead is to apply the same
normalisation to the whole channel: use the same mean and std for every
position in it.
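To make that concrete, here is a minimal numpy sketch of per-channel batch normalisation (an illustration of the idea, not Keras's actual implementation; the array layout and variable names are assumptions):

```python
import numpy as np

# Hypothetical activations: mini-batch of 20 images, 10 channels (feature
# maps), each 5x5 spatially -- NCHW layout, chosen for illustration.
x = np.random.randn(20, 10, 5, 5)

# Per-channel statistics: mean and variance over the batch AND the spatial
# axes (0, 2, 3), so every location in a given feature map is normalised
# with the same numbers.
mean = x.mean(axis=(0, 2, 3), keepdims=True)  # shape (1, 10, 1, 1)
var = x.var(axis=(0, 2, 3), keepdims=True)    # shape (1, 10, 1, 1)

eps = 1e-5
gamma = np.ones((1, 10, 1, 1))   # one learned pair (gamma, beta) per
beta = np.zeros((1, 10, 1, 1))   # feature map, as the paper describes
y = gamma * (x - mean) / np.sqrt(var + eps) + beta
```

With gamma = 1 and beta = 0, each of the 10 channels of `y` has (approximately) zero mean and unit variance, computed from 20 · 5 · 5 = 500 values per channel.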

akshata bhat

Jul 25, 2016, 7:38:08 AM
to Keras-users, akshat...@gmail.com
I believe that a feature map is the output you get after convolving a kernel or filter with an image.

I am still unclear about what they term as different locations. 

Just wanted to check my understanding with an example. If we have 10 feature maps of size 5x5 and a mini-batch size of 20, do we normalise every feature map individually? So the new mini-batch size would be 20 * 25 = 500 (25 because each feature map is 5x5). I am confused about whether each individual feature map is normalised with its own mean and variance, or whether the mean and variance are shared across all 10 feature maps. If the latter is the case, what will the new effective mini-batch size be?

Waiting for your reply.

Daπid

Jul 25, 2016, 8:12:46 AM
to akshata bhat, Keras-users
On 25 July 2016 at 13:38, akshata bhat <akshat...@gmail.com> wrote:
> I believe that a feature map is the output you get after convolving
> a kernel or filter with an image.

Yes, for the whole image.

If your input is an image of 50 x 50, your convolution is of size 5 and
the border mode is "valid", your first feature map (the whole image
convolved with the filter) would be 46 x 46 (since 50 - 5 + 1 = 46);
after a second such convolution, 42 x 42, and so on...
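The size arithmetic above can be sketched in a few lines; for a "valid" convolution the rule is output = input − kernel + 1 (a generic illustration, not any specific Keras API):

```python
# Output size of a "valid" convolution: no padding, so the window only
# visits positions where it fits entirely inside the input.
def valid_out(size, kernel):
    return size - kernel + 1

s = 50                       # 50x50 input image
sizes = []
for _ in range(3):           # three stacked convolutions with 5x5 kernels
    s = valid_out(s, 5)
    sizes.append(s)
print(sizes)                 # [46, 42, 38]
```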

akshata bhat

Jul 25, 2016, 8:25:39 AM
to Keras-users, akshat...@gmail.com
Take an image of 50 x 50, and say we use 7 filters of size 5 x 5. Then the output will be 46 x 46 with depth 7.
Which means we have 7 feature maps.

Example: we are using a mini-batch of 10 (meaning 10 images).

Based on this sentence "In Alg. 1, we let B be the set of all values in a feature map across both the elements of a mini-batch and spatial locations" 
As per our example, will our new mini-batch size be 10 x (46 x 46)?

i.e. are we trying to compute the mean and variance for every individual feature map, or across all the elements of all 7 feature maps together?
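Applying the paper's formula m′ = m · pq to this example: each of the 7 feature maps gets its OWN mean and variance, but each is computed over all 10 images and all 46 × 46 positions. A small numpy sketch (the data and layout are illustrative assumptions):

```python
import numpy as np

# Mini-batch of 10 images, 7 feature maps of 46x46 each (NCHW layout).
x = np.random.randn(10, 7, 46, 46)

# One mean and one variance PER feature map, pooled over the batch
# dimension and both spatial dimensions -- not shared across channels.
per_channel_mean = x.mean(axis=(0, 2, 3))   # 7 means
per_channel_var = x.var(axis=(0, 2, 3))     # 7 variances

# Effective mini-batch size per channel: m' = m * p * q.
effective_batch = x.shape[0] * x.shape[2] * x.shape[3]
print(effective_batch)                      # 10 * 46 * 46 = 21160
```

So the answer to the question above: normalisation is per individual feature map, and the effective mini-batch size for each one is 10 × 46 × 46 = 21160.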