BatchNormalization with functional API


hanan

Jan 19, 2017, 7:47:56 AM
to Keras-users
hi

I'm trying to add Batch Normalization to my model.
The model uses the functional API, as I have to share layers.
I tried changing:
layer=Dense(d, activation='relu', init='glorot_normal', bias=True)
to:
layer=Dense(d, init='glorot_normal', bias=True)
layer=BatchNormalization()(layer)
layer=Activation('relu')(layer)

I got exceptions saying that Keras is expecting a "Keras tensor".
I tried adding dimension parameters to the layer declarations, and even tried adding the 'mode' and 'axis' parameters to the BatchNormalization, but it does not seem to work.

Is it possible to use BatchNormalization with the functional API? (I did not find such an example.)
If so, what is the correct syntax?

thanks



yuanyu...@student.uclouvain.be

Jan 19, 2017, 8:34:09 AM
to Keras-users

Maybe something like this:

x = keras.layers.Input(x_train.shape[1:])
layer=Dense(d, init='glorot_normal', bias=True)(x)
layer=BatchNormalization()(layer)
layer=Activation('relu')(layer)

good luck

h.ros...@gmail.com

Jan 19, 2017, 10:30:58 AM
to Keras-users, yuanyu...@student.uclouvain.be
I can't use that suggestion because I have two inputs, both of which are used with these layers:
x = Input(shape=(input_dim,), name='x')
y = Input(shape=(input_dim,), name='y')
h1_dim = ....
wx_h1 = Dense(h1_dim, activation='relu', init='glorot_normal', bias=True, name='wx_h1')
wy_h1 = Dense(h1_dim, activation='relu', init='glorot_normal', bias=True, name='wy_h1')
h1_1 = merge([wx_h1(x), wy_h1(y)], mode='sum', name='h1_1')
h1_2 = merge([wy_h1(x), wx_h1(y)], mode='sum', name='h1_2')

As you can see, both wx_h1 and wy_h1 are shared layers; they are not connected to a specific input layer.
I wanted to add batch normalization after these layers.



On Thursday, January 19, 2017 at 3:34:09 PM UTC+2, yuanyu...@student.uclouvain.be wrote:

Matias Valdenegro

Jan 19, 2017, 10:41:09 AM
to keras...@googlegroups.com

I don't understand, layers have to be connected to some input (except output layers), so what you said is not possible. What are the inputs to these Dense layers?

h.ros...@gmail.com

Jan 19, 2017, 10:55:37 AM
to Keras-users
The lines of code are the beginning of the model.
I have more layers connected (after the h1_1 and h1_2 layers),
and finally I have an output layer and the model is finished:
output_layer ...
model = Model(input=[x, y], output=output_layer)

So you can see the inputs are perfectly well defined by the x and y Input() layers.
The model works very well as is.
I was looking to improve its training speed and robustness with BatchNormalization.

So the question remains:
How do I insert BatchNormalization layers in my model?

thanks


On Thursday, January 19, 2017 at 5:41:09 PM UTC+2, Matias Valdenegro wrote:

Matias Valdenegro

Jan 19, 2017, 11:30:55 AM
to keras...@googlegroups.com

Please provide the complete model; you are only giving us parts, and it is very hard to see the whole picture. I asked how the Dense layers are connected to the input, and I still don't understand that.

 


Daπid

Jan 19, 2017, 11:38:33 AM
to hanan, Keras-users
On 19 January 2017 at 13:47, hanan <h.ros...@gmail.com> wrote:
> layer=Dense(d, init='glorot_normal', bias=True)

You must give that layer an input, either an Input layer or the output
of any other layer.

layer=Dense(d, init='glorot_normal', bias=True)(prev_layer)

h.ros...@gmail.com

Jan 19, 2017, 12:09:20 PM
to Keras-users, h.ros...@gmail.com
Here is the model (slightly trimmed down, but it gives the full picture):
x = Input(shape=(5,), name='x')
y = Input(shape=(5,), name='y')
h_dim = 64
wx_h = Dense(h_dim, activation='relu', init='glorot_normal', bias=True, name='wx_h')
wy_h = Dense(h_dim, activation='relu', init='glorot_normal', bias=True, name='wy_h')
h_1 = Dropout(0.1)(merge([wx_h(x), wy_h(y)], mode='sum', name='h_1'))
h_2 = Dropout(0.1)(merge([wy_h(x), wx_h(y)], mode='sum', name='h_2'))
wh1_o1 = Dense(1, activation='tanh', init='glorot_uniform', bias=True, name='wh1_o1')
wh2_o1 = Dense(1, activation='tanh', init='glorot_uniform', bias=True, name='wh2_o1')
o1 = merge([wh1_o1(h_1), wh2_o1(h_2)], mode='sum', name='out_1')
o2 = merge([wh2_o1(h_1), wh1_o1(h_2)], mode='sum', name='out_2')
output = merge([o1, o2], mode=softmax2, output_shape=(2,), name='output')
model = Model(input=[x, y], output=output)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['categorical_accuracy'])
The inputs to the model are two vectors (x, y) of the same dimensionality and the same features.
The output gives two probabilities using the following softmax function (passed as the mode argument of the merge layer):
def softmax2(two_tensors):
    return K.softmax(K.concatenate(two_tensors))
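(For intuition, a tiny standalone sketch with made-up numbers, plain NumPy only, of what softmax2 does to the two merged scores:)

import numpy as np

# Made-up scores, for illustration only: softmax2 concatenates the two
# scalar outputs and applies a softmax across them, so the model's two
# outputs always sum to 1.
s1, s2 = 1.2, -0.3
e = np.exp([s1, s2])
print(e / e.sum())   # -> approximately [0.82, 0.18]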

Now, can someone help me add a BatchNormalization layer for the hidden layer?
I tried modifying the code like this:
wx_h = Dense(h_dim, init='glorot_normal', bias=True, name='wx_h')
wx_h = BatchNormalization()(wx_h)
wx_h = Activation('relu')(wx_h)
wy_h = Dense(h_dim, init='glorot_normal', bias=True, name='wy_h')
wy_h = BatchNormalization()(wy_h)
wy_h = Activation('relu')(wy_h)

but received the following error:
Exception: ('Not a Keras tensor:', <keras.layers.core.Dense object at 0x......>)




On Thursday, January 19, 2017 at 6:38:33 PM UTC+2, David Menéndez Hurtado wrote:

Matias Valdenegro

Jan 19, 2017, 12:23:03 PM
to keras...@googlegroups.com

OK, the code is slightly obfuscated because you instantiate the layers and "call" them later; that is fine.

 

To use Batch Normalization, just replace your dropout layers with Batch Normalization, like:

 

h_1 = BatchNormalization()(merge([wx_h(x), wy_h(y)], mode='sum', name='h_1'))
h_2 = BatchNormalization()(merge([wy_h(x), wx_h(y)], mode='sum', name='h_2'))


h.ros...@gmail.com

Jan 19, 2017, 2:10:25 PM
to Keras-users
I read that it is preferable to add the batch normalization before the activation.
That is the reason I wanted to add them to the individual Dense layers that make up the merged hidden layer, and to apply the activation after the normalization.

Is there any way to accomplish that?

Matias Valdenegro

Jan 19, 2017, 2:38:14 PM
to keras...@googlegroups.com

Yes, of course that can be done; you just need to refactor your code a little:

 

x = Input(shape=(5,), name='x')
y = Input(shape=(5,), name='y')
h_dim = 64

wx_h = Dense(h_dim, activation='relu', init='glorot_normal', bias=True, name='wx_h')
wy_h = Dense(h_dim, activation='relu', init='glorot_normal', bias=True, name='wy_h')

wx_hx = BatchNormalization()(wx_h(x))
wx_hy = BatchNormalization()(wx_h(y))
wy_hx = BatchNormalization()(wy_h(x))
wy_hy = BatchNormalization()(wy_h(y))

h_1 = Dropout(0.1)(merge([wx_hx, wy_hy], mode='sum', name='h_1'))
h_2 = Dropout(0.1)(merge([wy_hx, wx_hy], mode='sum', name='h_2'))

 

Also, maybe you should drop the Dropout layers; they are not needed if you use Batch Normalization.
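For completeness, a minimal sketch of that variant, reusing the wx_hx/wy_hy tensors defined above and simply dropping the Dropout wrappers:

# Same merges as above, with the Dropout wrappers removed.
h_1 = merge([wx_hx, wy_hy], mode='sum', name='h_1')
h_2 = merge([wy_hx, wx_hy], mode='sum', name='h_2')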

hanan

Jan 19, 2017, 5:50:21 PM
to Keras-users
thanks

Oddly, instead of speeding up the learning, it seems to learn more slowly, and even worse, it achieves lower accuracy.
I tried all combinations: with/without Dropout; BN before/after the Activation.

It seems that the combination of Dropout after BN after the Activation works best (as described in the link), but it is still inferior to the architecture without BN.

Any clue? Could it be that BN adds many parameters that have to be learned? Is it perhaps related to the BN arguments (mode/axis)? I tried modifying them, but it did not improve performance.
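(For reference, a rough back-of-the-envelope sketch, assuming h_dim = 64 as in the model above, of how many parameters each BatchNormalization layer adds; the exact totals can be read off model.summary():)

# In Keras 1.x, a BatchNormalization layer over a 64-feature axis carries
# gamma and beta (learned by backprop) plus running mean/std statistics.
h_dim = 64
trainable_per_bn = 2 * h_dim      # gamma + beta        -> 128
non_trainable_per_bn = 2 * h_dim  # running mean + std  -> 128
print(trainable_per_bn, non_trainable_per_bn)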

On Thursday, January 19, 2017 at 9:38:14 PM UTC+2, Matias Valdenegro wrote:

hanan

Jan 31, 2017, 9:13:03 AM
to Keras-users
hi Matias,

thanks for your reply.

I used your method, only to realize now that it breaks the symmetry of the network that is based on shared layers.
My initial model made sure that if [o1, o2] = net([x, y]) then [o2, o1] = net([y, x]), and that [0.5, 0.5] = net([x, x]).

That is no longer fulfilled when the batch normalization is applied separately to the sub-layers of the network.

Is there a way around that?
I would like to apply batch normalization to the whole merged layer, and not to each of its constituents.
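One possible workaround, just a sketch using the Keras 1.x API from this thread (not taken from the replies above): share a single BatchNormalization instance (and a single Activation) across both merged sums, so that swapping x and y only swaps h_1 and h_2 and the symmetry is preserved.

# Sketch: one shared BatchNormalization applied to both merged sums,
# so both branches use identical BN weights and statistics.
wx_h = Dense(h_dim, init='glorot_normal', bias=True, name='wx_h')   # no activation here
wy_h = Dense(h_dim, init='glorot_normal', bias=True, name='wy_h')

bn_h = BatchNormalization(name='bn_h')   # single instance, shared by h_1 and h_2
act_h = Activation('relu')

h_1 = act_h(bn_h(merge([wx_h(x), wy_h(y)], mode='sum', name='h_1')))
h_2 = act_h(bn_h(merge([wy_h(x), wx_h(y)], mode='sum', name='h_2')))

Because bn_h is the same layer object in both branches, net([y, x]) still just swaps h_1 and h_2, and net([x, x]) still produces identical hidden activations in both branches.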

thanks

Hanan



On Thursday, January 19, 2017 at 9:38:14 PM UTC+2, Matias Valdenegro wrote:

