Batch normalization and Weight normalization for 1d CNN


Obayogy Jaj

unread,
Jan 25, 2017, 2:12:25 AM1/25/17
to lasagne-users
Hi,

I want to implement a multi-layer 1d CNN with batch normalization [link] or weight normalization [1],
but I found that the authors' code does not run correctly for Conv1DLayer:

conv1 = Conv1DLayer(h, num_filters, filter_size, pad='same', nonlinearity=lasagne.nonlinearities.rectify)
conv1 = weight_norm(conv1)


  1. Could anyone give me some advice?
  2. Can batch_norm from Lasagne be used with Conv1DLayer?

thank you



[1] Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks

Jan Schlüter

unread,
Jan 26, 2017, 10:53:25 AM1/26/17
to lasagne-users
> I want to implement a multi-layer 1d CNN with batch normalization [link] or weight normalization [1],
> but I found that the authors' code does not run correctly for Conv1DLayer:
>
> conv1 = Conv1DLayer(h, num_filters, filter_size, pad='same', nonlinearity=lasagne.nonlinearities.rectify)
> conv1 = weight_norm(conv1)
>
> 1. Could anyone give me some advice?
The code of the authors does not handle 1d-convolution, but it's easy to extend. After https://github.com/openai/weightnorm/blob/55917c3/lasagne/nn.py#L240-L243, add:
elif incoming.W_param.ndim == 3:
    W_axes_to_sum = (1,2)
    W_dimshuffle_args = [0,'x','x']
Untested, but this will probably be enough.
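To make the broadcasting concrete, here is a plain-NumPy sketch of what that normalization computes for a 3-d conv1d weight of shape (num_filters, num_input_channels, filter_size); the variable names are illustrative, not taken from the weightnorm code:

```python
import numpy as np

# Hypothetical conv1d weight: (num_filters, num_input_channels, filter_size)
V = np.random.randn(8, 4, 3).astype(np.float32)
g = np.ones(8, dtype=np.float32)  # one learned scale per output filter

# W_axes_to_sum = (1, 2): one norm per output filter
norms = np.sqrt((V ** 2).sum(axis=(1, 2)))

# W_dimshuffle_args = [0, 'x', 'x']: reshape the per-filter scale to
# (num_filters, 1, 1) so it broadcasts over channels and filter taps
W = V * (g / norms)[:, np.newaxis, np.newaxis]

# Each filter of W now has norm g
print(np.allclose(np.sqrt((W ** 2).sum(axis=(1, 2))), g, atol=1e-5))  # True
```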

We should add weight normalization to Lasagne some time, but with a simpler implementation.
> 2. Can batch_norm from Lasagne be used with Conv1DLayer?
Yes. If you combine it with weight normalization, note that the authors recommend mean-only batch normalization, which you'll need to copy from their code as well.
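Mean-only batch normalization subtracts the per-channel batch mean but does not divide by the variance. A minimal NumPy sketch for conv1d activations of shape (batch, channels, length), with illustrative names:

```python
import numpy as np

# Activations from a conv1d layer: (batch, channels, length)
x = np.random.randn(16, 8, 20).astype(np.float32)
b = np.zeros(8, dtype=np.float32)  # learned bias, one per channel

# Per-channel mean, pooled over the batch and length axes;
# unlike full batch norm, no variance estimate is used
mu = x.mean(axis=(0, 2))

# Broadcast the subtraction and the bias over batch and length
y = x - mu[np.newaxis, :, np.newaxis] + b[np.newaxis, :, np.newaxis]

# After centering, each channel's batch mean equals its bias
print(np.allclose(y.mean(axis=(0, 2)), b, atol=1e-5))  # True
```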

Best, Jan

Obayogy Jaj

unread,
Jan 26, 2017, 6:26:47 PM1/26/17
to lasagne-users
hi, Jan

After adding that, it causes the error "ValueError: Input dimension mis-match" ...



On Friday, January 27, 2017 at 12:53:25 AM UTC+9, Jan Schlüter wrote:

Obayogy Jaj

unread,
Jan 26, 2017, 7:03:21 PM1/26/17
to lasagne-users
hi, Jan
I tested the code after adding what you said.

1) I found an error at https://github.com/openai/weightnorm/blob/55917c3/lasagne/nn.py#L259, in the addition of the input and the dimshuffled b.

2) It may be caused by https://github.com/openai/weightnorm/blob/55917c3/lasagne/nn.py#L225, where k = self.input_shape[1]. This k is the num_input_channels for a 2D convolution input (batch_size, num_input_channels, input_rows, input_columns),
but for conv1d our input shape is (n_batch, seq_len, n_dim), so there is no num_input_channels in the conv1d input.

3) After https://github.com/openai/weightnorm/blob/55917c3/lasagne/nn.py#L232, I added:

elif len(self.input_shape) == 3:
    self.axes_to_sum = (1,2,3)
    self.dimshuffle_args = ['x','x','x']

But this is still not correct ...
I don't know how to set and use dimshuffle correctly here.

thanks very much.

On Friday, January 27, 2017 at 12:53:25 AM UTC+9, Jan Schlüter wrote:
> I want to implement a multi-layer 1d CNN with batch normalization [link] or weight normalization [1]

Jan Schlüter

unread,
Jan 27, 2017, 5:01:25 AM1/27/17
to lasagne-users
Sorry, hadn't read the full code. Yes, this also needs to be adapted.

> 3) After https://github.com/openai/weightnorm/blob/55917c3/lasagne/nn.py#L232, I added:
> elif len(self.input_shape) == 3:
>     self.axes_to_sum = (1,2,3)
>     self.dimshuffle_args = ['x','x','x']

By comparison to 2D convolution, it should be (0, 2) and ['x', 0, 'x'].
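A quick NumPy check of why those values are the right ones (axes (1, 2, 3) would be out of range for a 3-d input, which only has axes 0, 1, 2):

```python
import numpy as np

# Conv1d input: (batch, channels, length)
x = np.random.randn(16, 8, 20)

# axes_to_sum = (0, 2): pool the statistics over batch and length,
# leaving one mean/std per channel
mean = x.std(axis=(0, 2)) * 0 + x.mean(axis=(0, 2))  # shape (8,)
std = x.std(axis=(0, 2))                             # shape (8,)

# dimshuffle_args = ['x', 0, 'x']: reshape to (1, channels, 1)
# so the per-channel statistics broadcast against x
y = (x - mean[None, :, None]) / std[None, :, None]

print(y.shape)  # (16, 8, 20)
```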


> for conv1d, our input shape is (n_batch, seq_len, n_dim), so there is no num_input_channels in the conv1d input

conv1d input can be interpreted as (n_batch, num_input_channels, input_rows). Note that if you have a dimension that may change in length (like seq_len), it should be the last one. The number of input channels should be kept constant; it's equivalent to the number of units in a dense layer (i.e., it indicates the number of features). The Conv1DLayer will learn a separate bias for each input channel, so it's important that the number of channels is fixed. And even if the sequence length is fixed, you will not want to learn a separate bias per time step, but a separate bias per feature. So if you need to combine Conv1DLayer and recurrent layers, you will need a DimshuffleLayer(..., (0, 2, 1)) in between.
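The layout conversion itself is just an axis transpose; sketched in plain NumPy (the shapes here are made up for illustration):

```python
import numpy as np

# Recurrent-layer layout: (batch, seq_len, features)
x_recurrent = np.zeros((16, 100, 8))

# DimshuffleLayer(..., (0, 2, 1)) is equivalent to this transpose,
# giving the (batch, channels, length) layout Conv1DLayer expects
x_conv = x_recurrent.transpose(0, 2, 1)

print(x_conv.shape)  # (16, 8, 100)
```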

Best, Jan