How could I transform a linear layer to conv1x1 layer in chainer?


eminar...@gmail.com

Apr 16, 2016, 1:22:33 AM4/16/16
to Chainer User Group
In Torch (t7), it can be done like this:

function convertLinear2Conv1x1(linmodule,in_size)
   local s_in = linmodule.weight:size(2)/(in_size[1]*in_size[2])
   local s_out = linmodule.weight:size(1)
   local convmodule = nn.SpatialConvolutionMM(s_in,s_out,in_size[1],in_size[2],1,1)
   convmodule.weight:copy(linmodule.weight)
   convmodule.bias:copy(linmodule.bias)
   return convmodule
end

How could I transform it in chainer?

I wrote something like this, but I don't know if it is right.
The input is 16x5x5 and the layer is fc(400 -> 16):
conv_fc1 = L.Convolution2D(16, 16, 1)
for x in xrange(conv_fc1.W.data.shape[0], 5):
    for y in xrange(conv_fc1.W.data.shape[1], 5):
        conv_fc1.W.data[x, y, :, :] = copy.copy(self.fc1.W.data[x:x+5, y:y+5])
conv_fc1.b.data = copy.copy(self.fc1.b.data)


I don't know whether it works correctly or not.
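For reference, here is a NumPy-only sketch of the weight bookkeeping for this case (fc(400 -> 16) fed by a 16x5x5 map). Note the kernel would need to be 5x5, not 1x1, since the fc layer consumes the whole map; the names and the tensordot stand-in for the convolution are illustrative, not Chainer API:

```python
import numpy as np

# fc(400 -> 16): weight (16, 400), fed by a 16x5x5 feature map.
rng = np.random.default_rng(0)
fc_W = rng.standard_normal((16, 400)).astype(np.float32)
fc_b = rng.standard_normal(16).astype(np.float32)

# Equivalent convolution weight: (out_ch, in_ch, kh, kw) = (16, 16, 5, 5).
# A plain reshape is enough; no element reordering is needed.
conv_W = fc_W.reshape(16, 16, 5, 5)

# Check on one input map: fc on the flattened map vs. the 5x5 convolution
# evaluated at its single valid position.
x = rng.standard_normal((16, 5, 5)).astype(np.float32)
fc_out = fc_W @ x.ravel() + fc_b
conv_out = np.tensordot(conv_W, x, axes=([1, 2, 3], [0, 1, 2])) + fc_b
print(np.allclose(fc_out, conv_out, atol=1e-4))  # True
```

The key point is that `x.ravel()` and the fc layer's internal flattening both use C order (channel, then row, then column), so the reshape lines the weights up with the right input elements.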

Kenta Oono

Apr 23, 2016, 7:31:36 PM4/23/16
to Chainer User Group, eminar...@gmail.com
Hi

Could you please clarify your question?

> nn.SpatialConvolutionMM(s_in,s_out,in_size[1],in_size[2],1,1)
I guess in_size here is (5, 5) (correct me if I am wrong). So what you create here is not a 1x1 convolution.

> input is 16x5x5
> output fc(400 -> 16)
There are two points I cannot follow:
* What does "input" indicate here?
* I thought you wanted to make a convolution, but "output" is a fully-connected layer.

Best
Kenta

On Saturday, April 16, 2016 at 14:22:33 UTC+9, eminar...@gmail.com wrote:

Narcissus Emi

Apr 24, 2016, 1:43:57 AM4/24/16
to Kenta Oono, Chainer User Group
Hi Kenta,

Thanks for your help; I don't think I made my question clear.
The idea is to translate a fully-connected layer into a convolution layer so that the input size is not restricted; this comes from the OverFeat sliding-window paper.
The network is trained with fully-connected layers and evaluated with convolution layers.
For example
Input(3x24x24) -> conv(32x5x5) -> max_pool -> conv(64x5x5) -> max_pool -> fc(576,128) -> fc(128,4)
At training time this classifies with fully-connected layers and outputs a 1x4 vector of probabilities.

But to scan a full image I would either need to do a full sliding-window pass, or transform the fully-connected layers into convolution layers so the input size is unrestricted; otherwise the fully-connected layer raises an error for a mismatched input size.

As the paper says, at evaluation time the fc(576,128) can be replaced by a conv(128x3x3) with stride 1, and its output piped into a conv(4x1x1).

The question is how to transform Chainer's Linear W/b into the corresponding convolution-layer weights, which have completely different shapes: (128, 576) versus (128, 64, 3, 3).
Currently I use chainer.functions reshape to directly inject the fc weights into a correspondingly shaped conv weight; I don't know whether this is valid (based on my tests it seems to work as expected).
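To illustrate the two replacements described above (fc(576,128) -> conv(128x3x3), fc(128,4) -> conv(4x1x1)) on an oversized input, here is a NumPy sketch; the naive loop convolution and the omitted biases are simplifications, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
fc1_W = rng.standard_normal((128, 576)).astype(np.float32)  # 576 = 64*3*3
fc2_W = rng.standard_normal((4, 128)).astype(np.float32)

# Reshape fc weights into convolution kernels (no reordering needed).
conv1_W = fc1_W.reshape(128, 64, 3, 3)
conv2_W = fc2_W.reshape(4, 128, 1, 1)

def conv_valid(x, W):
    """Naive 'valid' convolution, stride 1. x: (ic, H, W), W: (oc, ic, kh, kw)."""
    oc, ic, kh, kw = W.shape
    H, Wd = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    out = np.empty((oc, H, Wd), dtype=x.dtype)
    for i in range(H):
        for j in range(Wd):
            patch = x[:, i:i + kh, j:j + kw]
            out[:, i, j] = np.tensordot(W, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

# Oversized input: 64x5x5 instead of the training-time 64x3x3.
x = rng.standard_normal((64, 5, 5)).astype(np.float32)
scores = conv_valid(conv_valid(x, conv1_W), conv2_W)  # (4, 3, 3): a 4-vector per window

# Each spatial position matches the fc pipeline run on the corresponding 3x3 crop:
crop = x[:, 1:4, 1:4]
fc_scores = fc2_W @ (fc1_W @ crop.ravel())
print(np.allclose(scores[:, 1, 1], fc_scores, atol=1e-3))  # True
```

So the converted network scans all 3x3 windows of the larger input in one forward pass, which is exactly the OverFeat sliding-window trick.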
On Sun, Apr 24, 2016 at 7:31 Kenta Oono <oo...@preferred.jp> wrote:

Kenta Oono

Apr 24, 2016, 10:09:04 PM4/24/16
to Chainer User Group, eminar...@gmail.com
Hi

Thank you for your detailed explanation. I think I get the point.

I think what you did is appropriate. Direct assignment also works correctly, as long as both weights are C-contiguous (which is true as long as you do not edit the weight tensors manually):
conv.W[...] = fc.W[...]
where conv is an instance of Convolution2D and fc is an instance of Linear
(conv(128x3x3) and fc(576,128) in your notation).
So we do not need the reshape function, though I am not sure it is much more efficient.
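A small NumPy check of the C-contiguity point (a sketch of the claim, not Chainer code): reshaping a C-contiguous fc weight is a pure view, so the flat element order the convolution reads is exactly the fc order:

```python
import numpy as np

fc_W = np.arange(128 * 576, dtype=np.float32).reshape(128, 576)
conv_W = fc_W.reshape(128, 64, 3, 3)

print(fc_W.flags['C_CONTIGUOUS'])                    # True
print(np.shares_memory(fc_W, conv_W))                # True: reshape is a view, no copy
print(np.array_equal(conv_W.ravel(), fc_W.ravel()))  # True: no reordering
```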


TL;DR:
Here is the explanation of why this direct assignment works.
The crux is that the order in which a Convolution2D link's weight elements are multiplied against the input tensor is identical to that of a Linear link.
So we do not have to shuffle the weight tensor when converting from one to the other.

First, the shape of the input is (B, 64, 3, 3) (B is the batch size; the remaining order is # of channels, height, width).

If the input to a Linear link has more than 2 dimensions, it is flattened inside the link by collapsing all axes after the first.
So in your case the input shape (B, 64, 3, 3) is converted to (B, 576).
The Linear link then does a matrix multiplication with W of shape (128, 576) to get an output of shape (B, 128).

On the other hand, the weight tensor W of a Convolution2D link has shape (oc, ic, kh, kw), where oc is the number of output channels, ic the number of input channels, and kh and kw the height and width of the kernel, respectively.
To convert a tensor of shape (B, 64, 3, 3) to (B, 128, 1, 1), the weight tensor of the Convolution2D must have shape (128, 64, 3, 3).
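The shape bookkeeping above can be sketched in NumPy (batch B = 2; the convolution at its single valid position is written as a tensordot, standing in for what the framework would do):

```python
import numpy as np

rng = np.random.default_rng(2)
B = 2
x = rng.standard_normal((B, 64, 3, 3)).astype(np.float32)
W = rng.standard_normal((128, 576)).astype(np.float32)

# Linear link: flatten the trailing axes, then matrix-multiply.
lin_out = x.reshape(B, 576) @ W.T  # (B, 128)

# Convolution2D with weight (oc, ic, kh, kw) = (128, 64, 3, 3):
conv_W = W.reshape(128, 64, 3, 3)
conv_out = np.tensordot(x, conv_W, axes=([1, 2, 3], [1, 2, 3]))  # (B, 128)
conv_out = conv_out.reshape(B, 128, 1, 1)  # (B, 128, 1, 1)

print(np.allclose(lin_out, conv_out[:, :, 0, 0], atol=1e-3))  # True
```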

Best
Kenta

On Sunday, April 24, 2016 at 14:43:57 UTC+9, Narcissus Emi wrote:

Narcissus Emi

Apr 25, 2016, 2:47:58 AM4/25/16
to Kenta Oono, Chainer User Group
OK, thanks for the clarification, I get the point: in practice the Convolution2D computation flattens the tensor down to 2D anyway. I really appreciate the help, and thank you for building this great tool; it is really easy to build with and maintain :)
On Mon, Apr 25, 2016 at 10:09 Kenta Oono <oo...@preferred.jp> wrote: