Thank you for the valuable comment. I am looking at VGG-16 and FCN-32s.
In VGG-16, the fc6/fc7/fc8 chain is
`pool5(7x7x512)--conv(7x7x512)-->fc6(1x1x4096)--conv(1x1x4096)-->fc7(1x1x4096)--conv(1x1x4096)-->fc8(1x1x1000) (where 1000 is number of classes)`
In FCN-32s (with a larger input), it is
`pool5(22x22x512)--conv(7x7x512)-->fc6(16x16x4096)--conv(1x1x4096)-->fc7(16x16x4096)--conv(1x1x4096)-->fc8(16x16x1000)`
In VGG-16, after pool5 they use a 7x7x512 kernel to fully connect every input neuron to each output neuron. In contrast, FCN-32s applies that same 7x7x512 kernel to a 22x22x512 input, making only a local connection (each output neuron is connected to one 7x7x512 window). Up to this point, I understand how FCN-32s converts the fully connected layers into fully convolutional ones. But after fc6 in FCN-32s, why do they use a 1x1 kernel instead of, say, 3x3? And fc8 outputs 16x16x1000 rather than 1x1x1000 so as to retain a spatial output map. Is that right?
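To check my understanding, here is a minimal sketch of the shape arithmetic, assuming stride-1 "valid" convolutions (the helper `conv_out` is my own, not from the FCN code):

```python
def conv_out(in_size, kernel):
    """Spatial output size of a stride-1 convolution with no padding."""
    return in_size - kernel + 1

# VGG-16: a 7x7 conv over the 7x7 pool5 map collapses it to 1x1,
# which is equivalent to the original fully connected fc6.
assert conv_out(7, 7) == 1

# FCN-32s: the same 7x7 kernel slides over a 22x22 pool5 map.
assert conv_out(22, 7) == 16

# fc7 and fc8 reuse the original fully connected weights as 1x1 convs,
# so they apply the same transform at every location and keep the 16x16 grid.
assert conv_out(16, 1) == 16
```

As I understand it, a 1x1 kernel is what you get "for free" when the original fc7/fc8 weights are reinterpreted as convolutions; a 3x3 kernel would need different (and more) weights that were never trained in VGG-16.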
On Monday, 23 January 2017 at 22:12:23 UTC+9, Przemek D wrote: