Hi,
I am working on a foreground/background segmentation problem using FCN-Alexnet. I took the following steps:
1. train an Alexnet that takes a 256*256*3 image as input and outputs a single label indicating whether the center pixel of the patch is foreground (1) or background (0);
2. follow "net-surgery" to convert that Alexnet to an FCN, so that it can take input of any size, for example m*n*3 pixels, and output a heatmap of size (m/32)*(n/32);
3. the resulting heatmap from step 2 looks reasonable;
5. fine-tune FCN-Alexnet so that the network can take input of any size and output a binary mask of the same size.
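For reference, this is how I understand the conversion in step 2: the weights of a fully connected layer are reshaped into convolution kernels, so the same layer can slide over a larger input and produce a heatmap. Below is a minimal NumPy sketch with toy sizes. It is not my actual Caffe code, and the sizes and the helper `conv_valid` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical, far smaller than the real Alexnet layers).
channels, height, width, num_out = 3, 4, 4, 2

fc_weights = rng.standard_normal((num_out, channels * height * width))
patch = rng.standard_normal((channels, height, width))

# Fully connected layer: flatten the patch and multiply.
fc_out = fc_weights @ patch.ravel()

# "Net surgery": view the same weights as a bank of convolution kernels.
conv_kernels = fc_weights.reshape(num_out, channels, height, width)


def conv_valid(image, kernels, stride=1):
    """Slide kernels over a (C, H, W) image without padding.

    Returns an array of shape (num_out, H_out, W_out).
    """
    n_out, _, kh, kw = kernels.shape
    out_h = (image.shape[1] - kh) // stride + 1
    out_w = (image.shape[2] - kw) // stride + 1
    result = np.empty((n_out, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            window = image[:, top:top + kh, left:left + kw]
            # Contract the (C, kh, kw) axes of every kernel with the window.
            result[:, i, j] = np.tensordot(kernels, window, axes=3)
    return result


# On a patch of the training size, the conv output is a single position
# that matches the fully connected output.
conv_out = conv_valid(patch, conv_kernels)

# On a larger input, the same kernels give a spatial heatmap.
bigger = rng.standard_normal((channels, 7, 9))
heatmap = conv_valid(bigger, conv_kernels)
print(conv_out.shape, heatmap.shape)
```

In the real network the strides of the earlier layers are what reduce the heatmap to roughly 1/32 of the input size; the sketch only shows the fully connected to convolution part.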
I have the following questions:
1. At step 5, the loss did decrease, from about 0.3 to below 0.1. But during testing, the output mask is all zeros. I checked the values of all layers and weights, and they are not zero. The problem comes from the output of the 'score' layer, which has dimensions (1, 2, m, n): the values in 'score(1, 0, m, n)' are all positive and those in 'score(1, 1, m, n)' are all negative. So when I take mask=solver.net.blobs['score'].data[0].argmax(axis=0), the mask is all zeros.
2. Since I want the network to take input of any size at test time, I don't know how to set the 'offset' parameter in the crop layer. I understand it is related to the structure of the network, but I don't think it is independent of the size of the input image.
3. The crop layer takes the first bottom and crops it to the dimensions of the second bottom. For example, if ('data', (1, 3, 227, 227)), ('label', (1, 1, 256, 256)), and ('upscore', (1, 2, 287, 287)), I want ('score', (1, 2, 256, 256)). If I set 'data' as the second bottom and set the crop axis to 2, I always get an error complaining that 'data' has 3 channels, which is larger than the number of outputs (2). One option is to set 'label' as the second bottom, but I don't think that is reasonable during testing.
4. The 'lr_mult' parameter of the Deconvolution layer is set to 0. Does that mean the weights of that layer don't need to be tuned?
5. For the structure of FCN-Alexnet, I am not sure whether it can work by adding a single Deconvolution layer to Alexnet, without any of the skips mentioned in the paper (I cannot use VGG net because of some other concerns).
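To make question 1 concrete, here is a minimal NumPy sketch of what I observe. The scores are made-up numbers with the same sign pattern as my 'score' blob, not real network output, and `m`, `n` are arbitrary small sizes.

```python
import numpy as np

# Made-up scores shaped like solver.net.blobs['score'].data[0]: (2, m, n).
m, n = 4, 5
rng = np.random.default_rng(0)
score = np.empty((2, m, n))
score[0] = rng.uniform(0.5, 3.0, size=(m, n))    # channel 0: all positive
score[1] = rng.uniform(-3.0, -0.5, size=(m, n))  # channel 1: all negative

# Same operation as in question 1: pick the higher-scoring channel per pixel.
mask = score.argmax(axis=0)

# Softmax over the channel axis, to see how far the foreground
# probability is from 0.5 at each pixel.
shifted = score - score.max(axis=0, keepdims=True)  # for numerical stability
exp_scores = np.exp(shifted)
prob = exp_scores / exp_scores.sum(axis=0, keepdims=True)
fg_prob = prob[1]

print("foreground pixels in mask:", int(mask.sum()))
print("largest foreground probability:", float(fg_prob.max()))
```

With this sign pattern, channel 0 wins at every pixel, so the argmax mask cannot contain a single foreground pixel.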
Maybe I misunderstood something. Could someone please help me out?
Thanks in advance!