Issue with Loss in Semantic Segmentation


Filip K

Jan 19, 2017, 9:46:19 AM
to Caffe Users
Hi!
So I have been using the Pascal-Context implementation, but I decided to reduce the number of classes from over 400 to just 21.

However, when I train my network, the loss seems to stay at the same level (it doesn't decrease). Below is the log:

I0119 14:27:55.311362  8120 solver.cpp:228] Iteration 80, loss = 569988
I0119 14:27:55.311862  8120 solver.cpp:244]     Train net output #0: loss = 579571 (* 1 = 579571 loss)
I0119 14:27:55.312595  8120 sgd_solver.cpp:106] Iteration 80, lr = 1e-14
I0119 14:28:02.795116  8120 solver.cpp:228] Iteration 100, loss = 543188
I0119 14:28:02.795616  8120 solver.cpp:244]     Train net output #0: loss = 551751 (* 1 = 551751 loss)
I0119 14:28:02.796355  8120 sgd_solver.cpp:106] Iteration 100, lr = 1e-14
I0119 14:28:10.056371  8120 solver.cpp:228] Iteration 120, loss = 514040
I0119 14:28:10.056730  8120 solver.cpp:244]     Train net output #0: loss = 514659 (* 1 = 514659 loss)
I0119 14:28:10.057231  8120 sgd_solver.cpp:106] Iteration 120, lr = 1e-14
I0119 14:28:17.999840  8120 solver.cpp:228] Iteration 140, loss = 541010
I0119 14:28:18.000371  8120 solver.cpp:244]     Train net output #0: loss = 514659 (* 1 = 514659 loss)
I0119 14:28:18.000991  8120 sgd_solver.cpp:106] Iteration 140, lr = 1e-14
I0119 14:28:25.613986  8120 solver.cpp:228] Iteration 160, loss = 563111
I0119 14:28:25.613986  8120 solver.cpp:244]     Train net output #0: loss = 579571 (* 1 = 579571 loss)
I0119 14:28:25.615470  8120 sgd_solver.cpp:106] Iteration 160, lr = 1e-14
I0119 14:28:33.218699  8120 solver.cpp:228] Iteration 180, loss = 557161
I0119 14:28:33.218699  8120 solver.cpp:244]     Train net output #0: loss = 514659 (* 1 = 514659 loss)
I0119 14:28:33.219519  8120 sgd_solver.cpp:106] Iteration 180, lr = 1e-14
I0119 14:28:40.979645  8120 solver.cpp:228] Iteration 200, loss = 578025
I0119 14:28:40.979645  8120 solver.cpp:244]     Train net output #0: loss = 579571 (* 1 = 579571 loss)
I0119 14:28:40.981118  8120 sgd_solver.cpp:106] Iteration 200, lr = 1e-14
I0119 14:28:48.750989  8120 solver.cpp:228] Iteration 220, loss = 578375
I0119 14:28:48.750989  8120 solver.cpp:244]     Train net output #0: loss = 579571 (* 1 = 579571 loss)
I0119 14:28:48.751458  8120 sgd_solver.cpp:106] Iteration 220, lr = 1e-14
I0119 14:28:56.158421  8120 solver.cpp:228] Iteration 240, loss = 526522
I0119 14:28:56.158924  8120 solver.cpp:244]     Train net output #0: loss = 210685 (* 1 = 210685 loss)
I0119 14:28:56.159423  8120 sgd_solver.cpp:106] Iteration 240, lr = 1e-14
I0119 14:29:03.535884  8120 solver.cpp:228] Iteration 260, loss = 543267
I0119 14:29:03.536386  8120 solver.cpp:244]     Train net output #0: loss = 516204 (* 1 = 516204 loss)
I0119 14:29:03.536921  8120 sgd_solver.cpp:106] Iteration 260, lr = 1e-14
I0119 14:29:11.308384  8120 solver.cpp:228] Iteration 280, loss = 579332
I0119 14:29:11.308892  8120 solver.cpp:244]     Train net output #0: loss = 579571 (* 1 = 579571 loss)
I0119 14:29:11.309888  8120 sgd_solver.cpp:106] Iteration 280, lr = 1e-14
I0119 14:29:18.903471  8120 solver.cpp:228] Iteration 300, loss = 555734
I0119 14:29:18.903954  8120 solver.cpp:244]     Train net output #0: loss = 579571 (* 1 = 579571 loss)
I0119 14:29:18.904763  8120 sgd_solver.cpp:106] Iteration 300, lr = 1e-14
I0119 14:29:26.294591  8120 solver.cpp:228] Iteration 320, loss = 532437
I0119 14:29:26.294591  8120 solver.cpp:244]     Train net output #0: loss = 514659 (* 1 = 514659 loss)
I0119 14:29:26.295562  8120 sgd_solver.cpp:106] Iteration 320, lr = 1e-14
I0119 14:29:33.888439  8120 solver.cpp:228] Iteration 340, loss = 557779
I0119 14:29:33.888439  8120 solver.cpp:244]     Train net output #0: loss = 579571 (* 1 = 579571 loss)
I0119 14:29:33.889442  8120 sgd_solver.cpp:106] Iteration 340, lr = 1e-14
I0119 14:29:41.325605  8120 solver.cpp:228] Iteration 360, loss = 533558
I0119 14:29:41.325605  8120 solver.cpp:244]     Train net output #0: loss = 579571 (* 1 = 579571 loss)
I0119 14:29:41.326588  8120 sgd_solver.cpp:106] Iteration 360, lr = 1e-14
I0119 14:29:48.659155  8120 solver.cpp:228] Iteration 380, loss = 522881
I0119 14:29:48.659662  8120 solver.cpp:244]     Train net output #0: loss = 579571 (* 1 = 579571 loss)
I0119 14:29:48.660660  8120 sgd_solver.cpp:106] Iteration 380, lr = 1e-14
I0119 14:29:56.181491  8120 solver.cpp:228] Iteration 400, loss = 543637
I0119 14:29:56.181637  8120 solver.cpp:244]     Train net output #0: loss = 513113 (* 1 = 513113 loss)
I0119 14:29:56.181637  8120 sgd_solver.cpp:106] Iteration 400, lr = 1e-14
I0119 14:30:03.675469  8120 solver.cpp:228] Iteration 420, loss = 540114
I0119 14:30:03.675469  8120 solver.cpp:244]     Train net output #0: loss = 514659 (* 1 = 514659 loss)
I0119 14:30:03.675971  8120 sgd_solver.cpp:106] Iteration 420, lr = 1e-14
I0119 14:30:11.232576  8120 solver.cpp:228] Iteration 440, loss = 550129
I0119 14:30:11.232576  8120 solver.cpp:244]     Train net output #0: loss = 579571 (* 1 = 579571 loss)
I0119 14:30:11.233045  8120 sgd_solver.cpp:106] Iteration 440, lr = 1e-14
I0119 14:30:18.889575  8120 solver.cpp:228] Iteration 460, loss = 561374
I0119 14:30:18.890089  8120 solver.cpp:244]     Train net output #0: loss = 579571 (* 1 = 579571 loss)
I0119 14:30:18.892557  8120 sgd_solver.cpp:106] Iteration 460, lr = 1e-14
I0119 14:30:26.599063  8120 solver.cpp:228] Iteration 480, loss = 566732
I0119 14:30:26.599370  8120 solver.cpp:244]     Train net output #0: loss = 579571 (* 1 = 579571 loss)
I0119 14:30:26.601410  8120 sgd_solver.cpp:106] Iteration 480, lr = 1e-14
I0119 14:30:34.404059  8120 solver.cpp:228] Iteration 500, loss = 579262
I0119 14:30:34.404531  8120 solver.cpp:244]     Train net output #0: loss = 550206 (* 1 = 550206 loss)
I0119 14:30:34.407035  8120 sgd_solver.cpp:106] Iteration 500, lr = 1e-14
I0119 14:30:42.113204  8120 solver.cpp:228] Iteration 520, loss = 563343
I0119 14:30:42.113204  8120 solver.cpp:244]     Train net output #0: loss = 579571 (* 1 = 579571 loss)
I0119 14:30:42.115710  8120 sgd_solver.cpp:106] Iteration 520, lr = 1e-14

I have no idea what "Train net output #0: loss = XXXX" means, but it always seems to return to the same value of 579571 (which was also the value at the start).

I haven't changed anything in my solver, so I am not sure what is happening, but I was expecting the loss to decrease.

Moreover, after 4000 iterations I obtained a snapshot and decided to test it. I used an image of a sofa (which is in my list of labels), but for every pixel I got a value of 0 (which corresponds to ground in my case).

This is my solver:

train_net: "train.prototxt"
test_net
: "val.prototxt"
test_iter
: 5105
# make test net, but don't invoke it from the solver itself
test_interval
: 999999999
display
: 20
average_loss
: 20
lr_policy
: "fixed"
# lr for unnormalized softmax
base_lr
: 1e-12
# high momentum
momentum
: 0.99
# no gradient accumulation
iter_size
: 1
max_iter
: 300000
weight_decay
: 0.0005
snapshot
: 4000
snapshot_prefix
: "snapshot/train"
test_initialization
: false

The only things that I modified from the original code (see references) are:

1) Wherever I found num_output: 60, I changed it to 22.
2) I renamed all the layers in val.prototxt and train.prototxt where num_output was originally 60, so that fine-tuning works.


This is my train.prototxt, in case it is needed:
layer {
  name: "data"
  type: "Python"
  top: "data"
  top: "label"
  python_param {
    module: "pascalcontext_layers"
    layer: "PASCALContextSegDataLayer"
    param_str: "{\'context_dir\': \'C:\\Users\\XXX\\Downloads\\New folder\\caffe\\python\\data\\pascal-context\', \'seed\': 1337, \'split\': \'train\', \'voc_dir\': \'C:\\Users\\XXX\\Downloads\\New folder\\caffe\\python\\data/pascal\'}"
  }
}
layer {
  name: "conv1_1"
  type: "Convolution"
  bottom: "data"
  top: "conv1_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 100
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu1_1"
  type: "ReLU"
  bottom: "conv1_1"
  top: "conv1_1"
}
layer {
  name: "conv1_2"
  type: "Convolution"
  bottom: "conv1_1"
  top: "conv1_2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu1_2"
  type: "ReLU"
  bottom: "conv1_2"
  top: "conv1_2"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1_2"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv2_1"
  type: "Convolution"
  bottom: "pool1"
  top: "conv2_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu2_1"
  type: "ReLU"
  bottom: "conv2_1"
  top: "conv2_1"
}
layer {
  name: "conv2_2"
  type: "Convolution"
  bottom: "conv2_1"
  top: "conv2_2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu2_2"
  type: "ReLU"
  bottom: "conv2_2"
  top: "conv2_2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2_2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv3_1"
  type: "Convolution"
  bottom: "pool2"
  top: "conv3_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu3_1"
  type: "ReLU"
  bottom: "conv3_1"
  top: "conv3_1"
}
layer {
  name: "conv3_2"
  type: "Convolution"
  bottom: "conv3_1"
  top: "conv3_2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu3_2"
  type: "ReLU"
  bottom: "conv3_2"
  top: "conv3_2"
}
layer {
  name: "conv3_3"
  type: "Convolution"
  bottom: "conv3_2"
  top: "conv3_3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu3_3"
  type: "ReLU"
  bottom: "conv3_3"
  top: "conv3_3"
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3_3"
  top: "pool3"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv4_1"
  type: "Convolution"
  bottom: "pool3"
  top: "conv4_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu4_1"
  type: "ReLU"
  bottom: "conv4_1"
  top: "conv4_1"
}
layer {
  name: "conv4_2"
  type: "Convolution"
  bottom: "conv4_1"
  top: "conv4_2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu4_2"
  type: "ReLU"
  bottom: "conv4_2"
  top: "conv4_2"
}
layer {
  name: "conv4_3"
  type: "Convolution"
  bottom: "conv4_2"
  top: "conv4_3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu4_3"
  type: "ReLU"
  bottom: "conv4_3"
  top: "conv4_3"
}
layer {
  name: "pool4"
  type: "Pooling"
  bottom: "conv4_3"
  top: "pool4"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "conv5_1"
  type: "Convolution"
  bottom: "pool4"
  top: "conv5_1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu5_1"
  type: "ReLU"
  bottom: "conv5_1"
  top: "conv5_1"
}
layer {
  name: "conv5_2"
  type: "Convolution"
  bottom: "conv5_1"
  top: "conv5_2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu5_2"
  type: "ReLU"
  bottom: "conv5_2"
  top: "conv5_2"
}
layer {
  name: "conv5_3"
  type: "Convolution"
  bottom: "conv5_2"
  top: "conv5_3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layer {
  name: "relu5_3"
  type: "ReLU"
  bottom: "conv5_3"
  top: "conv5_3"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5_3"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "fc6"
  type: "Convolution"
  bottom: "pool5"
  top: "fc6"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 4096
    pad: 0
    kernel_size: 7
    stride: 1
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc7"
  type: "Convolution"
  bottom: "fc6"
  top: "fc7"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 4096
    pad: 0
    kernel_size: 1
    stride: 1
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "score_frtest"
  type: "Convolution"
  bottom: "fc7"
  top: "score_frtest"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 22
    pad: 0
    kernel_size: 1
  }
}
layer {
  name: "upscore2test"
  type: "Deconvolution"
  bottom: "score_frtest"
  top: "upscore2test"
  param {
    lr_mult: 0
  }
  convolution_param {
    num_output: 22
    bias_term: false
    kernel_size: 4
    stride: 2
  }
}
layer {
  name: "score_pool4test"
  type: "Convolution"
  bottom: "pool4"
  top: "score_pool4test"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 22
    pad: 0
    kernel_size: 1
  }
}
layer {
  name: "score_pool4c"
  type: "Crop"
  bottom: "score_pool4test"
  bottom: "upscore2test"
  top: "score_pool4c"
  crop_param {
    axis: 2
    offset: 5
  }
}
layer {
  name: "fuse_pool4"
  type: "Eltwise"
  bottom: "upscore2test"
  bottom: "score_pool4c"
  top: "fuse_pool4"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "upscore_pool4test"
  type: "Deconvolution"
  bottom: "fuse_pool4"
  top: "upscore_pool4test"
  param {
    lr_mult: 0
  }
  convolution_param {
    num_output: 22
    bias_term: false
    kernel_size: 4
    stride: 2
  }
}
layer {
  name: "score_pool3test"
  type: "Convolution"
  bottom: "pool3"
  top: "score_pool3test"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 22
    pad: 0
    kernel_size: 1
  }
}
layer {
  name: "score_pool3c"
  type: "Crop"
  bottom: "score_pool3test"
  bottom: "upscore_pool4test"
  top: "score_pool3c"
  crop_param {
    axis: 2
    offset: 9
  }
}
layer {
  name: "fuse_pool3"
  type: "Eltwise"
  bottom: "upscore_pool4test"
  bottom: "score_pool3c"
  top: "fuse_pool3"
  eltwise_param {
    operation: SUM
  }
}
layer {
  name: "upscore8test"
  type: "Deconvolution"
  bottom: "fuse_pool3"
  top: "upscore8test"
  param {
    lr_mult: 0
  }
  convolution_param {
    num_output: 22
    bias_term: false
    kernel_size: 22
    stride: 8
  }
}
layer {
  name: "score"
  type: "Crop"
  bottom: "upscore8test"
  bottom: "data"
  top: "score"
  crop_param {
    axis: 2
    offset: 31
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "score"
  bottom: "label"
  top: "loss"
  loss_param {
    ignore_label: 255
    normalize: false
  }
}







REFERENCES:


Filip K

Jan 19, 2017, 10:05:10 AM
to Caffe Users
In addition, this is my modified PascalContext layer file, where the only thing I modified is the "load_label" function:

import caffe

import numpy as np
from PIL import Image
import scipy.io

import random

class PASCALContextSegDataLayer(caffe.Layer):
    """
    Load (input image, label image) pairs from PASCAL-Context
    one-at-a-time while reshaping the net to preserve dimensions.

    The labels follow the 59 class task defined by

        R. Mottaghi, X. Chen, X. Liu, N.-G. Cho, S.-W. Lee, S. Fidler, R.
        Urtasun, and A. Yuille.  The Role of Context for Object Detection and
        Semantic Segmentation in the Wild.  CVPR 2014.

    Use this to feed data to a fully convolutional network.
    """

    def setup(self, bottom, top):
        """
        Setup data layer according to parameters:

        - voc_dir: path to PASCAL VOC dir (must contain 2010)
        - context_dir: path to PASCAL-Context annotations
        - split: train / val / test
        - randomize: load in random order (default: True)
        - seed: seed for randomization (default: None / current time)

        for PASCAL-Context semantic segmentation.

        example: params = dict(voc_dir="/path/to/PASCAL", split="val")
        """
        # config
        params = eval(self.param_str)
        self.voc_dir = params['voc_dir'] + '/VOC2010'
        self.context_dir = params['context_dir']
        self.split = params['split']
        self.mean = np.array((104.007, 116.669, 122.679), dtype=np.float32)
        self.random = params.get('randomize', True)
        self.seed = params.get('seed', None)

        # load labels and resolve inconsistencies by mapping to full 400 labels
        self.labels_400 = [label.replace(' ','') for idx, label in np.genfromtxt(self.context_dir + '/labels.txt', delimiter=':', dtype=None)]
        self.labels_21 = [label.replace(' ','') for idx, label in np.genfromtxt(self.context_dir + '/21_labels.txt', delimiter=':', dtype=None)]

        # two tops: data and label
        if len(top) != 2:
            raise Exception("Need to define two tops: data and label.")
        # data layers have no bottoms
        if len(bottom) != 0:
            raise Exception("Do not define a bottom.")

        # load indices for images and labels
        split_f  = '{}/ImageSets/Main/{}.txt'.format(self.voc_dir,
                self.split)
        self.indices = open(split_f, 'r').read().splitlines()
        self.idx = 0

        # make eval deterministic
        if 'train' not in self.split:
            self.random = False

        # randomization: seed and pick
        if self.random:
            random.seed(self.seed)
            self.idx = random.randint(0, len(self.indices)-1)

    def reshape(self, bottom, top):
        # load image + label image pair
        self.data = self.load_image(self.indices[self.idx])
        self.label = self.load_label(self.indices[self.idx])
        # reshape tops to fit (leading 1 is for batch dimension)
        top[0].reshape(1, *self.data.shape)
        top[1].reshape(1, *self.label.shape)

    def forward(self, bottom, top):
        # assign output
        top[0].data[...] = self.data
        top[1].data[...] = self.label

        # pick next input
        if self.random:
            self.idx = random.randint(0, len(self.indices)-1)
        else:
            self.idx += 1
            if self.idx == len(self.indices):
                self.idx = 0

    def backward(self, top, propagate_down, bottom):
        pass

    def load_image(self, idx):
        """
        Load input image and preprocess for Caffe:
        - cast to float
        - switch channels RGB -> BGR
        - subtract mean
        - transpose to channel x height x width order
        """
        im = Image.open('{}/JPEGImages/{}.jpg'.format(self.voc_dir, idx))
        in_ = np.array(im, dtype=np.float32)
        in_ = in_[:,:,::-1]
        in_ -= self.mean
        in_ = in_.transpose((2,0,1))
        return in_

    def load_label(self, idx):
        """
        Load label image as 1 x height x width integer array of label indices.
        The leading singleton dimension is required by the loss.
        The full 400 labels are translated to the 21-class task labels.

        Example: "ground" is first in labels_21, so enumerate yields "0 ground".
        1) In the 400+ label list, find the position of "ground" and add 1
           (that list starts from 1, so ground is 189).
        2) Everywhere the label image equals 189, set the output label to 0.
        """
        label_400 = scipy.io.loadmat('{}/trainval/{}.mat'.format(self.context_dir, idx))['LabelMap']
        label = np.zeros_like(label_400, dtype=np.uint8)
        for idx, l in enumerate(self.labels_21):
            idx_400 = self.labels_400.index(l) + 1
            label[label_400 == idx_400] = idx
        label = label[np.newaxis, ...]
        return label


Moreover, when I execute the training, I pass a weights parameter pointing to the pascalcontext-fcn16 model.

Any help would be appreciated.
Moreover, should I add the code for the backward pass?

Ilya Zhenin

Jan 19, 2017, 10:56:00 AM
to Caffe Users
I'll make a guess, since I have encountered this problem with semantic segmentation a lot, so there is a good chance this is it.
Check these:
1. Are you copying the weights of a pretrained network? If not, it is close to impossible to train a big segmentation network from a random initialization.
2. Do not forget to initialize the weights of the layers that you have changed.
3. The network may not learn, giving all zeros or all ones as output, due to vanishing/exploding gradients. Often it is enough just to decrease the learning rate (to something like 1e-08).
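
As a purely illustrative sketch of point 3 (the exact value is a guess and something to experiment with), the relevant solver lines would look like:

# solver.prototxt (illustrative values only)
lr_policy: "fixed"
base_lr: 1e-8
momentum: 0.99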

Filip K

Jan 19, 2017, 12:30:05 PM
to Caffe Users
1) Yes, I am copying the weights of a pretrained network, namely the pascalcontext-fcn16 model.
2) Could you possibly elaborate on how to do that?
3) Right, I will give it a try, but I am actually curious about part 2 of your answer.

Filip K

Jan 20, 2017, 8:04:57 AM
to Caffe Users
OK, so I have tried modifying the learning rate, but it seemed to result in the same behavior. Any more ideas?

Filip K

Jan 20, 2017, 10:59:49 AM
to Caffe Users
In addition, when I try to test it after 4000 iterations (I assume that because the batch size is 1, 4000 iterations means 4000 images were processed), I am only getting one class all the time. Please find my deploy.prototxt attached.
deploy.prototxt

Ilya Zhenin

Jan 23, 2017, 7:03:04 AM
to Caffe Users
Hello Filip.

Actually, I'm having a similar issue with softmax loss now. Surprisingly, sigmoid loss always works better for me. But I have a completely different net architecture and different initialization, so maybe the problems are not connected.

I think it is weird that you have 22 outputs for 21 classes. Since I can't explain why that is, it is a potential root problem for your test as well.

In my thread you wrote that you had a similar issue to mine, with errors when trying to train the net. But now it runs for me without errors: 2 outputs from the last layer, shaped (1, 2, R, C) for two classes, with only 0 and 1 labels in the masks. Could you try training the network with such settings: 21 outputs and 21 classes (maximum label 20 in the label masks; 255 is fine too, as long as it is ignored, I think)?

Filip K

Jan 23, 2017, 8:24:06 AM
to Caffe Users
Hi Ilya!

So could you possibly elaborate on how your architecture differs from mine? Are you also trying to fine-tune the network?

Do you mean that I should change every "22" in my layers to "21"?

Ilya Zhenin

Jan 23, 2017, 9:55:13 AM
to Caffe Users
My architecture isn't FCN and I'm not using pretrained weights from FCN, but I have worked with it before.

So I solved my problems. I actually wrote you a long reply, but Google ate it all :(

My current working setup: the number of outputs fed into the SoftmaxWithLoss layer should equal the number of your classes, and the maximum label value in the data should be NUM - 1, i.e. labels start from zero.
Experimenting with my current architecture, I found that a large learning rate works well (0.01-0.001), but for FCN and residual segmentation only a very small rate (1e-8) worked. So you should experiment with that.

Do not forget to initialize the layers that you are changing. You have copied pretrained weights into your net, but if you changed some layers (like the output num), I believe you should also have changed the names of those layers so you don't get an error, and then they won't be initialized from the pretrained model. Check the prototxt: is there an initialization for such a layer? If it is a Deconvolution layer and it is FCN, I believe there is not; the FCN deconv layers were initialized through the Python surgery script (all layers containing "upscore" or something like that in their name), so check that all layers are initialized to non-zero.
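
For example (purely a sketch with a made-up layer name, using Caffe's standard "gaussian" and "constant" fillers), a renamed scoring layer with explicit initialization could look like:

layer {
  name: "score_fr_new"   # new name, so the pretrained 60-class weights are not copied in
  type: "Convolution"
  bottom: "fc7"
  top: "score_fr_new"
  convolution_param {
    num_output: 21        # one channel per class
    kernel_size: 1
    weight_filler { type: "gaussian" std: 0.01 }   # explicit non-zero initialization
    bias_filler { type: "constant" value: 0 }
  }
}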

If that does not solve your problem, it is worth checking the net output layer by layer, starting from the input image and the first conv, to see where the signal propagated from the input image starts disappearing. It is also worth looking at the weight values; I have sometimes found that useful.


On Monday, January 23, 2017 at 16:24:06 UTC+3, Filip K wrote:

Filip K

Jan 23, 2017, 9:04:17 PM
to Caffe Users
Right, so I actually have a couple of issues and questions:

I am not really executing my code from Python (I get a PyObject Null error or something similar when I run the Python script, and it crashes), so I am running everything from the command line, roughly as sketched below.
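
For reference, the command I mean is roughly the following (the caffemodel path is illustrative; --weights is the stock caffe binary's flag for copying pretrained weights by layer name):

caffe train --solver=solver.prototxt --weights=path/to/pascalcontext-fcn16s.caffemodel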

1) OK, so given that I have 21 labels, my SoftmaxWithLoss should get 21 channels, and my maximum label should be 20. So are you setting all of your num_output values to 21, or are you modifying just the score layers?

2) Initializing layers: I have modified the names of the layers whose num_output differed from the original. However, I wasn't sure whether I should rename the Crop layers as well, and I was not able to find an example doing that. Would you rename them (although I have tested it, and it seemed to have no impact)?

3) Moreover, would you initialize the layers with a particular weight_filler? I have tried xavier and the loss increased insanely, but when I used a constant weight_filler it remained the same (I believe that makes no difference compared to the original).

Thanks! 

Ilya Zhenin

Jan 24, 2017, 5:10:34 AM
to Caffe Users
1. The first part is correct. I didn't really understand the second. You can follow this idea: https://github.com/shelhamer/fcn.berkeleyvision.org/blob/master/voc-fcn8s/train.prototxt
All deconv layers (the decoder) have num_output equal to the number of classes.

2-3. I believe you can leave Crop layers alone - they do not have learnable parameters, so nothing gets copied.

In the example above, the last Deconv layer has no initialization, so if you changed its name to something else and didn't do manual initialization, its weights would be zeros.


layer {
  name: "upscore8"
  type: "Deconvolution"
  bottom: "fuse_pool3"
  top: "upscore8"
  param {
    lr_mult: 0
  }
  convolution_param {
    num_output: 21
    bias_term: false
    kernel_size: 16
    stride: 8
  }
}

In Shelhamer's FCN repository on GitHub, the Deconv layers are initialized in the following manner: https://github.com/shelhamer/fcn.berkeleyvision.org/blob/master/voc-fcn8s/solve.py


# surgeries
interp_layers = [k for k in solver.net.params.keys() if 'up' in k]
surgery.interp(solver.net, interp_layers)

You can use those modules.
I also think Caffe has its own filler initialization for Deconvolution layers; you can try that, since you are having problems with Python.
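
A minimal sketch of what I mean (the layer name is made up; Caffe's built-in "bilinear" filler is normally used with a fixed, bias-free, channel-wise Deconvolution, which mimics what the surgery script computes):

layer {
  name: "upscore8_new"     # renamed, so nothing is copied from the 60-class model
  type: "Deconvolution"
  bottom: "fuse_pool3"
  top: "upscore8_new"
  param { lr_mult: 0 }     # keep the upsampling weights fixed
  convolution_param {
    num_output: 21
    group: 21              # one filter per channel, i.e. channel-wise upsampling
    bias_term: false
    kernel_size: 16
    stride: 8
    weight_filler { type: "bilinear" }   # built-in bilinear interpolation filler
  }
}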


On Tuesday, January 24, 2017 at 5:04:17 UTC+3, Filip K wrote:

Filip K

Jan 25, 2017, 8:32:19 AM
to Caffe Users
So I have just tested your solution, but the same thing seems to be happening. I am still outputting just one class (which is 0).

Things that I have changed:

1) I did what you suggested regarding the classes (something I had actually done before); now all the layers that I want to fine-tune have num_output: 21 (as I have labels from 0 to 20).
2) Renamed the layers that I want to fine-tune (this was also done before).
3) I initialized the weights of every layer that I want to fine-tune (NEW).


Things that I have tried:

1) I played with various learning rates:

base_lr: 1e-10
base_lr: 1e-12
base_lr: 1e-8 (this one gave a huge loss after 20 iterations)

2) I tried to modify the initial weights to see if that has any impact. Weights tested:

constant 0.2



Questions:
1) Could you possibly check whether I have actually done what you mentioned in your comments? I am not 100% sure. The only things I modified compared to the original train.prototxt of pascalcontext-fcn8s were renaming the layers where num_output was 60, replacing num_output: 60 with num_output: 21, and adding weight initialization in each layer where this modification was done.

2) Would you also initialize bias_filler to some value?

3) Shouldn't "Train net output #0:" be decreasing?

LOG:
I0125 13:23:01.604179  5804 solver.cpp:228] Iteration 3220, loss = 547253
I0125 13:23:01.604179  5804 solver.cpp:244]     Train net output #0: loss = 570848 (* 1 = 570848 loss)
I0125 13:23:01.604647  5804 sgd_solver.cpp:106] Iteration 3220, lr = 1e-10
I0125 13:23:09.005874  5804 solver.cpp:228] Iteration 3240, loss = 533239
I0125 13:23:09.005874  5804 solver.cpp:244]     Train net output #0: loss = 570848 (* 1 = 570848 loss)
I0125 13:23:09.006876  5804 sgd_solver.cpp:106] Iteration 3240, lr = 1e-10
I0125 13:23:16.405020  5804 solver.cpp:228] Iteration 3260, loss = 531321
I0125 13:23:16.405495  5804 solver.cpp:244]     Train net output #0: loss = 570848 (* 1 = 570848 loss)
I0125 13:23:16.405998  5804 sgd_solver.cpp:106] Iteration 3260, lr = 1e-10
I0125 13:23:23.888844  5804 solver.cpp:228] Iteration 3280, loss = 538594
I0125 13:23:23.888844  5804 solver.cpp:244]     Train net output #0: loss = 570848 (* 1 = 570848 loss)
I0125 13:23:23.889313  5804 sgd_solver.cpp:106] Iteration 3280, lr = 1e-10
I0125 13:23:31.346828  5804 solver.cpp:228] Iteration 3300, loss = 530812
I0125 13:23:31.346828  5804 solver.cpp:244]     Train net output #0: loss = 570848 (* 1 = 570848 loss)
I0125 13:23:31.347295  5804 sgd_solver.cpp:106] Iteration 3300, lr = 1e-10
I0125 13:23:38.890431  5804 solver.cpp:228] Iteration 3320, loss = 541240
I0125 13:23:38.890431  5804 solver.cpp:244]     Train net output #0: loss = 509958 (* 1 = 509958 loss)
I0125 13:23:38.890897  5804 sgd_solver.cpp:106] Iteration 3320, lr = 1e-10
I0125 13:23:46.418092  5804 solver.cpp:228] Iteration 3340, loss = 542990
I0125 13:23:46.418092  5804 solver.cpp:244]     Train net output #0: loss = 496257 (* 1 = 496257 loss)
I0125 13:23:46.418092  5804 sgd_solver.cpp:106] Iteration 3340, lr = 1e-10
I0125 13:23:54.032097  5804 solver.cpp:228] Iteration 3360, loss = 549080
I0125 13:23:54.032585  5804 solver.cpp:244]     Train net output #0: loss = 506913 (* 1 = 506913 loss)
I0125 13:23:54.032585  5804 sgd_solver.cpp:106] Iteration 3360, lr = 1e-10
I0125 13:24:01.422437  5804 solver.cpp:228] Iteration 3380, loss = 524123
I0125 13:24:01.422437  5804 solver.cpp:244]     Train net output #0: loss = 570848 (* 1 = 570848 loss)
I0125 13:24:01.422912  5804 sgd_solver.cpp:106] Iteration 3380, lr = 1e-10
I0125 13:24:08.871800  5804 solver.cpp:228] Iteration 3400, loss = 530127
I0125 13:24:08.872305  5804 solver.cpp:244]     Train net output #0: loss = 514524 (* 1 = 514524 loss)
I0125 13:24:08.872305  5804 sgd_solver.cpp:106] Iteration 3400, lr = 1e-10
I0125 13:24:16.418731  5804 solver.cpp:228] Iteration 3420, loss = 548242
I0125 13:24:16.419366  5804 solver.cpp:244]     Train net output #0: loss = 570848 (* 1 = 570848 loss)
I0125 13:24:16.419939  5804 sgd_solver.cpp:106] Iteration 3420, lr = 1e-10
I0125 13:24:24.066160  5804 solver.cpp:228] Iteration 3440, loss = 549917
I0125 13:24:24.066160  5804 solver.cpp:244]     Train net output #0: loss = 570848 (* 1 = 570848 loss)
I0125 13:24:24.066649  5804 sgd_solver.cpp:106] Iteration 3440, lr = 1e-10
I0125 13:24:31.869274  5804 solver.cpp:228] Iteration 3460, loss = 570848
I0125 13:24:31.869786  5804 solver.cpp:244]     Train net output #0: loss = 506913 (* 1 = 506913 loss)
I0125 13:24:31.870834  5804 sgd_solver.cpp:106] Iteration 3460, lr = 1e-10
I0125 13:24:39.475894  5804 solver.cpp:228] Iteration 3480, loss = 555473
I0125 13:24:39.476419  5804 solver.cpp:244]     Train net output #0: loss = 570848 (* 1 = 570848 loss)
I0125 13:24:39.476915  5804 sgd_solver.cpp:106] Iteration 3480, lr = 1e-10
I0125 13:24:47.141834  5804 solver.cpp:228] Iteration 3500, loss = 566205
I0125 13:24:47.142302  5804 solver.cpp:244]     Train net output #0: loss = 758086 (* 1 = 758086 loss)
I0125 13:24:47.142822  5804 sgd_solver.cpp:106] Iteration 3500, lr = 1e-10
I0125 13:24:54.680466  5804 solver.cpp:228] Iteration 3520, loss = 534777
I0125 13:24:54.680938  5804 solver.cpp:244]     Train net output #0: loss = 508435 (* 1 = 508435 loss)
I0125 13:24:54.680938  5804 sgd_solver.cpp:106] Iteration 3520, lr = 1e-10
I0125 13:25:02.130458  5804 solver.cpp:228] Iteration 3540, loss = 521925
I0125 13:25:02.130458  5804 solver.cpp:244]     Train net output #0: loss = 323937 (* 1 = 323937 loss)
I0125 13:25:02.131629  5804 sgd_solver.cpp:106] Iteration 3540, lr = 1e-10
I0125 13:25:09.450913  5804 solver.cpp:228] Iteration 3560, loss = 522927
I0125 13:25:09.451383  5804 solver.cpp:244]     Train net output #0: loss = 505391 (* 1 = 505391 loss)
I0125 13:25:09.451383  5804 sgd_solver.cpp:106] Iteration 3560, lr = 1e-10
I0125 13:25:17.063666  5804 solver.cpp:228] Iteration 3580, loss = 552352
I0125 13:25:17.063666  5804 solver.cpp:244]     Train net output #0: loss = 496257 (* 1 = 496257 loss)
I0125 13:25:17.064651  5804 sgd_solver.cpp:106] Iteration 3580, lr = 1e-10
I0125 13:25:24.667356  5804 solver.cpp:228] Iteration 3600, loss = 557300
I0125 13:25:24.667829  5804 solver.cpp:244]     Train net output #0: loss = 570848 (* 1 = 570848 loss)
I0125 13:25:24.669843  5804 sgd_solver.cpp:106] Iteration 3600, lr = 1e-10
I0125 13:25:32.149098  5804 solver.cpp:228] Iteration 3620, loss = 537638
I0125 13:25:32.149098  5804 solver.cpp:244]     Train net output #0: loss = 570848 (* 1 = 570848 loss)
I0125 13:25:32.151604  5804 sgd_solver.cpp:106] Iteration 3620, lr = 1e-10
I0125 13:25:39.806699  5804 solver.cpp:228] Iteration 3640, loss = 556006
I0125 13:25:39.806699  5804 solver.cpp:244]     Train net output #0: loss = 505391 (* 1 = 505391 loss)
I0125 13:25:39.808217  5804 sgd_solver.cpp:106] Iteration 3640, lr = 1e-10
I0125 13:25:47.254304  5804 solver.cpp:228] Iteration 3660, loss = 532791
I0125 13:25:47.254304  5804 solver.cpp:244]     Train net output #0: loss = 503868 (* 1 = 503868 loss)
I0125 13:25:47.254304  5804 sgd_solver.cpp:106] Iteration 3660, lr = 1e-10
I0125 13:25:54.905524  5804 solver.cpp:228] Iteration 3680, loss = 557985
I0125 13:25:54.906025  5804 solver.cpp:244]     Train net output #0: loss = 570848 (* 1 = 570848 loss)
I0125 13:25:54.906497  5804 sgd_solver.cpp:106] Iteration 3680, lr = 1e-10
I0125 13:26:02.415186  5804 solver.cpp:228] Iteration 3700, loss = 536110
I0125 13:26:02.415186  5804 solver.cpp:244]     Train net output #0: loss = 506913 (* 1 = 506913 loss)
I0125 13:26:02.415186  5804 sgd_solver.cpp:106] Iteration 3700, lr = 1e-10
I0125 13:26:09.934406  5804 solver.cpp:228] Iteration 3720, loss = 536225
I0125 13:26:09.934880  5804 solver.cpp:244]     Train net output #0: loss = 506913 (* 1 = 506913 loss)
I0125 13:26:09.935382  5804 sgd_solver.cpp:106] Iteration 3720, lr = 1e-10
I0125 13:26:17.426775  5804 solver.cpp:228] Iteration 3740, loss = 541795
I0125 13:26:17.426775  5804 solver.cpp:244]     Train net output #0: loss = 570848 (* 1 = 570848 loss)
I0125 13:26:17.429255  5804 sgd_solver.cpp:106] Iteration 3740, lr = 1e-10
I0125 13:26:25.047107  5804 solver.cpp:228] Iteration 3760, loss = 555777
I0125 13:26:25.047585  5804 solver.cpp:244]     Train net output #0: loss = 570848 (* 1 = 570848 loss)
I0125 13:26:25.048610  5804 sgd_solver.cpp:106] Iteration 3760, lr = 1e-10
I0125 13:26:32.719717  5804 solver.cpp:228] Iteration 3780, loss = 559659
I0125 13:26:32.719717  5804 solver.cpp:244]     Train net output #0: loss = 570848 (* 1 = 570848 loss)
I0125 13:26:32.721223  5804 sgd_solver.cpp:106] Iteration 3780, lr = 1e-10
I0125 13:26:40.339875  5804 solver.cpp:228] Iteration 3800, loss = 554483
I0125 13:26:40.340354  5804 solver.cpp:244]     Train net output #0: loss = 570848 (* 1 = 570848 loss)
I0125 13:26:40.340849  5804 sgd_solver.cpp:106] Iteration 3800, lr = 1e-10
I0125 13:26:47.934401  5804 solver.cpp:228] Iteration 3820, loss = 551895
I0125 13:26:47.934904  5804 solver.cpp:244]     Train net output #0: loss = 570848 (* 1 = 570848 loss)
I0125 13:26:47.935920  5804 sgd_solver.cpp:106] Iteration 3820, lr = 1e-10
I0125 13:26:55.456133  5804 solver.cpp:228] Iteration 3840, loss = 539870
I0125 13:26:55.456634  5804 solver.cpp:244]     Train net output #0: loss = 570848 (* 1 = 570848 loss)
I0125 13:26:55.457135  5804 sgd_solver.cpp:106] Iteration 3840, lr = 1e-10
I0125 13:27:03.049305  5804 solver.cpp:228] Iteration 3860, loss = 558669
I0125 13:27:03.049305  5804 solver.cpp:244]     Train net output #0: loss = 761131 (* 1 = 761131 loss)
I0125 13:27:03.052311  5804 sgd_solver.cpp:106] Iteration 3860, lr = 1e-10
I0125 13:27:10.682392  5804 solver.cpp:228] Iteration 3880, loss = 552030
I0125 13:27:17.622148  5804 solver.cpp:244]     Train net output #0: loss = 570848 (* 1 = 570848 loss)
I0125 13:27:17.631783  5804 sgd_solver.cpp:106] Iteration 3880, lr = 1e-10


solver.prototxt
train.prototxt
val.prototxt
deploy.prototxt

Filip K

Jan 25, 2017, 8:34:46 AM
to Caffe Users
One more question:

Would you also use weight_filler in your val.prototxt?

Ilya Zhenin

Jan 25, 2017, 9:18:19 AM
to Caffe Users
Can you send me this Python module with the layer, and a description of how you start the training process (if it is a script, send it too)? I'll run it on my data; it shouldn't take much time, as I'm doing basically the same thing now.

On Wednesday, January 25, 2017 at 16:34:46 UTC+3, Filip K wrote:

Filip K

Jan 25, 2017, 9:35:46 AM
to Caffe Users
I just posted a new question, as I have been exploring an issue. I haven't really modified anything from here: https://github.com/shelhamer/fcn.berkeleyvision.org/blob/master/pascalcontext-fcn8s/solve.py except adjusting the paths.

My train.prototxt, solver, and val are all available in the post before.

Filip K

Jan 28, 2017, 11:02:37 AM
to Caffe Users
So I managed to run the training, but now, for every picture, I am basically getting every single class.

I have a picture of a sofa without any background, and I am still getting all the classes.


Below are the predicted classes and how the pixels of a 500x500 image are distributed among them:

0 202964
1 3839
2 1007
3 158
4 4726
5 424
6 127
7 232
8 2598
9 22
10 20
11 8
12 70
13 26
14 7
15 386
16 281
17 229
18 14
19 8
20 20
21 29


Here is a list of my classes:

0: background
1: ground
2: floor
3: ceiling
4: wall
5: chair
6: door
7: window
8: person
9: computer
10: guitar
11: laptop
12: mattress
13: mouse
14: screen
15: sofa
16: table
17: tvmonitor
18: wardrobe
19: telephone
20: cell phone
21: book

However, the most concerning thing is that most pixels are assigned to 1, 2, 4, and 8, while 15 (sofa) has a really small share.

Daniel Moodie

Jan 30, 2017, 3:14:55 PM
to Caffe Users
It looks like you could benefit from a weighted loss.
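
For example (just a sketch of one common recipe, not something specific to the nets above): median-frequency balancing weights each class by the median class frequency divided by that class's own frequency, computed over the training labels. A rough numpy sketch with illustrative names; feeding the resulting weights into Caffe's loss is a separate step:

import numpy as np

def median_frequency_weights(label_images, num_classes=21, ignore_label=255):
    """Per-class weight = median(class frequency) / class frequency."""
    counts = np.zeros(num_classes, dtype=np.float64)
    for lbl in label_images:            # each lbl: H x W integer label map in [0, num_classes)
        valid = lbl != ignore_label
        counts += np.bincount(lbl[valid].ravel(), minlength=num_classes)
    freq = counts / counts.sum()
    freq[freq == 0] = np.nan            # classes absent from the data
    weights = np.nanmedian(freq) / freq
    return np.nan_to_num(weights)       # absent classes get weight 0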