Looking over your network is difficult as it is pasted right into the post - next time please attach the prototxt as a file; it's much easier to navigate that way.
Now for some hints: I would be careful about the loss function advice. The choice of loss function is dictated by the task you're trying to learn, not by the architecture of the network you're training. Contrastive loss is used to train Siamese nets, yes, but that's for the task of identification - a network trained this way learns how similar two images are (e.g. "do these two pictures show the same person?"). I successfully trained a similar architecture using SoftmaxWithLoss for classification of several images at once.
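For reference, switching to classification mostly just means swapping the loss layer at the end of the net. A minimal sketch - the blob names `fc8` and `label` are placeholders, not taken from your prototxt:

```protobuf
# Hypothetical tail of a classification net: feed the final score
# layer and the ground-truth labels into SoftmaxWithLoss.
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"    # final fully connected score layer -- placeholder name
  bottom: "label"
  top: "loss"
}
```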
I agree with Akash's advice to up the learning rate though. Usually it's a good idea to increase it a lot and try the highest setting at which the network doesn't diverge. Looks like you're using AlexNet, so lr=0.01 would be a good starting point.
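In solver terms, that's just raising `base_lr` and keeping a decay schedule so training doesn't blow up later. A sketch of the relevant solver lines - the step schedule values are illustrative, not tuned for your data:

```protobuf
# Illustrative solver fragment -- only the LR-related settings shown.
base_lr: 0.01       # the AlexNet-style starting point suggested above
lr_policy: "step"   # periodically drop the learning rate
gamma: 0.1          # multiply the LR by this at each drop
stepsize: 100000    # iterations between drops -- tune for your dataset
```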
In general, there can be dozens of reasons why your network fails to learn anything. Maybe your data is bad, or maybe you're loading it wrong? If this is a classification task and you're using 4 views of a single object, it is crucial that all 4 columns receive images of the same object - otherwise the task doesn't make sense, so do check that. An interesting check would be to take just one column and see if it can learn anything on its own. Maybe the task is just too hard or not learnable at all? Or maybe you don't have enough examples?
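A quick way to check that alignment before training: verify that row i of every view's image list refers to the same object. A sketch, assuming your lists are (filename, label) pairs - the function name and data layout are made up, so adapt it to however your lists are actually stored:

```python
# Hypothetical sanity check for a 4-view classification setup:
# given parallel lists of (filename, label) pairs, one per column,
# report the row indices where the views disagree on the object.

def check_view_alignment(view_lists):
    """Return indices where the views carry different labels."""
    mismatches = []
    for i, rows in enumerate(zip(*view_lists)):
        labels = {label for _, label in rows}
        if len(labels) != 1:  # all views should agree at every row
            mismatches.append(i)
    return mismatches

# Made-up example data: 3 objects x 4 views, with view 2
# deliberately misaligned at row 1 to show what a failure looks like.
views = [
    [("a_v%d.jpg" % v, 0), ("b_v%d.jpg" % v, 1), ("c_v%d.jpg" % v, 2)]
    for v in range(4)
]
views[2][1] = ("c_v2.jpg", 2)  # simulate a shifted row

print(check_view_alignment(views))  # -> [1]
```

If this prints anything but an empty list, fix the data before touching the architecture.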
It might also help if you posted your output log (as above, please attach it rather than pasting it into the post) - maybe there's something obvious in it.