Error while training a network for de-noising (network output is an image)

SRP

Apr 2, 2016, 3:26:08 AM
to Caffe Users
Hello everyone,

I am trying to use Caffe to build a network for de-noising. Unlike the Caffe (classification) examples provided in the GitHub repository and documentation, my network takes an image as input and outputs another image, not a single integer label.

After going through the code, the issues, and the mailing list, I found that this is indeed possible in Caffe.

I prepared my dataset with the code by @shelhamer from this issue: https://github.com/BVLC/caffe/issues/1698#issuecomment-70211045 using 50x50, 3-channel (RGB) PNG image files.

Here is a dummy network I am working with (see here for an easy visualization):

name: "TestNet"
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  include {
    phase: TRAIN
  }
  data_param {
    source: "./new_50_train"
    batch_size: 1
    backend: LMDB
  }
}
layer {
  name: "mnist"
  type: "Data"
  top: "res"
  include {
    phase: TRAIN
  }
  data_param {
    source: "./new_42_train"
    batch_size: 1
    backend: LMDB
  }
}
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  include {
    phase: TEST
  }
  data_param {
    source: "./new_50_test"
    batch_size: 1
    backend: LMDB
  }
}
layer {
  name: "mnist"
  type: "Data"
  top: "res"
  include {
    phase: TEST
  }
  data_param {
    source: "./new_42_test"
    batch_size: 1
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 3
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "conv1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 3
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "conv2"
  bottom: "res"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "conv2"
  bottom: "res"
  top: "loss"
}


When I run the above model, I get the following error:

I0402 02:11:52.000156 27783 layer_factory.hpp:77] Creating layer loss
I0402 02:11:52.000206 27783 net.cpp:91] Creating Layer loss
I0402 02:11:52.000231 27783 net.cpp:425] loss <- conv2
I0402 02:11:52.000264 27783 net.cpp:425] loss <- res
I0402 02:11:52.000301 27783 net.cpp:399] loss -> loss
I0402 02:11:52.000358 27783 layer_factory.hpp:77] Creating layer loss
F0402 02:11:52.000705 27783 softmax_loss_layer.cpp:47] Check failed: outer_num_ * inner_num_ == bottom[1]->count() (1764 vs. 5292) Number of labels must match number of predictions; e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be N*H*W, with integer values in {0, 1, ..., C-1}.
*** Check failure stack trace: ***
    @     0x2b0c90171daa  (unknown)
    @     0x2b0c90171ce4  (unknown)
    @     0x2b0c901716e6  (unknown)
    @     0x2b0c90174687  (unknown)
    @     0x2b0c8f53c7a7  caffe::SoftmaxWithLossLayer<>::Reshape()
    @     0x2b0c8f4b0507  caffe::Layer<>::SetUp()
    @     0x2b0c8f49c581  caffe::Net<>::Init()
    @     0x2b0c8f49a917  caffe::Net<>::Net()
    @     0x2b0c8f4c5b9b  caffe::Solver<>::InitTrainNet()
    @     0x2b0c8f4c53be  caffe::Solver<>::Init()
    @     0x2b0c8f4c4e5a  caffe::Solver<>::Solver()
    @     0x2b0c8f472bab  caffe::SGDSolver<>::SGDSolver()
    @     0x2b0c8f4831fb  caffe::Creator_SGDSolver<>()
    @           0x41b0cf  caffe::SolverRegistry<>::CreateSolver()
    @           0x41676c  train()
    @           0x418c01  main
    @     0x2b0c914a9ec5  (unknown)
    @           0x4155b9  (unknown)
    @              (nil)  (unknown)
make: *** [new] Aborted (core dumped)


I have hit a dead end: I can neither make sense of the error nor fix it. Any pointers or suggestions to help me resolve this would be highly appreciated.

Since the documentation for this is very sparse, if I get this working I would love to contribute back by writing a tutorial or updating the docs, so that others working on a similar problem can get started easily.


Jan

Apr 15, 2016, 7:31:12 AM
to Caffe Users
What are you trying to achieve? Are you doing classification at the pixel level? In that case you should set the softmax axis. Or are you predicting the image at a different resolution? Then this is a regression task, and you should use a Euclidean loss rather than softmax.
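
To make the regression option concrete, here is a minimal plain-Python sketch of the quantity a Euclidean loss computes: the sum of squared differences between prediction and target, divided by 2N, where N is the batch size. The function name and the toy values are made up for illustration; this is not Caffe code. Unlike the softmax loss, it needs both blobs to hold the same number of elements.

```python
def euclidean_loss(pred, target, batch_size):
    """Sum of squared differences divided by 2 * batch_size.

    pred and target are flat lists of floats (hypothetical stand-ins
    for the flattened "conv2" and "res" blobs).
    """
    if len(pred) != len(target):
        raise ValueError("prediction and target must have the same count")
    squared_error = sum((p - t) ** 2 for p, t in zip(pred, target))
    return squared_error / (2.0 * batch_size)


# Toy example with three "pixels" and a batch size of 1.
toy_loss = euclidean_loss([0.5, 0.25, 1.0], [0.0, 0.25, 0.0], batch_size=1)
print(toy_loss)
```

In the posted network this would mean replacing the SoftmaxWithLoss layer with one of type "EuclideanLoss" and dropping the Accuracy layer, which only makes sense for classification.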

As for your mismatch error: have you noticed that 1764 * 3 == 5292? You probably have 3 channels on one side and only one on the other. What do the network scaffolding messages say about the blob sizes of res and conv2?
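
The two numbers in the check can be reproduced with a little arithmetic. This sketch assumes what the post suggests but does not confirm: 50x50 inputs, a 3-channel 42x42 "res" blob, batch_size 1, and no padding on either convolution. The helper name is made up for illustration.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a single convolution layer."""
    return (size + 2 * pad - kernel) // stride + 1


# Assumed from the post: batch_size 1, RGB data, 50x50 input images.
n, channels = 1, 3

# conv1 then conv2: both 5x5 kernels, stride 1, no padding (50 -> 46 -> 42).
side = conv_out(conv_out(50, 5), 5)

# With softmax axis 1, the loss expects one label per pixel: N * H * W.
softmax_expected_labels = n * side * side

# A (1, 3, 42, 42) "res" blob holds N * C * H * W values instead.
label_blob_count = n * channels * side * side

print(side, softmax_expected_labels, label_blob_count)
```

Under these assumptions the softmax loss wants 1764 labels but receives 5292 values, which matches the "1764 vs. 5292" in the error message.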

Jan