Output of a forward pass has 1x1x1 dimensionality for pixel-wise predictions, rather than 1xHxW


Alycia Gailey

Jan 4, 2018, 12:38:11 PM
to Caffe Users
Hello,
  I am trying to obtain pixel-wise predictions for a single image using an already-trained Caffe network.

  I got this to work in Python, but not in C++, because the add_input_arrays function available in Python does not seem to exist in C++. So I changed the input layer to a memory data layer and reset the layer using the following code, but a forward pass only generates a single integer value rather than a matrix of integer values.

  Caffe::set_mode(Caffe::CPU);

  /* Load the network. */
  net_.reset(new Net<float>(model_file, TEST));
  net_->CopyTrainedLayersFrom(trained_file);

  // blob dimensions are in this order: num, channels, height, width
  shared_ptr<MemoryDataLayer<float> > dataLayer = boost::dynamic_pointer_cast<MemoryDataLayer<float> >(net_->layers()[0]);
  int num_channels_ = net_->blobs()[0]->channels();
  CHECK(num_channels_ == 3 || num_channels_ == 1)
    << "Input layer should have 1 or 3 channels.";
  input_geometry_ = cv::Size(net_->blobs()[0]->width(), net_->blobs()[0]->height());
  //float loss;
  Blob<float>* blob = new Blob<float>(1, img.channels(), img.rows, img.cols);
  const float img_to_net_scale = 0.0039215684;  // 1/255, assuming pixel values in 0..255
  TransformationParameter input_xform_param;
  input_xform_param.set_scale( img_to_net_scale );
  DataTransformer<float> input_xformer( input_xform_param, TEST );
  input_xformer.Transform( img, blob );
  //net_->add_input_blob(blob);
  //net_->Update();
  float *data = new float[1 * 3 * input_geometry_.height * input_geometry_.width];
  float *labels = new float[1]; //new float[1 * 1 * input_geometry_.height * input_geometry_.width];
  labels[0] = 0.0;
  for (unsigned int c = 0; c < num_channels_; ++c)
  {
      for (unsigned int h = 0; h < input_geometry_.height; ++h)
      {
          for (unsigned int w = 0; w < input_geometry_.width; ++w)
          {
              // index (n, k, h, w) is physically located at index ((n * C + c) * H + h) * W + w
              data[((0 * num_channels_ + c) * input_geometry_.height + h) * input_geometry_.width + w] = blob->data_at(0, c, h, w);
          }
      }
  } 
  dataLayer->Reset(data, labels, input_geometry_.height * input_geometry_.width); 

  net_->ForwardFrom(0);

   Blob<float>* output_layer = net_->output_blobs()[0];
   std::vector<std::vector<int> > predictions;
   unsigned int W = output_layer->width(), H = output_layer->height(), C = output_layer->channels();
   std::cout<<"***********"<<net_->output_blobs().size()<<" , "<<C<<" , "<<W<<" , "<<H<<std::endl;

  Any suggestions are appreciated.  Thanks
  ~Alycia

Thomio Watanabe

Jan 5, 2018, 10:33:32 AM
to Caffe Users
Hi Alycia,

I think you don't need to use the MemoryDataLayer.
A pointer to your blob may solve the problem.
If your problem is feeding data into your input blob, you first must get a pointer to the mutable data and then use it to copy your data in.
Here is an example, where input_state is an array of floats:

shared_ptr<Blob<float> > blob_ptr = testCaffeNet->blob_by_name( blob_name );
float *blob_content = blob_ptr->mutable_cpu_data();
const unsigned input_size = blob_ptr->count();
std::copy( input_state, input_state + input_size, blob_content );
Are you sure output_blobs()[0] is your output layer and not a 1x1 convolution?
Maybe you should try to get your blobs by name.

You are also confusing blobs and layers; I suggest you read this:

Also mind that the network output is likely to be of float type.
The most likely class is the argmax of the discrete probability distribution.

Hope this help.

Alycia Gailey

Jan 18, 2018, 10:56:07 AM
to Caffe Users
Hello Thomio,
  Thank you so much for your advice; it was helpful. I got it to work for my already-trained neural network. I am still using a memory data layer, but I followed your advice and got a pointer to the input blob's mutable_cpu_data. I convert the cv::Mat image to a blob, then copy that blob's data into the input blob's mutable_cpu_data, shown below:


// blob dimensions are in this order: num, channels, height, width
  shared_ptr<MemoryDataLayer<float> > dataLayer = boost::dynamic_pointer_cast<MemoryDataLayer<float> >(net_->layers()[0]);
  int num_channels_ = net_->blobs()[0]->channels();
  CHECK(num_channels_ == 3 || num_channels_ == 1)
    << "Input layer should have 1 or 3 channels.";
  input_geometry_ = cv::Size(net_->blobs()[0]->width(), net_->blobs()[0]->height());
  Blob<float>* blob = new Blob<float>(1, img.channels(), img.rows, img.cols);
  const float img_to_net_scale = 0.0039215684;  // 1/255, assuming pixel values in 0..255
  TransformationParameter input_xform_param;
  input_xform_param.set_scale(img_to_net_scale);
  DataTransformer<float> input_xformer(input_xform_param, TEST);
  input_xformer.Transform( img, blob );
  float *data = blob->mutable_cpu_data();

  float *labels = new float[1];
  labels[0] = 0.0;
  for (unsigned int c = 0; c < num_channels_; ++c)
  {
      for (unsigned int h = 0; h < input_geometry_.height; ++h)
      {
          for (unsigned int w = 0; w < input_geometry_.width; ++w)
          {
              // index (n, k, h, w) is physically located at index ((n * C + c) * H + h) * W + w
              data[((0 * num_channels_ + c) * input_geometry_.height + h) * input_geometry_.width + w] *= 255.0;
          }
      }
  }
  dataLayer->Reset(data, labels, input_geometry_.height * input_geometry_.width);
 

  ~Alycia