Output of a forward pass has 1x1x1 dimensionality for pixel-wise predictions, rather than 1xHxW


Alycia Gailey

Jan 4, 2018, 12:38:11 PM
to Caffe Users
Hello,
  I am trying to obtain pixel-wise predictions for a single image using an already-trained Caffe network.

  I got this to work in Python, but not in C++, because the add_input_arrays function available in Python does not seem to exist in C++. So I changed the input layer to a memory data layer and reset the layer using the following code, but a forward pass only generates a single integer value rather than a matrix of integer values.

  Caffe::set_mode(Caffe::CPU);

  /* Load the network. */
  net_.reset(new Net<float>(model_file, TEST));
  net_->CopyTrainedLayersFrom(trained_file);

  // blob dimensions are in this order: num, channels, height, width
  shared_ptr<MemoryDataLayer<float> > dataLayer = boost::dynamic_pointer_cast<MemoryDataLayer<float> >(net_->layers()[0]);
  int num_channels_ = net_->blobs()[0]->channels();
  CHECK(num_channels_ == 3 || num_channels_ == 1)
    << "Input layer should have 1 or 3 channels.";
  input_geometry_ = cv::Size(net_->blobs()[0]->width(), net_->blobs()[0]->height());
  //float loss;
  Blob<float>* blob = new Blob<float>(1, img.channels(), img.rows, img.cols);
  const float img_to_net_scale = 0.0039215684;  // 1/255, assuming pixel values in 0..255
  TransformationParameter input_xform_param;
  input_xform_param.set_scale( img_to_net_scale );
  DataTransformer<float> input_xformer( input_xform_param, TEST );
  input_xformer.Transform( img, blob );
  //net_->add_input_blob(blob);
  //net_->Update();
  float *data = new float[1 * 3 * input_geometry_.height * input_geometry_.width];
  float *labels = new float[1]; //new float[1 * 1 * input_geometry_.height * input_geometry_.width];
  labels[0] = 0.0;
  for (unsigned int c = 0; c < num_channels_; ++c)
  {
      for (unsigned int h = 0; h < input_geometry_.height; ++h)
      {
          for (unsigned int w = 0; w < input_geometry_.width; ++w)
          {
              // index (n, k, h, w) is physically located at index ((n * C + c) * H + h) * W + w
              data[((0 * num_channels_ + c) * input_geometry_.height + h) * input_geometry_.width + w] = blob->data_at(0, c, h, w);
          }
      }
  } 
  dataLayer->Reset(data, labels, input_geometry_.height * input_geometry_.width); 

  net_->ForwardFrom(0);

   Blob<float>* output_layer = net_->output_blobs()[0];
   std::vector<std::vector<int> > predictions;
   unsigned int W = output_layer->width(), H = output_layer->height(), C = output_layer->channels();
   std::cout<<"***********"<<net_->output_blobs().size()<<" , "<<C<<" , "<<W<<" , "<<H<<std::endl;

  Any suggestions are appreciated.  Thanks
  ~Alycia

Thomio Watanabe

Jan 5, 2018, 10:33:32 AM
to Caffe Users
Hi Alycia,

I think you don't need to use the MemoryDataLayer.
A pointer to your blob may solve the problem.
If your problem is feeding data into your input blob, you first must get a pointer to the mutable data and then use it to copy your data in.
Here is an example, where input_state is an array of floats:

shared_ptr<Blob<float> > blob_ptr = testCaffeNet->blob_by_name( blob_name );
float *blob_content = blob_ptr->mutable_cpu_data();
const unsigned input_size = blob_ptr->count();
std::copy( input_state, input_state + input_size, blob_content );
Are you sure output_blobs()[0] is your output layer and not a 1x1 convolution?
Maybe you should try to get your blobs by name.

You are also confusing blobs and layers; I suggest you read this:

Also mind that the network output is likely to be of float type.
The most likely class is the argmax of the discrete probability distribution.

Hope this help.

Alycia Gailey

Jan 18, 2018, 10:56:07 AM
to Caffe Users
Hello Thomio,
  Thank you so much for your advice; it was helpful. I got it to work for my already-trained neural network. I am still using a memory data layer, but I followed your advice and got a pointer to the input blob's mutable_cpu_data. I convert the cv::Mat image to a blob, then copy that blob's data into the input blob's mutable_cpu_data, shown below:


// blob dimensions are in this order: num, channels, height, width
  shared_ptr<MemoryDataLayer<float> > dataLayer = boost::dynamic_pointer_cast<MemoryDataLayer<float> >(net_->layers()[0]);
  int num_channels_ = net_->blobs()[0]->channels();
  CHECK(num_channels_ == 3 || num_channels_ == 1)
    << "Input layer should have 1 or 3 channels.";
  input_geometry_ = cv::Size(net_->blobs()[0]->width(), net_->blobs()[0]->height());
  Blob<float>* blob = new Blob<float>(1, img.channels(), img.rows, img.cols);
  const float img_to_net_scale = 0.0039215684;  // 1/255, assuming pixel values in 0..255
  TransformationParameter input_xform_param;
  input_xform_param.set_scale(img_to_net_scale);
  DataTransformer<float> input_xformer(input_xform_param, TEST);
  input_xformer.Transform( img, blob );
  float *data = blob->mutable_cpu_data();

  float *labels = new float[1];
  labels[0] = 0.0;
  for (unsigned int c = 0; c < num_channels_; ++c)
  {
      for (unsigned int h = 0; h < input_geometry_.height; ++h)
      {
          for (unsigned int w = 0; w < input_geometry_.width; ++w)
          {
              // index (n, k, h, w) is physically located at index ((n * C + c) * H + h) * W + w
              data[((0 * num_channels_ + c) * input_geometry_.height + h) * input_geometry_.width + w] *= 255.0;
          }
      }
  }
  dataLayer->Reset(data, labels, input_geometry_.height * input_geometry_.width);
 

  ~Alycia