export the prediction for FCN using C++ API


Jianyu Lin

unread,
Feb 4, 2016, 11:46:13 AM2/4/16
to Caffe Users
Hi all,

I am currently working on semantic segmentation using FCN, and hope to export the output using C++ API.

This is my understanding of the FCN output; I'm not sure whether it is correct:

In deploy.prototxt the last layer is a Crop layer that generates an n*h*w blob, where each channel holds the per-pixel probability of one label. It looks like this:

layer { type: 'Crop' name: 'score' top: 'score'
  bottom: 'upscore' bottom: 'data' }

My question is: how should I export the FCN output using the C++ API?
After the forward pass the output of an FCN is a float array. When I convert that float array into a cv::Mat, which memory order should I use: n*h*w or w*h*n?

Many thanks.

Jianyu

Jan C Peters

unread,
Feb 5, 2016, 7:43:29 AM2/5/16
to Caffe Users
Hi,

see interleaved comments.


On Thursday, February 4, 2016 at 17:46:13 UTC+1, Jianyu Lin wrote:
Hi all,

I am currently working on semantic segmentation using FCN, and hope to export the output using C++ API.

This is my understanding of the FCN output; I'm not sure whether it is correct:

In deploy.prototxt the last layer is a Crop layer that generates an n*h*w blob, where each channel holds the per-pixel probability of one label. It looks like this:

layer { type: 'Crop' name: 'score' top: 'score'
  bottom: 'upscore' bottom: 'data' }


Yes, that is correct.
 
My question is: how should I export the FCN output using the C++ API?
After the forward pass the output of an FCN is a float array. When I convert that float array into a cv::Mat, which memory order should I use: n*h*w or w*h*n?

That depends on what exactly you want to do with the Mat afterwards. If you just want the class with the highest score for each pixel, do an argmax over the channel axis and you get an integer (h, w) array.
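That per-pixel argmax is simple to write by hand in C++. A minimal sketch, assuming a planar (C, H, W) score array like the one `cpu_data()` returns (the function name and sizes are made up for illustration):

```cpp
#include <vector>

// For each pixel (h, w), pick the channel with the highest score
// in a planar (C, H, W) float array.
std::vector<int> argmax_channels(const float* scores, int C, int H, int W) {
    std::vector<int> labels(H * W, 0);
    for (int h = 0; h < H; ++h) {
        for (int w = 0; w < W; ++w) {
            int best = 0;
            for (int c = 1; c < C; ++c) {
                // channel c's plane starts at offset c * H * W
                if (scores[(c * H + h) * W + w] > scores[(best * H + h) * W + w])
                    best = c;
            }
            labels[h * W + w] = best;
        }
    }
    return labels;
}
```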

What do you mean by "export"? Saving it as an image? As a numpy matrix? As CSV? Are you doing the forward pass yourself in your own code or do you want to modify caffe's code?

Look at the scripts in https://gist.github.com/shelhamer/80667189b218ad570e82/, especially eval.py, which shows how to do that in Python. In C++ it is very similar.
 

Many thanks.

Jianyu


Jan

Jianyu Lin

unread,
Feb 5, 2016, 9:11:52 AM2/5/16
to Caffe Users
Thanks Jan.

I just want to convert the output of the forward pass (a const float*) into a Mat (finding the class with the highest score at each pixel). In my case I built my own CNN, and there are only two classes to segment (0 and 1).

Here is my code to read the output, find the max and convert to Mat:
////////////////////////////////////////////////////////////////
const vector<Blob<float>*>& result = caffe_net.Forward(bottom_vec, &iter_loss);  // forward pass
const float* result_vec = result[0]->cpu_data();

// generate the prediction from the output vector and store it in a Mat
cv::Mat srcC = cv::Mat::zeros(cv::Size(512, 384), CV_32FC1);
int nl = srcC.rows;  // row number, height
int nc = srcC.cols;  // col number, width

for (int j = 0; j < nl; j++) {
  float* data = srcC.ptr<float>(j);
  for (int i = 0; i < nc; i++) {
    // compare the scores of the two classes and generate the prediction
    if (result_vec[i + j*nc + datum.height()*datum.width()] > result_vec[i + j*nc]) {
      data[i] = 255;
    }
  }
}
//////////////////////////////////////////////////////////////////
I didn't use argmax; the code above is how I find the max. I don't know whether the output data is arranged in c*h*w or w*h*c order, and given the weird Mat I got, I think I may have done it the wrong way. What do you think?

Thank you very much.

Jianyu 

Emmanuel Benazera

unread,
Feb 5, 2016, 11:51:16 AM2/5/16
to Caffe Users
Hi Jianyu,

this may only be partially relevant to your initial question, but if you look at
https://github.com/beniz/deepdetect/pull/59/files#diff-d5819ab90431aa65c2e58658d1e506b3R1789
you'll see one way to get the output of any inner layer from a Net.
The call to reshape simplifies iterating over the potentially n-D blob's data. Unless there's a bug, this should make it easier for you to turn whatever rectangle into a Mat and then an image.

Em.

Jan C Peters

unread,
Feb 8, 2016, 4:13:57 AM2/8/16
to Caffe Users
Hi again,

usually the blobs contain the data in the shape (N, C, H, W), where N=number of samples per batch, C=number of channels, H=height, W=width. And arrays are stored in row-major order, i.e. in a serialized array you find the data point with indices (n, c, h, w) at
data[((n * C + c) * H + h) * W + w]

Obviously for non-image data the last two dimensions collapse to one.
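As a quick sanity check, that offset formula can be verified against the natural nested-loop enumeration in a few lines of plain C++ (the sizes here are arbitrary):

```cpp
// Offset of element (n, c, h, w) in a row-major (N, C, H, W) blob.
inline int blob_offset(int n, int c, int h, int w, int C, int H, int W) {
    return ((n * C + c) * H + h) * W + w;
}
```

Enumerating (n, c, h, w) in nested-loop order visits offsets 0, 1, 2, ... exactly once, which is what "row-major" means here.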

That should give you enough information to solve your problem. By the way: you can directly map the blob storage to the storage of a cv::Mat by calling an appropriate constructor: http://docs.opencv.org/3.1.0/d3/d63/classcv_1_1Mat.html#a5fafc033e089143062fd31015b5d0f40&gsc.tab=0. For the void* data you can pass blob->mutable_cpu_data(). Just take care with multidimensional vs. multichannel matrices: in OpenCV a multichannel matrix is stored in interleaved (H, W, C) order, while in caffe the blob is planar (C, H, W). So if you want to use an OpenCV Mat in the way suggested here, use a CV_32FC1 Mat with a 3D size.

Actually I searched OpenCV for an argmax function and it does not seem to provide one, so in your case it may not be very helpful to do the mapping as described above, but it is still nice to know.

Jan

Jianyu Lin

unread,
Feb 8, 2016, 6:37:05 AM2/8/16
to Caffe Users
Thank you Jan, that's very helpful! I will try to implement it again.

Jianyu

Jianyu Lin

unread,
Feb 9, 2016, 11:26:50 AM2/9/16
to Caffe Users
Hi Jan, 

Thank you very much for your help. I figured it out!
Indeed I used blob->cpu_data().

blob->cpu_data() has type const float*, and the array is arranged in planar C×H×W order rather than OpenCV's interleaved H×W×C. So we cannot directly use the Mat(Size size, int type, void* data, size_t step = AUTO_STEP) constructor to read the output. But using for loops to set the value of every pixel of the Mat is feasible and fast.

Best wishes,
Jianyu

Felix Abecassis

unread,
Feb 10, 2016, 3:43:12 PM2/10/16
to Caffe Users
You can also use one cv::Mat object per channel, like here:
https://github.com/BVLC/caffe/blob/master/examples/cpp_classification/classification.cpp#L171-L187
You will have NxC cv::Mat objects. No need to copy pixels here.

You can use cv::split to transform one cv::Mat object to 3 cv::Mat objects.
You can also use cv::merge for the other way around.
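The reason no pixels need to be copied is that in a planar (C, H, W) blob each channel occupies one contiguous H*W block, so each cv::Mat can simply wrap that memory (the linked classification.cpp constructs cv::Mat(height, width, CV_32FC1, ptr) for exactly this purpose). Stripped of OpenCV, the idea reduces to per-channel pointers (a sketch; `wrap_channels` is a made-up name):

```cpp
#include <vector>

// Build per-channel "views" into a planar (C, H, W) float buffer
// without copying: each view is the start of one H*W channel plane.
std::vector<float*> wrap_channels(float* blob_data, int C, int H, int W) {
    std::vector<float*> channels;
    channels.reserve(C);
    for (int c = 0; c < C; ++c)
        channels.push_back(blob_data + c * H * W);
    return channels;
}
```

Writing through such a view modifies the blob's memory directly, which is why no pixel copy is needed.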

Jianyu Lin

unread,
Feb 11, 2016, 7:02:14 AM2/11/16
to Caffe Users
OK I see! This seems much more convenient! Thank you for your advice!

Jianyu

Ankit Dhall

unread,
Jun 21, 2016, 11:48:15 AM6/21/16
to Caffe Users
Hello,
I have been trying to get the output of the network using C++. The output contains 40 channels of per-pixel probabilities over a 300x300 image. I want to take the argmax of these per-pixel probabilities and get the result as a 1x300x300 image. I could only find a few vague code snippets online and have been unable to get the desired result.
I also tried the WrapInputLayer code, which splits the channels, but I am getting a segmentation fault.
here is my code:


int main(int argc, char** argv) {
  // set CPU or GPU
  Caffe::set_mode(Caffe::GPU);
  int device_id = 0;
  Caffe::SetDevice(device_id);

  // get the net
  Net<float> caffe_test_net("/home/ubuntu/caffe_parsenet/models/upnet/rgb/model_definition.prototxt", caffe::TEST);
  // get the trained weights
  caffe_test_net.CopyTrainedLayersFrom("/home/ubuntu/caffe_parsenet/models/upnet/rgb/pretrained_model.caffemodel");

  // get the datum
  Datum datum;
  if (!ReadImageToDatum("./b1-09517_Clipped.jpg", 1, 300, 300, &datum)) {
    LOG(ERROR) << "Error during file reading";
  }

  // get the blob
  Blob<float>* blob = new Blob<float>(1, datum.channels(), datum.height(), datum.width());

  // get the blobproto
  BlobProto blob_proto;
  blob_proto.set_num(1);
  blob_proto.set_channels(datum.channels());
  blob_proto.set_height(datum.height());
  blob_proto.set_width(datum.width());
  const int data_size = datum.channels() * datum.height() * datum.width();

  int size_in_datum = std::max<int>(datum.data().size(),
                                    datum.float_data_size());
  for (int i = 0; i < size_in_datum; ++i) {
    blob_proto.add_data(0.);
  }

  const string& data = datum.data();
  if (data.size() != 0) {
    for (int i = 0; i < size_in_datum; ++i) {
      blob_proto.set_data(i, blob_proto.data(i) + (uint8_t)data[i]);
    }
  }

  // set the data into the blob
  blob->FromProto(blob_proto);

  // fill the input vector
  vector<Blob<float>*> bottom;
  bottom.push_back(blob);
  float type = 0.0;

  ////////////////////////////////////////////////////////////////
  const vector<Blob<float>*>& result = caffe_test_net.Forward(bottom, &type);  // forward pass

  std::vector<cv::Mat>* input_channels;
  Blob<float>* input_layer = caffe_test_net.output_blobs()[0];

  int width = input_layer->width();
  int height = input_layer->height();
  float* input_data = input_layer->mutable_cpu_data();
  for (int i = 0; i < input_layer->channels(); ++i) {
    cv::Mat channel(height, width, CV_32FC1, input_data);
    input_channels->push_back(channel);
    input_data += width * height;
  }
}

Could someone suggest a better way to get the argmax 1x300x300 image, or help fix the segmentation fault?

Regards,
Ankit

Heverton Sarah

unread,
Aug 22, 2016, 11:57:42 PM8/22/16
to Caffe Users
Hi Jianyu,

I'm trying to do the same thing as you. Can you explain how you figured it out? Did you use cv::merge?

Thank you,

Heverton Sarah

Jianyu Lin

unread,
Aug 23, 2016, 9:03:52 AM8/23/16
to Caffe Users
Hi Heverton,

Here is my code to convert the image and apply a forward pass. Hope it helps.

    /// fcn segmentation
    /// step 1: prepare data for the neural networks
    //get datum
    caffe::Datum datum;
    CVMatToDatum(src, &datum);

    //get the blob
    Blob<float>* blob = new Blob<float>(1, datum.channels(), datum.height(), datum.width());

    //get the blobproto
    caffe::BlobProto blob_proto;
    blob_proto.set_num(1);
    blob_proto.set_channels(datum.channels());
    blob_proto.set_height(datum.height());
    blob_proto.set_width(datum.width());

    int size_in_datum = std::max<int>(datum.data().size(),
                                      datum.float_data_size());

    for (int i = 0; i < size_in_datum; ++i) {
        blob_proto.add_data(0.);
    }
    const string& data = datum.data();
    if (data.size() != 0) {
        for (int i = 0; i < size_in_datum; ++i) {
            blob_proto.set_data(i, blob_proto.data(i) + (uint8_t)data[i]);
        }
    }
    //set data into blob
    blob->FromProto(blob_proto);  // the blob is always in float!!
    //fill the vector
    src.convertTo(src, CV_32FC3, 1/255.0); //for scaling
    float* input_data = blob->mutable_cpu_data();
    for (int i = 0; i<blob->channels()*blob->height()*blob->width();i++){
        input_data[i] = input_data[i]/255;
    }

    vector<Blob<float>*> bottom_vec;
    bottom_vec.push_back(blob);

    float iter_loss;

    /// step 2: forward pass


    const vector<Blob<float>*>& result =
            caffe_net.Forward(bottom_vec, &iter_loss);
    const float* result_vec = result[0]->cpu_data();
-------------------------------------------------------------------------
Then you can read output from result_vec. 

I didn't use the original fcn-8s/32s model but a variant of it (U-net, http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/), but that shouldn't affect the C++ part.

Jianyu

Heverton Sarah

unread,
Aug 23, 2016, 8:49:33 PM8/23/16
to Caffe Users
Thank you for helping me, Jianyu.

But my doubt is, after you get the result_vec, how do you generate the image result from this result_vec?

Cheers,

Heverton

Sirim Bak

unread,
Aug 25, 2016, 2:31:38 AM8/25/16
to Caffe Users
Heverton,

Here is what I did.

result_vec holds channels x rows x cols values. I reshaped it into a [row*col] x channels matrix, so that each row holds the predictions for one pixel across all channels (as columns). Then I computed the max of each row and noted its index (which I suppose is equivalent to ArgMax). Finally I resized the index vector into a row x col matrix, which gave me the predicted label image. Here is my code, simplified to make it understandable:

    // simplified: declarations of the helper variables
    cv::Mat class_each_row(channels, width*height, CV_32FC1);
    cv::Mat label(height*width, 1, CV_8U);
    double maxValue;
    cv::Point maxId;
    int index = 0;

    for (int i = 0; i < channels; i++){
        for (int j = 0; j < width*height; j++){
            class_each_row.at<float>(i, j) = result_vec[index];
            index++;
        }
    }
    class_each_row = class_each_row.t();
    for (int i = 0; i < class_each_row.rows; i++){
        cv::minMaxLoc(class_each_row.row(i), 0, &maxValue, 0, &maxId);
        label.at<uchar>(i) = maxId.x;
    }

Here is my issue: the result is close, but not as good as the Python code (infer.py). I suspect I made a mistake while converting to BGR and subtracting the mean values. Here is how I did it; please point out any mistake you see:
    float mean[3] = {92.9630, 95.9630, 89.5160}; // mean in [B,G,R] sequence
    vector<Mat> chs(3);
    split(img, chs);// split Color channels in BGR sequence      
    chs[0] = chs[0]-mean[0]; //B
    chs[1] = chs[1]-mean[1]; //G
    chs[2] = chs[2]-mean[2]; //R
    merge(chs,BGR);
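For comparison, since OpenCV stores color images interleaved (B, G, R, B, G, R, ...), the same mean subtraction can also be done in place without cv::split/cv::merge (a stdlib sketch; the mean values below are the ones from the snippet above, and `subtract_mean` is a made-up name):

```cpp
#include <cstddef>
#include <vector>

// Subtract per-channel means from an interleaved BGR float buffer in place.
void subtract_mean(std::vector<float>& bgr, const float mean[3]) {
    for (std::size_t i = 0; i < bgr.size(); ++i)
        bgr[i] -= mean[i % 3];  // channel index cycles B, G, R
}
```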

Thanks,
Sirim

Heverton Sarah

unread,
Aug 29, 2016, 9:07:16 PM8/29/16
to Caffe Users
Hi Sirim,

Thank you for helping me!

I've been trying to adapt your code to use in mine, and I'm struggling with this error:

OpenCV Error: Assertion failed ((cn == 1 && (mask.empty() || mask.type() == CV_8U)) || (cn >= 1 && mask.empty() && !minIdx && !maxIdx)) in minMaxIdx, file /build/buildd/opencv-2.4.8+dfsg1/modules/core/src/stat.cpp, line 1013
terminate called after throwing an instance of 'cv::Exception'
  what():  /build/buildd/opencv-2.4.8+dfsg1/modules/core/src/stat.cpp:1013: error: (-215) (cn == 1 && (mask.empty() || mask.type() == CV_8U)) || (cn >= 1 && mask.empty() && !minIdx && !maxIdx) in function minMaxIdx

I think I'm initializing the minMaxLoc function parameters in the wrong way. Here is the part of the code where I use your implementation:


Blob<float>* output_layer_in = net_->output_blobs()[0];

int width = output_layer_in->width();
int height = output_layer_in->height();

float* output_data = output_layer_in->mutable_cpu_data();
cv::Mat class_each_row(output_layer_in->channels(), width*height, CV_32FC2);
int index = 0;
for (int i = 0; i < output_layer_in->channels(); i++){
  for (int j = 0; j < width*height; j++){
    class_each_row.at<float>(i, j) = output_data[index];
    index++;
  }
}

class_each_row = class_each_row.t();
double maxValue;
double minValue;
cv::Point maxId;
cv::Point minId;
cv::Mat label(1, class_each_row.rows, CV_8U);

cv::minMaxLoc(class_each_row.row(0), &minValue, &maxValue, &minId, &maxId, cv::Mat());

// I know that if I leave this call here it will just write a one-row image;
// it is only here to show what I want to do at the end: generate the result image.
cv::imwrite("imgResult.png", label);
 
One thing to note: the segmentation output has 2 channels, so output_layer_in->channels() equals 2.

As for your question, I can't tell you the right way to do it yet, as I don't have any result myself. But I know you have to subtract the mean values from every pixel.


I hope you can help me,

Thanks!

Heverton Sarah

unread,
Aug 29, 2016, 10:58:55 PM8/29/16
to Caffe Users
One more thing: the part


cv::minMaxLoc(class_each_row.row(0), &minValue, &maxValue, &minId, &maxId, cv::Mat());

is actually:


  for (int i=0;i<class_each_row.rows;i++){
    cv::minMaxLoc(class_each_row.row(i), &minValue, &maxValue, &minId, &maxId, cv::Mat());

    label.at<uchar>(i) = maxId.x;
  }

I commented out this part to test what was happening with minMaxLoc, but the error remains.

Atena Nguyen

unread,
Jul 25, 2017, 9:13:15 AM7/25/17
to Caffe Users
Hi Heverton,

Can you share your solution? 

Thank you 


On Tuesday, August 30, 2016 at 11:58:55 UTC+9, Heverton Sarah wrote:

nguyenthien anh

unread,
Jan 3, 2018, 3:02:30 AM1/3/18
to Caffe Users
Hi Heverton,

Can you share your C++ prediction source code with me?

On Tuesday, August 30, 2016 at 09:58:55 UTC+7, Heverton Sarah wrote: