Modifying the Caffe C++ prediction code for multiple inputs


Jack Simpson

Sep 19, 2015, 9:21:20 AM
to Caffe Users

I implemented a modified version of the Caffe C++ example, and while it works really well, it's incredibly slow because it only accepts images one by one. Ideally I'd like to pass Caffe a vector of 200 images and get all the predictions back, but I'm having difficulty working out how to modify the example to allow for this. Is the trick to create multiple input_blobs and then change this kind of code here:


Blob<float>* input_layer = net_->input_blobs()[0];


into a loop that passes multiple input_blobs? Does anyone have any advice, or know of someone who has already solved the problem on GitHub?

Fanglin Wang

Sep 20, 2015, 11:21:54 PM
to Caffe Users
I implemented this based on the Caffe C++ example. I only started using Caffe very recently, so my implementation is straightforward and probably not optimal, but it's 10x faster than feeding single images. Here is what I did:
- Reshape the input layer, then propagate the new shape through the net:
    input_layer->Reshape(nImages, num_channels_, input_geometries_.height, input_geometries_.width);
    net_->Reshape();
- Wrap the input layer (one cv::Mat per channel per image):
    int width = input_layer->width();
    int height = input_layer->height();
    float* input_data = input_layer->mutable_cpu_data();
    for (int i = 0; i < input_layer->channels()*nImages; ++i) {
      cv::Mat channel(height, width, CV_32FC1, input_data);
      input_channels->push_back(channel);
      input_data += width * height;
    }
- Preprocess the images:
    for (int i = 0; i < nImages; i++) {
      ....
      //cv::split(sample_normalized, *input_channels);
      vector<Mat> channels;
      cv::split(sample_normalized, channels);
      // NOTE: cannot use assignment operator of cv::Mat to copy data to input layer
      for (int j = 0; j < channels.size(); j++)
        channels[j].copyTo((*input_channels)[i*num_channels_[0]+j]);
    }
- Predict:
    net_->ForwardPrefilled();
    /* Copy the output layer to a std::vector */
    Blob<float>* output_layer = net_->output_blobs()[0];
    const float* begin = output_layer->cpu_data();
    const float* end = begin + nImages*output_layer->channels();
    return vector<float>(begin, end);
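The wrap and preprocess steps above rely on Caffe storing a batch as one contiguous N x C x H x W buffer, so the plane for image i, channel j begins at offset (i*C + j)*H*W; that is why the copy target is indexed as i*num_channels + j. A minimal, dependency-free sketch of that layout, using plain std::vector<float> in place of cv::Mat and Blob (PackBatch and its nested-vector input are hypothetical, not Caffe or OpenCV API):

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not part of Caffe or OpenCV: packs images given as
// per-channel planes (images[i][j] is the H*W plane of channel j of image i)
// into one flat buffer in N x C x H x W order, the layout of a Caffe blob.
std::vector<float> PackBatch(
    const std::vector<std::vector<std::vector<float>>>& images,
    std::size_t channels, std::size_t height, std::size_t width) {
  const std::size_t plane = height * width;
  std::vector<float> batch(images.size() * channels * plane);
  for (std::size_t i = 0; i < images.size(); ++i) {
    if (images[i].size() != channels)
      throw std::invalid_argument("image has the wrong number of channels");
    for (std::size_t j = 0; j < channels; ++j) {
      if (images[i][j].size() != plane)
        throw std::invalid_argument("channel plane has the wrong size");
      // Plane (i, j) starts at (i * channels + j) * plane, which is why the
      // wrapped cv::Mat for image i, channel j sits at index i * channels + j.
      std::copy(images[i][j].begin(), images[i][j].end(),
                batch.begin() + (i * channels + j) * plane);
    }
  }
  return batch;
}
```

In the real code the copy is done by copyTo into the wrapped cv::Mat headers, so no separate flat buffer is needed; the sketch only makes the index arithmetic explicit.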

Jack Simpson

Sep 24, 2015, 3:16:24 AM
to Caffe Users
Hi Fanglin, thank you so much for your help! I've been trying to implement it in my main program by passing a vector of cv::Mat objects to the Classify method. I made the changes you recommended but got a little stuck parsing the output; is there any chance you could take a look at my progress? I found it easier to embed the code in my question on Stack Overflow, which I hope is OK:
http://stackoverflow.com/questions/32668699/modifying-the-caffe-c-prediction-code-for-multiple-inputs

Thank you so much again for all your help, I really cannot express how grateful I am!

Fanglin Wang

Sep 27, 2015, 12:34:27 AM
to Caffe Users
Hi Jack,

I'm quite sure my answer is correct, so I haven't gone through your Stack Overflow thread. To parse the output, you need to understand how it is laid out: the probabilities are stored contiguously, in class-label order for each image, one image after another. Say your labels are 0, 1, 2 and you input 100 images; then the output is p0 p1 p2 p0 p1 p2 ..., i.e., 100 tuples giving each image's probability for each class. From there it should be straightforward to get the probability of each class for each image.
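A minimal sketch of that indexing, assuming you have the flat vector returned by the Prediction step and know the number of classes (output_layer->channels()). TopClassPerImage is a hypothetical helper; it returns the class index, which you would still map through your labels file:

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Given the flat output of a batched forward pass (one tuple of num_classes
// probabilities per image, stored one after another), return the index and
// probability of the most likely class for every image.
std::vector<std::pair<int, float>> TopClassPerImage(const std::vector<float>& probs,
                                                    std::size_t num_classes) {
  if (num_classes == 0 || probs.size() % num_classes != 0)
    throw std::invalid_argument("output size is not a multiple of num_classes");
  const std::size_t num_images = probs.size() / num_classes;
  std::vector<std::pair<int, float>> result;
  result.reserve(num_images);
  for (std::size_t i = 0; i < num_images; ++i) {
    // Image i owns probs[i * num_classes] .. probs[i * num_classes + num_classes - 1].
    const float* p = probs.data() + i * num_classes;
    std::size_t best = 0;
    for (std::size_t k = 1; k < num_classes; ++k)
      if (p[k] > p[best]) best = k;
    result.emplace_back(static_cast<int>(best), p[best]);
  }
  return result;
}
```

For labels 0 1 2 and an output of 0.1 0.7 0.2 0.6 0.3 0.1, this gives class 1 (0.7) for the first image and class 0 (0.6) for the second.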

Jack Simpson

Sep 27, 2015, 1:00:11 AM
to Caffe Users
Hi Fanglin,

You're right, your answer was correct; I think I was just getting confused parsing the output. I think this is how I was supposed to implement your changes and parse the output, but I can't get it to compile and I'm not sure where my code is going wrong. I removed the parts of the code that performed mean-image subtraction (I didn't use it) and checked the number of channels etc., so I'm effectively passing the object the variable "input_channels", a vector of grayscale Mats. I'm really sorry to take up more of your time after you've given me so much help, but is there any chance you could have a look at the code that parses the output from multiple images?


#include "Classifier.h"

using namespace caffe;
using std::string;

Classifier::Classifier(const string& model_file, const string& trained_file, const string& label_file) {
#ifdef CPU_ONLY
  Caffe::set_mode(Caffe::CPU);
#else
  Caffe::set_mode(Caffe::GPU);
#endif

  /* Load the network. */
  net_.reset(new Net<float>(model_file, TEST));
  net_->CopyTrainedLayersFrom(trained_file);

  Blob<float>* input_layer = net_->input_blobs()[0];
  num_channels_ = input_layer->channels();
  input_geometry_ = cv::Size(input_layer->width(), input_layer->height());

  /* Load labels. */
  std::ifstream labels(label_file.c_str());
  CHECK(labels) << "Unable to open labels file " << label_file;
  string line;
  while (std::getline(labels, line))
    labels_.push_back(string(line));

  Blob<float>* output_layer = net_->output_blobs()[0];
  CHECK_EQ(labels_.size(), output_layer->channels())
    << "Number of labels is different from the output layer dimension.";
}


static bool PairCompare(const std::pair<float, int>& lhs, const std::pair<float, int>& rhs) {
  return lhs.first > rhs.first;
}

/* Return the indices of the top N values of vector v. */
static std::vector<int> Argmax(const std::vector<float>& v, int N) {
  std::vector<std::pair<float, int> > pairs;
  for (size_t i = 0; i < v.size(); ++i)
    pairs.push_back(std::make_pair(v[i], i));
  std::partial_sort(pairs.begin(), pairs.begin() + N, pairs.end(), PairCompare);

  std::vector<int> result;
  for (int i = 0; i < N; ++i)
    result.push_back(pairs[i].second);
  return result;
}


/* Return the top prediction (label, probability) for each image. */
std::vector< std::pair<int,float> > Classifier::Classify(std::vector<cv::Mat> &input_channels) {
    std::vector< std::vector<float> > output = Predict(input_channels, input_channels.size());
    std::vector< std::pair<int,float> > predictions;
    for (size_t i = 0; i < output.size(); i++) {
        // output[i] holds the class probabilities for image i
        std::vector<int> maxN = Argmax(output[i], 1);
        int idx = maxN[0];
        predictions.push_back(std::make_pair(std::stoi(labels_[idx]), output[i][idx]));
    }
    return predictions;
}


std::vector< std::vector<float> > Classifier::Predict(std::vector<cv::Mat> &input_channels, int num_images) {
  Blob<float>* input_layer = net_->input_blobs()[0];
  input_layer->Reshape(num_images, num_channels_,
                       input_geometry_.height, input_geometry_.width);
  /* Forward dimension change to all layers. */
  net_->Reshape();

  WrapInputLayer(input_channels, num_images);

  net_->ForwardPrefilled();

  /* Split the flat output into one probability vector per image. */
  Blob<float>* output_layer = net_->output_blobs()[0];
  std::vector< std::vector<float> > ret;
  for (int i = 0; i < num_images; i++) {
    const float* begin = output_layer->cpu_data() + i * output_layer->channels();
    const float* end = begin + output_layer->channels();
    ret.push_back(std::vector<float>(begin, end));
  }
  return ret;
}


/* Wrap the input layer of the network in separate cv::Mat objects (one per channel per image). This way we save one memcpy operation and we don't need to rely on cudaMemcpy2D. The last preprocessing operation will write the separate channels directly to the input layer. */
void Classifier::WrapInputLayer(std::vector<cv::Mat>& input_channels, int num_images) {
  Blob<float>* input_layer = net_->input_blobs()[0];
  int width = input_layer->width();
  int height = input_layer->height();
  float* input_data = input_layer->mutable_cpu_data();
  for (int i = 0; i < input_layer->channels() * num_images; ++i) {
    cv::Mat channel(height, width, CV_32FC1, input_data);
    input_channels.push_back(channel);
    input_data += width * height;
  }
}



Fanglin Wang

Sep 27, 2015, 11:29:22 PM
to Caffe Users
Hi Jack,

Did you remove the "Preprocess" function? If you did, put it back, because that is where the data are copied into the input layer. Also, as I mentioned in my first reply, you should put these lines there:
        vector<Mat> channels;
        cv::split(sample_normalized, channels);
        // NOTE: cannot use assignment operator of cv::Mat to copy data to input layer
        for (int j = 0; j < channels.size(); j++)
            channels[j].copyTo((*input_channels)[i*num_channels_[0]+j]);      
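The NOTE about the assignment operator matters because assigning one cv::Mat to another only makes the left-hand header share the right-hand data: the Mat that wrapped the blob memory is rebound, and the blob never sees the pixels. copyTo, by contrast, writes into the destination's existing buffer as long as its size and type already match (otherwise it allocates new memory, which also silently detaches it from the blob). A toy analogue without OpenCV, where PlaneView and CopyInto are hypothetical stand-ins for a cv::Mat header and copyTo:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy stand-in for a cv::Mat header that wraps memory it does not own,
// like the Mats created over the blob in WrapInputLayer.
struct PlaneView {
  float* data;
  std::size_t size;
};

// Analogue of copyTo into a destination that already has the same size:
// the values are written into the destination's existing buffer.
// A real cv::Mat would reallocate on a size/type mismatch; the toy refuses.
bool CopyInto(const std::vector<float>& src, const PlaneView& dst) {
  if (src.size() != dst.size) return false;
  std::copy(src.begin(), src.end(), dst.data);
  return true;
}
```

Assigning a new PlaneView to a copy of the header changes only where that header points, and the wrapped buffer keeps its old values; calling CopyInto on the original header is what actually fills it.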

Jack Simpson

Sep 29, 2015, 12:44:13 AM
to Caffe Users
You're exactly right, I had removed the preprocessing stage; I've put it back in with your code. One thing I'm still a bit confused about is the Predict section. I made the changes you recommended here:

  Blob<float>* output_layer = net_->output_blobs()[0];
  const float* begin = output_layer->cpu_data();
  const float* end = begin + nImages*output_layer->channels();
  return std::vector<float>(begin, end);


Shouldn't I be returning a vector containing a float vector for each image, something like std::vector< std::vector<float> >, or have I misunderstood?

Jack Simpson

Sep 29, 2015, 8:05:13 AM
to Caffe Users
Hi Fanglin, one of the lines of code you recommended for the Preprocess method seems to be causing problems:

channels[j].copyTo((*input_channels)[i*num_channels_[0]+j]);


I get a compilation error stating "error: subscripted value is not an array, pointer, or vector". Have I made a mistake or has something strange happened?


Thanks so much!

Jack Simpson

Sep 29, 2015, 9:53:36 PM
to Caffe Users
I removed the [0] subscript from the num_channels_ integer variable, and that seems to have solved the compilation issue I was having.

Fanglin Wang

Sep 29, 2015, 11:59:39 PM
to Caffe Users
No. As I mentioned:
I'm quite sure my answer is correct, so I haven't gone through your Stack Overflow thread. To parse the output, you need to understand how it is laid out: the probabilities are stored contiguously, in class-label order for each image, one image after another. Say your labels are 0, 1, 2 and you input 100 images; then the output is p0 p1 p2 p0 p1 p2 ..., i.e., 100 tuples giving each image's probability for each class. From there it should be straightforward to get the probability of each class for each image.

The returned vector contains the prediction probabilities for all the images, one after another.
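Either return type works. The snippet returns the flat std::vector<float>, and if a std::vector<std::vector<float>> is more convenient you can split it afterwards. A minimal sketch, assuming num_classes equals output_layer->channels() (SplitPerImage is a hypothetical name):

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Split the flat batch output into one probability vector per image.
std::vector<std::vector<float>> SplitPerImage(const std::vector<float>& flat,
                                              std::size_t num_classes) {
  if (num_classes == 0 || flat.size() % num_classes != 0)
    throw std::invalid_argument("output size is not a multiple of num_classes");
  std::vector<std::vector<float>> per_image;
  per_image.reserve(flat.size() / num_classes);
  // Each image's tuple is the next num_classes consecutive values.
  for (std::size_t start = 0; start < flat.size(); start += num_classes)
    per_image.emplace_back(flat.begin() + start, flat.begin() + start + num_classes);
  return per_image;
}
```

Splitting costs one extra copy of the output; for a few hundred images and a handful of classes that is negligible next to the forward pass.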

Fanglin Wang

Sep 30, 2015, 12:00:49 AM
to Caffe Users
Right. My mistake. Let me know whether your code is working.