I implemented a modified version of the Caffe C++ example, and while it works really well, it's incredibly slow because it only accepts images one by one. Ideally I'd like to pass Caffe a vector of 200 images and get all the predictions back, but I'm having difficulty working out how to modify the example to allow for this. Is the trick to create multiple "input_blobs", and then change this kind of code:

Blob<float>* input_layer = net_->input_blobs()[0];

to a loop that passes multiple input blobs? Does anyone have any advice, or know of someone who has already solved this problem on GitHub?
I'm fairly confident this answer is correct, so I haven't gone through the Stack Overflow thread you linked. To parse the output, you need to understand its layout: the probabilities are stored in one contiguous block, ordered by class label within each image. Say your labels are 0, 1, 2 and you input 100 images; the output is then p0 p1 p2 p0 p1 p2 ..., i.e., 100 triples of per-class probabilities, one triple per image. From there it's straightforward to read off the probability of each class for each image.
#include "Classifier.h"
#include <fstream>  // std::ifstream for the label file

using namespace caffe;
using std::string;

Classifier::Classifier(const string& model_file,
                       const string& trained_file,
                       const string& label_file) {
#ifdef CPU_ONLY
  Caffe::set_mode(Caffe::CPU);
#else
  Caffe::set_mode(Caffe::GPU);
#endif

  /* Load the network. */
  net_.reset(new Net<float>(model_file, TEST));
  net_->CopyTrainedLayersFrom(trained_file);

  Blob<float>* input_layer = net_->input_blobs()[0];
  num_channels_ = input_layer->channels();
  input_geometry_ = cv::Size(input_layer->width(), input_layer->height());

  /* Load labels. */
  std::ifstream labels(label_file.c_str());
  CHECK(labels) << "Unable to open labels file " << label_file;
  string line;
  while (std::getline(labels, line))
    labels_.push_back(string(line));

  Blob<float>* output_layer = net_->output_blobs()[0];
  CHECK_EQ(labels_.size(), output_layer->channels())
      << "Number of labels is different from the output layer dimension.";
}
static bool PairCompare(const std::pair<float, int>& lhs,
                        const std::pair<float, int>& rhs) {
  return lhs.first > rhs.first;
}

/* Return the indices of the top N values of vector v. */
static std::vector<int> Argmax(const std::vector<float>& v, int N) {
  std::vector<std::pair<float, int> > pairs;
  for (size_t i = 0; i < v.size(); ++i)
    pairs.push_back(std::make_pair(v[i], static_cast<int>(i)));
  std::partial_sort(pairs.begin(), pairs.begin() + N, pairs.end(), PairCompare);
  std::vector<int> result;
  for (int i = 0; i < N; ++i)
    result.push_back(pairs[i].second);
  return result;
}
/* Return the top prediction (label, probability) for each input image. */
std::vector< std::pair<int, float> > Classifier::Classify(std::vector<cv::Mat>& input_channels) {
  std::vector< std::vector<float> > output = Predict(input_channels, input_channels.size());

  std::vector< std::pair<int, float> > predictions;
  for (size_t i = 0; i < output.size(); ++i) {
    std::vector<int> maxN = Argmax(output[i], 1);
    int idx = maxN[0];
    /* Index into this image's probabilities, not the outer vector. */
    predictions.push_back(std::make_pair(std::stoi(labels_[idx]), output[i][idx]));
  }
  return predictions;
}
std::vector< std::vector<float> > Classifier::Predict(std::vector<cv::Mat>& input_channels, int num_images) {
  Blob<float>* input_layer = net_->input_blobs()[0];
  input_layer->Reshape(num_images, num_channels_,
                       input_geometry_.height, input_geometry_.width);
  /* Forward dimension change to all layers. */
  net_->Reshape();

  WrapInputLayer(input_channels, num_images);
  /* Note: the preprocessed image data must be written into the wrapped
     channel Mats (as the original example's Preprocess() does) before
     the forward pass runs. */
  net_->ForwardPrefilled();

  /* Copy each image's slice of the output layer into its own vector. */
  Blob<float>* output_layer = net_->output_blobs()[0];
  std::vector< std::vector<float> > ret;
  for (int i = 0; i < num_images; ++i) {
    const float* begin = output_layer->cpu_data() + i * output_layer->channels();
    const float* end = begin + output_layer->channels();
    ret.push_back(std::vector<float>(begin, end));
  }
  return ret;
}
/* Wrap the input layer of the network in separate cv::Mat objects
 * (one per channel per image). This way we save one memcpy operation
 * and we don't need to rely on cudaMemcpy2D. The last preprocessing
 * operation will write the separate channels directly to the input
 * layer. */
void Classifier::WrapInputLayer(std::vector<cv::Mat>& input_channels, int num_images) {
  Blob<float>* input_layer = net_->input_blobs()[0];

  int width = input_layer->width();
  int height = input_layer->height();
  float* input_data = input_layer->mutable_cpu_data();
  for (int i = 0; i < input_layer->channels() * num_images; ++i) {
    cv::Mat channel(height, width, CV_32FC1, input_data);
    /* input_channels is a reference here, so call push_back directly. */
    input_channels.push_back(channel);
    input_data += width * height;
  }
}
Blob<float>* output_layer = net_->output_blobs()[0];
const float* begin = output_layer->cpu_data();
const float* end = begin + nImages*output_layer->channels();
return std::vector<float>(begin, end);
Shouldn't I be returning a vector containing a float vector for each one of the images, something like "std::vector< std::vector<float> >"? Or have I misunderstood?
channels[j].copyTo((*input_channels)[i*num_channels_[0]+j]);
I get a compilation error stating "error: subscripted value is not an array, pointer, or vector". Have I made a mistake or has something strange happened?
Thanks so much!