There is something about classification that really confuses me.
Usually, we wrap the input layer, copy the data into `net->input_blobs()[0]->mutable_cpu_data()`, and then call `net->Forward()` to get the output.
Is this copy necessary? It seems to take a lot of time, especially in applications like object detection, where hundreds of thousands of sub-windows have to be evaluated.
Is there any way to avoid copying the data?