I've performed transfer learning with GoogLeNet to differentiate between the various vehicle makes and models in the Stanford Cars Dataset (
http://ai.stanford.edu/~jkrause/cars/car_dataset.html). This works surprisingly well, and I would now like to use the model to detect objects in larger images. The approach I've tried so far is to randomly extract and evaluate 2,000 proposal boxes, keeping only the boxes with the strongest class probabilities. The problem with this approach is that the CNN classifies blank or noisy crops with artificially high confidence. This problem is discussed here:
https://www.youtube.com/watch?v=M2IebCN9Ht4.
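To make the setup concrete, here is a minimal sketch of the random-proposal pipeline I described. It assumes nothing about my actual framework: `dummy_classifier` is a stand-in for the fine-tuned GoogLeNet (it just returns random softmax scores over the 196 Stanford Cars classes), and the box-sampling parameters are illustrative, not the ones I used.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES = 196  # Stanford Cars has 196 make/model classes


def random_boxes(img_h, img_w, n=2000, min_size=64):
    """Sample n random proposal boxes as (x, y, w, h) fully inside the image."""
    ws = rng.integers(min_size, img_w + 1, size=n)
    hs = rng.integers(min_size, img_h + 1, size=n)
    xs = rng.integers(0, img_w - ws + 1)  # per-box upper bound keeps x + w <= img_w
    ys = rng.integers(0, img_h - hs + 1)
    return np.stack([xs, ys, ws, hs], axis=1)


def dummy_classifier(crop):
    """Placeholder for the fine-tuned CNN: returns a softmax over the classes.

    A real model would resize `crop` to the network's input size and run a
    forward pass; random logits are used here only to keep the sketch runnable.
    """
    logits = rng.normal(size=NUM_CLASSES)
    e = np.exp(logits - logits.max())
    return e / e.sum()


# Stand-in for a larger scene image (e.g. a street photo containing cars).
img = np.zeros((480, 640, 3), dtype=np.uint8)

boxes = random_boxes(480, 640)
# Score each proposal by its strongest class probability -- this is exactly
# the step where blank/noisy crops can still come back with high confidence.
scores = np.array([dummy_classifier(img[y:y + h, x:x + w]).max()
                   for x, y, w, h in boxes])
top_boxes = boxes[np.argsort(scores)[::-1][:5]]  # keep the 5 strongest boxes
```

The failure mode shows up in the `scores.max()` step: because softmax always sums to 1, a crop containing no car at all still produces a "most likely car class," often with high probability.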
Any thoughts on a way forward? Now that I have a classification model, how can I operationalize it for object detection? Should I focus on training a separate model for bounding box estimation? All thoughts and pointers welcome.