Hi,
I have some experience with this kind of problem.
1. You mean getting object proposals from an algorithm and then using them for classification. In my tests this does not work well, mainly because small objects are not detected by the object proposal algorithm. In fact, Andrej Karpathy's analysis of the ImageNet 2014 detection results shows that the models have very big problems with small objects, and they are using object proposal algorithms.
2. I think it is much better to classify responses over the whole image. This needs to be done at multiple scales, and you need a lot of training data. I am thinking of exactly something like the OverFeat paper. You can also try this modification of OverFeat:
https://github.com/Russell91/ReInspect
Or you can create your own method.
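
To make point 2 more concrete, here is a rough sketch of the multi-scale idea: resize the image to several scales, slide a fixed-size window over each one, and score every window with a classifier. This is not the OverFeat or ReInspect code. The names resize_nearest, toy_score and multiscale_detect are made up for the illustration, and toy_score (mean brightness) is only a stand-in for a real trained network.

```python
def resize_nearest(image, scale):
    """Nearest-neighbour resize of a 2-D list-of-lists image."""
    h, w = len(image), len(image[0])
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    rows = [min(int(i / scale), h - 1) for i in range(new_h)]
    cols = [min(int(j / scale), w - 1) for j in range(new_w)]
    return [[image[r][c] for c in cols] for r in rows]


def toy_score(window):
    """Hypothetical stand-in for a trained classifier: mean pixel value."""
    total = sum(sum(row) for row in window)
    return total / (len(window) * len(window[0]))


def multiscale_detect(image, scales=(0.5, 1.0, 2.0), win=8, stride=4,
                      thresh=0.9, score_fn=toy_score):
    """Slide a win x win window over several rescaled copies of the image.

    Returns (x, y, width, height, score, scale) tuples, with the box
    mapped back to the coordinates of the original image.
    """
    detections = []
    for s in scales:
        scaled = resize_nearest(image, s)
        h, w = len(scaled), len(scaled[0])
        for y in range(0, h - win + 1, stride):
            for x in range(0, w - win + 1, stride):
                window = [row[x:x + win] for row in scaled[y:y + win]]
                score = score_fn(window)
                if score >= thresh:
                    detections.append((x / s, y / s, win / s, win / s, score, s))
    return detections


# Toy example: a 32x32 black image with one small 4x4 bright object.
img = [[0.0] * 32 for _ in range(32)]
for r in range(12, 16):
    for c in range(12, 16):
        img[r][c] = 1.0

dets = multiscale_detect(img)
print(dets)  # [(12.0, 12.0, 4.0, 4.0, 1.0, 2.0)]
```

In this toy case the 4x4 object is too small to fill the 8x8 window at the original scale, so it is only found in the 2x upscaled copy. That is the reason the scan has to run at several scales. With a real network you would also need non-maximum suppression to merge overlapping windows, which is left out here.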