Hey Stephanie,
The difference in metrics is not large, but I would expect that we can bring it to zero, since I trained and evaluated the BVLC AlexNet myself. That said, small changes in data preparation or computation can compound.
- What kind of interpolation did you use to make the 256x256 images? I would have to go back into some (years-old) logs to be sure, but I believe I picked bilinear interpolation.
- What backend are you running, cuDNN or vanilla Caffe? You could try doing inference with the vanilla Caffe backend by setting `engine: CAFFE` in the proto definition, or by compiling Caffe without cuDNN.
I would suspect a difference in the data more than a difference in the computation, since cuDNN was initially developed to pass all the Caffe unit tests, but one never knows.
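In case it is useful, here is a rough sketch of what the engine override could look like for the first convolution layer. The layer name and parameters are what I would expect in the AlexNet deploy prototxt, so treat them as assumptions and check them against your own file:

```
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    engine: CAFFE  # use the native Caffe implementation instead of cuDNN
  }
}
```

The other layer types that have a cuDNN implementation (pooling, ReLU, LRN, softmax) should take an `engine` field in their own `*_param` blocks, so you would need to set it in each of them to rule out cuDNN entirely.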
Hope that helps,