Thanks, I appreciate your thoughts. Do you have a link for the 7.9% - 7.0% numbers? I believe you, just haven't seen them myself.
Does the size of VGG mean that 4GB+ GPUs are required? I've been able to get by with smaller batch sizes / GPUs for GoogLeNet.
It's true that the multiple losses (1 primary classifier, 2 auxiliary classifiers) threw me for a loop when I first attempted to fine-tune GoogLeNet. I tried fine-tuning from the ILSVRC weights in 2 ways: removing the 2 auxiliary classifiers entirely, and leaving them in but decreasing the learning rate of everything except the final classifier's FC layer. Both methods converged nicely and gave good classification results.
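For reference, the "lower the learning rate everywhere but the final classifier" variant is usually done in Caffe by renaming the last InnerProduct layer (so the ILSVRC weights aren't copied into it) and giving it larger lr_mult values. A sketch assuming the BVLC GoogLeNet layer names; the `num_output` and the multipliers here are illustrative, not what I actually used:

```
# Hypothetical fragment of train_val.prototxt
layer {
  name: "loss3/classifier_ft"            # renamed so pretrained weights are not copied in
  type: "InnerProduct"
  bottom: "pool5/7x7_s1"
  top: "loss3/classifier_ft"
  param { lr_mult: 10  decay_mult: 1 }   # new weights learn faster than the frozen-ish base
  param { lr_mult: 20  decay_mult: 0 }   # biases faster still (common Caffe convention)
  inner_product_param { num_output: 2 }  # your number of classes
}
```

The earlier layers keep their pretrained names (so weights are copied) but get small lr_mult values, or lr_mult: 0 to freeze them outright.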
My application needs to do detection across very large (gigapixel) images, using a sliding window approach. So, number of computations / execution speed is pretty important for my use case.
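In case it helps frame the cost discussion, a minimal sketch of what the window enumeration looks like (pure Python; `win`, `stride`, and the batching scheme are placeholder choices, not anything from this thread):

```python
# Sketch: enumerate sliding-window crops over a large image so each crop
# can be scored by the trained classifier. Window size and stride below
# are made-up parameters -- adapt them to your model's input size.

def sliding_windows(width, height, win, stride):
    """Yield (x, y, w, h) boxes covering the image left-to-right, top-to-bottom."""
    for y in range(0, height - win + 1, stride):
        for x in range(0, width - win + 1, stride):
            yield (x, y, win, win)

def batched(windows, batch_size):
    """Group windows into fixed-size batches, one forward pass per batch,
    so the GPU stays busy on gigapixel inputs."""
    batch = []
    for w in windows:
        batch.append(w)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```

The window count grows quadratically with image side length at fixed stride, which is why per-window cost dominates here; a fully-convolutional reformulation of the classifier can score many overlapping windows in a single forward pass and helps a lot at this scale.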
Cheers