I've noticed that when using residual networks on CIFAR-10, researchers choose to train a much smaller network from scratch and still get pretty good accuracy. But why don't they fine-tune a model pre-trained on a larger dataset, e.g. ImageNet, for CIFAR-10?
Maybe they have tried this, but found the accuracy no better than that of a smaller network trained from scratch.
If we train a small network on ImageNet, it won't have the capacity to fit such a big dataset, so there is no useful model to fine-tune from. But if we take a big network pre-trained on ImageNet and fine-tune it on CIFAR-10, why would it be no better than a small network? How can this be explained? Overfitting? Or do the different image sizes (32×32 vs. 224×224) mean fine-tuning won't work?