I have created a CNN model in Caffe with a single convolutional layer followed by two fully connected layers. Each image in my dataset has dimensions 20*5*5 (c*h*w). I trained and deployed the model in both GPU and CPU modes, but GPU mode takes longer than CPU mode for both training and prediction.
I used batching to make full use of GPU memory and to minimize memory-transfer overhead, but the results are still the same. My prediction dataset consists of 6 million images.
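To give a sense of scale, here is a rough sketch of the data volume implied by those numbers. The 4-byte float32 storage is an assumption on my part for illustration; the image shape and image count are the ones stated above.

```python
# Rough data-volume estimate for the dataset described above.
# Assumption: each value is stored as a 4-byte float32.
CHANNELS, HEIGHT, WIDTH = 20, 5, 5   # c*h*w from the post
NUM_IMAGES = 6_000_000               # prediction set size from the post
BYTES_PER_VALUE = 4                  # float32 (assumed)

values_per_image = CHANNELS * HEIGHT * WIDTH          # 500 values
bytes_per_image = values_per_image * BYTES_PER_VALUE  # 2000 bytes
total_gb = NUM_IMAGES * bytes_per_image / 1e9         # 12.0 GB

print(f"{values_per_image} values per image")
print(f"{bytes_per_image} bytes per image")
print(f"{total_gb:.1f} GB for the full prediction set")
```

So each image is only about 2 KB, while the full prediction set is roughly 12 GB under that assumption.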
What could be the reason GPU mode is not showing a speedup?
Thank you!