Training loss suddenly jumps sharply, why? (Network in Network on CIFAR10)

Qiang Guo

Nov 3, 2015, 6:43:10 AM11/3/15
to Caffe Users
My experiment setup:
  • Network in Network model, prototxt files provided by the author.
  • Preprocessed CIFAR10 data also provided by the author.
Machine:
  • Ubuntu 14.04 64bit
  • GTX 780
The model and the data can be downloaded at https://gist.github.com/mavenlin/e56253735ef32c3c296d

My modifications:
  1. With the original learning rate of 0.1, the network doesn't learn at all; the test accuracy stays at random guess (0.1).
    So I lowered the initial learning rate to 0.01.
  2. Updated the prototxt files to the current Caffe format, e.g. layers to layer and the layer type from enum to string, as sketched below.
    I don't think this affects the performance at all.
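For concreteness, the edits look roughly like this (just a sketch on a generic convolution layer; the actual NIN prototxt has more fields):

  # old style, as in the author's prototxt
  layers {
    name: "conv1"
    type: CONVOLUTION
    bottom: "data"
    top: "conv1"
  }

  # current Caffe style
  layer {
    name: "conv1"
    type: "Convolution"
    bottom: "data"
    top: "conv1"
  }

And in the solver prototxt, the only change is the lowered initial learning rate:

  base_lr: 0.01   # the author's solver uses 0.1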
If everything goes as expected, the net should reach 89.6% accuracy on the validation set.
However, my run only reached 72% and then hit a sudden sharp jump in the training loss.
After that, the accuracy on the validation set dropped to random guess (0.1 = 1/10) and never recovered.


Why did this weird situation happen?

Has anyone reproduced the result reported in the NIN paper?

Qiang Guo

Nov 3, 2015, 8:50:32 PM11/3/15
to Caffe Users
I reran the experiment, and the problem didn't happen again.

It seems it was just an occasional issue.
I still don't know whether it was the parameter initialization or GPU memory that caused it.
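If it happens again, one thing I may try to rule out the initialization side is fixing the random seed in the solver prototxt so that reruns start from identical weights (a sketch; random_seed is a standard Caffe solver field):

  # solver.prototxt: makes the weight fillers (and dropout) repeatable across runs
  random_seed: 1701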

I'm just reporting this in case someone else encounters the same problem.

daviddewhurst6

Feb 23, 2018, 9:17:39 PM2/23/18
to Caffe Users
I am currently having similar problems, possibly even more dramatic ones. During training, my validation error will suddenly jump by an order of magnitude (e.g., 0.65 to 6.5) while my training error will either continue to improve or remain static. I am quite sure of my data integrity and of my network configuration (you can check it out at https://github.com/daviddewhurst/market_prediction/blob/dev/src/classifiers/equity_classifiers.py ).

Here's the weird part: none of these issues happen when training on the CPU of my laptop. When I move to the desktop with a GPU (Nvidia 680), all hell breaks loose.

Other relevant information: I'm using conda as a package manager; TensorFlow and Keras were installed via conda install <package-name>.
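In case it's useful for comparing notes, one sanity check is to build and train the same model with everything pinned to the CPU on the desktop machine, so the GPU path can be isolated from the rest of the environment (a rough sketch assuming the TF backend; the layers here are placeholders, not my actual network):

import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense

# Pin every op to the CPU of the GPU machine; if the validation-error jumps
# disappear here too, the problem is specific to the GPU code path rather
# than to the data pipeline or the conda environment.
with tf.device('/cpu:0'):
    model = Sequential()
    model.add(Dense(64, activation='relu', input_shape=(100,)))  # placeholder layers
    model.add(Dense(10, activation='softmax'))
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    # model.fit(...) should also run inside this device scope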