PSPNet training stuck at loss 87.3365 for 5K iterations


Bhadresh Dhanani

Oct 11, 2017, 4:07:52 PM
to Caffe Users
Hi,

I have been trying to train PSPNet on my current dataset, which has an image size of 360x480 and a total of 3 classes.

The training is at almost 6000 iterations and the loss has been stuck at the same value, 87.3365, since the first iteration.
Here is the log preview:
 
I1011 14:57:57.942999 65978 solver.cpp:229] Iteration 5980, loss = 87.3365
I1011 14:57:57.943485 65978 solver.cpp:245]     Train net output #0: accuracy = 0
I1011 14:57:57.943513 65978 solver.cpp:245]     Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I1011 14:57:57.943532 65978 solver.cpp:245]     Train net output #2: per_class_accuracy = 0
I1011 14:57:57.943541 65978 solver.cpp:245]     Train net output #3: per_class_accuracy = 0
I1011 14:57:57.943549 65978 solver.cpp:245]     Train net output #4: per_class_accuracy = 0
I1011 14:57:58.368386 65978 sgd_solver.cpp:106] Iteration 5980, lr = 1e-05
I1011 14:58:41.068958 65978 solver.cpp:229] Iteration 6000, loss = 87.3365
I1011 14:58:41.069432 65978 solver.cpp:245]     Train net output #0: accuracy = 0.00233507
I1011 14:58:41.069474 65978 solver.cpp:245]     Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I1011 14:58:41.069483 65978 solver.cpp:245]     Train net output #2: per_class_accuracy = 0
I1011 14:58:41.069489 65978 solver.cpp:245]     Train net output #3: per_class_accuracy = 0
I1011 14:58:41.069495 65978 solver.cpp:245]     Train net output #4: per_class_accuracy = 1
I1011 14:58:41.477138 65978 sgd_solver.cpp:106] Iteration 6000, lr = 1e-05
I1011 14:59:24.579336 65978 solver.cpp:229] Iteration 6020, loss = 87.3365
I1011 14:59:24.579944 65978 solver.cpp:245]     Train net output #0: accuracy = 0
I1011 14:59:24.579963 65978 solver.cpp:245]     Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I1011 14:59:24.579972 65978 solver.cpp:245]     Train net output #2: per_class_accuracy = 0
I1011 14:59:24.579994 65978 solver.cpp:245]     Train net output #3: per_class_accuracy = 0
I1011 14:59:24.580011 65978 solver.cpp:245]     Train net output #4: per_class_accuracy = 0
I1011 14:59:25.007410 65978 sgd_solver.cpp:106] Iteration 6020, lr = 1e-05
I1011 15:00:08.009951 65978 solver.cpp:229] Iteration 6040, loss = 87.3365
I1011 15:00:08.020134 65978 solver.cpp:245]     Train net output #0: accuracy = 8.3912e-05
I1011 15:00:08.020160 65978 solver.cpp:245]     Train net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I1011 15:00:08.020177 65978 solver.cpp:245]     Train net output #2: per_class_accuracy = 0
I1011 15:00:08.020195 65978 solver.cpp:245]     Train net output #3: per_class_accuracy = 0
I1011 15:00:08.020200 65978 solver.cpp:245]     Train net output #4: per_class_accuracy = 1
I1011 15:00:08.419051 65978 sgd_solver.cpp:106] Iteration 6040, lr = 1e-05

Here is my solver parameters:

 net: "<path>" # Change this to the absolute path to your model file
base_lr: 0.0001
lr_policy: "step"
gamma: 0.1
stepsize: 15000
display: 20
momentum: 0.9
max_iter: 45000
weight_decay: 0.0005
snapshot: 1000
snapshot_prefix: "<path>" # Change this to the absolute path to where you wish to output solver snapshots
solver_mode: GPU

I tried reducing the learning rate, but the weird thing is that the loss stays stuck at the same value. I also tried a different dataset, with no luck.
I have attached my train_val.prototxt here in case you spot a mistake.

Any help would be appreciated.

Bhadresh
Attachment: train_val.txt

Przemek D

Oct 12, 2017, 3:34:54 AM
to Caffe Users
87.3365 is a kind of magic number: it equals -log(FLT_MIN), and its appearance means that something is broken. I can't give you any more details, except that I've seen it several times, usually related to a too-high learning rate or bad weight initialization. In my experience, networks never recover from this state, so once this value appears you can abort the training and debug your solver/net.
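
You can check where the constant comes from in Python, for example:

import math

FLT_MIN = 1.17549435e-38      # smallest normalized single-precision float
print(-math.log(FLT_MIN))     # ~87.3365 -- the value the loss saturates at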

Bhadresh Dhanani

Oct 12, 2017, 1:11:53 PM
to Caffe Users
Przemek D,

Yeah, you are right. Other people have run into this issue as well. I followed some threads, and one suggested solution was to start from a pre-trained model. I did that, and now I no longer see that magic loss value.

But now the issue is with the per-class accuracy.
I1012 11:21:22.764320 103737 solver.cpp:229] Iteration 720, loss = 0.00591884
I1012 11:21:22.764439 103737 solver.cpp:245] Train net output #0: accuracy = 0.998973
I1012 11:21:22.764449 103737 solver.cpp:245] Train net output #1: loss = 0.00591873 (* 1 = 0.00591873 loss)
I1012 11:21:22.764456 103737 solver.cpp:245] Train net output #2: per_class_accuracy = 1
I1012 11:21:22.764459 103737 solver.cpp:245] Train net output #3: per_class_accuracy = 0
I1012 11:21:22.764463 103737 solver.cpp:245] Train net output #4: per_class_accuracy = 0
I1012 11:21:23.167104 103737 sgd_solver.cpp:106] Iteration 720, lr = 0.0001
I1012 11:22:05.639000 103737 solver.cpp:229] Iteration 740, loss = 0.0227261
I1012 11:22:05.639124 103737 solver.cpp:245] Train net output #0: accuracy = 0.996832
I1012 11:22:05.639134 103737 solver.cpp:245] Train net output #1: loss = 0.022726 (* 1 = 0.022726 loss)
I1012 11:22:05.639139 103737 solver.cpp:245] Train net output #2: per_class_accuracy = 1
I1012 11:22:05.639143 103737 solver.cpp:245] Train net output #3: per_class_accuracy = 0
I1012 11:22:05.639147 103737 solver.cpp:245] Train net output #4: per_class_accuracy = 0
I1012 11:22:06.039103 103737 sgd_solver.cpp:106] Iteration 740, lr = 0.0001
I1012 11:22:48.640102 103737 solver.cpp:229] Iteration 760, loss = 0.0143614
I1012 11:22:48.640224 103737 solver.cpp:245] Train net output #0: accuracy = 0.997564
I1012 11:22:48.640234 103737 solver.cpp:245] Train net output #1: loss = 0.0143613 (* 1 = 0.0143613 loss)
I1012 11:22:48.640239 103737 solver.cpp:245] Train net output #2: per_class_accuracy = 1
I1012 11:22:48.640244 103737 solver.cpp:245] Train net output #3: per_class_accuracy = 0
I1012 11:22:48.640247 103737 solver.cpp:245] Train net output #4: per_class_accuracy = 0
I1012 11:22:49.014072 103737 sgd_solver.cpp:106] Iteration 760, lr = 0.0001
I1012 11:23:31.601939 103737 solver.cpp:229] Iteration 780, loss = 0.0409522
I1012 11:23:31.602075 103737 solver.cpp:245] Train net output #0: accuracy = 0.991696
I1012 11:23:31.602087 103737 solver.cpp:245] Train net output #1: loss = 0.0409521 (* 1 = 0.0409521 loss)
I1012 11:23:31.602092 103737 solver.cpp:245] Train net output #2: per_class_accuracy = 1
I1012 11:23:31.602095 103737 solver.cpp:245] Train net output #3: per_class_accuracy = 0
I1012 11:23:31.602099 103737 solver.cpp:245] Train net output #4: per_class_accuracy = 0
I1012 11:23:31.926669 103737 sgd_solver.cpp:106] Iteration 780, lr = 0.0001
I1012 11:24:14.647243 103737 solver.cpp:229] Iteration 800, loss = 0.0438639
I1012 11:24:14.647406 103737 solver.cpp:245] Train net output #0: accuracy = 0.989936
I1012 11:24:14.647439 103737 solver.cpp:245] Train net output #1: loss = 0.0438638 (* 1 = 0.0438638 loss)
I1012 11:24:14.647457 103737 solver.cpp:245] Train net output #2: per_class_accuracy = 1
I1012 11:24:14.647465 103737 solver.cpp:245] Train net output #3: per_class_accuracy = 0
I1012 11:24:14.647476 103737 solver.cpp:245] Train net output #4: per_class_accuracy = 0
I1012 11:24:15.036211 103737 sgd_solver.cpp:106] Iteration 800, lr = 0.0001
I1012 11:24:57.684042 103737 solver.cpp:229] Iteration 820, loss = 0.0286792
I1012 11:24:57.684188 103737 solver.cpp:245] Train net output #0: accuracy = 0.995645
I1012 11:24:57.684209 103737 solver.cpp:245] Train net output #1: loss = 0.0286791 (* 1 = 0.0286791 loss)
I1012 11:24:57.684213 103737 solver.cpp:245] Train net output #2: per_class_accuracy = 1
I1012 11:24:57.684217 103737 solver.cpp:245] Train net output #3: per_class_accuracy = 0
I1012 11:24:57.684221 103737 solver.cpp:245] Train net output #4: per_class_accuracy = 0
I1012 11:24:58.083516 103737 sgd_solver.cpp:106] Iteration 820, lr = 0.0001
I1012 11:25:40.446902 103737 solver.cpp:229] Iteration 840, loss = 0.0117887
I1012 11:25:40.447089 103737 solver.cpp:245] Train net output #0: accuracy = 0.998924
I1012 11:25:40.447103 103737 solver.cpp:245] Train net output #1: loss = 0.0117886 (* 1 = 0.0117886 loss)
I1012 11:25:40.447121 103737 solver.cpp:245] Train net output #2: per_class_accuracy = 1
I1012 11:25:40.447127 103737 solver.cpp:245] Train net output #3: per_class_accuracy = 0
I1012 11:25:40.447139 103737 solver.cpp:245] Train net output #4: per_class_accuracy = 0
I1012 11:25:40.865295 103737 sgd_solver.cpp:106] Iteration 840, lr = 0.0001
I1012 11:26:23.230521 103737 solver.cpp:229] Iteration 860, loss = 0.0343943
I1012 11:26:23.230664 103737 solver.cpp:245] Train net output #0: accuracy = 0.993892
I1012 11:26:23.230684 103737 solver.cpp:245] Train net output #1: loss = 0.0343942 (* 1 = 0.0343942 loss)
I1012 11:26:23.230690 103737 solver.cpp:245] Train net output #2: per_class_accuracy = 1
I1012 11:26:23.230703 103737 solver.cpp:245] Train net output #3: per_class_accuracy = 0 
I1012 11:26:23.230707 103737 solver.cpp:245] Train net output #4: per_class_accuracy = 0 

The loss fluctuates a little, but the per-class accuracy stays exactly the same for every iteration.
When I worked with SegNet FCN, I used the class_weighting option of the loss layer, which takes per-class weights into account when computing the loss.
Here it throws an error saying that "caffe.LossParameter" has no field named "class_weighting", which makes sense, since that option is not implemented in the PSPNet fork of Caffe.

So the question is: how can I get stable per-class accuracy in my training?

Class 1 is the dominant class (about 90% of the pixels belong to it), while classes 2 and 3 are minority classes (the remaining ~10% of pixels belong to one of those two).
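
For reference, here is a rough sketch of how such per-class weights could be computed outside Caffe (median-frequency balancing; this assumes grayscale label PNGs in a hypothetical labels/ directory whose pixel values are the class indices 0-2):

import glob
import numpy as np
from PIL import Image

NUM_CLASSES = 3
counts = np.zeros(NUM_CLASSES, dtype=np.float64)
for path in glob.glob("labels/*.png"):          # hypothetical label directory
    label = np.array(Image.open(path))
    for c in range(NUM_CLASSES):
        counts[c] += np.sum(label == c)

freqs = counts / counts.sum()
weights = np.median(freqs) / freqs              # weight_c = median_freq / freq_c
print(weights)                                  # the dominant class gets a weight below 1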

Thank you,
Bhadresh

Przemek D

Oct 13, 2017, 6:56:55 AM
to Caffe Users
You might be interested in using InfogainLoss if the issue is class imbalance. I do not know the implementation details, but the answer I linked to contains some useful tips and links to further reading.
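
As a rough sketch (assuming 3 classes and per-class weights chosen beforehand, e.g. with the median-frequency computation above), the H matrix that InfogainLoss expects can be built and saved to a binaryproto like this:

import numpy as np
import caffe

weights = np.array([0.1, 1.0, 1.0], dtype=np.float32)   # hypothetical per-class weights
H = np.diag(weights).reshape((1, 1, 3, 3))               # InfogainLoss expects a 1x1xKxK blob

blob = caffe.io.array_to_blobproto(H)
with open("infogain.binaryproto", "wb") as f:
    f.write(blob.SerializeToString())

The loss layer in train_val.prototxt would then reference this file via infogain_loss_param { source: "infogain.binaryproto" }. Depending on the Caffe version, you may also need an explicit Softmax layer in front of InfogainLoss, since older versions of the layer do not apply softmax internally.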


On Wednesday, October 11, 2017 at 22:07:52 UTC+2, Bhadresh Dhanani wrote:

mka...@caidesystems.com

Jan 26, 2018, 3:57:37 PM
to Caffe Users
Were you able to figure out why the per_class_accuracy isn't changing?