Training loss discrepancy?

Alex Ter-Sarkisov
Sep 18, 2017, 11:57:24 AM
to Caffe Users
I use a batch size of 32 during training, with average_loss = 1000 and an InfogainLoss matrix H = [2, 0; 0, 10]. While the solver runs, the loss is reported as

Iteration 32600 (2.12984 iter/s, 46.952s/100 iters), loss = 0.16175
Train net output #0: loss = 0.137508 (* 1 = 0.137508 loss)

The first number is the loss smoothed over the last 1000 iterations (average_loss); the "Train net output #0" line is the unsmoothed loss of the current batch.
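
For completeness, I generated infogain.binaryproto roughly like this (a sketch; the 1x1x2x2 reshape is the usual blob convention):

import numpy as np
import caffe

# Class-weighting matrix: penalize class-0 mistakes by 2, class-1 by 10
H = np.array([[2.0, 0.0],
              [0.0, 10.0]], dtype=np.float32)

# Serialize as a 1x1x2x2 blob, the shape InfogainLoss expects
blob = caffe.io.array_to_blobproto(H.reshape(1, 1, 2, 2))
with open('infogain.binaryproto', 'wb') as f:
    f.write(blob.SerializeToString())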

Yet when I load the saved snapshots in pycaffe and read out the loss blob, the reported loss is much higher:

import caffe

nets = range(1000, 31000, 1000)
for iters in nets:
    net_mcn = caffe.Net("train.prototxt", "network" + str(iters) + ".caffemodel", caffe.TRAIN)
    net_mcn.forward()  # run a forward pass so the 'loss' blob holds fresh data
    print iters, net_mcn.blobs['loss'].data

I got 

1000 80.195098877
2000 60.3577766418
3000 53.1454048157
4000 73.8062515259
5000 83.9467010498
6000 51.0115280151
7000 63.0284347534
8000 43.6017570496
9000 48.1773223877
10000 44.3014030457
11000 46.4531173706
12000 41.0288658142
13000 56.1040344238
14000 35.1473693848
15000 36.48487854
16000 38.3397903442
17000 44.2434501648
18000 30.4674892426
19000 42.7231369019
20000 36.1248321533
21000 42.5675506592
22000 34.4444198608
23000 38.0397491455
24000 42.3689651489
25000 50.9166297913
26000 43.2592735291
27000 36.7138328552
28000 41.9512062073
29000 44.1816444397
30000 33.7083320618
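
To rule out single-batch noise, the pycaffe number can be smoothed the same way average_loss smooths the log, along these lines (a sketch; the 100-batch count is arbitrary):

import caffe

# Average the loss over many forward passes, mimicking average_loss
net_mcn = caffe.Net("train.prototxt", "network30000.caffemodel", caffe.TRAIN)
n_batches = 100  # assumed; any reasonably large number of batches
total = 0.0
for _ in range(n_batches):
    total += float(net_mcn.forward()['loss'])
print 30000, total / n_batches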

The loss layer is defined as:

layer {
  name: "loss"
  type: "InfogainLoss"
  bottom: "score_final"
  bottom: "label"
  top: "loss"
  infogain_loss_param {
    source: "infogain.binaryproto"
    axis: 1
  }
}
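
For reference, this is the formula I assume InfogainLoss implements (a numpy sketch with made-up probabilities; the normalization may differ between Caffe versions):

import numpy as np

# Sketch of the InfogainLoss formula as I understand it:
#   E = -1/N * sum_n sum_k H[label_n, k] * log(prob_n[k])
# so with H = [2,0;0,10] every class-1 sample contributes 10x its log-loss.
def infogain_loss(prob, labels, H):
    eps = 1e-20  # guard against log(0), as the layer does internally
    losses = [-np.dot(H[l], np.log(np.maximum(p, eps)))
              for p, l in zip(prob, labels)]
    return np.mean(losses)

# Toy check: two samples, two classes
H = np.array([[2.0, 0.0], [0.0, 10.0]])
prob = np.array([[0.9, 0.1], [0.2, 0.8]])
labels = [0, 1]
print infogain_loss(prob, labels, H)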

The situation on the validation set is about the same. A gap this large can't be explained by averaging alone, so what does explain it?

 