MobileNet-SSD training: loss is always 0


danie...@gmail.com

Jul 31, 2018, 12:00:12 AM
to Caffe Users
I'm using MobileNet-SSD to train on the Caltech dataset.
Since I only want to detect pedestrians, I kept only the annotation files that contain the pedestrian tag and removed all other tags. I also renamed the tag from pedestrian to person.

So, for now, my Caltech dataset has only two classes: 0 is background and 1 is person.
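
For reference, the filtering step was roughly like this (just a sketch, not the exact script; it assumes the Caltech annotations were already converted to VOC-style XML, and the folder name is only an example):

import os
import xml.etree.ElementTree as ET

ANNO_DIR = "Annotations"  # example path, adjust to the real annotation folder

for fname in os.listdir(ANNO_DIR):
    if not fname.endswith(".xml"):
        continue
    path = os.path.join(ANNO_DIR, fname)
    tree = ET.parse(path)
    root = tree.getroot()
    for obj in root.findall("object"):
        name = obj.find("name")
        if name.text == "pedestrian":
            name.text = "person"   # rename pedestrian -> person
        if name.text != "person":
            root.remove(obj)       # drop every non-person object
    if root.find("object") is None:
        os.remove(path)            # skip images with no person at all
    else:
        tree.write(path)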

I followed the instructions for training MobileNet-SSD on a custom dataset: I converted the dataset to LMDB format, linked it into the MobileNet-SSD path (~/caffe/examples/MobileNet-SSD/), and ran the train.sh script to start training.
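
One way to sanity-check whether the generated LMDB actually contains box annotations (a rough sketch; it assumes the SSD fork's compiled caffe_pb2 and the lmdb Python package, and the LMDB name is only an example):

import lmdb
from caffe.proto import caffe_pb2  # AnnotatedDatum is only in the SSD fork's proto

env = lmdb.open("trainval_lmdb", readonly=True)  # example LMDB path
with env.begin() as txn:
    for key, value in txn.cursor():
        anno_datum = caffe_pb2.AnnotatedDatum()
        anno_datum.ParseFromString(value)
        n_boxes = sum(len(g.annotation) for g in anno_datum.annotation_group)
        print(key, "boxes:", n_boxes)
        break  # the first record is enough for a quick check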



But the log shows that the loss stays at 0 from the very beginning:

I0731 11:35:12.808466 23599 net.cpp:761] Ignoring source layer conv17_2_mbox_conf
I0731 11:35:12.819702 23599 upgrade_proto.cpp:77] Attempting to upgrade batch norm layers using deprecated params: mobilenet_iter_73000.caffemodel
I0731 11:35:12.819738 23599 upgrade_proto.cpp:80] Successfully upgraded batch norm layers using deprecated params.
I0731 11:35:12.824757 23599 net.cpp:761] Ignoring source layer conv11_mbox_conf
I0731 11:35:12.824801 23599 net.cpp:761] Ignoring source layer conv13_mbox_conf
I0731 11:35:12.824826 23599 net.cpp:761] Ignoring source layer conv14_2_mbox_conf
I0731 11:35:12.824844 23599 net.cpp:761] Ignoring source layer conv15_2_mbox_conf
I0731 11:35:12.824862 23599 net.cpp:761] Ignoring source layer conv16_2_mbox_conf
I0731 11:35:12.824873 23599 net.cpp:761] Ignoring source layer conv17_2_mbox_conf
I0731 11:35:12.824882 23599 net.cpp:761] Ignoring source layer mbox_loss
I0731 11:35:12.825114 23599 caffe.cpp:251] Starting Optimization
I0731 11:35:12.825124 23599 solver.cpp:294] Solving MobileNet-SSD
I0731 11:35:12.825129 23599 solver.cpp:295] Learning Rate Policy: multistep
I0731 11:35:13.192876 23599 solver.cpp:243] Iteration 0, loss = 0
I0731 11:35:13.192909 23599 solver.cpp:259]     Train net output #0: mbox_loss = 0 (* 1 = 0 loss)
I0731 11:35:13.192945 23599 sgd_solver.cpp:138] Iteration 0, lr = 0.0001
I0731 11:35:13.211261 23599 blocking_queue.cpp:50] Data layer prefetch queue empty
I0731 11:35:28.226104 23599 solver.cpp:243] Iteration 10, loss = 0
I0731 11:35:28.226153 23599 solver.cpp:259]     Train net output #0: mbox_loss = 0 (* 1 = 0 loss)
I0731 11:35:28.226164 23599 sgd_solver.cpp:138] Iteration 10, lr = 0.0001
I0731 11:35:44.267797 23599 solver.cpp:243] Iteration 20, loss = 0
I0731 11:35:44.267966 23599 solver.cpp:259]     Train net output #0: mbox_loss = 0 (* 1 = 0 loss)
I0731 11:35:44.267980 23599 sgd_solver.cpp:138] Iteration 20, lr = 0.0001
^CI0731 11:35:46.055897 23599 solver.cpp:596] Snapshotting to binary proto file snapshot/mobilenet_iter_22.caffemodel
I0731 11:35:46.158947 23599 sgd_solver.cpp:307] Snapshotting solver state to binary proto file snapshot/mobilenet_iter_22.solverstate
I0731 11:35:46.196113 23599 solver.cpp:316] Optimization stopped early.
I0731 11:35:46.196141 23599 caffe.cpp:254] Optimization Done.



What parameters should I change to make the training process work?

By the way, I tried the VOC dataset and the program ran correctly, showing normal loss values.


danie...@gmail.com

Aug 7, 2018, 10:29:14 PM
to Caffe Users
I solved this bug.

It turns out the bounding-box coordinates in the annotation XML files cannot be floating-point numbers. The Caltech dataset I downloaded has coordinates like 422.478320848024820, which can't even be represented exactly as a float.
After I converted all of those coordinates to integers, the loss showed up.
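
The fix was essentially a pass like this over the annotation XML files (a sketch; the folder name is only an example):

import os
import xml.etree.ElementTree as ET

ANNO_DIR = "Annotations"  # example path

for fname in os.listdir(ANNO_DIR):
    if not fname.endswith(".xml"):
        continue
    path = os.path.join(ANNO_DIR, fname)
    tree = ET.parse(path)
    for bbox in tree.getroot().iter("bndbox"):
        for tag in ("xmin", "ymin", "xmax", "ymax"):
            node = bbox.find(tag)
            # e.g. "422.478320848024820" -> "422"
            node.text = str(int(round(float(node.text))))
    tree.write(path)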



On Tuesday, July 31, 2018 at 12:00:12 PM UTC+8, danie...@gmail.com wrote: