MobileNet-SSD training: loss is always 0


danie...@gmail.com

Jul 31, 2018, 12:00:12 AM
to Caffe Users
I'm using MobileNet-SSD to train on the Caltech dataset.
Since I only want to detect pedestrians, I kept only the annotation files that contain the pedestrian tag and removed all other tags. I also renamed the tag from pedestrian to person.

So, for now, my Caltech dataset has only two classes: 0 is background and 1 is person.
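
For reference, the filtering step was roughly like this (just a sketch, not the exact script; it assumes the Caltech annotations were already converted to VOC-style XML, and the folder name is only an example):

import os
import xml.etree.ElementTree as ET

ANNO_DIR = "Annotations"  # example path, adjust to the real annotation folder

for fname in os.listdir(ANNO_DIR):
    if not fname.endswith(".xml"):
        continue
    path = os.path.join(ANNO_DIR, fname)
    tree = ET.parse(path)
    root = tree.getroot()
    for obj in root.findall("object"):
        name = obj.find("name")
        if name.text == "pedestrian":
            name.text = "person"   # rename pedestrian -> person
        if name.text != "person":
            root.remove(obj)       # drop every non-person object
    if root.find("object") is None:
        os.remove(path)            # skip images with no person at all
    else:
        tree.write(path)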

I followed the instructions for training MobileNet-SSD on a custom dataset: I converted the dataset to LMDB format, linked it into the MobileNet-SSD path (~/caffe/examples/MobileNet-SSD/), and ran the train.sh script to start training.
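
One way to sanity-check whether the generated LMDB actually contains box annotations (a rough sketch; it assumes the SSD fork's compiled caffe_pb2 and the lmdb Python package, and the LMDB name is only an example):

import lmdb
from caffe.proto import caffe_pb2  # AnnotatedDatum is only in the SSD fork's proto

env = lmdb.open("trainval_lmdb", readonly=True)  # example LMDB path
with env.begin() as txn:
    for key, value in txn.cursor():
        anno_datum = caffe_pb2.AnnotatedDatum()
        anno_datum.ParseFromString(value)
        n_boxes = sum(len(g.annotation) for g in anno_datum.annotation_group)
        print(key, "boxes:", n_boxes)
        break  # the first record is enough for a quick check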



But the log shows that the loss stays at 0 from the very beginning:

I0731 11:35:12.808466 23599 net.cpp:761] Ignoring source layer conv17_2_mbox_conf
I0731 11:35:12.819702 23599 upgrade_proto.cpp:77] Attempting to upgrade batch norm layers using deprecated params: mobilenet_iter_73000.caffemodel
I0731 11:35:12.819738 23599 upgrade_proto.cpp:80] Successfully upgraded batch norm layers using deprecated params.
I0731 11:35:12.824757 23599 net.cpp:761] Ignoring source layer conv11_mbox_conf
I0731 11:35:12.824801 23599 net.cpp:761] Ignoring source layer conv13_mbox_conf
I0731 11:35:12.824826 23599 net.cpp:761] Ignoring source layer conv14_2_mbox_conf
I0731 11:35:12.824844 23599 net.cpp:761] Ignoring source layer conv15_2_mbox_conf
I0731 11:35:12.824862 23599 net.cpp:761] Ignoring source layer conv16_2_mbox_conf
I0731 11:35:12.824873 23599 net.cpp:761] Ignoring source layer conv17_2_mbox_conf
I0731 11:35:12.824882 23599 net.cpp:761] Ignoring source layer mbox_loss
I0731 11:35:12.825114 23599 caffe.cpp:251] Starting Optimization
I0731 11:35:12.825124 23599 solver.cpp:294] Solving MobileNet-SSD
I0731 11:35:12.825129 23599 solver.cpp:295] Learning Rate Policy: multistep
I0731 11:35:13.192876 23599 solver.cpp:243] Iteration 0, loss = 0
I0731 11:35:13.192909 23599 solver.cpp:259]     Train net output #0: mbox_loss = 0 (* 1 = 0 loss)
I0731 11:35:13.192945 23599 sgd_solver.cpp:138] Iteration 0, lr = 0.0001
I0731 11:35:13.211261 23599 blocking_queue.cpp:50] Data layer prefetch queue empty
I0731 11:35:28.226104 23599 solver.cpp:243] Iteration 10, loss = 0
I0731 11:35:28.226153 23599 solver.cpp:259]     Train net output #0: mbox_loss = 0 (* 1 = 0 loss)
I0731 11:35:28.226164 23599 sgd_solver.cpp:138] Iteration 10, lr = 0.0001
I0731 11:35:44.267797 23599 solver.cpp:243] Iteration 20, loss = 0
I0731 11:35:44.267966 23599 solver.cpp:259]     Train net output #0: mbox_loss = 0 (* 1 = 0 loss)
I0731 11:35:44.267980 23599 sgd_solver.cpp:138] Iteration 20, lr = 0.0001
^CI0731 11:35:46.055897 23599 solver.cpp:596] Snapshotting to binary proto file snapshot/mobilenet_iter_22.caffemodel
I0731 11:35:46.158947 23599 sgd_solver.cpp:307] Snapshotting solver state to binary proto file snapshot/mobilenet_iter_22.solverstate
I0731 11:35:46.196113 23599 solver.cpp:316] Optimization stopped early.
I0731 11:35:46.196141 23599 caffe.cpp:254] Optimization Done.



What parameters should I change to make the training process work?

By the way, I tried the VOC dataset and the program ran correctly, showing normal loss values.


danie...@gmail.com

Aug 7, 2018, 10:29:14 PM
to Caffe Users
I solved this bug.

It turns out the bounding-box coordinates in the annotation XML files cannot be floating-point numbers. The Caltech dataset I downloaded has coordinates like 422.478320848024820, which can't even be represented exactly as a float.
After I converted all of those coordinates to integers, the loss showed up.
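
The fix was essentially a pass like this over the annotation XML files (a sketch; the folder name is only an example):

import os
import xml.etree.ElementTree as ET

ANNO_DIR = "Annotations"  # example path

for fname in os.listdir(ANNO_DIR):
    if not fname.endswith(".xml"):
        continue
    path = os.path.join(ANNO_DIR, fname)
    tree = ET.parse(path)
    for bbox in tree.getroot().iter("bndbox"):
        for tag in ("xmin", "ymin", "xmax", "ymax"):
            node = bbox.find(tag)
            # e.g. "422.478320848024820" -> "422"
            node.text = str(int(round(float(node.text))))
    tree.write(path)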



On Tuesday, July 31, 2018 at 12:00:12 PM UTC+8, danie...@gmail.com wrote: