caffe-ssd (weiliu89) and mobilenet-ssd (chuanqi305) training problems


crete

Dec 2, 2018, 10:02:41 PM
to Caffe Users
I have been stuck on this problem for days. I have a GPU build of Caffe (cloned from weiliu89's caffe-ssd) compiled successfully on Ubuntu 18.04, with NVIDIA driver 410.48 (GeForce GTX 1060, laptop version), CUDA 9.2, and the corresponding cuDNN installed. I followed the repo's instructions: I downloaded the VGGNet file, downloaded the VOC07 and VOC12 datasets, and created the LMDB files. When I started training with "python examples/ssd/ssd_pascal.py", I got the following error:

math_functions.cpp:250] Check failed: a <= b (0 vs. -1.19209e-07)
*** Check failure stack trace: ***
    @     0x7f191a3480cd  google::LogMessage::Fail()
    @     0x7f191a349f33  google::LogMessage::SendToLog()
    @     0x7f191a347c28  google::LogMessage::Flush()
    @     0x7f191a34a999  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f191aa4d1c7  caffe::caffe_rng_uniform<>()
    @     0x7f191aa893c8  caffe::SampleBBox()
    @     0x7f191aa89720  caffe::GenerateSamples()
    @     0x7f191aa89970  caffe::GenerateBatchSamples()
    @     0x7f191abb3892  caffe::AnnotatedDataLayer<>::load_batch()
    @     0x7f191ab60a7a  caffe::BasePrefetchingDataLayer<>::InternalThreadEntry()
    @     0x7f191aa1d205  caffe::InternalThread::entry()
    @     0x7f190e933bcd  (unknown)
    @     0x7f18fbccc6db  start_thread
    @     0x7f1918a8b88f  clone
Aborted (core dumped)
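
A note on the number in the failed check: -1.19209e-07 is minus the float32 machine epsilon (2^-23), which hints that the upper bound b came out of a float32 subtraction that landed just below zero. The sketch below only illustrates that rounding effect. It assumes, based on the stack trace and not confirmed anywhere in this thread, that SampleBBox asks caffe_rng_uniform for a value in [0, 1 - bbox_width]; the function names and the clamp are hypothetical, written for illustration, and are not code from caffe-ssd.

```python
import numpy as np


def offset_upper_bound(bbox_width):
    """Upper end of the assumed sampling range [0, 1 - bbox_width], in float32."""
    return np.float32(1.0) - np.float32(bbox_width)


def clamped_upper_bound(bbox_width):
    """Same bound, but never allowed to drop below the lower end (0)."""
    return max(np.float32(0.0), offset_upper_bound(bbox_width))


# Hypothetical width: the smallest float32 strictly greater than 1.0,
# standing in for a width that rounding pushed just past 1.
width = np.nextafter(np.float32(1.0), np.float32(2.0))

print(offset_upper_bound(width))   # slightly negative, so "0 <= b" fails
print(clamped_upper_bound(width))  # clamped back to the lower bound
```

If that reading is right, simply removing the check leaves the random number generator with a range whose upper end is below its lower end, so guarding the range is one possible direction to look at. Whether that matches any fix linked in this thread is not something the sketch can confirm.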

I searched for solutions and commented out line 250 of math_functions.cpp, which is "CHECK_LE(a, b)", then recompiled caffe-ssd. That error went away, but training got stuck at the following step and would not continue:

I1203 10:54:52.244547  3753 solver.cpp:294] Solving VGG_VOC0712_SSD_300x300_train
I1203 10:54:52.244570  3753 solver.cpp:295] Learning Rate Policy: multistep
I1203 10:54:52.247480  3753 blocking_queue.cpp:50] Data layer prefetch queue empty

I searched for solutions again but did not find much useful information. I then prepared a custom dataset for mobilenet-ssd following the instructions in chuanqi305's repo, and when I started training I hit the same problem. I also tested the MNIST and CIFAR training examples from the official Caffe tutorials, and they ran smoothly. Has anyone had the same problem and solved it? I would appreciate any hint toward a solution and will try it. Many thanks.

Mc Neill Ivan

Feb 3, 2019, 8:32:16 AM
to Caffe Users
Hi @crete,

Were you able to solve the issue? I am also facing the same issue. Any ideas would be helpful.

Thanks,
Ivan

Tom Deblauwe

Feb 13, 2019, 4:24:09 PM
to Caffe Users
Hello,

I got the EXACT same problem. I followed this guide:


I have already tried both CPU and GPU. I am also using an NVIDIA 1060, and I applied the same workaround you mention with the CHECK_LE macro. I really have the same problem and can't find a solution...

I already created a script to check my lmdb is ok, and it seems so:

import caffe
import lmdb

# Open the LMDB read-only and walk over every record.
lmdb_env = lmdb.open("trainval_lmdb", readonly=True)
lmdb_txn = lmdb_env.begin()
lmdb_cursor = lmdb_txn.cursor()

datum = caffe.proto.caffe_pb2.AnnotatedDatum()
for key, value in lmdb_cursor:
    print("Key: {}".format(key))
    datum.ParseFromString(value)
    # Each annotation group holds the boxes for one class label.
    for ann in datum.annotation_group:
        print("Annotation ", ann.group_label)
        for a in ann.annotation:
            print("  instance_id:", a.instance_id)
            # Note the print order: xmin, xmax, ymin, ymax, label.
            print("  bbox:", a.bbox.xmin, a.bbox.xmax, a.bbox.ymin, a.bbox.ymax, a.bbox.label)

I got bounding boxes in my annotations, so yeah, the data is good.
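
Seeing boxes printed shows that annotations exist, but it does not check their values. As a rough sketch of a stricter check, the hypothetical helper below (bbox_problems is my own name, not part of Caffe) flags boxes that are degenerate or fall outside the unit square. It assumes the coordinates stored in the lmdb are normalized to [0, 1]; the sample values in the usage lines are made up.

```python
def bbox_problems(xmin, ymin, xmax, ymax):
    """Return a list of problems found in one normalized bounding box."""
    problems = []
    # Assumption: coordinates are normalized, so each should lie in [0, 1].
    for name, value in (("xmin", xmin), ("ymin", ymin),
                        ("xmax", xmax), ("ymax", ymax)):
        if not 0.0 <= value <= 1.0:
            problems.append("{} = {} is outside [0, 1]".format(name, value))
    # A usable box needs strictly positive width and height.
    if xmin >= xmax:
        problems.append("xmin >= xmax (zero or negative width)")
    if ymin >= ymax:
        problems.append("ymin >= ymax (zero or negative height)")
    return problems


# Made-up sample boxes: one well-formed, one with zero width.
print(bbox_problems(0.1, 0.2, 0.5, 0.6))
print(bbox_problems(0.5, 0.2, 0.5, 0.6))
```

Inside the loop of the script above it could be called as bbox_problems(a.bbox.xmin, a.bbox.ymin, a.bbox.xmax, a.bbox.ymax), printing the key whenever the returned list is not empty.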

So if anyone has a solution, please post it :)

Best regards,
Tom,

Tom Deblauwe

Feb 14, 2019, 5:33:12 AM
to Caffe Users
Hi!

If I apply the fix mentioned here, it starts to train!


Best regards,
Tom,

Mc Neill Ivan

Feb 14, 2019, 6:59:50 AM
to Caffe Users
Hi Tom,

Just to confirm: do the changes in $CAFFE_ROOT/src/caffe/util/math_functions.cpp also solve the "Data layer prefetch queue empty" issue?

Thanks,
Ivan

Mc Neill Ivan

Feb 14, 2019, 2:10:52 PM
to Caffe Users
Hi Tom,

Thanks a lot! The changes you suggested actually fixed the "Data layer prefetch queue empty" issue.

regards,
Ivan

Tom Deblauwe

Feb 14, 2019, 5:18:28 PM
to Caffe Users
Yes indeed! And the training results with the fix are good.
Glad to help,
Best regards
Tom

Mark

Apr 25, 2020, 2:18:40 AM
to Caffe Users
Hey Tom,

I am following the tutorial you mentioned above, and I am also getting "Data layer prefetch queue empty... Killed". I have already edited $caffe/src/caffe/util/math_functions.cpp, but I still cannot get a snapshot/mobilenet_iter_2035.caffemodel file, so I cannot continue to the next step of the tutorial. Could you help me solve this?

What checks did you perform? I ran your python script to check my lmdb and the results were something like this:

('  instance_id:', 0)
('  bbox:', 0.5465949773788452, 0.6953405141830444, 0.17891374230384827, 0.28753992915153503, 0)
('Key: {}', '00000006_Images/1A9R00.jpg')
('Annotation ', 3)
('  instance_id:', 0)
('  bbox:', 0.6666666865348816, 0.8333333134651184, 0.0702875405550003, 0.3865814805030823, 0)
('  instance_id:', 1)
('  bbox:', 0.6146953701972961, 0.740143358707428, 0.00319488812237978, 0.17571884393692017, 0)
('  instance_id:', 2)
('  bbox:', 0.5035842061042786, 0.616487443447113, 0.00319488812237978, 0.17891374230384827, 0)
('  instance_id:', 3)
('  bbox:', 0.3333333432674408, 0.45519712567329407, 0.00319488812237978, 0.23961661756038666, 0)
('Annotation ', 2)


Which I believe is good?