The problem with training on the CASIA-WebFace dataset

Smoleny Krivich

Aug 5, 2020, 5:41:35 PM
to Caffe Users
Hi, the problem is that I have been trying for a long time to get at least satisfactory results training my network on the CASIA-WebFace dataset. I followed the examples from https://github.com/xingwangsfu/FaceVerification/tree/master/caffe_proto and the attached paper.
I ran several training runs with different base_lr values in the range [1e-2 ... 1e-5], hoping to see at least some decrease in loss. The best I got was loss = 1.34 and accuracy = 0.7712 at iteration 170500, after which training stagnated for a long time. Tell me, what am I doing wrong? I have attached the console output from training and my .prototxt files.
Learning_Face_Representation_from_Scratch.pdf
solver.prototxt
train-val-casia.prototxt
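
For context, the solver settings I have been sweeping look roughly like this (just an illustrative sketch with placeholder values, not the exact contents of the attached solver.prototxt; base_lr is the parameter I varied between 1e-2 and 1e-5):

# illustrative values only; see the attached files for the real configuration
net: "train-val-casia.prototxt"
test_iter: 100
test_interval: 1000
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 100000
momentum: 0.9
weight_decay: 0.0005
display: 100
max_iter: 450000
snapshot: 10000
snapshot_prefix: "snapshots/casia"
solver_mode: GPU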

Tamas Nemes

Aug 5, 2020, 6:07:41 PM
to Caffe Users
Sorry, can you please specify what your problem is? A loss under 1.5 and an accuracy above 70% is actually really good; for example, the best Caffe object detectors you find on GitHub achieve an accuracy of around 77% on the PascalVOC dataset.

Smoleny Krivich

Aug 5, 2020, 6:34:22 PM
to Caffe Users
I would like to clarify that I started studying Caffe not long ago. I tested the resulting .caffemodel and the output turned out to be complete nonsense (it classifies all images into only two categories, which is very strange). So maybe I have overlooked something somewhere. My questions:
1) How can I avoid this behavior?
2) How did the training process go for other people, and how many iterations did it take?
3) What methods can I apply to avoid stagnation in training?
4) Is there an example of source code with the network architecture and solver file? (It would be nice to take a look.)

On Thursday, August 6, 2020 at 1:07:41 AM UTC+3, Tamas Nemes wrote:

Tamas Nemes

Aug 6, 2020, 6:07:19 AM
to Caffe Users
That's indeed very strange; can you please give an example of how you expect it to behave?
Sadly, I'm not able to give you deep insights into model optimization, as I haven't worked with Caffe for that long myself. Therefore, I can only give you the following advice:

1) I can't say for sure, of course, but I think the problem might actually be in the model architecture itself (my second guess would be the dataset, so please take a closer look at it to confirm everything is fine). Generally, it is best to use recent models rather than old ones, which may not have been maintained over time and typically aren't very popular because they don't work well. Choose newer models with community support.
2) The only project I have tackled with Caffe so far is object detection with MobileNetSSD. You can have a closer look at these repos (the first one is the original, the second just an adaptation to v2 of the model):
There, it is clearly stated that the goal is to achieve a loss of around 2.5-1.5 within 100000 iterations (I managed to get around 3.0). I trained the model on my own dataset and it turned out that the reported accuracy was very bad (around 2%), but on real data it detected just fine. Try to evaluate your model on data that isn't in your dataset (see the pycaffe sketch after this list).
3) Also, the reason could be overfitting, where the network achieves its lowest loss by just sorting everything into 2 categories (that's why I suggested you take a look at your data; there is a quick label-count sketch below as well).
4) In the links I provided, you will find all the necessary files you can look at; those are solutions that really work. Other than that, I don't know what kind of source files you're interested in.
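
If it helps, here is a rough sketch of how you could feed a few images that are not in your dataset through the trained .caffemodel with pycaffe and inspect the raw class probabilities. The file names, the 'prob' blob name and the preprocessing values below are assumptions on my side; adapt them to your deploy file:

# rough sketch only; paths, blob names and preprocessing are placeholders
import numpy as np
import caffe

caffe.set_mode_gpu()

# a "deploy" version of train-val-casia.prototxt: an Input layer instead of
# the Data layers, Softmax instead of SoftmaxWithLoss, no Accuracy layer
net = caffe.Net('deploy.prototxt', 'casia_iter_170500.caffemodel', caffe.TEST)

# preprocessing has to match what the Data layer did during training
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))     # HWC -> CHW
transformer.set_raw_scale('data', 255)           # load_image returns [0, 1]
transformer.set_channel_swap('data', (2, 1, 0))  # RGB -> BGR

img = caffe.io.load_image('face_not_in_casia.jpg')
net.blobs['data'].data[...] = transformer.preprocess('data', img)
out = net.forward()

probs = out['prob'].flatten()
top5 = probs.argsort()[::-1][:5]
print('top-5 classes:', top5)
print('top-5 probabilities:', probs[top5])

If very different faces always end up in the same one or two classes with high probability, that points to the data or the architecture rather than the solver settings.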
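
And to check the dataset itself (points 1 and 3), you could count the labels in your training database to make sure the classes aren't heavily skewed. This assumes you converted the images to an LMDB (e.g. with convert_imageset); the path here is just a placeholder:

# quick label-count sketch; 'casia_train_lmdb' is a placeholder path
from collections import Counter
import lmdb
from caffe.proto import caffe_pb2

counts = Counter()
with lmdb.open('casia_train_lmdb', readonly=True) as env:
    with env.begin() as txn:
        for _, value in txn.cursor():
            datum = caffe_pb2.Datum()
            datum.ParseFromString(value)
            counts[datum.label] += 1

print('classes seen:', len(counts))
print('10 most common labels:', counts.most_common(10))

If only a handful of labels dominate the counts, the 2-category behavior you see would not be surprising.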

Best of luck!
Tamas

Smoleny Krivich

Aug 6, 2020, 8:35:56 AM
to Caffe Users
Thanks, Tamas, for your recommendations and the repositories you provided (I found what I was interested in). I will share the results shortly )

On Thursday, August 6, 2020 at 1:07:19 PM UTC+3, Tamas Nemes wrote: