DeepFace


Bartosz Ludwiczuk

unread,
Feb 11, 2015, 8:40:22 AM2/11/15
to caffe...@googlegroups.com
I want to reproduce the DeepFace net architecture. As their database is not public, I use the biggest public face dataset, WLFDB. It has 0.7 million images of 6,025 subjects.
My net architecture is essentially the same as in the paper (I checked the feature dimensions at every step to be sure); for now I am training only the classification net. But as I do not have 1k images per subject, I removed the last two LOCAL layers (taken from here).
As the frontalization code for DeepFace is not released, I tried both the raw faces and the aligned faces delivered with WLFDB.

My problem is that the net overfits badly (something like train: 85%, test: 15%). In the paper and presentation they claim the net does not overfit at all (they provide a plot of the log-loss on train and test). Facebook uses dropout only in the last layer.
I could not find any information about data augmentation (mirroring, color jitter, random crops). I understand these techniques may not be appropriate for face classification, but I do not know why my net overfits so badly.


Question: Has anybody tried to reproduce DeepFace? Or has anyone tried to reproduce DeepID (I used a slightly different architecture, but it overfits too)?
I think the problem is not the net architecture but the data. Maybe I am missing some data pre-processing before training.
How can I reduce overfitting?
DeepFace_train_val.prototxt

lax

unread,
Feb 12, 2015, 5:17:44 PM2/12/15
to caffe...@googlegroups.com
How many images are in the DeepFace dataset?  Even though they did not see overfitting with their dataset, you may still have an issue with yours.  I would try adding dropout to more layers and increasing the dropout ratio.  You could also try decreasing the overall size of your network, or transferring the conv1 layers learned from training on AlexNet.

Michael Wilber

unread,
Feb 16, 2015, 4:27:10 PM2/16/15
to caffe...@googlegroups.com
This one's tricky.

- In my opinion, most of the secret behind the DeepFace paper is their awesome pose normalization strategy. They have several tricks: they find the eyes, nose, mouth, etc.; they solve for the affine camera transform that could have put the keypoints there; they project the image onto a 3D model of a face; they rotate the model to look straight into the camera; they then turn that back into an image; etc. None of this is tackled within Caffe, but I'm convinced it's absolutely vital to building a good face recognition system. How are you doing your alignment step?

- If the faces in this database really are weakly labeled, are you sure you aren't training your network on garbage? Are you sure the accuracy of a perfect classifier really is close to 100%?

- DeepFace was designed to be learned in something close to a metric learning scenario. The output is not a scalar class label, but rather, it is a single scalar distance value that represents the similarity between the two faces. This is then thresholded to get a "same-or-different" decision. The "identification" scenario you're solving is very, very different from the "verification" scenario that DeepFace was designed to solve (and the LFW dataset used to evaluate it).

My suggestions:

- Be sure you're using great pose alignment. If you aren't, try using an automatic keypoint detector like http://www.vision.caltech.edu/xpburgos/ICCV13/ and warp the faces yourself to a common reference frame. You might even recruit some mechanical turkers to give you perfect keypoint locations before switching to an automatic system, just as a simple sanity check.
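Warping detected keypoints to a common reference frame, as suggested above, amounts to a least-squares affine fit. A minimal numpy sketch (the `estimate_affine` helper and the keypoint coordinates are made up for illustration; the resulting matrix would typically be handed to something like OpenCV's warpAffine):

```python
import numpy as np

def estimate_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine transform mapping src_pts -> dst_pts.

    src_pts, dst_pts: (N, 2) arrays of corresponding keypoints (N >= 3).
    """
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    n = src.shape[0]
    # Design matrix: [x, y, 1] for each source point.
    A = np.hstack([src, np.ones((n, 1))])
    # Solve A @ M.T = dst for the 2x3 matrix M.
    M, _, _, _ = np.linalg.lstsq(A, dst, rcond=None)
    return M.T  # shape (2, 3)

# Hypothetical detected eye/eye/mouth keypoints vs. a canonical template:
detected = np.array([[30.0, 40.0], [70.0, 42.0], [50.0, 75.0]])
template = np.array([[35.0, 45.0], [65.0, 45.0], [50.0, 70.0]])
M = estimate_affine(detected, template)
# M can then be used to warp the face crop into the reference frame.
```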

- Try recasting your dataset as a verification problem rather than a classification problem. Perhaps you might even sample some training/testing pairs yourself in a way similar to what LFW does.

One example of a paper that addresses the "identification vs verification" problem is the DeepID system of Sun and friends, which uses both kinds of information. This approach is actually a bit better than DeepFace on LFW at the moment: http://arxiv.org/abs/1406.4773

Face recognition is a weird beast of a problem to tackle. Good luck! You'll need it. :)

Bartosz Ludwiczuk

unread,
Feb 16, 2015, 5:02:20 PM2/16/15
to caffe...@googlegroups.com
Thanks for all the help.
I must say the cause of the low performance was the database. WLFDB is really weakly labeled, so it does not give the right results. I have to find/create a better database for the face verification task.

About pose alignment:
This is a really important thing. I am using the rather robust technique presented in "Tal Hassner, Shai Harel*, Eran Paz* and Roee Enbar, Effective Face Frontalization in Unconstrained Images".
Their results look pretty impressive.

About the verification task:
First, I would like to pre-train the net on the classification task (DeepFace and DeepID do that as a first step; the second step is learning a verification metric). This is why I was searching for a good database. The next step will be training a Siamese network or another metric-learning technique.

Kostia Antoniuk

unread,
Feb 20, 2015, 1:30:50 PM2/20/15
to caffe...@googlegroups.com
Hi Bartosz,

any success with reproducing DeepID/DeepFace? I'm trying to reproduce it as well,
but my network consistently overfits. :( I am curious whether you made it.

-K.

Bartosz Ludwiczuk

unread,
Feb 20, 2015, 4:45:56 PM2/20/15
to Kostia Antoniuk, caffe...@googlegroups.com

Hi,
which database do you use?
I have only trained on FaceScrub (80k images) and the net was overfitting.
On WLFDB I got only 20%, but that is because of the weakly labeled data.

Now I am trying to get a bigger database. I hope this will help with overfitting.

--
You received this message because you are subscribed to a topic in the Google Groups "Caffe Users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/caffe-users/ACIhR132F90/unsubscribe.
To unsubscribe from this group and all its topics, send an email to caffe-users...@googlegroups.com.
To post to this group, send email to caffe...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/caffe-users/b137cb8c-551f-4fc1-b891-82175427a2b1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Kostia Antoniuk

unread,
Feb 20, 2015, 4:58:55 PM2/20/15
to caffe...@googlegroups.com, anto...@gmail.com
Hi,

I have tried it on the MORPH database. So far it brutally overfits. :(

Ho

unread,
Feb 24, 2015, 6:16:11 PM2/24/15
to caffe...@googlegroups.com, anto...@gmail.com
Hi Bartosz,
   Have you read "Learning Face Representation from Scratch" (http://arxiv.org/abs/1411.7923)? This paper provides a better face database for training the CNN.  I hope it will help you.  I have used this dataset to train a CNN model similar to yours, but I only achieve around 50% accuracy on LFW.  If you get any improvement, please let me know as well.

Ho

Bartosz Ludwiczuk

unread,
Feb 25, 2015, 3:24:07 AM2/25/15
to caffe...@googlegroups.com, anto...@gmail.com
Hi Ho,
I am now on track to get this database. But it needs a special agreement and it is not so easy to get the paper signed properly. I think this is the only way to get that database.

I have read that paper and it is pretty interesting; it is not as complicated as DeepFace and gets good results. In fact, I have been trying to implement it. But they have one thing which I have not seen in Caffe: joint classification and verification loss.

To do so, I created a very simple layer which takes two bottoms: label_1 and label_2. It checks whether the label values are the same and returns 1 if so, otherwise 0. This allows me to use LMDB/IMAGE_DATA layers (they must have different start points or be shuffled differently), which can then be used for classification (SOFTMAX) and verification (CONTRASTIVE_LOSS).
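The same-or-different label layer described here boils down to an element-wise comparison. A minimal numpy sketch of the logic (`pair_labels` is a hypothetical name, not the actual Caffe layer):

```python
import numpy as np

def pair_labels(label_1, label_2):
    """For two label blobs of equal length, emit 1 where the class
    labels match (a positive pair for ContrastiveLoss) and 0 where
    they differ."""
    return (np.asarray(label_1) == np.asarray(label_2)).astype(np.float32)

# Two differently shuffled label streams for the same batch positions:
y1 = [3, 7, 7, 2]
y2 = [3, 7, 1, 5]
print(pair_labels(y1, y2))  # -> [1. 1. 0. 0.]
```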

I checked the implementation on MNIST (without classification) and it works much the same as the original example.

Then I tested it on another dataset, 30k images, 96x96, to see if it works. I had to change the margin in CONTRASTIVE_LOSS, because after reading the paper I understand that X^2 + Y^2 = margin, where X and Y are the feature vectors. In my case it was 1024, with a SIGMOID layer.

Unfortunately, it does not work at all. I get very low loss for different classes (which is good) and very high loss for same classes (which is bad). I was thinking that maybe the number of classes causes the problem (100 in my case), but the CASIA database has 10k classes.

So, my question is: have you tried joint classification and verification loss in Caffe?

Rick Feynman

unread,
Feb 25, 2015, 9:42:37 AM2/25/15
to caffe...@googlegroups.com, anto...@gmail.com
I have the CASIA dataset. I am also facing the same problem regarding joint softmax and contrastive losses. I think this kind of joint loss will be very important for other problems as well. The contrastive loss can be very useful in making the clusters in feature space well separated, on which the softmax discriminative loss then learns the decision boundaries. This kind of training can be done in multiple steps. For example, during step 1, use the contrastive loss to learn features where vehicles (truck, van, bus, car, etc.) cluster together and animals (dogs, cats, cows, horses, etc.) cluster together in the high-dimensional feature space; all other broader classes behave the same way. Then this net can be fine-tuned incrementally to more specific classes in steps (e.g. animals can be split into land animals, aquatic animals and birds). Finally, the fine-grained classification problem can be learned by fine-tuning the last version of the net using a softmax. This would increase the robustness of many classification problems. Hence, I am willing to collaborate with anyone interested in making this kind of joint loss easier to implement in Caffe. Let's discuss.

Bartosz Ludwiczuk

unread,
Feb 26, 2015, 7:37:11 AM2/26/15
to caffe...@googlegroups.com, anto...@gmail.com
OK, so we can talk about an implementation of the CASIA architecture in Caffe.

@Rick Feynman: You mentioned several steps of learning. But that is not the joint verification-identification approach. We would like to implement a network that has a softmax loss and a contrastive loss at the same time, and make it easy to use. So far I have created such an architecture (attached). But it does not work, I mean it does not converge (contrastive loss very low, softmax loss high). Without the contrastive loss, on the same dataset I get ~75%.

First thing: I see that the Caffe contrastive loss is different from the one in LeCun's paper. In that paper, "1" means a wrong (dissimilar) pair and "0" a right (similar) pair.

In Caffe, "0" means wrong and "1" right:

Caffe version: L(W, Y, X1, X2) = Y * Ew^2 + (1 - Y) * (Q - Ew^2)

Am I right?
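For comparison, the hinge-based contrastive loss from Hadsell, Chopra and LeCun, rewritten with the Caffe label convention (1 = similar pair), can be sketched as follows. This illustrates the loss formula, not the exact Caffe code:

```python
import numpy as np

def contrastive_loss(x1, x2, y, margin=1.0):
    """Hadsell-style contrastive loss with the Caffe label convention:
    y = 1 for a similar pair, y = 0 for a dissimilar one.  Similar
    pairs are pulled together (d^2 term); dissimilar pairs are pushed
    apart until their distance exceeds the margin."""
    x1, x2, y = (np.asarray(a, dtype=float) for a in (x1, x2, y))
    d = np.linalg.norm(x1 - x2, axis=1)  # Euclidean distance Ew per pair
    per_pair = y * d**2 + (1 - y) * np.maximum(margin - d, 0.0)**2
    return per_pair.mean() / 2.0
```

Note the dissimilar-pair term here hinges on the distance itself, max(margin - d, 0)^2, whereas the formula quoted above subtracts the squared distance; that difference is exactly the kind of discrepancy worth checking against the Caffe source.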

If yes, that is the first thing to sort out.


The second thing is to make the learning process and architecture modeling easier. It should look like this:

- define one network

- the contrastive loss should sample image pairs from the data

- the softmax loss should work as normal

I need a better understanding of the sampling process for the contrastive loss, so I will ask the author.


@Ho

Did you reproduce the CASIA model? If yes, using both types of losses or only one?



mnist_siamse.prototxt

Ho

unread,
Feb 27, 2015, 5:44:58 AM2/27/15
to caffe...@googlegroups.com
Hi Bartosz and Rick,
     If you go through "Hybrid Deep Learning for Face Verification", DeepID1 and DeepID2, you will find that they first try a Siamese network (contrastive loss) and a CNN with softmax, and then combine both cost functions.  In terms of accuracy, the first two are quite similar.  "Naive-Deep Face Recognition: Touching the Limit of LFW Benchmark or Not?" also agrees with my point.  Since my hardware resources are very limited, I am trying DeepFace and DeepID1 first as a benchmark.   If my CNN using softmax can achieve around 90% accuracy on LFW, I will try a Siamese network and then DeepID2. Honestly, implementing DeepID2 needs a lot of work. If someone here gets a good result using a Siamese network, please let me know as well.  Perhaps we can share information.  Honestly, progress on my side is very slow because of my resource issues.

Best regards,
Ho

Rick Feynman

unread,
Feb 27, 2015, 8:55:06 AM2/27/15
to caffe...@googlegroups.com
That is good. I have an idle Tesla K40c for training, so I can use it to train any solution we come up with. @Bartosz, do you have any reply from the authors on how they sample for the softmax and contrastive losses simultaneously? If we all have the CASIA dataset, let's try to replicate the results on my machine.
Kind Regards,
Rick

Xijing Dai

unread,
Mar 2, 2015, 8:42:37 AM3/2/15
to caffe...@googlegroups.com

hey,

I tried to reproduce the DeepFace model as well. This is my current status:

If you are using Caffe, just be careful with the LOCALLY connected layers; you will need other PRs to get them to work.

I mixed a few face databases: 1,879 people (excluding the LFW test people), 225,347 pictures in total.

I trained the DeepFace CNN with different dropout rates and weight decays.

Most training runs give results similar to the graph above; the accuracy is around 80%. (I am currently trying larger dropout rates (0.7, 0.9).) I found that a bigger dropout rate improves the overfitting; you could try that as well.


However, after extracting the features and normalizing the DeepFace IDs, I tried a linear SVM and a kernel SVM with different parameters for verification; the accuracy is around 50% on validation and test. I also tried predicting on the SVM training set, and the best I get is 75% (which is not good, since we trained on those samples, and even then we cannot predict all of them correctly).

As mentioned in the DeepFace paper, their DeepFace IDs have around 75% zeros. Mine have > 85%. With a large dropout rate and less overfitting, I can get 75% zeros; however, the verification accuracy does not improve at all.

Do you have any idea why my SVM predictions are so bad?

Is it because of overfitting during the identification training?

Cheers

Xijing Dai

Xijing Dai

unread,
Mar 2, 2015, 8:52:29 AM3/2/15
to caffe...@googlegroups.com
hey Ho,

Have you finished with DeepFace and DeepID? It sounds like you have.

How big are your databases?

Have you tried the SVM model described in the DeepFace paper?

Cheers

Bartosz Ludwiczuk

unread,
Mar 2, 2015, 9:52:35 AM3/2/15
to caffe...@googlegroups.com
Hi Xijing Dai,
so your DeepFace architecture does not overfit a lot. Which PR did you mean for getting the LOCAL layer to work? I only use PR.
But it is strange that you get such a low value on the verification task. Did you try simple cosine similarity? Or training a Siamese network?

@Rick Feynman
About CASIA, I got answers:
1. Question: You mentioned a weighted cost function. As I understand it, the softmax loss is multiplied by alpha. The contrastive cost has weight = 1, right? Is the step size of alpha the same as for the learning rate?

Answer: My cost function is (softmax + alpha * contrastive). The learning rate and the alpha are tuned 3 times. The learning rates are: 1e-2, 1e-3, 1e-4 and 1e-5. The alpha(s) are: 3.2e-4, 9e-4, 2e-3, 6.4e-3. The final results are not very sensitive to the values of alpha, but you could refer to my setting.
2. Question: Could you explain a bit more the process of sampling face pairs from the training set? I understand it this way:
   - take a batch of faces, compute the softmax loss
   - sample pairs from the batch, apply the contrastive loss

   The questions are:
      - do you sample pairs after each batch, or take the next two samples from the training set?
      - how many positive and negative pairs do you sample? The same quantity, or random?

  Answer: I just sample face pairs within each batch. Usually, 10,000 positive and 10,000 negative pairs are drawn from each batch. You could adjust this number for your convenience.
3. Question: The contrastive cost has a "margin" parameter. What value did you set it to?
    Answer: You could read CUHK's "DeepID2+" paper for the details of the "margin" in the contrastive loss.

It clarifies some things, but I do not understand how 10k positive and 10k negative pairs are sampled within a batch. Does "batch" mean "epoch" here? I think we need a detailed explanation of the sampling strategy.
Maybe you have other questions? I can ask the author again.
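As I read the author's answer, the within-batch sampling could look roughly like this sketch (`sample_pairs` is my own illustrative helper; pairs are drawn with replacement, since a batch rarely contains 10k distinct positive pairs):

```python
import random
from itertools import combinations

def sample_pairs(labels, n_pos, n_neg, seed=0):
    """From one batch's class labels, draw index pairs that share a
    label (positives) and pairs that do not (negatives)."""
    rng = random.Random(seed)
    pos = [(i, j) for i, j in combinations(range(len(labels)), 2)
           if labels[i] == labels[j]]
    neg = [(i, j) for i, j in combinations(range(len(labels)), 2)
           if labels[i] != labels[j]]
    # choices() samples with replacement, so n_pos/n_neg may exceed
    # the number of distinct pairs in the batch.
    return (rng.choices(pos, k=n_pos) if pos else [],
            rng.choices(neg, k=n_neg) if neg else [])

batch = [0, 0, 1, 1, 2]          # class labels of one mini-batch
pos_pairs, neg_pairs = sample_pairs(batch, n_pos=4, n_neg=4)
```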

Ho

unread,
Mar 2, 2015, 10:12:11 AM3/2/15
to caffe...@googlegroups.com
Hi Xijing Dai,
    There are some things I would like to clear up before the discussion.
1. What does PR stand for?
2. What do you mean by 75% zeros? What do the zeros mean?
3. Can you explain in detail "If you are using caffe, just be careful with LOCALLY connected layers, you will need other PRs to get it work"?
   In the CASIA-WebFace paper, they use a fully connected layer instead of a locally connected layer after the average pool, so I have no idea what you mean.

My architecture mainly duplicates the CASIA-WebFace paper.
In my experiment, I first separate the CASIA-WebFace dataset into two parts: 90% of the images for training and 10% for validation. The subjects in the training and validation sets overlap.
For the LFW test, I only use a nearest-neighbor classifier with the cosine angle.
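The cosine-angle score used for the nearest-neighbor test can be sketched as follows; because the score divides by both vector norms, it is already scale-invariant, which is why no extra feature normalization is needed:

```python
import numpy as np

def cosine_score(f1, f2):
    """Cosine similarity between two CNN feature vectors.
    Returns 1.0 for parallel vectors, 0.0 for orthogonal ones."""
    f1, f2 = np.asarray(f1, float), np.asarray(f2, float)
    return float(f1 @ f2 / (np.linalg.norm(f1) * np.linalg.norm(f2)))

# Threshold the score for a same/different verification decision:
same = cosine_score([1.0, 2.0, 0.0], [2.0, 4.0, 0.0]) > 0.5
```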

I will try a more advanced classifier, such as an SVM, later.   Xijing Dai, did you do PCA/WPCA before the SVM?
As for the database, I do have another big database which I collected myself, but I will not add it at this moment.   Currently, I only use the CASIA database to try to achieve the result the paper claims.  Because of the problem of combining the contrastive and softmax functions, I only use softmax.   I try to keep everything simple so that it is easier to trace the problem.

Ho

Xijing Dai

unread,
Mar 2, 2015, 10:13:49 AM3/2/15
to Bartosz Ludwiczuk, caffe...@googlegroups.com
hey Bartosz,

We are using the same PR. :)

I tried the weighted chi-squared distance method, learned with a
linear SVM (and a Gaussian kernel as well).

I have not tried a Siamese network yet; I plan to try it next.

It feels very strange to me too; something must be wrong, I just don't know where.

One thing about the verification datasets: I chose the pairs randomly
(the training size is over 10,000); does that matter?

(For the kernel SVM, DeepFace said it only needs 5,400 labeled pairs. Hmm...)



Cheers.

Xijing Dai

unread,
Mar 2, 2015, 10:18:28 AM3/2/15
to Bartosz Ludwiczuk, caffe...@googlegroups.com
Sorry, I should have replied to the group as well.


Oh, another point: my input image size to the net is 100x130; does this matter?


Xijing Dai

unread,
Mar 2, 2015, 10:30:22 AM3/2/15
to Ho, caffe...@googlegroups.com
hi Ho,

Sorry, I saw you talking about DeepID and DeepID2, so I thought you
had already played with them.

To your questions:
1. PR refers to https://github.com/BVLC/caffe/pull/1271, which
includes the locally connected layers patch.
2. In the DeepFace paper, they say: "On average, 75% of the feature
components in the topmost layers are exactly zero." I suppose they
mean the 4096-d representation of the face.
3. Currently, Caffe does not support locally connected layers, so you
have to use the patch from question 1.

I did not try PCA/WPCA before the SVM, since the DeepFace paper does
not say they did. Also, with 75% of the features being zeros, I guess
that is sparse enough.

From what I can see, I may have a high-bias problem (underfitting?),
since I get very bad accuracies on the training/validation/test sets.

I don't know. Need help!!!! Or I may try a Siamese network first.

Cheers.

Bartosz Ludwiczuk

unread,
Mar 3, 2015, 3:01:17 AM3/3/15
to caffe...@googlegroups.com, chan....@googlemail.com

Hi Xijing,
as your net does not overfit, I have a couple of questions about your approach:
- do you use any frontalization/alignment method?
- do you normalize the feature vector from DeepFace to have values between [0, 1]?

I do not know if you have seen the "Supplementary Material for DeepFace", so I paste the learning curve from there. As they say, their net does not overfit at all.


Xijing Dai

unread,
Mar 3, 2015, 11:09:47 PM3/3/15
to Bartosz Ludwiczuk, caffe...@googlegroups.com, chi ho Chan
Hi Bartosz,

Frontalization/alignment method: I used dlib, which looks very good (2D alignment).

Normalizing the feature vector: yes, I normalize before calculating the chi-square similarity.


I googled the "Supplementary Material for DeepFace" paper and didn't find anything at first, but now I have found it, strangely. I will have a look at it now.

By the way, the datasets I used do have a lot of errors. Do you think this influences the final feature vectors, and that it is why the linear SVM doesn't work?


Cheers




Xijing Dai

unread,
Mar 3, 2015, 11:36:15 PM3/3/15
to caffe...@googlegroups.com
Hi Bartosz,

I compared your prototxt with mine and see some differences:

1. I did not use a mean file. How do you calculate your mean file?

2. I scale my input values between 0 and 1; do you?

3. I suggest you use Gaussian weight initialization in the conv layers and see if that helps convergence, because when I used xavier to initialize the weights, my training did not converge either; I don't know why.

Cheers



Bartosz Ludwiczuk

unread,
Mar 4, 2015, 3:38:19 AM3/4/15
to caffe...@googlegroups.com
Hi Xijing,

So:
"By the way, the datasets I used do have a lot of errors, do you think this will influence final feature vectors?"
I think it may influence the accuracy and the convergence rate. But if you have 80% accuracy, that is pretty high, so an SVM with chi-square should work, in my opinion. I do not know why your net does not give good results. Maybe run more tests, like fine-tuning the net on a small dataset without errors, then trying the fine-tuned feature vectors for face verification. Something must be wrong there.

"1. I did not use mean file, how do you calculate your mean file?"
I converted the dataset to an LMDB database, then used compute_image_mean from caffe/build/tools.
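In spirit, compute_image_mean just averages the training images pixel-wise, and the data layer subtracts that mean from every input. A numpy sketch of this step (the tiny two-image dataset is made up):

```python
import numpy as np

def mean_image(images):
    """Average all training images pixel-wise; subtract the result
    from every input at train and test time.
    `images` is an (N, H, W, C) float array."""
    return np.asarray(images, float).mean(axis=0)

# Hypothetical "dataset" of two 2x2 single-channel images:
imgs = np.array([[[[0.], [2.]], [[4.], [6.]]],
                 [[[2.], [2.]], [[0.], [2.]]]])
mu = mean_image(imgs)
centered = imgs - mu  # zero-mean inputs fed to the net
```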

"2. I do scale my input values between 0-1, and did you?"
I did not; I only subtract the mean. If you read papers on image classification, you will not find information about scaling the data, but most researchers use Mean_Image or Mean_Pixel.

"3. I suggest you to use gaussian on weight init in conv..."
My network did converge, but it was overfitting on the small database, and on WLFDB it did not work (I think because of the many false labels).

Ho

unread,
Mar 5, 2015, 6:18:08 PM3/5/15
to caffe...@googlegroups.com
Hi Bartosz,
   I have tried to reproduce the CASIA model using the CASIA-WebFace database.  However, I did not use the joint cost function; I only used the softmax.  First, I separated CASIA-WebFace into a training and a validation set for training the CASIA model.  Once the learning converged, I used the CASIA model to extract features on the LFW dataset and computed the similarity score using the cosine angle.  On View 2, the mean verification accuracy is around 60%.   The dimension of the CASIA feature is 320, which matches the paper.

I have also plotted the rank-1 recognition rate on the validation set over the iterations, and I found that it converges at around 170k iterations and reaches around 66%.  I will try to post those results tomorrow or next Monday.

I also follow the CNN papers and subtract the training-image sample mean from the inputs to the model.  However, after extracting the features, I did not normalize them, because the cosine angle is a normalized distance.

After reading Xijing's comment, I wonder whether subtracting the mean image is good practice for face recognition. Bartosz, if you have questions for the CASIA authors, please ask:
1. Did they do pre-processing (such as scaling pixels between 0 and 1, and/or subtracting the pixel mean/sample mean)?
2. Did they separate their dataset into training/validation sets? How? If they didn't, how do they know their model converged?
3. If they have a validation set, do they have a plot of the recognition rate on the validation set vs. iterations/epochs, or other plots for the validation or training set?

Regarding Xijing, it seems that your dataset is too small for training DeepFace, as the number of parameters of the DeepFace model is bigger than your data size.  Did you get the CASIA dataset?  Also, did you test your model with the SVM on LFW?

Xijing Dai

unread,
Mar 5, 2015, 10:58:05 PM3/5/15
to Ho, caffe...@googlegroups.com
Hi Ho,

Firstly, I am Xijing, not Xijang. :) A kind reminder.

Secondly, my registration for CASIA is pending; I will give it a try when I can.

> Regarding to Xijang, it seems that your data size is too small for training
> deepface as the number of the parameters of deepface model is bigger than
> your data size. Did you get CAISA dataset? Also, did you test your model
> with SVM on LFW?
>

Yeah, I did. The accuracy is a little lower than on my validation datasets.

And I visualized my chi-square feature vectors with t-SNE; they are
evenly distributed in the 2D/3D space.


Do you guys know any good tool to visualize the learned features and
the filters of a Caffe model? I want to see what the model has learned.

Cheers
Xijing

Colin

unread,
Mar 7, 2015, 8:33:07 PM3/7/15
to caffe...@googlegroups.com
Hi everyone,
I also tried to reproduce the result in the CASIA-WebFace paper. I randomly shuffled the whole dataset and then split it into training (80%) and validation (20%) sets. In the end, I got a validation accuracy of 77% using Caffe. However, when I trained my own network design on the same training and validation sets, I always got a validation accuracy higher than the training accuracy... That is very strange. Can anyone offer some advice on this?

Ming Zhou

unread,
Mar 12, 2015, 6:00:27 AM3/12/15
to caffe...@googlegroups.com
I have a private face dataset: 10,000 identities and 50K images. I use dlib to do face alignment. I tried the same architecture as DeepID except for the locally connected convolution, and I got 70+% accuracy. Whether I use naive Bayes, Euclidean distance, cosine distance, a linear SVM or a Gaussian-kernel SVM, I cannot get a satisfactory result on LFW. I do not think making the CNN deeper or wider will make things better. Would higher accuracy, such as 90+%, change the result? Does anyone have ideas?

Yue WU

unread,
Mar 13, 2015, 4:17:04 AM3/13/15
to caffe...@googlegroups.com
The dataset may be too small, with approximately 5 images per person on average.
For verification, Joint Bayesian is recommended.


Xijing Dai

unread,
Mar 13, 2015, 5:58:08 AM3/13/15
to Ming Zhou, caffe...@googlegroups.com
Hi, Ming

What is your classification accuracy?

Ming Zhou

unread,
Mar 14, 2015, 5:18:19 AM3/14/15
to caffe...@googlegroups.com
Sorry, 500k images. Joint Bayesian can boost the performance, but it should not be the key factor. Have you done something like this before? Could you share more of your experience?

Ming Zhou

unread,
Mar 14, 2015, 5:23:46 AM3/14/15
to caffe...@googlegroups.com, zhous...@gmail.com
Hi Xijing Dai,
The dataset has 500k images. In fact, I tried two datasets. The 500k-image dataset got 70+% accuracy; I didn't tune it further. The other dataset has 10k images, and on it I got 96+% accuracy. But neither gives a good result on LFW. How is your experience now? Any ideas?

Xijing Dai

unread,
Mar 14, 2015, 12:19:27 PM3/14/15
to caffe...@googlegroups.com, zhous...@gmail.com

Hi Ming,

I have a 200k-image dataset, and I tried DeepID with the single-patch version; it gives < 70% accuracy on verification.

However, do you combine the results of 60 patches for verification?

The paper says they combine 60 patches for verification, and that gives a good result.

Cheers

Ming Zhou

unread,
Mar 16, 2015, 7:16:31 AM3/16/15
to caffe...@googlegroups.com, zhous...@gmail.com
I tried the joint identification and verification loss; it cannot converge...

Rick Feynman

unread,
Mar 16, 2015, 8:47:00 AM3/16/15
to caffe...@googlegroups.com
Has anyone tried face frontalization?
I think it ought to alleviate some problems.

jwm_neu

unread,
Apr 6, 2015, 11:14:16 PM4/6/15
to caffe...@googlegroups.com
Hi Bartosz!
        I think we have to do some PR for this random sampling and add a new layer to do that work.
My question is: have you started on it? How is it going? Can we do it together?
Thanks, JWM

Bartosz Ludwiczuk

unread,
Apr 9, 2015, 5:20:20 AM4/9/15
to caffe...@googlegroups.com
Hi Charly,
I have not started working on the random-sampling part yet. I asked the CASIA authors for more details, including the flow of the random sampling.

I do not fully understand how it should work (20k per batch is too much; maybe the author was thinking of an epoch), but I can imagine that we should write a new data layer.
The purpose of the data layer would be:
- producing 50% positive and 50% negative examples for verification
- two types of labels: one for the classification task and one for verification of those random samples

This would let us run the joint verification and classification task.
Everything has to be connected to a Siamese architecture, which will minimize the error on the verification task.

What do you think about something like that?
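Such a pairing data layer could be sketched outside Caffe in plain Python before committing to a C++ layer; the function name and the tuple layout below are purely illustrative, not Caffe's API. It emits a 50/50 mix of positive and negative pairs, each carrying both class labels (for the two softmax heads) and the same/not-same label (for the contrastive loss):

```python
import random
from collections import defaultdict

def sample_pair_batch(labels, batch_size, rng=random.Random(0)):
    """Sample (idx_a, idx_b, class_a, class_b, same) tuples with a 50/50
    split of positive (same identity) and negative pairs.
    `labels` maps image index -> identity label."""
    by_id = defaultdict(list)
    for idx, ident in labels.items():
        by_id[ident].append(idx)
    # only identities with at least two images can form positive pairs
    multi = [i for i, idxs in by_id.items() if len(idxs) >= 2]
    idents = list(by_id)
    batch = []
    for k in range(batch_size):
        if k % 2 == 0:  # positive pair: two images of one identity
            ident = rng.choice(multi)
            a, b = rng.sample(by_id[ident], 2)
            batch.append((a, b, ident, ident, 1))
        else:           # negative pair: images of two different identities
            i1, i2 = rng.sample(idents, 2)
            batch.append((rng.choice(by_id[i1]), rng.choice(by_id[i2]), i1, i2, 0))
    return batch

# toy example: 6 images of 3 identities
labels = {0: 'A', 1: 'A', 2: 'B', 3: 'B', 4: 'C', 5: 'C'}
batch = sample_pair_batch(labels, 8)
```

The two class labels feed the identification softmaxes of the twin nets, while the final 0/1 entry is the `sim` input of the contrastive loss.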

StevenL

unread,
May 9, 2015, 4:15:20 AM5/9/15
to caffe...@googlegroups.com
Bartosz Ludwiczuk:

Have you made any progress in face recognition? We have been working on this with the DeepID2 team for a few weeks now (we only got a compiled lib from the DeepID laboratory), and we are using Caffe to verify their result. We now know that joint Bayesian can achieve better performance than a Siamese setup. How do you use a Siamese network for face classification, and do you hit any performance bottleneck?

BR

StevenL

unread,
May 11, 2015, 3:27:26 AM5/11/15
to caffe...@googlegroups.com
Hi Rick Feynman,

we don't use face frontalization, but we do use IntraFace to pre-process the training set; it seems performance improves significantly.

BR

StevenL

unread,
May 11, 2015, 4:15:47 AM5/11/15
to caffe...@googlegroups.com
We think that frontalization is necessary for face recognition, but we don't use it so far. We do use IntraFace to pre-process the training set, and it seems performance improves significantly with HD-LBPH.


On Monday, March 16, 2015 at 8:47:00 PM UTC+8, Rick Feynman wrote:

StevenL

unread,
May 12, 2015, 3:17:18 AM5/12/15
to caffe...@googlegroups.com
Hi Colin,

Do your training and validation sets overlap? This accuracy is quite high if they don't. We use LFW to test the trained Caffe model and get only 55-60% accuracy.

BR

Bartosz Ludwiczuk

unread,
May 14, 2015, 4:36:10 AM5/14/15
to caffe...@googlegroups.com
Hi StevenL,
I have made some progress on face recognition. So far my best score on LFW is 90%. I was using the architecture from the CASIA paper, trained on FaceScrub (70k images). I have just started training on CASIA-WebFace; we will see what the result is.

More Technical stuff:
- I use a Siamese architecture, because every paper on face verification has a "verification loss": you take two faces from the database, a same/not-same label, and produce a loss. As I did not know how to do this directly in Caffe, I use a Siamese architecture where each batch has both an identification and a verification loss (I compare images from the two twin nets). Using the verification loss gains +10-15% on LFW.
- I tested 2 architectures: DeepID2 (79% on LFW) and CASIA (90%)
- I could not find any implementation of joint Bayesian, so I do not use it. Do you have an implementation? I use the Chi^2 distance + SVM
- I use only 2D alignment
- FaceScrub has some identities that overlap with LFW, so I am not sure about this 90%.

Regards,
Bartosz
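The Chi^2 distance mentioned above is usually defined for non-negative feature vectors (CNN features after a ReLU qualify); a small numpy sketch, where the resulting scalar per pair would then be fed, together with the same/not-same label, to an SVM:

```python
import numpy as np

def chi2_distance(x, y, eps=1e-10):
    """Chi-squared distance between two non-negative feature vectors
    (e.g. ReLU outputs): 0.5 * sum((x - y)^2 / (x + y))."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return 0.5 * np.sum((x - y) ** 2 / (x + y + eps))

a = np.array([1.0, 2.0, 0.0, 4.0])
b = np.array([1.0, 0.0, 0.0, 4.0])
d_same = chi2_distance(a, a)  # identical vectors -> 0
d_diff = chi2_distance(a, b)
```

The `eps` term only guards against division by zero where both coordinates are 0; the distance is symmetric, so the order of the two faces does not matter.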

StevenL

unread,
May 15, 2015, 6:50:52 AM5/15/15
to caffe...@googlegroups.com

Hi Bartosz, thanks for your mail; please see my inline comments.

I have made some progress on face recognition. So far my best score on LFW is 90%. I was using the architecture from the CASIA paper, trained on FaceScrub (70k images). I have just started training on CASIA-WebFace; we will see what the result is.

90% on LFW is quite a remarkable result; using the CUHK DeepID2 library we merely achieve 80%, which matches what you measured for DeepID2. I think FaceScrub is more suitable than CASIA-WebFace for Caffe training because of its higher quality; we plan to use FaceScrub to train our HD-LBPH.

 

More Technical stuff:

- I use a Siamese architecture, because every paper on face verification has a "verification loss": you take two faces from the database, a same/not-same label, and produce a loss. As I did not know how to do this directly in Caffe, I use a Siamese architecture where each batch has both an identification and a verification loss (I compare images from the two twin nets). Using the verification loss gains +10-15% on LFW.

Are you using the Siamese network you posted on Feb 26th? We failed to train your Siamese architecture on CASIA-WebFace (on a Quadro K1100M GPU; we shall move to a K40 soon); both the contrastive and softmax losses stay high.

BTW, you mentioned in your posts that the CASIA-WebFace authors suggested adjusting the learning rate according to the softmax loss movement; how do you change it dynamically in the prototxt or solver?

 

- I tested 2 architectures, DeepID2 (79% on LFW) and CASIA (90%)

Did you use FaceScrub to train your DeepID2 network, and did the overfitting problem disappear?

- I could not find any implementation of joint Bayesian, so I do not use it. Do you have an implementation? I use the Chi^2 distance + SVM

One of our researchers talked with the CUHK DeepID2 researchers; they told us that joint Bayesian is the best of all methods for face verification, but you should first reduce the dimensionality from 4000 to 160/320. I think the missing joint Bayesian step is one of the critical reasons your DeepID2 cannot achieve good performance. We also cannot find any source code for joint Bayesian; we are developing a C/C++ version based on OpenCV now, and we can share our code as soon as we finish.

- I use only 2D alignment

BTW, face pre-processing strongly affects recognition accuracy. You may have noticed that we align and frontalize faces before using them, and DeepID2 also slices a face picture into 25 overlapping patches. I remember you use dlib for image preprocessing; do you slice your face images into patches? This might seriously affect your recognition accuracy.

We use a private CUHK lib from DeepID2 (face detection) + OpenCV/IntraFace (face alignment) for face image preprocessing, and we are developing the face patches using IntraFace.

Bartosz Ludwiczuk

unread,
May 16, 2015, 5:51:55 AM5/16/15
to caffe...@googlegroups.com
Hi Steven,
 

90% on LFW is quite a remarkable result; using the CUHK DeepID2 library we merely achieve 80%, which matches what you measured for DeepID2. I think FaceScrub is more suitable than CASIA-WebFace for Caffe training because of its higher quality; we plan to use FaceScrub to train our HD-LBPH.

 

 

More Technical stuff:

- I use a Siamese architecture, because every paper on face verification has a "verification loss": you take two faces from the database, a same/not-same label, and produce a loss. As I did not know how to do this directly in Caffe, I use a Siamese architecture where each batch has both an identification and a verification loss (I compare images from the two twin nets). Using the verification loss gains +10-15% on LFW.

Are you using the Siamese network you posted on Feb 26th? We failed to train your Siamese architecture on CASIA-WebFace (on a Quadro K1100M GPU; we shall move to a K40 soon); both the contrastive and softmax losses stay high.

Yes, I use a similar architecture, but the contrastive loss gets a much lower weight, like 3.2e-3. Now I am trying CASIA-WebFace and it is much harder to train. I am currently running a test with 75 epochs of training (with FaceScrub I needed only 15 epochs).

BTW, you mentioned in your posts that the CASIA-WebFace authors suggested adjusting the learning rate according to the softmax loss movement; how do you change it dynamically in the prototxt or solver?

I do not change it dynamically; I only define the step sizes in the solver. Additionally, I change the loss_weight of the contrastive loss (from 3.2e-3 to 6e-2) using several *.prototxt files.
 

- I tested 2 architectures, DeepID2 (79% on LFW) and CASIA (90%)

Did you use FaceScrub to train your DeepID2 network, and did the overfitting problem disappear?

I used FaceScrub, and it was something like 85% on train and 79% on validation.
 

- I could not find any implementation of joint Bayesian so I do not use it. Do you have any implementation? I use Chi^2 distance + SVM

One of our researchers talked with the CUHK DeepID2 researchers; they told us that joint Bayesian is the best of all methods for face verification, but you should first reduce the dimensionality from 4000 to 160/320. I think the missing joint Bayesian step is one of the critical reasons your DeepID2 cannot achieve good performance. We also cannot find any source code for joint Bayesian; we are developing a C/C++ version based on OpenCV now, and we can share our code as soon as we finish.

- I use only 2D Alignment

BTW, face pre-processing strongly affects recognition accuracy. You may have noticed that we align and frontalize faces before using them, and DeepID2 also slices a face picture into 25 overlapping patches. I remember you use dlib for image preprocessing; do you slice your face images into patches? This might seriously affect your recognition accuracy.

I was using only one patch, which encloses the whole face. The DeepID2 paper claims that a single net achieves >90%, so I did not try multiple patches.

Bartosz
 

StevenL

unread,
May 16, 2015, 8:19:15 PM5/16/15
to caffe...@googlegroups.com
Thanks Bartosz for your comments

Just let you know:
we have one guy working on the classifier (joint Bayesian) and I am working on face image patch slicing. I will also modify your Siamese network architecture for more tests, as you suggested, once we get an NVIDIA K80 next week; then we will soon know how patch slicing works.

One question:
Do you use the OpenCV face detector to extract faces from the CASIA-WebFace database? We use a lib from DeepID2 (no source code) and can get a whole face patch below the forehead, but the detection rate is quite low: we successfully detect only 50% of the faces. The OpenCV detection rate is even lower (around 30%).
BR

Bartosz Ludwiczuk

unread,
May 18, 2015, 7:20:46 AM5/18/15
to caffe...@googlegroups.com
I have a problem with face detection too. Now I am thinking about running the OpenCV detector with many proposals plus a post-classification step.
I briefly analyzed some of the face detection failures and noticed that there are many non-frontal faces; this may be what causes the detection problems.

Ben Jackson

unread,
May 20, 2015, 3:34:37 PM5/20/15
to caffe...@googlegroups.com
Hi Bartosz,

Did you try this implementation of Joint Bayesian?

https://github.com/MaoXu/Joint_Bayesian/blob/master/JointBayesian.m

Ben
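For anyone who wants to see what that Matlab code computes, the scoring side of Joint Bayesian is only a few lines of numpy. The sketch below uses plain moment estimates for the identity and residual covariances (the paper refines these with EM, which is omitted here), and the names `fit_scatter`/`jb_score` are made up for illustration:

```python
import numpy as np

def fit_scatter(feats, labels):
    """Rough moment estimates of the identity (between-class) covariance
    S_mu and the residual (within-class) covariance S_eps. The Joint
    Bayesian paper refines these with EM; this is only the usual
    initialisation step."""
    feats = np.asarray(feats, dtype=np.float64)
    labels = np.asarray(labels)
    mean = feats.mean(axis=0)
    centred = feats - mean
    class_means, residuals = [], []
    for c in np.unique(labels):
        block = centred[labels == c]
        m = block.mean(axis=0)
        class_means.append(m)
        residuals.append(block - m)
    S_mu = np.cov(np.array(class_means).T, bias=True)
    res = np.vstack(residuals)
    S_eps = res.T @ res / len(res)
    return mean, S_mu, S_eps

def jb_score(x1, x2, S_mu, S_eps, ridge=1e-6):
    """log P(pair | same id) - log P(pair | different ids) under the
    Gaussian model x = mu_identity + eps; larger = more likely same."""
    d = len(x1)
    A = S_mu + S_eps + ridge * np.eye(d)
    zero = np.zeros((d, d))
    # covariance of the stacked vector [x1; x2] under each hypothesis
    sigma_same = np.block([[A, S_mu], [S_mu, A]])
    sigma_diff = np.block([[A, zero], [zero, A]])
    z = np.concatenate([x1, x2])
    def log_gauss(cov):  # log N(z; 0, cov); constants cancel in the ratio
        _, logdet = np.linalg.slogdet(cov)
        return -0.5 * (logdet + z @ np.linalg.solve(cov, z))
    return log_gauss(sigma_same) - log_gauss(sigma_diff)

# toy data: 5 identities, 20 samples each, strong identity signal
rng = np.random.default_rng(0)
centers = 3.0 * rng.normal(size=(5, 3))
feats = np.vstack([c + 0.3 * rng.normal(size=(20, 3)) for c in centers])
labels = np.repeat(np.arange(5), 20)
mean, S_mu, S_eps = fit_scatter(feats, labels)
s_same = jb_score(feats[0] - mean, feats[1] - mean, S_mu, S_eps)   # same id
s_diff = jb_score(feats[0] - mean, feats[25] - mean, S_mu, S_eps)  # ids 0 vs 1
```

On real CNN features one would first reduce the dimensionality (e.g. by PCA to the 160/320 dimensions mentioned in this thread) before fitting the covariances.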

StevenL

unread,
May 20, 2015, 11:00:34 PM5/20/15
to caffe...@googlegroups.com
We plan to reimplement this in C++; does this one work for you?

Ben Jackson

unread,
May 21, 2015, 12:19:03 AM5/21/15
to caffe...@googlegroups.com
That is exactly what I was planning to work on! Let me know when you are done; I will try to come up with the best pipeline.

BR,

Ben

StevenL

unread,
May 23, 2015, 7:23:05 AM5/23/15
to caffe...@googlegroups.com
No problem, I can share it with you once I finish it, but we are still waiting for our hardware (a Dell server and several K40Cs), so we have no chance to test our Siamese network prototxt until it arrives (it should be available next week). I shall start debugging the C code for joint Bayesian once we finish the Siamese network training in Caffe.

Have you trained a CASIA-WebFace Siamese network in Caffe yet? Which training database are you using now, and what is your accuracy on the training set? We are using exactly the Siamese network prototxt Bartosz proposed on Feb 26th; do you use the same architecture as we do?

Charly

unread,
May 25, 2015, 2:34:14 AM5/25/15
to caffe...@googlegroups.com
Hi Bartosz,
    I made some changes to datalayer.cpp; now it can generate 4 tops: one for class_label_first, one for class_label_second, and another for the 0/1 pair label.
    Then my proto looks like this, but unfortunately it doesn't work. Can you give me some advice?
ver_id_train_test.prototxt

Bartosz Ludwiczuk

unread,
May 25, 2015, 8:55:42 AM5/25/15
to Charly, caffe...@googlegroups.com
Hi,
could you clarify what is not working? Does it not converge, or something else?
Thoughts:
1. How do you prepare the data? Similar to the Siamese example in Caffe? And your images are gray-scale, right?
2. There can be a problem with "ContrastiveLoss": you should prepare the same number of positive and negative examples. If you do not, this loss may not work properly.

Bartosz


Charly

unread,
May 25, 2015, 10:25:26 PM5/25/15
to caffe...@googlegroups.com, jwm...@gmail.com
Hi Bartosz!   
        It just doesn't converge. I do random sampling on the training dataset, which approximately guarantees the same number of positive and negative examples.
        The images are gray-scale, so a pair has two channels. But even so, it just doesn't converge. I don't know why...

StevenL

unread,
May 25, 2015, 10:47:07 PM5/25/15
to caffe...@googlegroups.com
Did you try the CASIA deep-funneled version? Face detection and alignment are already done there for training, but you should use OpenCV to frontalize the faces before training.

BR

StevenL

unread,
May 25, 2015, 11:07:47 PM5/25/15
to caffe...@googlegroups.com
Which face database are you using? The non-convergence problem is related to the database you use. BTW, I read your posted architecture; it is identical to what CASIA announced.

BR

Charly

unread,
May 26, 2015, 2:23:08 AM5/26/15
to caffe...@googlegroups.com
The database I use is CASIA-WebFace, and the architecture is the same as Professor Li's. I use my own face detection and face alignment algorithms.

Bartosz Ludwiczuk

unread,
May 26, 2015, 3:30:49 AM5/26/15
to caffe...@googlegroups.com
Two more things about your proto:
1. Why do you have 15340 outputs? CASIA-WebFace has 10,575.
2. The paper says they use the ReLU activation function everywhere except Conv_52. I do not know which activation they use there, or whether they use any at all. This is not related to your main problem, only to reproducing the result.

Convergence problem:
1. First, make sure you have an equal number of positive and negative examples. Print some logs in "ContrastiveLoss".
2. For the first 50k iterations I assign loss_weight = 0 to "ContrastiveLoss"; otherwise it does not converge for me either (the loss from "ContrastiveLoss" is too big).
3. What are your loss_weights for SoftMax and Contrastive? I assign 0.5 to each SoftMax loss and the value from the paper (from 0.00032 to 0.006) to the ContrastiveLoss.

Charly

unread,
May 26, 2015, 5:17:13 AM5/26/15
to caffe...@googlegroups.com
The main dataset is CASIA-WebFace, and we added some of our own data to it; there are 15340 people in total, so the output is 15340.

Actually, they didn't say which activation they use, but from other examples we found that adding ReLU is a good choice. So in this proto I add a ReLU activation after each conv layer.

Bartosz Ludwiczuk

unread,
May 26, 2015, 7:48:15 AM5/26/15
to Charly, caffe...@googlegroups.com
So, do you have another version of CASIA-WebFace? My version has 10575 folders. Moreover, here they state that it contains 10575 people.


StevenL

unread,
May 26, 2015, 7:51:20 AM5/26/15
to caffe...@googlegroups.com
I took a look at your prototxt; I am not sure whether it is right: you didn't use pair data to feed a "sim" label into the CONTRASTIVE_LOSS layer to indicate identical or different identities (like the MNIST Siamese sample does).

Actually, I can share our method so we can discuss it. We are still waiting for hardware (K40 and a powerful server); this week we are only preparing the prototxt and training samples:

1. Training samples:

a. CASIA-WebFace (deep-funneled version), 157340 images
b. FaceScrub, 60k pictures, 500 classes

All the above pictures are pre-processed by face alignment (OpenCV) and the frontalization tool from http://www.openu.ac.il/home/hassner/projects/frontalize/

2. Network architecture:

a prototxt we are modifying from the one Bartosz proposed on Feb 26, 2015; we added Conv5 and changed the layer parameters according to the CASIA-WebFace paper

BR

Charly

unread,
May 26, 2015, 10:35:54 PM5/26/15
to caffe...@googlegroups.com, jwm...@gmail.com
Not really another version of CASIA-WebFace; we just added some of our own pics to CASIA.

Charly

unread,
May 26, 2015, 10:40:54 PM5/26/15
to caffe...@googlegroups.com
Actually, the last layer is ContrastiveLoss; we set its loss_weight to zero just for testing, because it doesn't converge.

Charly

unread,
May 26, 2015, 10:45:34 PM5/26/15
to caffe...@googlegroups.com
We found that a K40 or Tesla is not as fast as we expected for Caffe. So I just use a GTX 980M; I think it's enough for me, for this architecture.


On Tuesday, May 26, 2015 at 7:51:20 PM UTC+8, StevenL wrote:

StevenL

unread,
May 27, 2015, 5:58:51 AM5/27/15
to caffe...@googlegroups.com

HI Bartosz:

 

I am writing a prototxt for the network and solver following the CASIA-WebFace paper. I saw your mail loop with a CASIA member about some questions on the cost function:

Question 1.

The CASIA member proposed using a cost like: softmax + alpha * contrastive.

Does that mean we only need to set loss_weight in the contrastive layer and leave it empty in the softmax layers?


Question 2.

You said you change the loss_weight of the contrastive loss (from 3.2e-3 to 6e-2) using several *.prototxt files; does that mean you pipeline several solver prototxts to adjust the loss_weight gradually?


BR

Bartosz Ludwiczuk

unread,
May 27, 2015, 8:44:30 AM5/27/15
to caffe...@googlegroups.com
Hi Steven,
1. As I use a Siamese architecture, each softmax has a 0.5 weight (the two of them sum to 1).
2. Yes, I have several *.prototxt files and the same number of solvers.
The running procedure looks like this:

#!/usr/bin/env sh

TOOLS=/home/blcv/LIB/caffe_master/build/tools

$TOOLS/caffe train \
    --solver solver.prototxt --gpu 1 2> log.txt

# turn on Contrastive loss
$TOOLS/caffe train \
    --solver solver30k.prototxt --gpu 1 \
    --snapshot=nets/_iter_75001.solverstate 2> log_75k.txt

# Contrastive loss * 5
$TOOLS/caffe train \
    --solver solver60k.prototxt --gpu 1 \
    --snapshot=nets/_iter_150000.solverstate 2> log_150k.txt

# Contrastive loss * 4
$TOOLS/caffe train \
    --solver solver120k.prototxt --gpu 1 \
    --snapshot=nets/_iter_225001.solverstate 2> log_225k.txt



StevenL

unread,
May 28, 2015, 11:05:02 PM5/28/15
to caffe...@googlegroups.com
Thanks, Bartosz! I shall try it soon.

StevenL

unread,
May 29, 2015, 9:54:57 AM5/29/15
to caffe...@googlegroups.com
Hi Bartosz,

You said you are using 2D alignment to align the faces in the database; are you using dlib? We found a good toolkit for face alignment and frontalization,


and we are trying to use it to pre-process FaceScrub.

BR



On Thursday, May 14, 2015 at 4:36:10 PM UTC+8, Bartosz Ludwiczuk wrote:
Hi StevenL,
I have made some progress on face recognition. So far my best score on LFW is 90%. I was using the architecture from the CASIA paper, trained on FaceScrub (70k images). I have just started training on CASIA-WebFace; we will see what the result is.

More Technical stuff:
- I use a Siamese architecture, because every paper on face verification has a "verification loss": you take two faces from the database, a same/not-same label, and produce a loss. As I did not know how to do this directly in Caffe, I use a Siamese architecture where each batch has both an identification and a verification loss (I compare images from the two twin nets). Using the verification loss gains +10-15% on LFW.
- I tested 2 architectures: DeepID2 (79% on LFW) and CASIA (90%)
- I could not find any implementation of joint Bayesian, so I do not use it. Do you have an implementation? I use the Chi^2 distance + SVM
- I use only 2D alignment

Bartosz Ludwiczuk

unread,
May 31, 2015, 2:08:03 PM5/31/15
to StevenL, caffe...@googlegroups.com
Yes, I use dlib face alignment. That project looks really interesting; you could try it and see if there is any difference in score.
Bartosz


ZhongShan HU

unread,
May 31, 2015, 7:40:20 PM5/31/15
to caffe...@googlegroups.com, steven....@gmail.com
With deep-learning face recognition methods, it seems no paper mentions training accuracy. Why?
I know most face recognition uses deep learning for feature extraction.
I think the reason is that the training cannot converge; is that correct?

I trained a net on the CASIA dataset, and after 260000 iterations (batch size = 128) I only get an accuracy of less than 1% from the net output, although the loss value decreased.
Has anyone got similar results, or is there something wrong with my training?

StevenL

unread,
Jun 7, 2015, 7:03:50 AM6/7/15
to caffe...@googlegroups.com, steven....@gmail.com
HI Bartosz:

I am still struggling to run the frontalization tool in Matlab; it is a bit hard to use, so I will use the deep-funneled CASIA-WebFace to train our Caffe network first.

I have a couple of questions about how to prepare the training and validation sets:

1. What batch size do you choose for FaceScrub? I think this is related to how the gradient converges, so I want to choose it as small as possible (>= 1).

2. How do you generate positive and negative pairs? Do you generate an lmdb to hold them (like the example mnist_siamese_train_test.prototxt), or use two separate train.txt files holding the picture names to feed the Siamese network?

3. What train/test split do you choose? I notice some people on this forum use 80% for training and 20% for test (val).

丁磊

unread,
Jun 9, 2015, 2:01:33 AM6/9/15
to caffe...@googlegroups.com
hi, Bartosz ~

I'm new to Caffe and machine learning, and I have trouble implementing a new layer type.
Can you share your locally connected layer code with me?

Thank you

On Wednesday, February 11, 2015 at 9:40:22 PM UTC+8, Bartosz Ludwiczuk wrote:
I want to reproduce the DeepFace net architecture. As their database is not public, I use the biggest public face dataset, WLFDB. It has 0.7 million images for 6025 subjects.
My net architecture is pretty much the same as in the paper (I check the feature dimensions at every step to be sure); for now I train only the classification net. But, as I do not have 1k images per subject, I removed the last two LOCAL layers (taken from here).
As the frontalization code for DeepFace is not released, I try to use the raw faces and the aligned faces delivered with WLFDB.

My problem is that the net overfits (something like train 85%, test 15%). In the paper and presentation they claim that the net does not overfit at all (they provide a plot of the log-loss on train and test). Facebook uses dropout only in the last layer.
I could not find any information about data augmentation (like mirroring, color jitter, random crops). I understand that these techniques may not be appropriate for face classification, but I do not know why my net overfits so badly.


Question: has anybody tried to reproduce DeepFace? Or maybe tried to reproduce DeepID (I use not exactly the same architecture, but it overfits too)?
I think the problem is not the net architecture but the data. Maybe I am missing some data pre-processing before training.
How can I reduce overfitting?

Riwei Chen

unread,
Jun 10, 2015, 2:38:22 AM6/10/15
to caffe...@googlegroups.com
Hi, 

Can you share the 70k images of FaceScrub? I also grabbed the database, but I could only download 60k in total; the reason is that in our country many of those websites cannot be reached. Thank you very much

====
Chen
On Thursday, May 14, 2015 at 4:36:10 PM UTC+8, Bartosz Ludwiczuk wrote:

Bartosz Ludwiczuk

unread,
Jun 10, 2015, 4:31:24 AM6/10/15
to caffe...@googlegroups.com
Hi 丁磊,
I took the Locally Connected Layer from here.

Hi Riwei Chen,
I have exactly 79k images from FaceScrub, but it takes 13GB; that is too much space to upload anywhere.

Bartosz

莊仲翔

unread,
Jun 10, 2015, 5:04:43 AM6/10/15
to caffe...@googlegroups.com
Hi Bartosz,

I also trained the model on FaceScrub data (5k train + 2k test).
The model with only the softmax loss gets 89% accuracy on the test data, but only 79% accuracy on LFW View 2 (10 folds, each fold using euclidean distance + SVM).
The model with softmax + verification loss gets about 84% accuracy (margin: 32, alpha: 0.005), but only 70% on LFW.

I have run lots of experiments, but the verification loss does not help. I also found that the softmax loss converges easily, but the verification loss does not. What should I do?

How do you test on LFW? Just extract the features, calculate the euclidean distance for each training pair, train an SVM on those distances, and then feed the test pairs to the SVM model?

BTW, I do not do any alignment, only crop the face; I hope deep learning can learn that.

Regards,
Sean
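The LFW-style test procedure asked about here can be sketched independently of Caffe: given one distance per pair, tune a decision rule on nine folds and score the held-out tenth. In the sketch below a plain threshold stands in for the 1-D SVM on distances (function names and the toy data are illustrative):

```python
import numpy as np

def best_threshold(dists, same):
    """Pick the distance threshold that maximises accuracy on the
    training folds (a 1-D stand-in for an SVM on distances)."""
    s = dists[np.argsort(dists)]
    cands = np.concatenate([[s[0] - 1.0], (s[:-1] + s[1:]) / 2.0, [s[-1] + 1.0]])
    accs = [np.mean((dists < t) == same) for t in cands]
    return cands[int(np.argmax(accs))]

def kfold_verification(dists, same, k=10):
    """LFW View-2 style protocol: k folds, threshold tuned on k-1 folds,
    accuracy reported on the held-out fold, then averaged."""
    dists = np.asarray(dists, dtype=float)
    same = np.asarray(same, dtype=bool)
    idx = np.arange(len(dists))
    accs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        t = best_threshold(dists[train], same[train])
        accs.append(np.mean((dists[fold] < t) == same[fold]))
    return float(np.mean(accs))

# toy distances: same-identity pairs small, different-identity pairs large
rng = np.random.default_rng(1)
dists = np.concatenate([rng.uniform(0.0, 0.4, 50), rng.uniform(0.6, 1.0, 50)])
same = np.array([True] * 50 + [False] * 50)
perm = rng.permutation(100)
acc = kfold_verification(dists[perm], same[perm])
```

Replacing `best_threshold` with an SVM trained on the per-pair distance (or on a per-dimension distance vector) reproduces the pipeline described in this thread.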

On Thursday, May 14, 2015 at 4:36:10 PM UTC+8, Bartosz Ludwiczuk wrote:

StevenL

unread,
Jun 10, 2015, 5:29:08 AM6/10/15
to caffe...@googlegroups.com
79% on LFW is a good result without alignment. You might try an alignment tool (dlib) on your database; it can significantly improve your accuracy (we proved this with HD-LBPH). You can also use the deep-funneled version of CASIA-WebFace, which contains all the training samples already aligned.

My plan is to use CASIA-WebFace to train a feature extractor that produces 320 features per face, and to use joint Bayesian to score the distance (as recommended by DeepID2); however, I am still preparing the training set.

BR

莊仲翔

unread,
Jun 10, 2015, 5:47:19 AM6/10/15
to caffe...@googlegroups.com
Hi StevenL,

Thanks for your suggestions. Using dlib is my next step.

I am just curious why the verification loss does not help in my case. After all, the 79% accuracy comes from the model without the verification loss layer.

Yes, CASIA-WebFace looks like a good database; I will try to apply for the dataset.

Regards,
Sean

On Wednesday, June 10, 2015 at 5:29:08 PM UTC+8, StevenL wrote:

StevenL

unread,
Jun 10, 2015, 6:02:06 AM6/10/15
to caffe...@googlegroups.com
Hi, I can run a test soon. I shall use CASIA-WebFace to train my network, so I will soon know whether my contrastive loss converges more slowly than the softmax.

Bartosz Ludwiczuk

unread,
Jun 10, 2015, 9:54:28 AM6/10/15
to caffe...@googlegroups.com
Sean,
do you use only the verification loss, or joint identification and verification?
If only the first, it is normal that the verification loss converges slowly; that is why it is recommended to use the identification loss as well.
Another thing: 5k images for training is pretty low; try CASIA-WebFace.
And, as Steven wrote, alignment can boost your result significantly.

Regards,
Bartosz

莊仲翔

unread,
Jun 10, 2015, 10:26:29 AM6/10/15
to caffe...@googlegroups.com
Hi Bartosz,
   I joined the identification (classification) and verification losses, but the performance on LFW is no better than with the identification loss alone.
   I am impressed that you got +10-15% on LFW by joining the identification and verification losses and achieved 90% accuracy, so maybe I am doing something wrong when training the model.
   The model that gets 90% on LFW is trained on the FaceScrub dataset, right?
   I'm doing alignment and applying for the CASIA-WebFace dataset now. :)
Regards,
Sean

On Wednesday, June 10, 2015 at 9:54:28 PM UTC+8, Bartosz Ludwiczuk wrote:

Bartosz Ludwiczuk

unread,
Jun 10, 2015, 11:14:03 AM6/10/15
to caffe...@googlegroups.com
Hi Sean,
yes, I get +10% using both losses, but I was using the architecture from the CASIA paper.
And yes, the 90% was learned on the 79k images from FaceScrub.

One extra thought: with CASIA-WebFace, the joint identification and verification is not as important; I get up to 2% improvement from combining the losses compared to identification alone.

Regards,
Bartosz

莊仲翔

unread,
Jun 10, 2015, 11:22:43 AM6/10/15
to caffe...@googlegroups.com
Hi Bartosz,
     Copy that, thanks for sharing.
     Oh, I made a mistake: it's 50k, not 5k... (50k training and 20k testing).
     I will keep trying. Thanks a lot.

Regards,
Sean

StevenL

unread,
Jun 11, 2015, 9:08:30 AM6/11/15
to caffe...@googlegroups.com
Hi Sean, how do you set the weight alpha in Caffe to balance the contrastive and softmax costs?

莊仲翔

unread,
Jun 11, 2015, 10:33:05 AM6/11/15
to caffe...@googlegroups.com
Hi StevenL,

It is the same as Bartosz did: I change the contrastive loss weight to about 0.005, and set both softmax layers' loss_weight to 0.5.

More specifically, I first set the contrastive loss weight to 0 and train the model until the test accuracy is nearly 80%; then I set the contrastive loss weight to 0.005.

It's a lazy way, but the softmax loss still decreases. The contrastive loss decreases dramatically at the beginning, and then oscillates.

BR,
Sean

On Thursday, June 11, 2015 at 9:08:30 PM UTC+8, StevenL wrote:

StevenL

unread,
Jun 12, 2015, 11:21:17 PM6/12/15
to caffe...@googlegroups.com
Hi Sean,

My Gmail box cannot receive or send messages right now. I saw you have some mail in the loop but I cannot open it; please reply here.

BTW, how do you produce the "sim" label for different classes in a Siamese setup, like the Siamese example does?

The Caffe team uses leveldb to store the negative and positive pair data, but I use train_1/train_2.txt files holding the file name lists; it seems hard to feed the "sim" label that way.

莊仲翔

unread,
Jun 13, 2015, 1:32:16 AM6/13/15
to caffe...@googlegroups.com
Hi Steven,

Thanks. I will check the face alignment code again.

As for your question, I use HDF5 as my data type.
Leveldb or lmdb can only hold a single label, but HDF5 can take multiple labels.

So I produce 10000 positive pairs and 10000 negative pairs.
Each pair has its image data and a label vector with 3 elements (data1 label, data2 label, sim).

Then I use a 'Slice' layer to cut the data and labels apart to feed the Siamese model.

BR,
Sean
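The HDF5 layout Sean describes can be sketched with h5py (assuming h5py is available; the shapes and identity values below are toy numbers). Caffe's HDF5Data layer reads the `data` and `label` datasets, and a Slice layer then splits the two image channels and the three label components:

```python
import os
import tempfile

import h5py
import numpy as np

# toy pair data: 4 pairs of gray-scale 32x32 crops, stacked as 2 channels
N, H, W = 4, 32, 32
rng = np.random.default_rng(0)
pairs = rng.random((N, 2, H, W)).astype(np.float32)  # (pair, channel, h, w)
labels = np.array([[3, 3, 1],   # [id of image 1, id of image 2, sim]
                   [3, 7, 0],
                   [5, 5, 1],
                   [7, 2, 0]], dtype=np.float32)

path = os.path.join(tempfile.mkdtemp(), 'pairs.h5')
with h5py.File(path, 'w') as f:
    f['data'] = pairs
    f['label'] = labels  # 3-element label vector per pair

# the source txt handed to the HDF5Data layer would simply list `path`
with h5py.File(path, 'r') as f:
    data_back = f['data'][:]
    label_back = f['label'][:]
```

One Slice layer along the channel axis yields the two twin-net inputs; another along the label axis yields the two class labels and the `sim` input of the contrastive loss.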



On Saturday, June 13, 2015 at 11:21:17 AM UTC+8, StevenL wrote:

Steven L

unread,
Jun 13, 2015, 2:09:34 AM6/13/15
to caffe...@googlegroups.com
Hi Bartosz,

I am still a little confused about how to generate a training set for a Siamese network. I created the following four files:
training_1.txt, which trains conv_1,
training_2.txt, which trains conv1_p,
val_1.txt, which tests conv_1,
val_2.txt, which tests conv1_p.
Each training txt contains an image path and a label, such as "/PATH/1000.jpg 12334".

However, comparing with what the Siamese example implements, I think I need to create an extra file indicating whether each pair is genuine or impostor ("1" for genuine, "0" for impostor); this file should be turned into the sim label and used in the contrastive loss during training/testing.

What do you think? Or what approach do you use to solve this?

BR

Xmiler

unread,
Jun 16, 2015, 1:44:10 PM6/16/15
to caffe...@googlegroups.com
Hi all.

Has anyone tried the triplet loss from FaceNet (http://arxiv.org/pdf/1503.03832.pdf)? I didn't find any mention of this state-of-the-art approach in the current discussion.
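For reference, the triplet loss from that paper is only a few lines on top of L2-normalised embeddings (numpy sketch with toy batch values; the real method also needs the hard-triplet mining described in the paper):

```python
import numpy as np

def l2_normalize(x):
    """Project each embedding in the batch onto the unit sphere."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """FaceNet-style triplet loss: mean over the batch of
    max(0, ||a - p||^2 - ||a - n||^2 + margin)."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))

# toy batch of 2-D embeddings
a = l2_normalize(np.array([[1.0, 0.0], [0.0, 1.0]]))
p = l2_normalize(np.array([[0.9, 0.1], [0.1, 0.9]]))    # close to anchors
n = l2_normalize(np.array([[-1.0, 0.0], [0.0, -1.0]]))  # far from anchors
loss_easy = triplet_loss(a, p, n)  # negatives already beyond the margin -> 0
loss_hard = triplet_loss(a, n, p)  # roles swapped -> large positive loss
```

Unlike the contrastive loss discussed above, the triplet loss only constrains relative distances, which is one reason the paper can train on very large identity sets without the 50/50 pair balancing.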

Ho

unread,
Jun 18, 2015, 4:43:16 AM6/18/15
to caffe...@googlegroups.com
Hi all,
    Here is an update on my progress. First, an updated CASIA-WebFace dataset appeared on the CASIA site a few weeks ago. The images in this new dataset, including the LFW ones, have already been geometrically normalised. They were used directly in their CNN, so you don't have to do any pre-processing any more unless you want to do pose correction (frontalisation). 
    Second, I tried to reproduce their network, but only the softmax version, testing on LFW View 2 using just the cosine angle as the distance measurement (i.e. no SVM or Bayesian network, similar to Algorithm A in their paper). The mean verification rate on the test set is 79.96%. The training recognition error is around 13.8% and the validation recognition error is around 23.03% (both rank-1 errors). Compared with the 96.13% they achieved, mine has a 16% gap still to close. 
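For anyone reproducing this Algorithm A-style test, the cosine scoring step amounts to the following (a minimal plain-Python sketch; the 0.5 threshold is a placeholder you would tune on a held-out split, not a value from the paper):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def verify(feat1, feat2, threshold=0.5):
    # Accept the pair as "same person" if the similarity clears the
    # threshold; no SVM or Bayesian model involved.
    return cosine_similarity(feat1, feat2) >= threshold
```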

Ho 
   

Riwei Chen

unread,
Jun 18, 2015, 9:50:50 AM6/18/15
to caffe...@googlegroups.com
Hi, Ho

    I also want to reproduce the network, but on my machine it does not converge: the loss stays stuck at a high value (about 8.5 in my network). Could you share the network you trained?

On Thursday, June 18, 2015 at 4:43:16 PM UTC+8, Ho wrote:

Ho

unread,
Jun 18, 2015, 10:11:40 AM6/18/15
to caffe...@googlegroups.com
Hi Riwei,
    I have a fair amount of experience to share. It would be great if you could show me your solver and train_val files. My guess is that you need to set the initial bias...

Riwei Chen

unread,
Jun 18, 2015, 10:28:13 AM6/18/15
to caffe...@googlegroups.com
Hi Ho,

Many thanks for your help. The attached files are what I use to train. In our implementation I use only 5,000 identities, each with 22 images; I align all the faces and resize the images to 128x128x3.

Can you point out the most likely problem?

Thank you~
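One quick sanity check worth noting for the stuck-loss problem: with 5,000 identities, a softmax loss sitting around 8.5 is exactly chance level, because the cross-entropy of a uniform prediction over N classes is ln(N). A loss pinned there means the net has not started learning at all.

```python
import math

# Cross-entropy of a uniform prediction over N classes is ln(N);
# a softmax loss stuck near this value indicates chance-level output.
n_identities = 5000
chance_loss = math.log(n_identities)
print(round(chance_loss, 2))  # 8.52
```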

====
Chen 

On Thursday, June 18, 2015 at 10:11:40 PM UTC+8, Ho wrote:
solver.prototxt
train.sh
train_val.prototxt

Steven L

unread,
Jun 18, 2015, 11:26:30 AM6/18/15
to Riwei Chen, caffe...@googlegroups.com
I found a problem: Dropout is connected only to the fully connected layer, not to the max pool.

--
You received this message because you are subscribed to a topic in the Google Groups "Caffe Users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/caffe-users/ACIhR132F90/unsubscribe.
To unsubscribe from this group and all its topics, send an email to caffe-users...@googlegroups.com.
To post to this group, send email to caffe...@googlegroups.com.

Ho

unread,
Jun 18, 2015, 12:43:06 PM6/18/15
to caffe...@googlegroups.com
Hi,
     I have just read your train_val files. Why do you have LRN and ReLU in every layer? In their paper, LRN is not used, and ReLU is not applied to the FC layers or the last convolutional layer (conv51 in your file). Second, the images you use are too few... Perhaps you need to reduce the number of layers to make it work, or use CASIA-WebFace....

Ho

Tharu

unread,
Jun 19, 2015, 8:38:59 AM6/19/15
to caffe...@googlegroups.com
Hi all,

I am struggling to reproduce the result on the CASIA face images (the normalized ones provided by the authors). I am also facing a convergence problem.
I have been training two nets; one is based on ImageNet. I customized the net given in the Caffe library for my purpose. This net converges (though I am not getting good performance on LFW), but the net from the paper of the CASIA WebFace team does not converge for me, and it is slow too.

Could anyone please share some techniques/experience for reproducing the result (>90% performance on LFW)?

Best,

Tharu

unread,
Jun 22, 2015, 10:54:03 AM6/22/15
to caffe...@googlegroups.com
Hi all,

I have been training the CNN for ImageNet (given in the examples folder) for face recognition, using the CASIA database to train the model.

I randomly split CASIA into two sets: train (900k examples) and val (around 90k examples). The training loss is decreasing and the performance on the cross-validation set is increasing. Here is one of the snapshots:

I0622 16:34:36.595523 34088 solver.cpp:361] Snapshotting to DeepFace/caffe_DeepFace_train_iter_350000.caffemodel
I0622 16:34:43.505374 34088 solver.cpp:369] Snapshotting solver state to DeepFace/caffe_DeepFace_train_iter_350000.solverstate
I0622 16:34:49.888942 34088 solver.cpp:294] Iteration 350000, Testing net (#0)
I0622 16:36:36.112459 34088 solver.cpp:343]     Test net output #0: accuracy = 0.959861

I0622 16:36:36.833694 34088 solver.cpp:214] Iteration 350000, loss = 0.072619

But when I use these model parameters to compute features on LFW and compute a similarity score (unsupervised), I end up with only 68%. Do you think that is normal?

For the score, I first applied L1 normalization to the features, followed by L2 normalization, and computed the Euclidean distance between the pairs.
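A minimal sketch of that scoring recipe in plain Python (the feature vectors below are made-up placeholders, not real LFW features):

```python
import math

def l1_then_l2_normalize(v):
    """L1-normalize the feature vector, then L2-normalize the result."""
    s = sum(abs(x) for x in v)
    v = [x / s for x in v]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# hypothetical features extracted from the two images of an LFW pair
f1 = [0.3, 1.2, 0.0, 0.5]
f2 = [0.2, 1.0, 0.1, 0.6]
score = euclidean(l1_then_l2_normalize(f1), l1_then_l2_normalize(f2))
```

A lower score then means a more similar pair.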

Steven L

unread,
Jun 22, 2015, 9:26:24 PM6/22/15
to Tharu, caffe...@googlegroups.com
Hi Tharu, 

Which distance function are you using? We are trying cosine distance.

BR


Steven L

unread,
Jun 23, 2015, 5:10:03 AM6/23/15
to Tharu, caffe...@googlegroups.com
Hi Tharu, we are using softmax only, with the CASIA-WebFace architecture. We are now at 70% after 35K iterations; the training database is CASIA-WebFace, and we will use cosine distance to test on LFW after training completes (about 300K iterations).

On Mon, Jun 22, 2015 at 10:54 PM, Tharu <bhattar...@gmail.com> wrote:
--

Tharu

unread,
Jun 23, 2015, 5:19:07 AM6/23/15
to caffe...@googlegroups.com
Hi Steven,

Thank you for your reply.

I rechecked my evaluation code; there was a bug in it.

After fixing it, I am getting 87% unsupervised (L1 followed by L2 normalization and Euclidean distance) and 91% with a chi2 kernel + SVM (constrained setting).
 

I also use only softmax.

Best,
Tharu

Riwei Chen

unread,
Jun 24, 2015, 10:18:40 PM6/24/15
to caffe...@googlegroups.com
Hi Tharu,

I am also using the ImageNet-like network architecture, but I can only achieve 65% accuracy on the test data (my batch size is 32, trained for 240K iterations). I see that you trained for 350K iterations and achieved 95% accuracy on the test data. What batch size did you use? And does your network architecture change only fc8's num_output from 1000 to 10575?

Best regards,
Chen

On Monday, June 22, 2015 at 10:54:03 PM UTC+8, Tharu wrote: