Recognition accuracy


Oriol Julià Carrillo

Jan 15, 2016, 1:15:28 PM
to CMU-OpenFace
Hi,

1) I've been recently training for face recognition and I am concerned about the scalability. While the algorithm did very well on recognition with up to 20 people (and about 90 images per person), I have been doing the same experiment with 50 people and the accuracy decreased substantially (approximately from around 0.92 to 0.83). What are your thoughts about this? Are there any ways to overcome the problem? I guess it will get much worse when I add more images (a linear decrease in accuracy would already make it useless for large-scale use, but I can easily imagine the decrease being quadratic or exponential).

2) The accuracies reported on the LFW benchmark are only for image comparison, aren't they? (That would explain the different performance in recognition.)

3) While it's reasonable that the more people there are to recognize, the worse the algorithm performs, I wonder about the accuracy of Facebook's tagging suggestions or even the face recognition used in security applications, which I imagine remain accurate even when working with large numbers of people.

Oriol

Brandon Amos

Jan 15, 2016, 4:40:51 PM
to Oriol Julià Carrillo, CMU-OpenFace
Hi Oriol,

> 1) I've been recently training for face recognition and I am concerned
> about the scalability. While the algorithm did very well on recognition
> with up to 20 people (and about 90 images per person), I have been doing
> the same experiment with 50 people and the accuracy decreased substantially
> (approximately from around 0.92 to 0.83). What are your thoughts about
> this? Are there any ways to overcome the problem? I guess it will get much
> worse when I add more images (a linear decrease on the accuracy would
> already make it useless for large-scale use but I can perfectly imagine
> this decrease to be exponential or quadratic).

Interesting! Are you using grid-search over SVM hyper-parameters
from my classification demo or another technique?
I wonder if this decrease is from the embedding space or classifier.

What is the accuracy on your larger dataset with exact nearest neighbor?
If it's high, then it indicates the embedding space can still
handle 50 people and that you need to find a better classifier.
If it's low, then maybe the embedding space can be improved without
losing generality by fine-tuning the neural network with your dataset.

> 2) The accuracies reported, using the LFW benchmark, they are only for
> image comparison, aren't they? (that would explain different performance in
> image recognition)

Correct, the LFW just measures performance from comparing images, not
classifying people. It's the standard for evaluating face recognition
techniques, but it's becoming dated and I expect some newer
evaluation benchmarks to become popular soon.
Both DeepFace and FaceNet are evaluated on the YouTube Faces
database/benchmark (http://www.cs.tau.ac.il/~wolf/ytfaces/),
which from my understanding is very similar to the LFW benchmark,
but more difficult because it compares video frames instead
of images.

-Brandon.

Oriol Julià Carrillo

Jan 15, 2016, 7:45:00 PM
to CMU-OpenFace, oju...@gmail.com, ba...@cs.cmu.edu

Hi Brandon,

> Hi Oriol,
>
> > 1) I've been recently training for face recognition and I am concerned
> > about the scalability. While the algorithm did very well on recognition
> > with up to 20 people (and about 90 images per person), I have been doing
> > the same experiment with 50 people and the accuracy decreased substantially
> > (approximately from around 0.92 to 0.83). What are your thoughts about
> > this? Are there any ways to overcome the problem? I guess it will get much
> > worse when I add more images (a linear decrease on the accuracy would
> > already make it useless for large-scale use but I can perfectly imagine
> > this decrease to be exponential or quadratic).
>
> Interesting! Are you using grid-search over SVM hyper-parameters
> from my classification demo or another technique?
> I wonder if this decrease is from the embedding space or classifier.
>
> What is the accuracy on your larger dataset with exact nearest neighbor?
> If it's high, then it indicates the embedding space can still
> handle 50 people and that you need to find a better classifier.
> If it's low, then maybe the embedding space can be improved without
> losing generality by fine-tuning the neural network with your dataset.


Yes, I am using grid-search over SVM parameters as described in the recognition demo: http://cmusatyalab.github.io/openface/demo-3-classifier/

Let me try some other classification method. Do you know any good library for nearest neighbors? Which distances would you recommend I try?
However, I am not sure nearest neighbors will perform well on this data due to its high dimensionality (almost a hundred features). Any suggestions for other methods to try?


Oriol 

Brandon Amos

Jan 16, 2016, 2:34:18 PM
to Oriol Julià Carrillo, CMU-OpenFace
Hi Oriol,

> Let me try some other classification method. Do you know any good library
> for nearest neighbors? Which distances would you recommend me to try?
> However, I am not sure nearest neighbors will perform well on this data due
> to its high dimensionality (almost a hundred features).

I recommend scikit-learn for common classification tasks in Python
since the classifiers follow a similar interface that makes
it easy to evaluate many different classifiers and variants at once.
The classification demo uses scikit-learn's SVM implementation.

They also have a nearest neighbor classifier:

http://scikit-learn.org/stable/modules/neighbors.html
http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html#sklearn.neighbors.KNeighborsClassifier

Use exact nearest neighbor instead of approximate nearest neighbor,
even though the performance probably won't be very good, so we can use
the results as a potential upper bound for other classifiers.
Try changing the default k=5 to other values like k=10 just to
see how using more neighbors helps.

You could be right about the data being too high-dimensional,
but I think a nearest-neighbor approach might work well because
the embedding space is defined to minimize Euclidean distance
between faces from the same person.
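As a rough, untested sketch of what I mean (it assumes you've saved the embeddings and labels somewhere, here hypothetically as embeddings.npy and labels.npy with one 128-D embedding per row and one matching person label per row):

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
# (in older scikit-learn versions cross_val_score lives in sklearn.cross_validation)

# Hypothetical files holding the OpenFace embeddings and person labels.
X = np.load("embeddings.npy")
y = np.load("labels.npy")

# Exact (brute-force) nearest neighbor in the 128-D embedding space.
for k in (1, 5, 10):
    clf = KNeighborsClassifier(n_neighbors=k, algorithm="brute", metric="euclidean")
    scores = cross_val_score(clf, X, y, cv=5)
    print("k=%d: mean accuracy %.3f" % (k, scores.mean()))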

> Any suggestion on other methods to try?

A small MLP might also work well. I also recommend scikit-learn:

http://scikit-learn.org/dev/modules/neural_networks_supervised.html
http://scikit-learn.org/dev/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier

You might have to experiment a little with the hidden layer hyper-parameters.
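A rough, untested sketch (again assuming the hypothetical embeddings.npy/labels.npy files from the nearest-neighbor sketch above; the hidden layer size and alpha are just starting guesses):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X = np.load("embeddings.npy")
y = np.load("labels.npy")
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# One small hidden layer; hidden_layer_sizes and alpha are the
# hyper-parameters worth experimenting with.
mlp = MLPClassifier(hidden_layer_sizes=(64,), alpha=1e-3, max_iter=1000, random_state=0)
mlp.fit(X_train, y_train)
print("MLP test accuracy: %.3f" % mlp.score(X_test, y_test))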

-Brandon.

kaishi Jeng

Jan 16, 2016, 3:53:51 PM
to CMU-OpenFace, oju...@gmail.com, ba...@cs.cmu.edu
Brandon

I don't know how SVM or nearest neighbors works for a face recognition system, which needs to do the following 2 things:
1) Decide whether an image belongs to someone already in the database or not
2) If it does, which person?

How can an SVM or nearest neighbors classifier do item 1? It seems to me an SVM or NN classifier always assumes the image to be recognized belongs to one of the people in the training set, which is not the case in a real system.

Brandon Amos

Jan 16, 2016, 4:14:03 PM
to kaishi Jeng, CMU-OpenFace, oju...@gmail.com
Hi Kaishi,

> I don't know how SVM or nearest neighbors works for face recognition system
> which needs to do the following 2 things:
> 1) Decide an image belonging to ones already in the database or not
> 2) If it is, then which one?
>
> How can a SVM or nearest neighbors classifier do Item 1? It seems to me SVM
> or NN classifier always assumes an image to be recognized belonging to one
> of persons in the train which is not the case in a real system.

Face recognition systems don't need to do #1 and #2 separately.
For example, you can use a probabilistic classifier like the
probabilistic SVM variant in the classification demo that
returns a probability distribution over a set of known people
rather than a prediction of a single person.
Then if the highest probability is below some threshold, you can say
the person is unknown and otherwise use the class with highest
probability as the prediction.
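A minimal sketch of that idea, not the demo code itself (it assumes embeddings and labels for the known people are available, hypothetically as embeddings.npy/labels.npy, and the 0.5 threshold is an arbitrary value you'd tune on held-out data):

import numpy as np
from sklearn.svm import SVC

X = np.load("embeddings.npy")  # hypothetical: one 128-D embedding per known image
y = np.load("labels.npy")      # hypothetical: matching person labels

# Probabilistic SVM over the known people's embeddings.
clf = SVC(C=1.0, kernel="linear", probability=True)
clf.fit(X, y)

def identify(rep, threshold=0.5):
    # rep is a single 128-D embedding; return a person label or "unknown".
    probs = clf.predict_proba(rep.reshape(1, -1))[0]
    best = np.argmax(probs)
    if probs[best] < threshold:
        return "unknown"
    return clf.classes_[best]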

In Oriol's post, I'm assuming they are evaluating their system and
reporting accuracies over a held-out set of known faces.
This ignores #1 in your post, so non-probabilistic classifiers like
SVM or KNN can be used.

-Brandon.

kaishi Jeng

Jan 16, 2016, 7:12:27 PM
to Brandon Amos, CMU-OpenFace, oju...@gmail.com
Brandon

  Thanks for the info. I will try changing the SVM in your web demo to the probabilistic SVM used in the classifier app to see how it works.

Oriol Julià Carrillo

Jan 17, 2016, 1:05:21 AM
to CMU-OpenFace, oju...@gmail.com, ba...@cs.cmu.edu
Hi Brandon,

On Saturday, January 16, 2016, at 11:34:18 UTC-8, Brandon Amos wrote:
> Hi Oriol,
>
> > Let me try some other classification method. Do you know any good library
> > for nearest neighbors? Which distances would you recommend me to try?
> > However, I am not sure nearest neighbors will perform well on this data due
> > to its high dimensionality (almost a hundred features).
>
> I recommend scikit-learn for common classification tasks in Python
> since the classifiers follow a similar interface that makes
> it easy to evaluate many different classifiers and variants at once.
> The classification demo uses scikit-learn's SVM implementation.
>
> They also have a nearest neighbor classifier:
>
> http://scikit-learn.org/stable/modules/neighbors.html
> http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html#sklearn.neighbors.KNeighborsClassifier
>
> Use exact nearest neighbor instead of approximate nearest neighbor
> even though the performance probably won't be very good so we can use
> the results as a potential upper bound for other classifiers.
> Try changing the default k=5 to other values like k=10 just to
> see how using more neighbors helps.
>
> You could be right about the data being too high-dimensional,
> but I think a nearest-neighbor approach might work well because
> the embedding space is defined to minimize Euclidean distance
> between faces from the same person.

I was first using grid-search over SVM parameters. The accuracy I was getting with the bigger dataset was 84% (I am getting it from GridSearchCV, as printed by classifier.py via the `best_score_` attribute. I assume the method works properly, although I didn't check it).
I have tried more classification methods without success:
- k-NN with grid-search over k and over some distance parameters (including the case k=1): 84%
  Note that the chosen distance was not the Euclidean but the Minkowski distance.
- classification trees: 63%

It seems to me that the decrease in accuracy comes from the embedding space.
What do you mean by improving the embedding space by fine-tuning the neural network with my dataset?
The only related page I've found is http://cmusatyalab.github.io/openface/training-new-models/ but the hardware requirements seem like an impediment to attempting any new training.
 
"the embedding space is defined to minimize Euclidean distance between faces from the same person" 
I am not knowledgeable in face recognition and I don't know whether that's a standard that works better than others, but it seems to me that, due to non-robustness, using euclidean distances in such a high dimensional space (features \in \R^{128} if I am not mistaken) is doomed to fail. 
> > Any suggestion on other methods to try?
>
> A small MLP might also work well. I also recommend scikit-learn:
>
> http://scikit-learn.org/dev/modules/neural_networks_supervised.html
> http://scikit-learn.org/dev/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier
>
> You might have to experiment a little with the hidden layer hyper-parameters.

I haven't tried an MLP, but I'd bet it wouldn't do much better.




Oriol 

Brandon Amos

Jan 17, 2016, 1:52:31 AM
to Oriol Julià Carrillo, CMU-OpenFace
Hi Oriol,

Also, are you doing this with the latest nn4.small2.v1 model?

> - k-NN with grid-search over k and over some distance parameters (including
> the case where k=1): 84
> Note that the chosen distance was not the euclidean, but minkowski distance
> - classification trees: 63
>
> It seems to me that the decrease in accuracy come from the embedding space.

Interesting, thanks for reporting back.
So since this is an issue with the embedding space, it could be
interesting for you to look at the cases where it's failing:
do the people in the images look similar to you?

> What do you mean by improving the embedding space by fine-tuning the neural
> network with my dataset?
> The only related page I've found
> is http://cmusatyalab.github.io/openface/training-new-models/ but the
> hardware requirements seem like an impediment to attempt any new training.

The following might take a while to implement correctly and
I haven't tried it.
However, if your use case is to create a highly accurate model for a set of
people that won't change very often, fine-tuning might work well by
adapting the embeddings to your dataset.

The hardware requirements for training from randomly initialized
parameters are pretty high.
However, I recently added support for CPU execution by
non-intuitively passing `-cuda` and `-cudnn` to `training/main.lua`.
If you use `nn4.small2.v1` as an initial model (with `-retrain`) and
then optimize it with your dataset, it will probably fit to your
dataset in a few hours or less on a CPU.

The approach would look something like:
1. Split your data into static train/test directories.
2. Align the images as noted in the training-new-models page.
3. Run training/main.lua with nn4.small2.v1 as the initial model,
and maybe also decrease `-epochSize` so you can save the
intermediate state more often.
4. Run a classification evaluation on your held-out test set with
an intermediate model state by training a classifier on the
training set (rough sketch below).
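For step 4, a rough sketch of the evaluation, assuming you've already generated embeddings for both splits with an intermediate model (for example with the batch-represent step used by the classifier demo) and that each split directory has a reps.csv with one embedding per row and a labels.csv whose first column is the numeric label; adjust the loading to match your actual files:

import numpy as np
from sklearn.svm import SVC

def load_split(d):
    # Assumed layout; change this to match however you generated the files.
    X = np.loadtxt("%s/reps.csv" % d, delimiter=",")
    y = np.loadtxt("%s/labels.csv" % d, delimiter=",", usecols=(0,))
    return X, y

X_train, y_train = load_split("train-embeddings")  # hypothetical directories
X_test, y_test = load_split("test-embeddings")

clf = SVC(C=1.0, kernel="linear")
clf.fit(X_train, y_train)
print("Held-out accuracy: %.3f" % clf.score(X_test, y_test))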

Often in fine-tuning, some neural network layers are locked
so that their parameters aren't updated, which helps prevent over-fitting.
The current code doesn't lock the parameters of the lower layers,
so there's a pretty high chance of over-fitting to the training data.

> I am not knowledgeable in face recognition and I don't know whether that's
> a standard that works better than others, but it seems to me that, due to
> non-robustness, using euclidean distances in such a high dimensional space
> (features \in \R^{128} if I am not mistaken) is doomed to fail.

The input space is 96x96x3 = 27,648 dimensions, so from this perspective,
it's nice that we can use a neural network to reduce the
representation of a face to 128 dimensions on a unit hyper-sphere
where similar faces should be clustered together.
This idea is from the FaceNet paper: http://arxiv.org/abs/1503.03832
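To make the geometry concrete: since the embeddings lie on the unit hyper-sphere, the squared Euclidean distance between any two of them is between 0 and 4, and it should be small when the two faces belong to the same person. A tiny sketch with placeholder vectors standing in for real embeddings:

import numpy as np

# Placeholder unit vectors standing in for two 128-D face embeddings.
rng = np.random.RandomState(0)
rep1 = rng.randn(128); rep1 /= np.linalg.norm(rep1)
rep2 = rng.randn(128); rep2 /= np.linalg.norm(rep2)

d = rep1 - rep2
print("Squared Euclidean distance: %.3f" % np.dot(d, d))  # in [0, 4]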

The loss function we use is simple to implement and change if you're
interested in exploring other embedding spaces.
See Alfredo Canziani's implementation at
https://github.com/Atcold/torch-TripletEmbedding/blob/master/TripletEmbedding.lua
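For intuition, the core of that loss is small. A rough numpy version (not the Torch code linked above; a, p, and n are anchor/positive/negative embeddings and alpha is the margin, 0.2 in the FaceNet paper):

import numpy as np

def triplet_loss(a, p, n, alpha=0.2):
    # max(0, ||a - p||^2 - ||a - n||^2 + alpha): pull the anchor towards the
    # positive (same person) and push it away from the negative (different person).
    dp = a - p
    dn = a - n
    return max(0.0, np.dot(dp, dp) - np.dot(dn, dn) + alpha)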

-Brandon.

Oriol Julià Carrillo

Jan 20, 2016, 11:27:33 PM
to CMU-OpenFace, oju...@gmail.com, ba...@cs.cmu.edu
I'll try to find some time in the future to work on the fine-tuning and report some results.

As for avoiding overfitting, do you mean that I should manually lock some of the neural network layers? Or is that already implemented in the OpenFace code?

Oriol

On Saturday, January 16, 2016, at 22:52:31 UTC-8, Brandon Amos wrote:

Brandon Amos

Jan 20, 2016, 11:42:54 PM
to Oriol Julià Carrillo, CMU-OpenFace
Hi Oriol,

> As to avoid overfitting, do you mean that I should manually lock some of
> the neural network layers? Or it's already implemented in the Openface
> algorithms?

It's not implemented in OpenFace and would involve modifying the
parameter optimization code to only optimize the upper layers:
https://github.com/cmusatyalab/openface/blob/0.2.0/training/OpenFaceOptim.lua

I think this can be done by changing the self.model:apply in __init()
to initialize self.modulesToOptState with only the upper layers instead
of all of them.

-Brandon.

Oriol Julià Carrillo

Jan 21, 2016, 4:30:04 AM
to CMU-OpenFace, oju...@gmail.com, ba...@cs.cmu.edu
OK, I'll let you know if I have time to do any experiments.

Thanks,

Oriol

On Wednesday, January 20, 2016, at 20:42:54 UTC-8, Brandon Amos wrote:

Somebody Else

Mar 25, 2016, 12:38:19 PM
to CMU-OpenFace, oju...@gmail.com, ba...@cs.cmu.edu
Is it correct to run:

./main.lua -testing -cudnn -cuda -retrain ../models/openface/nn4.small2.v1.t7 -modelDef ../models/openface/nn4.small2.def.lua -data /data-aligned

to retrain the `nn4.small2.v1` model?

Will this improve the results on my data set? With just 2 people I get very accurate results (95% and higher), but already with 10 people I often get wrong results or, in the best case, low confidence (max. 50%). Why is this decreasing so much so quickly? All my people are roughly the same gender and age.

Do you think retraining will help to improve my results?

Do I need to manually adjust the file https://github.com/cmusatyalab/openface/blob/0.2.0/training/OpenFaceOptim.lua before retraining?

Or would it make sense to just make a new DNN model from scratch? I have a bit over 1 million faces of 60,000 different people.

Brandon Amos

Mar 25, 2016, 1:54:55 PM
to Somebody Else, CMU-OpenFace, oju...@gmail.com
> ./main.lua -testing -cudnn -cuda -retrain
> ../models/openface/nn4.small2.v1.t7 -modelDef
> ../models/openface/nn4.small2.def.lua -data /data-aligned
>
> for retraining the `nn4.small2.v1` model?
>
> Will this improve the results on my data set?

Yes, this seems like a good starting point.
The loss is difficult to interpret since training only selects hard
triplets with non-zero loss, so you'll probably want a separate
test data set that you evaluate classification accuracy on.
I expect it to improve the results, but be careful about the
network overfitting to your data.

> Why is this decreasing so quickly so much? All my persons are the
> same gender and age about.

To get more intuition into this you need to look at your dataset
and classifier to find where the failures are coming from.

> Do I need to manually adjust the file
> https://github.com/cmusatyalab/openface/blob/0.2.0/training/OpenFaceOptim.lua before
> retraining?

Possibly. It might be better to restrict how many layers are modified
during fine-tuning, but start by trying it as-is.

> Ow would it make sense to just make a new DNN model from scratch? I have a
> bit over 1 million faces of 60000 different persons.

This also might make sense, but I think you'll be able to take
advantage of the pre-trained model by fine-tuning it with your data
instead of starting from scratch.

-Brandon.

Somebody Else

Apr 6, 2016, 9:16:50 AM
to CMU-OpenFace, dpiat...@gmail.com, oju...@gmail.com, ba...@cs.cmu.edu
Hi Brandon,

During the last 2 weeks I did some testing with the retraining. Currently I use the following command:

./main.lua -testing -cudnn -cuda -nEpochs 500 -epochSize 50 -imagesPerPerson 10 -peoplePerBatch 5 -epochNumber 17 -manualSeed 1 -retrain ../models/openface/nn4.small2.v1.t7 -modelDef ../models/openface/nn4.small2.def.lua -data /data-aligned-small/

which is working well on my current machine (2 CPUs, 8 GB RAM). Some questions I have now:

- nEpochs: What is a good number for retraining? 500? 1000?
- epochSize: What is a good number for retraining? 50? 100?
- Do the numbers entered for imagesPerPerson and peoplePerBatch play any role in the outcome, or are they simply there to adjust the CPU usage?
- epochNumber: When I restart an aborted training (see the example command; I use 17 since I aborted on epoch 16), do I need to use nn4.small2.v1.t7 again or the newly generated model_16.t7 file from the work folder?
- How do I know when I have achieved the best retraining? Can you suggest any way to see the performance on my test data set besides simply trying manually?
- What is the minimum number of aligned faces each person should have in their folder for retraining? 3? And what is the optimal number of images per person for retraining?

I am sure I have a few more questions, but those answers would help me greatly already!

Domi

Somebody Else

Apr 6, 2016, 9:19:32 AM
to CMU-OpenFace, dpiat...@gmail.com, oju...@gmail.com, ba...@cs.cmu.edu
One more question for now: does

Epoch: [20][TRAINING SUMMARY] Total Time(s): 204.72     average triplet loss (per batch): 0.20

look good to you? I always get around 0.20/0.21 average triplet loss per batch. Is this good? Should this number decrease over time/epochs?

Brandon Amos

Apr 6, 2016, 10:32:49 AM
to Somebody Else, CMU-OpenFace, oju...@gmail.com
> - nEpochs: What is a good number for retraining? 500? 1000?

It depends on your dataset and you should stop this based
on performance on your test/validation dataset.

> - epochSize: What is a good number for retraining? 50? 100?

epochSize changes how often testing is done and how often intermediate
models are written to disk.
Decrease it if you want them more often and increase it if you need
them less often.

> - Does the number entered in imagesPerPerson and peoplePerBatch in any way
> play a role in the outcome? or is it simply to adjust the CPU usage?

I always use the largest possible values for these on my GPU
and haven't studied convergence with lower values.

> - epochNumber: when I restart a aborted training (see example command I use
> 17 since I aborted on epoch 16). DO I need to use the nn4.small.v1.t7 again
> or the newly generated model_16.t7 file from the work folder?

If you are resuming training, you need to give model_16.

> - when do I know that I have achieved the perfect/best retraining? Can you
> advise any way I can see the performance with my test data set besides
> simply trying manually?

See the current test code for how I automatically evaluate on the LFW
after every epoch.

> - What is the minimum amount of aligned faces each person should have in
> their folder for retraining? 3? What is the optimal amount of images each
> person should have for retraining?

I haven't studied this.

-Brandon.

Brandon Amos

Apr 6, 2016, 10:34:07 AM
to Somebody Else, CMU-OpenFace, oju...@gmail.com
> Epoch: [20][TRAINING SUMMARY] Total Time(s): 204.72 average triplet
> loss (per batch): 0.20
>
> look good to you? I have always around 0.20/0.21 avg. triplet loss per
> batch. is this good? should this number decrease over time/epochs?

We remove easy triplets for training, so this loss value doesn't
indicate whether the network is well- or poorly-trained.
You need to add a test or validation step to determine when to
stop training.
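One simple form that check could take (everything here is hypothetical; classification_accuracy stands for whatever evaluation you run with an intermediate model on your held-out set):

def should_stop(accuracy_history, patience=5):
    # Stop once the best held-out accuracy hasn't improved in the
    # last `patience` epochs.
    if len(accuracy_history) <= patience:
        return False
    return max(accuracy_history) not in accuracy_history[-patience:]

# Per epoch: accuracy_history.append(classification_accuracy(intermediate_model))
# and stop fine-tuning when should_stop(accuracy_history) returns True.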

-Brandon.

Jingfeng Liu

Nov 27, 2016, 10:37:14 PM
to CMU-OpenFace, dpiat...@gmail.com, oju...@gmail.com, ba...@cs.cmu.edu
I am using nn4.small2.def.lua to train the model. My in-progress model file is 1.18 GB, while nn4.small2.v1.t7 is only 30 MB. Does anybody know the reason?