Hi Oriol,
Also, are you doing this with the latest nn4.small2.v1 model?
> - k-NN with grid-search over k and over some distance parameters (including
> the case where k=1): 84
> Note that the chosen distance was not the Euclidean but the Minkowski distance
> - classification trees: 63
>
> It seems to me that the decrease in accuracy comes from the embedding space.
Interesting, thanks for reporting back.
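(For other readers: a grid search like the one described might look roughly
like the following with scikit-learn. This is a hypothetical sketch with
placeholder data, not necessarily the setup used above.)

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Placeholder data so the sketch runs; substitute the real
# 128-dimensional OpenFace embeddings and person labels.
rng = np.random.RandomState(0)
X = rng.randn(200, 128)
y = rng.randint(0, 10, size=200)

# Search over k (including k=1) and the Minkowski power p
# (p=2 is the Euclidean distance, p=1 is Manhattan).
param_grid = {'n_neighbors': [1, 2, 3, 5, 10], 'p': [1, 2, 3]}
search = GridSearchCV(KNeighborsClassifier(metric='minkowski'),
                      param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```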
So since this seems to be an issue with the embedding space, it could be
interesting for you to look at the failing cases:
do the people in those images look similar to you?
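One quick way to surface those cases is to print each misclassified test
image next to the training image whose embedding it was closest to. A
hypothetical sketch, assuming you kept the image path for each embedding
(placeholder data so it runs):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Placeholder embeddings, labels, and paths; substitute your own.
rng = np.random.RandomState(0)
X_train, y_train = rng.randn(100, 128), rng.randint(0, 5, 100)
X_test, y_test = rng.randn(20, 128), rng.randint(0, 5, 20)
train_paths = ['train/%03d.png' % i for i in range(100)]
test_paths = ['test/%03d.png' % i for i in range(20)]

clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
pred = clf.predict(X_test)
for i in np.where(pred != y_test)[0]:
    # With k=1, the nearest training image is what caused the error.
    j = clf.kneighbors(X_test[i:i + 1], n_neighbors=1,
                       return_distance=False)[0, 0]
    print('misclassified %s -> nearest %s' % (test_paths[i], train_paths[j]))
```

Viewing those pairs side by side should tell you quickly whether the
mistakes are at least understandable.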
The following might take a while to implement correctly and
I haven't tried it.
However, if your use case is to build a highly accurate classifier for a set
of people that won't change very often, fine-tuning might work well by
adapting the embeddings to your dataset.
The hardware requirements for training from randomly initialized
parameters are pretty high.
However, I recently added support for CPU execution by
non-intuitively passing `-cuda` and `-cudnn` to `training/main.lua`.
If you use `nn4.small2.v1` as the initial model (with `-retrain`) and
then optimize it on your dataset, it will probably fit your
dataset in a few hours or less on a CPU.
The approach would look something like:
1. Split your data into static train/test directories
2. Align as noted in the training-new-models page
3. Run training/main.lua with nn4.small2.v1 as the
   initial model; maybe also decrease `-epochSize`
   so you can save the intermediate state more often
   (see the rough invocation after this list).
4. Run a classification evaluation on your held-out test set
   against an intermediate network state: extract embeddings with it,
   train a classifier on the training set's embeddings, and
   measure accuracy on the test set.
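For step 3, the command would be roughly
`th main.lua -retrain <path to nn4.small2.v1.t7> -epochSize <something smaller>`
from the `training` directory, plus a flag pointing at your aligned training
images; I'm writing those flags from memory, so check the options defined in
`main.lua` for the exact names. For step 4 you can reuse something like the
grid-search sketch above on the newly generated embeddings.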
Often in fine-tuning, some neural network layers are locked
so that their parameters aren't updated, which helps prevent over-fitting.
The current code doesn't lock the parameters of the lower layers,
so there's a pretty high chance of
over-fitting to the training data.
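If you want to experiment with locking layers yourself, one common Torch
trick (untested in this setting) is to override a module's
`accGradParameters` with a no-op so its gradients are never accumulated,
which effectively freezes that layer.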
> I am not knowledgeable in face recognition and I don't know whether that's
> a standard that works better than others, but it seems to me that, due to
> non-robustness, using Euclidean distances in such a high dimensional space
> (features \in \R^{128} if I am not mistaken) is doomed to fail.
The input space is 96x96x3 = 27648 pixels, so from this perspective,
it's nice that we can use a neural network to reduce the
representation of a face to 128 dimensions on a unit hyper-sphere
where similar faces should be clustered together.
This idea is from the FaceNet paper:
http://arxiv.org/abs/1503.03832
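For what it's worth, the unit hyper-sphere also softens the worry about
Euclidean distances: for unit vectors u and v,
\|u - v\|^2 = 2 - 2 u·v, so ranking faces by Euclidean distance is the
same as ranking them by cosine similarity. The triplet loss from the
paper is

    L = \sum_i \max(0, \|f(x_i^a) - f(x_i^p)\|_2^2 - \|f(x_i^a) - f(x_i^n)\|_2^2 + \alpha)

where a, p, and n index the anchor, positive (same person), and negative
(different person) images and \alpha is the margin.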
The loss function we use is simple to implement and change if you're
interested in exploring other embedding spaces.
See Alfredo Canziani's implementation at
https://github.com/Atcold/torch-TripletEmbedding/blob/master/TripletEmbedding.lua
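To give a flavor, the core of it in numpy is just a hinge on the two
squared distances; a rough sketch of the same idea, not the Lua code
linked above:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Mean FaceNet-style triplet loss over a batch.

    Each argument is an (n, 128) array of L2-normalized embeddings;
    alpha is the margin enforced between positive and negative pairs.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return np.maximum(0.0, d_pos - d_neg + alpha).mean()

# Example with random unit vectors standing in for real embeddings:
rng = np.random.RandomState(0)
a, p, n = (x / np.linalg.norm(x, axis=1, keepdims=True)
           for x in rng.randn(3, 4, 128))
print(triplet_loss(a, p, n))
```

Swapping in a different distance or margin is a one-line change, which is
what makes it easy to explore other embedding spaces.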
-Brandon.