Dlib Alignment Issues


Dante Knowles

May 20, 2016, 4:51:27 PM
to CMU-OpenFace
Hi,

I've been playing around with your OpenFace model in a C project of mine, using THNets and filling in the missing modules with my own implementations.

I was wondering if you've noticed that your alignment script sometimes produces really strange shears, which seems counter-intuitive if the goal is to properly align the face. I wrote my own alignment in C that uses the same fiducial points and template values as yours, and most of my alignments come out similar to yours. But in the cases where your alignment totally shears the image, mine keeps the face looking the way (I think) it should.

The problem is that when I run benchmarks on my aligned images, they perform worse than on your aligned images, presumably because your model is trained on these "skewed" faces. If I could replicate your seemingly "broken" alignment, that would be fine for my purposes, but I think if the model were to be retrained with better alignment, it would improve everything overall.

Here are two examples of what I mean; in each, the image on top is your alignment and the image on the bottom is mine.







You might improve performance by re-training the model with a better alignment; I can provide you with mine if you'd like (I can rewrite it in Python if you need).

I'd appreciate it if you could take a look at my results and get back to me.



Best,

Dante

Brandon Amos

May 21, 2016, 4:22:24 PM
to CMU-OpenFace
Hi Dante,

Very interesting! How are you doing the alignment?

I'd be excited to try training a new OpenFace model with it, and I think it could greatly improve our accuracy. Can you create a script with the same interface as our util/align-dlib.py script so that I can align a directory of images with it? You can keep what you want in C if that's easiest for you.

-Brandon.

Dante Knowles

May 23, 2016, 4:52:20 PM
to CMU-OpenFace
I modified the align-dlib.py file; just replace the one in your util/ directory and it should work. Make sure to pass in innerEyesAndBottomLip; it's the only landmark set this script supports.

I've hardcoded the modified template matrix that I use to compute the transform matrix, but if you wanted to make it work for any 3 fiducial points, all you would have to do is grab the x, y points from your current MINMAX_TEMPLATE, add a 3rd column filled with 1's, and then compute the inverse of that matrix. The rest of the code can remain the same.
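
For illustration, building and inverting that augmented template matrix in numpy would look roughly like this (a sketch, not the exact code in the attached script; the indices below assume the usual inner-eyes-and-bottom-lip landmarks):

import numpy as np

# Landmark indices for the inner eyes and bottom lip (OpenFace's usual choice).
INNER_EYES_AND_BOTTOM_LIP = [39, 42, 57]

def inverse_template(minmax_template, indices=INNER_EYES_AND_BOTTOM_LIP):
    """Augment the three template points with a column of 1's and invert the result."""
    pts = np.asarray(minmax_template)[indices]    # (3, 2) template x, y points
    aug = np.hstack([pts, np.ones((3, 1))])       # (3, 3) after appending the 1's
    return np.linalg.inv(aug)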

I use the inverse of the template matrix to calculate the transform matrix from the output pixels to the input pixels, then iterate through the output image, find the corresponding input pixel and its neighbors, and weight those pixels to produce the final output pixel.
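
A rough sketch of that inverse-mapping loop (simplified, not the exact code; it assumes a 2x3 matrix M that maps output pixel coordinates back into the input image, and it skips or clamps samples near the border):

import numpy as np

def warp_inverse(img, M, out_size):
    """Map each output pixel back to input coordinates via M and bilinearly
    blend the four neighbouring input pixels."""
    h, w = img.shape[:2]
    out = np.zeros((out_size, out_size) + img.shape[2:], dtype=img.dtype)
    for oy in range(out_size):
        for ox in range(out_size):
            ix, iy = M @ np.array([ox, oy, 1.0])
            x0, y0 = int(np.floor(ix)), int(np.floor(iy))
            if x0 < 0 or y0 < 0 or x0 >= w or y0 >= h:
                continue                                     # sample is outside the input
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)  # clamp at the border
            fx, fy = ix - x0, iy - y0
            top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]
            bot = (1 - fx) * img[y1, x0] + fx * img[y1, x1]
            out[oy, ox] = (1 - fy) * top + fy * bot
    return out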


I'm also thinking it may be beneficial to average all 6 of the eye points for each eye to get a more stable eye fiducial point. 
iOS and Android also only give eye centers, so if you're running the net on mobile it would be best to have a network trained on eye centers. Using a dlib model trained for only the eye centers and bottom lip would also mean less memory overhead on other platforms.
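
Something like this would compute the averaged eye centers (a sketch assuming dlib's standard 68-point model, where the eyes are landmarks 36-41 and 42-47):

import numpy as np

# dlib 68-point model: left eye = landmarks 36-41, right eye = 42-47 (assumed).
LEFT_EYE = list(range(36, 42))
RIGHT_EYE = list(range(42, 48))

def eye_centers(landmarks):
    """Average each eye's six landmarks into one more stable fiducial point."""
    pts = np.asarray(landmarks, dtype=np.float64)   # (68, 2) array of (x, y) points
    return pts[LEFT_EYE].mean(axis=0), pts[RIGHT_EYE].mean(axis=0)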

If you're interested in the center of the eyes + bottom lip, I can write an alignment script for that as well.
align-dlib.py

Dante Knowles

May 23, 2016, 5:59:11 PM
to CMU-OpenFace
I left out some information on how I calculate the transformation matrix; here's a better description.

Aout = Template parameters, the 2x3 matrix representing the destination points.

Ain = Input fiducial points, the 2x3 matrix representing the input points found by dlib

M = 2x3 calculated transformation matrix

 We want to find a matrix M s.t. M * Aout = Ain

We have to bump Aout up to 3x3 by adding a row of 1's (so it can be inverted).

We want to solve for M, so we multiply both sides of the equation on the right by the inverse of Aout.

This gives us M = Ain * Aout^-1

Now we can use M to solve for the pixels of our output image by mapping from coordinates in the output to coordinates in the input.

Note that depending on whether you store the points as rows or as columns, you'll need a transpose before (or after) calling numpy's linalg.inv for the shapes to line up.
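
Putting the steps above together in numpy (my own illustration of the math, not the script itself; storing the points one per row instead of one per column is where that transpose comes in):

import numpy as np

def affine_from_points(template_pts, input_pts):
    """M = Ain * Aout^-1: maps augmented template coordinates to input coordinates."""
    aout = np.vstack([np.asarray(template_pts, dtype=np.float64).T,  # 2x3: points as columns
                      np.ones((1, 3))])                              # bump up with a row of 1's
    ain = np.asarray(input_pts, dtype=np.float64).T                  # 2x3 dlib points as columns
    return ain @ np.linalg.inv(aout)                                 # the 2x3 transform M

As a sanity check, M applied to each augmented template point should reproduce the corresponding dlib point exactly, and it should agree with what cv2.getAffineTransform gives for the same three point pairs (up to which direction the mapping is solved in).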


Brandon Amos

May 31, 2016, 6:54:25 PM
to CMU-OpenFace
Hi Dante,

I'm getting this error on a lot of images. As a temporary workaround I'm just skipping these images.

Traceback (most recent call last):
  File "./util/align-dlib-dante.py", line 378, in <module>
    alignMain(args)
  File "./util/align-dlib-dante.py", line 326, in alignMain
    landmarkIndices=landmarkIndices)
  File "./util/align-dlib-dante.py", line 230, in align
    bottomLeft = rgbImg[ty + vertOffset][tx]
IndexError: index 240 is out of bounds for axis 0 with size 240

-Brandon. 

Dante Knowles

May 31, 2016, 7:42:01 PM
to CMU-OpenFace
What images are giving you the errors? Dimensions, filetype, etc.? 

I just dropped the script into a fresh docker container using your openface image and ran it on all the LFW images with no error.

Brandon Amos

Jun 1, 2016, 1:03:04 PM
to CMU-OpenFace
What images are giving you the errors? Dimensions, filetype, etc.? 

Sorry, I should have attached an image. I've attached one I'm getting the error on.

-Brandon.


Brandon Amos

Jun 3, 2016, 3:57:15 PM
to CMU-OpenFace, Bartosz Ludwiczuk, Mahadev Satyanarayanan, Zico Kolter, myme5...@gmail.com
Hi Dante, 

I've finished training an nn4.small2 model on CASIA/FaceScrub. Our best model with the current alignment gets 0.9292 LFW accuracy. Your alignment bumps us to 0.9425 accuracy! (I am using ADAM instead of adadelta now but I think this improvement is from your alignment rather than the optimization technique.) This is a great improvement and I think we can further pursue this jointly with some other ideas and datasets to get even higher accuracy. I've included some more details below about how training this network went.

If you are interested in using this model for development with your improved alignment technique, I have temporarily made it available at http://sandstorm.elijah.cs.cmu.edu:6847/model.t7. I am holding off on making this the default OpenFace model because I am hopeful that we will get even better results over the next few weeks.

I am next fixing a potentially serious bug in our Inception towers that caused not all of the feature maps to have the same dimensionality: https://github.com/cmusatyalab/openface/issues/142

Do you want to send in a PR adding your improved alignment technique so you'll show up in the contributors portion at https://github.com/cmusatyalab/openface/graphs/contributors? Let's plan for a new OpenFace version that will slightly break reverse compatibility. Can you rename the old alignment technique to AlignDlibV1 and call yours AlignDlibV2? Definitely add comments crediting your work here. Then add an option to the align-dlib script that defaults to V2. This way users can maintain compatibility with an older OpenFace version by using V1 if they want to. When we're ready I'll update the website to make these changes more visible.

By the way, I'm also thinking about starting a short contributors document for the website that has a description of everybody's role/contribution and optionally a short bio.

-Brandon.

---

Getting highest accuracy from the test log, from epoch 133:

training(master*)$ cat work/001/test.log | tail -n +2 | head -n -1 | py --ji -l 'numpy.max(l)'
0.9425



Dante Knowles

Jun 3, 2016, 6:09:29 PM
to CMU-OpenFace, melg...@gmail.com, sa...@cs.cmu.edu, zko...@cs.cmu.edu, myme5...@gmail.com
That's great! Exciting improvement.

Unfortunately it seems I can't use the model because of the CUDA/CPU issue: https://github.com/cmusatyalab/openface/issues/127
I tried running it in the docker container on my laptop and all outputs are NaN.

I'm going to try loading it with CUDA Torch on my desktop, saving it out as ASCII, and then loading the ASCII form into a CPU model as a temporary workaround.

I'd be glad to create a PR -- I'll have to fix the script first, because it fails on quite a few images. Oddly, none in LFW or CASIA gave me errors, but some other sets have, including the photo you posted. In my own trials, I noticed my script creating weird skews on some CASIA photos and sometimes creating black artifacts in the image. I'll get it fixed up and create the PR, hopefully by the end of this weekend. I imagine accuracy will improve further once it works properly on all the images.

Dante Knowles

Jun 3, 2016, 6:17:45 PM
to CMU-OpenFace, melg...@gmail.com, sa...@cs.cmu.edu, zko...@cs.cmu.edu, myme5...@gmail.com
Correction:

The black artifacts were a product of the original image (occlusion or a black border), so there's no need to fix that.
I still need to figure out why it gives errors on some images; I'll get that fixed up and create the PR.

Vitalius Parubochyi

Jun 10, 2016, 4:12:04 AM
to CMU-OpenFace, melg...@gmail.com, sa...@cs.cmu.edu, zko...@cs.cmu.edu, myme5...@gmail.com
Hi,

I'm trying to use the new alignment method with http://sandstorm.elijah.cs.cmu.edu:6847/model.t7. But when I get the representation for an aligned image (Dante, your alignment works very well. Thanks!), the TorchNeuralNet.forward() method returns a NaN array.

openface/demos/compare.py images/examples/{lennon*,clapton*}
[ nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan
  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan
  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan
  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan
  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan
  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan
  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan
  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan
  nan  nan  nan  nan  nan  nan  nan  nan]

I tried both my own code and your compare.py example, and both behave the same way.
In your example, I replaced a few lines:

Lines 45-46 (I copied model.t7 into the openface/models/openface/ folder):
parser.add_argument('--networkModel', type=str, help="Path to Torch network model.",
                    default=os.path.join(openfaceModelDir, 'model.t7'))

Line 84-85:
alignedFace = align.align_v2(args.imgDim, rgbImg, bb,
                                          landmarkIndices=openface.AlignDlib.INNER_EYES_AND_BOTTOM_LIP)

Do you have any thoughts on this? What am I doing wrong? 

Brandon Amos

Jun 10, 2016, 11:12:36 AM
to CMU-OpenFace, melg...@gmail.com, myme5...@gmail.com
Hi Vitalius,

Something in optimize-net or cudnn is causing the NaNs, but I'm not sure what.
My only solution for now is to not use optimize-net or cudnn in training.
I have trained another model that shouldn't have the nan issue at

-Brandon.

Vitalius Parubochyi

Jun 13, 2016, 9:00:44 AM
to CMU-OpenFace, melg...@gmail.com, myme5...@gmail.com
Thanks! Now it works.

Brandon Amos

Jun 15, 2016, 12:43:17 PM
to CMU-OpenFace, melg...@gmail.com, myme5...@gmail.com, Dante Knowles

Hi Dante,


How is your training with your new alignment technique going?


My last results had some images aligned with your original technique (with the height/width bug) and some images aligned with your corrected version, which is currently in OpenFace's master branch. However, now I'm only getting ~85% on CASIA-FaceScrub with nn4.small2, where before I was getting ~95% with the same dataset and model.



-Brandon.

Dante Knowles

Jun 16, 2016, 2:32:56 PM
to CMU-OpenFace, melg...@gmail.com, myme5...@gmail.com, god...@gmail.com
Here are my results training on CASIA webface. I used average eye alignment instead of inner eye alignment because a model trained on average eye alignment works best for mobile devices, since the mobile face detection APIs only give an eye center point.

I trained on the inner eyes as well and got similar results; my highest accuracy was ~94.7%.
lfw-accuracy.pdf
train-loss.pdf

Brandon Amos

Jul 15, 2016, 3:26:25 PM
to CMU-OpenFace, melg...@gmail.com, myme5...@gmail.com, god...@gmail.com, Vitalius Parubochyi
Hi all,

Sorry for changing the API again, but I have removed align_v2 from OpenFace. As the image I have attached here shows, v2 computes the same affine transformation as align_v1, but is slower because it's manually implemented. The differences we saw earlier in this thread are from using a different set of reference landmarks. We are actively looking into training new models with the less distorted versions and into different alignment techniques.

A few ideas I'm interested in pursuing are:
  1) An alignment that just rotates the image so that the eyes are on a horizontal line and then crops to the landmarks (a rough sketch of the rotation step follows this list). The first step is as shown here: http://stackoverflow.com/questions/22322265/calculate-image-rotation-based-on-old-and-new-positions-of-two-points
  2) Adding an option to expand the cropped region beyond the landmarks. I tried doing this a year ago before I released OpenFace and didn't see good results, but it could be interesting to try again since it seems like the forehead, ears, and neck could improve the accuracy.
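
A rough sketch of the rotation step for idea 1, using OpenCV (my own illustration, not tested; the eye points would come from the dlib landmarks or their averaged centers):

import cv2
import numpy as np

def rotate_eyes_horizontal(img, left_eye, right_eye):
    """Rotate about the midpoint of the eyes so the eye line becomes horizontal."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    angle = np.degrees(np.arctan2(ry - ly, rx - lx))    # tilt of the eye line
    center = ((lx + rx) / 2.0, (ly + ry) / 2.0)
    R = cv2.getRotationMatrix2D(center, angle, 1.0)     # 2x3 rotation matrix
    return cv2.warpAffine(img, R, (img.shape[1], img.shape[0]))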

To get results faster, is anybody here interested in helping with the implementation?

-Brandon.


Short script I used to plot these:


<html>
   <body>
        <script type="text/javascript">
         // Left-pad n with zeros (or z) to the given width, e.g. pad(3, 4) -> "0003".
         function pad(n, width, z) {
             z = z || '0';
             n = n + '';
             return n.length >= width ? n : new Array(width - n.length + 1).join(z) + n;
         }

         // Add a heading for one alignment variant and append its 42 aligned
         // Arnold_Schwarzenegger images to the page.
         function add(tag) {
            document.body.innerHTML += "<h1>"+tag+"</h1>";
            for (i=1; i <= 42; i++) {
                var img = document.createElement("img");
                img.src = tag+"/raw/Arnold_Schwarzenegger_"+pad(i,4)+".png";
                document.body.appendChild(img);
            }
         }
         add("v1-eyes_1")
         add("v2-eyes_1")
         add("v1-inner")
         add("v2-inner")
         add("v1-outer")
         add("v2-outer")
        </script>
    </body>
</html>



Ilya Shapiro

Jul 19, 2016, 5:37:21 AM
to CMU-OpenFace, melg...@gmail.com, myme5...@gmail.com, god...@gmail.com, vitalius....@vakoms.com.ua
Hi,

You reported accuracy results of 94.25. Is it because you used different landmarks? If so, shouldn't OpenFace use these landmarks instead?
I agree that the affine transform warps the face image once it is applied. The affine transform applies the following geometric distortions or deformations: translation, scale, shear, and rotation.
I guess it is a good idea to try it without the shearing, but I think it is going to work only for nearly frontal face images.

-Ilya 

Brandon Amos

Jul 19, 2016, 11:12:14 AM
to CMU-OpenFace, melg...@gmail.com, myme5...@gmail.com, god...@gmail.com, vitalius....@vakoms.com.ua
You reported accuracy results of 94.25. Is it because you used different landmarks? If so, shouldn't OpenFace use these landmarks instead?

This slight improvement is from using INNER_EYES_AND_BOTTOM_LIP alignment instead of OUTER_EYES_AND_NOSE. I plan to release a new model when we've improved the accuracy even more.

-Brandon.