Hi Eduard,
> Any news on this?
The demo isn't ready yet, but it's probably not quite what you're
looking for if you want to train a face detector. All the demo does is
show activations from a pre-trained model; there's no extra training
going on.
> I'm trying to get my head around face recognition using caffe. So far, I
> can theoretically follow your statement of intercepting face-recognising
> neurons directly.
> Did you implement a sliding window that triggers only when those face
> neurons fire? Or did you go with r-CNN?
> (I can't follow on r-CNN, because I don't have matlab :'( )
If by "sliding window" you mean "convolution", and by "triggers" you
mean "the relu activation is greater than 0", then yes. But this is
just the operation of every conv/relu layer. The network used was
trained on ILSVRC12 classification only (no detection labels). You
could use the similar pre-trained model at models/bvlc_reference_caffenet.
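To make the conv/relu point concrete, here's a toy numpy sketch (not Caffe code, just an illustration) of what a single conv + ReLU unit does: the kernel slides over the image, and the unit "triggers" wherever the rectified response is positive.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 2D cross-correlation ("sliding window"), valid padding."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # dot product of the kernel with the window under it
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# toy input and toy "face template" filter
image = np.random.randn(8, 8)
kernel = np.random.randn(3, 3)

activation = np.maximum(conv2d_valid(image, kernel), 0.0)  # ReLU
triggered = activation > 0  # map of where the unit "fires"
```

So the "sliding window face detector" is exactly what every conv/relu layer already computes, just with learned kernels and many channels.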
> My initial thought, however, was to fine-tune a modified net where
> the ultimate fully-connected layer has only 2 outputs (face/no-face). What
> do you think about this?
Great idea! Just grab a dataset and stick an FC layer atop, say,
conv5. Given the locality of the representation (that is, the face or
no-face information seems to be contained in a few channels as opposed
to distributed across many), you might have good luck using L1 weight
decay instead of or in addition to L2.
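If it helps, here's a tiny numpy sketch (names are mine, not from any library) of why L1 vs. L2 decay matters: L2 shrinks all weights proportionally, while L1 subtracts a fixed magnitude each step, driving small weights to exactly zero — which is what you want if the face signal lives in a few channels.

```python
import numpy as np

def sgd_step(w, grad, lr=0.01, l1=0.0, l2=0.0):
    """One SGD step with optional L1 and/or L2 weight decay.

    L2 adds l2 * w to the gradient (proportional shrinkage);
    L1 adds l1 * sign(w) (constant shrinkage -> sparsity).
    """
    decay = l2 * w + l1 * np.sign(w)
    return w - lr * (grad + decay)
```

If I remember right, Caffe exposes the L1 option via regularization_type in the solver prototxt (L2 is the default), so you shouldn't need any custom code.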
jason
---------------------------
Jason Yosinski, Cornell Computer Science Ph.D. candidate
http://yosinski.com/ +1.719.440.1357