Hey,
I apparently can't respond to messages in the Google groups. I can send out an initial message, but I can't respond to anything. Huh.
Anyway, I've been thinking about the architecture for this project, and I think we want to have 128 x 120 = 15,360 input nodes (one for each pixel) and two output nodes (one for male and one for female). It looks like part of the problem is figuring out how many hidden layer nodes we need: too few and the network can't learn the task, but too many leads to overfitting. This seems like a simple fix though: just run it a few times with a different number of hidden layer nodes each time and keep whichever does best on pictures it wasn't trained on. Additionally, the assignment says we should use SIGMOID units. We mentioned the sigmoid function very briefly in class, it's in the book on page 726, and a quick Google search turns up this result:
http://en.wikipedia.org/wiki/Sigmoid_function
We're going to need to use the dot product variant (each node takes the dot product of its weights and its inputs and passes that through the sigmoid), but it should be pretty simple to implement. The SIGMOID function doesn't just return 0 or 1, though, so the output nodes are going to be any number between 0 and 1 (assuming we use the standard logistic version; a tanh-style sigmoid would give -1 to 1 instead). To solve this, we can just take the max of the two nodes. For example, if we show the network a male face, the male output node should return something like 0.95 and the female one something like 0.1. We just take the max of these two and get the male node, so we identify the picture as male. Similarly, a female picture should give a high value for the female node and a low value for the male node.
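To make the sigmoid unit and the "take the max" step concrete, here's a rough sketch in Python. It assumes the standard logistic sigmoid, and the function names, the bias term, and the toy weights are all just made up for illustration (3 inputs instead of 15,360), so treat it as a starting point rather than what the assignment requires.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: squashes any real number into the range 0 to 1."""
    # Two branches so math.exp never sees a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def unit_output(weights, inputs, bias=0.0):
    """One sigmoid unit: dot product of weights and inputs, then the sigmoid."""
    total = bias + sum(w * a for w, a in zip(weights, inputs))
    return sigmoid(total)

def classify(male_out, female_out):
    """Pick whichever output node is larger (ties go to male, arbitrarily)."""
    return "male" if male_out >= female_out else "female"

# Toy example: 3 made-up "pixels" and made-up weights for each output node.
pixels = [0.2, 0.9, 0.4]
male_out = unit_output([1.5, 2.0, -0.5], pixels)
female_out = unit_output([-1.0, -2.0, 0.5], pixels)
print(male_out, female_out, classify(male_out, female_out))
```

In the real network the output nodes would read from the hidden layer rather than straight from the pixels, but each node does the same computation.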
The assignment wants us to use a "stochastic gradient descent back-propagation scheme," which (from my understanding) is a very complex way of saying we should do what we discussed in class, where we update the weights after each training example with w=w+(delta)a. There's an example on page 734 of the book (along with pseudo-code for the assignment), or check out Wikipedia here:
http://en.wikipedia.org/wiki/Stochastic_gradient_descent
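Here's my rough understanding of the update as a Python sketch, for one hidden layer of sigmoid units. A few things in it are my own assumptions and not from the assignment: the learning rate alpha that scales the update, the bias weights, the squared-error deltas, and the tiny 4-pixel toy data set. All the function names are made up too.

```python
import math
import random

def sigmoid(x):
    """Logistic sigmoid, written so math.exp never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def init_network(n_in, n_hidden, n_out, rng):
    """Small random weights; the last weight in each row is the bias."""
    hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    output = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    return hidden, output

def forward(layer, inputs):
    """Outputs of one layer of sigmoid units (a constant 1.0 feeds the bias)."""
    extended = list(inputs) + [1.0]
    return [sigmoid(sum(w * a for w, a in zip(row, extended))) for row in layer]

def train_one(hidden, output, x, target, alpha):
    """One stochastic update on a single example: w = w + alpha * delta * a."""
    h = forward(hidden, x)
    o = forward(output, h)
    # Output deltas: error times the sigmoid derivative o * (1 - o).
    delta_o = [(t - ok) * ok * (1.0 - ok) for t, ok in zip(target, o)]
    # Hidden deltas: output deltas sent back through the (old) output weights.
    delta_h = [hj * (1.0 - hj) * sum(output[k][j] * delta_o[k] for k in range(len(output)))
               for j, hj in enumerate(h)]
    for k, row in enumerate(output):
        for j, a in enumerate(h + [1.0]):
            row[j] += alpha * delta_o[k] * a
    for j, row in enumerate(hidden):
        for i, a in enumerate(list(x) + [1.0]):
            row[i] += alpha * delta_h[j] * a

def total_error(hidden, output, data):
    """Sum of squared errors over every example and output node."""
    err = 0.0
    for x, target in data:
        o = forward(output, forward(hidden, x))
        err += sum((t - ok) ** 2 for t, ok in zip(target, o))
    return err

# Toy data: 4 "pixels"; target is [male, female]. Entirely made up.
data = [
    ([1.0, 1.0, 0.0, 0.0], [1.0, 0.0]),
    ([0.9, 0.8, 0.1, 0.2], [1.0, 0.0]),
    ([0.0, 0.0, 1.0, 1.0], [0.0, 1.0]),
    ([0.1, 0.2, 0.9, 0.8], [0.0, 1.0]),
]
rng = random.Random(0)
hidden, output = init_network(n_in=4, n_hidden=3, n_out=2, rng=rng)
error_before = total_error(hidden, output, data)
for epoch in range(2000):
    rng.shuffle(data)  # "stochastic": visit the examples in random order
    for x, target in data:
        train_one(hidden, output, x, target, alpha=0.5)
error_after = total_error(hidden, output, data)
print(error_before, error_after)
```

Since n_hidden is just a parameter here, trying a few different hidden layer sizes would only mean calling init_network with different values and comparing the results.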
Finally, it looks like we're going to need to visualize the hidden nodes like he showed in class, where each hidden node produces a 128 x 120 image of what it thinks is important. I liked his idea of doing gray scaling based on the weights from the input nodes, but I really have no idea how to do this.
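That said, here's a first guess at how the gray scaling could work: take one hidden node's weights from the input nodes, stretch them linearly so the smallest weight is black (0) and the largest is white (255), and save the result as an image. The sketch below writes a plain-text PGM file because that needs no libraries. The function names and the file name are made up, and I'm assuming the image is 128 wide by 120 tall, the weights are stored row by row, and any bias weight has already been left out.

```python
import os
import tempfile

def weights_to_gray(weights):
    """Rescale weights linearly so min -> 0 (black) and max -> 255 (white)."""
    lo, hi = min(weights), max(weights)
    span = hi - lo
    if span == 0:
        # All weights equal: there is nothing to contrast, so use mid-gray.
        return [128] * len(weights)
    return [int(round(255 * (w - lo) / span)) for w in weights]

def write_pgm(path, pixels, width, height):
    """Write gray values (0-255) as a plain-text PGM image (magic number P2)."""
    if len(pixels) != width * height:
        raise ValueError("expected %d pixels, got %d" % (width * height, len(pixels)))
    with open(path, "w") as f:
        f.write("P2\n%d %d\n255\n" % (width, height))
        # 16 values per line keeps every line short.
        for start in range(0, len(pixels), 16):
            f.write(" ".join(str(p) for p in pixels[start:start + 16]) + "\n")

# Toy example: made-up weights for a 4 x 3 image instead of 128 x 120.
toy_weights = [-1.0, 0.0, 1.0, 0.5,
               0.2, -0.3, 0.8, -0.9,
               0.0, 0.1, -0.1, 0.4]
gray = weights_to_gray(toy_weights)
out_path = os.path.join(tempfile.gettempdir(), "hidden_node_0.pgm")
write_pgm(out_path, gray, width=4, height=3)
print(gray, out_path)
```

For the real network we'd call this once per hidden node with width=128 and height=120, passing that node's 15,360 input weights.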
I can start working on the write-up. Is there anything else you want me to do?
Tom