Hi!
Sorry if this is old news, but I missed the last session and am trying to catch up.
In exercise one, we are "experimenting" to come up with a network that can learn XOR.
Q: Is there a more precise way of knowing beforehand (or at least getting a rough idea of) how many layers/neurons/activations/etc. we would need?
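For XOR specifically, I believe there is a known answer: one hidden layer with just 2 neurons is enough in principle, since XOR(x1, x2) = OR(x1, x2) AND NOT AND(x1, x2). Here is a minimal sketch with hand-picked weights showing a 2-2-1 sigmoid network computing XOR exactly (my own illustration, not code from the exercise):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# All four XOR inputs, one row per sample
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Hand-picked weights for a 2-2-1 network:
# hidden unit 1 ~ OR(x1, x2), hidden unit 2 ~ AND(x1, x2)
W1 = np.array([[20.0, 20.0],
               [20.0, 20.0]])
b1 = np.array([-10.0, -30.0])

# Output unit ~ OR AND (NOT AND), i.e. XOR
W2 = np.array([[20.0], [-20.0]])
b2 = np.array([-10.0])

h = sigmoid(X @ W1 + b1)   # hidden activations
y = sigmoid(h @ W2 + b2)   # network output

print(np.round(y.ravel()))  # 0. 1. 1. 0.
```

So at least for this problem there is a theoretical floor; whether gradient descent actually finds such a solution from random initialization is the part that seems to need experimenting.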
Also, these are my comments on this exercise based on my experiments. If you can comment about yours or just give me your thoughts, I would really appreciate it!
- With one hidden layer, I found that:
  - With few neurons (I started with 5), I needed a lot of training steps (around 5000) before the cost started to come down.
  - Either increasing the training steps or increasing the parameter eta minimizes the cost function more rapidly (which I guess makes sense, as we are just "covering more distance").
  - Keeping all the parameters the same but doubling the number of neurons (to 10), the cost started to come down at about half the steps it took with 5 neurons. Accuracy did not improve.
  - Only the "sigmoid" activation gave me nice results. When I switched either one activation or both to "ReLU", the cost just remained constant no matter what.
  - Once a successful (or "good enough", see my image below) setup was found, neither adding more neurons nor changing the number of steps or the batch size improved the accuracy. If I wanted more precision, I had to add another layer.
- When I added another layer, accuracy increased drastically, but I had to either add a lot more steps or increase the eta parameter a lot to make the cost function come down.
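For reference, this is roughly the kind of setup I'm describing. It is a NumPy sketch of the one-hidden-layer case with full-batch gradient descent on the MSE cost; the eta value, step count, and initialization are my own assumptions, not the course defaults:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR dataset: inputs and targets
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0], [1], [1], [0]], dtype=float)

n_hidden = 5   # few neurons, as in my first experiment
eta = 0.5      # learning rate
steps = 5000   # training steps

W1 = rng.normal(size=(2, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 1))
b2 = np.zeros(1)

costs = []
for _ in range(steps):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    costs.append(np.mean((y - t) ** 2))  # MSE cost

    # backward pass (constant factors absorbed into eta)
    dy = (y - t) * y * (1 - y)
    dh = (dy @ W2.T) * h * (1 - h)
    W2 -= eta * h.T @ dy
    b2 -= eta * dy.sum(axis=0)
    W1 -= eta * X.T @ dh
    b1 -= eta * dh.sum(axis=0)

print(f"cost: {costs[0]:.3f} -> {costs[-1]:.3f}")
```

With a sketch like this it is easy to reproduce the observations above, e.g. by swapping the hidden `sigmoid` for `np.maximum(z, 0)` (ReLU) or varying `eta` and `steps`.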
These are the results I achieved:

