Question and comments about training an XOR


Carlos Pegueros

May 9, 2020, 1:27:47 AM
to Machine Learning for Physicists
Hi!

Sorry if this is old news, but I missed last session and I am trying to catch up.

In exercise one, we are "experimenting" to come up with a network that can train an XOR.

Q: Is there a more precise way of knowing beforehand (or at least getting an idea of) how many layers/neurons/activations/etc. we would need?

Also, these are my comments on this exercise based on my experiments. If you can comment about yours or just give me your thoughts, I would really appreciate it!
  • With one hidden layer, I found that:
    • With few neurons (I started with 5), I had to use a lot of training steps (around 5000) before the cost started to come down.
    • Either increasing the number of training steps or increasing the parameter eta minimizes the cost function more rapidly (which I guess makes sense, as we are just "covering more distance").
    • Keeping all the parameters the same but doubling the number of neurons (to 10), the cost started to come down in about half the steps it took with 5 neurons. Accuracy did not improve.
    • Only the "sigmoid" activation gave me nice results. When I switched either one activation or both to "ReLU", the cost just remained constant no matter what.
  • When a successful (or "good enough", see my image below) setup was found, neither adding more neurons nor changing the number of steps or the batch size improved the accuracy. If I wanted more precision, I had to add another layer.
  • When I added another layer, accuracy increased drastically, but I had to either add a lot more steps or increase the eta parameter a lot to make the cost function come down.
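In case it helps anyone compare, here is a minimal sketch (my own NumPy version, not the course's exact code) of the setup the bullets above describe: one hidden sigmoid layer trained on XOR with full-batch gradient descent and a quadratic cost. The knobs n_hidden, eta, and steps correspond to the parameters discussed; the specific values are just ones that worked for me.

```python
import numpy as np

rng = np.random.default_rng(0)

# the four XOR input/target pairs
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_hidden = 5    # try 5 vs. 10 to compare how fast the cost comes down
eta = 2.0       # learning rate ("eta" in the exercise)
steps = 10000   # training steps

# random weights, zero biases
W1 = rng.normal(size=(2, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 1)); b2 = np.zeros(1)

for step in range(steps):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass for the quadratic cost 0.5 * mean((out - y)^2)
    delta2 = (out - y) * out * (1 - out)
    delta1 = (delta2 @ W2.T) * h * (1 - h)
    # full-batch gradient-descent update
    W2 -= eta * h.T @ delta2;  b2 -= eta * delta2.sum(axis=0)
    W1 -= eta * X.T @ delta1;  b1 -= eta * delta1.sum(axis=0)

cost = 0.5 * np.mean((out - y) ** 2)
print("final cost:", cost)
print("outputs:", out.ravel())
```

Rerunning this while changing only n_hidden, eta, or steps reproduces the kind of comparison I was doing by hand in the notebook.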
These are the results I achieved:

[Images: one.png, two.png]

Carlos Pegueros

May 9, 2020, 1:44:54 AM
to Machine Learning for Physicists
lol so I just read the following in the "course overview" about session 2:

After this lecture, you will in principle be able to train a deep neural network on arbitrary tasks (using pure python code that we provide in the lecture). But you don’t yet know how to choose a smart representation of the data in more complicated cases, how best to choose the parameters during training, how to accelerate the gradient descent, etc.


So I guess my question will be covered in future lectures.


Still, if someone would like to share their comments, I would appreciate it!
