Question and comments about training an XOR


Carlos Pegueros

May 9, 2020, 1:27:47 AM
to Machine Learning for Physicists
Hi!

Sorry if this is old news, but I missed last session and I am trying to catch up.

In exercise one, we are "experimenting" to come up with a network that can train an XOR.

Q: Is there a more precise way of knowing beforehand (or at least getting an idea of) how many layers/neurons/activations/etc. we would need?

Also, these are my comments on this exercise based on my experiments. If you can comment about yours or just give me your thoughts, I would really appreciate it!
  • With one hidden layer, I found that:
    • With few neurons (I started with 5), I had to use a lot of training steps (around 5000) before the cost started to come down.
    • Either increasing the number of training steps or increasing the parameter eta minimizes the cost function more rapidly (which I guess makes sense, as we are just "covering more distance").
    • Keeping all the parameters the same but doubling the number of neurons (to 10), the cost started to come down in about half the steps it took with 5 neurons. Accuracy did not improve.
    • Only the "sigmoid" activation gave me nice results. When I switched either one activation or both to "ReLU", the cost just remained constant no matter what.
  • When a successful (or "good enough", see my image below) setup was found, neither adding more neurons nor changing the number of steps or the batch size improved the accuracy. If I wanted more precision, I had to add another layer.
  • When I added another layer, accuracy increased drastically, but I had to either add a lot more steps or increase the eta parameter a lot to make the cost function come down.
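In case it helps anyone compare, here is a minimal sketch (my own NumPy version, not the course's exact code) of the setup the bullets above describe: one hidden sigmoid layer trained on XOR with full-batch gradient descent and a quadratic cost. The knobs n_hidden, eta, and steps correspond to the parameters discussed; the specific values are just ones that worked for me.

```python
import numpy as np

rng = np.random.default_rng(0)

# the four XOR input/target pairs
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_hidden = 5    # try 5 vs. 10 to compare how fast the cost comes down
eta = 2.0       # learning rate ("eta" in the exercise)
steps = 10000   # training steps

# random weights, zero biases
W1 = rng.normal(size=(2, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 1)); b2 = np.zeros(1)

for step in range(steps):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass for the quadratic cost 0.5 * mean((out - y)^2)
    delta2 = (out - y) * out * (1 - out)
    delta1 = (delta2 @ W2.T) * h * (1 - h)
    # full-batch gradient-descent update
    W2 -= eta * h.T @ delta2;  b2 -= eta * delta2.sum(axis=0)
    W1 -= eta * X.T @ delta1;  b1 -= eta * delta1.sum(axis=0)

cost = 0.5 * np.mean((out - y) ** 2)
print("final cost:", cost)
print("outputs:", out.ravel())
```

Rerunning this while changing only n_hidden, eta, or steps reproduces the kind of comparison I was doing by hand in the notebook.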
These are the results I achieved:

[Images: one.png, two.png]

Carlos Pegueros

May 9, 2020, 1:44:54 AM
to Machine Learning for Physicists
lol so I just read the following in the "course overview" about session 2:

After this lecture, you will in principle be able to train a deep neural network on arbitrary tasks (using pure python code that we provide in the lecture). But you don’t yet know how to choose a smart representation of the data in more complicated cases, how best to choose the parameters during training, how to accelerate the gradient descent, etc.


So I guess my question will be covered in future lectures.


Still, if someone would like to share their comments, I would appreciate it!
