Ex: XOR with 1 hidden layer


Riccardo

Apr 29, 2020, 12:01:30 PM
to Machine Learning for Physicists
Hi guys,
the XOR problem seems to be a "classical" issue for NNs (see for instance: here). I had to think a bit before getting it (see attachment), so I was wondering whether anyone found an alternative or more economical (e.g. fewer neurons) solution.

Cheers,
Riccardo
Ex1_XOR_RP.ipynb

Stefan Berg-Johansen

Apr 29, 2020, 12:51:12 PM
to Machine Learning for Physicists
Hi Riccardo, the XOR with 1 HL you shared is very impressive! So far, the only more economical solution I can offer is an "approximate" XOR. It has only two neurons in the HL and the output neuron does not need a nonlinear activation function. However, for intermediate input values, the output values are not as well defined as in the solution you gave (the threshold for one input depends on the value of the other input). More examples & comments welcome!
Ex1_XOR_SBJ.ipynb
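The attached notebook isn't reproduced here, but one standard two-hidden-neuron construction in this spirit is XOR = OR − AND with a purely linear output neuron. The weights below are my own illustration, not necessarily the ones in the attachment:

```python
import numpy as np

def jump(z):
    """Heaviside step activation: 1 where z > 0, else 0."""
    return (np.asarray(z) > 0).astype(float)

def xor_two_hidden(x1, x2):
    # Hidden neuron 1 acts as OR, hidden neuron 2 as AND;
    # the output neuron is linear: OR - AND = XOR on binary inputs.
    h_or = jump(x1 + x2 - 0.5)
    h_and = jump(x1 + x2 - 1.5)
    return h_or - h_and

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, int(xor_two_hidden(*x)))
```

With step activations in the hidden layer this is even exact on the binary corners; the "approximate" behaviour only shows up for intermediate input values.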

Konstantin Rushchanskii

Apr 29, 2020, 1:26:23 PM
to Machine Learning for Physicists
Dears,

Both of you get "1" as output for the inputs "0" and "0". You have to shift your output upwards with a bias.

Here is my solution:

visualize_network(weights=[ [ 
    [-1.2,-1.2], # weights of 2 input neurons for 1st hidden
    [0.9,0.9]  # weights of 2 input neurons for 2nd hidden
    ],                 
    [ 
        [-1,-1] # weights of 2 hidden neurons for output
    ]  ],
    biases=[ 
        [1,-1], # biases of 2 hidden neurons
        [1] # bias for output neuron
            ],
    activations=[ 'jump', # activation for hidden
                'jump' # activation for output
                ],
    y0range=[-0.1,1.1],y1range=[-0.1,1.1])
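To double-check a hand-built network like this outside the course notebook, a few lines of NumPy suffice (the `forward` helper below is my own, not part of `visualize_network`):

```python
import numpy as np

def jump(z):
    """Heaviside step activation: 1 where z > 0, else 0."""
    return (z > 0).astype(float)

def forward(x, weights, biases):
    # Evaluate a small feed-forward net with 'jump' activations in every layer.
    y = np.asarray(x, dtype=float)
    for W, b in zip(weights, biases):
        y = jump(np.array(W) @ y + np.array(b))
    return y

# Konstantin's weights and biases, copied from above.
weights = [[[-1.2, -1.2], [0.9, 0.9]], [[-1, -1]]]
biases = [[1, -1], [1]]

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, int(forward(x, weights, biases)[0]))
```

This reproduces the XOR truth table 0, 1, 1, 0 on the four binary inputs.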

Cheers,
KR

Stefan Berg-Johansen

Apr 29, 2020, 1:36:18 PM
to Machine Learning for Physicists
Yeah, we just used whatever input range was already set in the notebook (-3,3), but formally it should be (0,1) of course. Cheers!

Florian Marquardt

Apr 29, 2020, 3:14:34 PM
to Machine Learning for Physicists
Very nice!

Fabian Lux

Apr 29, 2020, 4:00:43 PM
to Machine Learning for Physicists
After the tutorial I was under the impression that it should be impossible to represent the XOR with one hidden layer, hence the quest for an "approximate" XOR. Nicely done!

Fabian Lux

Apr 29, 2020, 4:13:49 PM
to Machine Learning for Physicists
Maybe the caveat is that you have a representation of a negated XOR gate. One can get the XOR gate by inverting the weights to the output neuron and choosing an infinitesimal bias:
visualize_network(weights=[
    [ # weights of 2 input neurons for the 3 hidden neurons
    [1,1], [0,1], [1,0]],
    [ # weights of 3 hidden neurons for the output
    [1,-1,-1]]
    ],
    biases=[
        [0,0,0], # biases of 3 hidden neurons
        [1e-20], # bias of the output
            ],
    activations=[ 'reLU', # activation for hidden
                 'jump', # activation for output
                ],
    y0range=[-3,3],y1range=[-3,3])

In a certain sense this would then be an approximately exact XOR. What is your opinion?

Florian Marquardt

Apr 29, 2020, 5:20:27 PM
to machine-learnin...@googlegroups.com
@Fabian Lux: Sorry for that impression. Maybe this impression came about because I had in mind a general scheme for approximating arbitrary functions arbitrarily well with 1 hidden layer, hence the hint towards the last part of the notebook (it's still very nice, look at it, and try it out!). But yes, one can solve it exactly with only a few neurons and one hidden layer.

The famous 'no-go' theorem concerns the fact that it is impossible to represent XOR *without* any hidden layer, just input and output (while AND can be represented in this way).
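As a quick numerical illustration of that no-go statement (my own sketch, not from the notebook): XOR is not linearly separable, so a brute-force search over a single output neuron with a monotonic 'jump' activation and no hidden layer finds no solution, however fine the grid of weights and biases:

```python
import itertools
import numpy as np

def jump(z):
    """Heaviside step activation (monotonic)."""
    return 1.0 if z > 0 else 0.0

xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Brute-force search over a grid of (w1, w2, b) for a single neuron
# computing jump(w1*x1 + w2*x2 + b). AND or OR would yield many hits;
# XOR yields none, because it is not linearly separable.
grid = np.arange(-3, 3.01, 0.25)
solutions = [
    (w1, w2, b)
    for w1, w2, b in itertools.product(grid, grid, grid)
    if all(jump(w1 * x1 + w2 * x2 + b) == y for (x1, x2), y in xor_table.items())
]
print(len(solutions))
```

The search prints 0; replacing `xor_table` with the AND truth table immediately produces solutions.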

Best regards,
Florian

Stefan Berg-Johansen

Apr 29, 2020, 5:20:47 PM
to Machine Learning for Physicists
How nice, we have an XNOR gate as well! Cf. the last cell of Riccardo's notebook for a similar "exact" XOR gate with three neurons in the hidden layer.

Konstantin Rushchanskii

Apr 29, 2020, 5:29:52 PM
to Machine Learning for Physicists
Hi Fabian,

I think this solution is extremely unstable against noise. In contrast, I have the feeling that the approximate XOR solution is very stable. One can fine-tune the parameters (in my example, 0.9 and -1.2) to define the threshold at which the whole construction fires as an XOR for flexible inputs. And this can be learned!

The beauty of this exercise is the demonstration that you can actually build Boolean logic within a neural network in the place where it is needed. Therefore, mixing different activation functions is not a good idea here if you want a homogeneous network.
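The stability claim can be made concrete with a small noise test on the weights from the earlier post (my own sketch; the noise amplitude 0.04 is chosen to stay inside the decision margins of those weights, the tightest of which is about 0.1):

```python
import numpy as np

def jump(z):
    """Heaviside step activation: 1 where z > 0, else 0."""
    return (z > 0).astype(float)

def net(x1, x2):
    # Konstantin's weights and biases, evaluated directly.
    h = jump(np.array([-1.2 * x1 - 1.2 * x2 + 1, 0.9 * x1 + 0.9 * x2 - 1]))
    return jump(np.array(-h[0] - h[1] + 1))

rng = np.random.default_rng(0)
xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Jitter every binary input by uniform noise and check the output survives.
ok = all(
    net(x1 + rng.uniform(-0.04, 0.04), x2 + rng.uniform(-0.04, 0.04)) == y
    for _ in range(200)
    for (x1, x2), y in xor_table.items()
)
print(ok)
```

Larger noise amplitudes eventually cross the 0.9/-1.2 thresholds, which is exactly the knob one could tune (or learn).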

Cheers,
K

Anoop Kulkarni, PhD

Apr 30, 2020, 9:31:54 AM
to Machine Learning for Physicists
Very nice! Cleanest I have seen so far :)

~anoop

Rodolfo Ferro

Apr 30, 2020, 10:06:20 AM
to Machine Learning for Physicists

Hi,

Here is what I got for the XOR with hidden layers:
download_1.png


And this for the approximation of the XOR with only one hidden layer:

download_2.png


As you can see, in both cases I also added the test points (0, 0), (0, 1), (1, 0), (1, 1) of the original XOR problem.

The reason my plots look slightly different is that I translated the original code provided by Professor Marquardt into a Python class where you can also specify the colors to be used (for both the network structure and the colormap).

I've read the comments in this thread, and I see that we actually arrived at very similar solutions!


– Rodo.


Sabrina

Apr 30, 2020, 10:39:27 AM
to Machine Learning for Physicists
Hey everybody,

so I am working on this problem and it is taking me forever. Does anyone have some hints? I find it very hard to think through all the possibilities.
Did you start from the bottom or from the top, and which do you think is easier?
Did you go through all the possibilities, or do you have a kind of feeling for it?
I have already spent almost 2-3 hours and I am not even close.

Sabrina

Stefan Berg-Johansen

Apr 30, 2020, 2:56:21 PM
to Machine Learning for Physicists
Hints:
  • use pen and paper
  • write down the elementary equation for a single neuron, and internalise it
  • for simplicity, use the 'jump' activation function (a.k.a. Heaviside function)
  • make AND and OR with just a single neuron (two inputs), still pen and paper
  • keep the boolean truth tables around to remind yourself what you are trying to do
  • make a 1D rectangle function, i.e. a function that jumps from 0 to 1 and then back to 0 again (what network achieves that?) -- still pen and paper
  • generalise that to 2D and you have the XOR
  • finally, verify your work by entering the values into Python
My point is that you should start with the elementary operation and build up from there. Gradually, you see what can and what cannot work. Only later do you arrive at abstract conclusions of the type in Konstantin's second post. The fact that it's not immediately obvious is exactly what makes it interesting, so don't worry. Hope that helps, have fun!
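The 1D rectangle function from the hint list can be sketched directly: two jump neurons with unit weights and thresholds a and b, and an output neuron that subtracts them (the thresholds 0.25 and 0.75 below are my own choice):

```python
import numpy as np

def jump(z):
    """Heaviside step activation: 1 where z > 0, else 0."""
    return (np.asarray(z) > 0).astype(float)

def rect(x, a=0.25, b=0.75):
    # Two hidden jump neurons with biases -a and -b; the output neuron
    # computes their difference: 1 for a < x <= b, 0 elsewhere.
    return jump(np.asarray(x) - a) - jump(np.asarray(x) - b)

print(rect(0.1), rect(0.5), rect(0.9))
```

Generalizing the same subtraction trick to two inputs is what yields the XOR.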

Dimitry Ayzenberg

Apr 30, 2020, 4:23:14 PM
to Machine Learning for Physicists
I don't know if it stays within the "rules", but you can generate an XOR with ZERO hidden layers by using an inverted top-hat function (e.g. f(z) = 1 for abs(z) > 0.25; f(z) = 0 for abs(z) <= 0.25) as the nonlinear function. Then the weight of one input just needs to be the negative of the other.

visualize_network(weights=[ [[0.5,-0.5]] ],
    biases=[ [0] ],
    activations=[ 'tophat' ],
    y0range=[0,1],y1range=[0,1])
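In case the course notebook does not ship a 'tophat' activation, the construction can be checked directly in NumPy (a sketch of the idea exactly as stated above):

```python
import numpy as np

def tophat(z, width=0.25):
    # Inverted top hat: 1 outside |z| <= width, 0 inside. Note this
    # activation is NOT monotonic, which is how it evades the no-go theorem.
    return (np.abs(z) > width).astype(float)

def xor_no_hidden(x1, x2):
    # Single output neuron, zero hidden layers, weights +0.5 and -0.5.
    return tophat(0.5 * x1 - 0.5 * x2)

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, int(xor_no_hidden(*x)))
```

Equal inputs land inside the hat (output 0), unequal inputs land outside it (output 1).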

Sabrina

May 1, 2020, 3:52:32 AM
to Machine Learning for Physicists
Thanks for your nice description.
I think with the tables I got it now, but I used a kind of extra layer to map the inputs to [0,1] instead of (0,1).
So I am wondering if there is an easy way to get it with (0,1), because this makes it harder to see what to do, and so I have problems doing it with one hidden layer.


# Set the values here; the values given in the function are the defaults
visualize_network(weights=[ [
    [1.0,0], # weights of 2 input neurons for 1st hidden neuron in first layer
    [0,1.0]  # weights of 2 input neurons for 2nd hidden neuron in first layer # map input neurons to 0 or 1
    ],
    [
    [-1,-1], # weights of 2 neurons from 1st hidden layer to 1st neuron of 2nd hidden layer (NAND)
    [1,1] # OR
    ],
    [
        [1,1] # AND
    ]],
    biases=[
        [0,0],
        [1.5, -0.1], # NAND and OR
        [-1.5] # AND
            ],
    activations=[ 'jump', # activation for 1st hidden layer
                  'jump', # 2nd hidden layer
                  'jump'  # output
                ],
    y0range=[-1,1],y1range=[-1,1])
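Evaluating this NAND/OR/AND construction by hand confirms that it reproduces the XOR truth table on the binary inputs (the `forward` helper below is my own, not the notebook's):

```python
import numpy as np

def jump(z):
    """Heaviside step activation: 1 where z > 0, else 0."""
    return (z > 0).astype(float)

def forward(x, weights, biases):
    # Evaluate a small feed-forward net with 'jump' activations in every layer.
    y = np.asarray(x, dtype=float)
    for W, b in zip(weights, biases):
        y = jump(np.array(W) @ y + np.array(b))
    return y

# Sabrina's two-hidden-layer construction: binarize, then NAND and OR, then AND.
weights = [[[1.0, 0], [0, 1.0]], [[-1, -1], [1, 1]], [[1, 1]]]
biases = [[0, 0], [1.5, -0.1], [-1.5]]

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, int(forward(x, weights, biases)[0]))
```

The outputs come out as 0, 1, 1, 0, i.e. NAND AND OR is indeed XOR.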

Stefan Berg-Johansen

May 1, 2020, 6:43:36 AM
to Machine Learning for Physicists
@Dimitry: It seems that by shifting more of the "workload" into the nonlinear function, you make the neuron more powerful but the network as a whole less general, since apparently only the weights and biases can be learned, not the nonlinearity itself. It's an interesting point though. Thank you very much for sharing this.

Stefan Berg-Johansen

May 1, 2020, 7:01:27 AM
to Machine Learning for Physicists
@Sabrina: Congratulations, you are clearly making progress. Notice that scaling/shifting the input range is exactly what the linear part of the first layer does. By changing only one line you can now get from the (-1,1) to the (0,1) input range.
The simplest and apparently most stable solution emerges when you loosen the criteria somewhat to make an "approximate" XOR, where you only care about the output when the inputs are close to their "proper" values zero and one, but not in between. For AND and OR, you already did exactly this (by XOR in my previous post, I actually meant this "approximate" XOR).

Florian Marquardt

May 1, 2020, 9:25:49 AM
to Machine Learning for Physicists
@Dimitry Ayzenberg: Funny, I hadn't seen this before. But as Stefan says, it is a bit of 'cheating', since you needed to choose an activation function specifically for this problem. Somewhere in the no-go theorem (on the impossibility of making an XOR with no hidden layers) one needs to assume that the activation function is monotonic.

Dimitry Ayzenberg

May 1, 2020, 11:40:11 AM
to Machine Learning for Physicists
@Stefan @Florian: That's actually related to some questions I had but forgot to ask during the last session. Is there always only one activation function in a neural net, or can several be used at different neurons? And is there a reason the activation function itself cannot be trained, at least from a select subset of functions? It would certainly make the training more difficult, but I imagine it would make the neural net more powerful overall.

Shilan

May 1, 2020, 5:55:54 PM
to Machine Learning for Physicists
Hi everyone,
I have a question regarding "y0range" and "y1range"! If I understand correctly, y_0 and y_1 are our inputs, which in the XOR case are 0 or 1. Then, with the help of our activation function, I should design a network with appropriate weights and biases that reproduces the truth table for that gate. What should my strategy be for choosing "y0range" and "y1range"? And is the way I explain it to myself correct?

Thanks for your help

Tobias S

May 2, 2020, 10:25:41 AM
to Machine Learning for Physicists
Hey guys,

I have found a relatively simple solution to this problem; however, I am not sure if my approach can be considered cheating, as I'm using the reLU function for the hidden layer and the sigmoid function for the output.

xor_one_layer.png

My result looks like this, using the following weights and biases:
weights=[[[1,1],[-1,1],[1,-1],[-1,-1]],
         [[-1000000000,1000000000,1000000000,-1000000000]]]
biases=[[0,0,0,0],[-25]]

Shilan

May 2, 2020, 2:31:22 PM
to Machine Learning for Physicists
Hi all,
I think I understood it now! "y0range" and "y1range" are the range of y_out.
Thanks :)

Shilan

May 2, 2020, 6:25:43 PM
to Machine Learning for Physicists
Hello,

I tried to use the following for the XOR. It works when checking the truth table, but
when running the code it doesn't reproduce the correct visualization for the XOR! Why?
Thanks

# Combination of OR, NAND, AND is XOR
visualize_network(weights=[
    [ # weights of 2 input neurons for the hidden layer
    [2,2],[-1,-1]] ,
    [ # weights of 2 hidden neurons for the output
    [1,1]]
    ],
    biases=[
        [-1,2], # biases of 2 hidden neurons
        [-1], # bias for output neuron
            ],
    activations=[ 'jump','jump' # activations for hidden layer and output
                ])
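For what it's worth, evaluating these weights directly does reproduce the truth table (a small NumPy sketch with my own `forward` helper), which points at the plot ranges rather than the network itself:

```python
import numpy as np

def jump(z):
    """Heaviside step activation: 1 where z > 0, else 0."""
    return (z > 0).astype(float)

def forward(x, weights, biases):
    # Evaluate a small feed-forward net with 'jump' activations in every layer.
    y = np.asarray(x, dtype=float)
    for W, b in zip(weights, biases):
        y = jump(np.array(W) @ y + np.array(b))
    return y

# Shilan's OR/NAND hidden layer followed by an AND output neuron.
weights = [[[2, 2], [-1, -1]], [[1, 1]]]
biases = [[-1, 2], [-1]]

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, int(forward(x, weights, biases)[0]))
```

The four binary inputs yield 0, 1, 1, 0 as expected.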

Daniel Walter

May 2, 2020, 7:24:43 PM
to Machine Learning for Physicists
Hello Shilan,

Your solution works, at least for the digital 0 and 1.
If you change the view by setting y0range and y1range to [0,1], you should get the XOR.
If that displays a diagonal line, try adding M=2 after y1range=[0,1]; that should solve it.

Hope that helps.

With kind regards,
Daniel

Anoop Kulkarni, PhD

May 2, 2020, 9:55:36 PM
to Machine Learning for Physicists
While I understand that M=2 does the trick, if I understood Florian right, he wasn't asking us to internalize the code and function parameters. The task was primarily an exercise in playing around with weights and biases to produce specific patterns (logic gates, a square, any convex figure, etc.).

~anoop

Darshan Kumar

May 3, 2020, 2:09:18 AM
to Machine Learning for Physicists
Dear All

Here I'm sharing the result I just got for the XOR gate using 1 hidden layer with 2 neurons. Please have a look at it in your spare time, and please suggest any hints for fine-tuning this plot.

Also, if someone is having trouble understanding this XOR gate, I would be glad to help.

Sincerely. 
Darshan_XOR_1_Hidden_Layer_with_2_Neurons.png

Shilan

May 3, 2020, 4:40:32 AM
to Machine Learning for Physicists
Thanks Daniel,

I changed it as you said and then saw what I expected! Also with M=2, y0range=[-1,2], y1range=[-1,2] the same result is achieved.
I think I still have difficulty figuring out the ranges!

Thanks for your help
Shilan

Alexander Duplinskiy

May 4, 2020, 5:33:51 AM
to Machine Learning for Physicists

One more 'cheating' example, using a delta function for the output neuron.

Florian Marquardt

May 4, 2020, 1:28:52 PM
to Machine Learning for Physicists
@Tobias S: I wouldn't consider it cheating, but it is of course a bit fine-tuned for the task (when we train, we will not change activation functions according to the training target).

Emiliano Pallecchi

May 4, 2020, 5:45:49 PM
to Machine Learning for Physicists
If I am not mistaken, an alternative economical approximate solution could use the two hidden neurons to check whether the inputs are 01 or 10 (instead of having hidden neurons check for 00 or 11 as in the example). In this case the weights are swapped and the biases are the same. The output simply adds the hidden neurons, with no bias.
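A concrete realization of this idea, with weights of my own choosing (not necessarily the exact swapped weights Emiliano has in mind): one hidden neuron fires only for input (0, 1), the other only for (1, 0), and the output just sums them with no bias.

```python
import numpy as np

def jump(z):
    """Heaviside step activation: 1 where z > 0, else 0."""
    return (np.asarray(z) > 0).astype(float)

def xor_detect(x1, x2):
    # Hidden neuron 1 detects the input pattern (0, 1), neuron 2 detects (1, 0);
    # the output neuron is a bias-free sum of the two detectors.
    h01 = jump(-x1 + x2 - 0.5)
    h10 = jump(x1 - x2 - 0.5)
    return h01 + h10

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, int(xor_detect(*x)))
```

Since at most one detector fires for binary inputs, the plain sum already lies in {0, 1} and reproduces the XOR table.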

