Ok, so this is a continuation of the thread where I'm trying to build a neural network. I took a step back and am trying to implement the backpropagation algorithm (to understand the gradient-descent principle). At the moment I'm just trying to predict the Ask price (Bid & Ask will come later). I've gotten to the point where I can i) create a neural network with 3 layers: input, hidden and output. The neural network has a structure as in B) (see below). I'm trying to predict EURUSD tick data, and each tick has a structure like in A) (see below). I can also do ii) the feed-forward calculations (which compute the total error) and iii) the backpropagation (with weight updates). So my intuition is a little better, but I'm still getting funny results. Can anyone provide specific critiques of my code and algorithms?
The problem is that, during training, the error progression looks something like in C). This means that the backpropagation and weight updates are making the connection weights progressively smaller. I'm trying to figure out what in my training calculation is throwing the numbers off.
I used this paper as a guideline; see pages 17 - 19 (click here). I also used some of Andrew Ng's online video guides (click here).
My code is online (click here). And I've hard-coded some start points at the bottom of neural-net-clj (click here).
Based on the above paper (p. 17 - 19), my understanding of the crucial steps is below:
total-error: predicted-ask - actual-ask
backpropagated-error (starting from the output, we need to propagate the total error back through the NN). For the output layer it's i), where o = the neuron's calculated output and t = the actual (target) value. For the hidden layer, the backpropagated value is ii).
i): δ(2) = o(2) ( 1 − o(2) ) ( o(2) − t )
ii): δ(1) = o(1) ( 1 − o(1) ) Σ over outgoing edges ( w(2) δ(2) )
partial-derivative: the incoming activation * the neuron's backpropagated error, e.g. δ(1) oi for an input→hidden weight.
weight update (where γ is a learning constant): Δw = −γ o(1) δ(2)
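My actual code is Clojure, but here's a minimal Python sketch of the steps above for a tiny 1-2-1 sigmoid network — the starting weights and learning constant are made up for illustration, and biases are omitted:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy 1-2-1 network; starting weights and learning constant are hypothetical.
w1 = [0.15, 0.25]   # input -> hidden weights
w2 = [0.40, 0.50]   # hidden -> output weights
gamma = 0.5         # learning constant (the gamma above)

def train_step(x, t):
    """One feed-forward + backpropagation pass for a single (input, target) pair."""
    global w1, w2
    # feed forward
    o1 = [sigmoid(w * x) for w in w1]                  # hidden outputs o(1)
    o2 = sigmoid(sum(w * o for w, o in zip(w2, o1)))   # network output o(2)

    # i) output-layer delta: d2 = o2 (1 - o2) (o2 - t)
    d2 = o2 * (1 - o2) * (o2 - t)

    # ii) hidden-layer deltas: d1 = o1 (1 - o1) (w2 * d2)
    d1 = [o * (1 - o) * w * d2 for o, w in zip(o1, w2)]

    # weight update: w_new = w - gamma * (incoming activation) * delta
    w2 = [w - gamma * o * d2 for w, o in zip(w2, o1)]
    w1 = [w - gamma * x * d for w, d in zip(w1, d1)]

    return (t - o2) ** 2  # squared error before this update
```

Each call does one forward pass, backpropagates the deltas, and applies the weight updates; repeating it on the same sample should drive the returned error toward zero.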
Hey, thanks for these. I'll take the evening to read through them. It's the gradient-descent calculation that I find convoluted. I understand the concept, but I haven't seen a clear example of how it's done. That's what I was trying to nail down with that output-neuron example. Reading the encog-java source code wasn't any clearer either.
Thanks very much for the resources. Perhaps these docs will help.
Hmmm...I am not sure I follow...
The error is simply (- target actual). Then you take the square of that difference (2 reasons for that), and there you have the error per neuron at the output layer. Then you simply add them up to get the entire network's error. So far so good...
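A minimal sketch of that error calculation, with made-up target/actual vectors for a two-neuron output layer:

```python
# Hypothetical outputs for a 2-neuron output layer.
targets = [0.9, 0.1]
actuals = [0.75, 0.2]

# Squared error per output neuron...
per_neuron = [(t - a) ** 2 for t, a in zip(targets, actuals)]

# ...summed to get the whole network's error.
total_error = sum(per_neuron)  # 0.15^2 + (-0.1)^2 = 0.0325
```

(Many texts halve this sum so the 1/2 cancels when differentiating; either convention works as long as it's applied consistently.)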
The tricky bit comes now, when you need to find out how much the error depends on the outputs, inputs and weights, so you can feed that into the gradient-descent formula via which the weights will be adjusted. Basically, gradient descent says this:
"the adjustment of each weight will be the negative of a constant theta multiplied by the dependence of the previous weight on the error of the network, which is the derivative of that error with respect to that weight."
So essentially you're only looking to find the derivative of E (the total error) with respect to Wij, but in order to find that you will need to know:
- how much the error depends on the output
- how much the output depends on the activation (which depends on the weights)
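That chain of dependencies is just the chain rule; for a single sigmoid output neuron it looks like this (all the concrete numbers here are hypothetical):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# One weight feeding one sigmoid output neuron (values made up).
o_in, w, t = 0.6, 0.4, 0.9      # incoming activation, weight, target

net = w * o_in                  # weighted input to the neuron
o = sigmoid(net)                # neuron's output

# Chain rule: dE/dw = (dE/do) * (do/dnet) * (dnet/dw)
dE_do   = o - t                 # how much the error depends on the output (E = (t - o)^2 / 2)
do_dnet = o * (1 - o)           # how much the output depends on the activation (sigmoid derivative)
dnet_dw = o_in                  # how much the activation depends on the weight
dE_dw   = dE_do * do_dnet * dnet_dw

gamma = 0.5                     # learning constant
w_new = w - gamma * dE_dw       # gradient-descent update
```

Since o < t here, the gradient comes out negative and the update nudges w upward, pushing the output toward the target.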
It is not straightforward to type formulas here, but I've put a relevant document that describes the procedure exactly, plus the encog book (which I can't guarantee goes that deep), in my public dropbox for you. The link is: https://dl.dropbox.com/u/45723414/for_Tim.tar.gz ... I'll leave it up until tomorrow...