So, when we calculate the gradients, we do so with respect to our loss function, not just Q, because we are trying to minimize the loss, not the Q-values themselves. The loss is the squared difference between our target and our prediction, where the target is the reward we saw after taking an action plus the (discounted) Q-value of the following state, and the prediction is the Q-value of the state we acted from. For a linear function approximator, where Q = W * phi, the chain rule then gives a gradient of the error (the difference) * phi, since the derivative of Q with respect to W is phi. In a neural network like DQN, we have a composition of many functions because of the sequential layers. When we calculate the gradients for a neural network, we use an algorithm called "backpropagation", which just applies the chain rule repeatedly to compute the derivatives for the earlier layers.
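To make that concrete, here is a minimal sketch of one gradient step for a hypothetical two-layer network (the shapes, the ReLU hidden layer, and all variable names are my own illustrative choices, not the actual DQN architecture). It computes the squared TD error, then backpropagates by hand with the chain rule, and checks one weight's gradient against a finite-difference estimate:

```python
import numpy as np

# Hypothetical tiny network: Q(s) = W2 @ relu(W1 @ phi). Shapes are illustrative.
rng = np.random.default_rng(0)
phi = rng.standard_normal(4)          # state features
W1 = rng.standard_normal((8, 4))      # first-layer weights
W2 = rng.standard_normal((2, 8))      # second-layer weights (2 actions)

def forward(phi):
    h_pre = W1 @ phi                  # linear layer
    h = np.maximum(h_pre, 0.0)        # ReLU
    q = W2 @ h                        # Q-values, one per action
    return h_pre, h, q

# One transition: took action a, saw reward r, landed in state phi_next
a, r, gamma = 0, 1.0, 0.99
phi_next = rng.standard_normal(4)

_, _, q_next = forward(phi_next)
target = r + gamma * q_next.max()     # target is treated as a constant

h_pre, h, q = forward(phi)
error = q[a] - target                 # prediction minus target
loss = 0.5 * error ** 2

# Backpropagation: apply the chain rule layer by layer, from the loss back.
dq = np.zeros_like(q)
dq[a] = error                         # dLoss/dq: only the taken action gets a gradient
dW2 = np.outer(dq, h)                 # dLoss/dW2
dh = W2.T @ dq                        # propagate the error into the hidden layer
dh_pre = dh * (h_pre > 0)             # ReLU derivative is 0/1
dW1 = np.outer(dh_pre, phi)           # dLoss/dW1

# Sanity check: finite-difference estimate for a single weight.
eps = 1e-6
W1[0, 0] += eps
_, _, q_pert = forward(phi)
loss_pert = 0.5 * (q_pert[a] - target) ** 2
W1[0, 0] -= eps
assert abs((loss_pert - loss) / eps - dW1[0, 0]) < 1e-3
```

Note that in the linear case (no hidden layer) this collapses to exactly the error * phi gradient above, and the weight update would be W -= learning_rate * dW1.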
I recommend reading this to understand backpropagation in neural networks. The whole e-book is really good.
On Tuesday, February 23, 2016 at 9:04:03 PM UTC-5, Ashley Edwards wrote:
Hi everyone,
I am trying to understand how to algebraically compute the gradient of the loss function in DQN. Particularly, I'm interested in computing the partial derivative of Q w.r.t. the weights W. I know most libraries will compute this automatically but I'd like to know what is actually happening. With a linear function approximator, it is clear that the derivative is phi since Q = W * phi. What is the derivative for DQN? It's hard for me to interpret what the partial derivative will be since Q is based on a forward pass through the network and many different computations.
Thanks!