Hello!
I'm new to Chainer, but already love it! It's so clean and simple!!
In order to learn to use it, I'm trying to implement my own non-linear optimizer.
In particular, if f : R^m -> R^n is a nonlinear function that maps a vector x in R^m to a vector y in R^n,
then the derivative of y = f(x) with respect to x is an n x m Jacobian matrix.
However, I have not been able to get Chainer to give me an x.grad that is a Jacobian matrix.
Instead, I get a vector that is the sum of the Jacobian's rows: x.grad = sum(Jacobian, axis=0)
Is it possible to get the Jacobian as a .grad output directly, or do I have to iterate over each dimension of y and build the rows of the Jacobian one at a time?
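For context, here's my understanding of why the sum appears, checked in plain NumPy: a backward pass computes a vector-Jacobian product, x.grad = J^T @ y.grad, so seeding y.grad with all ones sums the rows of J, while seeding with the i-th basis vector picks out row i:

```python
import numpy as np

# For y = W x the Jacobian is W itself
W = np.arange(15, dtype=np.float32).reshape(3, 5)

# backward() computes a vector-Jacobian product: x.grad = J.T @ y.grad
g = np.ones(3, dtype=np.float32)
print(W.T @ g)  # [15. 18. 21. 24. 27.] -- the summed rows I'm seeing

# Seeding with basis vectors instead recovers J one row at a time
J = np.stack([W.T @ e for e in np.eye(3, dtype=np.float32)])
print(np.array_equal(J, W))  # True
```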
Here's a super simple example, just with matrix multiplication:
>>import numpy as np
>>import chainer
>>from chainer import cuda, Function, gradient_check, Variable, optimizers, serializers, utils
>>from chainer import Link, Chain, ChainList
>>import chainer.functions as F
>>import chainer.links as L
>>W = Variable(np.arange(15).reshape((3,5)).astype(np.float32))
>>W.data
array([[  0.,   1.,   2.,   3.,   4.],
       [  5.,   6.,   7.,   8.,   9.],
       [ 10.,  11.,  12.,  13.,  14.]], dtype=float32)
>>x = Variable(np.arange(5).astype(np.float32))
>>x.data
array([ 0.,  1.,  2.,  3.,  4.], dtype=float32)
>>y = F.matmul(W, x)
>>y.data
array([[  30.],
       [  80.],
       [ 130.]], dtype=float32)
>>y.grad = np.ones((3,1), dtype=np.float32)
>>y.backward()
>>x.grad
array([ 15.,  18.,  21.,  24.,  27.], dtype=float32)
The Jacobian of y = W x with respect to x is W itself, a (3,5) matrix.
But instead, x.grad is the rows of W summed together.
How can we get the full Jacobian matrix as x.grad?
Thanks!!