Using a Chainer module in PyTorch code

MS

Apr 6, 2018, 12:03:03 PM
to Chainer User Group
Hello,

I am trying to use a custom chainer.Function class (written by somebody else) in my PyTorch code. To this end, I intend to write a PyTorch module on top of the Chainer function, so that the Chainer function is actually called for all computations. For example, if chainerModule is a custom chainer.Function class with defined forward and backward functions, I create a PyTorch torch.autograd.Function class and a PyTorch torch.nn.Module class. The autograd.Function class's forward function calls chainerModule's forward function, and likewise for the backward. However, I observe absurd gradients returned by chainerModule's backward. Is calling chainerModule's backward in this manner inconsistent with Chainer's rules? Am I missing something by forcing the call to the backward function? If so, is there another way to use the Chainer function in PyTorch code?

I include below a bird's-eye view of my code.


class chainerModule(chainer.Function):
    def __init__(self):
        ...
    def forward(self, inputs):
        ...
    def backward(self, inputs, grad_outputs):
        ...


class PytorchFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, inputs):
        ctx.save_for_backward(inputs)
        ctx.CM = chainerModule()
        # Convert inputs from a PyTorch tensor to a CuPy array here ...
        outputs = ctx.CM(inputs)
        return outputs

    @staticmethod
    def backward(ctx, grad_outputs):
        inputs, = ctx.saved_variables
        # Convert inputs and grad_outputs from PyTorch tensors to CuPy arrays here ...
        grad = ctx.CM.backward(inputs, grad_outputs)
        # Convert grad to a PyTorch Variable here ...
        return grad


class PytorchModule(torch.nn.Module):
    def __init__(self):
        ...
    def forward(self, inputs):
        return PytorchFunction.apply(inputs)
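
For illustration only, here is a self-contained sketch of this wrapper pattern with plain NumPy standing in for both frameworks (no chainer or torch imports, so it runs anywhere); the class names, the squaring function, and the `saved` attribute are all made up for this example. The Chainer-style class exposes explicit forward/backward methods, and the wrapper calls them directly:

```python
import numpy as np

# Stand-in for a chainer.Function-style class with explicit
# forward/backward; this toy version computes y = x**2 element-wise.
class SquareFunction:
    def forward(self, inputs):
        x, = inputs
        return (x * x,)

    def backward(self, inputs, grad_outputs):
        x, = inputs
        gy, = grad_outputs
        return (2.0 * x * gy,)  # dy/dx = 2x, scaled by upstream grad

# Stand-in for the torch.autograd.Function wrapper: it stashes the
# inputs on forward and replays them into the explicit backward call.
class Wrapper:
    def __init__(self):
        self.fn = SquareFunction()

    def forward(self, x):
        self.saved = (x,)
        y, = self.fn.forward((x,))
        return y  # note: forward must return its outputs

    def backward(self, gy):
        gx, = self.fn.backward(self.saved, (gy,))
        return gx

w = Wrapper()
x = np.array([1.0, 2.0, 3.0])
y = w.forward(x)
gx = w.backward(np.ones_like(y))
print(y)   # [1. 4. 9.]
print(gx)  # [2. 4. 6.]
```

The real wrapper additionally has to convert between PyTorch tensors and CuPy arrays at each boundary, as the comments in the code above indicate.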
 

Thanks very much, and have a nice day,
Mihir

Kenta Oono

Apr 18, 2018, 11:08:57 AM
to Chainer User Group
Hi

I'm not sure this will help, but in the ordinary workflow we usually call the backward method of the variable whose gradient we want to compute (e.g. a loss value). So, how about the following procedure?

1. Hold the outputs variable from forward propagation
2. Manually set grad_outputs as the `.grad` attribute of the outputs
3. Call `outputs.backward()`
4. Extract the `.grad` attribute of the inputs
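
Sketched with a toy `Var` class that mimics Chainer's `Variable.grad` / `Variable.backward()` interface (pure NumPy, all names illustrative rather than real Chainer API), the procedure above looks roughly like:

```python
import numpy as np

# Toy stand-in for chainer.Variable: holds data, a .grad slot, and a
# backward() that pushes .grad through the recorded creator function.
class Var:
    def __init__(self, data, creator=None):
        self.data = data
        self.grad = None
        self.creator = creator  # (backward_fn, input_var) or None

    def backward(self):
        if self.creator is not None:
            backward_fn, x = self.creator
            x.grad = backward_fn(self.grad)

def square(x):  # a simple transform: y = x**2
    return Var(x.data ** 2, creator=(lambda gy: 2.0 * x.data * gy, x))

# 1. Hold the output variable from forward propagation
x = Var(np.array([1.0, 2.0, 3.0]))
y = square(x)
# 2. Manually set grad_outputs as the output's .grad attribute
y.grad = np.ones_like(y.data)
# 3. Call backward on the output
y.backward()
# 4. Extract the grad attribute of the input
print(x.grad)  # [2. 4. 6.]
```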

I think it would be good to do experiments with a Chainer function that implements a simple transform (e.g. the identity function).
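
One simple sanity check along those lines: compare the claimed analytic gradient against a central finite-difference estimate. This sketch uses plain NumPy and a made-up squaring function as the transform under test:

```python
import numpy as np

# Central finite-difference estimate of the gradient of sum(f(x) * gy)
# with respect to x, one coordinate at a time.
def numerical_grad(f, x, gy, eps=1e-6):
    g = np.zeros_like(x)
    for i in range(x.size):
        x1 = x.copy(); x1.flat[i] += eps
        x2 = x.copy(); x2.flat[i] -= eps
        g.flat[i] = np.sum((f(x1) - f(x2)) * gy) / (2 * eps)
    return g

f = lambda x: x ** 2            # transform under test
analytic = lambda x, gy: 2.0 * x * gy  # its claimed backward

x = np.array([0.5, -1.0, 2.0])
gy = np.ones_like(x)
print(np.allclose(analytic(x, gy), numerical_grad(f, x, gy)))  # True
```

If the analytic and numerical gradients disagree for a transform this simple, the problem is in the backward implementation (or in the tensor conversions) rather than in the wrapping strategy.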

Best
Kenta

On Saturday, April 7, 2018 at 1:03:03 AM UTC+9, MS wrote:

mihir.s.sa...@gmail.com

May 4, 2018, 8:33:07 AM
to Chainer User Group
Hi, 

Thanks for your reply. 
Actually, the short code I wrote above finally worked (there was another bug, which was why it wasn't working the first time). So the general strategy of explicitly calling forward and backward does seem to work.

Thanks again,
Mihir

Kenta Oono

May 7, 2018, 7:45:30 PM
to Chainer User Group
Hi

I'm glad to hear that. Thank you for using Chainer.

Best
Kenta