Using a Chainer module in PyTorch code

MS

Apr 6, 2018, 12:03:03 PM
to Chainer User Group
Hello,

I am trying to use a custom chainer.Function class (written by somebody else) in my PyTorch code. To this end, I intend to write a PyTorch module on top of the Chainer function, so that the Chainer function is actually called for all computations. For example, if chainerModule is a custom chainer.Function class with defined forward and backward functions, I create a PyTorch torch.autograd.Function class and a PyTorch torch.nn.Module class. The autograd.Function class's forward function calls chainerModule's forward function, and likewise for the backward. However, I observe absurd gradients returned by chainerModule's backward. Is calling chainerModule's backward in this manner inconsistent with Chainer's rules? Am I missing something by forcing the call to the backward function? If so, is there another way to use the Chainer function in PyTorch code?

I include below a bird's-eye view of my code.


class chainerModule(chainer.Function):
    def __init__(self):
        ...
    def forward(self, inputs):
        ...
    def backward(self, inputs, grad_outputs):
        ...


class PytorchFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, inputs):
        ctx.save_for_backward(inputs)
        ctx.CM = chainerModule()
        # Convert inputs from a PyTorch tensor to a CuPy array here ...
        outputs = ctx.CM(inputs)
        return outputs

    @staticmethod
    def backward(ctx, grad_outputs):
        inputs, = ctx.saved_variables
        # Convert inputs and grad_outputs from PyTorch tensors to CuPy arrays here ...
        grad = ctx.CM.backward(inputs, grad_outputs)
        # Convert grad to a PyTorch Variable here ...
        return grad


class PytorchModule(torch.nn.Module):
    def __init__(self):
        ...
    def forward(self, inputs):
        return PytorchFunction.apply(inputs)
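
For illustration only, here is a self-contained sketch of this wrapper pattern with plain NumPy standing in for both frameworks (no chainer or torch imports, so it runs anywhere); the class names, the squaring function, and the `saved` attribute are all made up for this example. The Chainer-style class exposes explicit forward/backward methods, and the wrapper calls them directly:

```python
import numpy as np

# Stand-in for a chainer.Function-style class with explicit
# forward/backward; this toy version computes y = x**2 element-wise.
class SquareFunction:
    def forward(self, inputs):
        x, = inputs
        return (x * x,)

    def backward(self, inputs, grad_outputs):
        x, = inputs
        gy, = grad_outputs
        return (2.0 * x * gy,)  # dy/dx = 2x, scaled by upstream grad

# Stand-in for the torch.autograd.Function wrapper: it stashes the
# inputs on forward and replays them into the explicit backward call.
class Wrapper:
    def __init__(self):
        self.fn = SquareFunction()

    def forward(self, x):
        self.saved = (x,)
        y, = self.fn.forward((x,))
        return y  # note: forward must return its outputs

    def backward(self, gy):
        gx, = self.fn.backward(self.saved, (gy,))
        return gx

w = Wrapper()
x = np.array([1.0, 2.0, 3.0])
y = w.forward(x)
gx = w.backward(np.ones_like(y))
print(y)   # [1. 4. 9.]
print(gx)  # [2. 4. 6.]
```

The real wrapper additionally has to convert between PyTorch tensors and CuPy arrays at each boundary, as the comments in the code above indicate.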
 

Thanks very much, and have a nice day,
Mihir

Kenta Oono

Apr 18, 2018, 11:08:57 AM
to Chainer User Group
Hi

I'm not sure this will help, but in the ordinary workflow we usually call the backward method of the variable whose gradient we want to compute (e.g. a loss value). So, how about the following procedure?

1. Hold the outputs variable from forward propagation
2. Manually set grad_outputs as the `.grad` attribute of the outputs
3. Call `outputs.backward()`
4. Extract the `.grad` attribute of the inputs
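
Sketched with a toy `Var` class that mimics Chainer's `Variable.grad` / `Variable.backward()` interface (pure NumPy, all names illustrative rather than real Chainer API), the procedure above looks roughly like:

```python
import numpy as np

# Toy stand-in for chainer.Variable: holds data, a .grad slot, and a
# backward() that pushes .grad through the recorded creator function.
class Var:
    def __init__(self, data, creator=None):
        self.data = data
        self.grad = None
        self.creator = creator  # (backward_fn, input_var) or None

    def backward(self):
        if self.creator is not None:
            backward_fn, x = self.creator
            x.grad = backward_fn(self.grad)

def square(x):  # a simple transform: y = x**2
    return Var(x.data ** 2, creator=(lambda gy: 2.0 * x.data * gy, x))

# 1. Hold the output variable from forward propagation
x = Var(np.array([1.0, 2.0, 3.0]))
y = square(x)
# 2. Manually set grad_outputs as the output's .grad attribute
y.grad = np.ones_like(y.data)
# 3. Call backward on the output
y.backward()
# 4. Extract the grad attribute of the input
print(x.grad)  # [2. 4. 6.]
```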

I think it would be good to do experiments with a Chainer function that implements a simple transform (e.g. the identity function).
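
One simple sanity check along those lines: compare the claimed analytic gradient against a central finite-difference estimate. This sketch uses plain NumPy and a made-up squaring function as the transform under test:

```python
import numpy as np

# Central finite-difference estimate of the gradient of sum(f(x) * gy)
# with respect to x, one coordinate at a time.
def numerical_grad(f, x, gy, eps=1e-6):
    g = np.zeros_like(x)
    for i in range(x.size):
        x1 = x.copy(); x1.flat[i] += eps
        x2 = x.copy(); x2.flat[i] -= eps
        g.flat[i] = np.sum((f(x1) - f(x2)) * gy) / (2 * eps)
    return g

f = lambda x: x ** 2            # transform under test
analytic = lambda x, gy: 2.0 * x * gy  # its claimed backward

x = np.array([0.5, -1.0, 2.0])
gy = np.ones_like(x)
print(np.allclose(analytic(x, gy), numerical_grad(f, x, gy)))  # True
```

If the analytic and numerical gradients disagree for a transform this simple, the problem is in the backward implementation (or in the tensor conversions) rather than in the wrapping strategy.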

Best
Kenta

On Saturday, April 7, 2018 at 1:03:03 AM UTC+9, MS wrote:

mihir.s.sa...@gmail.com

May 4, 2018, 8:33:07 AM
to Chainer User Group
Hi, 

Thanks for your reply. 
Actually, the short code I wrote above finally worked (there was another bug, which was why it wasn't working the first time). So the general strategy of explicitly calling forward and backward does seem to work.

Thanks again,
Mihir

Kenta Oono

May 7, 2018, 7:45:30 PM
to Chainer User Group
Hi

I'm glad to hear that. Thank you for using Chainer.

Best
Kenta