Casadi - Pytorch Callback. Adding 'get_reverse' for automatic backwards differentiation


Bruno M

May 16, 2020, 7:45:19 AM5/16/20
to CasADi
Hi all,

I'm working on a PyTorch callback. The following code works well:

import casadi as ca
import torch
import torch.nn.functional as F
import numpy as np

class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(Net, self).__init__()
        self.input = torch.nn.Linear(n_feature, n_hidden)    # input layer
        self.hidden = torch.nn.Linear(n_hidden, n_hidden)    # hidden layer
        self.predict = torch.nn.Linear(n_hidden, n_output)   # output layer
        self.device = torch.device("cpu")
        self.dtype = torch.float
        self.n_feature = n_feature
        self.n_output = n_output

    def forward(self, x):
        # activation function for the hidden layers
        x = F.relu(self.input(x))
        x = F.relu(self.hidden(x))
        # linear output, here r is the network output
        r = self.predict(x)
        return r

net = Net(n_feature=2, n_hidden=10, n_output=1)

class NetCallback(ca.Callback):
    def __init__(self, name, net, opts={}):
        ca.Callback.__init__(self)
        self.net = net
        self.construct(name, opts)

    def get_n_in(self): return 1

    def get_n_out(self): return 1

    def get_sparsity_in(self, i):
        return ca.Sparsity.dense(self.net.n_feature, 1)

    def get_sparsity_out(self, i):
        return ca.Sparsity.dense(self.net.n_output, 1)

    def eval(self, arg):
        arg0 = torch.tensor(np.array(arg[0]).T, device=self.net.device, dtype=self.net.dtype)
        return [ca.DM(self.net(arg0).detach().numpy())]

w = ca.MX.sym('w', 2)
casadi_net = NetCallback('test', net, {"enable_fd": True})

prob = {'f': casadi_net(w), 'x': w, }
options = {"ipopt": {"hessian_approximation": "limited-memory"}}
solver = ca.nlpsol('solver', 'ipopt', prob, options)
sol = solver(x0=ca.DM([1, 3]))
w_opt = sol['x'].full().flatten()

But now I want to take advantage of the backward differentiation that PyTorch provides. I am looking at this blog post (https://web.casadi.org/blog/tensorflow/) and trying to do the same thing.

PyTorch computes the backward AD with the method '.backward()'. But it does so on PyTorch tensors, which, as far as I understand, are not symbolic like TensorFlow graphs. So I don't know how to do it.

I know that get_reverse should return a CasADi Function for the AD. Do you have an idea how I can proceed?

def get_reverse(self, nadj, name, inames, onames, opts):

    self.net.forward()
    adj_seed = [torch.Tensor]  # ???? here I am stuck on how I should define the seeds

    nominal_in = self.mx_in()
    nominal_out = self.mx_out()
    adj_seed = self.mx_out()
    return ca.Function(name, nominal_in + nominal_out + adj_seed,
                       ca.callback.call(nominal_in + adj_seed), inames, onames)



Joris Gillis

May 16, 2020, 8:32:31 AM5/16/20
to CasADi
Dear Bruno,

The trick is to return an instance of another Callback.
It should be fairly easy to create an abstraction on this whole Callback thing where you simply provide numerical evaluation and numerical derivatives.

I'm open to syntax suggestions.

Best regards,
Joris

Bruno M

May 16, 2020, 12:12:40 PM5/16/20
to CasADi
Hi Joris,

I see, like in the tensorflow case. Ok I'll have a look and let you know tomorrow.

Thanks!

Bruno M

May 17, 2020, 4:19:05 AM5/17/20
to CasADi
Hi Joris,

here is where I am at the moment. I am testing the PytorchEvaluator alone by trying to solve an NLP:


class PytorchEvaluator(ca.Callback):
    def __init__(self, t_in, t_out, opts={}):
        """
        t_in: list of inputs (pytorch tensors)
        t_out: list of outputs (pytorch tensors)
        """
        ca.Callback.__init__(self)
        assert isinstance(t_in, list)
        self.t_in = t_in
        assert isinstance(t_out, list)
        self.t_out = t_out
        self.construct("PytorchEvaluator", opts)
        self.refs = []

    def get_n_in(self): return len(self.t_in)

    def get_n_out(self): return len(self.t_out)

    def get_sparsity_in(self, i):
        return ca.Sparsity.dense(*list(self.t_in[i].size()))

    def get_sparsity_out(self, i):
        return ca.Sparsity.dense(*list(self.t_out[i].size()))

    def eval(self, arg):
        # arg0 = torch.tensor(np.array(arg[0]).T, device=torch.device('cpu'), dtype=torch.float)
        return [ca.DM(arg0.detach().numpy()) for arg0 in self.t_out]

    # PyTorch's autograd offers just the reverse mode AD
    def has_reverse(self, nadj): return nadj == 1

    def get_reverse(self, nadj, name, inames, onames, opts):
        # Construct PyTorch tensors for the reverse seeds
        adj_seed = [torch.ones(self.sparsity_out(i).shape[0], dtype=torch.float,
                               device=torch.device("cpu"), requires_grad=True)
                    for i in range(self.n_out())]

        # Compute the gradients of the outputs w.r.t. the inputs
        for i, t_out in enumerate(self.t_out):
            t_out.backward(torch.ones(self.t_in[i].size()))

        out = [t_in.grad for t_in in self.t_in]

        # Create another PytorchEvaluator object for the adjoint
        # callback = PytorchEvaluator(self.t_in + adj_seed, self.t_out.backward())
        callback = PytorchEvaluator(self.t_in + adj_seed, out)
        # Make sure you keep a reference to it
        self.refs.append(callback)

        # Package it in the nominal_in + nominal_out + adj_seed form that CasADi expects
        nominal_in = self.mx_in()
        nominal_out = self.mx_out()
        adj_seed = self.mx_out()
        return ca.Function(name, nominal_in + nominal_out + adj_seed,
                           callback.call(nominal_in + adj_seed), inames, onames)
        # return ca.Function(name, nominal_in, callback.call(nominal_in))


x = torch.tensor([4], dtype=torch.float, device=torch.device('cpu'), requires_grad=True)
y = (x - 2) ** 2
evaluator = PytorchEvaluator([x], [y])

w = ca.MX.sym('w')

prob = {'f': evaluator.call([w])[0], 'x': w}
solver = ca.nlpsol('solver', 'ipopt', prob)
sol = solver(x0=ca.DM([1]))
print(sol['x'])

I think I am almost getting there. Now I am stuck on the following problem. As I understand it, I need to provide the numerical gradients; this is done in the list 'out'.
The first time get_reverse() is called, everything works fine. But when I create the function

ca.Function(name, nominal_in + nominal_out + adj_seed, callback.call(nominal_in + adj_seed),
                   inames, onames)

It seems that get_reverse() is called a second time. This causes an error, since I lose the gradient graph after computing t_in.grad the first time.

Why is get_reverse() called twice?


Here is the error I get

RuntimeError: .../casadi/core/function_internal.cpp:144: Error calling IpoptInterface::init for 'solver':
Error in Function::factory for 'nlp' [MXFunction] at .../casadi/core/function.cpp:1634:
Failed to create nlp_hess_l:[x, p, lam:f, lam:g]->[hess:gamma:x:x] with {"gamma": [f, g]}:
.../casadi/core/factory.hpp:387: Hessian generation failed:
Error in MX::hessian at .../casadi/core/mx.cpp:1679:
Error in MX::jacobian at .../casadi/core/mx.cpp:1663:
Error in XFunction::jac for 'helper_jacobian_MX' [MXFunction] at .../casadi/core/x_function.hpp:716:
Error in MXFunction::ad_forward at .../casadi/core/mx_function.cpp:831:
Error in MX::ad_forward for node of type N6casadi4CallE at .../casadi/core/mx.cpp:2030:
Error in Call::ad_forward for 'adj1_PytorchEvaluator' [MXFunction] at .../casadi/core/casadi_call.cpp:123:
Error in Function::forward for 'adj1_PytorchEvaluator' [MXFunction] at .../casadi/core/function.cpp:1017:
Error in XFunction::get_forward for 'adj1_PytorchEvaluator' [MXFunction] at .../casadi/core/x_function.hpp:763:
Error in MXFunction::ad_forward at .../casadi/core/mx_function.cpp:831:
Error in MX::ad_forward for node of type N6casadi4CallE at .../casadi/core/mx.cpp:2030:
Error in Call::ad_forward for 'PytorchEvaluator' [CallbackInternal] at .../casadi/core/casadi_call.cpp:123:
Error in Function::jacobian for 'wrap_PytorchEvaluator' [MXFunction] at .../casadi/core/function.cpp:824:
Error in XFunction::get_jacobian for 'wrap_PytorchEvaluator' [MXFunction] at .../casadi/core/x_function.hpp:888:
Error in XFunction::jac for 'flattened_jac_wrap_PytorchEvaluator' [MXFunction] at .../casadi/core/x_function.hpp:716:
Error in MXFunction::ad_reverse at .../casadi/core/mx_function.cpp:1042:
Error in MX::ad_reverse for node of type N6casadi4CallE at .../casadi/core/mx.cpp:2039:
Error in Call::ad_reverse for 'PytorchEvaluator' [CallbackInternal] at .../casadi/core/casadi_call.cpp:147:
Error in Function::reverse for 'PytorchEvaluator' [CallbackInternal] at .../casadi/core/function.cpp:1025:
.../casadi/core/callback_internal.cpp:153: Error calling "get_reverse" for object PytorchEvaluator:
.../casadi/build/swig/casadiPYTHON_wrap.cxx:3798: element 0 of tensors does not require grad and does not have a grad_fn
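For reference, the failure in the last line of that trace can be reproduced in plain PyTorch, independent of CasADi (a small illustration, not code from the thread): a gradient obtained with a plain backward() carries no graph of its own, so it cannot be differentiated a second time, while torch.autograd.grad with create_graph=True keeps it differentiable:

```python
import torch

# A gradient from a plain backward() carries no graph of its own,
# so a second reverse sweep over it has nothing to differentiate
x = torch.tensor([4.0], requires_grad=True)
y = (x - 2) ** 2
y.backward()
g = x.grad
# g.requires_grad is False and g.grad_fn is None: exactly the
# "does not require grad and does not have a grad_fn" situation

# create_graph=True keeps the gradient differentiable, so a
# reverse-over-reverse sweep (a second derivative) is possible
x2 = torch.tensor([4.0], requires_grad=True)
y2 = (x2 - 2) ** 2
(g2,) = torch.autograd.grad(y2, x2, create_graph=True)
(h,) = torch.autograd.grad(g2, x2)  # d2y/dx2 = 2
```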

Joris Gillis

May 17, 2020, 4:28:05 AM5/17/20
to CasADi
Our default for Ipopt is 'hessian_approximation' = 'exact'.
You could either switch to 'limited-memory', or create the reverse PytorchEvaluator with 'enable_fd' to use finite differences for the second derivative.

I suspect that you could get an infinitely differentiable PyTorchEvaluator too with a bit more effort.

Best,
Joris

Bruno M

May 20, 2020, 5:02:31 AM5/20/20
to CasADi
Hi Joris,

ok. I'm working on it. I'll make the code available when I am done.

By the way, what is the nadj argument that is passed to the reverse() and forward() methods? The documentation says

Get a function that calculates nadj adjoint derivatives.

but what does it mean? The number of variables used for the derivatives, ordered from the first one?

Joris Gillis

May 20, 2020, 5:06:49 AM5/20/20
to CasADi
nadj is used to lump together a series of adjoint sweeps. This may be beneficial for evaluation speed in some cases.
You may also ignore it and just specify that nadj = 1 is the maximum your Callback supports.
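In plain linear-algebra terms (an illustrative sketch with NumPy, not CasADi code): for f: R^n -> R^m with Jacobian J, one adjoint sweep maps a seed v in R^m to J^T v, and nadj sweeps are simply several seeds handled in a single call:

```python
import numpy as np

# For f: R^n -> R^m with Jacobian J (m x n), one adjoint sweep maps a
# seed v in R^m to J^T v in R^n. "nadj" sweeps are several seeds
# lumped into one call, i.e. J^T V for an m x nadj seed matrix V.
J = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])        # m = 2, n = 3

v1 = np.array([1.0, 0.0])              # one sweep per seed ...
v2 = np.array([0.0, 1.0])
one_at_a_time = np.stack([J.T @ v1, J.T @ v2], axis=1)

V = np.stack([v1, v2], axis=1)         # ... or both lumped (nadj = 2)
lumped = J.T @ V
```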

Bruno M

May 21, 2020, 7:40:42 AM5/21/20
to CasADi
Hi Joris,

I learned that PyTorch builds and evaluates the graph immediately and cannot postpone evaluation the way TensorFlow does with placeholders. So for a function y = f(x) I need to give a value for x, and f(x) will be computed immediately. This means it is hard to build a callback that, given for example an MX variable, transforms it into a tensor, computes f(x), transforms it back to MX, and returns it. I can't leave x free, to be filled in afterwards, because f(x) is executed with the given value as soon as I create the function.

I don't know if I can get around it somehow...
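One conceivable way around the eager evaluation (purely a sketch, not something established in this thread) is to keep the model itself rather than fixed tensors, and rebuild the graph inside every numerical evaluation with torch.autograd.grad:

```python
import torch

# Sketch of a workaround: instead of capturing fixed tensors at
# construction time, keep the function f and rebuild the graph on
# every call, so x really is free to take any numerical value.
def eval_with_grad(f, x_value):
    x = torch.tensor(x_value, dtype=torch.float, requires_grad=True)
    y = f(x)                        # graph is (re)built fresh each call
    (g,) = torch.autograd.grad(y, x)
    return y.detach().numpy(), g.detach().numpy()

f = lambda x: ((x - 2) ** 2).sum()
y1, g1 = eval_with_grad(f, [4.0])
y2, g2 = eval_with_grad(f, [1.0])   # a different x, no stale graph
```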

Adam Hall

Feb 15, 2021, 11:32:55 PM2/15/21
to CasADi
Any luck here? 
I'm trying to build off of the gpflow + casadi example to do the same with gpytorch + casadi.

br.mo...@gmail.com

Feb 22, 2021, 12:04:03 PM2/22/21
to CasADi
Hi,
no, in the end I didn't manage.


Tim Salzmann

Feb 5, 2023, 7:48:03 PM2/5/23
to CasADi
Hi,

for everyone stumbling across this thread (like I did initially).

I ended up writing a small library enabling the use of trained PyTorch models in a CasADi graph. While dense multi-layer perceptron models are supported directly, more complicated models can be used as approximations.

For more details see


and
