Casadi - Pytorch Callback. Adding 'get_reverse' for automatic backwards differentiation


Bruno M

May 16, 2020, 7:45:19 AM5/16/20
to CasADi
Hi all,

I'm working on a PyTorch callback. The following code works well:

import casadi as ca
import torch
import torch.nn.functional as F
import numpy as np

class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(Net, self).__init__()
        self.input = torch.nn.Linear(n_feature, n_hidden)    # input layer
        self.hidden = torch.nn.Linear(n_hidden, n_hidden)    # hidden layer
        self.predict = torch.nn.Linear(n_hidden, n_output)   # output layer
        self.device = torch.device("cpu")
        self.dtype = torch.float
        self.n_feature = n_feature
        self.n_output = n_output

    def forward(self, x):
        # activation function for the hidden layers
        x = F.relu(self.input(x))
        x = F.relu(self.hidden(x))
        # linear output, here r is the network output
        r = self.predict(x)
        return r

net = Net(n_feature=2, n_hidden=10, n_output=1)

class NetCallback(ca.Callback):
    def __init__(self, name, net, opts={}):
        ca.Callback.__init__(self)
        self.net = net
        self.construct(name, opts)

    def get_n_in(self): return 1

    def get_n_out(self): return 1

    def get_sparsity_in(self, i):
        return ca.Sparsity.dense(self.net.n_feature, 1)

    def get_sparsity_out(self, i):
        return ca.Sparsity.dense(self.net.n_output, 1)

    def eval(self, arg):
        arg0 = torch.tensor(np.array(arg[0]).T, device=self.net.device, dtype=self.net.dtype)
        return [ca.DM(self.net(arg0).detach().numpy())]

w = ca.MX.sym('w', 2)
casadi_net = NetCallback('test', net, {"enable_fd": True})

prob = {'f': casadi_net(w), 'x': w, }
options = {"ipopt": {"hessian_approximation": "limited-memory"}}
solver = ca.nlpsol('solver', 'ipopt', prob, options)
sol = solver(x0=ca.DM([1, 3]))
w_opt = sol['x'].full().flatten()

But now I want to take advantage of the backward differentiation that PyTorch provides. I am looking at this blog post (https://web.casadi.org/blog/tensorflow/) and trying to do the same thing.

PyTorch computes the backward AD with the method '.backward()'. But it does so on PyTorch tensors, which, as far as I understand, are not symbolic like TensorFlow graphs. So I don't know how to do it.

I know that get_reverse should return a CasADi Function for the AD. Do you have an idea how I can proceed?

def get_reverse(self, nadj, name, inames, onames, opts):

    self.net.forward()
    adj_seed = [torch.Tensor]  # ???? here I am stuck on how I should define the seeds

    nominal_in = self.mx_in()
    nominal_out = self.mx_out()
    adj_seed = self.mx_out()
    return ca.Function(name, nominal_in + nominal_out + adj_seed,
                       ca.callback.call(nominal_in + adj_seed), inames, onames)



Joris Gillis

May 16, 2020, 8:32:31 AM5/16/20
to CasADi
Dear Bruno,

The trick is to return an instance of another Callback.
It should be fairly easy to create an abstraction on this whole Callback thing where you simply provide numerical evaluation and numerical derivatives.

I'm open to syntax suggestions.

Best regards,
Joris

Bruno M

May 16, 2020, 12:12:40 PM5/16/20
to CasADi
Hi Joris,

I see, like in the tensorflow case. Ok I'll have a look and let you know tomorrow.

Thanks!

Bruno M

May 17, 2020, 4:19:05 AM5/17/20
to CasADi
Hi Joris,

here is where I am at the moment. I am testing the PytorchEvaluator alone by trying to solve an NLP:


class PytorchEvaluator(ca.Callback):
    def __init__(self, t_in, t_out, opts={}):
        """
        t_in: list of inputs (pytorch tensors)
        t_out: list of outputs (pytorch tensors)
        """
        ca.Callback.__init__(self)
        assert isinstance(t_in, list)
        self.t_in = t_in
        assert isinstance(t_out, list)
        self.t_out = t_out
        self.construct("PytorchEvaluator", opts)
        self.refs = []

    def get_n_in(self): return len(self.t_in)

    def get_n_out(self): return len(self.t_out)

    def get_sparsity_in(self, i):
        return ca.Sparsity.dense(*list(self.t_in[i].size()))

    def get_sparsity_out(self, i):
        return ca.Sparsity.dense(*list(self.t_out[i].size()))

    def eval(self, arg):
        # arg0 = torch.tensor(np.array(arg[0]).T, device=torch.device('cpu'), dtype=torch.float)
        return [ca.DM(arg0.detach().numpy()) for arg0 in self.t_out]

    # PyTorch's autograd offers just the reverse mode AD
    def has_reverse(self, nadj): return nadj == 1

    def get_reverse(self, nadj, name, inames, onames, opts):
        # Construct PyTorch tensors for the reverse seeds
        adj_seed = [torch.ones(self.sparsity_out(i).shape[0], dtype=torch.float,
                               device=torch.device("cpu"), requires_grad=True)
                    for i in range(self.n_out())]

        # Compute the gradients of the outputs w.r.t. the inputs
        for i, t_out in enumerate(self.t_out):
            t_out.backward(torch.ones(self.t_in[i].size()))

        out = [t_in.grad for t_in in self.t_in]

        # Create another PytorchEvaluator object for the adjoint
        # callback = PytorchEvaluator(self.t_in + adj_seed, self.t_out.backward())
        callback = PytorchEvaluator(self.t_in + adj_seed, out)
        # Make sure you keep a reference to it
        self.refs.append(callback)

        # Package it in the nominal_in + nominal_out + adj_seed form that CasADi expects
        nominal_in = self.mx_in()
        nominal_out = self.mx_out()
        adj_seed = self.mx_out()
        return ca.Function(name, nominal_in + nominal_out + adj_seed,
                           callback.call(nominal_in + adj_seed), inames, onames)
        # return ca.Function(name, nominal_in, callback.call(nominal_in))


x = torch.tensor([4], dtype=torch.float, device=torch.device('cpu'), requires_grad=True)
y = (x - 2) ** 2
evaluator = PytorchEvaluator([x], [y])

w = ca.MX.sym('w')

prob = {'f': evaluator.call([w])[0], 'x': w}
solver = ca.nlpsol('solver', 'ipopt', prob)
sol = solver(x0=ca.DM([1]))
print(sol['x'])

I think I am almost getting there. Now I am stuck on the following problem. As I understand it, I need to provide the numerical gradients; this is done in the list 'out'.
The first time get_reverse() is called, everything works fine. But when I create the function

ca.Function(name, nominal_in + nominal_out + adj_seed, callback.call(nominal_in + adj_seed),
                   inames, onames)

It seems that get_reverse() is called a second time. This causes an error, since I lose the gradient graph after computing t_in.grad the first time.

Why is get_reverse() called twice?


Here is the error I get

RuntimeError: .../casadi/core/function_internal.cpp:144: Error calling IpoptInterface::init for 'solver':
Error in Function::factory for 'nlp' [MXFunction] at .../casadi/core/function.cpp:1634:
Failed to create nlp_hess_l:[x, p, lam:f, lam:g]->[hess:gamma:x:x] with {"gamma": [f, g]}:
.../casadi/core/factory.hpp:387: Hessian generation failed:
Error in MX::hessian at .../casadi/core/mx.cpp:1679:
Error in MX::jacobian at .../casadi/core/mx.cpp:1663:
Error in XFunction::jac for 'helper_jacobian_MX' [MXFunction] at .../casadi/core/x_function.hpp:716:
Error in MXFunction::ad_forward at .../casadi/core/mx_function.cpp:831:
Error in MX::ad_forward for node of type N6casadi4CallE at .../casadi/core/mx.cpp:2030:
Error in Call::ad_forward for 'adj1_PytorchEvaluator' [MXFunction] at .../casadi/core/casadi_call.cpp:123:
Error in Function::forward for 'adj1_PytorchEvaluator' [MXFunction] at .../casadi/core/function.cpp:1017:
Error in XFunction::get_forward for 'adj1_PytorchEvaluator' [MXFunction] at .../casadi/core/x_function.hpp:763:
Error in MXFunction::ad_forward at .../casadi/core/mx_function.cpp:831:
Error in MX::ad_forward for node of type N6casadi4CallE at .../casadi/core/mx.cpp:2030:
Error in Call::ad_forward for 'PytorchEvaluator' [CallbackInternal] at .../casadi/core/casadi_call.cpp:123:
Error in Function::jacobian for 'wrap_PytorchEvaluator' [MXFunction] at .../casadi/core/function.cpp:824:
Error in XFunction::get_jacobian for 'wrap_PytorchEvaluator' [MXFunction] at .../casadi/core/x_function.hpp:888:
Error in XFunction::jac for 'flattened_jac_wrap_PytorchEvaluator' [MXFunction] at .../casadi/core/x_function.hpp:716:
Error in MXFunction::ad_reverse at .../casadi/core/mx_function.cpp:1042:
Error in MX::ad_reverse for node of type N6casadi4CallE at .../casadi/core/mx.cpp:2039:
Error in Call::ad_reverse for 'PytorchEvaluator' [CallbackInternal] at .../casadi/core/casadi_call.cpp:147:
Error in Function::reverse for 'PytorchEvaluator' [CallbackInternal] at .../casadi/core/function.cpp:1025:
.../casadi/core/callback_internal.cpp:153: Error calling "get_reverse" for object PytorchEvaluator:
.../casadi/build/swig/casadiPYTHON_wrap.cxx:3798: element 0 of tensors does not require grad and does not have a grad_fn
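For reference, the failure in the last line of that trace can be reproduced in plain PyTorch, independent of CasADi (a small illustration, not code from the thread): a gradient obtained with a plain backward() carries no graph of its own, so it cannot be differentiated a second time, while torch.autograd.grad with create_graph=True keeps it differentiable:

```python
import torch

# A gradient from a plain backward() carries no graph of its own,
# so a second reverse sweep over it has nothing to differentiate
x = torch.tensor([4.0], requires_grad=True)
y = (x - 2) ** 2
y.backward()
g = x.grad
# g.requires_grad is False and g.grad_fn is None: exactly the
# "does not require grad and does not have a grad_fn" situation

# create_graph=True keeps the gradient differentiable, so a
# reverse-over-reverse sweep (a second derivative) is possible
x2 = torch.tensor([4.0], requires_grad=True)
y2 = (x2 - 2) ** 2
(g2,) = torch.autograd.grad(y2, x2, create_graph=True)
(h,) = torch.autograd.grad(g2, x2)  # d2y/dx2 = 2
```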

Joris Gillis

May 17, 2020, 4:28:05 AM5/17/20
to CasADi
Our default for Ipopt is 'hessian_approximation' = 'exact'.
You could either switch to 'limited-memory', or create the reverse PytorchEvaluator with 'enable_fd' to use finite differences for the second derivative.

I suspect that you could get an infinitely differentiable PyTorchEvaluator too with a bit more effort.

Best,
Joris

Bruno M

May 20, 2020, 5:02:31 AM5/20/20
to CasADi
Hi Joris,

ok. I'm working on it. I'll make the code available when I am done.

By the way, what is the nadj argument that is passed to the reverse() and forward() methods? The documentation says

Get a function that calculates nadj adjoint derivatives.

but what does it mean? The number of variables used for the derivatives, ordered from the first one?

Joris Gillis

May 20, 2020, 5:06:49 AM5/20/20
to CasADi
nadj is used to lump together a series of adjoint sweeps. This may be beneficial for evaluation speed in some cases.
You may also ignore it and just specify that nadj = 1 is the maximum your Callback supports.
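In plain linear-algebra terms (an illustrative sketch with NumPy, not CasADi code): for f: R^n -> R^m with Jacobian J, one adjoint sweep maps a seed v in R^m to J^T v, and nadj sweeps are simply several seeds handled in a single call:

```python
import numpy as np

# For f: R^n -> R^m with Jacobian J (m x n), one adjoint sweep maps a
# seed v in R^m to J^T v in R^n. "nadj" sweeps are several seeds
# lumped into one call, i.e. J^T V for an m x nadj seed matrix V.
J = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])        # m = 2, n = 3

v1 = np.array([1.0, 0.0])              # one sweep per seed ...
v2 = np.array([0.0, 1.0])
one_at_a_time = np.stack([J.T @ v1, J.T @ v2], axis=1)

V = np.stack([v1, v2], axis=1)         # ... or both lumped (nadj = 2)
lumped = J.T @ V
```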

Bruno M

May 21, 2020, 7:40:42 AM5/21/20
to CasADi
Hi Joris,

I learned that PyTorch builds and evaluates the graph immediately and cannot postpone evaluation the way TensorFlow does with placeholders. So for a function y = f(x) I need to give a value for x, and f(x) will be computed immediately. This means it is hard to build a callback that, given for example an MX variable, transforms it into a tensor, computes f(x), transforms it back to MX, and returns it. I can't leave x free, to be filled in afterwards, because f(x) is executed with the given value as soon as I create the function.

I don't know if I can get around it somehow...
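One conceivable way around the eager evaluation (purely a sketch, not something established in this thread) is to keep the model itself rather than fixed tensors, and rebuild the graph inside every numerical evaluation with torch.autograd.grad:

```python
import torch

# Sketch of a workaround: instead of capturing fixed tensors at
# construction time, keep the function f and rebuild the graph on
# every call, so x really is free to take any numerical value.
def eval_with_grad(f, x_value):
    x = torch.tensor(x_value, dtype=torch.float, requires_grad=True)
    y = f(x)                        # graph is (re)built fresh each call
    (g,) = torch.autograd.grad(y, x)
    return y.detach().numpy(), g.detach().numpy()

f = lambda x: ((x - 2) ** 2).sum()
y1, g1 = eval_with_grad(f, [4.0])
y2, g2 = eval_with_grad(f, [1.0])   # a different x, no stale graph
```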

Adam Hall

Feb 15, 2021, 11:32:55 PM2/15/21
to CasADi
Any luck here? 
I'm trying to build off of the gpflow + casadi example to do the same with gpytorch + casadi.

br.mo...@gmail.com

Feb 22, 2021, 12:04:03 PM2/22/21
to CasADi
Hi,
no, in the end I didn't manage.


Tim Salzmann

Feb 5, 2023, 7:48:03 PM2/5/23
to CasADi
Hi,

for everyone stumbling across this thread (like I did initially).

I ended up writing a small library enabling the use of trained PyTorch models in a CasADi graph. While dense multi-layer perceptron models are supported directly, more complicated models can be used as approximations.

For more details see


and
