Hessians for multivariate functions?


Stephen Welch

Jun 17, 2014, 5:40:29 PM
to alg...@googlegroups.com
Hello:

I'm fitting systems of ordinary differential equations of the form dy/dt = f(y,p,t), where y is the vector of model outputs and p is a vector of parameters. To accomplish what I want to do, I need the Hessian of each component of f wrt p each time f is called.

I'm quite new to AlgoPy but I've been both googling and reading such published papers as I can find and have not come across anything exactly meeting this need.  I'm guessing that this either requires using a multi-dimensional tensor or some sort of iteration over the components of f.

I'd be interested in both forward and reverse methods of doing this.

(As an aside, I teach a grad level modeling class and am very interested in including AlgoPy next semester.)

Steve Welch

Sebastian Walter

Jun 18, 2014, 9:38:09 AM
to alg...@googlegroups.com
Hello Stephen,


On Tue, Jun 17, 2014 at 11:40 PM, Stephen Welch <stephen.middl...@gmail.com> wrote:
Hello:

I'm fitting systems of ordinary differential equations of the form dy/dt = f(y,p,t), where y is the vector of model outputs and p is a vector of parameters. To accomplish what I want to do, I need the Hessian of each component of f wrt p each time f is called.

I'm quite new to AlgoPy but I've been both googling and reading such published papers as I can find and have not come across anything exactly meeting this need.  I'm guessing that this either requires using a multi-dimensional tensor or some sort of iteration over the components of f.

What you want to do requires some additional for loops and use of more low-level driver functions.

The following example shows how to compute the 3-tensor H in the forward and the combined forward-reverse mode.


```
import algopy
import numpy

N = 3
A = numpy.random.random((N, N))

def f(x):
    return algopy.dot(A, x**2)

# forward mode
x = numpy.random.random(N)
x = algopy.UTPM.init_hessian(x)
y = f(x)

H = numpy.zeros((N, N, N))
for i in range(N):
    H[i] = algopy.UTPM.extract_hessian(N, y[i])

print(H)
H_forward = H


# reverse mode
cg = algopy.CGraph()
x = algopy.Function(numpy.random.random(N))
y = f(x)
cg.trace_off()
cg.independentFunctionList = [x]
cg.dependentFunctionList = [y]

x = algopy.UTPM.init_jacobian(numpy.random.random(N))
cg.pushforward([x])

H = numpy.zeros((N,N,N))
y_bar = algopy.UTPM(numpy.zeros(y.x.data.shape))

for i in range(N):
    y_bar[:] = 0
    y_bar[i] = 1

    cg.pullback([y_bar])
    H[i, :, :] = cg.independentFunctionList[0].xbar.data[1]

H_reverse = H

print(H_forward - H_reverse)
```
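For this particular f the 3-tensor can also be checked by hand, which makes a useful sanity test when wiring up the drivers. The following is a NumPy-only sketch (independent of AlgoPy; all names here are local to the sketch): since y_i = sum_j A[i,j] * x_j**2, the Hessian of component i is simply 2*diag(A[i,:]), and a central finite difference should agree with it.

```python
import numpy

N = 3
A = numpy.random.random((N, N))

def f(x):
    return A.dot(x**2)

# Analytic Hessians: d^2 y_i / dx_k dx_l = 2*A[i,k] if k == l, else 0
H_exact = numpy.zeros((N, N, N))
for i in range(N):
    H_exact[i] = 2.0 * numpy.diag(A[i])

# Independent check via central finite differences
x0 = numpy.random.random(N)
h = 1e-4
H_fd = numpy.zeros((N, N, N))
for k in range(N):
    for l in range(N):
        ek = numpy.eye(N)[k] * h
        el = numpy.eye(N)[l] * h
        H_fd[:, k, l] = (f(x0 + ek + el) - f(x0 + ek - el)
                         - f(x0 - ek + el) + f(x0 - ek - el)) / (4 * h**2)

assert numpy.allclose(H_exact, H_fd, atol=1e-4)
```

The same comparison can be made against the `H_forward` and `H_reverse` tensors above.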

If you need such "Hessians" on a regular basis I can add additional driver routines to AlgoPy.
 

I'd be interested in both forward and reverse methods of doing this.


If N is of moderate size, the 2nd order forward mode should be faster and easier to handle than the combined reverse mode.
So I would start with the forward mode and only switch to the reverse mode if necessary.

 
(As an aside, I teach a grad level modeling class and am very interested in including AlgoPy next semester.)


That's good news. What kind of modeling are you planning to cover?

cheers,
Sebastian 


Stephen Welch

Jun 20, 2014, 11:57:13 AM
to alg...@googlegroups.com
Hello Sebastian:

Thanks for your quick reply.  It has proved helpful but has not gotten me quite all the way there as yet, probably because I failed to put sufficient details in my first post.  

To remedy that I've written some code (below) that gives an explicit example of what we are trying to do.  It also incorporates what I was able to deduce either directly or from experiments based on your answer.  It is a tinker-toy example but I've included some comments to illustrate the scale of a "real" problem.  (The fitting operation requires solving the diffeq system between 100K and 1M times using different parameter vector estimates, which - fortunately - can be readily distributed across a cluster.)

The models that I teach/research deal with genetic network control of plant growth and development, which are often described by systems of diffeqs.  Standard ODE solving approaches are adequate, esp. when implemented as in scipy with provisions to pass extra info to the user function that computes the vector of time derivatives.  So, if we can solve this problem, it should have wide applications despite the niche aspect of my particular interests.

Toward this end, and to simplify things for students, I was thinking that once it is working, I might write a decorator for f that would do most of the work making driver calls, etc.  I'd be happy to share this if there was interest.

I think that the problem in the code that follows has to do with the dx[...] left-hand sides in the f function. I can't get it to run at all without the dx=algopy.zeros_like(p) statement, probably because the right-hand sides have an incompatible shape (the error has to do with assigning a sequence to an element). But p has already been altered by the p=algopy.UTPM.init_hessian(p) statement in myEuler. I suspect that the shape of the resulting tensor is such that extract_hessian ends up looking in the wrong place for the values it is supposed to return. The code executes, but the "Hessians" are identically zero.

Here is the code; thanks in advance for any help you can give...

#-------------------------------------------------------------------------------
# Name:        Modified Lotka-Volterra equations
# Purpose:     To test differentiating a system of diffeq's
#              wrt the parameters p=[alpha, beta, gamma, delta]
#
#              d_x0/dt = alpha**2*x0-beta**2*x1*x0
#              d_x1/dt = gamma**2*x0*x1-delta**2*x1
#
# Author:      welchsm
# Created:     19/06/2014
#-------------------------------------------------------------------------------
import algopy
import numpy as np
import matplotlib.pyplot as plt

# Function that calculates time derivatives.  The AD task is to calculate
# the partials of these time derivatives wrt to the parameter vector p.
# Although the focus here is on the Hessians, the Jacobian of f wrt p will
# ultimately also be needed.
def f(x,p,e,t):
    dx=algopy.zeros_like(p)
    dx[0] = (p[0]**2*x[0]-p[1]**2*x[1]*x[0])*e[t]
    dx[1] = (p[2]**2*x[0]*x[1]-p[3]**2*x[1])*e[t]
    return dx

# The integration routine.  Here Euler is used but this will be replaced
# by RK45. Most inputs are clear from context. "e" is additional information
# to be passed to f.  Normally *args would be used for this.
def myEuler(f,p,x0,t0,tf,dt,Hn,O,e):
    X=x0.size                        # Number of state variables (the x's)
    P=p.size                         # Number of parameters (the p's)
    tso=np.zeros(X)                  # Time integral of state variables
    TSO=np.zeros((len(O),X))         # State variables reported at desired time points
    HI=np.zeros((len(Hn),P,P))       # Time integral of Hessians
    H=np.zeros((len(O),len(Hn),P,P)) # Hessians reported at desired time points
    R=0                              # Next report number

    p=algopy.UTPM.init_hessian(p)    # We want the Hessian wrt to p

    tso=x0                           # Initial value for x
    t=t0                             # Initial value for time

    while t<tf:                      # Loop thru time interval
        dx=f(tso,p,e,t)

        # Extract state variable derivatives for this point in time
        # and perform Euler integration
        tso+=dt*dx.data[0,0,0:2]       # Why is "dx.data[0,0,0:2]" correct?
                                       # It seems messy
        t+=dt

        # Extract the Hessians for this point in time.
        # and perform Euler integration as per the Leibniz rule
        for i,sv in enumerate(Hn):
            HI[i]+=dt*algopy.UTPM.extract_hessian(P,dx[sv])

        # Check if time to record desired outputs
        if t>=O[R]:
            TSO[R]=tso
            for i in range(len(Hn)):
                H[R,i]=HI[i]
            R+=1
    # Done
    return TSO,H

def main():

    P=4              # Number of parameters to fit (typ. ca. 40)
    X=2              # Model state variables (typ. ca. 60)
    Hn=[0,1]         # State variables for which Hessians needed (typ. <=4)
    O=np.arange(0,90)# When are outputs wanted

    p=np.array([.25,.12,.10,.07])    # Parameter guesses (set each optimizer iteration)
    x0=np.array([10.,1.])            # Initial state variable values
    e=np.ones(90)                    # Make-believe scalar weather (normally 4-vectors)

                                     # Integrate diff'eqs
    TSO,H=myEuler(f,p,x0,0,89,0.1,Hn,O,e)

    # Plot diffeq results
    plt.plot(TSO[:,0],TSO[:,1])      # Phase plane plot:  x[1](t) vs. x[0](t)
    plt.show()

    """ Not yet working right - all "Hessian" values are zero """
    # Plot d^2x[1]/dp[0]dp[1] vs d^2x[0]/dp[2]dp[3] over time
    #plt.plot(H[:,1,0,1],H[:,0,2,3])
    #plt.show()

if __name__ == '__main__':
    main()
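Because each parameter enters the right-hand side quadratically, the Hessians of the two time derivatives with respect to p have a simple closed form, which gives an AlgoPy-independent cross-check for the values the script above should produce. The names below (f_rhs, H0, H1) are mine for this sketch, not from the original code:

```python
import numpy as np

def f_rhs(x, p, e, t):
    # same right-hand side as f() in the script above, but plain NumPy
    dx0 = (p[0]**2 * x[0] - p[1]**2 * x[1] * x[0]) * e[t]
    dx1 = (p[2]**2 * x[0] * x[1] - p[3]**2 * x[1]) * e[t]
    return np.array([dx0, dx1])

x = np.array([10., 1.])
p = np.array([.25, .12, .10, .07])
e = np.ones(90)
t = 0

# Each p_i appears only as p_i**2, so the Hessians wrt p are diagonal:
H0 = np.diag([2*x[0], -2*x[1]*x[0], 0., 0.]) * e[t]   # Hessian of dx0
H1 = np.diag([0., 0., 2*x[0]*x[1], -2*x[1]]) * e[t]   # Hessian of dx1

# Central finite differences in p as an independent approximation
h = 1e-4
I = np.eye(4) * h
H_fd = np.zeros((2, 4, 4))
for k in range(4):
    for l in range(4):
        H_fd[:, k, l] = (f_rhs(x, p + I[k] + I[l], e, t)
                         - f_rhs(x, p + I[k] - I[l], e, t)
                         - f_rhs(x, p - I[k] + I[l], e, t)
                         + f_rhs(x, p - I[k] - I[l], e, t)) / (4 * h**2)

assert np.allclose(H_fd[0], H0, atol=1e-5)
assert np.allclose(H_fd[1], H1, atol=1e-5)
```

If extract_hessian is reading the right slots, its output at t=0 should match H0 and H1 above.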

Stephen Welch

Jun 20, 2014, 12:34:39 PM
to alg...@googlegroups.com
Hello Sebastian:

Thank you for your quick reply.  It is helpful but hasn't gotten me all the way there, probably because I did not give enough details in my original post.  Toward that end, I have written some code (attached) that gives an explicit example of what we are trying to do.  It is a tinker-toy version but I've added some comments to show what the scale of a real problem would be.  The fitting operation will typically require integrating the system of differential equations anywhere from 100K to 1M times using a different vector of parameter estimates in each case.  Fortunately, because they are largely or totally independent of one another, these integrations can be easily distributed across a cluster.  The code is based on information from your reply and also some trial and error of my own.  

The models that I teach/research have to do with the genetic network control of plant development and growth and are often expressed as systems of ODE's.  Common ODE solvers are adequate, especially when implemented as in scipy with options to pass extra information to the user function, f, that calculates the time derivatives.  This means that if this can be made to work, there should be broad applications despite the niche aspect of my particular interests.

Toward that end and to simplify things for students, I've been thinking that once it is working, I would write a decorator for f that adds in all needed driver calls, etc.  I would be happy to share it if there was interest.

I believe the problem in the attached code has to do with the dx=algopy.zeros_like(p) statement in f.  I could not get the program to run without this because the assignment statements that follow it would fail.  The error message had to do with trying to store a sequence in a single element.  However, by this point the shape of p has already been altered by the p=algopy.UTPM.init_hessian(p) in myEuler.  I'm thinking that perhaps the resulting tensor has a shape within which extract_hessian looks in the wrong place for the answers.  The program runs to completion but the "Hessians" are all zero.

Thanks in advance for any help you can provide.

Steve Welch


Lotka_Volterra_1.py

Stephen Welch

Jun 21, 2014, 10:18:11 AM
to alg...@googlegroups.com
Progress:

The dx=algopy.zeros_like(p) was, indeed, the problem.  I replaced it with dx=2*[[]] which makes the code agnostic to the right hand sides of the two dx[...] assignment statements.  Then I figured out which two elements to pull out of the returned list to make the tso Euler step work and everything was fine.  (I also had to change which components of the two Hessians to plot - the ones I'd previously selected really were zero.)

So I've got the forward method working; see attached file.  Now, as per your suggestion, I will try to see if I can make a reverse version.

Steve Welch
Lotka_Volterra_2.py

Stephen Welch

Jun 22, 2014, 2:05:36 PM
to alg...@googlegroups.com
Hello Sebastian:

I may be misunderstanding how to go about this, but I've perhaps run up against a severe problem with using the reverse method in this application. As I said in my first email, what is needed are the partials of the elements of the vector returned by f(x,p,etc.) with respect to the elements of the vector p. In the case of the first derivatives df/dp this is just the Jacobian. In the case of the second derivatives d^2f/dpdp this is a set of Hessians, one for each element of f.

The function f is returning the time derivatives of the variables, x, in a system of ODE's.  The p vector contains the parameters of that system and "etc" is other information (eg. forcing function data) for the system.

However, unless I am somehow fooling myself, it appears that whatever values happen to be used for x and "etc" when CGraph is tracing f get "locked in" and are used when later computing the derivatives, no matter what the x and "etc" values happen to be at that point.

Is that correct?

Or am I just doing things too simplistically (or incorrectly)?

I hope so because I think there could be real advantages in using the reverse method.

Steve Welch 

Sebastian Walter

Jun 22, 2014, 3:16:36 PM
to alg...@googlegroups.com
Hi Stephen,


On Sat, Jun 21, 2014 at 4:18 PM, Stephen Welch <stephen.middl...@gmail.com> wrote:
Progress:

The dx=algopy.zeros_like(p) was, indeed, the problem.  I replaced it with dx=2*[[]] which makes the code agnostic to the right hand sides of the two dx[...] assignment statements.  

I'd recommend using:

```
def f(x,p,e,t):
    dx=algopy.zeros(p.shape, dtype=p)
    dx[0] = (p[0]**2*x[0]-p[1]**2*x[1]*x[0])*e[t]
    dx[1] = (p[2]**2*x[0]*x[1]-p[3]**2*x[1])*e[t]
    return dx
```

dx = 2*[[]] creates a plain Python list (which then ends up holding UTPM instances), but you really want dx itself to be a UTPM instance.

As for the why:
UTPM instances are mutable data structures, very much like numpy arrays.
This is also true for "scalar" UTPM instances. However, scalars in Python are **immutable**. This often results in unexpected and hard-to-track bugs.

example:

```
In [8]: def f(a):
   ...:     b = a
   ...:     b *= 3
   ...:     return a

In [9]: a = 2.
In [10]: print f(a)
2.0
In [12]: print f(algopy.UTPM.init_jacobian(a))
[[ 6.]
 [ 3.]]
```

As one can see, in the first case one obtains 2.0 as the result, but 6.0 in the UTPM case.

This is troublesome, but I'm not aware of any way to fix this deficiency of Python: it would require overloading the = operator, which is impossible in Python.

Therefore: always treat UTPM as generalizations of numpy ndarrays, not as generalizations of floats.
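The same aliasing pitfall can be reproduced with plain NumPy arrays, which also suggests the defensive pattern. This is a sketch of mine, not from the thread (f and f_safe are illustrative names):

```python
import numpy as np

def f(a):
    b = a        # alias: for mutable types, b and a are the same object
    b *= 3       # mutates arrays (and UTPM) in place; merely rebinds b for floats
    return a

assert f(2.0) == 2.0                 # float is immutable: caller unaffected
assert f(np.array([2.0]))[0] == 6.0  # ndarray mutated through the alias

# Safer pattern: copy before mutating (ndarray shown; the same idea
# applies to UTPM, which is array-like)
def f_safe(a):
    b = a.copy() if hasattr(a, "copy") else a
    b *= 3
    return a

assert f_safe(np.array([2.0]))[0] == 2.0
assert f_safe(2.0) == 2.0
```

In short: never apply in-place operators to values you do not own.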

Sebastian Walter

Jun 22, 2014, 3:40:51 PM
to alg...@googlegroups.com
Hi Stephen,


On Sun, Jun 22, 2014 at 8:05 PM, Stephen Welch <stephen.middl...@gmail.com> wrote:
Hello Sebastian:

I may be misunderstanding how to go about this, but I've perhaps run up against a severe problem with using the reverse method in this application. As I said in my first email, what is needed are the partials of the elements of the vector returned by f(x,p,etc.) with respect to the elements of the vector p. In the case of the first derivatives df/dp this is just the Jacobian. In the case of the second derivatives d^2f/dpdp this is a set of Hessians, one for each element of f.

The function f is returning the time derivatives of the variables, x, in a system of ODE's.  The p vector contains the parameters of that system and "etc" is other information (eg. forcing function data) for the system.

However, unless I am somehow fooling myself, it appears that whatever values happen to be used for x and "etc" when CGraph is tracing f get "locked in" and are used when later computing the derivatives, no matter what the x and "etc" values happen to be at that point.

Is that correct?
 

Or am I just doing things too simplistically (or incorrectly)?

No, you are absolutely right. All variables that are not algopy.Function instances enter the computation graph as constant values.

There are two possibilities:


1) regenerate the computational graph every time ``x`` changes. 
2) set ``fx = Function(x)`` and trace the function evaluation. Then your computational graph contains a unique node for ``fx``.
Before you call cg.hessian(x) you can simply change the value of the node in the computational graph using ``fx.x.data = ...``.


I hope so because I think there could be real advantages in using the reverse method.

The reverse mode only excels when ``p.size`` is much greater than 10.
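A rough back-of-the-envelope count makes this threshold concrete. This sketch assumes only the standard AD complexity results, not AlgoPy internals: second-order forward mode propagates on the order of P*(P+1)/2 second-order directions per Hessian, while forward-over-reverse needs on the order of P sweeps (each sweep being somewhat more expensive than a forward direction):

```python
# Rough cost model (assumption: textbook AD complexity, not AlgoPy specifics)
def forward_directions(P):
    # interpolation-style second-order forward mode: one direction per pair (i, j)
    return P * (P + 1) // 2

def reverse_sweeps(P):
    # forward-over-reverse: roughly one reverse sweep per parameter
    return P

for P in (4, 10, 40):
    print(P, forward_directions(P), reverse_sweeps(P))
```

For P = 4 the forward mode does 10 directions versus 4 sweeps (hardly worth the bookkeeping), while for P = 40 it is 820 versus 40, which is where the reverse mode starts to pay off despite its larger per-sweep constant.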

cheers,
Sebastian

Stephen Welch

Jun 22, 2014, 5:22:37 PM
to alg...@googlegroups.com
Hi Sebastian:

This helps a lot.  Over the next few days I'll do further experimentation with both the forward and reverse methods.  I'll incorporate your suggestions in the forward method and I've also got some ideas how to reduce the amount of computation.  

I do want to continue to look at the reverse approach because (at least in my arena) p.size is often much greater than 10 - not uncommon values can range from 20 to 40.  And x.size may go as high as 60.  (This is why the life sciences are moving rapidly into high performance computing.)  

Note, however, that these numbers suggest that the number of parameters per x component will be rather small. This means that the Jacobian of f and the Hessian of each component of f will have a great many zeros.

Does either method (forward or reverse) have an advantage over the other in this kind of situation?

Also, unlike the situation in Listing 13 of your paper (J. Computational Sci. 4:334-344, 2013), the derivatives are not needed every time step.  So that may offer some economies as well.

Best,

Steve