[ADMB Users] non-convergence and "inner maxg = 0"

86 views
Skip to first unread message

Ian Fiske

unread,
Mar 19, 2010, 12:00:00 AM3/19/10
to us...@admb-project.org
Dear ADMBers,

Thank you for freely providing such an amazing piece of software.  I'm super excited about finally having access to all of ADMB's power and flexibility and hope to eventually incorporate it into my R package, unmarked, to replace R's optim as the engine.  Now, my question...

First, I fit a fairly complex model in ADMB and it works great.  Next, I tried to add random effects and use ADMB-RE, but ADMB-RE appears to not converge.  The oddity is that when the program prints out (via cout statements) the parameters, they appear to stabilize to the same values as in the non-RE version.  And (omitting the parms), the program prints

... [I think phase 2 starts here] ...

  Inner second time = 5.48908e+30  Inner f = -3.14161e+61

Newton raphson 1

 inner maxg = 5.48908e+28

0
 inner maxg = 0

0
 inner maxg = 0

0
 inner maxg = 0

0
 inner maxg = 0

which continues seemingly indefinitely (at least several minutes), printing "inner maxg = 0".  If the inner maximum gradient is 0, then why isn't another outer step taken or convergence declared?  Instead, it looks as if the inner optimization just repeats indefinitely.  That is, I never get to Newton raphson 2.

I appreciate any ideas as to what is going on here or why the optimizer is stuck with maxg = 0.  I'm using ADMB 9.1 on Ubuntu 9.10, 64bit.  Let me know if you could use any more details regarding the model or my system.

Thanks!
Ian



--
Ian Fiske
PhD Candidate
Department of Statistics
North Carolina State University

H. Skaug

unread,
Mar 19, 2010, 8:12:15 AM3/19/10
to Ian Fiske, us...@admb-project.org
Hi Ian,

The gradient jumps from 5.48908e+28 to 0 (exactly). That should happen
more gradually.
I suspect something is wrong with your model. See suggestions in
"3.8 Building a random effects model that works" in the Re-manual.
Also, I would start with someting really simple.

Hans

> _______________________________________________
> Users mailing list
> Us...@admb-project.org
> http://lists.admb-project.org/mailman/listinfo/users
>
>
_______________________________________________
Users mailing list
Us...@admb-project.org
http://lists.admb-project.org/mailman/listinfo/users

len...@ii.uib.no

unread,
Mar 19, 2010, 8:28:22 AM3/19/10
to us...@admb-project.org
Hi,

I am an end user of admb, not an expert, so don't necessarily take my word for it, but:

I had the same problem. In my case this was (as far as I could tell) caused by an infinite loop in the file df1b2-separable/dflocmin.cpp, line 74, which reads:

while (counter<20);

This is the end of a do...while loop. The problem is that counter never gets updated, so I just replaced it with

while (++counter<20);

..recompiled ADMB, recompiled my program, and everything was fine. By fine I mean that the program terminates --- there is probably still something wrong with the model, since the termination message now reads:

Error cannot find better value to try and get a positive definite hessian
Can't make a positive definite Hessian


Hope this helps,

-Lennart

dave fournier

unread,
Mar 19, 2010, 2:33:16 AM3/19/10
to us...@admb-project.org
The inner optimization is trying to find the maximum of the
log-likelihood function wrt the random effects parameters conditioned on
the fixed values of the other parameters. There is some numerical
problem at this point. If you post your code I can take a look at it,
and post a note on how to diagnose problems in this case.

Dave

Ian Fiske

unread,
Mar 19, 2010, 2:03:53 PM3/19/10
to us...@admb-project.org
Thanks Hans, Lennart, and Dave for your suggestions and advice.  After
poking around, I found that the problem only occurs if I initialize
the value of the objective function within the PROCEDURE_SECTION.
Specifically, if g is my objective function, mu_r is random, and y is
data, then

g = -nroute*log_sigma_mu - 0.5*norm2(mu_r)/mfexp(2*log_sigma_mu);
g -= -nobs*log_sigma_y - 0.5*norm2(y - Ey)/mfexp(2*log_sigma_y);

causes the inner maxg = 0 infinite loop, whereas

g -= -nroute*log_sigma_mu - 0.5*norm2(mu_r)/mfexp(2*log_sigma_mu);
g -= -nobs*log_sigma_y - 0.5*norm2(y - Ey)/mfexp(2*log_sigma_y);

appears to work (notice g is only decremented, not initialized). Some
examples initialize the objective function (logistic in admb-re
examples) and others do not (e.g., buscycle in admb examples). Which
is correct?

Dave, I've posted my tpl file is below as you generously requested.
Be warned, it's ugly right now -- I'm working on replacing the hideous
indexing with ragged 3d arrays.

Thanks,
Ian

DATA_SECTION

init_int nobs // Number of data points
init_int nroute // Number of routes
init_int nrouteyr // Number of unique
route*year combinations
init_int ncovs // Number of predictors
for each parameters
init_int nparams // Number of parameters
in the seasonal model
init_vector y(1,nobs) // Response vector
init_vector jd(1,nobs) // Julian day
init_vector route(1,nobs) // Vector of routes
init_vector routeyr(1,nobs)
init_vector routeyr_route(1,nrouteyr)
init_matrix X(1,nrouteyr,1,ncovs) // Design matrix of covariates

PARAMETER_SECTION

init_bounded_vector mu_pars(1,ncovs,-10,10,1)
init_bounded_vector beta_pars(1,ncovs,-10,10,1)
init_bounded_vector phi_pars(1,ncovs,-10,10,1)
init_bounded_number log_sigma_y(-5,5,1)
init_bounded_number log_sigma_mu(-5,5,2)

random_effects_vector mu_r(1,nroute,2)

vector Emu_siteyr(1,nrouteyr)
vector Ebeta_siteyr(1,nrouteyr)
vector Ephi_siteyr(1,nrouteyr)
vector mu(1,nrouteyr)
vector beta(1,nrouteyr)
vector phi(1,nrouteyr)
vector Ey(1,nobs)

objective_function_value g

GLOBALS_SECTION

#include <df1b2fun.h>

PROCEDURE_SECTION

Emu_siteyr = X * mu_pars;
Ebeta_siteyr = X * beta_pars;
Ephi_siteyr = X * phi_pars;

// Create route-yr level seasonal curve parameters
for (int i = 1; i <= nrouteyr; i++) {
mu(i) = Emu_siteyr(i) + mu_r(routeyr_route(i));
beta(i) = mfexp(Ebeta_siteyr(i));
phi(i) = mfexp(Ephi_siteyr(i));
}

// Compute expected value for an observation
for (int i = 1; i <= nobs; i++) {
dvariable z = (jd(i) - mu(routeyr(i)))/beta(routeyr(i));
Ey(i) = phi(routeyr(i)) * mfexp(-z + 1) * mfexp(-mfexp(-z));
}

g = -nroute*log_sigma_mu - 0.5*norm2(mu_r)/mfexp(2*log_sigma_mu);
g -= -nobs*log_sigma_y - 0.5*norm2(y - Ey)/mfexp(2*log_sigma_y);

REPORT_SECTION

report << mu_pars << endl;
report << beta_pars << endl;
report << phi_pars << endl;

TOP_OF_MAIN_SECTION

arrmblsize = 40000000L;
gradient_structure::set_GRADSTACK_BUFFER_SIZE(3000000);
gradient_structure::set_CMPDIF_BUFFER_SIZE(200000);
gradient_structure::set_MAX_NVAR_OFFSET(60000);

RUNTIME_SECTION

maximum_function_evaluations 500
convergence_criteria 1.e-3

dave fournier

unread,
Mar 19, 2010, 2:57:39 PM3/19/10
to us...@admb-project.org
I guess I will need the data file to be able to run this.
You can send it to me direct if you want.

--
David A. Fournier
P.O. Box 2040,
Sidney, B.C. V8l 3S3
Canada
Phone/FAX 250-655-3364
http://otter-rsch.com

dave fournier

unread,
Mar 19, 2010, 3:07:58 PM3/19/10
to us...@admb-project.org

but anyway I think the sign is wrong. I think it should be

g = nroute*log_sigma_mu + 0.5*norm2(mu_r)/mfexp(2*log_sigma_mu);
g += nobs*log_sigma_y+0.5*norm2(y - Ey)/mfexp(2*log_sigma_y);

Also it is better to handle the std dev parameterization differently.

--
David A. Fournier
P.O. Box 2040,
Sidney, B.C. V8l 3S3
Canada
Phone/FAX 250-655-3364
http://otter-rsch.com

Ian Fiske

unread,
Mar 20, 2010, 10:52:35 AM3/20/10
to us...@admb-project.org
but anyway I think the sign is wrong. I think it should be 
g = nroute*log_sigma_mu + 0.5*norm2(mu_r)/mfexp(2*log_sigma_mu); 
g += nobs*log_sigma_y+0.5*norm2(y - Ey)/mfexp(2*log_sigma_y); 

Thanks, Dave!  How embarrassing, given the emphasis on "ADMB does minimization" throughout the docs.


Also it is better to handle the std dev parameterization differently.

Is it better to put std dev inside the norm like the following?
  0.5*norm2((y - Ey)/exp(log_sigma_y))

Thanks,
Ian

Mark Maunder

unread,
Mar 20, 2010, 11:09:30 AM3/20/10
to Ian Fiske, us...@admb-project.org

 

 

I think Dave probably means his trick where you assume that the random effect is N(0,1) and when you use it you multiply it by the sd

 

Pred=mu+y*sd

g=0.5*norm2(y)

dave fournier

unread,
Mar 20, 2010, 4:27:37 AM3/20/10
to us...@admb-project.org

>I think Dave probably means his trick where you assume that th>e random


>effect is N(0,1) and when you use it you multiply it by the sd

>Pred=mu+y*sd

>g=0.5*norm2(y)

It is not really a trick. Just a different parameterization of the
problem. The point is that when sd-->0 it acts nicely. So you cant start
the model off with a small value of sd such as .05 or .01 which is
important for some more numerically difficult problems. Also you should
use the concentrated likelihood as in

init_bounded_number u(-0.5,1.5)

g = 0.5*norm2(mu_r);
dvariable r2=norm2(y - Ey);
dvariable vhat=r2*u/nobs;
g += 0.5*nobs*log(vhat) + 0.5*r2/vhat;

This replaces the numerically more difficult log_sigma
with the dimensionless quantity u whose value is known to be 1.

Also for comparison purposes you should add the log(sqrt(2*pi)) terms
to the -log-likelihood.

Finally don't tell us how good ADMB is for some problems. Tell the
ordinary users on the R list. We already know and they will never hear
it from the "experts" there.

Ian Fiske

unread,
Mar 20, 2010, 2:44:11 PM3/20/10
to us...@admb-project.org
Thanks Mark and Dave for the tips about parameterizing std dev.  My
optimization problem seems to be working, recovering parameters from
the simulated model quite well!

> Finally don't tell us how good ADMB is for some problems. Tell the ordinary users on the R list.
> We already know and they will never hear it from the "experts" there.

It seems like more people from the R crowd are getting ready to take
the leap towards trying to harness ADMB's power -- there's a proposal
for Google's Summer of Code 2010 for establishing better interfaces
between either R and either ADMB or CppAD.  The summary is at
http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2010:adinr.
It looks like they are already connected to the cppAD group -- is
anyone here in touch with them about ADMB?

Ian

Reply all
Reply to author
Forward
0 new messages