[ADMB Users] comparing highly parameterized models

100 views
Skip to first unread message

Fowler, Mark

unread,
May 1, 2013, 10:41:15 AM5/1/13
to us...@admb-project.org

I want to compare tweaks to stock assessment models using AIC, but am challenged by deviation parameters. I found a (draft?) PhD thesis by Ian Stewart (reviewers indicated as Ray Hilborn, Andre´ Punt and Richard Methot) that tackles the issue on pages 208-210, excerpted below. But I’m not clear on how the data (fdata) and parameter (fparameter)  scalars are derived. The data scalar, and I expect the parameter scalar, should range between 0 and 1. Anybody know more about these?

Both AICc and BIC require specification of the number of data points, an area

of some disagreement when diverse data sets are included with multiple error

assumptions in the same model. Specifically, the sample sizes of multinomial likelihoods

applied to compositional data are often tuned during model fitting to be more consistent

with observed residual error (although no tuning of multinomial sample sizes was

employed in the 2005 English sole assessment). Therefore, the number of categories has

been used as a proxy for the dimension of these data in some studies (Helu et al., 2000).

Assuming only that the likelihoods and error structures are correctly specified, the

effective dimension of these data should lie somewhere between the number of

multinomially distributed observations and the sum of the input sample sizes for all such

distributions used in the model. Therefore, for the purposes of calculating measures of

model fit, a range in the dimension of the data is explored via a multiplicative scalar

(bounded by zero and one) describing the additional dimension added through the

number of observations greater than one per multinomial component. All other data

sources, including survey indices, discard and mean weight observations are enumerated

directly for a total count of data points equal to:

D =Σ(Nindex obs,discard obs,...) + M +fdata i(ΣNmult -M) ,

where D is the dimension of the data, N is the number of individual data points, or input

sample size, M is the number of multinomially distributed length- or age-frequency

distributions, and fdata is the multiplicative data scalar. This generalized approach

allows full exploration of the role of sample size in model comparison.

209

Both metrics also require specifying the number of parameters estimated in

the model, however, the dimension of model parameters differs from many statistical

applications because many parameters (e.g., recruitment deviations) are explicitly

constrained. This means that the effective number of parameters is a function of the

relative constraint. Although Bayesian approaches for calculating the effective number of

parameters have been developed (e.g., Spiegelhalter et al., 2002), there are currently no

maximum likelihood based tools available. However, the effective number of parameters

must lie within a definable range, so an approach similar the data scalar is employed here.

As the limit as the variance (σ2) of the distribution constraining deviation parameters

goes to infinity, the number of constrained parameters is equal to the number of

deviations estimated, while as σ 2 goes to 0, there are effectively zero estimated

parameters. Another multiplicative scalar is used to describe this range:

Peff =Σ(Nnon-dev ) +f parameter i(ΣNdev ) ,

where Peff is the effective number of parameters, N is the raw number of parameters,

either deviations or non-deviation parameters, and f parameter is the multiplicative

parameter scalar. This approach allows direct exploration of the relative model weights

conditioned on both the data and parameter dimensions (via the scalars); allowing a

determination of the conditions under which alternate model weighting might occur.

To draw inference from more than one model, it is necessary to obtain the relative

model weights. First, the metrics (AICc or BIC) for all candidate models are standardized

to the relative difference between the value for each model and the minimum value for

any model considered (Burnham and Anderson, 2002):

210

i= AICci - AICcmin

The likelihood or probability of the data (D) given the model (Mi), or relative strength of

evidence for each model, is then proportional (after renormalizing) to wi:

p(D Mi )µ wi = exp(-0.5i )

Model weights were calculated for each model under values of the data and parameter

scalars ranging from 0 to 1.0. Resulting sensitivity to the assumed dimension of the data

and parameters are then jointly explored.

dave fournier

unread,
May 2, 2013, 2:15:19 PM5/2/13
to us...@admb-project.org

This is far too long to read, but we approached model selection in the
multifan cl model
using approximate Bayes factors whihc enabled us to copmoare non nested
models.

This link might have survived being copied.



http://www.google.ca/url?sa=t&rct=j&q=&esrc=s&source=web&cd=4&cad=rja&sqi=2&ved=0CEAQFjAD&url=http%3A%2F%2Fwww.soest.hawaii.edu%2Fpfrp%2Freprints%2Fmultifan.pdf&ei=lKyCUc-lMYPHiwKk_YDQDA&usg=AFQjCNEW-tR28NpujfAkDNC5BDVIumdQhA&sig2=49idpQ7NXkxhrLbIYcULfg&bvm=bv.45960087,d.cGE

_______________________________________________
Users mailing list
Us...@admb-project.org
http://lists.admb-project.org/mailman/listinfo/users

John Sibert

unread,
May 2, 2013, 3:02:36 PM5/2/13
to us...@admb-project.org
A less garbled link perhaps
http://www.soest.hawaii.edu/PFRP/reprints/multifan.pdf

John Sibert
Emeritus Researcher, SOEST
University of Hawaii at Manoa
Honolulu HI (GMT-10)
808-294-3842

Visit the ADMB project http://admb-project.org/

Fowler, Mark

unread,
May 3, 2013, 8:04:51 AM5/3/13
to John Sibert, us...@admb-project.org
Thanks, John, I've got it. Slight misunderstanding, my subject header
was inadequate. I'm okay for comparing models where the number of
parameters are easily determined. My question concerned determination of
the number of 'real' parameters represented by deviation parameters. Ian
Taylor developed an approach to approximate the number of parameters for
comparing Stock Synthesis models (the excerpt in my email), and I was
asking for details on how a couple of scalars used by Ian were derived.
Both Ian and Steve Martell responded, giving me a handle on them. I'll
need to compute DIC's to estimate the effective number of parameters.

> Mark Fowler
Population Ecology Division
> Bedford Inst of Oceanography
> Dept Fisheries & Oceans
> Dartmouth NS Canada
B2Y 4A2
Tel. (902) 426-3529
Fax (902) 426-9710
Email Mark....@dfo-mpo.gc.ca
Home Tel. (902) 461-0708
Home Email mark....@ns.sympatico.ca


Steve Martell

unread,
May 3, 2013, 9:55:45 AM5/3/13
to Fowler, Mark, <users@admb-project.org>
Hi Mark,

If you paste the following code in your GLOBALS_SECTION and run your model with the -mceval option after you've done your -mcmc run, then the on screen output will spit out the DIC and effective number of parameters.

Note that I've modified the mcmc_eval code from the source code.


void function_minimizer::mcmc_eval(void)
{
// |---------------------------------------------------------------------------|
// | Added DIC calculation. Martell, Jan 29, 2013 |
// |---------------------------------------------------------------------------|
// | DIC = pd + dbar
// | pd = dbar - dtheta (Effective number of parameters)
// | dbar = expectation of the likelihood function (average f)
// | dtheta = expectation of the parameter sample (average y)

gradient_structure::set_NO_DERIVATIVES();
initial_params::current_phase=initial_params::max_number_phases;
uistream * pifs_psave = NULL;

#if defined(USE_LAPLACE)
#endif

#if defined(USE_LAPLACE)
initial_params::set_active_random_effects();
int nvar1=initial_params::nvarcalc();
#else
int nvar1=initial_params::nvarcalc(); // get the number of active parameters
#endif
int nvar;

pifs_psave= new
uistream((char*)(ad_comm::adprogram_name + adstring(".psv")));
if (!pifs_psave || !(*pifs_psave))
{
cerr << "Error opening file "
<< (char*)(ad_comm::adprogram_name + adstring(".psv"))
<< endl;
if (pifs_psave)
{
delete pifs_psave;
pifs_psave=NULL;
return;
}
}
else
{
(*pifs_psave) >> nvar;
if (nvar!=nvar1)
{
cout << "Incorrect value for nvar in file "
<< "should be " << nvar1 << " but read " << nvar << endl;
if (pifs_psave)
{
delete pifs_psave;
pifs_psave=NULL;
}
return;
}
}

int nsamp = 0;
double sumll = 0;
independent_variables y(1,nvar);
independent_variables sumy(1,nvar);

do
{
if (pifs_psave->eof())
{
break;
}
else
{
(*pifs_psave) >> y;
sumy = sumy + y;
if (pifs_psave->eof())
{
double dbar = sumll/nsamp;
int ii=1;
y = sumy/nsamp;
initial_params::restore_all_values(y,ii);
initial_params::xinit(y);
double dtheta = 2.0 * get_monte_carlo_value(nvar,y);
double pd = dbar - dtheta;
double dic = pd + dbar;
dicValue = dic;
dicNoPar = pd;

cout<<"Number of posterior samples = "<<nsamp <<endl;
cout<<"Expectation of log-likelihood = "<<dbar <<endl;
cout<<"Expectation of theta = "<<dtheta <<endl;
cout<<"Number of estimated parameters = "<<nvar1 <<endl;
cout<<"Effective number of parameters = "<<dicNoPar <<endl;
cout<<"DIC = "<<dicValue <<endl;
break;
}
int ii=1;
initial_params::restore_all_values(y,ii);
initial_params::xinit(y);
double ll = 2.0 * get_monte_carlo_value(nvar,y);
sumll += ll;
nsamp++;
// cout<<sumy(1,3)/nsamp<<" "<<get_monte_carlo_value(nvar,y)<<endl;
}
}
while(1);
if (pifs_psave)
{
delete pifs_psave;
pifs_psave=NULL;
}
return;
}

On 2013-05-03, at 5:06 AM, "Fowler, Mark" <Mark....@dfo-mpo.gc.ca>
wrote:
| ---------------------- |
| steven martell |
| ste...@iphc.int |
| (206) 552-7683 |
| ---------------------- |




________________________________

This internet e-mail message, and any files transmitted with it, contains confidential, privileged information that is intended only for the addressee. If you have received this e-mail message in error, please call us at (206) 634-1838 collect if necessary) and ask to speak to the message sender. Nothing in this e-mail or the act of transmitting it, is to be construed as a waiver of any rights or privileges enjoyed by the sender or the International Pacific Halibut Commission pursuant to the International Organizations Immunities Act, 22 U.S.C. Sec. 288 et seq.

Fowler, Mark

unread,
May 3, 2013, 10:17:30 AM5/3/13
to Steve Martell, us...@admb-project.org
Thank you, Steve. I'll tackle that shortly (I just crashed - I don't
think ADMB and WinBUGS share toys very well).

Is this the code on the TODO list for ISCAM (safe to add to ISCAM as
well)?

Ian Taylor - NOAA Federal

unread,
May 3, 2013, 12:43:13 PM5/3/13
to Fowler, Mark, us...@admb-project.org
Credit where credit is due: it was Ian Stewart, not me, who developed the approach and responded to the previous emails.
-Ian

Fowler, Mark

unread,
May 3, 2013, 1:51:36 PM5/3/13
to Ian Taylor - NOAA Federal, us...@admb-project.org

Ouch, My apologies. One of the first responses indicated a good chance that Ian Stewart would reply, and I emulated Pavlov’s favourite subjects when an ‘Ian’ replied. BTW I got Steve Martell’s code working in my model (DIC and estimated numbers of parameters) so problem solved.

 

Mark Fowler
Population Ecology Division
Bedford Inst of Oceanography
Dept Fisheries & Oceans
Dartmouth NS Canada
B2Y 4A2
Tel. (902) 426-3529
Fax (902) 426-9710
Email Mark....@dfo-mpo.gc.ca
Home Tel. (902) 461-0708
Home Email mark....@ns.sympatico.ca

From: Ian Taylor - NOAA Federal [mailto:ian.t...@noaa.gov]

Sent: May 3, 2013 1:43 PM
To: Fowler, Mark

dave fournier

unread,
Jun 7, 2013, 5:53:51 PM6/7/13
to us...@admb-project.org
R is better.

dave fournier

unread,
Jun 7, 2013, 5:55:31 PM6/7/13
to us...@admb-project.org
R is better
Reply all
Reply to author
Forward
0 new messages