comparing population growth models

772 views
Skip to first unread message

Justin

unread,
Oct 30, 2007, 4:00:29 AM10/30/07
to beast-users
I'm comparing the Exponential, Logistic and Constant population growth
models with a Bayes factor. What argument should I use to compare
these results? So far I've tried the likelihood, Posterior and the
Coalescent. Each gives me similar results (Constant loses, Log
slightly better than Expo).

Thanks

J

alexei....@gmail.com

unread,
Nov 5, 2007, 3:38:03 AM11/5/07
to beast-users
Dear Justin,

If you are comparing different coalescent models then you are treating
the coalescent as part of the model rather than part of the prior. So
the appropriate marginal likelihood to compare between models is the
product of the tree likelihood and the coalescent likelihood (the sum
of the log tree likelihood and log coalescent likelihood). So you will
actually need to sum the likelihood and the coalescent columns of the
log file (e.g. in Excel) and then do a BF on this summed column. This
summed column will be the same as the posterior column *only* if you
haven't got any other interesting priors in the analysis (like
Jeffreys, Lognormal calibrations et cetera)... Rather than risk
including any of the priors in the marginal likelihood calculation by
mistake I would recommend summing the likelihood and coalescent in
Excel and then making a new log file with one column to calculate the
marginal likelihood of.

Note: comparing different coalescent models in this way relies on
their likelihoods being calculated to the same proportionality
constant. I believe this to be the case for all coalescent models in
BEAST. However you should note that the Bayesian skyline plot has an
extra smoothing prior that must *not* be included in the marginal
likelihood calculation -- so don't use the posterior as a shortcut for
the likelihood as I outlined above...

If you are only doing a test between exponential growth and constant
then you can simply look at the posterior distribution of the growth
rate and see if zero is contained in the 95% HPD. If it is, then you
can't really reject a constant population. Of course this is only
valid if you are using a parameterization that allows negative growth
rates (to do this you need to ensure that you are using the
exponential.growthRate parameter rather than the
exponential.doublingTime parameter and you need to make sure that the
operator on the growthRate is a randomWalkOperator so that negative
values are possible)

Cheers
Alexei

stru...@retrovirology.lu

unread,
Jan 23, 2008, 9:53:39 AM1/23/08
to alexei....@gmail.com, beast-users

Dear Beast Users,


I have a question concerning this topic in relation to the new published paper: "High Resolution Molecular Epidemiology and Evolutionary History of HIV-1 Subtypes in Albanie" (plos, january 2008):

If I understood the "bayesian selection of coalescent models" in the "materials and methods" section correctly, the coalescent model was choosen by the bayes factor calculated with the likelihoods?

Is this the right way to choose the coalescent model or should one take the approach listed in this reply of Drummond, by calculating the bayes factor of the sum "tree likelihood + coalescence"?


Best regards,
Daniel Struck
---
Laboratoire de Rétrovirologie
CRP-Santé
1A, rue Thomas Edison
L-1445 Strassen

phone: (+352) 26970 - 219
fax: (+352) 26970 - 719
web:  http://www.retrovirology.lu
e-mail: stru...@retrovirology.lu

stru...@retrovirology.lu

unread,
Jan 24, 2008, 6:12:53 AM1/24/08
to stru...@retrovirology.lu, alexei....@gmail.com, beast-users, Andrew Rambaut

Maybe I should explain why I did pose that question. I am studying a HIV-1 B/F1 recombinant (21 patients) and this would be the beast results:

        subst. model        mol. clock        chain        growth model        BF                BF_b(treelike & coalescent)
1.        gtr,gi                strict        20m        const                        -14601        -14674
2.        hky,gi                strict        20m        const                        -14621        

3.        gtr,gi                rellog        20m        const                        -14556        -14618
4.        gtr,gi                rellog        20m        log                        -14557        -14601

5.        gtr,gi                rellog        20m        skyline                -14557

(The coefficient of variation is not abutting zero in the 3.&4. analysis.)

Unfortunately the data is only sampled from 2003 to 2006 and the sequences are very close together. Although we have the advantage to have access to the near full length sequences of the 21 B/F1 recombinants.
Under the 1. analysis the trace of the substitution rate is bimodal. Interestingly this bimodal trace disappears if only one certain patient is removed. But the removal of this patient doesn't change the results of the 4. analysis including the HDP 95% intervals (mean rate, root height, popSize, growth rate, t50, ...)

Problem is also that under the 4. analysis the 95% HDP interval of the growth rate is quite large: mean 15.255, median 10.816, lower 1.633, upper 37.573.

Relating to this data I have 3 questions:
- If I look at the BSP from the 5. analysis I would graphically choose the logistic growth model?
- If I look at the BF value I would have to choose the constant growth model with a relaxed clock?
- If I look at the BF_b values I would choose the logistic growth model with a relaxed clocck in accordance with the BSP from analysis 5.?

Could it be that the sequences don't have enough resolution so that the BF values don't show the same results as the BF_b?
Is the raise in the effective population size not high enough to choose a logistic growth model?


Best regards,
Daniel Struck
ps: I have attached the pdf of the BSP from 5. analysis
5_gtr_gi_rellog_skyline.pdf

alexei....@gmail.com

unread,
Jan 24, 2008, 4:37:52 PM1/24/08
to beast-users
Hi Daniel,

The difference in what you label BF between 3 and 4 is insignificant.
It is only 1 log unit and if you take into account the error
associated with calculating the BF using the harmonic mean approach
then this result would not cause you to have a strong preference
between these models based on this result - they are essentially the
same. Basically both 3 and 4 fit the sequence data similarly well.
However the difference between BF_b in 3 and 4 is significant and
suggests that the logistic coalescent model fits the coalescent
intervals better than the constant coalescent model.

Cheers
Alexei
> > e-mail: struc...@retrovirology.lu
> 5_gtr_gi_rellog_skyline.pdf
> 4KDownload

stru...@retrovirology.lu

unread,
Jan 28, 2008, 7:29:24 AM1/28/08
to alexei....@gmail.com, beast-users

Thank you for the clarification,
Daniel
---
Laboratoire de Rétrovirologie
CRP-Santé
1A, rue Thomas Edison
L-1445 Strassen

phone: (+352) 26970 - 219
fax: (+352) 26970 - 719
web:  http://www.retrovirology.lu
e-mail: stru...@retrovirology.lu
Reply all
Reply to author
Forward
0 new messages