Model selection: how many free parameters with linked vs. unlinked models?

Tatyana Livshultz

May 20, 2013, 1:31:44 PM
to ra...@googlegroups.com

Hi there:

My colleague and I are puzzling over this and it isn’t just academic: which model we select has a major impact on our result.

We have a tree with 13 taxa (N), 26 data partitions (P), and we are using the GTR+G model (9 parameters, S)

How many free parameters are we estimating if we use linked models (i.e. all 26 partitions fitted to one set of branch lengths, "estimate individual per-partition branch lengths (-m)" unchecked in CIPRES) vs. unlinked models (i.e. each of the 26 partitions has its own set of branch lengths, "estimate individual per-partition branch lengths (-m)" checked in CIPRES)?

 The formulas we’ve come up with are:

 Linked model (2N+P-2)+(P*S) 

(2*13+26-2) + (26*9)= 282 free parameters

Unlinked models  (2PN-3P)+(P*S)

 (2*26*13-3*26) +(26*9)=780 free parameters

Are these correct? Any insight or reference would be greatly appreciated.

 

Thanks,

 

Tanya

 

Tatyana Livshultz

Assistant Professor| Department of Biodiversity Earth and Environmental Sciences| Drexel University

Assistant Curator| Department of Botany| Academy of Natural Sciences

1900 Benjamin Franklin Parkway| Philadelphia, PA 19103-1101| USA

tatyana....@drexel.edu| PH: 215-299-1051

Alexandros Stamatakis

May 20, 2013, 2:09:35 PM
to ra...@googlegroups.com
Hi Tatyana,

I think your equation is almost correct. The number of branch
lengths is (2N - 3), so for linked branch lengths the count is
23 + 9 * 26 = 257, and for unlinked branch lengths it is
23 * 26 + 9 * 26 = 832.

You should apply an AIC test to compare the models.

Alexis
--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University
of Arizona at Tucson

www.exelixis-lab.org

Joseph Brown

May 20, 2013, 2:16:33 PM
to ra...@googlegroups.com
Hi Tanya. I get different numbers based on the info you provide.

For all models you mention, there will be S X P = 234 substitution parameters (SUBS).
For each set of estimated edge lengths (unrooted phylogram), there are (2 X N) - 3 = 23 EL parameters (ELP)

For the unlinked model, the formula is: (P X S) + (P X ELP)
= SUBS + (P X 23)
= 234 + (26 X 23)
= 234 + 598
= 832
This is equivalent to the formula you gave; I think there is simply a typo in computing.

For linked models, only one set of ELP is estimated, but each partition has its own relative rate parameter (such that partition-specific edge lengths are perfectly proportional). The formula thus is: (P X S) + ELP + P
= SUBS + 23 + 26
= 234 + 23 + 26
= 283
In this case, I think your formula is incorrect, despite getting very close to the answer. Whereas your formula is:
(2N+P-2)+(P*S)
it should be:
P*(2N-2)+(P*S)

HTH.
Joseph.



--
You received this message because you are subscribed to the Google Groups "raxml" group.
To unsubscribe from this group and stop receiving emails from it, send an email to raxml+un...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Tatyana Livshultz

May 20, 2013, 2:23:38 PM
to ra...@googlegroups.com
Thank you!

Tanya


Joseph Brown

May 20, 2013, 2:31:14 PM
to ra...@googlegroups.com
Erg. I mistyped the bottom formula (the one above it is correct). The linked formula should not be P*(2N-2)+(P*S), but instead:

P+ELP+SUBS
= P+(2N-3)+(P*S)
= 283

Note that this is different from what Alexis provided by exactly P. It appears he is not counting relative rate parameters.
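
The three counts discussed in this thread can be reproduced with a short Python sketch. The function name `count_free_parameters` and the scheme labels are made up for illustration; they are not RAxML or CIPRES options.

```python
def count_free_parameters(n_taxa, n_partitions, subst_params, scheme):
    """Count free parameters for a partitioned analysis on an unrooted tree.

    scheme (hypothetical labels, not program options):
      "linked"       - one shared set of edge lengths, no rate scalers
      "proportional" - one shared set plus one relative rate per partition
      "unlinked"     - an independent set of edge lengths per partition
    """
    edge_lengths = 2 * n_taxa - 3               # edges in an unrooted binary tree
    substitution = n_partitions * subst_params  # P * S
    if scheme == "linked":
        return edge_lengths + substitution
    if scheme == "proportional":
        return edge_lengths + n_partitions + substitution
    if scheme == "unlinked":
        return n_partitions * edge_lengths + substitution
    raise ValueError(f"unknown scheme: {scheme}")


for scheme in ("linked", "proportional", "unlinked"):
    print(scheme, count_free_parameters(13, 26, 9, scheme))
# linked 257
# proportional 283
# unlinked 832
```

The "linked" and "proportional" results differ by exactly P = 26, the relative rate parameters.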

HTH.
Joseph.

Tatyana Livshultz

May 20, 2013, 2:44:35 PM
to ra...@googlegroups.com
Thanks for the quick replies! You both agree on the unlinked model: 

(2N-3)*P +P*S=P*ELP+SUBS

but not on the linked model: 

Alex's formula for the linked model is

1. (2N-3) + P*S=ELP+SUBS

Joseph's is

2. (2N-3) + P + P*S=ELP + P + SUBS

1. seems more intuitive to me than 2.  But which is correct?

Thanks for your help!

Tanya

Joseph Brown

May 20, 2013, 4:32:33 PM
to ra...@googlegroups.com
Tanya,

Model 1 assumes that edge lengths for each partition are equal. This is an extreme model for partitions of varying rates, e.g. 2nd codon positions vs. 3rd codon positions, or protein-coding vs. non-coding; a pretty biologically unrealistic model. Model 2 breaks this restriction: although edge lengths must be perfectly proportional across partitions, they can nonetheless be of arbitrarily different magnitudes, say 50X different. This is an economical way to treat among-partition rate heterogeneity, as you are only adding P extra parameters to the model. At the other extreme (the unlinked EL model), you estimate distinct edge length vectors for each partition = P X (2N-3) extra parameters. In other words, a pretty statistically unrealistic model, especially if N is large.

Model 2 is the best choice of the three, in accommodating rate heterogeneity while avoiding overparameterization. This is the default behaviour for many programs that allow among-partition rate heterogeneity, e.g. GARLI or MrBayes (and, I assume, RAxML). In MrBayes you can test each EL model. For trees of 13 taxa the unlinked model may fare well, but as N gets large you need an immense increase in likelihood to justify the extra parameters. This would be the case if ELs showed absolutely no correlation across partitions (while still sharing the same topology), but this isn't observed.
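
To put rough numbers on that last point: since AIC = 2k - 2 lnL, a model with more parameters is preferred only if its log-likelihood gain exceeds the number of parameters it adds. The Python sketch below (the function name is illustrative only) counts how many parameters the unlinked model adds over the proportional model as N grows, keeping P = 26.

```python
def extra_unlinked_parameters(n_taxa, n_partitions):
    """Parameters the unlinked model adds over the proportional (scaled) model."""
    edge_lengths = 2 * n_taxa - 3
    unlinked = n_partitions * edge_lengths      # P independent edge-length sets
    proportional = edge_lengths + n_partitions  # one set plus P relative rates
    return unlinked - proportional


for n_taxa in (13, 100, 1000):
    extra = extra_unlinked_parameters(n_taxa, 26)
    # Under AIC, the unlinked model must improve lnL by more than `extra`.
    print(n_taxa, extra)
# 13 549
# 100 4899
# 1000 49899
```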

Of course, Alexis knows better than I what is going on in RAxML.

HTH.
Joseph

Tatyana Livshultz

May 20, 2013, 6:05:43 PM
to ra...@googlegroups.com
Thanks for the clarification, Joseph. I'll need to check whether RAxML uses identical edge lengths for all partitions (model 1) or just proportional edge lengths (model 2).

With only 13 taxa, AIC selects the unlinked model over either linked model (1 or 2) while BIC selects the linked model (1 or 2) over the unlinked model. In this case, I think I'll just need to discuss the results of both.
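
A hedged sketch of why the two criteria can disagree here: AIC charges 2 per parameter, while BIC charges ln(n), where n is the sample size (commonly taken to be the alignment length in sites). The alignment length is not given in this thread, so `n_sites = 10000` below is purely hypothetical, as are the function names.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 lnL (lower is better)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln(n) - 2 lnL (lower is better)."""
    return k * math.log(n) - 2 * log_likelihood


def min_lnl_gain(extra_params, n=None):
    """lnL gain the larger model needs to win: AIC if n is None, else BIC."""
    penalty = 2.0 if n is None else math.log(n)
    return extra_params * penalty / 2.0


extra = 832 - 257   # unlinked vs. linked, using the counts from this thread
n_sites = 10000     # hypothetical alignment length, not from the thread
print(min_lnl_gain(extra))           # AIC threshold
print(min_lnl_gain(extra, n_sites))  # BIC threshold, larger whenever n > e^2
```

A log-likelihood gain that falls between the two thresholds makes AIC prefer the unlinked model and BIC the linked one.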

Best regards,

Tanya

Alexandros Stamatakis

May 21, 2013, 5:41:01 AM
to ra...@googlegroups.com
Hi Tanya and Joseph,

Actually, my equation is correct: RAxML doesn't use branch length
scalers (in which case Joseph's equation would be correct). RAxML does
indeed conduct completely independent branch length estimates for each
partition.

Cheers,

Alexis