unlinked vs linked branchlengths parameter

Erica

May 16, 2012, 12:03:44 AM
to PartitionFinder
Hi Rob,

I have a question regarding your program PartitionFinder, which is
very neat by the way and very user friendly- thank you!
My question relates to why the results are so different between
otherwise identical runs employing either 'linked' or 'unlinked'
branchlengths. I am relatively new to phylogenetic analysis and will
admit that I do not fully understand what this parameter means for the
analysis and my final tree.

My dataset contains three mitochondrial loci (control region, ND4 with
some tRNAs on the end, and 16S).
I investigated 'all' partitioning schemes for the following data
blocks:
CR = 1-485;
ND4_codon1 = 486-1176\3;
ND4_codon2 = 487-1176\3;
ND4_codon3 = 488-1176\3;
tRNA_HSR = 1177-1354;
16S = 1355-1806;

When branchlengths are 'linked' I get the following best schemes:
For AIC/AICc:
Scheme Name : 199
Scheme lnL : -5029.30577
Scheme AIC : 10258.61154
Scheme AICc : 10270.4590473
Scheme BIC : 10808.4985134
Num params : 100
Num sites : 1806
Num subsets : 5

Subset  Best Model  Subset Partitions  Subset Sites
1       K81uf+G     CR                 1-485
2       GTR+I       16S, ND4_codon1    486-1176\3, 1355-1806
3       HKY         ND4_codon2         487-1176\3
4       TrN         ND4_codon3         488-1176\3
5       HKY+I       tRNA_HSR           1177-1354

Under BIC, the results are equivalent except that subset 5 is grouped
with subset 3.

When I run the same analysis with branchlengths 'unlinked', I get the
following best scheme:
For AIC/AICc:
Scheme Name : 94
Scheme lnL : -5097.85225
Scheme AIC : 10493.7045
Scheme AICc : 10520.6972536
Scheme BIC : 11313.0360904
Num params : 149
Num sites : 1806
Num subsets : 2

Subset  Best Model  Subset Partitions                      Subset Sites
1       TIM+G       CR, ND4_codon3                         1-485, 488-1176\3
2       TVM+G       16S, ND4_codon1, ND4_codon2, tRNA_HSR  486-1176\3, 487-1176\3, 1177-1354, 1355-1806

Under BIC, all data blocks are grouped into a single subset with model
TVM+I+G.

I intend to analyse my data with Maximum Likelihood in GARLI, as well
as in MrBayes. I am wondering why the results are so different between
the two runs, with simpler partitioning and more parameters
for the 'unlinked' run? Also, should I be implementing the 'unlinked'
partitioning scheme in programs that allow this option (e.g. MrBayes),
and the 'linked' scheme in programs that do not?
Also, GARLI has the option in partitioned analyses
'subsetspecificrates', which, if YES, allows "different subset rates"
and if NO, "branch lengths are equal". Is this parameter perhaps
equivalent to your 'branchlengths' parameter, meaning that GARLI can
allow 'unlinked' branch lengths between subsets in its calculations?

Any help with this would be greatly appreciated.

Cheers,
Erica.

Rob Lanfear

May 16, 2012, 12:49:56 AM
to partiti...@googlegroups.com
Hi Erica,

Thanks for the question - this one is quite common and it's a tricky topic if you're new to the field, so I'll try to answer in some detail here. Do let me know if anything's still unclear. 

With 'linked' branch lengths, PartitionFinder (indeed, any phylogenetics program) estimates a single set of underlying branch lengths for the tree. Each partition then has to use these branch lengths, but is also given its own 'rate multiplier' parameter, which can multiply ALL the branch lengths in the tree by a given number between 0 and something very large. Biologically, this means that we're assuming that the relative rates of evolution among lineages in the tree are constant across partitions. This assumption is often pretty reasonable, because lineage effects (like generation times, effective population sizes, and anything that affects the overall mutation rate) tend to affect most sites in the genome in similar ways. For instance, an increase in the genome-wide mutation rate will increase rates of evolution proportionally at all sites in the genome. The 'rate multiplier' allows each partition to have a different overall rate of evolution. For instance, 3rd codon sites usually evolve much quicker than 1st or 2nd sites, so the rate multipliers can account for this kind of difference (also, mitochondrial DNA tends to change much quicker than nuclear DNA in animals).

With 'unlinked' branchlengths, we're estimating an entirely independent set of branch lengths for each partition. This means that not only can the partitions have different overall rates of evolution (which is also possible with linked branch lengths), but also the relative rates of evolution among lineages can differ between partitions. For instance, with unlinked branchlengths it would be possible to account for a situation where one partition evolves quicker in mice than humans, and another evolves slower in mice than humans. With linked branch lengths it wouldn't be possible to account for this. Basically, 'unlinked' branchlengths can account for a lot more biological variation in rates of evolution, but it comes at the price of having to estimate a lot more parameters from the data.

This difference in parameters is surprisingly large. If you have a dataset with N species, and P partitions in the partitioning scheme, linked branch lengths require 2N+P-4 free parameters to be estimated (2N-3 branch lengths plus P-1 rate multipliers), but unlinked branch lengths require 2PN-3P branch lengths (2N-3 for each of the P partitions). Since the point of model selection is to balance the number of parameters being estimated with the improvement of the fit of the model to the data, these differences can have a big influence.
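
As a rough illustration of how quickly the two counts diverge, here is a minimal Python sketch. It assumes an unrooted, fully resolved tree with 2N-3 branches and, under linked branch lengths, P-1 free rate multipliers (other conventions shift the linked count by a small constant). The function names and the value N = 35 are illustrative only and are not taken from PartitionFinder.

```python
def linked_branch_params(n_species, n_partitions):
    """Branch-length parameters with linked branch lengths.

    Assumption: one shared set of 2N-3 branch lengths, plus one rate
    multiplier per partition with one of them fixed (P-1 free).
    """
    branches = 2 * n_species - 3
    free_multipliers = n_partitions - 1
    return branches + free_multipliers


def unlinked_branch_params(n_species, n_partitions):
    """Branch-length parameters with unlinked branch lengths.

    Assumption: every partition gets its own full set of 2N-3 branch lengths.
    """
    return n_partitions * (2 * n_species - 3)


if __name__ == "__main__":
    n_species = 35  # illustrative value, not stated in the thread
    for n_partitions in (1, 2, 5):
        print(
            n_partitions,
            linked_branch_params(n_species, n_partitions),
            unlinked_branch_params(n_species, n_partitions),
        )
```

Under these assumptions each extra partition adds one parameter when branch lengths are linked, but a whole new set of 2N-3 parameters when they are unlinked.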

It's difficult to know beforehand which of these assumptions will fit your data better. In your case it looks like you're using just mtDNA. Since this is a single locus, my best guess a priori would be that linked branch lengths would be better. But it's still good to check by running two analyses (one linked and one unlinked) as you have done. The short solution to your problem is that once you've run your two analyses, you just pick the scheme with the best (lowest) AIC, AICc, or BIC score (depending on which one you're using). In your case, this is the scheme you got with the 'linked' analysis.
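
For anyone who wants to check this comparison by hand, the following Python sketch recomputes the three scores from the lnL, parameter count, and number of sites reported in Erica's output. It uses the standard textbook definitions of AIC, AICc, and BIC with the number of sites as the sample size; that choice of sample size is an assumption here, and the function name is made up for illustration.

```python
import math


def information_criteria(lnl, k, n):
    """Return AIC, AICc and BIC for log-likelihood lnl, k parameters, n sites."""
    aic = 2 * k - 2 * lnl
    # Small-sample correction; requires n > k + 1.
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    bic = k * math.log(n) - 2 * lnl
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


# Values copied from the best AIC/AICc schemes quoted earlier in the thread.
linked = information_criteria(lnl=-5029.30577, k=100, n=1806)
unlinked = information_criteria(lnl=-5097.85225, k=149, n=1806)

if __name__ == "__main__":
    for name in ("AIC", "AICc", "BIC"):
        better = "linked" if linked[name] < unlinked[name] else "unlinked"
        print(f"{name}: linked={linked[name]:.4f} "
              f"unlinked={unlinked[name]:.4f} -> lower is {better}")
```

Lower is better for all three scores, so the same function can be used to compare any pair of schemes as long as both were scored on the same alignment.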

You also asked why the schemes you get are so different using the two approaches. This is because 'unlinked' models have a LOT of branch length parameters to estimate for each partition, and the AIC, AICc and BIC all penalise extra parameters in some way. So, if all these parameters aren't actually adding much biology to the model (which in your case, they don't seem to be) each extra partition 'costs' a lot in terms of extra parameters, but doesn't bring much in terms of improved likelihood. As a result, the best partitioning scheme has very few partitions, because this is the best way to minimise those information theoretic metrics. With 'linked' branch lengths, each extra partition costs a lot less in terms of extra parameters, so you can add more partitions. Which approach is best will depend on the particular dataset, but in your case the answer is clear that linked branch lengths are best.

Finally, 'linked' in PartitionFinder is equivalent to 'subsetspecificrates=1' in GARLI. GARLI doesn't support 'unlinked' branch lengths across subsets as far as I can tell (I think I might have got this wrong in the manual - I'll check and update it). So, to implement your 'linked' partitioning scheme in GARLI, set 'subsetspecificrates=1' and 'linkmodels=0' (see here: https://www.nescent.org/wg_garli/Using_partitioned_models).

'linked' branchlengths in PartitionFinder are equivalent to 'prset ratepr=variable' in MrBayes. For 'unlinked' branch lengths in MrBayes, you need the command 'unlink brlens=(all)'.
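
To show where those two MrBayes commands would sit, here is a small Python sketch that prints a skeleton MrBayes block for the 'linked' AIC/AICc scheme quoted earlier in the thread. The helper name, the charset names (subset1 to subset5), and the partition name are made up for illustration. The substitution-model commands for each subset are deliberately left out, so this is only a starting point and not a complete analysis block.

```python
# Subsets of the 'linked' AIC/AICc scheme, as site ranges from the thread.
# The subset names are hypothetical labels chosen for this sketch.
LINKED_SCHEME = [
    ("subset1", [r"1-485"]),                    # CR
    ("subset2", [r"486-1176\3", r"1355-1806"]), # ND4_codon1 + 16S
    ("subset3", [r"487-1176\3"]),               # ND4_codon2
    ("subset4", [r"488-1176\3"]),               # ND4_codon3
    ("subset5", [r"1177-1354"]),                # tRNA_HSR
]


def mrbayes_block(subsets, linked=True):
    """Build a skeleton MrBayes block (model settings per subset omitted)."""
    lines = ["begin mrbayes;"]
    for name, ranges in subsets:
        lines.append(f"  charset {name} = {' '.join(ranges)};")
    names = ", ".join(name for name, _ in subsets)
    lines.append(f"  partition scheme = {len(subsets)}: {names};")
    lines.append("  set partition=scheme;")
    if linked:
        # Shared branch lengths, one rate multiplier per subset.
        lines.append("  prset applyto=(all) ratepr=variable;")
    else:
        # Independent branch lengths for every subset.
        lines.append("  unlink brlens=(all);")
    lines.append("end;")
    return "\n".join(lines)


if __name__ == "__main__":
    print(mrbayes_block(LINKED_SCHEME, linked=True))
```

Calling the helper with linked=False swaps the rate-multiplier line for the unlink command; the subsets passed in should then be those of the scheme chosen under 'unlinked' branch lengths.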

Hope some of that helps!

Cheers,

Rob

Erica

May 16, 2012, 2:14:32 AM
to PartitionFinder
Thank you Rob,

That is a very helpful, and wonderfully clear explanation.
I've run the analyses in GARLI and the trees look great.

Thanks again,

Erica.

Rob Lanfear

Apr 20, 2016, 7:25:11 PM
to PartitionFinder, evt...@hotmail.com
Dear Sharma,

There are a couple of points of confusion to clear up here. First, the 'best' number of partitions is usually neither the most nor the least. It's usually somewhere in between. Exactly how to define what 'best' means is a topic of ongoing debate, but most people (including me) just rely on the AICc or the BIC to judge what's 'best'. I.e. the partitioning scheme with the lowest score is the 'best'.

Second, it's not sensible to judge the performance of a phylogenetic analysis on the support you get across the tree. Support values are better thought of as a parameter we are trying to estimate. Trees with higher support are not necessarily better than trees with lower support. For example, in some datasets it may well be the case that many different trees are supported by the data. In this case, the 'true' support values will be very low.

I think most people would be happy to say that using the AICc or the BIC is the best way we have right now of comparing partitioning schemes for ML analyses.

Cheers,

Rob

On Thursday, 21 April 2016 09:18:04 UTC+10, Sharma wrote:
Dear Sirs,

I found this discussion very interesting, and it made me realize how good it would be if everything about phylogenetics could be explained in biological terms in such a precise and informative way. Dr. Lanfear suggested that the best partitioning scheme has fewer partitions, and I would like to comment on that.

I recently analysed my data with linked branch lengths, using both gene partitions and codon partitions. I found that branch support and topologies stayed more or less the same in ML analyses with RAxML. Later, when I estimated per-partition branch lengths, the gene partitions gave slightly better support at some nodes, but there were no big changes and the topologies stayed the same. However, per-partition branch length estimates for the codon-partitioned dataset reduced branch support drastically, although the topologies stayed the same in most cases.

Most researchers argue that more partitions are better than fewer, so I am now a little confused. As far as branch support is concerned, my analysis agrees with Dr. Lanfear's suggestion that more partitions (i.e., codon partitions) may not be the best choice, especially with unlinked branch lengths. However, if other researchers suggest increasing the number of partitions, does that mean increasing them when branch lengths are linked and decreasing them when they are unlinked?

Best regards,