Do you think I should be using partition finder in my BEAST analysis?


charlot...@griffithuni.edu.au

May 26, 2015, 7:25:18 AM5/26/15
to partiti...@googlegroups.com
Hello

I initially ran a BEAST analysis of 40 COI sequences from one species, plus sequences from 7 additional species. My supervisor suggested I use PartitionFinder. I did so, and now one of my reviewers is querying my use of PartitionFinder in BEAST.

I have read that I can run the RBS package in BEAST2, which is probably more appropriate for small sample sizes like mine. However, when I queried this with my supervisor, he thought I should continue to use PartitionFinder for the substitution model, as a more complicated approach like RBS is likely to have very minimal impact on a shallow tree like mine.

I was wondering what your thoughts are on this?

Thanks

Rob Lanfear

May 26, 2015, 7:45:56 AM5/26/15
to partiti...@googlegroups.com, charlot...@griffithuni.edu.au
Hi Charlotte,

You are right. With a dataset of your size, there is absolutely no reason to use PartitionFinder with BEAST. I'll explain why in as much detail as I can.

First, PartitionFinder is built in a likelihood framework, not a Bayesian framework. That matters. Primarily it matters because PartitionFinder tries to find the 'best' model of sequence evolution, assuming you are forced to use a single model. This is not what you should be doing in a Bayesian framework - you should be integrating over all possible models, just as the RBS package does in BEAST.
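To make the idea of integrating over models concrete, here is a toy sketch (not from any real analysis - the model probabilities and estimates are made up purely for illustration). Instead of picking the single best model and conditioning on it, Bayesian model averaging weights each model's estimate by its posterior probability:

```python
# Toy illustration of Bayesian model averaging over two substitution models.
# All numbers below are invented for illustration only.
posterior = {"JC69": 0.3, "HKY85": 0.7}        # hypothetical posterior model probabilities
branch_length = {"JC69": 0.10, "HKY85": 0.14}  # hypothetical per-model estimates

# Model-averaged estimate: sum of each model's estimate, weighted by its
# posterior probability. Picking only HKY85 would ignore the 30% of
# posterior weight sitting on the alternative model.
averaged = sum(posterior[m] * branch_length[m] for m in posterior)
print(averaged)
```

The same logic, applied jointly to model choice and tree topology, is what lets a model-averaging approach report a posterior distribution of trees that already accounts for model uncertainty.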

Another issue is that PartitionFinder penalises models based on the number of parameters. The problem here is that PartitionFinder doesn't account for all the additional parameters that are typically estimated when you use BEAST; it assumes you are using something less parameter-rich, like RAxML or PhyML. BEAST models typically have many, many more parameters (which can interact in complex ways), including dates, rates, population sizes, tree priors, etc. For that reason, PartitionFinder does not penalise models in a way that is appropriate for use in BEAST, even if you were willing to fix on a single partitioned model.
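To see why the penalty matters, here is a minimal sketch of the BIC (one of the criteria PartitionFinder can use), with toy numbers that are purely illustrative. The penalty term grows with the number of parameters, so a criterion computed from the parameters PartitionFinder counts will differ from one that also counted BEAST's clock, date, and tree-prior parameters:

```python
import math

def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: lower is better.
    Penalty grows linearly in the number of free parameters."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood

# Toy numbers, invented for illustration only:
lnL, n_sites = -5000.0, 1000
k_ml = 9            # roughly what a criterion might count for a single GTR+G model
k_beast_extra = 50  # hypothetical extra BEAST parameters (clock rates, dates, etc.)

score_as_counted = bic(lnL, k_ml, n_sites)
score_if_full = bic(lnL, k_ml + k_beast_extra, n_sites)
print(score_as_counted, score_if_full)
```

The point is not the specific numbers but the structure: the penalty a likelihood-framework tool applies is calibrated to a parameter count that a BEAST analysis exceeds by a wide margin.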

In general, I always recommend people use the RBS package in BEAST if their dataset is small enough for that to be an option. The models in that package are the right solution to the partitioning problem. In fact, they are WAY better than any attempt to use PartitionFinder on a dataset of any size. This is because they account for uncertainty in what the 'right' model of evolution is, and they allow you to infer a posterior distribution of trees that has integrated out that uncertainty. That's a really huge advantage over fixing a single model and assuming it's correct, as you are forced to do with PartitionFinder. The only drawback, and it's a pretty big one, is that the Bayesian models are hard to mix, and so are limited to small datasets. You should be OK though.

The only reason to use PartitionFinder with BEAST is if your dataset is too large for the RBS package. And even then, you should only take the PartitionFinder model as a suggestion for a BEAST analysis. You may need to fine-tune the model of sequence evolution based on the behaviour of the MCMC. I have often observed that models from PartitionFinder are quite overparameterised for a BEAST analysis, and so fail to mix adequately.

So, I am completely on your side here. Don't use PartitionFinder if you can successfully implement the models in the RBS package.

For what it's worth, I also doubt that changing the model will impact your tree. It rarely does (we just wrote a paper about that: http://mbe.oxfordjournals.org/content/early/2015/02/05/molbev.msv026). However, it's important to use the best model, because changing the model can sometimes make a difference to the tree. In this case, the models in the RBS package are definitely better than using the best model from PartitionFinder, so if the tree changes you should believe the trees from the RBS model more than those from the PartitionFinder approach. 

Cheers,

Rob

charlot...@griffithuni.edu.au

May 26, 2015, 11:35:36 PM5/26/15
to partiti...@googlegroups.com
A fantastically clear answer
thank you

Ron

Nov 3, 2015, 10:48:59 AM11/3/15
to PartitionFinder, charlot...@griffithuni.edu.au
Hi Rob,

Thank you very much for this guidance. It is very useful. However, this still leaves the question of how to define data partitions for BEAST 2. I can install the subst-BMA package in BEAUti v2.3.x, and it should estimate the number of partitions as well as the substitution model for each partition. However, the analysis cannot currently be set up in BEAUti, and I'm not enough of an XML wizard to alter the example files to suit my purposes.

Would it be self-defeating to determine the number of partitions in PartitionFinder and then use the RB model on those partitions in BEAST 2? I have a decent-sized protein-coding DNA dataset. I could take the basic option of partitioning all my genes by codon position, but this may not be wise. How do you suggest I perform partition selection for BEAST 2? BEAST 2.3.x does have an autopartition package, but the number of partitions needs to be specified before the analysis.

Thank you for your help.

Cheers,
Ron

Rob Lanfear

Nov 4, 2015, 10:56:24 PM11/4/15
to PartitionFinder, charlot...@griffithuni.edu.au
Hi Ron,

I'm afraid I don't have good answers to your questions. It would probably be more useful to post them on the BEAST forum.

Getting the models right (or at least, good enough) is obviously an issue. My own feeling is that if you do a reasonable, pragmatic job of accounting for known sources of heterogeneity in sequence evolution, then you will usually not go too far wrong.

For example, many people still just partition by gene and codon position, and then run analyses in an MCMC framework. This is probably fine in most cases. Especially since overparameterisation is less of an issue in an MCMC framework (since we are dealing with distributions of parameters, not point estimates) than in an ML framework. In other words, in an MCMC framework you can look at your distribution of trees knowing that you have integrated out uncertainty in nuisance parameter values (like model parameters). The same is not true in an ML framework. 
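For reference, partitioning by gene and codon position is usually expressed as character sets. In a NEXUS sets block (a format PartitionFinder and many alignment tools read), it looks something like the sketch below, assuming two hypothetical genes of 300 and 450 bp (the gene names and coordinates are invented for illustration; the `\3` stride selects every third site):

```text
begin sets;
    charset gene1_pos1 = 1-300\3;
    charset gene1_pos2 = 2-300\3;
    charset gene1_pos3 = 3-300\3;
    charset gene2_pos1 = 301-750\3;
    charset gene2_pos2 = 302-750\3;
    charset gene2_pos3 = 303-750\3;
end;
```

Each charset then becomes one data partition in the downstream analysis.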

Rob




--
Rob Lanfear
School of Biological Sciences,
Macquarie University,

Ron

Nov 5, 2015, 8:57:32 AM11/5/15
to PartitionFinder, charlot...@griffithuni.edu.au
Hi Rob,

Thank you for your reply.


Cheers,
Ron