constrain the species tree in STARBEAST?

1,032 views
Skip to first unread message

Jim McGuire

unread,
Apr 1, 2016, 2:12:53 PM4/1/16
to beast-users
Hi All,

I'm working with a reasonably large data set (162 tips representing 53 species, 1076 loci). I have used a summary 'coalescent' method (ASTRAL-II) to estimate the species tree for these data but I am also interested in estimating divergence dates and branch lengths. Thus far, I have been running STARBEAST with 50-locus subsets of my full data set, and even these reduced analyses will take a very long time to burn in. I'm hoping I might be able to expedite searches and perhaps use larger numbers of loci if I constrain the species tree to match my full data ASTRAL-II topology to reduce the scope of the search. I know that I can add monophyletic groups one-by-one via the priors panel, but I'm hoping there might be a more direct way to constrain the species tree that does not require constraining nodes to be monophyletic one at a time. Does anyone have advice on this? I suppose I should also ask the BEAST community whether they think this is a wrong-headed approach given the nature of gene tree-species tree inference.

Thanks for any advice that you might have for me!

Jim

Santiago Sánchez

unread,
Apr 2, 2016, 4:02:52 AM4/2/16
to beast...@googlegroups.com
Hi Jim,

I would also like to have an answer to your question. I'm kind of in the same situation.

Cheers,
Santiago


--
You received this message because you are subscribed to the Google Groups "beast-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beast-users...@googlegroups.com.
To post to this group, send email to beast...@googlegroups.com.
Visit this group at https://groups.google.com/group/beast-users.
For more options, visit https://groups.google.com/d/optout.



--
Santiago Sánchez-Ramírez
Environmental Genomics Group
Max Planck Institute for Evolutionary Biology
August-Thienemann-Str. 2
24306 Plön
Germany

Graham

unread,
Apr 2, 2016, 5:51:43 AM4/2/16
to beast-users
You could try STACEY. This should be faster than *BEAST. You could also try the StarBeastWithSTACEYops template.

One way of constraining the topology is to use a starting tree and omit the operators which change it. This would mean omitting CoordinatedPruneRegraft and
StaceyNodeReheight, leaving NodesNudge, FocusedNodeHeightScaler, and ThreeBranchAdjuster. I am dubious about relying on the ASTRAL-II topology.

I have put some preliminary results using STACEY on larger data sets here.
http://www.indriid.com/workingnotes2015.html

This could also be of interest.
Huw A. Ogilvie, Joseph Heled, Dong Xie, and Alexei J. Drummond. Computational performance and statistical
accuracy of *BEAST and comparisons with other methods. 2015. doi: arXiv:1506.06446.

Graham Jones

Jim McGuire

unread,
Apr 3, 2016, 11:09:51 AM4/3/16
to beast-users
Hi Graham,

Thanks for your suggestions. I will try STACEY and StarBeast with the STACEY template. Can I follow up with a few quick questions? First, I'm curious why you are dubious about the ASTRAL-II tree. I understand that it seems to be a coalescent-method in name only, but I do like that I can use all of the data at once in a way that I cannot with StarBeast without resorting to concatenation. I'd be interested in hearing how you would approach my conundrum (a phylogenetic analysis of 162-315 tips, 1076 loci, 53 species). Second, is it obvious how to incorporate a starting tree with STACEY? It was not obvious how I should do this with StarBeast (hence my post). Third, do you think STACEY would allow me to utilize more of my data for species tree estimation than StarBeast? Finally, I've also performed species delimitation analyses with these data using a BFD* and BPP v 3.0. BFD* seemed to have no discriminating power finding the max number of species in all cases. Even BPP v 3.0 seemed to be oversplitting, at least under the assumption that if STRUCTURE does not find a cluster, its probably not a legit species. I'd love to hear how STACEY has stacked up against other coalescent methods under simulation. I'll definitely try it out with my data set.

Thanks again,

Jim 

Christopher Blair

unread,
Apr 3, 2016, 11:46:55 AM4/3/16
to beast-users
Hi Jim, 

Great questions. Have you tried SVDquartets for species tree inference? May be worthwhile...

From my correspondence with Graham (and reading the STACEY paper) it sounds like STACEY is a bit faster than *BEAST and hence you should theoretically be able to utilize more of your data. However, to my knowledge you cannot yet utilize temporal calibrations in STACEY. 

Out of curiosity, how much of your data were you able to run in BPP? 

I am also curious to hear of potential issues with ASTRAL-II, as the authors of the software have fairly convincingly demonstrated that it is one of the better summary-based coalescent methods out there right now. 

Chris

--
You received this message because you are subscribed to the Google Groups "beast-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beast-users...@googlegroups.com.
To post to this group, send email to beast...@googlegroups.com.
Visit this group at https://groups.google.com/group/beast-users.
For more options, visit https://groups.google.com/d/optout.



--
Christopher Blair, Ph.D.
Assistant Professor
Department of Biological Sciences-New York City College of Technology
Ecology, Evolution and Behavior Program-Graduate Center
The City University of New York
300 Jay Street
Brooklyn, NY 11201

Jim McGuire

unread,
Apr 3, 2016, 12:13:11 PM4/3/16
to beast-users
Hi Chris,

I have not tried SVDquartets - I should give that a look.. I was able to include 500 loci in the BPP analyses, but this required that I reduce the species delimitation analyses to their smallest logical units. The clade in question has 9 currently recognized species, but three of them are in fact species assemblages comprising from 2-5 cryptic species each. Rather than consider the entire clade simultaneously, I considered each species assemblage independently. Incidentally, I worked my way up to 500 loci as I did not initially think that including so many loci would be feasible. Even with 500 loci, I ran each delimitation 10 times and then interpreted the results conservatively as each run does not return the same species inference. This was not super satisfying, but I chalk it up to species delimitation being a challenging inference problem. I followed up the species delimitation analyses with G-PhoCS analyses to assess divergence timing and migration in an effort to further validate species limits.

As for ASTRAL-II, I think the estimated topology for my group is very sensible, for what that's worth (perhaps not much).

Not being able to use temporal calibrations in STACEY would limit its value for me on the phylogeny side, though it sounds like the STACEY template can be used to fine-tune the StarBeast xml file in ways useful for me.  

Thanks for your suggestions,

Jim 

Graham

unread,
Apr 4, 2016, 3:47:14 AM4/4/16
to beast-users

I said I am dubious about relying on the ASTRAL-II topology. I don't mean you shouldn't use ASTRAL (I think you should), but it seems a shame to prevent STACEY/StarBeast from considering other topologies. I think merging the output from STACEY/StarBeast runs on subsets of loci (like http://sysbio.oxfordjournals.org/content/64/5/727) is sensible and unlikely to mislead, although it may not resolve the tree well. I think STACEY would allow you to use larger subsets.

It's not obvious how to add a starting tree. You have to edit the XML. In the init element, the default random tree can be replaced by one defined using the Newick format. It refers to a tree in the state element via the initial attribute. ( In the state element you should see a tree with ID "Tree.t:Species".) You need to replace the default starting tree with something like this.

<init id="StartingTree.t:Species" spec="beast.util.TreeParser" initial="@Tree.t:Species" IsLabelledNewick="true">
<input name="newick">
((a:3.7851027975664307E-4,c:3.7851027975664307E-4):3.367008420775328E-4,b:7.152111218341759E-4):0.0;
</input>
</init>

This should work for StarBEAST or STACEY. However, StarBEAST only has one operator for the species tree (apart from overall scaling), which changes heights and topology, so you can't omit that.

Perhaps you can tell me (by email) how to get BPP v 3.0 to converge on medium-sized data sets (with no guide tree). My attempts to compare STACEY and BPP on species delimitation got stuck at this point.

Jim McGuire

unread,
Apr 4, 2016, 10:37:41 AM4/4/16
to beast-users
Thanks for the scoop on adding a starting species tree, Graham - much appreciated! 

I'm still going to run analyses on subsets of loci wth full species tree searching - but I'm also interested in making the attempt with a fixed species tree topology. The best case scenario would be to have all of the analyses return congruent estimates..

I can't help with BPP as I ran my analyses with a guide tree. However, I will be able to let you know if STACEY can handle my data sets with hundreds of loci if you are interested. 

Cheers,

Jim  

michaelm

unread,
Apr 4, 2016, 10:43:08 AM4/4/16
to beast-users
Hi all,

as a compromise between completely constraining the topology to the ASTRAL-II topology and not constraining it at all, it might be reasonable to place topological constraints for all nodes that are well-supported in the ASTRAL-II topology (above a certain support threshold) and thus can be considered reliable. This should speed up the analysis, but would not be as fast as removing all topology operators.

Cheers,
Michael

Jim McGuire

unread,
Apr 6, 2016, 6:41:01 PM4/6/16
to beast-users
I might give this a try, Michael. It will require individually loading a lot of constraints though!

Cheers,

Jim 

Jim McGuire

unread,
Apr 7, 2016, 12:43:28 AM4/7/16
to beast-users
Hi Graham,

I attempted to implement an analysis with a fixed starting tree using your suggestions but my analysis failed with the following error message:

WARNING: no up item specified in UpDownOperator

===============================================================================

Citations for this model:


Bouckaert RR, Heled J, Kuehnert D, Vaughan TG, Wu C-H, Xie D, Suchard MA,

  Rambaut A, Drummond AJ (2014) BEAST 2: A software platform for Bayesian

  evolutionary analysis. PLoS Computational Biology 10(4): e1003537


Heled J, Drummond AJ (2012) Calibrated Tree Priors for Relaxed Phylogenetics

  and Divergence Time Estimation. Systematic Biology 61(1):138-149.


Drummond AJ, Ho SYW, Phillips MJ, Rambaut A (2006) Relaxed Phylogenetics and

  Dating with Confidence. PLoS Biol 4(5): e88


===============================================================================

Fatal exception: null


Do you have any suggestions for how to correct this problem? I deleted the CoordinatedPruneRegraft and StaceyNodeReheight operators from the xml file, which I assume is what you were suggesting since I did not see a Beauti function for deleting these operators.

Thanks,

Jim



On Monday, April 4, 2016 at 12:47:14 AM UTC-7, Graham wrote:

Graham

unread,
Apr 7, 2016, 5:20:43 AM4/7/16
to beast-users
My guess is there is something wrong your starting tree, but it's very hard to say. The leaves need to be labelled, not just 1,2,3...

Usually, when you get an error like
Fatal exception: null
there is more information about where in the code it happened. If you're sure about your tree, you can email the XML to look at.

I think you can set the operator weights to 0 in Beauti.

Graham Jones

Jim McGuire

unread,
Apr 7, 2016, 1:23:53 PM4/7/16
to beast-users
Hi Graham,

I initially had issues with my starting tree, but StarBeast threw informative error messages and when I finally got the topology issues resolved, I ended up where I am now. I went back and set the weights for the CoordinatedPruneRegraft and StaceyNodeReheight operators to zero (rather than deleting those operators from the xml file manually), but the analysis was again terminated with a Fatal exception: null error message.

Since the only part of the file that I changed is the inclusion of the Newick tree, I think the error may be in the syntax used to do so. I've pasted in below the original text for the random starting tree and then the text for the Newick tree. If anything jumps out at you, please let me know! 

Jim

NEWICK TREE:

    <init id="StartingTree.t:Species" spec="beast.util.TreeParser" initial="@Tree.t:Species" IsLabelledNewick="true">
    <input name="newick">
(fimbriatus_MP:1.0,(hennigi:1.0,(puncatus:1.0,((maculatus:1.0,(((quinquefasciatus:1.0,(maximus:1.0,mindanensis:1.0):1.0):1.0,((blanfordii:1.0,(taeniopterus:1.0,(formosus:1.0,obscurus:1.0):1.0):1.0):1.0,(melanopogon:1.0,(haematopogon:1.0,indochinensis:1.0):1.0):1.0):1.0):1.0,(dussumieri:1.0,((Aphaniotis:1.0,Japalura:1.0):1.0,((((cornutus:1.0,palawanensis:1.0):1.0,((reticulatus:1.0,cyanopterus:1.0):1.0,((guentheri:1.0,ornatus:1.0):1.0,(quadrasi:1.0,(spilopterus_Visayas:1.0,((spilopterus_Bicol:1.0,spilopterus_Luzon:1.0):1.0,(jareckii:1.0,(spilopterus_Cagayan:1.0,(Babuyan:1.0,Camiguin_Norte:1.0):1.0):1.0):1.0):1.0):1.0):1.0):1.0):1.0):1.0,((modigliani:1.0,(sumatranus:1.0,Simueleue:1.0):1.0):1.0,(volans:1.0,((boschmai_Lombok:1.0,boschmai_WSumbawa:1.0):1.0,(boschmai_ESumbawa:1.0,(boschmai_Flores_Lemb:1.0,(timoriensis:1.0):1.0):1.0):1.0):1.0):1.0):1.0):1.0,(bimaculatus:1.0,((walkeri_CC1CC2:1.0,(supriatnai:1.0,(spilonotus:1.0,(caerhulians:1.0,(biaro:1.0,iskandari:1.0):1.0):1.0):1.0):1.0):1.0,((walkeri_CC3CC4CC5:1.0,((rhytisma:1.0,lineatus:1.0):1.0,(beccarii:1.0):1.0):1.0):1.0):1.0):1.0):1.0):1.0):1.0):1.0):1.0):1.0,cristatellus:1.0):1.0):1.0):1.0);
</input>
    </init>

 
RANDOM STARTING TREE:

    <init id="RandomTree.t:Species" spec="beast.evolution.tree.RandomTree" estimate="false" initial="@Tree.t:Species" taxonset="@taxonsuperset">
        <populationModel id="ConstantPopulation0.Species" spec="ConstantPopulation">
            <parameter id="randomPopSize.Species" name="popSize">1.0</parameter>
        </populationModel>
    </init>

Jim McGuire

unread,
Apr 7, 2016, 6:45:32 PM4/7/16
to beast-users
For anyone who is following along with an interest in setting a starting tree in StarBeast, the following syntax at least allows the analysis to be initialized for my data set. This is using the StarBeast with Stacey ops template. Following Graham's advice, I set the weights for the CoordinatedPruneRegraft and StaceyNodeReheight operators to zero in the operators panel which should prevent topological changes during the search. I don't know quite yet if its truly doing what I am hoping its doing, but I should know soon enough and I'll post again to reflect what I find out. Thanks again to Graham for his guidance.

Jim 

    <init id="StartingTree.t:Species" initial="@Tree.t:Species" spec="beast.util.TreeParser" IsLabelledNewick="true">
    <input name="newick">
    ((((((cornutus:1.0,palawanensis:1.0):1.0,((reticulatus:1.0,cyanopterus:1.0):1.0,((guentheri:1.0,ornatus:1.0):1.0,(quadrasi:1.0,(spilopterus_Visayas:1.0,((spilopterus_Bicol:1.0,spilopterus_Luzon:1.0):1.0,(jareckii:1.0,(spilopterus_Cagayan:1.0,(Babuyan:1.0,Camiguin_Norte:1.0):1.0):1.0):1.0):1.0):1.0):1.0):1.0):1.0):1.0,((modigliani:1.0,(sumatranus:1.0,Simueleue:1.0):1.0):1.0,(volans:1.0,((boschmai_Lombok:1.0,boschmai_WSumbawa:1.0):1.0,(boschmai_ESumbawa:1.0,(boschmai_Flores_Lemb:1.0,timoriensis:2.0):1.0):1.0):1.0):1.0):1.0):1.0,(bimaculatus:1.0,((walkeri_CC1CC2:1.0,(supriatnai:1.0,(spilonotus:1.0,(caerhulians:1.0,(biaro:1.0,iskandari:1.0):1.0):1.0):1.0):1.0):1.0,(walkeri_CC3CC4CC5:1.0,((rhytisma:1.0,lineatus:1.0):1.0,beccarii:2.0):1.0):2.0):1.0):1.0):1.0,(dussumieri:1.0,(((quinquefasciatus:1.0,(maximus:1.0,mindanensis:1.0):1.0):1.0,((blanfordii:1.0,(taeniopterus:1.0,(formosus:1.0,obscurus:1.0):1.0):1.0):1.0,(melanopogon:1.0,(haematopogon:1.0,indochinensis:1.0):1.0):1.0):1.0):1.0,(maculatus:1.0,(cristatellus:1.0,(puncatus:1.0,(hennigi:1.0,fimbriatus_MP:2.0):1.0):1.0):1.0):1.0):1.0):1.0):0.5,(Aphaniotis:1.0,Japalura:1.0):0.5);
</input>

Jim McGuire

unread,
Apr 7, 2016, 7:51:43 PM4/7/16
to beast-users
Well, the starting tree appears to have loaded correctly, but some species tree branch swapping (of some sort) is still taking place (perhaps through node height adjustments?). The topology is not fixed. I don't know enough about STACEY's operators to determine how to proceed. Graham, if you see this and have a suggestion on how to further adjust the operators, I'd be grateful for the advice!

Cheers,

Jim   

Graham

unread,
Apr 8, 2016, 3:25:13 AM4/8/16
to beast-users
The starting tree must be ultrametric (same distance from root to each tip). I would have expected a crash with your non-ultrametric tree. I imagine something pretty crazy is going on.

Jim McGuire

unread,
Apr 8, 2016, 12:41:10 PM4/8/16
to beast-users
Yes, it pretty quickly deviates from the starting tree. It also runs very slowly. Now, after 2 million generations, the tree is very different from where we started and I suspect it will have trouble finding its way back without the standard branch swapping operators.

I'll try again with an ultrametric tree! However, this exercise is already making it pretty clear that this approach will probably ultimately fail. StarBeast finds a high quality topology pretty quickly - its the other parameters that converge very slowly and this won't change that - might even slow things down..

Jim 

Jim McGuire

unread,
Apr 8, 2016, 2:01:29 PM4/8/16
to beast-users
The ultrametric starting species tree does not improve the situation. I believe that starting gene trees are also needed as the initial gene trees are likely to be quite distant from the starting species tree and are likely driving the initial moves away from what should be an excellent species tree topology. Trying to figure out how to load starting gene trees now..

Jim  



On Friday, April 8, 2016 at 12:25:13 AM UTC-7, Graham wrote:
Reply all
Reply to author
Forward
0 new messages