*BEAST: enforcing monophyly/outgroup in both species tree and gene tree


Sonja Schwartz

Apr 2, 2012, 5:25:41 PM
to beast-users
I am doing a *BEAST run with two mtDNA genes (16S and the control
region). I have ~150 individuals from 3 ingroup species and one
outgroup. In the current setup, I have the substitution rates and
clock models unlinked, but the tree linked, since these are
effectively one locus. I have a very strong prior for an outgroup and
have enforced this on my species tree. When I run the data, however,
the gene tree is not rooted the same way.

Is there a way to enforce the outgroup in the gene tree as well as the
species tree? BEAUti 1.7 allows me to set up species sets, but not
taxon sets, once I select the *BEAST option. Do I have to go in and
edit the XML directly? Is there some methodological concern that I'm
not taking into account?

Thanks,

Sonja

----------------------------------
Sonja Schwartz
PhD Candidate
Department of Environmental Science, Policy & Management
University of California, Berkeley
sonja.s...@berkeley.edu

Alexei Drummond

Apr 3, 2012, 4:24:56 PM
to beast...@googlegroups.com
Why would you want to enforce monophyly on the gene tree as well as the species tree? What sort of prior information would make you certain that there was no incomplete lineage sorting, hybridization, gene duplication or other reasons for inconsistency between gene tree and species tree?

Alexei


> --
> You received this message because you are subscribed to the Google Groups "beast-users" group.
> To post to this group, send email to beast...@googlegroups.com.
> To unsubscribe from this group, send email to beast-users...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/beast-users?hl=en.
>

Sonja Schwartz

Apr 3, 2012, 5:54:22 PM
to beast-users
Well, since I'm only running the analysis with one locus, shouldn't
the species tree and gene tree match?

To clarify, all my species themselves come out as monophyletic in both
the gene tree and the species tree. The only disagreement is in the
position of the root, since I enforced the outgroup in the
species tree in order to calibrate my tree. I have a very strong
prior on this outgroup from both the literature and from other
analyses that I've done with this same data set. This locus is super
variable, however, so I get very long branches between clades which I
think is contributing to the gene tree rooting issue.


Pip Griffin

Apr 4, 2012, 5:20:45 AM
to beast...@googlegroups.com
Hi Sonja,

maybe you need to rethink the idea of calculating a species tree based
on a single gene tree...

Pip

Sonja Schwartz

Apr 4, 2012, 12:49:07 PM
to beast-users
I have thought about this, but this is the data I have. I am only
using *BEAST because I want to use the multispecies coalescent prior.
I am actually more interested in the gene tree output from *BEAST,
which is why I want to be able to root that tree correctly.

alexei

Apr 4, 2012, 5:37:26 PM
to beast-users
On Apr 4, 9:54 am, Sonja Schwartz <sonjaschwa...@gmail.com> wrote:
> Well, since I'm only running the analysis with one locus, shouldn't
> the species tree and gene tree match?

Not necessarily. Especially if you are enforcing a particular
relationship on the species tree that doesn't match the gene tree.
Also: Bayesian methods return *distributions* not point estimates.
Considering only the point estimates can muddy the water when the
uncertainty is substantial. Perhaps the gene tree topology you believe
is the true one is actually in the 95% credible set. If so, then there
is no conflict with your prior and you can constrain both the gene
tree and species tree if you really want to.

> To clarify, all my species themselves come out as monophyletic in both
> the gene tree and the species tree.  The only disagreement is in the
> position of the root, since I enforced the outgroup in the
> species tree in order to calibrate my tree.  I have a very strong
> prior on this outgroup from both the literature and from other
> analyses that I've done with this same data set.  This locus is super
> variable, however, so I get very long branches between clades which I
> think is contributing to the gene tree rooting issue.

Having said that, if the gene tree topology is not what you strongly
expect then here are a couple of other thoughts to consider:

(1) Maybe your prior expectation should be modified by the data at
hand.
(2) Perhaps the "incorrect" rooting of the gene tree is due to model
misspecification. If that is the case perhaps you should try to
correct the model misspecification rather than forcing the topology by
constraints.

alexei

Apr 4, 2012, 5:39:55 PM
to beast-users


On Apr 4, 9:20 pm, Pip Griffin <pip.grif...@gmail.com> wrote:
> Hi Sonja,
>
> maybe you need to rethink the idea of calculating a species tree based
> on a single gene tree...

Estimating a species tree from a single gene tree is perfectly valid.
In fact, I would highly recommend it, as it will give much more
realistic assessments of the posterior clade supports (i.e. it will
correctly reduce the level of certainty on species tree groupings,
since incomplete lineage sorting may mean the species tree is
different from the gene tree, even in the face of high posterior
support for groupings in the gene tree topology).

Alexei

alexei

Apr 4, 2012, 5:47:47 PM
to beast-users


On Apr 5, 4:49 am, Sonja Schwartz <sonjaschwa...@gmail.com> wrote:
> I have thought about this, but this is the data I have.  I am only
> using *BEAST because I want to use the multispecies coalescent prior.
> I am actually more interested in the gene tree output from *BEAST,
> which is why I want to be able to root that tree correctly.

So in conclusion: you can enforce constraints on both the gene tree
and species tree if you want. However, you can't do it using BEAUti.
Instead you will have to add a taxa block, a monophyly statistic and a
boolean likelihood into the BEAST input XML for the gene tree ingroup.
It may be easiest to just use BEAUti to create a simple Yule or
coalescent analysis on your gene tree with the gene tree topology
constraint you are after, and then copy the resulting taxa block,
monophyly statistic and boolean likelihood XML from that output file
into your *BEAST XML file. As well as copying those three XML
elements (and changing their ids to avoid clashes if necessary), you
will also have to add an idref of the boolean likelihood into the
prior element of the mcmc element. If you haven't edited the XML
directly before, you may need to find a friend to help.
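
As a rough illustration of the three elements and the idref described above, the edited XML might look something like the sketch below. It follows the pattern BEAUti writes for a monophyly constraint in an ordinary (non-*BEAST) analysis. The taxon names, the ids `ingroup`, `monophyly(ingroup)` and `geneTreeConstraint`, and the `treeModel` idref are all placeholders, not values from any real file. The taxon set lists individual sequences, since it constrains the gene tree rather than the species tree.

```xml
<!-- Hypothetical taxon set: every ingroup individual in the gene tree.
     The idrefs must match sequence names already declared in the file. -->
<taxa id="ingroup">
    <taxon idref="speciesA_ind1"/>
    <taxon idref="speciesA_ind2"/>
    <taxon idref="speciesB_ind1"/>
    <!-- ... remaining ingroup individuals ... -->
</taxa>

<!-- Reports whether the taxon set is monophyletic in the gene tree.
     The treeModel idref is a placeholder: use the id of the gene tree's
     treeModel element in your own file. -->
<monophylyStatistic id="monophyly(ingroup)">
    <mrca>
        <taxa idref="ingroup"/>
    </mrca>
    <treeModel idref="treeModel"/>
</monophylyStatistic>

<!-- Turns the statistic into a hard constraint: trees that violate it
     get zero prior probability. -->
<booleanLikelihood id="geneTreeConstraint">
    <monophylyStatistic idref="monophyly(ingroup)"/>
</booleanLikelihood>

<!-- Inside the existing mcmc element, add only the idref line to the
     prior; everything else there stays as BEAUti wrote it. -->
<prior id="prior">
    <booleanLikelihood idref="geneTreeConstraint"/>
    <!-- ... existing priors ... -->
</prior>
```

Note that the starting gene tree also has to satisfy the constraint, otherwise the run is likely to stop because the initial state has zero probability.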

Cheers
Alexei

Sonja Schwartz

Apr 5, 2012, 12:57:58 AM
to beast-users
Great. I will take another look at my model and proceed
from there. Thanks for all of the advice!

-Sonja

Pip Griffin

Apr 5, 2012, 3:26:00 AM
to beast...@googlegroups.com
Thanks Sonja and Alexei for explaining the
species-tree-with-single-gene-tree idea, that makes a lot of sense.

Pip

antoine

Jul 30, 2012, 5:26:48 PM
to beast...@googlegroups.com
Hi,
I have a somewhat similar problem that I haven't figured out.
I have a 5-locus dataset with 50 terminals (one mtDNA and 4 nuDNA partitions). I want to calibrate it with 2 priors, one of them enforced as monophyletic, and compute the species tree.
From BEAUti I obtained an XML file that I had to modify so that the taxa/calibration are defined in the species tree block, as advised in a previous post.
It seems to work...

Then I wanted to set the monophyly in the species tree block.
The original from BEAUti looks like this:
    <monophylyStatistic id="monophyly(clade)">
        <mrca>
            <taxa idref="clade"/>
        </mrca>
        <treeModel idref="mtDNA.fas.treeModel"/>
    </monophylyStatistic>

It seems to arbitrarily take only one of the genes, here the mtDNA, into account.
I've tried adding a block for each gene, creating a new id and completing the MCMC block,
but I keep getting the zero probability problem.

Can someone give some advice on how to enforce monophyly for each gene tree and/or the species tree?

Cheers

Arman Bilge

Jul 31, 2012, 8:12:51 AM
to beast...@googlegroups.com
Hi Antoine,

I have a feeling that you are on the right track. If I understand correctly, you need to independently inform the random starting tree for each gene of any monophyletic constraints you have put in place to avoid the zero probability error. You can edit the starting tree XML blocks to fix that.

Regarding a monophyletic constraint over the entire species tree, I took a look at some of my own *BEAST XMLs and they seem to incorporate that feature as follows:

    <coalescentTree id="spStartingTree">
        <constrainedTaxa>
            <taxa idref="allSpecies"/>
            <tmrca monophyletic="true">
                <taxa idref="clade"/>
            </tmrca>
        </constrainedTaxa>
        <constantSize id="spInitDemo" units="substitutions">
            <populationSize>
                <parameter id="sp.popSize" value="####"/>
            </populationSize>
        </constantSize>
    </coalescentTree>

    . . .

    <tmrcaStatistic id="tmrca(clade)">
        <mrca>
            <taxa idref="clade"/>
        </mrca>
        <speciesTree idref="sptree"/>
    </tmrcaStatistic>
    <monophylyStatistic id="monophyly(clade)">
        <mrca>
            <taxa idref="clade"/>
        </mrca>
        <speciesTree idref="sptree"/>
    </monophylyStatistic>

I know I didn't do anything special to create these (I used BEAUti 1.7.1). Are you sure *BEAST is checked in BEAUti? If not, the monophyletic constraints will be created on a per-gene basis.
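
For the per-gene case Antoine asks about, one possible layout is sketched below: one monophyly statistic per gene tree, each with its own id and treeModel, all referenced from a single boolean likelihood. Only `mtDNA.fas.treeModel` comes from this thread. The nuclear treeModel id, the statistic ids, the `clade.individuals` taxon set, the taxon names and the `geneMonophyly` id are made up for illustration. The sketch assumes that a gene-tree constraint needs a taxon set of individual sequences rather than the species-level `clade` set used in the species tree block.

```xml
<!-- Hypothetical taxon set: the individual sequences belonging to the clade -->
<taxa id="clade.individuals">
    <taxon idref="ind01"/>
    <taxon idref="ind02"/>
    <!-- ... remaining individuals ... -->
</taxa>

<!-- One statistic per gene tree, each with a unique id -->
<monophylyStatistic id="monophyly(clade).mtDNA">
    <mrca>
        <taxa idref="clade.individuals"/>
    </mrca>
    <treeModel idref="mtDNA.fas.treeModel"/>
</monophylyStatistic>

<!-- nuDNA1.fas.treeModel is a placeholder for a nuclear partition's tree -->
<monophylyStatistic id="monophyly(clade).nuDNA1">
    <mrca>
        <taxa idref="clade.individuals"/>
    </mrca>
    <treeModel idref="nuDNA1.fas.treeModel"/>
</monophylyStatistic>

<!-- A single boolean likelihood can reference several statistics;
     its idref then goes into the prior element of the mcmc block. -->
<booleanLikelihood id="geneMonophyly">
    <monophylyStatistic idref="monophyly(clade).mtDNA"/>
    <monophylyStatistic idref="monophyly(clade).nuDNA1"/>
</booleanLikelihood>
```

Each gene's starting tree would presumably need the same constraint, for example through a constrainedTaxa block like the one in the species starting tree above, or the initial state will still have zero probability.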

Best,
Arman