MRCA for internal node of gene tree and ESS for gene tree height

146 views
Skip to first unread message

kazee

unread,
Jan 15, 2023, 1:46:47 PM1/15/23
to beast-users
Hi All,

I have tried using multi-species coalescent model with a goal of reconstructing a gene for a single locus (as a test) from 3 species (two ingroup and one outgroup). I fixed prior for TMRCA between ingroup and outgroup. But I can't get MRCA for ingroup. Why is it so?

I have mentioned the parameter and problem below. Please let me know if anyone have any idea on it.

***********Parameters****************
1. site model - gamma category count 4, shape checked for estimate, substitution model GTR

2. gene clock model - clock rate unchecked for estimate

3. Species clock model - species tree relaxed clock, stdev 0.1 & checked for estimate and clock rate 1.0 & checked for estimate

4. Priors - Yule model 

5. Add prior  i) MRCA prior ---> Tree.t:Species
                        a. internal node prior - uniform species (A & B) as monophyletic;
                        b. root node prior - lognormal MRCA prior for species A, B, C (M = 1,  S = 5.5, offset = 42.0, and checked for mean in real space) 
                        
                      ii) MRCA prior ---> Tree.t.filesuffix (for gene tree and MRCA)
                       a. internal node prior - uniform alleles of a locus from all individuals species A & B as monophyletic
                       b. root node prior - lognormal MRCA prior for alleles of a locus from all individuals across species A, B, C (M = 1,  S = 5.5, offset = 42.0, and checked for mean in real space)

6. MCMC  - chain length 10 millions


*********** Warnings and Errors*******************

A. Warning : initial value <= 0, which is not appropriate for a LogTransform

B. Error message: Start likelihood: -Infinity after 10 initialisation attempts
                                 P(posterior) = -Infinity (was -Infinity)

                               - java.lang.RuntimeException: Could not find a proper state to initialise. Perhaps try another seed.

When I re-ran without step 5 ii) from Add prior parameter; the run initialized and completed with good ESS value for species clade. However, I want MRCA for internal node.

When I ran loganalyzer I could not find MRCA for internal node (i.e. MRCA for alleles of a locus from all individuals of species A and B. Also, ESS for gene tree total height was very low.

Is possible to increase ESS for gene tree total height as well as extract MRCA from gene tree or loganalyzer output? Please let me know.

Regards,
Kazee

Jordan Douglas

unread,
Jan 19, 2023, 7:38:44 PM1/19/23
to beast-users
Hi Kazee,

Thank you for reporting this, I am looking into it now. It seems that beast is having an issue creating an initial state around the mrcapriors. I will let you know when we have a solution here.



In the meantime, if you want to see the mrca height logged, you could generate the full xml file and then find the block of code which looks like this:
<distribution id="tres.prior" spec="beast.base.evolution.tree.MRCAPrior" monophyletic="true" tree="@Tree.t:Species">
...
</distribution>

and move it above the <run id="mcmc"...> block. This will ensure that the object still exists but it does not affect the model.


Unfortunately the gene tree  heights often mix quite slowly. If you use a strict clock they will mix slightly faster, but otherwise this is quite a general problem with the multispecies coalescent.


Thanks
Jordan

kazee

unread,
Jan 21, 2023, 2:31:46 PM1/21/23
to beast-users
Hi Jordan,

Thank you for letting me.

Is it possible at least reconstruct gene tree by just using all alleles (of a given locus) from all individuals from two species (without providing a root but only approximate divergence time)? This is because I want determine whether the gene tree for a give pair of species is reciprocally monophyletic or not? Please let me know.

Regards,
Kazee

Jordan Douglas

unread,
Jan 22, 2023, 5:42:20 PM1/22/23
to beast-users

Hi Kazee,

If you want to find the probability that two taxa form a clade, you could generate the xml with the MRCAPrior, and then edit the xml file by moving the MRCAPrior xml block above the mcmc block (as described in my previous comment above). This will ensure that the log-prior and the height of the clade are still being logged, but without affecting the actual posterior distribution / model. If the log-prior being logged is negative infinity, this means that the taxa are not monophyletic for that state (or their height is outside the bounds of the distribution). By counting the number of times the prior is greater than negative infinity, you can estimate the posterior probability of the taxa being monophyletic.


Jordan

kazee

unread,
Feb 11, 2023, 1:43:25 PM2/11/23
to beast-users
Hi Jordan,

I am following up on the problem.

There are two things I would like to mention.

First, thank you for answering my question.

Second,  I was trying to ask if I take all the alleles for a given locus from all the individuals of two species, is there a way to find out if the gene tree is reciprocally monophyletic (i.e. alleles from individuals of one species form one clade and those from another species form another clade)? The outgroup (the third species) I was trying to use failed to produce enough number of loci so I can't use the outgroup for estimating genome-wide TMRCA between alleles of two species. In the absence of outgroup, is it still possible to get the topology of gene tree from alleles of a given homologus locus in two species only and ask if the gene tree is reciprocally monophyletic?

Please let me know. It will help me to make progress on this analysis.

Regards,
Kazee

Jordan Douglas

unread,
Feb 15, 2023, 8:01:07 PM2/15/23
to beast-users
Hi Kazee,

If you assign all of the individuals of a species to the same 'species' in the species tree, then no you can't measure that - because starbeast3 will force each species to be monophyletic.

But if you assign each individual to its own 'species', then yes you can measure this by counting the number of times each group forms a clade, and dividing by the number of posterior samples. This will require some postprocessing, and I am not aware of any existing code which does this.

You can still estimate the topology without an outgroup, just make sure you assign each individual to its own species, otherwise there will only be 1 possible species tree topology

Hope this helps
Jordan

Jordan Douglas

unread,
Mar 7, 2023, 3:13:09 PM3/7/23
to beast-users
Hi Kazee,

Just following up on this: this mrca prior initialisation problem has been resolved now, with starbeast3 v1.1.7. 

Jordan

Reply all
Reply to author
Forward
0 new messages