TMRCAs and EBSP

868 views
Skip to first unread message

elena

unread,
Aug 16, 2011, 10:46:11 AM8/16/11
to beast...@googlegroups.com
Hi all,

I would like to use divergence date estimates for the MRCA of each lineage obtained from *BEAST for an EBSP analysis  in order to place an estimate of time on these demographic analyses . I have tried to do so in Beauty by defining taxon sets for each gene partition, in this case the taxon set includes all sequences for that gene, and then assigning corresponding HPD 95% upper and lower bounds to the TMRCAs priors. However Beauty behaves weirdly and I am not able to generate the xml file.
I am not sure if it is a bug of beauty or if there is some exception in the EBSP analysis or more probably there is some thing wrong in what I am doing. Perhaps is in the csv output where I should use MRCA *beast estimates to place and estimate of time in the population size function?

Hope anyone can help me

elena

elena

unread,
Aug 17, 2011, 4:39:34 AM8/17/11
to beast...@googlegroups.com
Related to my question...

Given that all sequences in the EBSP analysis correspond to the same species/lineage and thus there may be no logic in defining taxon sets to set priors for the age of the node, should I use TMRCAs obtained from *beast for that particular clade to define bounds for the treemodel.rootheight parameter for each gene partition in the EBSP analysis?

thanks again
Message has been deleted

elena

unread,
Aug 24, 2011, 4:22:33 AM8/24/11
to beast...@googlegroups.com
I re-formulate my questions in case anyone can give me some advice on this:

Which would be the best way to set age estimates on Extended Bayesian Skyline Plots? I ran first *BEAST and I obtained global estimates for substitution rates per gene as well as TMRCAs for each node. I guess it would be more appropiate to use those TMRCAs as informative priors for the EBSP analysis on each particular lineage instead of using a global rate for the whole tree, however I can not find the way to do it in Beauty. Should I define taxon sets or modify the treemodel.rootheight parameter? In either case Beauty seems not happy with any of these options.

thanks

elena

Claudia Ciotir

unread,
Aug 24, 2011, 10:41:31 PM8/24/11
to beast...@googlegroups.com
Hi Elena,

I saw your message and it attired my attention because I am trying to figure out how to generate mutation rates for my data and I cannot find out how? In some tutorial in BEAST it says..' that rate shall come from an external source'... I am a  beginner and I don't know the answer to your question, but I would kindly ask you to tell me how did you calculate the global mutation rates which you mention? quote ' I obtained global estimates for substitution rates per gene as well as TMRCAs for each node".

I hope someone will answer your question.

Thanks very much in advance,

Claudia

--
You received this message because you are subscribed to the Google Groups "beast-users" group.
To view this discussion on the web visit https://groups.google.com/d/msg/beast-users/-/vKUSOlVEJXkJ.

To post to this group, send email to beast...@googlegroups.com.
To unsubscribe from this group, send email to beast-users...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/beast-users?hl=en.

christophe Lemaire

unread,
Aug 29, 2011, 8:08:27 AM8/29/11
to beast...@googlegroups.com
Hello
I am trying to run EBSP analyses on a haploid organisms population for
which I have heterochronous sampling and a dateset of seven genes. BSP
is compatible with heterochronous sampling but I suspect EBSP to only
accept isochronous samples. So, should I run BSP analyses on each of the
genes instead of EBSP?
Thanks very much in advance
Christophe

christophe Lemaire

unread,
Aug 29, 2011, 8:31:55 AM8/29/11
to beast...@googlegroups.com
Hello
In the new version of Beast (1.6.2) the performance suggestion has
disappeared from the output window. Is there another way to better tune
the operators?

Andrew Rambaut

unread,
Aug 29, 2011, 8:39:05 AM8/29/11
to beast...@googlegroups.com
Dear Christophe,

For many years, BEAST has auto-tuned the operators (if the auto-optimize check box in BEAUti is on). The performance suggestions were then a bit confusing to many users so in the latest version we took the decision not to report these. You can still see the acceptance rate so you can see how well optimized they actually were (assuming 25% acceptance is in fact optimal). If you want to manually tune your operators, turn auto-optimize off and the performance suggestions will be shown.

Andrew

> --
> You received this message because you are subscribed to the Google Groups "beast-users" group.

elena

unread,
Aug 29, 2011, 9:55:42 AM8/29/11
to beast...@googlegroups.com
Hi Claudia,

I am not an expert Beast user but you may be able to get estimates of mutation rates from your Beast analysis by viewing the log file in Tracer, this will give you the mean and standard deviation for each gene partition. In the case of TMRCAs these age estimates are provided in the .tree files.

Hope it helps.

elena

elena

unread,
Aug 29, 2011, 9:59:23 AM8/29/11
to beast...@googlegroups.com
Dear Andrew,

I posted this message some days ago and I would greatly appreciate your help because I still did not find the solution.


I've been trying to run an EBSP analysis on a lineage for which I have the TMRCAs estimates. I have set uniform priors on the treeModel.rootHeight parameters  with upper and lower bounds as those obtained in *BEAST for each locus using beauty. I left priors for ucld.mean and sd by default. However I got the following error message:


Random number seed: 1314456285784

Parsing XML file: ho_EBSP_test2.xml
  File encoding: MacRoman
Looking for plugins in /Users/Elena/Documents/plugins
Read alignment: alignment1
  Sequences = 19
      Sites = 294
   Datatype = nucleotide
Read alignment: alignment2
  Sequences = 38
      Sites = 668
   Datatype = nucleotide
Read alignment: alignment3
  Sequences = 18
      Sites = 513
   Datatype = nucleotide
Read alignment: alignment4
  Sequences = 38
      Sites = 454
   Datatype = nucleotide
Site patterns 'ho_cyb-2.fasta.patterns' created from positions 1-294 of alignment 'alignment1'
  pattern count = 16
Site patterns 'ho_mc1r_phased-p70.fasta.
patterns' created from positions 1-668 of alignment 'alignment2'
  pattern count = 10
Site patterns 'ho_ND4-2.fasta.patterns' created from positions 1-513 of alignment 'alignment3'
  pattern count = 33
Site patterns 'ho_PRDX4_phased-p70.fasta.patterns' created from positions 1-454 of alignment 'alignment4'
  pattern count = 7
Creating the tree model, 'ho_cyb-2.fasta.treeModel'
  initial tree topology = ((((((Ho_S3021,Ho_S3442),Ho_S5329),(Ho_S2608,Ho_S5337)),((Ho_S2531,Ho_S2812),Ho_S3002)),(Ho_S5181,Ho_S5348)),(((((Ho_S3033,Ho_S5446),(Ho_S4058,Ho_S5382)),Ho_S3575),(Ho_S5394,Ho_S5435)),(Ho_S2624,Ho_S5319)))
  tree height = 116.63266258493287
Creating the tree model, 'ho_mc1r_phased-p70.fasta.treeModel'
  initial tree topology = (((((((((((Ho_S2608b,Ho_S3021a),(Ho_S3033b,Ho_S4058b)),Ho_S2624b),Ho_S5337a),(Ho_S3575a,Ho_S3575b)),Ho_S5337b),((Ho_S2624a,Ho_S5181a),Ho_S5319b)),(((((Ho_S3021b,Ho_S5329a),Ho_S5394a),Ho_S5435a),(Ho_S2531a,Ho_S5435b)),(Ho_S5382b,Ho_S5446b))),Ho_S3002a),((Ho_S2608a,Ho_S3002b),Ho_S3033a)),(((((((Ho_S2812a,Ho_S5348a),Ho_S5446a),(Ho_S5348b,Ho_S5394b)),(Ho_S4058a,Ho_S5181b)),((Ho_S2531b,Ho_S3442b),Ho_S5382a)),Ho_S3442a),((Ho_S5319a,Ho_S5329b),Ho_S2812b)))
  tree height = 148.98998856122984
Creating the tree model, 'ho_ND4-2.fasta.treeModel'
  initial tree topology = (((((((Ho_S3033,Ho_S5337),Ho_S3002),Ho_S3575),Ho_INFO: resetting length of parameter demographic.popSize(size 1) in variable demographic model to 110
INFO: resetting length of parameter demographic.indicators in variable demographic model to 109
S2531),Ho_S5348),((Ho_S3442,Ho_S5446),Ho_S5382)),((((((Ho_S4058,Ho_S5319),Ho_S5329),(Ho_S5394,Ho_S5435)),(Ho_S2608,Ho_S5181)),Ho_S2624),Ho_S3021))
  tree height = 133.37326093254723
Creating the tree model, 'ho_PRDX4_phased-p70.fasta.treeModel'
  initial tree topology = (((((((((Ho_S3442a,Ho_S5337a),Ho_S3033a),Ho_S3575a),(Ho_S3021a,Ho_S3021b)),(Ho_S5337b,Ho_S5394b)),((((Ho_S2531a,Ho_S2812b),Ho_S5181a),Ho_S4058b),((Ho_S2624b,Ho_S3442b),Ho_S5446a))),((Ho_S5181b,Ho_S5435a),Ho_S2608a)),((((Ho_S4058a,Ho_S5382b),Ho_S5348a),(Ho_S2812a,Ho_S5319a)),((Ho_S2531b,Ho_S5446b),Ho_S2608b))),((((((Ho_S5348b,Ho_S5435b),Ho_S5394a),(Ho_S3575b,Ho_S5319b)),Ho_S3002b),(Ho_S5329a,Ho_S5382a)),(((Ho_S3002a,Ho_S5329b),Ho_S3033b),Ho_S2624a)))
  tree height = 350.4322685700231
Variable demographic: linear control points
Using discretized relaxed clock model.
  over sampling = 1
  parametric model = logNormalDistributionModel
   rate categories = 36
Using discretized relaxed clock model.
  over sampling = 1
  parametric model = logNormalDistributionModel
   rate categories = 74
Using discretized relaxed clock model.
  over sampling = 1
  parametric model = logNormalDistributionModel
   rate categories = 34
Using discretized relaxed clock model.
  over sampling = 1
  parametric model = logNormalDistributionModel
   rate categories = 74
Creating state frequencies model: Using empirical frequencies from data = {0,29592, 0,29825, 0,14608, 0,25976}
Creating HKY substitution model. Initial kappa = 2.0
Creating site model.
Creating state frequencies model: Using empirical frequencies from data = {0,17746, 0,34424, 0,2283, 0,24999}
Creating HKY substitution model. Initial kappa = 2.0
Creating site model.
Creating state frequencies model: Using empirical frequencies from data = {0,34362, 0,35153, 0,10223, 0,20262}
Creating site model.
  4 category discrete gamma with initial shape = 1.0
Creating state frequencies model: Using empirical frequencies from data = {0,32217, 0,14758, 0,20199, 0,32827}
Creating HKY substitution model. Initial kappa = 2.0
Creating site model.
TreeLikelihood(ho_cyb-2.fasta.treeModel) using native nucleotide likelihood core
  Ignoring ambiguities in tree likelihood.
  With 16 unique site patterns.
Branch rate model used: discretizedBranchRates
TreeLikelihood(ho_mc1r_phased-p70.fasta.treeModel) using native nucleotide likelihood core
  Ignoring ambiguities in tree likelihood.
  With 10 unique site patterns.
Branch rate model used: discretizedBranchRates
TreeLikelihood(ho_ND4-2.fasta.treeModel) using native nucleotide likelihood core
  Ignoring ambiguities in tree likelihood.
  With 33 unique site patterns.
Branch rate model used: discretizedBranchRates
TreeLikelihood(ho_PRDX4_phased-p70.fasta.treeModel) using native nucleotide likelihood core
  Ignoring ambiguities in tree likelihood.
  With 7 unique site patterns.
Branch rate model used: discretizedBranchRates
Creating swap operator for parameter ho_cyb-2.fasta.branchRates.categories (weight=10.0)
Creating swap operator for parameter ho_mc1r_phased-p70.fasta.branchRates.categories (weight=10.0)
Creating swap operator for parameter ho_ND4-2.fasta.branchRates.categories (weight=10.0)
Creating swap operator for parameter ho_PRDX4_phased-p70.fasta.branchRates.categories (weight=10.0)
Likelihood is using -1 threads.
Creating the MCMC chain:
  chainLength=50000000
  autoOptimize=true
  autoOptimize delayed for 500000 steps
Error running file: ho_EBSP_test2.xml
The initial model is invalid because state has a zero probability.

If the log likelihood of the tree is -Inf, his may be because the
initial, random tree is so large that it has an extremely bad
likelihood which is being rounded to zero.

Alternatively, it may be that the product of starting mutation rate
and tree height is extremely small or extremely large.

Finally, it may be that the initial state is incompatible with
one or more 'hard' constraints (on monophyly or bounds on parameter
values. This will result in Priors with zero probability.

The individual components of the posterior are as follows:
The initial posterior is zero:
  CompoundLikelihood(compoundModel)=(
    DistributionLikelihood=-1,8654,
    DistributionLikelihood=-1,8654,
    DistributionLikelihood=-3,184,
    DistributionLikelihood=-3,1687,
    DistributionLikelihood=-3,184,
    DistributionLikelihood=-3,184,
    DistributionLikelihood=-3,184,
    DistributionLikelihood=-1,8654,
    DistributionLikelihood=0,0986,
    DistributionLikelihood=0,0986,
    DistributionLikelihood=0,0986,
    DistributionLikelihood=0,0986,
    DistributionLikelihood=-Inf,
    DistributionLikelihood=-Inf,
    DistributionLikelihood=-Inf,
    DistributionLikelihood=-0,6931,
    OneOnX(demographic.populationMean)=0,0,
    OldAbstractCoalescentLikelihood(coalescentLikelihood)=-10700,9785,
    MixedDistributionLikelihood(MixedDistributions)=-110,0
  ),
  CompoundLikelihood(compoundModel)=(
    TreeLikelihood(treeLikelihood)=-6990,5378,
    TreeLikelihood(treeLikelihood)=-29474,3254,
    TreeLikelihood(treeLikelihood)=-9092,6744,
    TreeLikelihood(treeLikelihood)=-20173,1908
  )
For more information go to <http://beast.bio.ed.ac.uk/>.

BEAST has terminated with an error. Please select QUIT from the menu.


Thanks

elena

Andrew Rambaut

unread,
Aug 29, 2011, 10:06:01 AM8/29/11
to beast...@googlegroups.com
Looking at the output, the 13th, 14th and 15th priors have a 0 initial likelihood (log likelihood -Inf). This means (if you are using uniform priors) then the starting values for the parameters lie outside the prior bounds. Probably the best thing is not to use uniform priors (use a continuous distribution such as a normal.

Andrew

On 29 Aug 2011, at 14:59, elena wrote:

>

___________________________________________________________________
Andrew Rambaut
Institute of Evolutionary Biology University of Edinburgh
Ashworth Laboratories Edinburgh EH9 3JT
EMAIL - a.ra...@ed.ac.uk TEL - +44 131 6508624

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

christophe Lemaire

unread,
Aug 29, 2011, 10:33:09 AM8/29/11
to beast...@googlegroups.com
Thanks Andrew
I posted just before a question about EBSP and heterochronous sampling.
What do you suggest concerning this issue? Should I use separate BSP for
each gene?
Thank you very much in advance

Andrew Rambaut

unread,
Aug 29, 2011, 11:15:56 AM8/29/11
to beast...@googlegroups.com
I think Joseph needs to answer this one. I am not familiar enough with the EBSP to know whether this handles heterochronous sampling.

Andrew

___________________________________________________________________

Claudia Ciotir

unread,
Aug 31, 2011, 9:43:59 PM8/31/11
to beast...@googlegroups.com
Thanks Elena,
That helped a lot. I estimated the substitution rates, and I got the trmca in substitutions(branch lengths I understand). It's been two days that I am trying to thansform rates of substitution in millions of years and I cannot figure that out. I looked at tutorials...i.e. Primate divergence dates..but they knew their dates to calibrate few nodes..Can you give me a suggestion about how to convert those numbers?
I appreciate your help!
Thanks,
Claudia



elena

--
You received this message because you are subscribed to the Google Groups "beast-users" group.
To view this discussion on the web visit https://groups.google.com/d/msg/beast-users/-/zxNH-POYvQoJ.

Hassan Ebrahimi

unread,
Aug 31, 2011, 10:07:03 PM8/31/11
to beast...@googlegroups.com
Sorry for entering your discussion without permission. I am not an expert too but I understand your problem.
Check this out:  http://beast.bio.ed.ac.uk/Parameters . I think it helps.

If clock.rate is fixed to 1.0, the tmrca is in units of "substitution per site" and you need to know the actual  STRICT clock.rate ( from somewhere else) in order to convert it to  units of time (years or million years). To this end , you just need to divide the tmrca ( in units of substitution per site) by the clock.rate.

Regards,

Hassan


From: Claudia Ciotir <claudi...@googlemail.com>
To: beast...@googlegroups.com
Sent: Wednesday, August 31, 2011 6:43 PM
Subject: Re: TMRCAs and EBSP

Claudia Ciotir

unread,
Sep 1, 2011, 11:59:27 AM9/1/11
to beast...@googlegroups.com
Thank you Hassan, this is great advance in understanding parameters. Though, how do people calculate the  actual  STRICT clock.rate? For instance, one of my outgroups has an age between 3.5-6 my. Is this going to help me to find the strict clock rate? Thanks a lot!
Claudia

Hassan Ebrahimi

unread,
Sep 1, 2011, 9:41:03 PM9/1/11
to beast...@googlegroups.com
Your information about the out-group helps a lot but you should run BEAST again.

Now you know the root age ( root.Height). With this information, you should tell BEAST to estimate the clock rate( check the box  in BEAUTi otherwise it fixes the clock rate to 1.0). Include the outgroup in the taxa set in case you haven't, Then set a prior to root.Height based on the information you have.You might give a uniform prior with lower  and higher limits of 3.5 and 6 Myr, respectively. Run BEAST and you will have  age estimates of TMRCAs in Myr ( the same units as the prior).

If you don't have time for re-running BEAST and you would like to have rough estimates of the clock.rate as well as the ages of the TMRCAs using the previous results,  convert them using the following equation provided that the outgroup has been included in the previous analysis:

(3.5+6)/2=4.75

clock.rate in subst. per site per Myr= (root.Heigh in subst. per site )/4.75

and

age of TMRCA in Myr= (age of TMRCA in subst. per site )/(clock.rate in subst. per site per Myr )



Hassan.

Sent: Friday, September 2, 2011 12:59 AM

Ricardo Ramiro

unread,
Sep 19, 2011, 3:41:20 PM9/19/11
to beast...@googlegroups.com
Dear Andrew,

I was looking at the operator output for a *Beast analysis I am doing and got the following results quite regularly: 

Operator                                                         Tuning   Count      Time     Time/Op  Pr(accept)  Performance suggestion

Wide Exchange(26Snew_align2.treeModel)                     767477     54246    0.07     0.0222      good

wilsonBalding(26Snew_align2.treeModel)                        767129     112403   0.15     0.0235      slightly low

Given your suggestion above, I have two hopefully simple questions:
- why would the performance suggestion be different for these two operators, if the acceptance rate is quite similar?
- The optimal acceptance rate will be around 25% and not >25%, right? so that the operators should be changed not only when these are low but also at high values?

Thanks,

Ramiro

alexei

unread,
Sep 19, 2011, 9:26:42 PM9/19/11
to beast-users
Dear Ramiro,

The performance suggestions don't mean anything for the Wide Exchange
and wilsonBalding operators since neither of them is tunable. In the
case of non-tunable operators the performance suggestions just give
you an idea of where the acceptance probability is compared to
something "typical" for that operator -- i.e. the suggestion is
basically meaningless. Both of those moves are moves that make fairly
large changes to the tree topology, so they are mainly useful for burn-
in and for (infrequent) jumps between distant modes. Their acceptance
probabilities are generally low unless you are sampling the prior.
Because the Wide Exchange is a "bigger" move an acceptance probability
of 0.02 is (arbitrarily) deemed "good" whereas for wilsonBalding it is
not.

Cheers
Alexei
Reply all
Reply to author
Forward
0 new messages