strict vs relaxed clock


Christopher Blair

Mar 5, 2012, 11:01:35 AM
to beast...@googlegroups.com
Dear Beast users,

I am analyzing a dataset that consists of mtDNA for about 150 terminals representing 3 species, with multiple populations per species. I also downloaded sequences from 11 additional species for external calibrations. I have analyzed the data under both a strict clock and an uncorrelated lognormal (ucld) relaxed clock model. The analysis runs quite nicely under the strict model, with good mixing and very high ESS values. In contrast, mixing is very poor under the relaxed clock, and ESS values for many parameters (e.g. posterior and likelihood) do not appear to increase. The mean for ucld.stdev is 3.65, but again the ESS value is low (28). However, if I calculate Bayes factors in Tracer, the relaxed model is strongly favored. I was hoping some of you might have some thoughts on which model I should choose and whether a random local clock might be a good option to try. Any help would be greatly appreciated.

Chris

--
Christopher Blair, Ph.D. Candidate
Department of Ecology and Evolutionary Biology
University of Toronto and
Department of Natural History
Royal Ontario Museum
100 Queen's Park
Toronto, ON M5S 2C6
Canada
(416) 586-8094 (office)
http://individual.utoronto.ca/chrisblair/index.html
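
To make the Bayes factor comparison mentioned above concrete, here is a minimal Python sketch of one simple marginal-likelihood estimator, the harmonic mean of the sampled likelihoods, computed from the log-likelihood column of each run. This is an assumption for illustration only: Tracer's own calculation may differ (for example through smoothing or bootstrapping), the function names are hypothetical, and the sample values are made up. Harmonic mean estimates are known to have high variance, and with an ESS as low as 28 they rest on very few effectively independent samples.

```python
import math

def log_harmonic_mean_likelihood(log_likelihoods):
    """Log of the harmonic mean of the likelihoods, computed stably
    from a list of sampled log-likelihood values."""
    n = len(log_likelihoods)
    negated = [-ll for ll in log_likelihoods]
    m = max(negated)
    # log-sum-exp of the negated log-likelihoods, shifted by m for stability
    log_sum = m + math.log(sum(math.exp(v - m) for v in negated))
    return math.log(n) - log_sum

def log_bayes_factor(loglik_a, loglik_b):
    """Log Bayes factor of model A over model B (positive favors A)."""
    return (log_harmonic_mean_likelihood(loglik_a)
            - log_harmonic_mean_likelihood(loglik_b))

# Made-up log-likelihood samples standing in for two post-burn-in traces.
relaxed_trace = [-5010.2, -5008.7, -5012.1, -5009.5]
strict_trace = [-5031.4, -5029.9, -5030.6, -5032.0]
print(f"log BF (relaxed vs strict) = {log_bayes_factor(relaxed_trace, strict_trace):.2f}")
```

A positive value favors the first model, but the number is only as trustworthy as the traces it is computed from.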

alexei

Mar 5, 2012, 4:25:52 PM
to beast-users
I personally don't trust any relaxed clock analysis that has
ucld.stdev >> 1. Having a standard deviation (in log space!) of 3.65
represents ridiculously large amounts of branch rate heterogeneity and
if that were really the case it would be incredibly hard to sample
from the posterior distribution due to the strong correlations between
the divergence times and the highly variable branch rates. What prior
do you have on ucld.stdev? It would also be pointless to try to
estimate divergence times if the rates really varied that much. Have
you run the relaxed clock model multiple times to make sure you
converge on the same divergence times and topology each time? If you
are looking mostly at intraspecific data then I wouldn't recommend an
uncorrelated relaxed clock. Perhaps a random local clock might fit
better, since you would expect most of the diversity within a species
to be generated by the same underlying evolutionary rate.

Alexei
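
To put the point about a log-space standard deviation of 3.65 in numbers, here is a small Python sketch. It assumes branch rates follow a log-normal distribution whose standard deviation in log space is ucld.stdev, as described above; the function names are illustrative and not part of BEAST.

```python
import math
from statistics import NormalDist

def lognormal_rate_spread(sigma, coverage=0.95):
    """Fold-difference between the upper and lower quantiles that bound the
    central `coverage` mass of a log-normal with log-space sd `sigma`."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)  # about 1.96 for 95%
    return math.exp(2.0 * z * sigma)

def lognormal_cv(sigma):
    """Coefficient of variation (sd / mean) of the same log-normal in real space."""
    return math.sqrt(math.exp(sigma ** 2) - 1.0)

for sigma in (0.33, 1.0, 3.65):
    print(f"sigma={sigma}: central 95% of branch rates span a "
          f"{lognormal_rate_spread(sigma):.3g}-fold range, "
          f"CV={lognormal_cv(sigma):.3g}")
```

Under these assumptions, a value near 0.33 corresponds to rates varying over a few-fold range, whereas 3.65 implies that the central 95% of branch rates span more than a million-fold range.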

Christopher Blair

Mar 5, 2012, 4:45:19 PM
to beast...@googlegroups.com
Hi Alexei,

Thanks a bunch for your email. The prior on ucld.stdev is exponential with a mean and initial value of 0.3333333. I have only run a relaxed clock model a few times as the analysis takes over 100,000,000 generations to get high ESS values for most parameters (except ucld.mean and ucld.stdev which never increase). The tree and divergence times are similar, but I also am having difficulty trusting the results due to the poor mixing and low ESS values. Also, the divergence estimates from the relaxed clock are MUCH older than those based on a strict clock (a factor of 10 mya), the latter of which ran quite nicely with my data. However, would a strict clock be appropriate with this degree of rate heterogeneity?

The data consist of 160 mtDNA sequences encompassing 12 species in total (3 species have an average of about 40 individuals each). I have also specified two calibration priors.

Yes I have considered trying the random local clock model. However, I am running these analyses on my home desktop computer (which takes forever). Is BEAST 1.7 currently available via a public cluster?

Thanks!

Chris
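
As a rough check on how the reported posterior mean of ucld.stdev compares with the stated prior, the following Python sketch evaluates an exponential prior with mean 1/3. It uses only the closed-form tail and quantile of the exponential distribution; the function names are illustrative.

```python
import math

def exponential_tail(x, mean):
    """P(X > x) for an exponential distribution with the given mean."""
    return math.exp(-x / mean)

def exponential_quantile(p, mean):
    """Value below which a fraction p of the exponential prior mass lies."""
    return -mean * math.log(1.0 - p)

prior_mean = 1.0 / 3.0   # prior mean on ucld.stdev reported in the thread
posterior_mean = 3.65    # posterior mean of ucld.stdev reported in the thread

print(f"95% of the prior mass lies below {exponential_quantile(0.95, prior_mean):.3f}")
print(f"P(ucld.stdev > {posterior_mean}) under the prior = "
      f"{exponential_tail(posterior_mean, prior_mean):.2e}")
```

With this prior, about 95% of the mass lies below 1, and a value of 3.65 or more has prior probability on the order of 1 in 100,000, so the data (or the interaction with other parts of the model) would have to be pulling the estimate very strongly away from the prior.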


alexei

Mar 5, 2012, 6:43:03 PM
to beast-users
I guess what I was trying to say is that when there is not much
genetic diversity (typical of intraspecies datasets - where you have
lots of branches and little in the way of sequence changes to define
each branch) then I am not convinced that the inference of a high
ucld.stdev is really indicative of branch rate heterogeneity. Perhaps
this apparent branch rate heterogeneity is caused by
overparameterization or model misspecification. Have you used FigTree
to look at which branches are causing the signal for high branch-rate
heterogeneity? The problem could also be caused by the outgroup
species. Have you tried running the analysis with a fixed mean rate,
and excluding the outgroup taxa? Often outgroup species are chosen in
ways that inadvertently bias the analysis towards greater branch rate
heterogeneity.

Cheers
Alexei

Christopher Blair

Mar 5, 2012, 7:25:52 PM
to beast...@googlegroups.com
Interestingly, genetic differentiation between species is substantial (about 30%), so I could potentially trust this much branch rate heterogeneity. Yes, I have considered removing the outgroup species I used for fossil calibrations and simply placing a prior on the mutation rate from the literature. However, with that much sequence divergence, is it safe to conclude that a strict clock is probably inappropriate?

Chris

Jamie Oaks

Mar 5, 2012, 8:08:47 PM
to beast...@googlegroups.com
Hi Chris,

Are you using a birth-death or Yule process prior for the tree?  These priors assume the terminal nodes are species and the internal nodes represent speciation events.  I have found using these priors in combination with a relaxed clock when there is intra-species sampling can lead to problems.  Basically, the variance in the relaxed clock goes through the roof to allow dramatic changes in rate to make the tree more Yule-like.

You say you have 12 highly divergent species, some with ~40 sampled individuals.  Under this sampling scheme, I'm guessing your tree is very un-Yule-like, and thus has a very low prior probability under a Yule or BD prior.  The high variance in the relaxed clock allows BEAST to sample trees that have much higher prior probability.

Given you have 12 divergent species, a coalescent prior is not appropriate either (this assumes all samples are from the same population).  The best model in BEAST for your sampling scheme is the multi-species coalescent model (*BEAST) which allows you to appropriately put a Yule or BD prior on the species tree.  However, with only one locus, this is unlikely to work well.

However, you do have a couple of options.  First, you can reduce your dataset to one representative per species.  The second, and more "hacky", option is to put a strong prior on the standard deviation of the relaxed clock, restricting it to biologically reasonable values.  In other words, a relaxed, but not too relaxed, clock model.

Best of luck!

Jamie

~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~
Jamie Oaks
Biodiversity Institute
Department of Ecology & Evolutionary Biology
University of Kansas
Dyche Hall, 1345 Jayhawk Blvd
Lawrence, KS 66045-7561

Office Phone:  785-864-3439
Office Fax:  785-864-5335
E-mail:  joa...@ku.edu
~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~
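
The first option above, reducing the dataset to one representative per species, can be sketched in Python as follows. This is a minimal illustration that assumes the alignment is already loaded as a dictionary of sequence name to sequence, with a second dictionary mapping each name to its species; the function name, sequence names, and species labels are all hypothetical. It simply keeps the first sequence encountered for each species; in practice one might prefer the longest or most complete sequence.

```python
def one_representative_per_species(sequences, species_of):
    """Return a new dict holding the first sequence seen for each species.

    sequences:  dict mapping sequence name -> aligned sequence (insertion ordered)
    species_of: dict mapping sequence name -> species label
    """
    chosen = {}
    seen_species = set()
    for name, seq in sequences.items():
        species = species_of[name]
        if species not in seen_species:
            seen_species.add(species)
            chosen[name] = seq
    return chosen

# Toy alignment with made-up names and sequences.
alignment = {
    "sp1_ind01": "ACGTACGT",
    "sp1_ind02": "ACGTACGA",
    "sp2_ind01": "ACGAACGT",
    "outgroupA": "TCGAACGT",
}
species = {
    "sp1_ind01": "sp1",
    "sp1_ind02": "sp1",
    "sp2_ind01": "sp2",
    "outgroupA": "outgroupA",
}
reduced = one_representative_per_species(alignment, species)
print(list(reduced))
```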

alexei

Mar 5, 2012, 8:17:03 PM
to beast-users
Hey Jamie,

Great suggestion re:*BEAST. Yes, the interaction between the tree
prior and the relaxed clock rates can be very problematic, especially
near to the root. I concur with your assessment. As far as running
*BEAST on one gene is concerned, there is no problem with that -- the
estimated species tree will just be appropriately uncertain. But as
far as I can tell Christopher is not necessarily interested in the
species tree, so it's just a nuisance parameter that gets averaged out
by the MCMC.

Cheers
Alexei



Christopher Blair

Mar 5, 2012, 8:32:41 PM
to beast...@googlegroups.com
Hi Jamie,

Thanks for your email. Yes, I have also been considering potential issues with the choice of tree prior. My main objective with this analysis is to date the time of origin for major mtDNA lineages. I do have phased nDNA data for two introns. However, I do not have nDNA data for the species I am using for fossil calibration, and this is why I have been focusing on obtaining divergence dates for the mtDNA alone. I have not implemented *BEAST before, but to my knowledge it returns a single terminal per species or population, and thus I would not be able to determine when lineages evolved (only when lineages diverged from one another).

Chris

alexei

Mar 5, 2012, 9:40:06 PM
to beast-users
*BEAST estimates both the species tree (one tip per species) and the
individual gene trees (one tip per chromosome). You can look at
whichever of them you care about. If you care about the gene tree you
can ignore the species tree and regard it as just part of the prior on
the gene tree.

C.-j. Mei

Mar 5, 2012, 10:27:52 PM
to beast...@googlegroups.com
Can anybody help me by taking a look at this continuous phylogeography analysis input XML? I made all the modifications that the tutorials describe, but it still doesn't work.
 
Thank you in advance.
 
C.-J. Mei
ASU

Job_Input.xml

Christopher Blair

Mar 5, 2012, 11:33:49 PM
to beast...@googlegroups.com
Thanks for all your help Alexei. I will give *BEAST a try.

Chris


C.-j. Mei

Mar 6, 2012, 9:48:18 PM
to beast...@googlegroups.com
Hi, Gurus,
 
When I try to run BEAST with the BEAGLE options, I get this error:
 
Failed to load BEAGLE library: no hmsbeagle-jni in java.library.path
I tried to Google it but failed to find a solution. Can anyone point me in the right direction? It's on a Windows Server 2008 R2 platform. It was all working until a virus crashed the system. I re-installed Java and BEAGLE, and it has not worked since.
 
Help.
 
C.-J.

Santiago Sánchez

Mar 6, 2012, 11:25:58 PM
to beast...@googlegroups.com
Hi C.-J.,

I had success on a linux server by specifying the path where the beagle library is installed, for example:

java -Xms100m -Xmx2048m -Djava.library.path=$PATH -jar lib/beast.jar -beagle <FILE>

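In my case $PATH (used here as a placeholder for the directory containing the BEAGLE library, not the shell's PATH variable) was /usr/lib/lib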

Try locating the hmsbeagle-jni file with a search tool such as locate on Linux, and specify that path as above.

Good luck,
Santiago
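
On systems without a locate command, the same search can be approximated with a short Python sketch. Java maps a library name such as hmsbeagle-jni to a platform-specific file name (a "lib" prefix and ".so" suffix on Linux, a ".dll" suffix on Windows), so the sketch matches on a fragment of the name instead of the exact string. The function name and the default fragment are illustrative assumptions, not part of BEAST or BEAGLE.

```python
import os

def find_library_dirs(roots, fragment="hmsbeagle"):
    """Return the sorted list of directories under `roots` that contain
    at least one file whose name includes `fragment` (case-insensitive)."""
    hits = set()
    for root in roots:
        # os.walk silently yields nothing for roots that do not exist
        for dirpath, _dirnames, filenames in os.walk(root):
            if any(fragment in name.lower() for name in filenames):
                hits.add(dirpath)
    return sorted(hits)
```

For example, find_library_dirs(["/usr/lib", "/usr/local/lib"]) on Linux, or a call with the suspected install folder on Windows, returns candidate directories to pass to -Djava.library.path. An empty result suggests that the library is not installed under the searched locations.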

C.-j. Mei

Mar 6, 2012, 11:30:36 PM
to beast...@googlegroups.com
Well, there is no such file "hmsbeagle-jni" on the system, at least on Windows. Many people have tried and failed to find it.
 
Thank you for your help.
 
C.-J.

Santiago Sánchez

Mar 6, 2012, 11:34:36 PM
to beast...@googlegroups.com
If there is no hmsbeagle-jni file in the system, I guess that beagle wasn't installed properly.

Cheers,