Setting up analysis with prior on clockRate (and not ucldMean)

1,379 views
Skip to first unread message

Diego Caraballo

unread,
Jan 23, 2017, 1:22:42 PM1/23/17
to beast-users
Hi everyone, I am having trouble in setting up the following analysis. I have a mtDNA gene, with two partitions (1st+2nd, and 3rd codon positions) and have estimated the clock rate for a specific group using a set of 8 calibrations and an uncorrelated lognormal clock. Now I want to make a phylogeographic analysis with this group and since I am working at a shallower level of the phylogeny, I have no calibrations available. Hence, I want to use the range matching the 95% HPD of the group of interest's clock rate as a prior uniform distribution, using the mean as starting value and using a relaxed lognormal clock.

The thing is that when I set up in BEAUTi 2 a Relaxed Log Normal clock (unclicking the Automatic set clock rate flag, and checking the "estimate" box), in the Priors tab I get 2 associated priors: ucldMean and ucldStdev. But, since I have prior information on the overall rate I want to reflect that with a prior distribution on the meanRate statistic and leave the ucld.Mean without an informative prior:

1) How can I set up a prior on the meanRate? 

2) Is it correct to use the 95% HPD range to set a uniform prior on this parameter?

Hope you can help me with this issue.

Cheers,

Diego

Remco Bouckaert

unread,
Jan 24, 2017, 4:29:18 PM1/24/17
to beast...@googlegroups.com
Hi Diego,

For the relaxed clock, the ucldMean is the clock rate. If you link the clock between the two partitions, you can put a prior on the ucldMean based on the larger analysis (I would try a distribution matching the posterior distribution of the larger analysis instead of a uniform prior). To ensure the partitions have different rates, ensure the ‘estimate’ box is checked next to the substitution rate for both partitions (in the site model panel). The mean substitution rate will be 1 (averaged over all sites in the partitions), so the mean rate will be the rate of the ucldMean.

If you do no link the clock, you can still link the ucldMean parameters (check menu Mode/Allow parameter linking first, and a link icon appears next to the clock rate parameter, which you can then link to the ucldMean of the other partition) so you can set the prior on it like above.

Cheers,

Remco

--
You received this message because you are subscribed to the Google Groups "beast-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beast-users...@googlegroups.com.
To post to this group, send email to beast...@googlegroups.com.
Visit this group at https://groups.google.com/group/beast-users.
For more options, visit https://groups.google.com/d/optout.

Diego Caraballo

unread,
Jan 25, 2017, 1:23:28 PM1/25/17
to beast-users
Thanks Remco, your response cleared my doubts. I ran an analysis linking clocks and trees between the two partitions, since they belong to the same locus.

I used a uniform prior for ucldMean since I did not realize how to extract the distribution for the rate of a specific node from the posterior. Can you recommend me any source where I can find how to do it?

I ran the analysis for 10E6 generations, using a Bayesian Skyline model, a relaxed LogNormal Clock and using the Geo_Sphere package to infer ancestral locations. To avoid any potential problem of using duplicated geographic coordinates, I applied an R script to add jitter to identical coordinates. 

At the end of the run, most parameters show pretty high ESS values, but the ones affecting the clock model of the geographic partition, as well as the molecular clock parameters and tree likelihoods are especially low.


So I tried using fixed values for the ucldMean and/or using an outgroup. Although ESS values were slightly improved, they are way too low still.

Is there any recommendation I should follow, or check anything that might be impeding convergence? If not, should I give more weight to the concerned operators, or I should better run the chain for a higher number of generations?

I attach the xml corresponding to the traces shown in the picture, in case you would like to give a look to it.


Cheers,

Diego
torquatus_group_no_OG_jittered.xml

Diego Caraballo

unread,
Jan 25, 2017, 1:24:59 PM1/25/17
to beast-users
Sorry, one erratum: I ran the analysis for 100E6 mcmc generations.

Diego Caraballo

unread,
Jan 27, 2017, 8:23:29 AM1/27/17
to beast-users
And these are the results of a 1E9 mcmc generations run. Now it seems that only goeographic ucld clock parameters (and the tree likelihood of one partition) have very los ESS, producing low posterior ESS.
Any hint about it?

Remco Bouckaert

unread,
Jan 29, 2017, 2:54:59 PM1/29/17
to beast...@googlegroups.com
Hi Diego,

It looks like the relaxed clock on the geography is the source of the problems. Things you might want to try if you are using a relaxed clock on the geography is to use fewer categories (by setting the dimension on the categories parameter), and/or use a Gamma distribution with one parameter for the relaxed clock distribution. If you use a log-normal, you can put an upper bound of say 2 on the stddev to keep things in check — the trace suggests it escapes to about 10, which is extremely high.

Hope this helps,

Remco

Diego Caraballo

unread,
Feb 1, 2017, 1:34:15 PM2/1/17
to beast-users
Thank you Remco for the invaluable help! I understand conceptually your recommendations, but it seems that in BEAUti there is not a straightforward way to set some of these. So I tried editing the XML in some cases and would be grateful if you could inspect what I did.

1) Using fewer categories of the relaxed lognormal clock on geography
I changed the default dimension value of 188 to a fewer one in the following line:

<stateNode id="rateCategories.c:Geo" spec="parameter.IntegerParameter" dimension="188">1</stateNode>

But, which would be a realistic dimension value? I assume this will depend on geographic distance and some expectation about diffusion rate heterogeneity.


2) Setting a gamma distribution for the relaxed clock on geography

I modified the following part of the XML:

            <prior id="MeanRatePrior.c:Geo" name="distribution" x="@ucldMean.c:Geo">
                <Uniform id="Uniform.0" name="distr" upper="Infinity"/>
            </prior>

Instead I set:

            <prior id="MeanRatePrior.c:Geo" name="distribution" x="@ucldMean.c:Geo">
                <Gamma id="Gamma.0" name="distr">
                    <parameter id="RealParameter.03" estimate="false" name="alpha">0.5396</parameter>
                    <parameter id="RealParameter.04" estimate="false" name="beta">0.3819</parameter>
                 </Gamma>
            </prior>
If this is the correct way to do it, I am still puzzled about the values the alpha and beta parameters should have, or in other words, what would be a realistic range for the ulcd mean of the geographic clock.

3) Setting an upper bound in the ucldStDev
This was an easy one! :-)

<parameter id="ucldStdev.c:Geo" lower="0.0" name="stateNode" upper="2.0">0.1</parameter>

Hope you could check these settings and give me some advice about choosing parameters' values. In the meantime I am running 3) and 1) with 50 rate categories.

Thanks again,

Diego
Reply all
Reply to author
Forward
0 new messages