exabayes chain convergence

mht

unread,

Feb 3, 2015, 5:08:00 AM2/3/15

to exab...@googlegroups.com

Hi,

I've been running ExaBayes on a partitioned dataset of 30 taxa with around 800,000 positions.

At 25,000 iterations, the average sdsf was at 15.62% and the max at 50.00%. These numbers remained constant even after 45,000 iterations have been run.

Is this behavior expected (perhaps due to lack of iterations, etc)? Even so, I would still expect the asdsf value to fluctuate.

Is there something wrong with the analysis?

My command is:

./exabayes -f matrix.phy -q partitions.model -n myRun -s 8285 -c config.nex -R 2 -m PROT -T 4 -S

Thanks,

mh

Andre J. Aberer

unread,

Feb 4, 2015, 4:25:36 AM2/4/15

to exab...@googlegroups.com

Hi mh,

that far, I would not suspect yet that anything is wrong with the
analysis.

45,000 generations is simply far too few generations, probably your
chains have not burned in yet (are the trace log-likelihood values still
decreasing?).

If the asdsf does not move forward between 1,000,000 and 2,000,000
generations, you may consider if going to 10,000,000 is worth it or
whether something needs to be changed.

Let me add that it is probably a good idea to use as many CPU cores as
possible with this dataset, it is quite big...

--
Best regards,
Andre

--
Best regards,
Andre Aberer

PreDoc (Bioinformatics) in the Exelixis Lab, Heidelberg Institute for Theoretical Studies

mht

unread,

Feb 4, 2015, 5:59:05 AM2/4/15

to exab...@googlegroups.com

Hi Andre,

The reason I used -T 4 the last time was because my memory capacity did not allow for more (I am using a workstation with 32 processors and 132Gb ram).

Instead, I've now restarted the run with mpirun -np 20 without setting -R and -T parameters.

Do you think this is reasonable or is there a better way to parallelize my run?

Also, I checked the trace log likelihood (lnl) values and it looks like the value has reached a plateau for each individual chain. But one of the chains (R2) plateaus at a different lnl value (see figure attached).

When I do a separate sdsf command with only R0, R1 and R3 chains, both my average and max sdsf values are at 0%.

Is it advisable for me to discard this chain and run more chains?

Much thanks for your help.

mh

exabayes_trace.png

Andre J. Aberer

unread,

Feb 4, 2015, 6:29:33 AM2/4/15

to exab...@googlegroups.com

Hi Mh,

I agree,e in that case, I would perform the run using -np 32 only. You
don't need -R for such a large dataset, but it will increase the total
memory requirements. Still, many more generations are needed and I fear
it will take quite some time on your machine. (but you can always
restart runs using the checkpointing)

I would advise to conduct the runs separately. Start four different runs
./exabayes -n R1 -s $RANDOM
./exabayes -n R2 -s $RANDOM
./exabayes -n R3 -s $RANDOM
./exabayes -n R4 -s $RANDOM

Then execute each of the run (using -np 32) for a day and check for
convergence using ./sdsf ExaBayes_topologies* and resume a run for
another day.

In case a chain got stuck (like R2 here), you can consider discarding
it, if it does not recover in a reasonable amount of time. That is a way
to save yourself some runtime. If several chains got stuck, this may be
a reason to start to worry.

I would expect that if you compute the asdsf manually on all your
samples except those from ExaBayes_topologies.R2, then you should obtain
a more optimistic asdsf and maybe even a decreasing tendency (but maybe
it's still too early for that). R0,R1 and R3 look fine to me.

mht

unread,

Feb 4, 2015, 11:23:25 AM2/4/15

to exab...@googlegroups.com

Thanks for your suggestion on parallelization, Andre. I'll try that.

Executing ./sdsf on topologies of R0, R1 and R3 resulted in asdsf and msdsf values of 0% even only after 45,000 iterations with 25% burn in.

This means the asdsf is lower than the 5% or 1% threshold. Does this indicate that the chains have converged with so few iterations? How do I know how many more iterations to run?

Alexandros Stamatakis

unread,

Feb 4, 2015, 4:08:15 PM2/4/15

to exab...@googlegroups.com

but what about chain R2?

alexis

> --
> You received this message because you are subscribed to the Google
> Groups "ExaBayes" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to exabayes+u...@googlegroups.com
> <mailto:exabayes+u...@googlegroups.com>.
> For more options, visit https://groups.google.com/d/optout.

--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University
of Arizona at Tucson

www.exelixis-lab.org

Andre J. Aberer

unread,

Feb 4, 2015, 5:40:27 PM2/4/15

to exab...@googlegroups.com

Hi Mh,

are you certain? This most likely means that there is only a single tree
that dominates all others in terms of posterior probability. I would not
be too surprised, if this is the case with 8e5 aa characters, the
whole-genome analysis (100,000,000 bp) that we performed on a simulated
alignment converged against the correct tree after ~20,000 generations.

If nothing is wrong with the analysis (e.g., you used three times the
same seed), you are close to having a publishable result.

I would recommend that you run several more independent analysis (a
total of 8 is a good start) and discard those that got stuck (e.g., if
their posterior probability is 1000 log-units worse). I would expect
reviewers to request something like 1,000,000 generations per chain.

Essentially, you must make sure, that there is no alternative tree with
comparable or better posterior probability. Maybe 1-2 analysis using
Metropolis-coupling (e.g., numCoupledChains 4) may help to assure this,
but it makes the analysis accordingly more runtime- and
memory-intensive.

mht

unread,

Feb 4, 2015, 9:05:23 PM2/4/15

to exab...@googlegroups.com

Alexis,

I still plan to rerun the analysis with more chains based on Andre's recommendation. I guess my question was more on whether or not 45,000 generations is enough to call it a confident tree even if the sdsf values tell me the analysis has converged. This is because I see most publications have number of generations in the millions.

Andre,

I will try to get more workstations or resources to try out your recommendations and see if I get the same results. I do have access to a cluster, but I believe it is not possible to submit exabayes jobs since exabayes can't be run in background? The cluster uses qsub for job submission.

Additionally, for my own curiosity, any explanation for the observation of the stuck R2 chain?

Thanks again for your advice and very quick replies!

Alexandros Stamatakis

unread,

Feb 5, 2015, 2:00:26 AM2/5/15

to exab...@googlegroups.com

> I still plan to rerun the analysis with more chains based on Andre's
> recommendation. I guess my question was more on whether or not 45,000
> generations is enough to call it a confident tree even if the sdsf
> values tell me the analysis has converged. This is because I see most
> publications have number of generations in the millions.

I'd run substantially more gens, in particular because there is this
apparently stuck chain there as well, even if sdsf values seem to be
stable, reviewers will probably not like this, what kind of starting
tree are you using?

alexis

Andre J. Aberer

unread,

Feb 5, 2015, 2:43:42 AM2/5/15

to exab...@googlegroups.com

Hi Mh,

I fear, this is just something that can happen.

I have a gut feeling, that it might be a tad less likely to happen with
the proposal configuration prior to exabayes 1.4. The new branch length
proposal yields branch length updates close to the optimum, which maybe
steer some chains into a sub-optimal region during burn-in (since we are
stuck with branch lengths that are optimal for sub-optimal trees).

In your scenario this older configuration is also useful to save a bit
of runtime, since you will not need the more advanced proposals (i.e.,
the posterior-guided SPR), when there is only a single tree in the
credible set.

You can add this to your configuration file:

begin proposals ;
[ topology ]
eTBR 5
eSPR 5
stNNI 5
parsimonySPR 5
likeSpr 0

[ branch lengths ]
branchMulti 15
treeLengthMult 1
nodeSlider 5
blDistGamma 0
biasBLMult 0
end;

mht

unread,

Feb 8, 2015, 1:05:22 PM2/8/15

to exab...@googlegroups.com

Got it, I will work on getting more generations with my data.

Thanks very much Alexis and Andre for your help and advice!

Reply all

Reply to author

Forward