getting convergence/ESS >200

EvoClive

Jan 30, 2012, 6:49:27 AM
to beast-users
Hi

I've been trying for a few months now to generate an ultrametric tree
in BEAST. I'm not interested in dating per se; I just need a
high-quality ultrametric tree for other analyses (assessing the number
of species). I have no calibration points. I have around 150 taxa,
which I believe to be from 5 sister species. My data set is around
400 bp of mtDNA (alternatively, I have around 40 individuals sequenced
for around 800 bp of mtDNA that I could use - would this be better?).
Despite running 4 combined analyses of 120 million generations sampled
every 6000, and then iteratively altering my .xml file according to
the BEAST log suggestions (which generally say to increase a windowSize
variable by 2), my outputs never reach acceptable ESS levels for
'posterior' and 'prior' (<<200). I've run 5 or 6 analyses now.

Any suggestions? I've used both Yule and Coalescent tree priors, as
well as GTR+I+G, partitioned codons and a relaxed exponential clock
(likelihood analysis suggests my data isn't clocklike).

Would it help if I loaded a rather simplistically created ultrametric
tree generated in R as a prior?

I'm also puzzled by some of the BEAST log outputs, e.g.
"up:down:nodeHeights(treeModel) [Tuning] 0.697
slightly high Try setting scaleFactor to about 0.6978" - I cannot see
anywhere in my original .xml files where previous settings were set at
0.697, and setting it to 0.6978 afterwards seems a rather trivial
adjustment.
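
To put a number on how small that adjustment is, here is a minimal sketch that pulls the two values out of the quoted message and reports the relative change. The message text is copied from this post; the function name and the regular expressions are illustrative assumptions, not part of BEAST.

```python
import re

# Operator message as quoted in the post (joined onto one line).
MESSAGE = ("up:down:nodeHeights(treeModel) [Tuning] 0.697 "
           "slightly high Try setting scaleFactor to about 0.6978")

def parse_tuning_suggestion(message):
    """Return (reported_value, suggested_value) from a message shaped like MESSAGE.

    Hypothetical helper: it assumes the first decimal number in the message is
    the reported tuning value and that the suggestion follows 'to about'.
    """
    reported = re.search(r"\d+\.\d+", message)
    suggested = re.search(r"to about (\d+\.\d+)", message)
    if reported is None or suggested is None:
        raise ValueError("message does not match the expected shape")
    return float(reported.group(0)), float(suggested.group(1))

reported, suggested = parse_tuning_suggestion(MESSAGE)
relative_change = (suggested - reported) / reported
print(f"reported={reported}, suggested={suggested}, "
      f"relative change={relative_change:.4%}")
```

The relative change comes out at roughly a tenth of a percent, which fits the impression that the suggestion is a very small adjustment.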

Many thanks

C

RC

Feb 3, 2012, 9:44:16 PM
to beast-users
Dear Clive,

I think this might be a case of your data being "spread too thinly",
given the parameterisations you have specified. Put simply, lots of
parameters require lots of information to get reliable estimates.

Given the relatively small information content of your short
sequences, I suggest starting with a simpler analysis: an HKY+G model
and a strict clock, with no partitioning or invariant-sites
parameter.

I would imagine that will immediately improve the ESS. Then, you will
need to work out whether increasing the parameterisation is justified
to remove any systematic biases from the "simple" approach.
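
For readers unsure what the ESS figure measures, the following is a rough sketch of a generic autocorrelation-based estimator, ESS = N / (1 + 2 * sum of lag autocorrelations), truncated at the first non-positive autocorrelation. It is an assumption-laden illustration, not the exact calculation used by Tracer or BEAST, and the function name and the simulated traces are made up for the example.

```python
import random

def effective_sample_size(trace):
    """Generic ESS estimate: N / (1 + 2 * sum of positive-lag autocorrelations).

    The sum stops at the first lag whose autocorrelation is not positive,
    a simple truncation rule chosen for this sketch.
    """
    n = len(trace)
    if n < 2:
        return float(n)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    variance = sum(c * c for c in centered) / n
    if variance == 0.0:
        return float(n)  # constant trace: no autocorrelation to estimate
    rho_sum = 0.0
    for lag in range(1, n):
        autocov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = autocov / variance
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)

# Simulated traces (not BEAST output): independent draws vs. a slowly mixing chain.
random.seed(1)
independent = [random.gauss(0, 1) for _ in range(2000)]
sticky = [0.0]
for _ in range(1999):
    sticky.append(0.9 * sticky[-1] + random.gauss(0, 1))

print("independent:", round(effective_sample_size(independent)))
print("sticky:     ", round(effective_sample_size(sticky)))
```

The slowly mixing chain has the same number of logged samples but a much lower ESS, which is the situation a low 'posterior' ESS points to.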

Generally, providing an ML starting tree generated from the same data
is considered "data dredging", and therefore not ideal. Two separate
runs from random starting trees, combined afterwards, would be
better.
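
As a sketch of what combining runs means in practice: drop the burn-in from each run, then pool the remaining samples. In a real analysis this is normally done with LogCombiner from the BEAST package; the function names and the 10% burn-in below are assumptions made for illustration only.

```python
def discard_burnin(samples, burnin_fraction=0.1):
    """Drop the first burnin_fraction of a run's samples (10% is an assumed default)."""
    start = int(len(samples) * burnin_fraction)
    return samples[start:]

def combine_runs(runs, burnin_fraction=0.1):
    """Pool the post-burn-in samples of several independent runs into one list."""
    combined = []
    for samples in runs:
        combined.extend(discard_burnin(samples, burnin_fraction))
    return combined

# Toy example with two short made-up "runs" of a logged parameter.
run_a = [5.0, 2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1, 1.9, 2.0]
run_b = [0.1, 1.9, 2.0, 2.1, 2.0, 1.9, 2.2, 2.0, 1.8, 2.1]
pooled = combine_runs([run_a, run_b])
print(len(pooled), sum(pooled) / len(pooled))
```

Pooling only makes sense if the separate runs have settled on the same distribution, so it is worth comparing their traces before combining them.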

Hope this helps.

EvoClive

Feb 8, 2012, 10:15:30 AM
to beast-users
Many thanks, will give it a whirl.

C