mcmctree for large alignment


Fang Li

Apr 26, 2018, 4:52:57 AM
to PAML discussion group

Hi,

I am using mcmctree to estimate divergence times from an alignment that is about 32M in length and contains 47 species. I used the approximate likelihood method but am stuck at the first step, where baseml calculates the branch lengths. It ran for more than two days and then stopped with an error report that said only "Segmentation fault". The outfile ends with a distance matrix and a tree. I noticed that memory use was about 3 GB before it crashed. Does anyone have an idea of how to use mcmctree to estimate divergence times for data this large? And are there any parameters to set the number of CPUs, or something of the kind, to speed up the process?

Best,
Fang

Ziheng

Jul 31, 2018, 2:03:59 PM
to PAML discussion group
Perhaps reducing the data size is a practical solution. You can sample 1M sites, say, for the same partition.
The number of partitions is important, but the number of sites (if you are analyzing all sites in one partition) is not important to the posterior time estimates. Once you reach 1 kb or 10 kb, having more sites won't really help.
Here are some papers characterizing the uncertainties in the posterior time estimates:

dos Reis M, Yang Z. 2013. The unbearable uncertainty of Bayesian divergence time estimation. J Syst Evol 51:30-43.

Zhu T, dos Reis M, Yang Z. 2015. Characterization of the uncertainty of divergence time estimation under relaxed molecular clock models using multiple loci. Syst Biol 64:267-280.

If you do want to struggle with the large number of sites: I have used RAxML to get the branch lengths and then baseml to calculate the Hessian matrix. You can use in.baseml to specify parameter values or initial values for baseml.
ziheng
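
The subsampling suggested above (drawing, say, 1M sites from the partition) can be sketched in Python. This is only a minimal illustration, not part of PAML: the function names sample_sites and to_phylip are hypothetical, the alignment is assumed to be already loaded as a dict mapping species name to aligned sequence, and the output assumes the sequential PHYLIP layout with two spaces between name and sequence.

```python
import random


def sample_sites(alignment, n_sites, seed=0):
    """Return a new alignment with n_sites columns drawn without replacement.

    alignment: dict of species name -> aligned sequence (equal lengths).
    The same columns are taken from every sequence, in original order.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    total = lengths.pop()
    if n_sites > total:
        raise ValueError("cannot sample more sites than the alignment has")
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    cols = sorted(rng.sample(range(total), n_sites))
    return {name: "".join(seq[i] for i in cols)
            for name, seq in alignment.items()}


def to_phylip(alignment):
    """Format the alignment as sequential PHYLIP text (assumed layout)."""
    n_seqs = len(alignment)
    n_sites = len(next(iter(alignment.values())))
    lines = [f"{n_seqs} {n_sites}"]
    for name, seq in alignment.items():
        lines.append(f"{name}  {seq}")  # two spaces separate name and sequence
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy data for illustration only; a real run would read the full alignment.
    toy = {"sp1": "ACGTACGTAC", "sp2": "ACGTTCGTAA", "sp3": "ACGAACGTAC"}
    print(to_phylip(sample_sites(toy, 5, seed=1)), end="")
```

For the real data, n_sites would be set to something like 1000000 and the result written to the sequence file that baseml and mcmctree read. Sampling whole columns keeps the sites aligned across all 47 species.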
