Can I combine the tree files from the MCMC analyses of different datasets using LogCombiner?


Claire Zhou

Mar 31, 2020, 12:10:54 AM
to beast-users
Hi,

I have a question about LogCombiner and hope you can help me with it.

The tutorial says:
a: "Logcombiner is a program to combine log and tree files from multiple runs of BEAST"
b: " Warning: The files you have selected must be from independent runs of BEAST from the same XML file. If the log files differ an error will occur stating that the number of columns in the first file does not match that of the second file."
c: "Important: It does not make sense to combine log files from MCMC analyses of different models or different data sets."
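
The column-count error described in (b) can be anticipated before running LogCombiner. Below is a minimal sketch, assuming BEAST log files are tab-delimited text in which comment lines start with `#` and the first remaining line is the column header. The function names and the sample log contents are hypothetical and are not part of BEAST.

```python
def read_header(log_text):
    """Return the column names from the first non-comment, non-blank line."""
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            return stripped.split("\t")
    raise ValueError("no header line found")


def headers_match(log_texts):
    """True if every log has exactly the same columns in the same order."""
    headers = [read_header(text) for text in log_texts]
    return all(h == headers[0] for h in headers[1:])


# Hypothetical log contents, for illustration only.
run1 = "# BEAST log\nstate\tposterior\ttreeModel.rootHeight\n0\t-1500.2\t12.1\n"
run2 = "# BEAST log\nstate\tposterior\ttreeModel.rootHeight\n0\t-1498.7\t11.8\n"
run3 = "# BEAST log\nstate\tposterior\tkappa\ttreeModel.rootHeight\n0\t-1400.0\t2.0\t10.5\n"

print(headers_match([run1, run2]))  # two runs with identical columns
print(headers_match([run1, run3]))  # a run with an extra column
```

Matching headers are only a necessary condition. As point (c) says, logs from different data sets can share the same columns and still not be meaningful to combine.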

I have performed divergence dating analyses of 10 different molecular data sets. Each analysis was run for the same number of generations. Each data set has the same species and the same alignment length but different sequences. All runs have converged.
Can I combine the tree files from these 10 independent runs using LogCombiner? Does it make sense?

Thanks in advance!

Best,
Claire 

HS

Mar 31, 2020, 6:39:45 AM
to beast-users
Dear Claire,

I haven't looked into this in depth, but technically I believe it is impossible. I assume that by different sequences you mean different taxa. To combine trees with LogCombiner, the taxa must be exactly the same in all analyses. If they are not, LogCombiner will most probably report an error.

Best,
Hovhannes

Claire Zhou

Mar 31, 2020, 10:25:58 PM
to beast-users
Hi Hovhannes,

Thanks for your reply! 
Sorry for the confusion. 
"Different sequences" means "different genes (or sequence regions) from the genome alignment of the same taxa".

For example, 
input files:
test1.xml contains 50 taxa and two genes: gene A, gene B; alignment length 100 kbp;
test2.xml contains the same taxa and the same alignment length, but three different genes: gene C, gene D, gene E;
test3.xml contains the same taxa and the same alignment length, but two different sequence regions: region i, region j;
etc.
output files (converged):
          test1.trees; test1.logs
          test2.trees; test2.logs
          test3.trees; test3.logs
          etc.
If I combine all testN.trees into one tree file using LogCombiner, it runs successfully without any error message.

Combining all sequences into one large .xml file would take too long on my machine (no GPU), so I'd like to know whether I can combine all the converged results into one final time tree and, if not, why not. If you have any other suggestions, please let me know.

Thanks!

Best wishes,
Claire


langgeng agung

Mar 31, 2020, 10:42:49 PM
to beast...@googlegroups.com
Dear Claire,

As the LogCombiner tutorial mentions, it can combine log and tree files from independent runs of BEAST. This approach exists mainly because long MCMC chains are often needed and results are wanted faster, so we can run the analysis on different computers and combine the results later.

If you have different gene alignments but want a single output tree, I think it is better to make a single XML file with separate partitions. I am not sure how you selected your gene alignments (i.e. alignment 1 contains gene A, gene B, etc.). Each gene alignment should have its own substitution model; you can set a different substitution model for each partition and then link all partitions to the same tree model. If that is too heavy for your computer to run, this is where LogCombiner comes in: run it independently on different PCs, or perhaps send it to an institution that has a supercomputer, then combine the results into one single log file for further analysis.

Hope it may help,

Regards

--
You received this message because you are subscribed to the Google Groups "beast-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beast-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beast-users/909904c3-a71b-45c3-b28a-1a355a7be377%40googlegroups.com.

HS

Apr 1, 2020, 11:03:26 AM
to beast-users
Dear Claire,

Thank you for the details!

I think in principle you can do that. The resulting tree will be an average over all the gene trees. I don't have much expertise here; I believe specialists can say more on this.

I would only say that I would most probably not go that way. I imagine that if one gene, say geneA, yields a tree with an odd node supported with a posterior probability of 1, while the other gene trees support other topologies with lower support, then the geneA tree will bias the average tree a lot.

Best,
Hovhannes 

Juan Carlos Zamora Señoret

Apr 1, 2020, 11:11:44 AM
to beast...@googlegroups.com
If you want a single tree from a set of alignments in which you allow each one to have an independent tree, then I think you are looking for a species tree that reconciles the topologies of each single-locus (or single-alignment) tree.
Try *BEAST or StarBEAST2.
I would definitely not combine all of them in LogCombiner. What would be the meaning of that "merged" tree (what would the topology and branch lengths mean?), if it is even possible to compute it?

Best regards,

Juan Carlos Zamora


Claire Zhou

Apr 2, 2020, 9:55:16 PM
to beast-users
Hi all, Thanks!

Dear Agung, your suggestions will help me get results faster. It's a good idea.

Dear Hovhannes, dear Juan Carlos Zamora, your comments both remind me that the runs may produce different trees. What if all runs have an identical topology and similar divergence times for each node? Is it OK then to obtain the final time tree with the "mean" divergence time from all runs? Would the branch lengths then also be "divergence times"?
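
What pooling does to a single node age can be seen with simple arithmetic. Below is a minimal sketch using invented numbers (not taken from any analysis in this thread). When two runs contribute the same number of samples, the pooled mean is just the average of the per-run means, while the pooled range spans both runs and so mixes disagreement between loci with the uncertainty within each run.

```python
from statistics import mean

# Hypothetical posterior samples of one node's age (in Myr) from two loci.
locus_a = [10.0, 10.5, 11.0, 10.5]
locus_b = [14.0, 14.5, 15.0, 14.5]

# Pooling is simply putting all samples into one list.
pooled = locus_a + locus_b

mean_a = mean(locus_a)
mean_b = mean(locus_b)
pooled_mean = mean(pooled)

print(mean_a, mean_b, pooled_mean)
print(min(pooled), max(pooled))  # range covers both loci
```

Whether such a pooled summary is a meaningful estimate depends on the model behind each run. The sketch only shows the bookkeeping.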

Thanks and best regards,
Claire


HS

Apr 3, 2020, 6:58:21 AM
to beast-users
Dear Claire,

I assume that if the topology is identical, meaning that all the branching patterns and the posteriors are identical, then yes, in my opinion you can make an average tree and trust the results. Still, if you already get everything with geneA, why do you need to add the analysis with geneB?

Branch length is different from divergence time. Branch length is the amount of time over which a branch evolves. Divergence time is the time at which two taxa or branches started to diverge.
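
The distinction between branch length and divergence time can be made concrete with a small worked example. Below is a minimal sketch, assuming an ultrametric time tree with all tips sampled at the present. The tree, its values and the function name are hypothetical.

```python
# A node is (name, branch_length_above_node, children).
# Hypothetical ultrametric time tree: ((A:2,B:2)AB:3,C:5)root
tree = ("root", 0.0, [
    ("AB", 3.0, [
        ("A", 2.0, []),
        ("B", 2.0, []),
    ]),
    ("C", 5.0, []),
])


def node_ages(node, ages=None):
    """Return {name: age}, where age is time before the present (tips at 0)."""
    if ages is None:
        ages = {}
    name, _, children = node
    if not children:
        ages[name] = 0.0
    else:
        for child in children:
            node_ages(child, ages)
        # On an ultrametric tree every child gives the same value here.
        first = children[0]
        ages[name] = ages[first[0]] + first[1]
    return ages


ages = node_ages(tree)
print(ages["AB"])    # divergence time of A and B
print(ages["root"])  # divergence time of (A,B) from C
```

Here the branch above node AB has length 3, which is the difference between the two divergence times (root age minus AB age), not a divergence time itself.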

Best,
Hovhannes

Simon Ho

Apr 3, 2020, 7:52:17 PM
to beast-users
Hi Claire,

It sounds as though your data set consists of sequences from 10 different loci for the same set of taxa. There are two major approaches to analysing such a data set: concatenation and the (multispecies) coalescent.
- For the concatenation approach, you should concatenate all of your sequences and analyse them together in BEAST. You might wish to partition the data and assign a separate substitution model to each of the 10 loci. In this approach, BEAST will estimate a single species tree (with a single topology and a single set of divergence times).
- For the multispecies coalescent approach, the 10 loci should be treated as separate sequence alignments (not concatenated) in *BEAST. In this approach, *BEAST will estimate a gene tree for each locus, as well as the underlying species tree.
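
The bookkeeping behind the concatenation approach can be sketched as follows. This is a minimal illustration, assuming each locus is already aligned and contains the same taxa. The function name, locus names and sequences are hypothetical. In practice BEAUti can import each locus as its own partition, so this step is usually not done by hand.

```python
def concatenate(loci):
    """Concatenate per-locus alignments and record 1-based partition ranges.

    `loci` maps a locus name to {taxon: aligned sequence}. Every locus must
    contain the same taxa, and sequences within a locus must be equal length.
    """
    names = list(loci)
    taxa = sorted(loci[names[0]])
    combined = {taxon: "" for taxon in taxa}
    partitions = {}
    start = 1
    for name in names:
        alignment = loci[name]
        if sorted(alignment) != taxa:
            raise ValueError(f"taxa in {name} differ from the first locus")
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences in {name} are not aligned")
        length = lengths.pop()
        for taxon in taxa:
            combined[taxon] += alignment[taxon]
        # Inclusive column range of this locus in the concatenated alignment.
        partitions[name] = (start, start + length - 1)
        start += length
    return combined, partitions


# Hypothetical two-locus, two-taxon data set.
loci = {
    "geneA": {"sp1": "ACGT", "sp2": "ACGA"},
    "geneB": {"sp1": "GG", "sp2": "GC"},
}
combined, partitions = concatenate(loci)
print(combined)
print(partitions)
```

The recorded ranges are what allow a separate substitution model to be assigned to each locus while all loci share one tree.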

You can find more information about these two approaches by searching for "coalescence vs concatenation" on the Internet, but a potential starting point is the paper "Estimating phylogenetic trees from genome-scale data" by Liang Liu and colleagues (https://www.ncbi.nlm.nih.gov/pubmed/25873435).

Cheers,
Simon

Claire Zhou

Apr 8, 2020, 4:13:25 AM
to beast-users
Hi all,
I appreciate your comments and suggestions; they helped me a lot.

Best,
Claire
