MCC tree from treeannotator too large to be viewed in FigTree

Fredrick Nindo

Apr 9, 2014, 5:40:02 AM
to beast...@googlegroups.com
Dear beast_users,
I have recently been having difficulty with post-BEAST MCMC analyses, for both divergence time estimation and discrete diffusion analyses: the MCC tree generated from 5 combined and resampled runs of a BEAST analysis is very large, on the order of 5 GB with the default 10% burn-in, and a tree of this size is difficult to view in FigTree. See below the script I use to generate the MCC tree. Is it just the size of my dataset?

I normally keep all node support values, i.e. everything, as I want every node, not just those above a certain threshold. My datasets have been on the order of 400-800 taxa and 500-2200 bp long. As for the models, I have been using HKY or GTR + gamma for nucleotide substitution, constant-size coalescent and BSP for the demographic model, and both asymmetric and symmetric models for the discrete phylogeography analyses.

Could I attribute this to the size of the datasets or to the models I am using, i.e. the more parameters in the model, the larger the output file? I thought the MCC tree is a single tree containing all this information.

--
Fredrick Nindo
PhD Student
UCT Computational Biology Group
Department of Clinical Laboratory Sciences
Institute of Infectious Disease and Molecular Medicine
University of Cape Town Health Sciences Campus
Anzio Rd
Observatory
7925
South Africa

Tel: +27 21 406 6058/6176
Fax: +27 21 406 6068
skype: fredrick.nindo

Alexei Drummond

Apr 9, 2014, 5:47:05 AM
to beast...@googlegroups.com
Dear Fredrick,

Generally there is no need to use more than say 10,000 trees to produce an MCC summary (especially as the ESS is usually much lower than that anyway). So perhaps you are logging trees to file too frequently?

Cheers
Alexei

Fredrick Nindo

Apr 9, 2014, 6:28:04 AM
to beast...@googlegroups.com
Thanks Alexei for your observation, but there were no more than 10,000 trees. Please see the LogCombiner and TreeAnnotator steps below; when I checked the output file, i.e. the MCC tree, it was on the order of 5 GB!

1. Logcombiner steps:

fnindo@login01:~/scratch5/beast_runs/LogCombiner/run_2 $ cat logcombiner.sh
#PBS -N logcombiner
#PBS -l select=1:ncpus=1:mem=24000mb:jobtype=dell,place=free:group=nodetype
#PBS -l walltime=336:00:00
#PBS -q workq
#PBS -m abe
#PBS -o /export/home/fnindo/scratch5/beast_runs/LogCombiner/run_2/std.out
#PBS -e /export/home/fnindo/scratch5/beast_runs/LogCombiner/run_2/std.err
#PBS -M fni...@gmail.com
cd /export/home/fnindo/scratch5/beast_runs/LogCombiner/run_2
source /etc/profile.d/modules.sh
module add beast
logcombiner -trees -resample 50000  rsv_hky_ucln_strict_1.trees rsv_hky_ucln_strict_2.trees rsv_hky_ucln_strict_3.trees rsv_hky_ucln_strict_4.trees rsv_hky_ucln_strict_5.trees rsv_hky_strict_resampled_combined.trees

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

fnindo@login01:~/scratch5/beast_runs/LogCombiner/run_2 $ cat std.out

               LogCombiner v1.7.5, 2002-2013
                    MCMC Output Combiner
                             by
           Andrew Rambaut and Alexei J. Drummond

             Institute of Evolutionary Biology
                  University of Edinburgh
                     a.ra...@ed.ac.uk

               Department of Computer Science
                   University of Auckland
                  ale...@cs.auckland.ac.nz


Creating combined tree file: 'rsv_hky_strict_resampled_combined.trees


Combining file: 'rsv_hky_ucln_strict_1.trees' without removing burnin, resampling with frequency: 50000
Combining file: 'rsv_hky_ucln_strict_2.trees' without removing burnin, resampling with frequency: 50000
Combining file: 'rsv_hky_ucln_strict_3.trees' without removing burnin, resampling with frequency: 50000
Combining file: 'rsv_hky_ucln_strict_4.trees' without removing burnin, resampling with frequency: 50000
Combining file: 'rsv_hky_ucln_strict_5.trees' without removing burnin, resampling with frequency: 50000
Finished.
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

2. TreeAnnotator steps:

fnindo@login01:~/scratch5/beast_runs/TreeAnnotator/run_3 $ cat treeannotator.sh
#PBS -N treeannotator2
#PBS -l select=1:ncpus=1:mem=24000mb:jobtype=dell,place=free:group=nodetype
#PBS -l walltime=336:00:00
#PBS -q workq
#PBS -m abe
#PBS -o /export/home/fnindo/scratch5/beast_runs/TreeAnnotator/run_3/std.out
#PBS -e /export/home/fnindo/scratch5/beast_runs/TreeAnnotator/run_3/std.err
#PBS -M fni...@gmail.com
cd /export/home/fnindo/scratch5/beast_runs/TreeAnnotator/run_3
source /etc/profile.d/modules.sh
module add beast
treeannotator -burnin 1000 -heights median rsv_hky_strict_resampled_combined.trees rsv_hky_strict_resampled_combined.tre

/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

fnindo@login01:~/scratch5/beast_runs/TreeAnnotator/run_3 $ cat std.err
JRI not available. Using Java bivariate attributes

              TreeAnnotator v1.7.5, 2002-2013
                    MCMC Output analysis
                             by
           Andrew Rambaut and Alexei J. Drummond

             Institute of Evolutionary Biology
                  University of Edinburgh
                     a.ra...@ed.ac.uk

               Department of Computer Science
                   University of Auckland
                  ale...@cs.auckland.ac.nz


Reading trees (bar assumes 10,000 trees)...
0              25             50             75            100
|--------------|--------------|--------------|--------------|
***************************************************

Total trees read: 8628
Ignoring first 1000 trees.
Total unique clades: 455439

Finding maximum credibility tree...
Analyzing 7628 trees...
0              25             50             75            100
|--------------|--------------|--------------|--------------|
************************************************************

Highest Log Clade Credibility: -1333.4021806355477
Collecting node information...
0              25             50             75            100
|--------------|--------------|--------------|--------------|
************************************************************

Annotating target tree...
Writing annotated tree....

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
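
The tree counts in the TreeAnnotator log above can be cross-checked with a little arithmetic. This is only a sketch: the function name is made up for illustration, and the per-run average assumes the five runs contributed roughly equal numbers of trees, which the logs do not state.

```python
def summarize_burnin(total_trees, burnin_trees, n_runs):
    """Cross-check the tree counts reported in a TreeAnnotator log."""
    kept = total_trees - burnin_trees
    return {
        "kept": kept,                                   # trees actually analysed
        "burnin_fraction": burnin_trees / total_trees,  # share discarded as burn-in
        "mean_trees_per_run": total_trees / n_runs,     # assumes equal-length runs
    }


# Values taken from the log: 8628 trees read, first 1000 ignored, 5 runs combined.
summary = summarize_burnin(total_trees=8628, burnin_trees=1000, n_runs=5)
print(summary)
```

The number kept matches the "Analyzing 7628 trees" line, and the burn-in works out to about 11.6% of the combined file. Note that LogCombiner reported combining "without removing burnin", so if the runs were concatenated in order, the 1000 ignored trees would all come from the start of the combined file rather than from the start of each run.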

3. Checking the size of the output MCC tree:
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

fnindo@login01:~/scratch5/beast_runs/TreeAnnotator/run_3 $ du rsv_hky_strict_resampled_combined.tre
5956652    rsv_hky_strict_resampled_combined.tre
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
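
For reference, the `du` figure can be converted to bytes and gigabytes. A minimal sketch, assuming `du` is reporting 1024-byte blocks (the GNU default; other systems may use 512-byte blocks); the function name is illustrative only.

```python
def du_blocks_to_sizes(blocks, block_bytes=1024):
    """Convert a du block count to (bytes, decimal GB, binary GiB)."""
    n_bytes = blocks * block_bytes
    return n_bytes, n_bytes / 1e9, n_bytes / 2**30


# Block count copied from the du output above; block size is an assumption.
n_bytes, gb, gib = du_blocks_to_sizes(5956652)
print(f"{n_bytes} bytes = {gb:.2f} GB = {gib:.2f} GiB")
# prints: 6099611648 bytes = 6.10 GB = 5.68 GiB
```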

So what could have happened here?

Cheers,

Fredrick

Leendert Cloete

Apr 9, 2014, 10:47:25 AM
to beast...@googlegroups.com
Hi Fredrick,

When you set up your analysis for the nucleotide partition, did you set up your XML to reconstruct the nucleotide ancestral states at all ancestors? These ancestral states are annotated directly into your logged trees and, for 400-800 sequences sampled quite frequently, will definitely result in very large output files. So even if you resample and annotate, the output will still be a tree with a nucleotide sequence of ~500-2200 bp at each node.

If this is the case, my suggestion would be to rerun your analysis without reconstructing the states at all the ancestral nodes. As this might take too long, I could instead suggest using a simple Python script with a regular expression to parse through your tree log file and remove all the ancestral sequences, so that you are left with only the estimates and discrete parameters. (Be wary of this method though, as you might remove or alter some of the estimates in the log file by accident.)
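
A rough sketch of what such a script might look like is given here. It assumes the ancestral sequences are stored inside the `[&...]` annotation blocks under a key such as `states` (for example `states="ACGT..."`, possibly with derived entries like `states.set={...}`); the key name, the function names and the file names are all assumptions, so check your own .trees file for the actual annotation name first.

```python
import re

# Matches one "[&...]" annotation block in a NEXUS/Newick tree string.
BLOCK = re.compile(r"\[&([^\[\]]*)\]")


def strip_annotation(tree_line, key="states"):
    """Remove `key=...` and `key.<suffix>=...` entries from every annotation block."""
    entry = re.compile(
        r",?(?<![\w.])" + re.escape(key)
        # Optional ".suffix", then a quoted string, a {...} set, or a bare value.
        + r'(?:\.[\w.%]+)?=(?:"[^"]*"|\{[^{}]*\}|[^,\[\]]*)'
    )

    def clean(match):
        body = entry.sub("", match.group(1)).lstrip(",")
        # Drop the block entirely if nothing is left inside it.
        return f"[&{body}]" if body else ""

    return BLOCK.sub(clean, tree_line)


def strip_file(src_path, dst_path, key="states"):
    """Stream a tree file line by line, writing a copy without the `key` entries."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(strip_annotation(line, key))


example = '(A[&states="ACGT",rate=0.5]:1.0,B[&rate=0.2,states="AC-T"]:2.0);'
print(strip_annotation(example))
# prints: (A[&rate=0.5]:1.0,B[&rate=0.2]:2.0);
```

Something like `strip_file("run_1.trees", "run_1.stripped.trees")` (hypothetical file names) would then be applied to each logged tree file before combining and annotating. Because the file is read one line at a time, memory use stays modest as long as each tree sits on its own line; it is worth comparing a few trees before and after to confirm that only the sequence entries were removed.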

Alternatively, you might also try launching FigTree from your terminal and allocating more memory to it (I've never tried this, so I'm not sure it will work, but you could give it a go).

Hope this helps.

Kind regards
Leendert Cloete

Fredrick Nindo

Apr 9, 2014, 12:15:37 PM
to beast...@googlegroups.com
Thanks Leendert.
I had not thought about the effect that selecting 'reconstruct states at all ancestors' would have on the size of the annotated MCC tree. For now, that seems to be the reason for the extraordinarily large MCC tree.


Cheers,

Fredrick  