Log Combiner Help


Emily Waddell

Jul 30, 2013, 10:03:03 AM
to beast...@googlegroups.com
Hi,

I am really struggling with BEAST at the moment as this is the first time I am using it and my uni deadline is very soon! 

My data set is quite large (225 individuals, 140 species, 4 genes, 6 partitions), so I am running it for 100,000,000 generations and sampling every 1,000, but the ESS scores are still really low at ~30 (compared with the REALLY low scores of ~8 when I ran it for only 10,000,000), so I want to combine four runs in LogCombiner.

I have two of these runs already (two more are running, which I am sampling every 10,000 instead), but I am unsure what frequency to resample at to get 10,000 samples and what to put as the burn-in. I have tried a combination of different figures, all of which have given me much lower ESS scores than the original log output, but from what I have been told (by people at my uni), resampling should have increased these scores.

Any advice would be very much appreciated!

Thanks,
Emily


cmh

Jul 30, 2013, 5:01:57 PM
to beast...@googlegroups.com
Hi Emily,

I have had the same issue in the past, so I think I can give you a solution...not for your ESS values, but at least to get the resulting 10,000 trees that you will need for TreeAnnotator.

First, you do not have to have just 10,000 generations to see the combined ESS values in Tracer. If you load your .log files from each of your runs into Tracer, you will have output from each one of the four and then also a combined output...it may take a few minutes for the combined output to generate. Click on the combined output to see what your ESS values are for the four combined runs.

The 10,000 figure becomes important for memory issues when using TreeAnnotator. If your number of trees greatly exceeds 10,000, TreeAnnotator crashes with an out-of-memory error (in my experience). You must use LogCombiner to remove the burn-in, resample the trees from each of your runs, and combine them (in that order) so that your resulting tree file has around 10,000 trees.

So, if you have 4 runs with 100,000,000 generations, sampling every 1,000 generations:

A 10% burn-in will be 10,000 samples (take #generations/sampling interval: 100,000,000/1,000 = 100,000 samples; then 100,000 * 10% = 10,000), which corresponds to the first 10,000,000 generations.

After the 10% burn-in is removed, you are left with 90,000,000 generations for each run (that is, 90,000 sampled trees). You must get that number down to 2,500 trees per run so that the four runs combine to 10,000 trees. The resampling frequency, with four runs, is 90,000,000/2,500 = 36,000 generations.
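The arithmetic above can be sketched as a small Python helper, in case you want to redo it for other chain lengths or sampling intervals. This is only an illustration: the function name `logcombiner_settings` and its parameters are made up here and are not part of BEAST or LogCombiner. It also assumes that the resampling frequency should be a multiple of the original sampling interval, so it rounds down to the nearest multiple. Whether your LogCombiner version expects the burn-in as a number of generations or something else is worth checking in its dialog before entering the values.

```python
def logcombiner_settings(chain_length, sample_interval, n_runs,
                         burnin_percent=10, target_trees=10_000):
    """Hypothetical helper: burn-in and resampling numbers for identical runs."""
    # Number of samples written to the log/tree file by one run.
    samples_per_run = chain_length // sample_interval
    # Burn-in expressed both as samples and as generations (states).
    burnin_samples = samples_per_run * burnin_percent // 100
    burnin_states = burnin_samples * sample_interval
    # Generations left in each run once the burn-in is discarded.
    states_after_burnin = chain_length - burnin_states
    # Each run contributes an equal share of the target number of trees.
    trees_per_run = target_trees // n_runs
    resample_every = states_after_burnin // trees_per_run
    # Assumption: keep the resampling frequency a multiple of the original
    # sampling interval so the kept states line up with logged states.
    resample_every -= resample_every % sample_interval
    return {
        "burnin_samples": burnin_samples,
        "burnin_states": burnin_states,
        "trees_per_run": trees_per_run,
        "resample_every": resample_every,
    }


if __name__ == "__main__":
    # Four runs of 100,000,000 generations, sampled every 1,000.
    print(logcombiner_settings(100_000_000, 1_000, 4))
```

For the runs sampled every 10,000 instead, the same call with `sample_interval=10_000` rounds 36,000 down to 30,000, which would keep somewhat more than 2,500 trees per run.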

I think this should work for you...and others may have a better strategy. Of course, you would use the same burnin and resampling numbers if you wanted to combine the log files and put that into Tracer to see the ESSs (as you originally stated), although I am not sure that this would be of any benefit.

Good luck!

Crystal