Run Biogeobears on multiple trees?

1,099 views
Skip to first unread message

Liming Cai

unread,
Aug 12, 2015, 9:46:38 PM8/12/15
to BioGeoBEARS
Hi Nick,

I am doing a Biogeobears analysis across multiple bootstrap trees. I notice that there is a source code available for such purpose in BioGeoBEARS_on_multiple_trees_v1.R. I prepared all the input files, including one that contains the filenames of the tree files. I scan this file into the newick_fns argument as follows:

newick_fns = scan(“treelist.txt", what="character", sep=NULL)


And trying to run the run_bears_optim_on_multiple_trees function with following line:

res = run_bears_optim_on_multiple_trees(BioGeoBEARS_run_object, newick_fns, model_name="Elatinaceae_DEC", geog_fns, resfns, run_or_collect="run", start_treenum=1, end_treenum=length(newick_fns), runslow=TRUE, plot_params=FALSE) 


As a result, it turns out I can only run the first tree. And when the analysis goes into the second tree, the following error occurred (this is a test run, so there are only three trees):

Could you please help me with that? In addition, is it correct to have each bootstrap tree separately in a file, and list the filename as a string in newick_fns? There’s also a function to summarize the result the plot the parameters onto a master tree. I think I need guide to use those functions as well: what the input argument should be like and how to call the function correctly in a R console.


ps. My input files and R scripts are attached here.


Thank you very much and best regards,

Liming


BioGeoBears_Analysis_1508.zip

Nick Matzke

unread,
Aug 13, 2015, 8:22:02 AM8/13/15
to BioGeoBEARS
Thanks for the question, and files.

OK, so the initial problem here was that geog_fns should have as many filenames as newick_fns. I originally programmed this for a case where OTUs and geography were not identical between runs, because of uncertain fossil placement. But, usually, geography stays the same, so I've added a check.

I also discovered that even when it was looping through successfully, it was using the same master tree (!) rather than actually reading the different input trees (the script was missing the readfiles_BioGeoBEARS_run and section_the_tree commands inside the loop.  So, if anyone else was using this feature, please re-run. I've uploaded the revised file to PhyloWiki.

Script is below, and modified zipfile. Cheers!
Nick


PS: I should say -- this is beta code, so "use at your own risk."  I put up the multiple-trees file semi-accidentally, the "pseudo-Bayesian" approach of running models over a bunch of trees is not a high priority for me.  I know it is popular and commonly requested, but: 

(a) The cost-benefit of the pseudo-Bayesian approach seems dubious to me, at least for typical analyses (the tree is actually pretty confident, non-time-stratified analysis, the pseudo-Bayesian averaged result looks about the same as the standard ML result)

(b) There are some theoretically dubious features about it (see mini-rant at the end of the script, below) -- plus, another one I just thought of -- if you have a collection of bootstrap or posterior trees, when are nodes "equivalent".  E.g., if:

- node 1 in tree 1 has species A and species B above it, and 
- node 1351 in tree 2 has species A and species B above it, 
- are they equivalent if (A,B) is sister to (C,D) in tree 1, and sister to (Z, X) in tree 2?  

That's what we are averaging when we average the pseudo-Bayesian analyses.  It gets even more complex when you think about "corners" at the branch bottoms. I made some fairly arbitrary decisions about this stuff in the script, others might prefer a different way.

(c) I figure anyone who knows how to run BioGeoBEARS can probably figure out a for-loop and customize it themselves for whatever scale of problem they are doing; it's difficult to program for all of the different ways people might want to do it (as there are significant issues of speed, file storage, etc., at least for large analyses).


==================
###########################################################
# Run Biogeobears on multiple trees and store the output 
###########################################################

###########################################################
#Load libraries and additional source code
###########################################################
library(optimx)         
library(FD)       
library(snow)     
library(parallel)
library(BioGeoBEARS)
library(stringr)


calc_loglike_sp = compiler::cmpfun(calc_loglike_sp_prebyte)    # crucial to fix bug in uppass calculations
calc_independent_likelihoods_on_each_branch = compiler::cmpfun(calc_independent_likelihoods_on_each_branch_prebyte)

############################################################
# Universal setup, input file path
############################################################
wd = "/Users/lcai/Dropbox/Davis\ Lab/2013-2014_Elatinaceae/BioGeoBears_Analysis_1508/"
wd = "/drives/Dropbox/_njm/__packages/BioGeoBEARS_setup/z_help/Liming/"
setwd(wd)
extdata_dir = np(system.file("extdata", package="BioGeoBEARS"))
extdata_dir
list.files(extdata_dir)
trfn = "RAxML_best0821.newick"
geogfn = "speciesrange.data"
max_range_size = 3

newick_fns = scan("treelist", what="character", sep=NULL)
#geog_fns = "speciesrange.data"
geog_fns = scan("geoglist", what="character", sep=NULL)
resfns = scan("outputnames", what="character", sep=NULL)
############################################################
# Run DEC on multiple trees
############################################################

BioGeoBEARS_run_object = define_BioGeoBEARS_run()
BioGeoBEARS_run_object$force_sparse=FALSE    
BioGeoBEARS_run_object$speedup=TRUE          # shorcuts to speed ML search; use FALSE if worried (e.g. >3 params)
BioGeoBEARS_run_object$use_optimx = TRUE
BioGeoBEARS_run_object$calc_ancprobs=TRUE    # get ancestral states from optim run

BioGeoBEARS_run_object$timesfn = "time" 
BioGeoBEARS_run_object$dispersal_multipliers_fn = "sevenarea.mat.rm.ram" 
BioGeoBEARS_run_object$areas_allowed_fn = "areas_allowed.txt"
BioGeoBEARS_run_object$geogfn = geogfn
BioGeoBEARS_run_object$trfn = trfn
BioGeoBEARS_run_object$max_range_size = max_range_size
BioGeoBEARS_run_object$num_cores_to_use=5

BioGeoBEARS_run_object$force_sparse=FALSE
BioGeoBEARS_run_object = readfiles_BioGeoBEARS_run(BioGeoBEARS_run_object)
BioGeoBEARS_run_object = section_the_tree(inputs=BioGeoBEARS_run_object, make_master_table=TRUE, plot_pieces=FALSE)
BioGeoBEARS_run_object$return_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_TTL_loglike_from_condlikes_table = TRUE
BioGeoBEARS_run_object$calc_ancprobs = TRUE
BioGeoBEARS_run_object
BioGeoBEARS_run_object$BioGeoBEARS_model_object
BioGeoBEARS_run_object$BioGeoBEARS_model_object@params_table

check_BioGeoBEARS_run(BioGeoBEARS_run_object)

# multiple tree settings
#' @BioGeoBEARS_run_object Set up the inference model you want to run on 
#' each tree, like you would for a normal single-tree run.
#' @newick_fns A list of the Newick files (e.g., you should extract some trees
#' from a BEAST NEXUS MCMC output)
#' @model_name The name you would like added to output filenames
#' @geog_fns A list of corresponding geography files (by default, these are just .geog instead of .newick)
#' @resfns A list of results filenames for .Rdata files, either for saving BioGeoBEARS analyses on each tree, or 
#' for loading previously-saved runs (by default, these are just _model_name.Rdata instead of .newick)
#' @run_or_collect If you want to run BioGeoBEARS on each tree (slow), use "run". If you just want to 
#' collect the results over all the trees, use "collect".  For both, pick "both".
#' @start_treenum Default 1. Change if you want to skip some trees
#' @end_treenum Default is length(newick_fns). Change if you want to run a subset of trees.
#' @runslow If FALSE, old, saved .Rdata results are loaded via load(). Default TRUE generates new .Rdata files and 
#' saves them via save()
res = run_bears_optim_on_multiple_trees(BioGeoBEARS_run_object, newick_fns, model_name="Elatinaceae_DEC", geog_fns, resfns, run_or_collect="run", start_treenum=1, end_treenum=length(newick_fns), runslow=TRUE, plot_params=FALSE)
res



# Now, collect the saved results from each individual run
res2 = run_bears_optim_on_multiple_trees(BioGeoBEARS_run_object, newick_fns, model_name="Elatinaceae_DEC", geog_fns, resfns, run_or_collect="collect", start_treenum=1, end_treenum=length(newick_fns), runslow=TRUE, plot_params=FALSE)
#res2


names(res2)

# This has the (averaged) state probabilities at nodes and corners
# for the master tree (whatever you put in BioGeoBEARS_run_object)
dim(res2$state_probs_at_nodes_across_all_trees)
dim(res2$state_probs_at_corners_across_all_trees)

# This has the averaged parameter values
res2$optim_results_mean

# To plot with the standard BioGeoBEARS plot function you will need to 
# - run a standard analysis on the master tree, and 
# - then copy these average state probabilities into that res object 
#   (into the state probabilities at nodes, and state probabilities at branch bottoms)
# - then copy res2$optim_results_mean over the standard optim results


# Nick's random rant on the pseudo-Bayesian approach
# After you've done all this, you'll have a plot of probabilities of states, 
# averaged over many trees. Yay.
# It will probably look pretty much like a plot of one run on the single ML or MCC tree.
# This is why I am skeptical of the popularity of "Bayes DIVA", "Bayes Lagrange", etc. 
# approaches.  They aren't fully Bayesian -- more like pseudo-Bayesian.  It is mostly
# done to mollify reviewers who heard somewhere it was a good idea.  The danger is that
# people think they are "accounting for uncertainty" by doing this approach.  The REALLY BIG 
# uncertainty, though, is in things like model misspecification, missing species (due to 
# sampling or extinction).  And the "single most probable state" plots hide the uncertainty
# just the ML ancestral state estimates anyway (do stochastic mapping to real feel that
# uncertainty).
# It would be better to spend time thinking about and discussing all that stuff, 
# rather than spending days running a model on a bunch of (typically) very 
# similar trees that (typically) all say about the same thing.
#
# The best argument for doing the pseudo-Bayesian approach is if you have a particular 
# node you are interested in, and the geography estimated at that node is strongly 
# affected by nearby uncertainty in the tree.  In that case,
# running a model over a bunch of trees can be useful.  
#
# I may be overly skeptical. But I have yet to see a case where the pseudo-Bayesian approach
# gave any real additional insight. It's only likely if your tree is very uncertain, but
# typically people get enough data to avoid that. Very often, I guess, people just want to
# be able to say, "we ran it on 100 trees and got the same result".  But really, if your
# tree is decently estimated, you actually knew that going in, from first principles.
#
# The pseudo-Bayesian approach might be useful if your dating is highly uncertain (as is common), 
# and you are doing a time-stratified analysis where uncertain nodes are near the time-strata.
# This would be the case I would focus on.
#


==================
Liming.zip

Liming Cai

unread,
Aug 17, 2015, 12:25:43 PM8/17/15
to BioGeoBEARS
Hi Nick, 
Thank you very much for the thoughtful advice and comments.
Best regards,
Liming

Liming Cai

unread,
Aug 23, 2015, 9:08:12 PM8/23/15
to BioGeoBEARS
Hi Nick,

I have a following question. This might be naive for you, but after I ran the analysis and got the averaged state probs at corners/nodes across all trees, I ran into problems when I want to plot this on my master tree. Supposing that the results from master tree are loaded by resDEC and results from multiple trees are loaded with res2. I simply copy res2 to resDEC as follows:

> resDEC$optim_result=res2$optim_results_mean
> resDEC$ML_marginal_prob_each_state_at_branch_bottom_below_node=res2$state_probs_at_corners_across_all_trees
> resDEC$ML_marginal_prob_each_state_at_branch_top_AT_node=res2$state_probs_at_nodes_across_all_trees
 
Then I tried to plot the result what I usually do with single-tree analysis. The following error occurred:

> plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="pie", label.offset=0.3, tipcex=0.3, statecex=0.3, splitcex=0.3, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE)
Read 9 items
Read 404 items
Error in floating.pie.asp(XX[i], YY[i], pie[i, ], radius = xrad[i], col = piecol) : 
  floating.pie: x values must be non-negative
Called from: floating.pie.asp(XX[i], YY[i], pie[i, ], radius = xrad[i], col = piecol)
Browse[1]> 

I wonder whether I did it correctly to plot the results like this. Or are there any additional analysis should be conduct to summarize the results before I can plot them onto a master tree?

Best regards,
Liming

ps. Here is the google drive link for the files:

On Thursday, August 13, 2015 at 8:22:02 AM UTC-4, Nick Matzke wrote:

Nick Matzke

unread,
Aug 23, 2015, 9:21:49 PM8/23/15
to bioge...@googlegroups.com
That's pretty weird. It suggests you have negative values in the state_probs somehow, which then cause the pie chart to malfunction. But I don't see how you could get that out of an averaging operation. I am doing workshops today/tomorrow, so it will be awhile before I can look. You could look for negative values in the state_probs and try to trace where they came from. Perhaps more probable is NA values caused by some mis-match between your actual master_tree and some other tree that got put in accidentally?

Cheers, Nick

--
You received this message because you are subscribed to the Google Groups "BioGeoBEARS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to biogeobears...@googlegroups.com.
To post to this group, send email to bioge...@googlegroups.com.
Visit this group at http://groups.google.com/group/biogeobears.
To view this discussion on the web visit https://groups.google.com/d/msgid/biogeobears/59d49dc9-933b-403d-a511-9ca807dfc5ad%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Pau Carnicero Campmany

unread,
Jan 21, 2016, 8:24:40 AM1/21/16
to BioGeoBEARS, mat...@nimbios.org

Hi all,


I got the same error as Liming.


When I checked the objects res2$state_probs_at_nodes_across_all_trees and res2$state_probs_at_corners_across_all_trees I found out that they contain the probabilities of each node and each tree, not the average probabilities of all trees analyzed. These objects have n times (the number of trees analyzed) the number of rows of res$ML_marginal_prob_each_state_at_branch_top_AT_node (probabilities at nodes for the single tree analysis), I expected them to be the same size..

So maybe this is the problem? How can I obtain the averaged probabilities at nodes and corners?


Thanks,

Nick Matzke

unread,
Jan 27, 2016, 1:45:37 AM1/27/16
to BioGeoBEARS, mat...@nimbios.org
I took a look at this, somehow I left out the summarization step. I have revised the multiple-trees script, and run it on the dataset.  I am pasting the script below, with an updated mini-essay on why the whole "averaging over many runs" approach of "Bayes-DIVA", "Bayes-Lagrange", and/or "Bayes-BioGeoBEARS" is kind of dubious in the first place (for starters, it's not really Bayesian, more like pseudo-Bayesian).

Cheers!
Nick


EXAMPLE SCRIPT -- RUNNING BIOGEOBEARS ON MULTIPLE TREES, 
AND SUMMARIZING VIA A DUBIOUS AVERAGING OF "COMMON" ("SHARED") NODES AMONG TREES

=====================
###########################################################
# Run BioGeoBEARS on multiple trees and store the output 
tipranges = getranges_from_LagrangePHYLIP(lgdata_fn=geogfn)
tipranges
# Run the analysis, on the master tree
# For a slow analysis, run once, then set runslow=FALSE to just 
# load the saved result.
runslow = TRUE
resfn = "master_tree_single_run_v1.Rdata"
if (runslow)
    {
    master_tree_res = bears_optim_run(BioGeoBEARS_run_object)
    master_tree_res

    save(master_tree_res, file=resfn)
    resDEC = master_tree_res
    } else {
    # Loads to "master_tree_res"
    load(resfn)
    resDEC = master_tree_res
    }


# multiple tree settings
#' @BioGeoBEARS_run_object Set up the inference model you want to run on 
#' each tree, like you would for a normal single-tree run.
#' @newick_fns A list of the Newick files (e.g., you should extract some trees
#' from a BEAST NEXUS MCMC output)
#' @model_name The name you would like added to output filenames
#' @geog_fns A list of corresponding geography files (by default, these are just .geog instead of .newick)
#' @resfns A list of results filenames for .Rdata files, either for saving BioGeoBEARS analyses on each tree, or 
#' for loading previously-saved runs (by default, these are just _model_name.Rdata instead of .newick)
#' @run_or_collect If you want to run BioGeoBEARS on each tree (slow), use "run". If you just want to 
#' collect the results over all the trees, use "collect".  For both, pick "both".
#' @start_treenum Default 1. Change if you want to skip some trees
#' @end_treenum Default is length(newick_fns). Change if you want to run a subset of trees.
#' @runslow If FALSE, old, saved .Rdata results are loaded via load(). Default TRUE generates new .Rdata files and 
#' saves them via save()

runslow = TRUE
if (runslow)
{
res = run_bears_optim_on_multiple_trees(BioGeoBEARS_run_object, newick_fns, model_name="Elatinaceae_DEC", geog_fns, resfns, run_or_collect="run", start_treenum=1, end_treenum=length(newick_fns), runslow=TRUE, plot_params=FALSE)
# Save res to "res_over_multiple_trees" 
res_over_multiple_trees = res
save(res_over_multiple_trees, file="res_over_multiple_trees.Rdata")
} else {
# Loads to "res_over_multiple_trees"
load(file="res_over_multiple_trees.Rdata")
} # END if (runslow)



# Now, collect the saved results from each individual run
res2 = run_bears_optim_on_multiple_trees(BioGeoBEARS_run_object, newick_fns, model_name="Elatinaceae_DEC", geog_fns, resfns, run_or_collect="collect", start_treenum=1, end_treenum=length(newick_fns), runslow=TRUE, plot_params=FALSE)
#res2


names(res2)

# This has the matrix of state probabilities at nodes
# for ALL of the trees you specified, in one huge matrix. 
dim(res2$state_probs_at_nodes_across_all_trees)
# The rightmost column is text, and contains the alphabetical, 
# comma-delimited list of tipnames (OTU names) descending from 
# that node or corner.
#
# These tipnames are used to identify "common" nodes between
# individual trees and the master tree. (See debates about
# the meaning of "common nodes", below.)
#
# The function summarize_stateprobs_on_master_tree(), below, is 
# what will go through this matrix, extract the probabilities
# for common nodes, and produce a matrix appropriate for
# plotting with standard functions.

# This matrix has probabilities for corners (branch bottoms), 
# instead of nodes.
dim(res2$state_probs_at_corners_across_all_trees)

# This has the averaged parameter values
res2$optim_results_mean


#################################################################
# To average the probabilities across all trees
#################################################################

# To plot with the standard BioGeoBEARS plot function you will need to 
# - run a standard analysis on the master tree, and 
# - then copy these average state probabilities into that res object 
#   (into the state probabilities at nodes, and state probabilities at branch bottoms)
# - then copy res2$optim_results_mean over the standard optim results


# Let's assume this tree is the master tree
best_trfn = "RAxML_best0821.newick"
tr = read.tree(best_trfn)
master_OTUs = tr$tip.label


stateprobs_list = summarize_stateprobs_on_master_tree(master_tree_fn=best_trfn, state_probs_at_nodes_across_all_trees=res2$state_probs_at_nodes_across_all_trees, state_probs_at_corners_across_all_trees=res2$state_probs_at_corners_across_all_trees, plotflag=FALSE)
stateprobs_list

#####################################################
# Get the single-tree master tree analysis
#####################################################
# Load the tipranges
tipranges = getranges_from_LagrangePHYLIP(lgdata_fn=geogfn)
tipranges

# Load master-tree analysis to "master_tree_res"
load(resfn)
resDEC_averaged = master_tree_res



#####################################################
# Input the averaged probabilities
#####################################################

# At nodes
resDEC_averaged$ML_marginal_prob_each_state_at_branch_top_AT_node = stateprobs_list$stateprobs_nodes_master_tr

# At corners
resDEC_averaged$ML_marginal_prob_each_state_at_branch_bottom_below_node = stateprobs_list$stateprobs_corners_master_tr

# Average parameter inferences
resDEC_averaged$optim_result = res2$optim_results_mean


#####################################################
# PLEASE NOTE: As you can see, 
# SOME NODES HAVE NO STATES OR PIE CHARTS. This is 
# because those nodes were not found in any of 
# the 3 trees sampled in example, above.
#
# In the probabilities matrix, these will be rows 
# that have "NA" instead of numbers.
#
# (Also see discussion of the ambiguity of what 
#  "same node" even means, in this context.)
#####################################################




#######################################################
# PDF plots
#######################################################
pdffn = "Plot_of_averages_on_master_tree_v1.pdf"
pdf(pdffn, width=6, height=18)

#######################################################
# Plot ancestral states - DEC
#######################################################
analysis_titletxt ="BioGeoBEARS DEC, averaged over 3 trees, plotted on master tree"

# Setup
results_object = resDEC_averaged
scriptdir = np(system.file("extdata/a_scripts", package="BioGeoBEARS"))

# States
res2 = plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="text", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)

# Pie chart
plot_BioGeoBEARS_results(results_object, analysis_titletxt, addl_params=list("j"), plotwhat="pie", label.offset=0.45, tipcex=0.7, statecex=0.7, splitcex=0.6, titlecex=0.8, plotsplits=TRUE, cornercoords_loc=scriptdir, include_null_range=TRUE, tr=tr, tipranges=tipranges)


dev.off()  # Turn off PDF
cmdstr = paste("open ", pdffn, sep="")
system(cmdstr) # Plot it




###############################################################
# Nick's advice/rant on the pseudo-Bayesian approach
###############################################################
# After you've done all this, you'll have a plot of probabilities of states, 
# averaged over many trees. Congratulations.
# However, I bet it will probably look pretty much like a plot of one run on the single ML or MCC tree.
# This is one of several reasons why I am skeptical of the popularity of "Bayes DIVA", "Bayes Lagrange", etc. 
# approaches. 
#
# SEVERAL POTENTIAL PROBLEMS WITH THE (PSEUDO) "BAYES" APPROACHES 
# TO HISTORICAL BIOGEOGRAPHY (Bayes DIVA, Bayes-Lagrange) 
#
# 1. First, they aren't really fully Bayesian -- more like pseudo-Bayesian.  
#
# 2. Second, the "Bayes" approach is sometimes done to mollify reviewers who heard somewhere 
# it was a good idea, probably because they think it helps "account for uncertainty." 
# However, The REALLY BIG uncertainty, is in things like:
#
# 2a. Model misspecification. For example, Bayes-DIVA is still giving DIVA results, and 
#     DIVA may be a poorly-fitting model on many datasets.  Running it over 2000 trees 
#     or whatever doesn't fix any of that, despite the elaborate and impressive-sounding
#     computational effort.
#
# 2b. Missing species (due to incomplete sampling, or extinction) will be missing in any
#     analysis, and this will not be fixed by running over many trees.
# 2c. The "single most probable state" plots hide the uncertainty, because they just plot
#     the single-most probable estimated ancestral state at each node anyway. You need 
#     to do stochastic mapping to really feel, and characterize, the uncertainty.
#
# 3.  A third problem is that "averaging" "node probabilities" across trees is a weird 
#     operation. What is the definition of "same node" between two different trees?
#
# Consider these two clades, from two sampled trees:
# tree 1: ((A,B),C)
# tree 2: ((A,B),D)
#
# Is the common ancestor of (A,B) the "same" between the two trees?  The analysis above
# assumes "yes".
#
# (And, I think, all Bayes-DIVA/Bayes-Lagrange approaches assume this, except perhaps 
# the one I implemented in Wood, Matzke et al. (2013), Sys. Bio., which had more options.)
#
# Also, consider the "corner" at the bottom of the branch of the (A,B) node. Is *it* the same between
# the two trees?  The BioGeoBEARS program above assumes "yes".
#
# How about:
# tree 1: (((A,B),C),D)
# tree 2: (((A,C),B),D)
#
# Is the node that is common ancestor to ((A,B),C) "the same" as the node that is common ancestor to ((A,C),B)?
# Sort of, but not really. But, BioGeoBEARS and presumably the other programs above assume "yes".
#
# You can imagine getting "better matches" by requiring more constraints (same daughter tips, and same sister),
# but this will reduce matches between trees, and still doesn't really solve the problem, because unless
# trees agree perfectly in topology, there will always be mismatches of this sort.  And because biogeographical
# models have sudden cladogenetic transitions in addition to the continuous-time anagenetic ones, topological
# differences may have more significance than they would under typical, purely continuous-time models.
#
# WHAT TO DO INSTEAD
#
# After thinking about it, my current conclusion is that the pseudo-Bayesian approaches
# need some serious critical thinking before being employed.
#
# Rather than running hundreds or thousands of trees and presenting dubious "average",
#
# * It would be better to spend time *thinking* about and discussing all of the sources
# of uncertainty, above, rather than spending days running a model on a bunch of (typically) very 
# similar trees that (typically) all say about the same thing.
#
# * It would be better to just run the analysis on a few different trees, and put those
#   plots in the paper.
#
# * If you are really concerned about some specific events, run a number of different trees, and then
#   do stochastic mapping on each one. The summary counts of events would be one way of summarizing
#   the uncertainty in event histories.  You could also identify some "common" nodes by some
#   explicitly-stated criterion, and count up the ranges inhabited and the counts of events at those nodes.
#
# The best argument for doing the pseudo-Bayesian approach is if you have a particular 
# node you are interested in, and the geography estimated at that node is strongly 
# affected by nearby uncertainty in the tree.  In that case,
# running a model over a bunch of trees can be useful.  
#
# I may be overly skeptical. But I have yet to see a case where the pseudo-Bayesian approach
# gave any real additional insight. It's only likely if your tree is very uncertain, but
# typically people get enough data to avoid that. Very often, I guess, people just want to
# be able to say, "we ran it on 100 trees and got the same result".  But really, if your
# tree is decently estimated, you actually knew that going in, from first principles.
#
# The pseudo-Bayesian approach might be useful if your dating is highly uncertain (as is common), 
# and you are doing a time-stratified analysis where uncertain nodes are near the time-strata.
# This would be the case I would focus on.
#


=====================
Reply all
Reply to author
Forward
0 new messages