Hey all!
I would like to ask for some advice. I am examining whether the transfer events inferred for my gene tree are affected by the following parameters:
a) tree topology (1000 simulations of randomized host-tree topologies),
b) gene-tree model (HKY (kappa=2) vs. GTR),
c) missing data (50% of data present vs. 75% of data present).
I find that the transfer rates of my gene tree differ significantly for (a) and (c).
I used the Kruskal-Wallis test as a global comparison across the parameters/methods, followed by the Dunn test for pairwise comparisons.
> kruskal.test(transfer_rates ~ method, data = alldata)  # global test
> library(PMCMRplus)
> posthoc.kruskal.dunn.test(x = alldata$transfer_rates, g = as.factor(alldata$method), dist = "Tukey")  # pairwise test
           GTR                  GTR_50perc           GTR_75perc           HKY
GTR_50perc 0.000014359699       -                    -                    -
GTR_75perc 1.00000              0.00033              -                    -
HKY        0.29878              0.000000000028       0.10013              -
HKY_50perc 0.00033              1.00000              0.00332              0.000000019176
HKY_75perc 0.10013              0.26188              0.29878              0.00016
simulation < 0.0000000000000002 < 0.0000000000000002 < 0.0000000000000002 < 0.0000000000000002
           HKY_50perc           HKY_75perc
GTR_50perc -                    -
GTR_75perc -                    -
HKY        -                    -
HKY_50perc -                    -
HKY_75perc 0.29878              -
simulation < 0.0000000000000002 < 0.0000000000000002
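To make the procedure concrete, here is a minimal base-R sketch of what I understand the two steps to compute: a global Kruskal-Wallis test, then Dunn-style pairwise z-tests on the pooled ranks. The data are simulated toy values, not my real transfer rates. `dunn_pairwise` is a hypothetical helper written only for illustration, and the Holm correction is an assumption on my side (it is not the `dist="Tukey"` setting from my call above).

```r
# Toy data only: three hypothetical groups, NOT the real transfer rates.
set.seed(1)
toy <- data.frame(
  transfer_rates = c(rnorm(30, 10), rnorm(30, 10.2), rnorm(30, 14)),
  method = factor(rep(c("GTR", "HKY", "simulation"), each = 30))
)

# Step 1, global test: do the groups differ at all?
global <- kruskal.test(transfer_rates ~ method, data = toy)

# Step 2, Dunn-style pairwise comparisons (hypothetical helper).
# Uses ranks pooled over all groups and a tie-corrected variance.
dunn_pairwise <- function(x, g, p.adjust.method = "holm") {
  g <- droplevels(as.factor(g))
  n_total <- length(x)
  r <- rank(x)                          # pooled ranks, ties get mid-ranks
  mean_rank <- tapply(r, g, mean)       # mean rank per group
  n <- tapply(r, g, length)             # group sizes
  ties <- table(x)
  tie_term <- sum(ties^3 - ties) / (12 * (n_total - 1))
  pairs <- combn(levels(g), 2)          # one column per pair of groups
  z <- apply(pairs, 2, function(p) {
    se <- sqrt((n_total * (n_total + 1) / 12 - tie_term) *
                 (1 / n[p[1]] + 1 / n[p[2]]))
    (mean_rank[p[1]] - mean_rank[p[2]]) / se
  })
  p_raw <- 2 * pnorm(-abs(z))           # two-sided p-values
  data.frame(
    group1 = pairs[1, ],
    group2 = pairs[2, ],
    z      = as.numeric(z),
    p_raw  = as.numeric(p_raw),
    p_adj  = p.adjust(p_raw, method = p.adjust.method)
  )
}

pairwise <- dunn_pairwise(toy$transfer_rates, toy$method)
print(global)
print(pairwise)
```

With seven groups, as in my real data, the same helper would return 21 pairwise comparisons, so the choice of multiple-testing correction matters.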
I have two questions:
1) Are the steps used here for the statistical analysis (Kruskal-Wallis followed by the Dunn test) correct?
2) Is it common for ML-based HGT estimation methods to be biased by missing data?
Thanks again!