Hello!
My questions might be quite naive since I'm new to using nested sampling.
1. Besides using nested sampling for clock model selection (i.e. strict vs. relaxed clock), can I also use it to select the optimal tree prior (e.g. Yule model, constant coalescent) for my data?
2. I'm really not sure how long the sub-chain length (the number of MCMC steps) should be.
I read this in a paper by Maturana et al. (2018):
"On the other hand, NS only requires the number of active points and the number of MCMC steps used
to generate the replacement points. The latter should be chosen in relation to the number of parameters.
This number should exceed the dimension of the parameter space in order to guarantee the generation of an
independent point, otherwise the marginal likelihood estimate will be biased lower. In our experience, it should
be at least 10 times more than the number of parameters, but it is recommended to try different values."
I'm not sure I understand this very well. Which parameters are being referred to in this paragraph? Does it refer to the number of settings specified when performing nested sampling, e.g. 4, so that the sub-chain length should be 4 x 10 = 40 as suggested?
I also read this in a blog post on nested sampling from the BEAST website:
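To make the arithmetic concrete, here is a minimal sketch of how I read the rule of thumb from the paper. The function name and the factor argument are my own (hypothetical), and which count to pass in as the number of parameters is exactly the part I am unsure about.

```python
def min_sub_chain_length(n_parameters, factor=10):
    """Rule-of-thumb lower bound: at least `factor` times the number of parameters.

    Assumption: `n_parameters` is whatever "number of parameters" means in the
    quoted paragraph; the quote only says the sub-chain length should exceed
    the dimension of the parameter space, ideally by a factor of 10.
    """
    if n_parameters < 1:
        raise ValueError("n_parameters must be a positive integer")
    return factor * n_parameters


# With my guess of 4 parameters this gives the 40 mentioned above.
print(min_sub_chain_length(4))
```

The paper also recommends trying several values, so I would treat this number only as a starting point.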
"The other tuning parameter is the sub-chain length. This plays the most important role in the reliability of the method. It defines the number of MCMC steps to generate a new point from the prior with the likelihood restriction at each iteration. NS works in theory if and only if the points generated at each iteration are independent. Therefore, this number should be large enough to meet this condition. Otherwise, the method produces underestimates. The implementation of NS in BEAST2 has the option to auto-calibrate this tuning parameter."
Maybe this is the answer to my previous question. However, I have not figured out how to auto-calibrate this tuning parameter, which would give me an idea of how long the sub-chain should be. Is there a tutorial on how to do that?
3. This paragraph from the nested sampling blog post is quite ambiguous to me:
"If the two models with the highest marginal likelihoods are reasonably far away in terms of their standard deviations, we choose the one with the highest value. Here one can be conservative and prefer a model if and only if the winning model is very far away in terms of SDs. On the other hand, if the two highest marginal likelihoods are close to each other in terms of their standard deviations, the estimate procedure can be performed again, but with a larger number of active points to increase the precision of the estimates and then decide."
Let me know if I understand it correctly. Two competing models are being compared in this paragraph, and the marginal likelihood of each model was computed together with its standard deviation. If each of them has a high standard deviation (e.g. well above 1, such as 30), the winning model is the one with the highest marginal likelihood (i.e. the least negative value). But if the standard deviation of each model is quite small (e.g. SD = 1 for each), then I should repeat the nested sampling analysis for both models with a larger number of active points. Is my understanding correct?
4. Finally, which of the marginal likelihoods below should I check? There was a similar question in a previous post. I'm just double-checking that I am correct in thinking that either of these marginal likelihoods can be used for model selection, provided that the sub-chain length and the number of active points are large enough.
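To show what I mean by comparing two estimates "in terms of SDs", here is a minimal sketch of one possible reading. The blog post does not give a formula, so dividing the difference by a combined SD (which assumes the two estimates are independent) is my own assumption. The values for model A are rounded from my run below; the values for model B are made up purely for illustration.

```python
import math


def sd_separation(log_ml_a, sd_a, log_ml_b, sd_b):
    """Return the log marginal likelihood difference and that difference
    expressed in units of the combined standard deviation.

    Assumption: the two estimates are independent, so their variances add.
    """
    diff = log_ml_a - log_ml_b
    combined_sd = math.sqrt(sd_a ** 2 + sd_b ** 2)
    return diff, diff / combined_sd


# Model A: rounded from my run. Model B: hypothetical numbers.
diff, n_sds = sd_separation(-3237.6, 18.1, -3250.0, 18.0)
print(f"difference = {diff:.2f} log units, i.e. {n_sds:.2f} combined SDs")
```

Under this reading, a difference well below one combined SD would count as "close", and the analysis would be repeated with more active points.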
"Total calculation time: 1127.811 seconds
End likelihood: 224.80084680134817
Producing posterior samples
Marginal likelihood: -3237.6076331280424 sqrt(H/N)=(18.032373246914972)=?=SD=(18.120594329894292) Information: 325.16648491605474
Max ESS: 25.758872875344174
Processing 653 trees from file.
Log file written to /Users/Jasper/Documents/Tetrastigma/Phylogenetic_analysis/NS/ITSonly_83samples_RelaxLogYule.posterior.trees
Done!
Marginal likelihood: -3238.016583866134 sqrt(H/N)=(18.05014040335818)=?=SD=(17.726293702607325) Information: 325.80756858094344
Max ESS: 25.904104821434654
Log file written to /Users/Jasper/Documents/Tetrastigma/Phylogenetic_analysis/NS/ITSonly_83samples_RelaxLogYule.posterior.log
Done!"
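As a sanity check on how I am reading this output, here is a minimal sketch that parses one of the "Marginal likelihood" lines. I am assuming that H in sqrt(H/N) is the value printed as "Information" and that N is the number of active points; the regular expression and the function name are my own and not part of BEAST.

```python
import re

# One line copied from the output above.
LINE = (
    "Marginal likelihood: -3237.6076331280424 "
    "sqrt(H/N)=(18.032373246914972)=?=SD=(18.120594329894292) "
    "Information: 325.16648491605474"
)

PATTERN = re.compile(
    r"Marginal likelihood: (?P<ml>-?[\d.]+) "
    r"sqrt\(H/N\)=\((?P<sqrt_hn>[\d.]+)\)=\?=SD=\((?P<sd>[\d.]+)\) "
    r"Information: (?P<info>[\d.]+)"
)


def parse_ns_line(line):
    """Extract the four numbers from a 'Marginal likelihood' output line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("line does not match the expected output format")
    return {key: float(value) for key, value in match.groupdict().items()}


values = parse_ns_line(LINE)
# If sqrt(H/N) uses H = Information, then N = H / sqrt(H/N)^2.
implied_n = values["info"] / values["sqrt_hn"] ** 2
print(f"log marginal likelihood = {values['ml']:.2f}, SD = {values['sd']:.2f}")
print(f"implied number of active points = {implied_n:.3f}")
```

If my assumption about H and N is right, the implied N for both lines comes out at about 1, which may be why the SD is as large as 18.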
Thanks for bearing with me; I hope you can find the time to help. I have attached my XML file in case you want to try it.
Cheers,
Jasper Obico