Questions regarding Nested Sampling for model selection


Taehyung Kwon

Aug 24, 2018, 1:54:52 AM8/24/18
to beast-users
Dear all

Thanks to Remco, I applied the Nested Sampling method to my BEAST run to select the best-fitting model for my data.
I tried to figure things out by myself, but I still have some questions about interpreting the runs.

1) I got results at the end of the run as follows:
Marginal likelihood: -51343.395848769935 sqrt(H/N)=(55.12190527356013)=?=SD=(53.682782708646165) Information: 3038.424440987336
Max ESS: 48.537957290670896

Processing 6043 trees from file.
Log file written to rcl_exp.posterior.trees
Done!

Marginal likelihood: -51336.54201981124 sqrt(H/N)=(55.055892988737405)=?=SD=(52.98398374628466) Information: 3031.151352787305
Max ESS: 48.60835234006053

Log file written to rcl_exp.posterior.screen
Done!

Marginal likelihood: -51336.35764692686 sqrt(H/N)=(55.070964181563546)=?=SD=(54.3665073383475) Information: 3032.811095887055
Max ESS: 52.42912908387789

Log file written to rcl_exp.posterior.screen
Done!

I think I am supposed to use one of those marginal likelihood values, but I don't know which one to use.

2) I accidentally closed the Linux shell screen showing a couple of BEAST results, and I couldn't find any clue about the marginal likelihood in the log files (either ".screen.log" or ".posterior.screen", as in the path-sampling result). Can I obtain those values from the result files?

3) I added "<run id="mcmc" spec="beast.gss.NS" chainLength="40000" particleCount="1" subChainLength="5000" epsilon="1e-12">" to my XML.
BEAST chains with some models ran until ~6000 states, while others ran only around 5000. Given my parameters, I expected these chains to run 40000 times. Since the ESS values are so low in Tracer, I can't tell whether they ended early because they converged or for some other reason. Why do they end earlier?

4) When I open the trace log with Tracer, I can't tell whether I have to extend chainLength or subChainLength, because it shows low ESSs for the posterior and NSlikelihood. If this run is impaired, what can I do to fix it?

[Attachment: trace_example.PNG]


Any help will be appreciated.
Thank you for taking the time to read this.

Best regards,

Taehyung

Taehyung Kwon

Aug 26, 2018, 8:05:34 AM8/26/18
to beast-users
<Update>

I ran the NS test again and parsed the result (which didn't take much time).
From the marginal likelihoods of the models I used, I can easily tell that the strict clock x constant population model has the highest value of all.
So, based on my NS test result, I chose to use the previous result of my BEAST run with the strict clock x constant population model (10XE-8).
If I did anything wrong, please don't hesitate to let me know.
Thanks.

Best regards,

Taehyung

Remco Bouckaert

Aug 26, 2018, 7:05:22 PM8/26/18
to beast...@googlegroups.com, Patricio Maturana Russel
Hi Taehyung,


On 24/08/2018, at 5:54 PM, Taehyung Kwon <ted...@gmail.com> wrote:
1) I got results at the end of the run as follows:
Marginal likelihood: -51343.395848769935 sqrt(H/N)=(55.12190527356013)=?=SD=(53.682782708646165) Information: 3038.424440987336
Max ESS: 48.537957290670896

Processing 6043 trees from file.
Log file written to rcl_exp.posterior.trees
Done!

Marginal likelihood: -51336.54201981124 sqrt(H/N)=(55.055892988737405)=?=SD=(52.98398374628466) Information: 3031.151352787305
Max ESS: 48.60835234006053

Log file written to rcl_exp.posterior.screen
Done!

Marginal likelihood: -51336.35764692686 sqrt(H/N)=(55.070964181563546)=?=SD=(54.3665073383475) Information: 3032.811095887055
Max ESS: 52.42912908387789

Log file written to rcl_exp.posterior.screen
Done!

I think I am supposed to use one of those marginal likelihood values, but I don't know which one to use.


The difference between the estimates lies in the way they are computed from the nested sampling run. Since these are estimates that require random sampling, they differ from one run to another. When the standard deviation is small, the estimates will be very close, but since your standard deviation is quite large, they differ a bit. Any of these estimates is valid, but make sure to report it together with its standard deviation.
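For model comparison, what matters is the difference between two models' log marginal likelihoods (the log Bayes factor). Since each NS run reports its own standard deviation, and the runs are independent, the SD of the difference combines in quadrature. A minimal Python sketch (the first estimate is taken from the output above; the second model's values are hypothetical, purely for illustration):

```python
import math

def log_bayes_factor(logz1, sd1, logz2, sd2):
    """Log Bayes factor of model 1 over model 2, with its standard deviation.

    Assumes the two marginal likelihood estimates come from independent
    nested sampling runs, so their variances add.
    """
    return logz1 - logz2, math.sqrt(sd1 ** 2 + sd2 ** 2)

# Model 1: estimate from the output above; model 2: hypothetical values.
lbf, sd = log_bayes_factor(-51336.36, 54.37, -51420.00, 50.00)
print(f"log BF = {lbf:.2f} +/- {sd:.2f}")
```

With standard deviations as large as those above, only a log Bayes factor of several times the combined SD is decisive; increasing particleCount reduces the SD of each estimate.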



2) I accidentally closed the Linux shell screen showing a couple of BEAST results, and I couldn't find any clue about the marginal likelihood in the log files (either ".screen.log" or ".posterior.screen", as in the path-sampling result). Can I obtain those values from the result files?

The NS package has a NSLogAnalyser that you can run from the command line on Linux using

/path/to/beast/bin/applauncher NSLogAnalyser -N 1 -log  xyz.log

where the argument after N is the particleCount you specified in the XML, and xyz.log the trace log produced by the NS run.




3) I added "<run id="mcmc" spec="beast.gss.NS" chainLength="40000" particleCount="1" subChainLength="5000" epsilon="1e-12">" to my XML.
BEAST chains with some models ran until ~6000 states, while others ran only around 5000. Given my parameters, I expected these chains to run 40000 times. Since the ESS values are so low in Tracer, I can't tell whether they ended early because they converged or for some other reason. Why do they end earlier?


Nested sampling stops automatically when the accuracy of the ML estimate cannot be improved upon. Because it is a stochastic process, some analyses run longer than others.
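To see why the stopping point is stochastic, here is a generic toy sketch of the nested sampling loop in Python (this is not BEAST's implementation; the likelihood, prior, and names are made up for illustration). The run ends when the live points' estimated remaining contribution to the evidence Z falls below a tolerance relative to the Z accumulated so far, and that point depends on the random draws:

```python
import math
import random

def logsumexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def loglike(x):
    """Standard normal log-likelihood (toy problem)."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

def nested_sampling(n_live=100, tol=1e-6, max_iter=100_000, seed=1):
    """Toy 1-D nested sampling with a Uniform(-5, 5) prior.

    Stops when the estimated remaining evidence drops below `tol`
    relative to the accumulated evidence, so the number of
    iterations varies from run to run.
    """
    random.seed(seed)
    live = [random.uniform(-5.0, 5.0) for _ in range(n_live)]
    logz = -math.inf                                  # accumulated log-evidence
    logw0 = math.log(1.0 - math.exp(-1.0 / n_live))   # first shell's log prior width
    for i in range(max_iter):
        worst = min(live, key=loglike)
        # Shell i has expected prior width (1 - e^(-1/n)) * e^(-i/n).
        logz = logsumexp(logz, loglike(worst) + logw0 - i / n_live)
        # Remaining evidence <= (best live likelihood) * (remaining prior mass).
        log_remain = max(loglike(x) for x in live) - (i + 1) / n_live
        if log_remain - logz < math.log(tol):
            return logz, i + 1
        # Replace the worst point with a new prior draw constrained to
        # L(x) > L(worst).  For this Gaussian toy that region is simply the
        # interval (-|worst|, |worst|); BEAST instead runs a short MCMC
        # chain for this step, which is what subChainLength controls.
        r = abs(worst)
        live[live.index(worst)] = random.uniform(-r, r)
    return logz, max_iter

logz, iters = nested_sampling()
print(f"log Z ~= {logz:.3f} after {iters} iterations "
      f"(true value {math.log(0.1):.3f})")
```

With the same settings, different seeds terminate at different iteration counts, which is the same effect as your runs ending at ~5000 vs ~6000 states.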


4) When I open the trace log with Tracer, I can't tell whether I have to extend chainLength or subChainLength, because it shows low ESSs for the posterior and NSlikelihood. If this run is impaired, what can I do to fix it?

An NS analysis produces two trace log files: one for the nested sampling run and one with the posterior sample.

The ESSs shown in Tracer for the log files with the posterior samples are meaningless, because the log file is ordered by the nested sampling run. If you look at the trace of the Likelihood, it should show a continuously increasing function. It is not quite clear yet how to estimate the ESS of a nested sampling run, though the number of entries in the posterior log equals the maximum theoretical ESS, which is almost surely an overestimate.

Regardless, if you are interested in comparing models, you only need to care about the marginal likelihood estimates.

Hope this helps,

Remco

Taehyung Kwon

Aug 27, 2018, 1:12:52 AM8/27/18
to beast...@googlegroups.com
Dear Remco

Thank you for the clear answers.
They helped me a lot.
I appreciate it.

Sincerely,

Taehyung