Hi,
I'm running structure on a dataset for a range of values of K, with 20 replicate runs for each K. The problem is that many runs seem to get stuck on a local maximum, as there is an enormous variance in the log-likelihoods between runs. My question concerns finding the runs with the highest likelihood, as I want to restart them later using the option STARTATPOPINFO=1.
I'm attaching two examples that illustrate my confusion. First, I compared the 'Ln Like' chains, which I plotted in two figures (see attached PNG). As you can see, the chain in mcmc2.pdf reaches much higher values than the one in mcmc16.pdf, suggesting that run 2 has converged on a much higher likelihood.
However, when I compared the output_f files, the results suggested the opposite.
The file output2_f has the following summary statistics:
--------------------------------------------
Estimated Ln Prob of Data = -451648.6
Mean value of ln likelihood = -20698.1
Variance of ln likelihood = 861901.0
Mean value of alpha = 0.3619
Mean value of r = 0.0009
Standard deviation of r = 0.0203
while the file output16_f has the following summary statistics:
--------------------------------------------
Estimated Ln Prob of Data = -283633.9
Mean value of ln likelihood = -18983.1
Variance of ln likelihood = 529301.7
Mean value of alpha = 0.2454
Mean value of r = 0.0011
Standard deviation of r = 0.0241
This in turn suggests that run 16 was the much better run.
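To make the comparison concrete, here is a minimal sketch of how the runs could be ranked directly from the summary blocks of the output_f files. The function names parse_summary and rank_runs are hypothetical, and the label strings are simply copied from the two excerpts above, so they may need adjusting for other versions of the output format. The sketch also checks that the numbers above are consistent with the estimate being the mean ln likelihood minus half its variance, which would explain why the run with the larger variance ends up with a much lower 'Estimated Ln Prob of Data'.

```python
import re

# Labels copied from the summary excerpts above (assumed stable across files).
FIELDS = {
    "ln_prob": "Estimated Ln Prob of Data",
    "mean_lnl": "Mean value of ln likelihood",
    "var_lnl": "Variance of ln likelihood",
}


def parse_summary(text):
    """Extract the three likelihood statistics from the text of one output file."""
    stats = {}
    for key, label in FIELDS.items():
        match = re.search(re.escape(label) + r"\s*=\s*(-?\d+(?:\.\d+)?)", text)
        if match:
            stats[key] = float(match.group(1))
    return stats


def rank_runs(summaries):
    """Return run names sorted from highest to lowest Estimated Ln Prob of Data."""
    return sorted(summaries, key=lambda name: summaries[name]["ln_prob"], reverse=True)


# The two summary blocks quoted above, pasted in as plain text.
run2 = """
Estimated Ln Prob of Data = -451648.6
Mean value of ln likelihood = -20698.1
Variance of ln likelihood = 861901.0
"""
run16 = """
Estimated Ln Prob of Data = -283633.9
Mean value of ln likelihood = -18983.1
Variance of ln likelihood = 529301.7
"""

summaries = {
    "output2_f": parse_summary(run2),
    "output16_f": parse_summary(run16),
}

for name in rank_runs(summaries):
    s = summaries[name]
    # Mean minus half the variance, for comparison with the reported estimate.
    approx = s["mean_lnl"] - s["var_lnl"] / 2
    print(name, s["ln_prob"], round(approx, 1))
```

In practice the text would be read from each output file on disk rather than pasted in, and the top-ranked runs would be the candidates for restarting.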
I'm running the analyses on a cluster, which prevents me from running longer MCMC chains (I'm already doing 800,000 + 300,000 iterations). Thus, I need to make sure that the runs have converged on the global maximum before I analyse the results and estimate the best value of K.
Help would be highly appreciated!
R