Error when specifying independent priors for each parameter


Aidan Shaw

Oct 11, 2025, 1:22:33 PM
to migrate-support
Hello Peter and others,

I have been unable to figure out how to set different priors for each migration rate. Following the syntax in the parameter file causes a segmentation fault every time the program starts. Changing the priors back to the default fixes it, so I am confident they are the cause of the error. Here is an example of my syntax:

bayes-priors= THETA * * UNIFORMPRIOR: 0.000000 0.100000 0.010000
bayes-priors= MIG 2 1 EXPPRIOR: 0.000000 1000.000000 2000.000000
bayes-priors= MIG 3 1 EXPPRIOR: 0.000000 1000.000000 2000.000000
bayes-priors= MIG 1 2 EXPPRIOR: 0.000000 1000.000000 2000.000000
bayes-priors= MIG 3 2 EXPPRIOR: 0.000000 2000.000000 3000.000000
bayes-priors= MIG 1 3 EXPPRIOR: 0.000000 1000.000000 2000.000000
bayes-priors= MIG 2 3 EXPPRIOR: 0.000000 2000.000000 3000.000000
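
When every migration direction needs its own prior, lines like the ones above can be generated rather than typed by hand. This is only a sketch: `prior_line` and `mig_priors` are hypothetical names, not part of migrate-n, and the script simply reproduces the line layout shown in this post without interpreting the three numbers.

```python
# Hypothetical helper (not part of migrate-n): formats one prior line
# in the same layout as the parmfile lines quoted in this post.
def prior_line(kind, src, dst, dist, values):
    nums = " ".join(f"{v:.6f}" for v in values)
    return f"bayes-priors= {kind} {src} {dst} {dist}: {nums}"

# One entry per migration direction, keyed by the two population
# indices in the order they appear on the line; values copied from the post.
mig_priors = {
    (2, 1): (0.0, 1000.0, 2000.0),
    (3, 1): (0.0, 1000.0, 2000.0),
    (1, 2): (0.0, 1000.0, 2000.0),
    (3, 2): (0.0, 2000.0, 3000.0),
    (1, 3): (0.0, 1000.0, 2000.0),
    (2, 3): (0.0, 2000.0, 3000.0),
}

lines = [prior_line("THETA", "*", "*", "UNIFORMPRIOR", (0.0, 0.1, 0.01))]
lines += [
    prior_line("MIG", a, b, "EXPPRIOR", vals)
    for (a, b), vals in mig_priors.items()
]
print("\n".join(lines))
```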


I have attached my parmfile and the simulated sequence data I have been experimenting with. Also, fyi these sequences were simulated with m=0.2 and mu=1e-7, so M should be 2,000,000, meaning that my priors are way too small. For now I have just been trying to figure out how to set independent priors, but I would also appreciate any advice for setting priors when M is expected to be very large.
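
The expected M quoted above follows from the mutation-scaled definition M = m/mu. A minimal Python check of that arithmetic, using only the two simulation values given in this post:

```python
m = 0.2     # migration rate used in the simulation (from the post)
mu = 1e-7   # mutation rate used in the simulation (from the post)

# Mutation-scaled immigration rate
M = m / mu
print(f"M = {M:,.0f}")  # M = 2,000,000
```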

Thanks!
Aidan

3_pop_simple_0
parmfile

Peter Beerli

Oct 12, 2025, 4:08:18 PM
to migrate...@googlegroups.com
Aidan,

(1) There was a bug in 5.0.7 (so I assume it was also in 5.0.4, the version you used). I updated the GitHub version (https://github.com/pbeerli/migrate-5.0.7.git). If you are familiar with compiling sources from GitHub, you can use that; otherwise, you will need to wait a few more days so that I can produce a source distribution and an executable for Mac (and hopefully for Windows [I don’t do well with that]). Tests with the single CPU version seem to work. The priors in your example seem too similar to one another to give results that differ from using the same prior throughout.

(2) migrate-n will work fine if you use large prior bounds for M. Still, several difficulties will arise. To assemble percentiles, migrate-n uses a histogram, so if you use bounds between 0 and 10M with the default histogram setting (bins=1500), each bin will cover a large range of immigration rates and you will get a coarse result for immigration. The larger the range for M, the more the power to estimate population size also seems to decrease (I did a test with your data, and with an upper bound of 2,000,000 your single locus seems to lose all power to estimate population size accurately).
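
To get a feel for how coarse the histogram becomes, here is a rough sketch of the bin width for a few prior ranges. It assumes the bins are equally wide across the prior range and reads "10M" as ten million; both are assumptions, and `bin_width` is a hypothetical helper, not a migrate-n function.

```python
def bin_width(lower, upper, bins=1500):
    """Width of one histogram bin, assuming equal-width bins over [lower, upper]."""
    return (upper - lower) / bins

# Upper bounds: the original MIG prior, the simulated M, and "10M" read as 10,000,000.
for upper in (2_000, 2_000_000, 10_000_000):
    print(f"upper bound {upper:>12,}: each bin spans about {bin_width(0, upper):,.2f} units of M")
```

With an upper bound of ten million, each of the 1500 bins spans several thousand units of M, so any percentile read from the histogram can be no finer than that.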

The more serious problem is the fact that you simulated the data with M=2,000,000. I assume that your population size Theta is somewhere in the range of your prior, 0..0.1. Assuming a potentially low population size of theta=0.00001, your 4Nm = 0.00001*2,000,000 = 20; with theta=0.1 (your upper bound), this would be 4Nm=200,000. With immigration rates higher than 4Nm=10, the structured coalescence model is not appropriate (and you should use a single population model). I compared three runs: a. your priors in your parmfile; b. your parmfile, but with the upper bound for immigration M increased to two million; and c. a single population model. The single population model wins by about 40 log marginal likelihood units (model probability for the single pop model = 1.0 when using these three models). In early papers about migrate-n, you will also find that migrate will underestimate high immigration rates because you would need to run the MCMC longer than you can or want to wait.
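
The 4Nm values in this paragraph are just the product Theta*M. A short Python sketch of that check follows; `four_nm` is a hypothetical helper, and the cutoff of 10 is the rule of thumb stated in this message, not a value read from migrate-n.

```python
def four_nm(theta, M):
    """Number of immigrants per generation (4Nm) as the product Theta * M."""
    return theta * M

M = 2_000_000    # simulated mutation-scaled immigration rate
THRESHOLD = 10   # rule-of-thumb cutoff from this message

# Low assumed Theta and the upper bound of the Theta prior
for theta in (0.00001, 0.1):
    value = four_nm(theta, M)
    verdict = "single population model advised" if value > THRESHOLD else "structured model plausible"
    print(f"theta={theta:g}: 4Nm = {value:,.0f} ({verdict})")
```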

(3) Do not use more than one long chain. Bayesian inference will only run one cold chain (and several hot chains, but these are not defined in the option long-chain). If you want to get better results, run replicates [these can be parallelized].
A single locus will not be very precise given the length of the sequence data; more loci will be better for estimating population sizes and immigration rates.

If you have more questions please ask
thanks
Peter




Aidan Shaw

Oct 13, 2025, 10:22:36 AM
to migrate-support
Peter,

I compiled 5.0.7 from GitHub and it is working now. By the way, I am not sure if this was a bug on my end or in the distro, but my compiler was not able to find the hpdf_config script for libharu and I had to add it to lib/haru/include manually. Once I did, everything worked fine.

As for your advice on setting parameters, this is very helpful. I imagine my running so many long chains is what has caused some runs to take so long. I have found that having large prior bounds (i.e. 0 to 2,000,000) really slows things down, but this is surely to be expected. And thank you for pointing out the concern with having 4Nm > 10. I have been curious to see what parameter estimates look like when you give migrate subpopulations that have high migration rates bordering on panmixia. However, based on your advice, it seems that a better approach is to compare model likelihoods between a model with multiple populations specified and one with the populations combined.
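
For anyone following along, here is a minimal sketch of how log marginal likelihoods turn into model probabilities. It assumes equal prior weight on each model, and the three log marginal likelihood values are made up for illustration; only the gap of about 40 log units comes from this thread. `model_probabilities` is a hypothetical helper, not migrate-n output.

```python
import math

def model_probabilities(log_mls):
    """Convert log marginal likelihoods to model probabilities (equal model priors assumed)."""
    top = max(log_mls)
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    weights = [math.exp(x - top) for x in log_mls]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical values: two structured models and a single population model
# that is ahead by about 40 log units.
log_mls = [-1040.0, -1042.0, -1000.0]
probs = model_probabilities(log_mls)
for name, p in zip(("original priors", "wide M prior", "single population"), probs):
    print(f"{name}: {p:.6f}")
```

A gap of 40 log units corresponds to a ratio of about exp(40), which is why the leading model's probability rounds to 1.0.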

Thanks!
Aidan

Peter Beerli

Oct 13, 2025, 4:26:06 PM
to migrate...@googlegroups.com
Aidan,

I thought I had fixed the hpdf_config issue, but it obviously needs more work. I am glad that you could compile it. 

Many years ago, I presented a graph in talks that illustrates the model selection of a 2-pop model versus a 1-pop model in relation to the magnitude of the immigration parameter.
I did not find the original simulations, but I remember that I did 100 per scenario and sorted them according to the log marginal likelihood [the value = 0 is where the model assignment flips].
From left to right:
1. Simulated a single population, split the dataset in half, and assigned the halves to pop1 and pop2
2. Simulated two populations with 4Nm >>>>> 4
3. Simulated two populations with (I assume M=100 and theta=0.01) 4Nm=1
4. Simulated two populations with (I assume M=1 and theta=0.01) 4Nm=0.01

It shows that with a high immigration rate, we should always prefer the simpler model.


hope this helps.
Peter

Screenshot 2025-10-13 at 12.34.41.png


Aidan Shaw

Oct 13, 2025, 4:46:19 PM
to migrate-support
This is very helpful, exactly what I have been wondering about. Thanks!