Question regarding the appropriate mutation rate model and challenges with high autocorrelation.


victor SUAREZ SANTIAGO

Oct 14, 2024, 8:35:57 AM
to migrate-support

Hi. I'm trying to run Migrate to estimate immigration among 2-3 populations of a threatened plant, using 8 microsatellite loci that show considerable variation. I have a question about the mutation rate and a problem with autocorrelation, and I would be very grateful for any help.

  1. Mutation rate: Since there are 8 independent loci, wouldn't it be more realistic to work with relative rates (DATA option) than with a constant rate? This seems reasonable, but in the literature a constant rate is often used for microsatellites. The manual states that a constant rate usually works well, but wouldn't a relative rate be better? I have tried both settings and obtained different results. With the relative rate, I obtain higher Theta values and lower M values. The direction of migration (between populations) is even reversed for the highest M value. Given this discrepancy, I don't know which setting to use.

  2. Autocorrelation: I have problems with high levels of autocorrelation (>0.9). For analyses that finish in a reasonable time of 4-7 days, it doesn't matter whether I run longer or shorter analyses: the autocorrelation values remain very high. The histograms of the posterior distributions look good. I have observed that when I widen the range of the priors for Theta and M, the autocorrelation decreases somewhat, but the values of Theta and M increase enormously. For example, priors with a maximum of 500 (delta 50) for Theta and 1000 (delta 100) for M result in autocorrelation values of 0.78-0.85, Thetas of around 45, and Ms of approximately 95, while priors with a maximum of 100 (delta 10) for both Theta and M result in autocorrelation values >0.9, Thetas of 4, and Ms of 10. Why does this happen, and how can I optimize my analyses?

Thank you very much in advance.

Peter Beerli

Oct 14, 2024, 9:14:13 AM
to migrate...@googlegroups.com
Dear Victor

1. The mutation rate DATA option is only useful if the variability in the data is very different among loci. With a similar number of alleles per locus, there should be no difference (you could compare the two settings using marginal likelihoods). As an example, I would use the DATA option if half of the loci have <4 alleles and the others have >10.

2. Autocorrelation is not so important unless it is high. In that case, you will need to run much longer and use replicates. It is important to note that, with weak data, increasing the range of the prior will bias the estimates upward. For microsats, it seems common to see population size values in the range of 0.5-4, but rarely >50, so a prior above 50 seems wide for the population size. For M, if you consider the high mutation rate (remember M=m/mu), then values higher than the population sizes are usually a signal that there may be a panmictic situation and the subpopulations are mixing (check using marginal likelihoods). Again, if the prior range for M is huge, this will result in an upward bias in short runs or in data with no strong signal for gene flow; this is not uncommon in msats because they are so variable (diffusing the gene flow signal). I usually use a prior range for M with msats that is about the same magnitude as the one for the population size.
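
To put the relation M=m/mu in more familiar units, here is a minimal Python sketch that converts a (Theta, M) pair into immigrants per generation. It assumes diploid nuclear markers, so that Theta = 4*Ne*mu and therefore Theta*M = 4*Ne*m; the function name and the `ploidy_factor` argument are made up for this illustration, and the inputs are the approximate estimates quoted in the question.

```python
def immigrants_per_generation(theta: float, m_scaled: float,
                              ploidy_factor: float = 4.0) -> float:
    """Convert mutation-scaled estimates into Ne*m.

    Assumes Theta = x * Ne * mu and M = m / mu, so Theta * M = x * Ne * m.
    x = 4 is the assumption for diploid nuclear markers.
    """
    if ploidy_factor <= 0:
        raise ValueError("ploidy_factor must be positive")
    return theta * m_scaled / ploidy_factor


# Approximate estimates reported in the question for the two prior settings.
narrow_priors = immigrants_per_generation(theta=4.0, m_scaled=10.0)
wide_priors = immigrants_per_generation(theta=45.0, m_scaled=95.0)

print(f"narrow priors: Ne*m = {narrow_priors:.2f}")
print(f"wide priors:   Ne*m = {wide_priors:.2f}")
```

Under these assumptions the wider priors imply roughly a hundred times more immigrants per generation than the narrower ones, which shows how strongly the prior range can drive the result when the data are weak.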


Peter



Víctor Suárez

Oct 15, 2024, 7:32:22 PM
to migrate-support

Thank you very much, Peter. 

My microsatellite loci have the following numbers of alleles: 2 loci have 23 and 19 alleles, respectively; 2 have 8; 3 have 5; and 1 has 2. What do you think? Whenever I compare analyses with a constant rate and a relative rate, the Log-Probability value is always much higher (less negative) for the relative rate analysis. I'm sending you the results of 4 analyses: two considering 3 populations (according to the structure analyses, STRUCTURE and DAPC) and two considering 2 populations (two of the previous three populations combined, which are more closely related in the structure analyses and geographically closer). In each pair of analyses, one uses a relative rate and the other a constant rate.
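
The marginal-likelihood comparison suggested in the reply above can be sketched in a few lines of Python. This is only an illustration: the two log marginal likelihoods are placeholders, not values from the attached outfiles, and the function names are made up. The numbers to plug in are the log marginal likelihoods reported for each run, which may not be the same quantity as the Log-Probability value mentioned above.

```python
import math


def log_bayes_factor(log_ml_a: float, log_ml_b: float) -> float:
    """Natural-log Bayes factor of model A over model B."""
    return log_ml_a - log_ml_b


def model_probabilities(log_mls: dict[str, float]) -> dict[str, float]:
    """Turn log marginal likelihoods into model probabilities.

    Assumes equal prior weight for every model. Subtracting the maximum
    before exponentiating avoids numerical underflow.
    """
    best = max(log_mls.values())
    weights = {name: math.exp(value - best) for name, value in log_mls.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# PLACEHOLDER values for illustration only; replace with the log marginal
# likelihoods from the constant-rate and relative-rate runs.
log_mls = {"constant_rate": -1520.0, "relative_rate": -1500.0}

lbf = log_bayes_factor(log_mls["relative_rate"], log_mls["constant_rate"])
print(f"ln Bayes factor (relative vs constant): {lbf:.1f}")
for name, prob in model_probabilities(log_mls).items():
    print(f"{name}: {prob:.6f}")
```

The same function works for more than two models, for example to compare the 2-population and 3-population runs, as long as all runs use the same data set.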

Regarding autocorrelation, I always use 10 replicates. If you compare the autocorrelation values of the previous analysis with 2 populations and a constant rate (250,000 visited (sampled) parameter values) with another result I am sending you (outfile_2pop_Full_constante_short2b.pdf; 2,000,000 visited (sampled) parameter values), you will see that they do not decrease. I am also sending you the parmfile (parmfile_msat_2pop_Full_tasaconstante_short2c) of the latest analysis, which I am running following your recommendation on the priors for Theta and M (I have also increased long-inc), in case you notice anything that could be affecting the results.
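
To see what these autocorrelation values mean for the two run lengths, here is a rough Python sketch of the effective sample size. It assumes the common first-order (AR(1)) approximation ESS = n(1 - r)/(1 + r) with r the lag-1 autocorrelation; the effective sample sizes printed in the outfiles should be preferred over this back-of-the-envelope figure, and the function name is made up for this illustration.

```python
def effective_sample_size(n_samples: int, autocorr: float) -> float:
    """Approximate ESS from the lag-1 autocorrelation.

    Uses the AR(1) approximation ESS = n * (1 - r) / (1 + r), which is an
    assumption of this sketch and only a rough guide.
    """
    if not -1.0 < autocorr < 1.0:
        raise ValueError("autocorr must lie strictly between -1 and 1")
    return n_samples * (1.0 - autocorr) / (1.0 + autocorr)


# Run lengths mentioned above, with autocorrelation values from this thread.
for n_samples in (250_000, 2_000_000):
    for autocorr in (0.8, 0.9):
        ess = effective_sample_size(n_samples, autocorr)
        print(f"n={n_samples:>9,}  r={autocorr}  ESS ~ {ess:,.0f}")
```

Under this approximation the autocorrelation itself is not expected to drop with a longer run; what grows is the number of effectively independent samples, which scales with the run length.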

I take this opportunity to ask you another question: In the parmfile, in addition to the migration model, is there anything else that needs to be changed to evaluate the divergence of one population from another (with or without gene flow)? I'm attaching a parmfile (parmfile_msat_3pop_MU1a3fromMU4conD_tasaRelat_short5) in which I've changed the model, but I'm not sure if I should specify any prior for the Split and SplitSD.

Thanks again, and sorry if I’m being a pain.

Best
V.-
outfile_2pop_Full_constante_short2b.pdf
outfile_2pop_Full_relat_short5.pdf
outfile_3pop_Full_relat_short5.pdf
outfile_2pop_Full_Const_short5.pdf
parmfile_msat_3pop_MU1a3fromMU4conD_tasaRelat_short5.txt
parmfile_msat_2pop_Full_tasaconstante_short2c
outfile_3pop_Full_constante_short5.pdf