Re: [migrate-support] Data Interpretation of the migrate-n analysis

Peter Beerli

unread,

Aug 11, 2012, 12:53:04 PM8/11/12

to migrate...@googlegroups.com

You will need to rerun and adjust the prior bounds for micro satellites data, it looks like your upper bound for Theta is 0.1 (appropriate for sequence data) change the upper bound to 100 for both Theta and M

Peter

On Aug 11, 2012, at 12:35 PM, Sajed <m.s...@gmail.com> wrote:

Dear migrate-n users,
I am new on the migrate-n and not so a mathematics person. I have just a very basic question about the data interpretation. I analyzed 9 sub-populations of plant species using 10 microsatellite loci within Island. I was trying to measure historical/long-term gene flow between populations. The strategy I chose was Bayesian interfere using Microsatellite data [Brownian motion]. So far I understand that mean value from Posterior distribution table should be considered as migration rate/gene between populations such as migration from sub-pop 2 to sub-pop 1 is 44.50. My questions what is the highest value and lowest value of the migration rate? I found some article where, for example the value 35.00 is considered as very low gene flow. Does it relative and based on the population size? How do I interpret the gene flow rate based on the populations size? Is there anyrealtion between migration rate and theta. In my another analysis of 3large populations (among islands) using the same species and same microsatellites, I found very high mean value from Posterior distribution , for example between island 1 to island 3, this mean value is 937.90. The sections of my output results are pasted below. Could anyone help explain the data interpretation about the high or low gene flow from these measures.

Locus Parameter        2.5%      25.0%    mode     75.0%   97.5%     median   mean
All Theta_1         0.05010 0.08119 0.08336 0.09039 0.10000 0.07922 0.07644
All Theta_2         0.08746 0.09373 0.09583 0.09746 0.10000 0.09496 0.09386
All Theta_3         0.00000 0.00000 0.00003 0.00047 0.00140 0.00050 0.00406
All Theta_4         0.06905 0.08232 0.08376 0.08579 0.10000 0.08642 0.08485
All Theta_5         0.06258 0.08272 0.08436 0.09526 0.09920 0.08482 0.08217
All Theta_6         0.05704 0.08099 0.08509 0.09320 0.10000 0.08202 0.07796
All Theta_7         0.00514 0.00947 0.01104 0.01281 0.01855 0.01097 0.01149
All Theta_8         0.03456 0.04223 0.04506 0.04903 0.06711 0.05067 0.05555
All Theta_9         0.00300 0.00894 0.01051 0.01241 0.02008 0.01084 0.01147
All M_2->1            14.67    33.33      44.33      54.67      73.33      45.00      44.50
All M_3->1             0.00      6.67        15.00    22.00      34.67      17.00      15.15
All M_4->1             0.00     0.00         4.33       18.67      67.33      19.00     21.25
All M_5->1             0.00    12.00        21.00     28.67     40.67      21.67     20.86
All M_6->1             0.00     8.67         17.00    24.67      38.00       19.00     17.62
All M_7->1             0.00     0.00         6.33     12.67      30.00       13.00     11.27
All M_8->1             0.00     0.00          0.33      8.00       22.00         8.33     5.94
All M_9->1             0.00     6.00          13.67   20.67      32.00        15.67     14.00
-----------------------------------------------------------
All Theta_1         0.02642 0.02842 0.02979 0.03135 0.03789 0.03052 0.03137
All Theta_2         0.00420 0.00640 0.00737 0.00834 0.01027 0.00751 0.00769
All Theta_3         0.00100 0.00227 0.00317 0.00394 0.00527 0.00324 0.00316
All M_2->1            78.00    92.67      102.33   110.67     130.67   103.67     104.26
All M_3->1            10.00    21.33      29.67      36.67       48.00      30.33      29.62
All M_1->2           843.33   907.33    952.33   976.00    1000.00   928.33   919.42
All M_3->2           117.33   143.33     158.33    172.67   201.33    157.67    153.72
All M_1->3           860.67   947.33     981.00    994.67    1000.00   953.67    937.90
All M_2->3           460.00   534.00      565.00   623.33    736.00     587.00    591.34

--
You received this message because you are subscribed to the Google Groups "migrate-support" group.
To view this discussion on the web visit https://groups.google.com/d/msg/migrate-support/-/_U_GKB-t54YJ.
To post to this group, send email to migrate...@googlegroups.com.
To unsubscribe from this group, send email to migrate-suppo...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/migrate-support?hl=en.

Sajed

unread,

Aug 11, 2012, 4:04:34 PM8/11/12

to migrate...@googlegroups.com

Thank you very much, Peter.

Peter Beerli

unread,

Aug 16, 2012, 10:37:22 AM8/16/12

to migrate...@googlegroups.com

Upper bounds of priors for microsatellites:

Assume a sequence data set where we consider Theta=4 Ne mu with mu per site and generation

with a mutation rate of 10^-8 a Theta of 0.001 lead to an Ne of 25000 and a Theta=0.1 to 2.5 million

For many species we may encounter population sizes from a few hundred to millions.

Considering micro satellites data we also show theta=4 Ne mu but now mu is measured per locus and generation, obviously we do not really know how big the locus really is but there are papers that suggest that mirosatellites in humans have a mu between 10^-2 to 10^-8 looking at theta for the range of population sizes Ne

Ne_small=100 mu_msat=10^-4 ----> theta = 4 100 10^-4 =0.04

Ne_large=10^6 mu_msat=10^-4 ----> theta = 4 10^6 10^-4 =400

The range of values for msats is between say 0 and 400 in my little example, I believe that a population size of a million is huge and most populations are much much smaller.

When choosing uniform priors with boundaries a and b you want to pick these boundaries so that almost all of the 'true' posterior distribution is enveloped by the prior, so without any knowledge of the size of your population you would want to pick a wide prior from example a=0 and b=400, if you know that your populations are small then

perhaps use an upper bound of 40 or 4.

Why do we pick an upper bound at all? Well, proposing values from a prior that has say a range from 0 10^6 but the 99.999% of all probability mass is below 10, then the MCMC algorithm will sample many situations that are not very important for the final results with values larger than 10 leading to slow convergence of the run, thus leading to unreliable results. Picking priors and boundaries is important, I often choose 100 for microsats because I have seen little evidence to try higher values , but I have seem estimates where the maximum posterior values is >50.

How do we choose the upper bound? if the upper bound is too low, for example you use 0.1 instead of 100 for msats, that means that the data will push the posterior towards that upper boundary and a posterior may look like this:

| X

| XXX

| XXXXXX

| XXXXXXXXX

+-------------------------|

0 upper prior bound

it would be better it looked liked this

| X

| XXX

| XXXXXX

| XXXXXXXXX

+-------------------------|

0 upper prior bound

In migrate I have seen users adding boundaries like this 0 to 100000000

this leads to posteriors that look like these

|X

|XX

|XXX

|XXXX

+-------------------------|

0 upper prior bound

this suggests that the prior is too wide, in principle if you run long enough this should still work but you may need to wait very very long.

hope this helps,

Peter

On Aug 14, 2012, at 3:01 PM, Dylan Short <dylanp...@gmail.com> wrote:

Peter,

Could you briefly explain why the upper bounds should be 100 for both Theta and M when using microsatellite data? Is it because of the higher mutation rates associated with microsatellite repeat numbers compared to nucleotide substiutions? Also, is there a way to estimate the appropriate prior bounds for a specific dataset?

Thanks,

Dylan

To view this discussion on the web visit https://groups.google.com/d/msg/migrate-support/-/823ofkrMEkEJ.

Darius Danusevicius

unread,

Apr 17, 2023, 12:37:38 PM4/17/23

to migrate-support

Dear All,

I have also met the same problem of setting appropriate priors for Theta on SSR dat set.

My experience on this is as follows:

- I use exp distribution (suits better for highly variable SSRs)

- I fix MEAN to say 0.01 and vary MAX as 1, 5 10, 20, 50, 100

- At certain MAX say >10, the maxima of the posterrior distribution becomes unreasonably flat, i.e. for prediction does not matter which Ne 50 or 100 is the most likely.

- I decide that MAX > 10 are not optimal for my data.

- For some datasets, the runs with fixed MEAN and increased MAX return similar Theta; This I used as an indication that my MEAN setting is OK.

- The I use this OK MEAN with MAX 1 or 5, which gave my higher ESS value to run my full data set.

- The tricky choice here is the choice of MEAN, because if I increase the MEAN from say 0.01 to 0.1 and repeat the runs with variable MAX, result is similar by thetas 10 fold higher.

May be Peter can comment on the choice of MEAN prior?

Darius

Peter Beerli

unread,

Apr 17, 2023, 12:50:23 PM4/17/23

to migrate...@googlegroups.com

Darius,

I have the impression that you overthink all this.

A prior is what it is a prior guess.

If the priors has a huge influence on your results then your data is most liked not very variable.

For microsatellites, a priors mean under 1.0 seem rather low, I suggest that you use an

exponential prior with bounds 0 and 50 and a mean of 1.0,

if all the values for theta are lower than, say, 5 then I would reduce the upper bound to perhaps 10 or 20,

with very low means you force a strong prior on the data and given that the posterior is a scaled likelihood*prior, the estimated will be pulled towards lower values, with a mean more to the middle of the prior distribution (e.g. low=0 - mean=1 - upper=50) there is less pull to the zero value,

if you think that the prior with values such as 0,1,50 are still pulling to strong, then set 0, 10,50, an exp prior with such specification starts t to look more

like a uniform prior.

It also may help to look at the attached texts.

Peter

beerli_2008.pdf

beerli-et-al-2019-currentprotocols.pdf

Darius Danusevicius

unread,

Apr 17, 2023, 1:24:09 PM4/17/23

to migrate-support

Thanks Peter, The SSRs are variable 18 loci with high He. My willingness to have lower MEAN for priors is that I believe that the historical Ne (by coalescence) has to be lower that the N census.

In my case, the N census is 150 indv. So I set MAX prior to be Ne <150.

I am I wrong?

I know from parentage analysis that these 150 individuals are related to each other and structured in ca. 100 hsib fams.

Darius

Peter Beerli

unread,

Apr 17, 2023, 1:44:24 PM4/17/23

to migrate...@googlegroups.com

Darius,

the key question will be ‘what is the mutation rate for your SSR?’ if it is 10^-4 then your 150 individuals lead to a small Theta=10^-4 * 4 * 150

but given that the coalescent is an approximation assuming that the sample size is much smaller than the actual Ne and that the sample is collected randomly from the population a lot of things can go wrong, and using your 150 as a max for the prior will be problematic, I would pick a max 10x or 100x higher than that. Given that your sample is closely related to each other, this will also reduce the estimated population size and reduce Theta.

Standard Heterozygosity estimators may lead to an overestimate because it treats each allele as independent (infinite allele model) instead of assuming a relationship among the alleles, in a Brownian motion mutation model (or stepwise mutation model) an 11repeat and a 12repeat are more closely related than an 11repeat to a 20repeat. Your data may be a marginal use-case for the coalescent.

Peter

--
You received this message because you are subscribed to the Google Groups "migrate-support" group.

To unsubscribe from this group and stop receiving emails from it, send an email to migrate-suppo...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/migrate-support/2e4fc509-c844-4492-91d9-e0b04e78e9een%40googlegroups.com.

Darius Danusevicius

unread,

Apr 17, 2023, 2:11:55 PM4/17/23

to migrate-support

Yes, I use the the mutation rate for SSRs Theta=10^-4 * 4 . But if I seek for historical Ne for a pop of say wolfs, which are sampled with the amount of the sample reaching of 1/2 census number in a region. Further, I know that historically the pop was reduced down to 50 individuals, some 70 years ago. Then I could expect the historical Ne vary below may sample size of say 500. That is why I try to push the prior down...

eg below is result from a modeling run with pop of wolfs = 120, loci = 18, He is high:

MIN MEAN MAX ESS Ratio Theta Ne
0 0.01 2 150 0.00001 0.0860 22
0 0.01 5 86 0.00001 0.1318 33
0 0.01 10 82 0.00001 0.0985 25
0 0.01 15 69 0.00001 0.1050 26

Theta stays at about Ne = 25, thoug MAX varies much ..., However, as you say it is forces by MEAN of 0.01

Thanks a lot . I will read the suggested papers.

Darius

Peter Beerli

unread,

Apr 17, 2023, 3:09:25 PM4/17/23

to migrate...@googlegroups.com

Darius,

The bottleneck down to 50 will have a big impact on the migrate-n theta estimates, so 25 seems quite reasonable to me, I wonder about the assumed sex ratio (you know how many males sired offspring in your pedigree, perhaps that would also help to explain lower numbers (e.g. few males mate and dominate the offspring 'production’)

Peter

To view this discussion on the web visit https://groups.google.com/d/msgid/migrate-support/4b747ccb-2591-4e25-a915-093a7725979en%40googlegroups.com.

Darius Danusevicius

unread,

Apr 17, 2023, 11:49:41 PM4/17/23

to migrate-support

Dear Peter Thanks for your time

Yes I know the sex ratio (by the Melanin marker) and it is about 50/50 in the sample of 150. Other common gen div parameters are OK He ca 0.8, Fis 0.05, the pop recovered because of migrants from the east. That is why we were interested in MIGRATE_N to estimate namely the historic Ne by coalescence. The settings above gave me the reasonable estimate of ca, 25-40 no matter how I have varied the MAX prior. However, I am not certain how I could defend my prior MEAN of 0.01?

Peter Beerli

unread,

Apr 17, 2023, 11:54:17 PM4/17/23

to migrate...@googlegroups.com

show us to 2.5% and 97.5% percentiles when run with the prior with mean 0.01 and also for 1.0 using your same Max prior bounds,

say pick 10

Peter

To view this discussion on the web visit https://groups.google.com/d/msgid/migrate-support/2a1ceb4a-1268-4aee-90c6-abbad7a960adn%40googlegroups.com.

Darius Danusevicius

unread,

Apr 18, 2023, 6:08:50 AM4/18/23

to migrate-support

Peter, Here they are

MIN/MEAN/MAX= 0/ 0.01/ 10

Locus Parameter 2.5% 25.0% mode 75.0% 97.5% median mean

1 Theta_1 0.00000 0.05333 0.13667 0.21333 0.33333 0.16333 0.13502

2 Theta_1 0.00000 0.04000 0.11667 0.19333 0.31333 0.15667 0.11798

3 Theta_1 0.00000 0.04667 0.13000 0.20667 0.32000 0.16333 0.13085

4 Theta_1 0.00000 0.05333 0.13667 0.21333 0.33333 0.16333 0.13835

5 Theta_1 0.00000 0.04667 0.13000 0.20667 0.32667 0.16333 0.13038

6 Theta_1 0.00000 0.03333 0.11667 0.18667 0.30667 0.15000 0.11394

7 Theta_1 0.00000 0.04667 0.13000 0.20667 0.32000 0.16333 0.13110

8 Theta_1 0.00000 0.04000 0.12333 0.19333 0.31333 0.15667 0.12197

9 Theta_1 0.00000 0.06000 0.15000 0.22667 0.34000 0.17667 0.14598

10 Theta_1 0.00000 0.06000 0.15000 0.22667 0.34000 0.17667 0.14659

11 Theta_1 0.00000 0.04667 0.13000 0.20667 0.32667 0.16333 0.12776

12 Theta_1 0.00000 0.04000 0.12333 0.19333 0.32000 0.15667 0.12318

13 Theta_1 0.00000 0.04667 0.13000 0.20667 0.32667 0.16333 0.12954

14 Theta_1 0.00000 0.04667 0.13000 0.20667 0.32667 0.16333 0.13406

15 Theta_1 0.00000 0.04000 0.11667 0.19333 0.31333 0.15667 0.11866

16 Theta_1 0.00000 0.04667 0.12333 0.20000 0.32000 0.15667 0.12338

17 Theta_1 0.00000 0.04000 0.12333 0.19333 0.32000 0.15667 0.12128

18 Theta_1 0.00000 0.04000 0.12333 0.19333 0.31333 0.15667 0.12286

All Theta_1 0.00000 0.02000 0.09667 0.16667 0.29333 0.14333 0.09846

MIN/MEAN/MAX = 0 /1 /10

Locus Parameter 2.5% 25.0% mode 75.0% 97.5% median mean

1 Theta_1 2.36000 2.47333 3.37000 4.48667 6.27333 4.13000 4.54918

2 Theta_1 3.92000 3.92000 6.49000 7.19333 7.19333 5.85667 5.93629

3 Theta_1 3.00667 4.68000 5.97000 6.52000 7.60667 5.37667 5.31389

4 Theta_1 2.78000 3.20000 3.71000 4.69333 6.78667 4.73000 4.97432

5 Theta_1 8.53333 8.58667 8.77667 8.93333 8.99333 6.62333 6.60202

6 Theta_1 4.35333 5.19333 6.35000 6.96667 9.00667 6.45667 6.58068

7 Theta_1 3.98667 5.41333 6.19000 6.74000 7.98667 6.39667 6.73952

8 Theta_1 5.83333 8.26667 8.58333 9.06667 9.99333 8.32333 8.14078

9 Theta_1 3.41333 7.27333 7.76333 8.10000 8.46667 6.35667 6.39594

10 Theta_1 8.28667 8.90000 9.24333 9.76667 9.98667 8.03667 8.03270

11 Theta_1 4.35333 6.10667 6.44333 7.25333 8.84667 6.78333 6.78160

12 Theta_1 5.30000 7.48667 8.67667 8.92667 9.03333 7.85000 7.65778

13 Theta_1 6.40667 8.15333 8.32333 8.52000 9.99333 8.25000 8.23331

14 Theta_1 3.62000 4.53333 5.25000 5.90000 7.90000 5.51667 5.72303

15 Theta_1 3.84667 5.36000 6.17667 6.97333 8.02667 6.35000 6.59572

16 Theta_1 3.11333 5.00667 6.46333 6.88000 8.39333 6.00333 5.90537

17 Theta_1 5.04667 5.56000 6.06333 6.74000 8.45333 7.07000 7.15732

18 Theta_1 8.26667 9.67333 9.85000 9.99333 9.99333 9.27000 9.24944

All Theta_1 5.76000 5.88000 5.97667 6.06667 6.18000 5.98333 5.97660

Peter Beerli

unread,

Apr 18, 2023, 7:30:49 AM4/18/23

to migrate...@googlegroups.com

Darius,

without any detailed knowledge of your data, I would prefer the run with the Thetas around 7 and not 0.1

because the 2.5% percentile is not zero, but I would like to see the complete outfiles (pdf and text files) for

these two runs. I also assume that the mutation rate is not really known and estimated, but looking at the different loci there is variation that has at least two sources: 1. MCMC error, 2. different mutation rates among loci.

Please send me the outfile* for the two runs (bee...@fsu.edu)

Peter

To view this discussion on the web visit https://groups.google.com/d/msgid/migrate-support/38f185c6-d331-49eb-bf67-d695d5d2b0bfn%40googlegroups.com.

Darius Danusevicius

unread,

Apr 18, 2023, 9:14:56 AM4/18/23

to migrate-support

7 seems high for MEAN prior, may be you mean MAX prior of 7. The outputs are attached.

outfile_0_1_10

outfile0_1_10.pdf

outfile_0_0.01_10

outfile_0_0.01_10.pdf

Darius Danusevicius

unread,

Apr 19, 2023, 2:34:39 AM4/19/23

to migrate-support

Dear Peter,

Have you got the output files? Why there could be the MCMC error?

Darius

Peter Beerli

unread,

May 3, 2023, 3:33:05 PM5/3/23

to migrate...@googlegroups.com

Darius,

The word error was slang for variability introduced by MCMC that run too short, data introduces variability because of sampling. I responded to you about the output files in a separate email.

Peter

To view this discussion on the web visit https://groups.google.com/d/msgid/migrate-support/0f17556b-b974-4e70-acad-4915bc2e8b65n%40googlegroups.com.

Reply all

Reply to author

Forward