Re: [migrate-support] Data Interpretation of the migrate-n analysis

1,773 views
Skip to first unread message

Peter Beerli

unread,
Aug 11, 2012, 12:53:04 PM8/11/12
to migrate...@googlegroups.com
You will need to rerun and adjust the prior bounds for micro satellites data, it looks like your upper bound for Theta is 0.1 (appropriate for sequence data) change the upper bound to 100 for both Theta and M

Peter




On Aug 11, 2012, at 12:35 PM, Sajed <m.s...@gmail.com> wrote:

Dear migrate-n users,

I am new on the migrate-n and not so a mathematics person. I have just a very basic question about the data interpretation.  I analyzed 9 sub-populations of plant species using 10 microsatellite loci within Island. I was trying to measure historical/long-term gene flow between populations. The strategy I chose was Bayesian interfere using Microsatellite data [Brownian motion].  So far I understand that mean value from Posterior distribution table should be considered as migration rate/gene between populations such as migration from sub-pop 2 to sub-pop 1 is 44.50. My questions what is the highest value and lowest value of the migration rate? I found some article where, for example the value 35.00 is considered as very low gene flow. Does it relative and based on the population size?  How do I interpret the gene flow rate based on the populations size? Is there anyrealtion between migration rate and theta.  In my another analysis of 3large populations (among islands) using the same species and same microsatellites, I found very high mean value from Posterior distribution , for example between island 1 to island 3, this mean value is 937.90. The sections of my output results are pasted below. Could anyone help explain the data interpretation about the high or low gene flow from these measures.


Locus Parameter        2.5%      25.0%    mode     75.0%   97.5%     median   mean
  All  Theta_1         0.05010  0.08119  0.08336  0.09039  0.10000  0.07922  0.07644
  All  Theta_2         0.08746  0.09373  0.09583  0.09746  0.10000  0.09496  0.09386
  All  Theta_3         0.00000  0.00000  0.00003  0.00047  0.00140  0.00050  0.00406
  All  Theta_4         0.06905  0.08232  0.08376  0.08579  0.10000  0.08642  0.08485
  All  Theta_5         0.06258  0.08272  0.08436  0.09526  0.09920  0.08482  0.08217
  All  Theta_6         0.05704  0.08099  0.08509  0.09320  0.10000  0.08202  0.07796
  All  Theta_7         0.00514  0.00947  0.01104  0.01281  0.01855  0.01097  0.01149
  All  Theta_8         0.03456  0.04223  0.04506  0.04903  0.06711  0.05067  0.05555
  All  Theta_9         0.00300  0.00894  0.01051  0.01241  0.02008  0.01084  0.01147
  All  M_2->1            14.67    33.33      44.33      54.67      73.33      45.00      44.50
  All  M_3->1             0.00      6.67        15.00     22.00      34.67      17.00      15.15
  All  M_4->1             0.00     0.00         4.33       18.67       67.33      19.00     21.25
  All  M_5->1             0.00    12.00        21.00     28.67       40.67      21.67     20.86
  All  M_6->1             0.00     8.67         17.00     24.67      38.00       19.00     17.62
  All  M_7->1             0.00     0.00          6.33     12.67       30.00       13.00     11.27
  All  M_8->1             0.00     0.00          0.33      8.00        22.00         8.33     5.94
  All  M_9->1             0.00     6.00          13.67   20.67      32.00        15.67     14.00

-----------------------------------------------------------

  All  Theta_1         0.02642  0.02842  0.02979  0.03135  0.03789  0.03052  0.03137
  All  Theta_2         0.00420  0.00640  0.00737  0.00834  0.01027  0.00751  0.00769
  All  Theta_3         0.00100  0.00227  0.00317  0.00394  0.00527  0.00324  0.00316
  All  M_2->1            78.00    92.67      102.33   110.67     130.67   103.67     104.26
  All  M_3->1            10.00    21.33      29.67      36.67       48.00      30.33      29.62
  All  M_1->2           843.33   907.33     952.33    976.00    1000.00   928.33   919.42
  All  M_3->2           117.33   143.33     158.33    172.67    201.33    157.67    153.72
  All  M_1->3           860.67   947.33     981.00    994.67    1000.00   953.67    937.90
  All  M_2->3           460.00   534.00      565.00   623.33    736.00     587.00    591.34
 

 

--
You received this message because you are subscribed to the Google Groups "migrate-support" group.
To view this discussion on the web visit https://groups.google.com/d/msg/migrate-support/-/_U_GKB-t54YJ.
To post to this group, send email to migrate...@googlegroups.com.
To unsubscribe from this group, send email to migrate-suppo...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/migrate-support?hl=en.

Sajed

unread,
Aug 11, 2012, 4:04:34 PM8/11/12
to migrate...@googlegroups.com
Thank you very much, Peter.

Peter Beerli

unread,
Aug 16, 2012, 10:37:22 AM8/16/12
to migrate...@googlegroups.com
Upper bounds of priors for microsatellites:

Assume a sequence data set where we consider Theta=4 Ne mu with mu per site and generation
with a mutation rate of 10^-8 a Theta of 0.001 lead to an Ne of 25000 and a Theta=0.1 to 2.5 million
For many species we may encounter population sizes from a few hundred to millions.
Considering micro satellites data we also show theta=4 Ne mu but now mu is measured per locus and generation, obviously we do not really know how big the locus really is but there are papers that suggest that mirosatellites in humans have a mu between 10^-2 to 10^-8 looking at theta for the range of population sizes Ne
Ne_small=100 mu_msat=10^-4 ----> theta = 4 100 10^-4 =0.04
Ne_large=10^6  mu_msat=10^-4 ----> theta = 4 10^6 10^-4 =400

The range of values for msats is between say 0 and 400 in my little example, I believe that a population size of a million is huge and most populations are much much smaller. 

When choosing uniform priors with boundaries a and b you want to pick these boundaries so that almost all of the  'true' posterior distribution is enveloped by the prior, so without any knowledge of the size of your population you would want to pick a wide prior from example a=0 and b=400, if you know that your populations are small then
perhaps use an upper bound of 40 or 4.

Why do we pick an upper bound at all? Well, proposing values from a prior that has say a range from 0 10^6 but the 99.999% of all probability mass is below 10, then the MCMC algorithm will sample many situations that are not very important for the final results with values larger than 10 leading to slow convergence of the run, thus leading to unreliable results. Picking priors and boundaries is important, I often choose 100 for microsats because I have seen little evidence to try higher values , but I have seem estimates where the maximum posterior values is >50. 

How do we choose the upper bound? if the upper bound is too low, for example you use 0.1 instead of 100 for msats, that means that the data will push the posterior towards that upper boundary and a posterior may look like this:

|                                X
|                          XXX
|                   XXXXXX
|            XXXXXXXXX
+-------------------------|
0                                upper prior bound

it would be better it looked liked this

|                    X
|                 XXX
|            XXXXXX
|       XXXXXXXXX
+-------------------------|
0                                upper prior bound

In migrate I have seen users adding boundaries like this 0 to 100000000
this leads to posteriors that look like these

|X
|XX
|XXX
|XXXX
+-------------------------|
0                                upper prior bound

this suggests that the prior is too wide, in principle if you run long enough this should still work but you may need to wait very very long. 

hope this helps,
Peter





On Aug 14, 2012, at 3:01 PM, Dylan Short <dylanp...@gmail.com> wrote:


Peter, 

Could you briefly explain why the upper bounds should be 100 for both Theta and M when using microsatellite data? Is it because of the higher mutation rates associated with microsatellite repeat numbers compared to nucleotide substiutions?  Also, is there a way to estimate the appropriate prior bounds for a specific dataset?

Thanks,

Dylan
To view this discussion on the web visit https://groups.google.com/d/msg/migrate-support/-/823ofkrMEkEJ.

Darius Danusevicius

unread,
Apr 17, 2023, 12:37:38 PM4/17/23
to migrate-support
Dear All,

I have also met the same problem of setting appropriate priors for Theta on SSR dat set.

My experience on this is as follows:
- I use exp distribution (suits better for highly variable SSRs)
- I fix MEAN to say 0.01 and vary MAX as 1, 5 10, 20, 50, 100
- At certain MAX say >10, the maxima of the posterrior distribution becomes unreasonably flat, i.e. for prediction does not matter which Ne 50 or 100 is the most likely. 
- I decide that MAX > 10 are not optimal for my data.
- For some datasets, the runs with fixed MEAN and increased MAX return similar Theta; This I used as an indication that my MEAN setting is OK.
- The I use this OK MEAN with MAX 1 or 5, which gave my higher ESS value to run my full data set.
- The tricky choice here is the choice of MEAN, because if I increase the MEAN from say 0.01 to 0.1 and repeat the runs with variable MAX, result is similar by thetas 10 fold higher.
May be Peter can comment on the choice of MEAN prior?

Darius 

Peter Beerli

unread,
Apr 17, 2023, 12:50:23 PM4/17/23
to migrate...@googlegroups.com
Darius,
I have the impression that you overthink all this.
A prior  is what it is a prior guess. 

If the priors has a huge influence on your results then your data is most liked not very variable.
For microsatellites, a priors mean under 1.0 seem rather low, I suggest that you use an
exponential prior with bounds 0 and 50 and a mean of 1.0, 
if all the values for theta are lower than, say, 5 then I would reduce the upper bound to perhaps 10 or 20,
with very low means you force a strong prior on the data and given that the posterior is a scaled  likelihood*prior, the estimated will be pulled towards lower values, with a mean more to the middle of the prior distribution (e.g. low=0 - mean=1 - upper=50) there is less pull to the zero value,
if you think that the prior with values such as 0,1,50 are still pulling to strong, then set 0, 10,50, an exp prior with such specification starts t to look more 
like a uniform prior.

It also may help to look at the attached texts.

Peter

beerli_2008.pdf
beerli-et-al-2019-currentprotocols.pdf

Darius Danusevicius

unread,
Apr 17, 2023, 1:24:09 PM4/17/23
to migrate-support
Thanks Peter, The SSRs are variable 18 loci with high He. My willingness to have lower MEAN for priors is that I believe that the historical Ne (by coalescence) has to be lower that the N census.  

In my case, the N census is 150 indv. So I set MAX prior to be Ne <150.

I am I wrong?   

I know from parentage analysis that these 150 individuals are related to each other and structured in ca. 100 hsib fams. 

Darius

Peter Beerli

unread,
Apr 17, 2023, 1:44:24 PM4/17/23
to migrate...@googlegroups.com
Darius,
the key question will be ‘what is the mutation rate for your SSR?’ if it is 10^-4 then your 150 individuals lead to a small Theta=10^-4 * 4 * 150
but given that the coalescent is an approximation assuming that the sample size is much smaller than the actual Ne and that the sample is collected randomly from the population a lot of things can go wrong, and using your 150 as a max for the prior will be problematic, I would pick a max 10x or 100x higher than that. Given that your sample is closely related to each other, this will also reduce the estimated population size and reduce Theta.
Standard Heterozygosity estimators may lead to an overestimate because it treats each allele as independent (infinite allele model) instead of assuming a relationship among the alleles, in a Brownian motion mutation model (or stepwise mutation model) an 11repeat and a 12repeat are more closely related than an 11repeat to a 20repeat. Your data may be a marginal use-case for the coalescent.

Peter



--
You received this message because you are subscribed to the Google Groups "migrate-support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to migrate-suppo...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/migrate-support/2e4fc509-c844-4492-91d9-e0b04e78e9een%40googlegroups.com.

Darius Danusevicius

unread,
Apr 17, 2023, 2:11:55 PM4/17/23
to migrate-support
Yes, I use the the mutation rate for SSRs  Theta=10^-4 * 4 . But if I seek for historical Ne for a pop of say wolfs, which are sampled with the amount of the sample reaching of 1/2 census number in a region. Further, I know that historically the pop was reduced down to 50 individuals, some 70 years ago. Then I could expect the historical Ne vary below may sample size of say 500. That is why I try to push the prior down... 

eg below is result from a modeling run with pop of wolfs = 120, loci = 18, He is high:

MIN MEAN MAX ESS  Ratio Theta Ne
0 0.01 2 150 0.00001 0.0860 22
0 0.01 5 86 0.00001 0.1318 33
0 0.01 10 82 0.00001 0.0985 25
0 0.01 15 69 0.00001 0.1050 26

Theta stays at about Ne = 25, thoug MAX varies much ..., However, as you say it is forces by MEAN of 0.01

Thanks a lot . I will read the suggested papers.

Darius

Peter Beerli

unread,
Apr 17, 2023, 3:09:25 PM4/17/23
to migrate...@googlegroups.com
Darius,

The bottleneck down to 50 will have a big impact on the migrate-n theta estimates, so 25 seems quite reasonable to me, I wonder about the assumed sex ratio (you know how many males sired offspring in your pedigree, perhaps that would also help to explain lower numbers (e.g. few males mate and dominate the offspring 'production’)

Peter



Darius Danusevicius

unread,
Apr 17, 2023, 11:49:41 PM4/17/23
to migrate-support
Dear Peter Thanks for your time 
Yes I know the sex ratio (by the Melanin marker) and it is about 50/50 in the sample of 150. Other common gen div parameters are OK He ca 0.8, Fis 0.05, the pop recovered because of migrants from the east. That is why we were interested in MIGRATE_N to estimate namely the historic Ne by coalescence. The settings above gave me the reasonable estimate of ca, 25-40 no matter how I have varied the MAX prior. However, I am not certain how I could defend my prior MEAN of 0.01?

Peter Beerli

unread,
Apr 17, 2023, 11:54:17 PM4/17/23
to migrate...@googlegroups.com
show us to 2.5% and 97.5% percentiles when run with the prior with mean 0.01 and also for  1.0 using your same Max prior bounds, 
say pick 10
Peter
 

Darius Danusevicius

unread,
Apr 18, 2023, 6:08:50 AM4/18/23
to migrate-support
Peter, Here they are

MIN/MEAN/MAX= 0/ 0.01/ 10

Locus Parameter        2.5%      25.0%    mode     75.0%   97.5%     median   mean

    1  Theta_1         0.00000  0.05333  0.13667  0.21333  0.33333  0.16333  0.13502

    2  Theta_1         0.00000  0.04000  0.11667  0.19333  0.31333  0.15667  0.11798

    3  Theta_1         0.00000  0.04667  0.13000  0.20667  0.32000  0.16333  0.13085

    4  Theta_1         0.00000  0.05333  0.13667  0.21333  0.33333  0.16333  0.13835

    5  Theta_1         0.00000  0.04667  0.13000  0.20667  0.32667  0.16333  0.13038

    6  Theta_1         0.00000  0.03333  0.11667  0.18667  0.30667  0.15000  0.11394

    7  Theta_1         0.00000  0.04667  0.13000  0.20667  0.32000  0.16333  0.13110

    8  Theta_1         0.00000  0.04000  0.12333  0.19333  0.31333  0.15667  0.12197

    9  Theta_1         0.00000  0.06000  0.15000  0.22667  0.34000  0.17667  0.14598

   10  Theta_1         0.00000  0.06000  0.15000  0.22667  0.34000  0.17667  0.14659

   11  Theta_1         0.00000  0.04667  0.13000  0.20667  0.32667  0.16333  0.12776

   12  Theta_1         0.00000  0.04000  0.12333  0.19333  0.32000  0.15667  0.12318

   13  Theta_1         0.00000  0.04667  0.13000  0.20667  0.32667  0.16333  0.12954

   14  Theta_1         0.00000  0.04667  0.13000  0.20667  0.32667  0.16333  0.13406

   15  Theta_1         0.00000  0.04000  0.11667  0.19333  0.31333  0.15667  0.11866

   16  Theta_1         0.00000  0.04667  0.12333  0.20000  0.32000  0.15667  0.12338

   17  Theta_1         0.00000  0.04000  0.12333  0.19333  0.32000  0.15667  0.12128

   18  Theta_1         0.00000  0.04000  0.12333  0.19333  0.31333  0.15667  0.12286

  All  Theta_1         0.00000  0.02000  0.09667  0.16667  0.29333  0.14333  0.09846

 

MIN/MEAN/MAX = 0 /1 /10

Locus Parameter        2.5%      25.0%    mode     75.0%   97.5%     median   mean

    1  Theta_1         2.36000  2.47333  3.37000  4.48667  6.27333  4.13000  4.54918

    2  Theta_1         3.92000  3.92000  6.49000  7.19333  7.19333  5.85667  5.93629

    3  Theta_1         3.00667  4.68000  5.97000  6.52000  7.60667  5.37667  5.31389

    4  Theta_1         2.78000  3.20000  3.71000  4.69333  6.78667  4.73000  4.97432

    5  Theta_1         8.53333  8.58667  8.77667  8.93333  8.99333  6.62333  6.60202

    6  Theta_1         4.35333  5.19333  6.35000  6.96667  9.00667  6.45667  6.58068

    7  Theta_1         3.98667  5.41333  6.19000  6.74000  7.98667  6.39667  6.73952

    8  Theta_1         5.83333  8.26667  8.58333  9.06667  9.99333  8.32333  8.14078

    9  Theta_1         3.41333  7.27333  7.76333  8.10000  8.46667  6.35667  6.39594

   10  Theta_1         8.28667  8.90000  9.24333  9.76667  9.98667  8.03667  8.03270

   11  Theta_1         4.35333  6.10667  6.44333  7.25333  8.84667  6.78333  6.78160

   12  Theta_1         5.30000  7.48667  8.67667  8.92667  9.03333  7.85000  7.65778

   13  Theta_1         6.40667  8.15333  8.32333  8.52000  9.99333  8.25000  8.23331

   14  Theta_1         3.62000  4.53333  5.25000  5.90000  7.90000  5.51667  5.72303

   15  Theta_1         3.84667  5.36000  6.17667  6.97333  8.02667  6.35000  6.59572

   16  Theta_1         3.11333  5.00667  6.46333  6.88000  8.39333  6.00333  5.90537

   17  Theta_1         5.04667  5.56000  6.06333  6.74000  8.45333  7.07000  7.15732

   18  Theta_1         8.26667  9.67333  9.85000  9.99333  9.99333  9.27000  9.24944

  All  Theta_1         5.76000  5.88000  5.97667  6.06667  6.18000  5.98333  5.97660 

Peter Beerli

unread,
Apr 18, 2023, 7:30:49 AM4/18/23
to migrate...@googlegroups.com
Darius,
without any detailed knowledge of your data, I would prefer the run with the Thetas around 7 and not 0.1
because the 2.5% percentile is not zero, but I would like to see the complete outfiles (pdf and text files) for 
these two runs. I also assume that the mutation rate is not really known and estimated, but looking at the different loci there is variation that has at least two sources: 1. MCMC error, 2. different mutation rates among loci.

Please send me the outfile* for the two runs (bee...@fsu.edu)

Peter


Darius Danusevicius

unread,
Apr 18, 2023, 9:14:56 AM4/18/23
to migrate-support
7 seems high for MEAN prior, may be you mean MAX prior of 7. The outputs are attached. 
outfile_0_1_10
outfile0_1_10.pdf
outfile_0_0.01_10
outfile_0_0.01_10.pdf

Darius Danusevicius

unread,
Apr 19, 2023, 2:34:39 AM4/19/23
to migrate-support
Dear Peter,

Have you got the output files? Why there could be the MCMC error?

Darius 

Peter Beerli

unread,
May 3, 2023, 3:33:05 PM5/3/23
to migrate...@googlegroups.com
Darius,

The word error was slang for variability introduced by MCMC that run too short, data introduces variability because of sampling. I responded to you about the output files in a separate email.

Peter


Reply all
Reply to author
Forward
0 new messages