General run questions

Hazel Perry

May 19, 2015, 12:55:46 PM
to migrate...@googlegroups.com
I am very new to Migrate (and Bayesian/MCMC analysis) but trying to learn very quickly, as my time on this project is rapidly running out. I am looking at a range of locations and mainly want to compare migration between two sites within each location, based on two sets of loci (all SNPs). I am expecting two things: first, that migration will be mostly in one direction between my sites, and second, that the migration rates will show a pattern similar to the Fst values between the two sites for the different sets of loci. I am using Bayesian estimation but have a few questions about some of the settings. So far I have only run the analysis with little deviation from the defaults.

1) With one of my datasets, some of my SNPs give a warning that the transition/transversion ratio of 2.0 is impossible with that allele frequency. For the SNPs where this occurs, the allele frequencies reported in the output files match the actual data for that SNP (only two alleles have frequencies above 0), but for most of my SNPs the frequency is set to 0.25 for all alleles. In the datatype option I said yes to using empirical base frequencies and thought this meant the base frequencies would be taken from the data, but this isn't the case for all of my loci. What is the best way to do this? Would it be better to calculate the frequency of each allele from the total dataset (all the data going into that run) and enter it manually? Also, for the transition/transversion ratio, would it be better to calculate it, or is 2 reasonable for SNP data?

2) Should I use heating? As I said, I am very, very new to all of this, but I was wondering what the advantages and disadvantages of the heating step are. My system is very simple (19-30 SNPs per analysis with only two populations), so do I need to include heating? Most of the recently published papers I can find that use Migrate have used heating, so I would lean towards yes, especially if the only major disadvantage is time.

3) When setting the priors, I found a paper on the same species as mine, even covering some of the same populations, although using msats rather than SNPs. They used the ML estimation in Migrate to get an idea of migration and population size, so it would seem sensible to use these values to help set my priors. They report theta (4Neμ) values between 0.3 and 2.5 and migration rates (Nem) between 0.1 and 7. In my first few runs (with the default values) I got error messages saying the upper bound for all of my priors was too low, but even when I increased it (even to silly numbers) I still got this message. I am now running it with priors (min, max, delta) of 0, 3, 0.3 for theta and 0, 70, 7 for M (based on the previous paper). Does this sound reasonable, or should my numbers be vastly different?

4) How do I tell when I have reached convergence? I am aware that I can use the output to determine whether my runs have converged, but I am not completely sure what I should be looking for. Am I right in thinking the histograms should show a normal distribution? What sort of acceptance ratios should I be getting? Mine are in the range 0.3-0.5, which seems OK based on other posts I've read here.
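Besides acceptance ratios, a common convergence check is the effective sample size (ESS) of each parameter's sampled values: an ESS that is small relative to the number of samples means the samples are strongly autocorrelated and the run should be longer. The sketch below is a generic Python estimator, not Migrate's own calculation. It assumes you have already extracted one parameter's sampled values into an array (how to export them from Migrate is not shown here), and both traces are synthetic.

```python
import numpy as np


def effective_sample_size(trace):
    """Estimate the ESS of one parameter trace.

    ESS = N / (1 + 2 * sum_k rho_k), where rho_k is the lag-k autocorrelation;
    the sum stops at the first lag whose autocorrelation is not positive.
    """
    x = np.asarray(trace, dtype=float)
    n = x.size
    x = x - x.mean()
    var = np.dot(x, x) / n
    if var == 0.0:
        raise ValueError("trace is constant; ESS is undefined")
    rho_sum = 0.0
    for lag in range(1, n):
        rho = np.dot(x[:-lag], x[lag:]) / (n * var)
        if rho <= 0.0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


# Synthetic traces: independent draws versus a "sticky" AR(1) chain.
rng = np.random.default_rng(42)
iid = rng.normal(size=5000)
sticky = np.empty(5000)
sticky[0] = 0.0
for i in range(1, sticky.size):
    sticky[i] = 0.9 * sticky[i - 1] + rng.normal()

print(f"independent draws: ESS ~ {effective_sample_size(iid):.0f} of 5000")
print(f"sticky chain:      ESS ~ {effective_sample_size(sticky):.0f} of 5000")
```

Both traces have 5000 samples, but the sticky chain carries far less independent information. A frequently quoted rule of thumb is an ESS of at least a few hundred per parameter; treat that as a guideline, not a Migrate requirement.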

I have attached the PDF from the run I did with the above priors (everything else was default). One thing I noted from this: when I ran it using M as the gene flow parameter, migration from 2->1 was a lot higher than from 1->2 (which is what I expect for these sites), but this time, when I set xNm, the difference is reversed, which is a bit concerning. If anyone has suggestions on the best way to improve this, or anything to be wary of, it would be much appreciated!
outfile1.pdf
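The reversal between M and xNm described above can arise purely from the parameterization. Assuming Migrate's definitions Θ = 4Neμ, M = m/μ and xNm = Θ(receiving population) × M (with x = 4 for diploid nuclear data), the number of immigrants scales with the Θ of the population receiving them, so when the two Θ values differ enough the ordering of the two directions can flip. The numbers below are hypothetical, chosen only to reproduce that pattern.

```python
def xnm(theta_receiving, m_scaled):
    """Immigrants per generation into a population: xNm = Theta_receiving * M."""
    return theta_receiving * m_scaled


# Hypothetical estimates: population 1 has a much smaller Theta than population 2.
theta = {1: 0.2, 2: 2.0}
# Mutation-scaled immigration rates M, keyed by (source, receiving population).
M = {(2, 1): 40.0, (1, 2): 10.0}

for (src, dst), m in M.items():
    print(f"{src}->{dst}: M = {m:5.1f}   xNm = {xnm(theta[dst], m):5.1f}")
```

Here M is four times higher for 2->1, yet xNm is higher for 1->2, because population 2's Θ is ten times larger. Comparing the Θ estimates of the two sites is therefore a sensible first check before reading too much into the switch.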

Peter Beerli

Jul 8, 2015, 5:45:58 PM
to migrate...@googlegroups.com
Hazel,
since nobody else picked up,

1. Taking the base frequencies from the data is a problem: for SNP loci with only G and C, the mutation model will have problems, so some adjustments are tried, but sometimes these fail. It would be better, I guess, to use an empirical base frequency estimated from sequence data instead of just your SNPs (the simplest solution is certainly to set everything to 0.25).

2. Use heating; this also allows you to compare different population models (use the standard 4 chains with interval 1 and choose the heating scheme #).

3. Msats have a very different mutation rate from SNPs, so you would need to adjust for that. In addition, SNP data are rather poor because each SNP locus does not deliver much information; as a result, you will probably always get this boundary message. I suggest ignoring it: it is a hint to explore, not an error. You would also see the effect in your histograms, which would peak at the upper bound; that would be an indicator to increase the upper bound. In general, the upper bound should be ’tight’, because with absurdly large values you spend much time proposing values that are no good, so convergence will be slow, leading to poor results.

4. The manual has a section about that. The acceptance ratio is a poor indicator of convergence; simply put, when the acceptance is, say, 0.0001, you certainly should run longer. Values around 0.2 to 0.5 are considered good, although for some datasets it will be difficult to get acceptances that high (just run longer). There is also a book chapter that may help: PDF.

5. Look at the tutorial on how to compare population models (on the Migrate website); that may help you decide on the direction.
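Answers 2 and 5 connect: heated chains are what let Migrate estimate marginal likelihoods by thermodynamic integration, and marginal likelihoods of competing models (for example, full migration versus migration in one direction only) can then be compared. The sketch below shows only the arithmetic, under stated assumptions: the inverse temperatures correspond to hypothetical chain temperatures 1, 1.5, 3 and 1,000,000; the mean log-likelihoods are invented; a plain trapezoid rule stands in for whatever approximation Migrate applies internally; and the models are given equal prior probabilities.

```python
import math


def thermodynamic_lnml(inv_temps, mean_lnl):
    """Trapezoid-rule estimate of ln mL = integral over beta of E_beta[ln L].

    inv_temps: inverse temperatures (beta) of the heated chains.
    mean_lnl: average log-likelihood sampled by the chain at each beta.
    The integral only covers the span of the supplied betas.
    """
    points = sorted(zip(inv_temps, mean_lnl))
    total = 0.0
    for (b0, l0), (b1, l1) in zip(points, points[1:]):
        total += 0.5 * (b1 - b0) * (l0 + l1)
    return total


def model_probabilities(lnml):
    """Posterior model probabilities from log marginal likelihoods (equal model priors)."""
    best = max(lnml.values())
    weights = {name: math.exp(v - best) for name, v in lnml.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical chain temperatures 1, 1.5, 3 and 1e6, as inverse temperatures.
# The hottest chain (beta = 1e-6) stands in for beta = 0.
betas = [1.0, 1.0 / 1.5, 1.0 / 3.0, 1.0e-6]

# Invented mean log-likelihoods for two candidate models.
lnml = {
    "full migration": thermodynamic_lnml(betas, [-1000.0, -1010.0, -1040.0, -1500.0]),
    "2->1 only": thermodynamic_lnml(betas, [-1005.0, -1016.0, -1048.0, -1500.0]),
}

for name, value in lnml.items():
    print(f"{name:15s} ln mL = {value:9.2f}")
print("2 ln BF (full vs 2->1 only):", 2.0 * (lnml["full migration"] - lnml["2->1 only"]))
print(model_probabilities(lnml))
```

With only four temperatures the trapezoid is coarse, so treat this as an illustration of how the pieces fit rather than a substitute for the model-comparison output described in the tutorial.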

Peter


--
You received this message because you are subscribed to the Google Groups "migrate-support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to migrate-suppo...@googlegroups.com.
To post to this group, send email to migrate...@googlegroups.com.
Visit this group at http://groups.google.com/group/migrate-support.
For more options, visit https://groups.google.com/d/optout.

Vanessa Robitzch

Sep 27, 2018, 11:03:03 AM
to migrate-support
Oh, and here is the warning message: 

WARNING: This transition/transversion ratio (2.000000)
WARNING: is impossible with these base frequencies (0.000000, 0.590909, 0.000000, 0.409091)!
WARNING:  Transition/transversion parameter reset
WARNING:   so transition/transversion ratio is 100961847194.630295

Vanessa Robitzch

Sep 27, 2018, 11:03:03 AM
to migrate-support
Dear Peter, 
I have searched the group for advice and tried running the model 3 times now, but it still gives me the same error messages.

I am working with data of a bit over 1000 SNPs (and 4 to 5 populations with fewer than 15 individuals each, which seems OK for Migrate), BUT the warning I get concerns the transition/transversion ratio and the base frequencies. I set the former to 2.0 (as suggested, and because I can't really enter that many values; or I guess I could?! But should I?!) and the latter to a quarter each (0.25, or all "="). Still, here is the warning:

Empirical Base Frequencies
------------------------------------------------------------
Locus     Nucleotide                        Transition/
          ------------------------------  Transversion ratio
          A       C       G       T(U)
------------------------------------------------------------
   2      0.6316  0.0000  0.3684  0.0000    379621214.70709
   1      0.4667  0.0000  0.0000  0.5333       2.00000
   3      0.0000  0.0000  0.1818  0.8182       2.00000
   4      0.3530  0.0000  0.6470  0.0000    372596253.70436
   5      0.2500  0.2500  0.2500  0.2500       2.00000
   6      0.2500  0.2500  0.2500  0.2500       2.00000
   7      0.2500  0.2500  0.2500  0.2500       2.00000
   8      0.0000  0.4500  0.5500  0.0000       2.00000
   9      0.2500  0.2500  0.2500  0.2500       2.00000



What shall I do? Also, is it normal that, because of the warning, the program does not finish? That is, the last value I get in that table is for locus 85. And do the calculated ttratios even make sense?!

Thank you very, very much for your help and time!
And looking forward to hearing back from you soon, 

Vanessa

PS: I also attached my infile...  

On Wednesday, July 8, 2015 18:45:58 UTC-3, Peter wrote:
infile_Conce2NnoCl.txt
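The huge ratios in the table can be explained by a feasibility condition. Assuming Migrate performs the same F84-type check as PHYLIP (the wording of its warning matches PHYLIP's), a transition/transversion ratio R is only possible if R ≥ (πAπG + πCπT) / (πRπY), where πR = πA + πG and πY = πC + πT. A SNP locus whose two bases are both purines (A/G) or both pyrimidines (C/T) has πY = 0 or πR = 0, so no finite ratio satisfies the condition; with all frequencies fixed at 0.25 the minimum is 0.5, which 2.0 easily satisfies. The sketch applies this to some of the per-locus frequencies from the table; the function name is hypothetical.

```python
import math


def min_feasible_ttratio(freqs):
    """Smallest transition/transversion ratio allowed by base frequencies.

    freqs: (piA, piC, piG, piT). Assumes a PHYLIP-style F84 check:
    R >= (piA*piG + piC*piT) / (piR*piY). Returns math.inf when all
    purines (A, G) or all pyrimidines (C, T) are missing.
    """
    a, c, g, t = freqs
    purines = a + g
    pyrimidines = c + t
    if purines == 0.0 or pyrimidines == 0.0:
        return math.inf
    return (a * g + c * t) / (purines * pyrimidines)


# Per-locus frequencies (A, C, G, T) copied from the table above.
loci = {
    1: (0.4667, 0.0, 0.0, 0.5333),
    2: (0.6316, 0.0, 0.3684, 0.0),
    3: (0.0, 0.0, 0.1818, 0.8182),
    4: (0.3530, 0.0, 0.6470, 0.0),
    5: (0.25, 0.25, 0.25, 0.25),
    8: (0.0, 0.45, 0.55, 0.0),
}
requested = 2.0
for locus, freqs in loci.items():
    minimum = min_feasible_ttratio(freqs)
    status = "ok" if requested >= minimum else "impossible, would be reset"
    print(f"locus {locus}: minimum ratio {minimum:.3f} -> {status}")
```

Under this reading only loci whose two alleles form a transition (A/G or C/T) are affected, which matches loci 2 and 4 in the table and the C/T frequencies in the warning message; the very large printed ratios are consistent with dividing by a (near-)zero πRπY.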

Peter Beerli

Sep 27, 2018, 11:06:50 AM
to migrate...@googlegroups.com
You are still using empirical frequencies. Make sure that your parmfile has a line like this (search for freqs-from):

freqs-from-data=NO:0.250000,0.250000, 0.250000, 0.250000
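If you regenerate parmfiles often, the fix can be scripted. The Python sketch below rewrites parmfile text so that its freqs-from-data line is exactly the one above. It assumes the option sits on its own line beginning with freqs-from-data (lines commented out with # are left alone); the helper names are hypothetical, and the script refuses to guess where to add the line if none exists. Keep a backup of the original parmfile.

```python
from pathlib import Path

# The line suggested above: equal base frequencies instead of empirical ones.
FIXED_FREQS = "freqs-from-data=NO:0.250000,0.250000, 0.250000, 0.250000"


def force_equal_freqs(parmfile_text):
    """Return parmfile text with every active freqs-from-data line replaced."""
    out = []
    replaced = False
    for line in parmfile_text.splitlines():
        if line.strip().startswith("freqs-from-data"):
            out.append(FIXED_FREQS)
            replaced = True
        else:
            out.append(line)
    if not replaced:
        raise ValueError("no freqs-from-data line found; add it by hand")
    return "\n".join(out) + "\n"


def rewrite_parmfile(path):
    """Rewrite the parmfile at `path` in place (make a backup first)."""
    p = Path(path)
    p.write_text(force_equal_freqs(p.read_text()))
```

For example, `force_equal_freqs("# other options\nfreqs-from-data=YES\n")` keeps the comment line and swaps in the fixed-frequency line; the input string here is a made-up fragment, not a real Migrate parmfile.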

Peter



Vanessa Robitzch

Sep 27, 2018, 12:20:23 PM
to migrate...@googlegroups.com
yayyy!!! It seems to work fine now :) 
thanks heaps! YOU GENIUS, but we all know that already ;) 

Have a great one and thanks again!

ps. I may bother you again at some point, though... 

Dr. Vanessa Robitzch, PhD 
Red Sea Research Center 
King Abdullah University of Science and Technology

Vanessa Robitzch

Sep 27, 2018, 2:10:49 PM
to migrate...@googlegroups.com
Dear Peter, I have now run it twice; it starts working just fine but then crashes (?!). I have attached the parmfile and the nohup.out file. I was running it on 3 processors.

Thanks again a lot! And I hope you'll be able to find the bug, since I have no WARNING or error message this time... :/


parmfile
nohup.out