SNP analysis - different estimates of theta inferred from MIGRATE and SNAPP


Pete Hosner

Aug 5, 2013, 1:54:44 PM
to migrate...@googlegroups.com

Dear Peter and other MIGRATE users,

I am attempting a "large-scale" SNP analysis using MIGRATE-N.

The dataset consists of 3500 SNPs (RAD-seq) from 36 individuals grouped into 7 populations (population assignment inferred using STRUCTURE; these groups correspond to mtDNA haplotype groups that are 2–5% divergent [ND2 gene]). There is no missing data at the population level.

Biological relevance: The populations are sampled from an island archipelago. Four populations are found on large islands with recent connectivity (hypothesized to have occurred during low sea levels at the last glacial maximum); three populations are found on small “peripheral isolate” islands and differ slightly in phenotype. I wish to estimate migration rates between all geographically close populations (the islands have a near-linear arrangement) to determine:

            1. Do small island populations have small effective population sizes?

            2. Are phenotypically divergent populations linked by gene flow?

            3. If so, is gene flow to small islands unidirectional?

I am new to MIGRATE, so I've played around with settings and preliminary runs to familiarize myself with the program. I’ve used mostly default settings (BI, exponential priors), but I’ve used equal base frequencies and an empirical estimate of the ti/tv ratio (as previously suggested), and defined a migration matrix that includes only geographically relevant migration routes.

I used SNAPP to infer a population tree with the same dataset (attached; branch labels are thetas [scaled], and all nodes have a posterior of 1.0). The population tree is qualitatively similar to the mtDNA tree (reassuring). SNAPP estimated the thetas for each small island (populations 1, 2, and 5) to be an order of magnitude lower than those for the large islands. Nucleotide diversity for populations 1, 2, and 5 is also an order of magnitude lower than for the other populations, so this all makes sense.
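
As a rough cross-check on a diversity comparison like this one, here is a minimal Python sketch that averages the unbiased expected heterozygosity over biallelic SNPs for each population. The function names and the allele counts are hypothetical (the counts are made up for illustration and are not from this dataset). Because it averages over ascertained SNPs only, the result is a per-SNP heterozygosity rather than a per-site nucleotide diversity.

```python
def expected_heterozygosity(alt_count, n_chrom):
    """Unbiased expected heterozygosity at one biallelic SNP.

    alt_count: copies of the alternate allele in the sample
    n_chrom:   sampled chromosomes (2 x the number of diploid individuals)
    """
    if n_chrom < 2:
        raise ValueError("need at least two sampled chromosomes")
    p = alt_count / n_chrom
    # 2pq with the usual n / (n - 1) small-sample correction
    return (n_chrom / (n_chrom - 1)) * 2.0 * p * (1.0 - p)


def mean_heterozygosity(alt_counts, n_chrom):
    """Average expected heterozygosity across SNPs for one population."""
    values = [expected_heterozygosity(c, n_chrom) for c in alt_counts]
    return sum(values) / len(values)


# Made-up alternate-allele counts at five SNPs, 10 chromosomes per population.
toy_counts = {
    "large_island": [5, 3, 6, 2, 4],
    "small_island": [0, 10, 0, 1, 0],
}
diversity = {pop: mean_heterozygosity(c, 10) for pop, c in toy_counts.items()}
ratio = diversity["large_island"] / diversity["small_island"]
```

A ratio near 10 across the real populations would match the order-of-magnitude difference described above; a ratio near 1 would point the other way.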

However, when I run MIGRATE (example attached; I know the estimates of a couple of Ms haven’t quite reached stationarity), the estimates of theta for all populations (large and small islands) are similar to one another (same order of magnitude). I know that directly comparing theta values from SNAPP and MIGRATE isn’t possible because the thetas in SNAPP are scaled differently, but I expected the results to be qualitatively more similar.

I understand that running MIGRATE with SNP data is problematic. Do you have any ideas why estimates of theta from MIGRATE and SNAPP are so qualitatively different?

Many thanks,

-Pete


 

 

SNAPP_tree.jpg
outfile_SNP.pdf

Pete Hosner

Aug 5, 2013, 3:56:32 PM
to migrate...@googlegroups.com
The previous message had the wrong outfile attached.

Cheers,
-Pete
outfile.pdf

Peter Beerli

Aug 7, 2013, 6:53:00 PM
to migrate...@googlegroups.com
Dear Pete,

Looking at the outfile, it seems that your 'small' populations are smaller, but definitely not an order of magnitude smaller as SNAPP suggests.
Did you look at the 'allele frequency table'? You should also see a 10x difference in the number of alleles if SNAPP is correct.

In MIGRATE version 3.5.4 there is a hidden option that you may want to try, to see whether the SNP model makes a difference.
In the parmfile, search for the option freqs-from-data.
It will read something like this:

freqs-from-data=YES

change this to 

freqs-from-data=YES [100]

and also change the datatype to sequence.
MIGRATE will then take each SNP and assume that it is embedded in a 100 bp sequence (like your short reads) that is otherwise invariant. This may make a difference.
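
The parmfile change above could be scripted roughly as follows. This is only a sketch: the helper name `set_freqs_from_data` is hypothetical, and the sketch assumes the option sits on a line of its own that begins with `freqs-from-data=`. The datatype line is deliberately left for manual editing, because its exact spelling in the parmfile is not given in this thread.

```python
import re


def set_freqs_from_data(parm_text, read_length=100):
    """Return parmfile text with 'freqs-from-data=YES [read_length]' set.

    Assumes the option occupies one line starting with 'freqs-from-data='.
    Raises ValueError if no such line exists, so a silent no-op is impossible.
    """
    pattern = re.compile(r"^freqs-from-data=.*$", flags=re.MULTILINE)
    new_line = f"freqs-from-data=YES [{read_length}]"
    new_text, n_changed = pattern.subn(new_line, parm_text)
    if n_changed == 0:
        raise ValueError("no freqs-from-data line found in parmfile text")
    return new_text


# Usage on a tiny stand-in for a parmfile (the first line is a made-up comment).
example = "# example parmfile fragment\nfreqs-from-data=YES\n"
edited = set_freqs_from_data(example)
```

Keeping a copy of the original parmfile makes it easy to compare the run with and without the option.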

I also wonder whether many of your SNPs come from the same short read. If so, this will lead to an upward bias (I am working on a paper on this).

Peter





Pete Hosner

Aug 9, 2013, 5:50:31 PM
to migrate...@googlegroups.com
Peter,


> Looking at the outfile, it seems that your 'small' populations are smaller, but definitely not an order of magnitude smaller as SNAPP suggests.
> Did you look at the 'allele frequency table'? You should also see a 10x difference in the number of alleles if SNAPP is correct.

I hadn't. I see the locus-by-locus 'allele frequency spectra,' but no summary is calculated. There is an 'average expected heterozygosity' table summarizing all loci, but pops 1, 2, and 7 all have "-nan" which I find disconcerting. Could an error in the infile cause this?
 
> In MIGRATE version 3.5.4 there is a hidden option that you may want to try, to see whether the SNP model makes a difference.
> In the parmfile, search for the option freqs-from-data.

Great! I'll give that a try.
 
> I also wonder whether many of your SNPs come from the same short read. If so, this will lead to an upward bias (I am working on a paper on this).

This dataset includes only the first SNP from each short-read locus, so at least I'm in the clear here.
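
For readers who want to apply the same one-SNP-per-locus filter, here is a minimal sketch. It assumes SNPs are available as (locus ID, position within read) pairs and takes "first" to mean the lowest position in each read; the function name and the input layout are hypothetical, not the output format of any particular RAD-seq pipeline.

```python
def first_snp_per_locus(snps):
    """Keep one SNP per short-read locus: the one with the lowest position.

    snps: iterable of (locus_id, position) pairs, in any order.
    Returns a list of (locus_id, position), ordered by first appearance
    of each locus in the input.
    """
    best = {}
    for locus, pos in snps:
        if locus not in best or pos < best[locus]:
            best[locus] = pos
    return list(best.items())


# Toy input: locus "L1" carries two SNPs, locus "L2" carries one.
kept = first_snp_per_locus([("L1", 40), ("L1", 12), ("L2", 7)])
```

Thinning to one SNP per read avoids treating tightly linked SNPs as independent loci.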

Many thanks for your help,
-Pete

Peter Beerli

Aug 9, 2013, 6:29:28 PM
to migrate...@googlegroups.com
Pete

-nan is always disconcerting.
When you run only 100 SNPs, do you get the same problem?

Of course, there could always be funny things in the file. (I will not be able to run your whole file, but a shorter version should reveal the same pattern.)
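
One way to build such a shorter test set is to draw a reproducible random subset of SNP columns. The sketch below assumes the genotypes are held as a list of rows (one per individual) with one column per SNP; that layout and the function name are assumptions for illustration, and writing the subset back out in MIGRATE's infile format is not shown.

```python
import random


def subsample_snps(matrix, n_keep=100, seed=1):
    """Pick n_keep SNP columns at random, keeping their original order.

    matrix: list of rows (individuals), each a list with one entry per SNP.
    Returns (kept_column_indices, reduced_matrix).
    """
    n_snps = len(matrix[0])
    if n_keep > n_snps:
        raise ValueError("cannot keep more SNPs than the data contain")
    # A fixed seed makes the same subset reproducible across test runs.
    cols = sorted(random.Random(seed).sample(range(n_snps), n_keep))
    reduced = [[row[c] for c in cols] for row in matrix]
    return cols, reduced


# Toy matrix: 3 individuals x 3500 SNPs, filled with a placeholder genotype.
toy = [["A"] * 3500 for _ in range(3)]
cols, reduced = subsample_snps(toy, n_keep=100)
```

If the -nan values persist in the 100-SNP run, the kept column indices show which loci to inspect first.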

Peter






Robert Kraus

Aug 14, 2013, 1:49:34 AM
to migrate...@googlegroups.com
Dear Pete,
 
In addition to trying the things suggested by Peter, I'd suggest running your analyses much longer to see if you can resolve the inconsistencies. We've used 300+ SNPs in ducks and geese and learned that a run sometimes looks as if it has converged, but the results change if you run it even longer. We tested MIGRATE settings for several months and finally ran the last analysis three times independently to ascertain stable results. The duck paper is here: http://onlinelibrary.wiley.com/doi/10.1111/mec.12098/abstract. The goose paper is under second review after revision, and we expect replies from the editorial office in the next two weeks or so. If you remind me, I can forward you a version soon.
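
A crude way to compare independent replicate runs is to check how far each parameter's point estimate spreads across them. The sketch below is not a formal convergence diagnostic. The function name, the 20% tolerance, and the toy numbers are all assumptions chosen for illustration, not values from our analyses.

```python
from statistics import mean


def unstable_parameters(replicates, tol=0.2):
    """Flag parameters whose estimates disagree across replicate runs.

    replicates: dict mapping parameter name -> list of point estimates,
                one per independent run.
    tol:        allowed (max - min) / mean spread (an arbitrary threshold).
    Returns a dict of flagged parameter name -> relative spread.
    """
    flagged = {}
    for name, values in replicates.items():
        m = mean(values)
        spread = (max(values) - min(values)) / m if m != 0 else float("inf")
        if spread > tol:
            flagged[name] = spread
    return flagged


# Made-up estimates from three runs: theta agrees, the migration rate does not.
toy_runs = {
    "Theta_1": [0.0100, 0.0110, 0.0105],
    "M_2->1": [50.0, 120.0, 80.0],
}
flagged = unstable_parameters(toy_runs)
```

Parameters flagged this way are the ones to watch when deciding whether to extend the run.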
 
I remember -nan in our first runs, too. I don't remember exactly, but I think it was connected to the empirical ti/tv ratio. But earlier you wrote that you've already set that manually. Are the -nan values still appearing now that you have changed the settings?
 
Cheers,
robert
 
Sent: Saturday, 10 August 2013 at 00:29
From: "Peter Beerli" <beerli...@gmail.com>
To: migrate...@googlegroups.com
Subject: Re: [migrate-support] SNP analysis - different estimates of theta inferred from MIGRATE and SNAPP