Hi Eun,
Answers inline below.
Eun wrote on 11/7/18 6:51 PM:
> Hi,
>
> I would appreciate it if someone could clarify these basic terms (fixed,
> variant, and polymorphic sites) in the populations.sumstats_summary
> output file.
>
> I used Stacks v2.0b to analyze my ddRADseq data on 37 samples from 10
> populations. I used the de novo pipeline with the following parameter
> settings: M = n = 3 and m = 3. The MAF and maximum observed heterozygosity
> thresholds were 0.01 and 0.5, respectively. Because of the small sample
> size, I chose to retain only loci present in all individuals (p = 10 and
> r = 1). Further, I kept one random SNP per locus.
>
> In the populations.sumstats_summary file, I got 100,809 sites (variant
> and fixed), of which only 298 were variant. The number of polymorphic
> sites varies by population; the highest is 103 sites.
>
> While I understand that I'm using conservative settings, the number of
> variant sites seems very low compared to the total number of sites (only
> ~0.3%).
>
> 1) By "sites," is Stacks referring to nucleotide positions? (A dumb
> question, but I wanted to confirm)
Yes, a site is a nucleotide position.
> 2) If a variant site is defined as "nucleotide positions that are
> polymorphic in at least one population" then a fixed site is monomorphic
> in all populations.
Yes, that is correct.
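A minimal sketch of that definition in Python, assuming each population is summarized as the set of alleles observed at a site. The data layout and the `classify_site` name are hypothetical, and this is not Stacks' own code; it follows the definition exactly as stated in the question. The last lines simply recompute the proportions from the counts given above.

```python
from typing import Dict, Set


def classify_site(alleles_by_pop: Dict[str, Set[str]]) -> str:
    """Return 'variant' if the site is polymorphic in at least one
    population, otherwise 'fixed' (monomorphic in every population)."""
    if any(len(alleles) > 1 for alleles in alleles_by_pop.values()):
        return "variant"
    return "fixed"


# Hypothetical example sites: population name -> alleles observed there.
site_a = {"pop1": {"A"}, "pop2": {"A"}, "pop3": {"A"}}
site_b = {"pop1": {"A"}, "pop2": {"A", "G"}, "pop3": {"A"}}
print(classify_site(site_a))  # fixed
print(classify_site(site_b))  # variant

# Proportions from the counts reported in the question.
total_sites = 100_809
variant_sites = 298
fixed_sites = total_sites - variant_sites
print(f"variant: {variant_sites / total_sites:.2%}")  # variant: 0.30%
print(f"fixed:   {fixed_sites / total_sites:.2%}")    # fixed:   99.70%
```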
> I have a difficult time imagining that so many sites
> (99.7%) passed all those filtering steps and are fixed in all ten
> populations. Can this be due to my parameter settings, or just to the low
> variability of the study species?
It could be due to either factor. Before you can rule out parameter
settings, you need to optimize your parameters (see Rochette 2017 if you
haven't already).
Otherwise, before you conclude that your species has a very low level of
polymorphism, you should check the basics of your analysis. What was the
depth of coverage for each individual sample? Are most of your variant
sites shared across your populations, or are they particular to one or two
individuals or to a single population?
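The checks above can be sketched in Python. This assumes you have already extracted, from your own results, the set of alleles observed in each population at every variant site, plus a mean depth per sample; the input layout, the function names, and the 10x depth cutoff are all hypothetical placeholders, not Stacks defaults or Stacks code.

```python
from collections import Counter
from typing import Dict, List, Set


def populations_polymorphic(alleles_by_pop: Dict[str, Set[str]]) -> List[str]:
    """Names of the populations in which this site has more than one allele."""
    return sorted(pop for pop, alleles in alleles_by_pop.items() if len(alleles) > 1)


def sharing_profile(variant_sites: Dict[str, Dict[str, Set[str]]]) -> Counter:
    """Tally variant sites by how many populations are polymorphic at them."""
    return Counter(len(populations_polymorphic(site)) for site in variant_sites.values())


def low_depth_samples(mean_depth: Dict[str, float], min_depth: float = 10.0) -> List[str]:
    """Samples whose mean coverage falls below min_depth (placeholder cutoff)."""
    return sorted(sample for sample, depth in mean_depth.items() if depth < min_depth)


# Hypothetical toy data.
variant_sites = {
    "locus1_pos12": {"pop1": {"A", "G"}, "pop2": {"A", "G"}, "pop3": {"A"}},
    "locus2_pos40": {"pop1": {"C"}, "pop2": {"C", "T"}, "pop3": {"C"}},
    "locus3_pos7": {"pop1": {"G"}, "pop2": {"G", "T"}, "pop3": {"G"}},
}
mean_depth = {"sample01": 25.3, "sample02": 6.1, "sample03": 14.8}

print(sharing_profile(variant_sites))  # Counter({1: 2, 2: 1})
print(low_depth_samples(mean_depth))   # ['sample02']
```

If the profile is dominated by the 1 bin, the variation is mostly private to single populations, and any sample in the low-depth list deserves a closer look.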
If your analysis is solid, you have explored your parameters, and you
still see low polymorphism, then I would suggest it is real.
> 3) At what point can I start calling these "sites" SNP loci? If I
> have 298 variant sites, can I describe them as 298 SNP loci?
It is a matter of opinion, but if the SNP calling model has called a
polymorphic site, then it is a SNP. The gstacks model takes into account
all populations when it makes a call.
>
> Thank you!
>
> Eun
julian