Fewer polymorphic sites when populations are separate

Ryan

Dec 10, 2021, 5:40:36 AM
to Stacks
Dear all,

I need some advice on comparing populations outputs across 3 different species. I ran a reference-aligned protocol using the reference genome of one of the species, Bt. When I run populations with all 3 species in the same popmap and -p 3, I get 26971 variant sites, of which 5473 are polymorphic in Bh, 3217 in Br and 8534 in Bt (populations.bhbrbt.log). However, when I run populations separately with a popmap containing only the Bh, Br, or Bt samples, I get the following: 5357 variant/polymorphic sites for Bh (populations.bh.log), 3028 variant/polymorphic sites for Br (populations.br.log) and 6642 variant/polymorphic sites for Bt (populations.bt.log).

Why do I get fewer polymorphic sites for each species when I run populations on each species separately? I would have expected more polymorphic sites, since the loci would be specific to that species group.

Any advice would be much appreciated.

Many thanks, 
Ryan
populations.bh.log
populations.bt.log
populations.br.log
populations.bhbrbt.log

Catchen, Julian

Dec 13, 2021, 5:16:00 PM
to stacks...@googlegroups.com

Hi Ryan,

 

My first guess is that this is caused by your use of the min-maf and max-obs-het filters. These are always calculated on a metapopulation basis. So, when all three populations are in the popmap, the minimum minor allele frequency filter is calculated from all the individuals together. When you only have a single population in the map, the denominator of the calculation is smaller. One alternative flag to consider is --min-mac, which is very similar but works on the minor allele count: it is a fixed number of alleles, not a frequency, so the threshold does not change regardless of the popmap contents. We tend to use --min-mac to exclude alleles that are likely to be genotyping errors (i.e. you only find that particular allele in one or two individuals in the data set, so a --min-mac of three will find and remove such alleles).
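As a rough illustration of why the same site can pass the frequency filter in a combined popmap yet fail it in a single-population popmap, here is a minimal Python sketch. It is not Stacks code: the allele counts and the population name X are invented for the example, the comparison is assumed to be "keep if frequency >= threshold", and missing genotypes are ignored.

```python
def minor_allele_freq(allele_counts):
    """Return the minor allele frequency of a biallelic site.

    allele_counts is a (count_a, count_b) pair of allele copies.
    """
    total = sum(allele_counts)
    return min(allele_counts) / total


MIN_MAF = 0.016  # example threshold; any small frequency behaves the same way

# Hypothetical site (counts invented for illustration):
# population X carries 119 copies of allele A and 1 copy of allele G,
# while the other two populations are fixed for G (248 copies).
pop_x = (119, 1)
others = (0, 248)

# Frequency when only population X is in the popmap.
maf_alone = minor_allele_freq(pop_x)

# Frequency when all populations are pooled (metapopulation basis).
combined = (pop_x[0] + others[0], pop_x[1] + others[1])
maf_combined = minor_allele_freq(combined)

print(f"X alone:  MAF = {maf_alone:.4f}, kept = {maf_alone >= MIN_MAF}")
print(f"combined: MAF = {maf_combined:.4f}, kept = {maf_combined >= MIN_MAF}")
```

In this made-up case the site is polymorphic within X either way, but it only survives the frequency filter when the other populations are pooled with it, because pooling changes both which allele is the minor one and the denominator.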

 

Best,

 

julian

Ryan

Dec 21, 2021, 12:06:31 PM
to Stacks
Hi Julian,

Thank you very much for your reply. 

I have tried using the min-mac filter, but I actually get more variant and polymorphic loci with min-mac than with min-maf. Before, I was using a min-maf of 0.016 (just under 3/184) and was getting 26971 variant loci (populations.maf.log). When I use a min-mac of 3 (this is the only thing I have changed), I get 31802 variant loci and many more polymorphic loci too (populations.mac.log). I thought both filters at these values would give the same number of variant and polymorphic loci, because any allele seen only once or twice would be removed in both cases, leaving only alleles seen at least 3 times.

Have I misunderstood how the filters work? Why do they give such different results at these values?

Many thanks,
Ryan
populations.mac.log
populations.maf.log

Ryan

Dec 23, 2021, 6:22:06 AM
to Stacks
Hi Julian,

I believe I've figured it out; thank you for your advice in helping me get there.

When I use a min-maf of 3/368, I get the same output as with a min-mac of 3. I've realised this is because, with 184 diploid samples, a min-maf of 3/184 actually requires an allele to be present 6 or more times, since each sample carries two copies of each locus (368 allele copies in total). So I need a min-maf of 3/368 for an allele present at least 3 times to be kept.
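The arithmetic can be checked with a short Python sketch. This is only an illustration under stated assumptions: the helper name min_copies_required is made up, every sample is assumed to be genotyped at the site (no missing data), and the filter is assumed to keep alleles whose frequency is greater than or equal to the threshold.

```python
import math


def min_copies_required(min_maf, n_samples, ploidy=2):
    """Smallest allele count whose frequency reaches min_maf.

    Assumes no missing genotypes, so the denominator is
    n_samples * ploidy allele copies.
    """
    total_copies = n_samples * ploidy
    # Round first to guard against floating-point noise before the ceiling.
    return math.ceil(round(min_maf * total_copies, 9))


n = 184  # diploid samples, so 368 allele copies

print(min_copies_required(3 / 184, n))  # 6
print(min_copies_required(0.016, n))    # 6
print(min_copies_required(3 / 368, n))  # 3
```

With missing data the real denominator at a given site is smaller than 368, so a frequency threshold and a count threshold will not always agree site by site.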

Many thanks,
Ryan
