unexpected behavior of -Populations- output to genepop format

169 views
Skip to first unread message

lp

unread,
Feb 16, 2015, 1:46:51 PM2/16/15
to stacks...@googlegroups.com
Julian et al,
Running populations, I have observed some differences between the output of SNP loci  compared with what I can see/expect from the web interface. It's close, but not quite right in a few cases.  Here is the populations call filtering for matching individuals, inclusion of all 4 pops, minimum of 5 reads, and MAF): 

populations -P ./stacks2/ -b 1 -M popmap -r .75 -p 4 -m 5 -a .03 -f p_value --genepop --structure -t 24 --fstats -W ./list
 
Below is the stacks web interface for this locus (locus ID 10) loaded from the same data/directory (60/60 ind. included across all 4 pops). 



Below is the genepop format genotypes output for locus ID 10 (3 snps): 

Stacks version 1.23; Genepop version 4.1.3; February 13, 2015
10_8,10_12,10_17
pop
Bj138, 0000 0202 0101
Bj142, 0000 0202 0103
Bj143, 0000 0202 0101
Bj146, 0000 0202 0101
Bj149, 0000 0202 0103
Bj150, 0000 0202 0103
Bj153, 0000 0202 0303
Bj154, 0000 0202 0103
Bj157, 0000 0204 0303
Bj158, 0000 0202 0103
Bj159, 0000 0202 0103
Bj160, 0000 0202 0101
Bj173, 0000 0202 0103
pop
IS913, 0000 0000 0103
Is903, 0000 0000 0103
Is904, 0000 0000 0103
Is905, 0000 0000 0101
Is906, 0000 0000 0101
Is907, 0000 0000 0103
Is910, 0000 0000 0303
Is911, 0000 0000 0103
Is914, 0000 0000 0103
Is916, 0000 0000 0303
Is918, 0000 0000 0101
Is921, 0000 0000 0101
Is922, 0000 0000 0103
Is923, 0000 0000 0103
Is926, 0000 0000 0103
Is929, 0000 0000 0103
Is931, 0000 0000 0303
pop
Juc18, 0303 0000 0303
Juc19, 0303 0000 0103
Juc24, 0303 0000 0303
Juc26, 0303 0000 0103
Juc27, 0303 0000 0101
Juc31, 0303 0000 0103
Juc35, 0303 0000 0103
Juc36, 0303 0000 0103
Juc39, 0103 0000 0103
Juc8, 0303 0000 0101
Mz3,         0303 0000 0101
Mz9,          0303 0000 0103
pop
Mel42, 0000 0000 0103
Mel44, 0000 0000 0303
Mel45, 0000 0000 0303
Mel47, 0000 0000 0303
Mel49, 0000 0000 0303
Mel50, 0000 0000 0103
Mel51, 0000 0000 0101
Mel52, 0000 0000 0303
Mel54, 0000 0000 0101
Mel55, 0000 0000 0101
Mel56, 0000 0000 0103
Mel57, 0000 0000 0103
Mel58, 0000 0000 0103
Mel59, 0000 0000 0103
Mel60, 0000 0000 0303
Mel61, 0000 0000 0303
Mel62, 0000 0000 0101
Mel65, 0000 0000 0303

1) Given the MAF filter of 0.03, I don't understand why  locus 10_8 (the first SNP at this stack) is called in the first place. Only two individuals have the alternative "A" allele (hets), which produces a MAF of 2/120=.016667, below the 0.03 filter.  Or is that filter only for calculating Fst values, or perhaps, Is the MAF filter applied within populations not calculated/applied across all populations?

2)Next, why would this [A/G] SNP be called in the Juc39 individual (Population ES), but not the Mel61 indvidual (Population Melaque) (both heterozygous A/G)?

3)Finally, though the other populations are fixed for the G allele at locus 10_8, they are scored as "0000". Why wouldn't they be called G/G  (e.g. "0303")? Similar non calls ("0000") for locus 10_12 in populations that are fixed...

10_17 appears to be output as expected-- The difference here is that this locus is polymorphic in all populations?

Any advice is much appreciated and I hope I have not missed something obvious  here (!).

Thanks,

LP




Julian Catchen

unread,
Feb 23, 2015, 10:59:36 PM2/23/15
to stacks...@googlegroups.com, louis....@gmail.com
Hi LP,

The previous minor allele frequency had a few inconsistencies which I
also came across recently. In the latest release, 1.26, I have changed
how the MAF filter functions. In the previous version, the MAF was
applied per population long after all the other filtering was done and
this led to cases where 0000 was being output in the genepop file -- the
SNP is homozygous in a population (but not overall) and was being
filtered only in that population, giving the 0s.

The new version of the MAF filter is applied just after the other
filters and it is applied across all populations together -- that is, if
a SNP is filtered from one or more populations it is now removed from
all populations.

The new MAF filter prunes the SNP objects out of memory so they don't
contribute to any of the analysis code downstream (the previous version
just marked SNPs as filtered and this was harder to handle properly
downstream).

Anyway, this is a long way to say, you should try the new version and
report back on how it looks. You don't need to rerun the pipeline, just
execute the new version of populations against your existing data set.
It would be helpful to get a few more people looking at my changes (it
was quite a large code change to make it happen).

Best,

julian

lp wrote:
> Julian et al,
> Running /populations, /I have observed some differences between the
> output of SNP loci compared with what I can see/expect from the web
> interface. It's close, but not quite right in a few cases. Here is the
> populations call filtering for matching individuals, inclusion of all 4
> pops, minimum of 5 reads, and MAF):
>
> ||
> populations -P ./stacks2/-b 1-M popmap -r .75-p 4-m 5-a .03-f p_value
> --genepop --structure -t 24--fstats -W ./list
> Below is the stacks web interface for this locus (locus ID 10) loaded
> from the same data/directory (60/60 ind. included across all 4 pops).
>
> <https://lh3.googleusercontent.com/-Wx7wJsPllNk/VOIzAiEmvpI/AAAAAAAACgg/2IKP903eQqA/s1600/Screen%2BShot%2B2015-02-13%2Bat%2B1.41.46%2BPM.png>
>
>
>
> <https://lh6.googleusercontent.com/-XFxOq7GRZjg/VOIx6UMSJ2I/AAAAAAAACgU/V9vGBhyt39s/s1600/Screen%2BShot%2B2015-02-13%2Bat%2B1.41.28%2BPM.png>

lp

unread,
Feb 24, 2015, 9:51:10 AM2/24/15
to stacks...@googlegroups.com, louis....@gmail.com, jcat...@illinois.edu
Thanks Julian. Will run populations on 1.26 and let you know. 

Edith

unread,
Mar 6, 2015, 2:26:57 PM3/6/15
to stacks...@googlegroups.com, louis....@gmail.com, jcat...@illinois.edu
Hi LP,

Have you run the MAF filter with the new version of Stacks (v1.27)? I'm wondering if the MAF is working for others and maybe there's something wrong with my command. My genepop output still includes loci that fall under the MAF filter.

Edith
Reply all
Reply to author
Forward
0 new messages