Maximum Observed Heterozygosity


Nathaniel Hubbs

May 25, 2020, 8:17:52 AM
to Stacks
Hello all,

I am currently trying to filter out loci with observed heterozygosity above 0.85. However, when I add the max-obs-het flag, I still see observed heterozygosity values higher than 0.85 in the sumstats file. Any advice on how to filter out these loci effectively would be greatly appreciated!

Best,

Wade

Julian Catchen

May 26, 2020, 11:33:08 AM
to stacks...@googlegroups.com, Nathaniel Hubbs
Hi Wade,

Can you tell us the version of the software, the type of data, the
commands you ran, and an example where you see loci with too high
observed heterozygosity not being filtered?

julian


Nathaniel Hubbs

May 26, 2020, 2:42:10 PM
to Stacks
Julian,

I am working in Stacks version 2.2 with GBS data from 75 bp single-end reads. My command to filter out paralogs based on max_obs_het is as follows:

#!/bin/bash 


#SBATCH --cpus-per-task=12
#SBATCH --time=1-00:00:00
#SBATCH --mem=125G

module load stacks 
populations -P ./ -M ./popmaptnfinal.txt -p 8 -r 0.50 --min_maf 0.01 --max_obs_het 0.85 --vcf --write_single_snp --genepop --fstats --fst_correction bonferroni_win --hwe --structure -B ./blacklist.txt -t 28

I have attached my sumstats file and the log file as well. Please let me know of any issues you see, or whether there is an alternative way to filter out paralogous loci effectively.


Thank you in advance,

Wade
05262020populations.sumstats.csv
populations 3.log

Roseanna Gamlen-Greene

Jul 22, 2021, 12:47:17 AM
to Stacks
Hi

Did you figure this out? I have the same issue: Stacks isn't filtering out many loci whose observed heterozygosity is above my --max-obs-het threshold of 0.6. When I plot observed heterozygosity in R, I find many loci above 0.6.

Here is my populations command:
populations -P $src/Denovo_May2021/stacks_gappedmin0.9.M$M/ -M $src/info/Pop_map_835samples_plates_D701_to_D709_april2021.tsv -O $src/Denovo_May2021/stacks_gappedmin0.9.M$M/populations_maxhet0.6_writesinglesnp/ --max-obs-het 0.6 --write-single-snp --fstats --hwe --vcf --plink --structure --genepop --treemix --log-fst-comp --verbose -t 48

Here's the beginning of the output from the pop file:
"populations parameters selected:
  Percent samples limit per population: 0
  Locus Population limit: 1
  Percent samples overall: 0
  Minor allele frequency cutoff: 0
  Maximum observed heterozygosity cutoff: 0.6
  Applying Fst correction: none.
  Pi/Fis kernel smoothing: off
  Fstats kernel smoothing: off
  Bootstrap resampling: off"


Stacks 2.53, GBS data, 101 bp paired-end reads.

Thanks,
Roseanna

Julian Catchen

Jul 22, 2021, 5:20:10 PM
to stacks...@googlegroups.com, Roseanna Gamlen-Greene
Hi Roseanna,

The maximum observed heterozygosity (MOH) filter, like the minor
allele frequency filter, is applied to the data as a single population.
So, if you re-run populations with a popmap that places all of your
samples in a single population, you should see that the MOH filter is
working as advertised.
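
To illustrate the point about pooling, here is a minimal sketch with made-up genotype counts (the population names and numbers are hypothetical, not taken from either dataset in this thread). It assumes the sumstats file reports observed heterozygosity separately for each population, which would explain how a locus can show a value above the cutoff in one population while its pooled value stays below it.

```shell
#!/bin/bash
# Hypothetical counts at one SNP: population, samples genotyped, heterozygous samples.
counts=$(printf 'popA\t10\t9\npopB\t30\t3\n')

# Observed heterozygosity within each population (the per-population view).
per_pop=$(printf '%s\n' "$counts" | awk -F'\t' '{ printf "%s\t%.2f\n", $1, $3 / $2 }')

# Observed heterozygosity with all samples pooled into a single population
# (the view the filter is described as using).
pooled=$(printf '%s\n' "$counts" | awk -F'\t' '{ n += $2; het += $3 } END { printf "%.2f", het / n }')

printf '%s\n' "$per_pop"
printf 'pooled\t%s\n' "$pooled"
```

With these counts popA sits at 0.90 and popB at 0.10, while the pooled value is 0.30, so a cutoff of 0.6 applied to the pooled data would keep the locus even though popA is well above 0.6.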

Best,

julian
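
One way to build the single-population popmap suggested above is sketched below. The sample names, the population label "all", and the file and directory names are placeholders; the sketch assumes a tab-separated popmap with the sample name in the first column and the population in the second.

```shell
#!/bin/bash
# Hypothetical two-population popmap (sample<TAB>population), written to a temp dir.
workdir=$(mktemp -d)
printf 'sample_01\tnorth\nsample_02\tnorth\nsample_03\tsouth\n' > "$workdir/popmap.tsv"

# Keep each sample name and assign every sample to one placeholder population, "all".
awk 'BEGIN { FS = OFS = "\t" } NF >= 1 { print $1, "all" }' \
    "$workdir/popmap.tsv" > "$workdir/popmap_single.tsv"

cat "$workdir/popmap_single.tsv"

# Then point populations at the new popmap, for example (paths are placeholders):
# populations -P ./stacks_dir -M "$workdir/popmap_single.tsv" -O ./populations_singlepop --max-obs-het 0.6
```

Writing the output to a separate directory with -O keeps the single-population results apart from the original per-population run, so the two sumstats files can be compared side by side.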
