Hi everyone,
I have two questions about filtering raw SNP data before investigating population genetic structure. The first concerns minor allele frequency (MAF) filtering, and the second concerns Hardy-Weinberg equilibrium (HWE) filtering.
My first question is about MAF filtering. Many studies use a threshold of 0.05 or 0.01. However, with a dataset of 200 individuals (400 allele copies, assuming diploids), a threshold of 0.05 removes any SNP whose minor allele appears fewer than 20 times, for example one carried in heterozygous form by only 10 individuals. A threshold of 0.01 removes any SNP whose minor allele appears fewer than 4 times, for example one seen in a single individual. In other words, if a SNP is present in only one or a few breeds, its overall MAF could be lower than 5% even if it is common within those breeds.
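To make the arithmetic concrete, here is a minimal Python sketch. It assumes diploid individuals, no missing genotypes, and a filter that removes SNPs whose MAF is strictly below the threshold. The function names are hypothetical and only for illustration.

```python
import math

def min_copies_to_pass(n_individuals, maf_threshold, ploidy=2):
    """Smallest minor-allele count that survives a 'MAF < threshold' filter.

    Assumes no missing genotypes, so every individual contributes
    `ploidy` allele copies.
    """
    total_alleles = n_individuals * ploidy
    # Small epsilon guards against floating-point noise in the product.
    return math.ceil(total_alleles * maf_threshold - 1e-9)

def overall_maf(allele_copies, n_individuals, ploidy=2):
    """Frequency of the rarer allele across the whole pooled dataset."""
    freq = allele_copies / (n_individuals * ploidy)
    return min(freq, 1 - freq)

# 200 individuals: copies of the minor allele needed to be kept
print(min_copies_to_pass(200, 0.05))  # 20
print(min_copies_to_pass(200, 0.01))  # 4

# Hypothetical SNP private to one breed of 20 animals, where the allele
# has frequency 0.4 within that breed (16 of 40 copies) and is absent elsewhere.
print(overall_maf(16, 200))  # 0.04, below a 0.05 threshold
```

So a SNP that is quite common inside a single breed of 20 animals can still fall below an overall threshold of 0.05.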
My second question is about HWE filtering. The commonly recommended threshold is --hwe 0.001. However, my 200 individuals come from 10 different breeds, with 20 individuals per breed. HWE is expected to hold within a single randomly mating population, not across all 200 individuals pooled together, because stratification between breeds can itself cause departures from equilibrium. This suggests that it may be more appropriate to test HWE within each breed separately. Yet in many papers all individuals are filtered together, so what should I do in practice?
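The stratification concern can be illustrated with a small Python sketch. It uses a simple chi-square goodness-of-fit test (1 degree of freedom) rather than an exact test, so it is only an approximation, especially for 20 animals per breed. The genotype counts are made up for illustration, and `hwe_chisq_p` is a hypothetical helper name.

```python
import math

def hwe_chisq_p(n_aa, n_ab, n_bb):
    """Approximate HWE p-value from genotype counts (chi-square, 1 df)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        return 1.0  # monomorphic site: nothing to test
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))

# Made-up counts (AA, AB, BB) for two breeds of 20 animals each,
# with opposite allele frequencies (0.9 vs 0.1 for allele A).
breed_1 = (16, 4, 0)
breed_2 = (0, 4, 16)
pooled = tuple(a + b for a, b in zip(breed_1, breed_2))  # (16, 8, 16)

print(hwe_chisq_p(*breed_1))  # not significant within breed 1
print(hwe_chisq_p(*breed_2))  # not significant within breed 2
print(hwe_chisq_p(*pooled))   # below 0.001 once the breeds are pooled
```

In this toy case neither breed deviates from HWE on its own, but the pooled sample shows a heterozygote deficit strong enough to fail a 0.001 threshold, purely because the two breeds differ in allele frequency.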
For my dataset of 200 individuals from 10 different breeds, which MAF and HWE thresholds would be most appropriate?
Thanks
CHAO