Hello,
I am trying to document some details of QC for a dataset, and I want to make sure I correctly use > and < versus >= and <=.
Can you please help me find out whether the following PLINK flags use strict inequality (> or <) or non-strict inequality (>= or <=)? (In other words, whether something exactly equal to the specified threshold will be kept or excluded by each flag.)
--maf
--max-maf
--mac
--max-mac
--geno
--mind
--hwe
I think --maf 0.01 might keep variants with MAF==0.01, based on the documentation stating that it "filters out all variants with allele frequency below the provided threshold", and I would guess --mac probably works similarly.
I'm not sure about the upper bound allele frequency flags, --max-mac/--max-maf.
For --geno 0.05 and --mind 0.05 I suspect they may keep variants/samples, respectively, where the missingness is exactly 0.05, based on the documentation stating that "--geno filters out all variants with missing call rates exceeding the provided value (default 0.1) to be removed, while --mind does the same for samples."
For --hwe, the documentation states that it "filters out all variants which have Hardy-Weinberg equilibrium exact test p-value below..." so my guess would be that it keeps variants where the HWE p-value is exactly equal to the threshold.
Can you please help me confirm the expected behavior of each of these flags when a variant or person has a value exactly equal to the specified threshold?
Is there a general "rule of thumb" for everything in PLINK regarding whether flags specifying a threshold apply strict or non-strict inequality, or does this vary by flag?
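To make the question concrete, here is a minimal Python sketch of the two behaviors I am trying to distinguish for a lower-bound filter such as --maf. The function name keep_variant and its strict parameter are my own illustration and are not part of PLINK; the sketch does not claim which behavior PLINK actually implements.

```python
def keep_variant(value, threshold, strict):
    """Hypothetical lower-bound filter, only for illustrating the question.

    strict=True  keeps a variant only if value >  threshold.
    strict=False keeps a variant if      value >= threshold.
    """
    if strict:
        return value > threshold
    return value >= threshold


threshold = 0.01
for maf in (0.005, 0.01, 0.02):
    kept_strict = keep_variant(maf, threshold, strict=True)
    kept_non_strict = keep_variant(maf, threshold, strict=False)
    # The two interpretations disagree only when maf equals the threshold.
    print(maf, kept_strict, kept_non_strict)
```

The two interpretations differ only for a value exactly equal to the threshold, which is the case I am asking about for each flag.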
Thank you,
Kristen